K-Means clustering is a popular unsupervised machine learning algorithm used for partitioning a dataset into K distinct, non-overlapping subsets (clusters), with each observation assigned to the cluster whose mean (centroid) is nearest. In R, the kmeans() function from the built-in stats package is commonly used to implement K-Means clustering. Here's a step-by-step guide with example code:
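To make the "nearest centroid" idea concrete, here is a minimal from-scratch sketch of Lloyd's algorithm, the classic K-Means iteration: assign every point to its nearest center, move each center to the mean of its points, and repeat until no assignment changes. The helper name lloyd_kmeans() is hypothetical and only illustrates the idea. Note that stats::kmeans() uses the Hartigan-Wong algorithm by default (Lloyd's is available via algorithm = "Lloyd"), so this is not a reproduction of its internals.

```r
# Illustrative sketch of Lloyd's algorithm; lloyd_kmeans() is a hypothetical
# helper, not part of base R. stats::kmeans() defaults to Hartigan-Wong.
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k distinct data points chosen at random as initial centers
  centers <- x[sample.int(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))  # all zeros, so the first pass never "converges"
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from each point (rows)
    # to each center (columns), then pick the nearest center
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each non-empty cluster's center to the mean of its points
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage on two simulated groups around (0, 0) and (5, 5)
set.seed(123)
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))
fit <- lloyd_kmeans(pts, k = 2)
table(fit$cluster)  # number of points in each cluster
fit$centers         # final centroids, one row per cluster
```

Because the loop stops only when the assignments stop changing, the returned clustering is a local optimum: every point sits with its nearest final center, but a different random start could land in a different local optimum.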
Step 1: Generate Sample Data
Let’s start by generating a sample dataset for clustering: 100 points with two features, drawn as two groups of 50 centered near (0, 0) and (5, 5).
set.seed(123) # For reproducibility
# Creating a sample dataset with two features (x, y)
data <- data.frame(
x = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
y = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)
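Because the data are simulated, we know which group each row came from: the first 50 rows were drawn around 0 and the last 50 around 5. The sketch below keeps those labels in a vector called true_group (a hypothetical name, not used elsewhere in this guide) and checks the per-group means, which can be handy later for judging how well the clustering recovered the groups. It repeats the same seed and draws as above, so it produces the same data.

```r
# Same seed and draws as above, so this reproduces the same data frame
set.seed(123)
data <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)

# Hypothetical helper vector: the generating group of each row
true_group <- rep(c("A", "B"), each = 50)

str(data)  # 100 observations of 2 numeric variables

# Per-group feature means should be close to 0 (group A) and 5 (group B)
group_means <- aggregate(data, by = list(group = true_group), FUN = mean)
group_means
```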
Step 2: Visualize the Data
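Before applying K-Means, it’s often useful to visualize the data to see whether distinct groups are visible and roughly how many there are, which helps in choosing K.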
plot(data$x, data$y, col = "blue", pch = 16, main = "Sample Data for K-Means Clustering")
Step 3: Implement K-Means Clustering
Now, let’s use the kmeans() function to perform clustering. We'll start with K=2 clusters, matching the two groups in the simulated data.
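The nstart = 20 argument in the call below asks kmeans() to try 20 random sets of starting centers and keep the run with the lowest total within-cluster sum of squares (tot.withinss), which reduces the risk of settling on a poor local optimum. The sketch below mimics that selection by hand with 20 single-start fits and then inspects a few components of the returned object (size, centers, totss, betweenss). The variable names fits, wss, and best are illustrative only; in practice nstart does this for you.

```r
# Same simulated data as in Step 1
set.seed(123)
data <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)

# 20 independent fits, each from a single random start
fits <- lapply(1:20, function(i) kmeans(data, centers = 2, nstart = 1))
wss <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# Keep the fit with the smallest total within-cluster sum of squares,
# which is the selection rule nstart applies across its random starts
best <- fits[[which.min(wss)]]

best$size                    # number of points in each cluster
best$centers                 # centroids, one row per cluster
best$betweenss / best$totss  # share of total variation between clusters
```

The ratio betweenss / totss lies between 0 and 1; values near 1 mean most of the variation in the data is explained by the separation between clusters rather than spread within them.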
# Perform K-Means clustering with K=2
kmeans_result <- kmeans(data, centers = 2, nstart = 20)
# Add cluster assignments to the original dataset
data$cluster <- kmeans_result$cluster
# Visualize the clusters
plot(data$x, data$y, col =…