Implementing K-Means Clustering in R

btd
3 min read · Nov 23, 2023

K-Means clustering is a popular unsupervised machine learning algorithm that partitions a dataset into K distinct, non-overlapping subsets (clusters), assigning each observation to the cluster whose center is nearest. In R, the kmeans function from the built-in stats package is commonly used to implement it. Here's a step-by-step guide with example code:
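
Before the steps, it may help to see roughly what the algorithm does internally. The following is a minimal sketch of the classic assign-then-update loop (often called Lloyd's algorithm), not the routine that kmeans itself runs, which defaults to the Hartigan-Wong variant. The function name simple_kmeans, its arguments, and the toy matrix are hypothetical and used only for illustration.

```r
# Hypothetical helper for illustration only; not part of base R or of kmeans().
simple_kmeans <- function(points, k, max_iter = 100) {
  points <- as.matrix(points)
  # Pick k distinct rows at random as the initial cluster centers
  centers <- points[sample(nrow(points), k), , drop = FALSE]
  assignment <- rep(0L, nrow(points))

  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every center (n x k matrix)
    dists <- sapply(seq_len(k), function(j) {
      rowSums(sweep(points, 2, centers[j, ])^2)
    })
    # Assignment step: each point joins its nearest center
    new_assignment <- max.col(-dists, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Update step: move each center to the mean of its members
    for (j in seq_len(k)) {
      members <- points[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers)
}

# Toy data (an assumption for this sketch): two groups of 20 points each
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)  # number of points assigned to each cluster
```

The sketch uses a single random start, so it can settle on a poor solution. That is the problem the nstart argument of kmeans is meant to reduce.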

Step 1: Generate Sample Data

Let’s start by generating a sample dataset for clustering.

set.seed(123)  # For reproducibility
# Creating a sample dataset with two features (x, y)
data <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)

Step 2: Visualize the Data

Before applying K-Means, it’s often useful to visualize the data.

plot(data$x, data$y, col = "blue", pch = 16, main = "Sample Data for K-Means Clustering")

Step 3: Implement K-Means Clustering

Now, let’s use the kmeans function to perform clustering. We'll start with K=2 clusters.

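# Perform K-Means clustering with K=2
# nstart = 20 tries 20 random sets of starting centers and keeps the best fit
# Selecting x and y explicitly keeps the cluster column out of the input on a re-run
kmeans_result <- kmeans(data[, c("x", "y")], centers = 2, nstart = 20)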

# Add cluster assignments to the original dataset
data$cluster <- kmeans_result$cluster

# Visualize the clusters
plot(data$x, data$y, col =…
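
Beyond the plot, the object returned by kmeans can be inspected directly. The sketch below rebuilds the sample data from Step 1 so that it runs on its own, then prints a few documented components of the result (centers, size, tot.withinss, betweenss, totss). The last ratio is only a rough indicator of how well separated the clusters are, not a formal test.

```r
# Recreate the sample data from Step 1 so this block is self-contained
set.seed(123)
data <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)

# Fit K-Means on the two feature columns only
kmeans_result <- kmeans(data[, c("x", "y")], centers = 2, nstart = 20)

# Coordinates of the two cluster centers
print(kmeans_result$centers)

# Number of points in each cluster
print(kmeans_result$size)

# Total within-cluster sum of squares (lower means tighter clusters)
print(kmeans_result$tot.withinss)

# Share of total variance explained by the cluster split (between / total)
print(kmeans_result$betweenss / kmeans_result$totss)
```

Because the two groups were generated around (0, 0) and (5, 5), the fitted centers should land close to those points, although cluster labels 1 and 2 may be assigned in either order.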
