K-Means is one of the simplest and most widely used clustering algorithms. It iteratively searches for the center point (the centroid) of each group in your data.
1. Centroid Clustering
K-Means is the workhorse of unsupervised learning. Its goal is simple: divide an unlabeled dataset into 'K' distinct groups.
It does this by finding a 'Centroid', the mean position of the points in a group, for each group. Every data point in your dataset is then assigned to whichever centroid is nearest to it (by Euclidean distance), effectively carving your data into distinct territories.
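As a rough illustration of the assignment rule, the sketch below labels a handful of points by their nearest centroid using plain NumPy. The points and the two centroids are made-up values chosen for the example, not output from any real model.

```python
import numpy as np

# Toy data: six 2-D points and two hand-picked centroids (illustrative values).
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])

# Euclidean distance from every point to every centroid: shape (6, 2).
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Each point receives the index of its nearest centroid.
labels = distances.argmin(axis=1)
print(labels)  # [0 0 0 1 1 1]
```

The first three points fall into the territory of centroid 0 and the last three into the territory of centroid 1.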
from sklearn.cluster import KMeans
# Grouping customers into 3 segments
model = KMeans(n_clusters=3, random_state=42)
2. Convergence
K-Means doesn't know where the groups are in advance. In the classic version of the algorithm, it starts by placing 'K' centroids at random positions in the data.
Then, the algorithm iterates. First, it assigns every point to its nearest centroid. Second, it calculates the mean of all the points assigned to each centroid and moves the centroid to that mean. It repeats this assign-and-move process until the centroids stop moving, a state called 'Convergence'.
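The assign-and-move loop described above can be sketched in a few lines of NumPy. This is a minimal teaching version, not how scikit-learn implements it internally; the function name `kmeans_sketch`, its parameters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Plain assign-and-move loop (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign step: label each point with its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move step: each centroid becomes the mean of its assigned points.
        # A centroid with no assigned points keeps its old position.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated synthetic blobs centered near (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans_sketch(X, k=2)
print(np.sort(centroids[:, 0]))  # roughly 0 and 5
```

On data this cleanly separated the loop settles within a few iterations. On messier data the result can depend on the random starting positions.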
model.fit(X)
# fit() repeats the assign-and-move loop until the centroids
# stop moving (or the iteration limit is reached).
3. Choosing K: The Elbow Method
The biggest challenge in K-Means is that 'K' is a hyperparameter: you have to tell the algorithm how many clusters to look for. If you pick the wrong number, the clusters won't make real-world sense.
To address this, we use the 'Elbow Method'. We run K-Means multiple times (e.g., K=1 through 10) and record the 'Inertia' for each run: the sum of squared distances between each point and its assigned centroid. Inertia generally falls as K grows, so we plot it against K and look for the 'Elbow', the bend after which adding more clusters yields only small improvements. The K at that bend is a reasonable choice for the number of clusters.
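To make the definition of inertia concrete, the sketch below recomputes it by hand and compares it with the `inertia_` attribute that scikit-learn reports. The data comes from `make_blobs`, a synthetic generator used here only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three blobs (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Look up the centroid each point was assigned to, then sum the
# squared distances between points and those centroids.
assigned = model.cluster_centers_[model.labels_]
manual_inertia = np.sum((X - assigned) ** 2)

print(manual_inertia, model.inertia_)
```

The two numbers should agree up to floating-point rounding, which is why inertia shrinks whenever extra centroids let points sit closer to a center.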
# Finding the optimal K
k_values = range(1, 11)  # K = 1 through 10
inertias = [KMeans(n_clusters=k, random_state=42).fit(X).inertia_ for k in k_values]
4. Standard Scaler: Scaling Priority
K-Means is entirely based on distance calculations (specifically, Euclidean distance). Because of this, it is highly sensitive to the scale of your features.
If you cluster people by 'Age' (range 0-100) and 'Salary' (range $0-$100,000), the large numbers in the Salary column will completely overpower the Age column in the distance calculation. Scale your features so that every column carries comparable weight before running K-Means.
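A small worked example shows how strongly the raw units distort distances. The three customers below are hypothetical values invented for the illustration, and `dist` is a helper defined only for this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: columns are [age, salary].
X = np.array([[25.0, 50_000.0],
              [65.0, 50_500.0],
              [26.0, 90_000.0]])

def dist(a, b):
    """Euclidean distance between two rows."""
    return np.linalg.norm(a - b)

# Raw units: a 40-year age gap counts for less than a $500 salary gap.
print(dist(X[0], X[1]))  # about 501.6
print(dist(X[0], X[2]))  # about 40000.0

# After standardizing, each column has mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(dist(X_scaled[0], X_scaled[1]))
print(dist(X_scaled[0], X_scaled[2]))
```

In raw units the second customer looks about 80 times closer to the first than the third does, purely because salary is measured in larger numbers. After scaling, the two distances come out nearly equal, so the age gap and the salary gap carry similar weight.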
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
# Never cluster without scaling first!
5. Spherical Only: Shape Assumptions
K-Means is fast and interpretable, but it makes a strong assumption: it assumes all clusters are roughly spherical and roughly the same size.
If your real-world data forms long, snake-like patterns, or if one cluster is huge while another is tiny, K-Means tends to produce misleading clusters. It simply splits the space into convex regions around each centroid, whatever the real shape of the data. For complex, non-spherical shapes, consider density-based algorithms like DBSCAN.
# Assumption: Data is grouped in circles
# If data is shaped like moons or rings:
# Use DBSCAN or Spectral Clustering instead.
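The shape limitation can be checked on synthetic data. The sketch below compares K-Means and DBSCAN on scikit-learn's two-moons generator and scores each against the true grouping with the adjusted Rand index. The `eps` and `min_samples` values are hand-picked assumptions for this toy dataset, not general-purpose settings, and scaling is skipped only because both features already share a similar range.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: a classic non-spherical shape.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
# eps and min_samples are tuned by hand for this toy data.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true moons.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
print(f"KMeans ARI: {kmeans_ari:.2f}")
print(f"DBSCAN ARI: {dbscan_ari:.2f}")
```

K-Means cuts straight across both moons, so its score should land well below DBSCAN's, which follows the dense curved bands instead.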