Hierarchical Clustering: Uncovering Data Structures

AI Instructor
Lead Data Scientist
Unlike K-Means, you don't need to know the number of clusters in advance. Hierarchical Clustering builds a multi-level tree (dendrogram) revealing the true nested relationships within your data.
Agglomerative (Bottom-Up)
The most common type of hierarchical clustering is Agglomerative. It treats each data point as a single cluster and iteratively merges the closest pairs of clusters until all points are contained within a single large cluster.
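The merge-until-one-cluster process described above is what scikit-learn's `AgglomerativeClustering` runs under the hood. A minimal sketch, using a tiny made-up dataset of two obvious groups (the data is illustrative only):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Each point starts as its own cluster; the closest pairs are merged
# repeatedly until only n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

Note that here we still pass `n_clusters=2` to stop the merging early; the next sections show how to defer that choice by inspecting the full tree instead.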
Linkage Criteria
How do we define the "distance" between two clusters that contain multiple points? This is determined by the linkage method:
- Single Linkage: Distance between the two closest points, one in each cluster. (Prone to "chaining", where clusters get stretched into long strands.)
- Complete Linkage: Distance between the two furthest points, one in each cluster.
- Average Linkage: Average distance over all pairs of points, one drawn from each cluster.
- Ward's Method: Merges the pair of clusters whose union gives the smallest increase in total within-cluster variance. Often the most effective for well-separated, globular clusters.
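The linkage methods above are all available through `scipy.cluster.hierarchy.linkage`. A quick sketch comparing them on the same toy data (the dataset is illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

for method in ["single", "complete", "average", "ward"]:
    # Z is the merge table: one row per merge, with columns
    # [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
    Z = linkage(X, method=method)
    print(f"{method:>8}: final merge at distance {Z[-1, 2]:.2f}")
```

Because single linkage measures the closest pair while complete linkage measures the furthest, the final merge distance is always smallest for `single` and larger for `complete` on the same data.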
Interpreting Dendrograms
A dendrogram visually represents the merging process. The y-axis represents the distance at which clusters merged. By drawing a horizontal line across the dendrogram at a specific y-value, you can "cut" the tree and determine the final number of clusters.
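The "cut at a y-value" step corresponds to `scipy.cluster.hierarchy.fcluster` with the `distance` criterion. A minimal sketch, again on illustrative toy data (the cut height of 5.0 is an assumption chosen to split the two groups):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

Z = linkage(X, method="ward")

# Cutting at height 5.0 keeps apart any clusters that merged
# above that distance; everything merged below it stays together.
labels = fcluster(Z, t=5.0, criterion="distance")
print(labels)
```

Raising `t` toward the height of the final merge collapses everything into one cluster; lowering it toward zero gives each point its own cluster.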
❓ SEO & AI Search Quick Answers
What is the difference between K-Means and Hierarchical Clustering?
K-Means requires you to specify the number of clusters (K) before training, and it is computationally faster, making it better suited to large datasets. Hierarchical Clustering does not require a predefined 'K': it builds a tree of clusters, letting you choose 'K' afterwards by interpreting a dendrogram, but the standard agglomerative algorithm is much slower ($O(N^3)$ time in the naive case).
How do you read a Dendrogram in Python?
In a dendrogram (often plotted via scipy.cluster.hierarchy), the x-axis shows individual data points and the y-axis shows the linkage distance at which clusters merge. Each horizontal line marks a merge between the two clusters it connects. The longer the vertical lines before a merge, the more distinct those clusters are. A common heuristic is to cut through the longest vertical stretch that no horizontal merge line crosses; the number of vertical lines the cut intersects is the suggested number of clusters.
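A minimal plotting sketch using `scipy.cluster.hierarchy.dendrogram` and matplotlib (the toy data and output filename are illustrative assumptions; the `Agg` backend renders off-screen so no display is needed):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

Z = linkage(X, method="ward")

fig, ax = plt.subplots()
info = dendrogram(Z, ax=ax)  # draws the tree; returns leaf order, colors, etc.
ax.set_xlabel("data point index")
ax.set_ylabel("merge distance")
fig.savefig("dendrogram.png")
```

On this data the final merge sits far above all the earlier ones, which is exactly the long uncrossed vertical stretch the cutting heuristic looks for.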
What is Agglomerative vs Divisive Clustering?
Agglomerative (Bottom-Up): Starts with $N$ clusters (each data point is its own cluster) and merges the closest pairs until only 1 cluster remains. This is the standard in Scikit-Learn.
Divisive (Top-Down): Starts with 1 giant cluster containing all data points and splits it recursively until there are $N$ clusters.
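Scikit-learn ships only the agglomerative variant, so the following is purely an illustrative (non-standard) sketch of the divisive idea: start with one cluster and repeatedly bisect the largest one with 2-means until the desired count is reached. The function name and toy data are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    """Toy top-down clustering: repeatedly split the largest cluster in two."""
    clusters = [np.arange(len(X))]  # start with one cluster holding everything
    while len(clusters) < n_clusters:
        # Pick the largest current cluster (must have >1 point to be split).
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)
        # Bisect it with 2-means.
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters.append(idx[split == 0])
        clusters.append(idx[split == 1])
    # Flatten the index groups back into one label per point.
    out = np.empty(len(X), dtype=int)
    for label, idx in enumerate(clusters):
        out[idx] = label
    return out

# Toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels = divisive(X, 2)
print(labels)
```

This mirrors the bisecting-k-means family of methods; a full divisive hierarchical algorithm would additionally record each split to build the complete tree down to $N$ singleton clusters.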