Cosine Similarity: The Math Behind Recommendations
"In higher-dimensional spaces, the distance between data points can be misleading. By measuring the angle instead of the distance, we focus on user behavior patterns rather than sheer volume."
The Geometric Approach
In Recommender Systems, items (and users) are represented as vectors in a multi-dimensional space. For example, a movie might be an array of TF-IDF scores for different genres. How do we know if Movie A is similar to Movie B?
We could measure the straight-line Euclidean distance between them. But if Movie A is extremely popular (lots of high ratings) and Movie B is a niche classic (fewer ratings, but identical proportions), Euclidean distance would say they are far apart.
Cosine Similarity ignores the magnitude and looks only at the angle ($\theta$) between the vectors. If they point in the same direction, the angle is 0, and $\cos(0) = 1$. They are perfectly similar in pattern!
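To make this concrete with a toy example: let Movie A = $(8, 4)$ and Movie B = $(2, 1)$. B is simply A scaled down by a factor of 4, so both vectors point in exactly the same direction; the angle is 0 and the cosine similarity is 1. Yet the Euclidean distance is $\sqrt{(8-2)^2 + (4-1)^2} = \sqrt{45} \approx 6.7$, which makes the pair look far apart.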
The Formula
The cosine of the angle between two non-zero vectors $A$ and $B$ is derived from the Euclidean dot product formula:

$$\cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \times \sqrt{\sum_{i=1}^{n} B_i^2}}$$
- Numerator ($A \cdot B$): The Dot Product. You multiply the corresponding elements of the vectors and sum them. This captures what the two items have in common.
- Denominator ($||A|| \times ||B||$): The product of their Magnitudes (L2 Norms). This normalizes the result, dividing out the influence of the vectors' lengths.
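As a minimal sketch (the vectors below are illustrative, not from a real dataset), the formula maps directly onto NumPy:

import numpy as np

def cosine_sim(a, b):
    # Numerator: dot product; denominator: product of the L2 norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

movie_a = np.array([8.0, 4.0, 0.0])  # e.g., TF-IDF scores across three genres
movie_b = np.array([2.0, 1.0, 0.0])  # same pattern at one quarter the magnitude
print(cosine_sim(movie_a, movie_b))  # 1.0 (up to floating-point error)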
❓ AI Search & RecSys FAQ
Why use Cosine Similarity instead of Euclidean Distance in Recommender Systems?
Scale Invariance: Cosine similarity is independent of the magnitude of the vectors. If User A gave 10 ratings and User B gave 100 ratings, but their rating patterns (ratios across genres) are identical, Cosine Similarity evaluates them as 1.0 (perfectly similar). Euclidean distance would consider them far apart strictly due to the volume difference.
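A minimal demonstration of that scale invariance (the user vectors are invented for the example):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

user_a = np.array([[1.0, 2.0, 3.0]])      # a light user's ratings across three genres
user_b = np.array([[10.0, 20.0, 30.0]])   # 10x the volume, identical ratios
print(cosine_similarity(user_a, user_b))  # [[1.]] -- identical pattern
print(np.linalg.norm(user_a - user_b))    # ~33.7 -- Euclidean distance calls them far apart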
What are the bounds (range) of Cosine Similarity?
Standard Cosine Similarity ranges from -1 to 1.
- 1: Vectors point in exactly the same direction (identical patterns).
- 0: Vectors are orthogonal (no shared pattern; their dot product is zero).
- -1: Vectors point in opposite directions (perfectly anti-correlated).
Note: With non-negative data (like TF-IDF matrices, where values are $\ge 0$), the range narrows to 0 to 1.
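A quick sanity check of those bounds with toy vectors:

import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_sim([1, 1], [2, 2]))    #  1.0: same direction
print(cosine_sim([1, 0], [0, 1]))    #  0.0: orthogonal
print(cosine_sim([1, 1], [-1, -1]))  # -1.0: opposite directions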
How do you compute Cosine Similarity in Python?
While you can write it from scratch using numpy.dot and numpy.linalg.norm, in production systems such as Content-Based Filtering it is usually more efficient to use scikit-learn, whose vectorized cosine_similarity computes a full pairwise similarity matrix in one call.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# X is a 2D array/matrix: one row per item (or user) feature vector
X = np.array([[8.0, 4.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
similarity_matrix = cosine_similarity(X, X)  # (3, 3) matrix; entry [i, j] compares rows i and j