If you can't measure it, you can't improve it. In recommender systems, accuracy is just the beginning of the story.
1. Precision and Recall at K
In RecSys, we don't care about the whole list: users only look at the top few items. Precision@K is the fraction of the top K slots that hold relevant items. Recall@K is the fraction of all available relevant items that we captured in that same window. There is a trade-off: as you show more items (increasing K), Recall can only stay flat or go up, but Precision usually goes down because you're including lower-quality matches to fill the slots.
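To make the trade-off concrete, here is a minimal Python sketch of both metrics. The function names and the toy data (`ranked`, `relevant`) are illustrative assumptions, not part of any particular library. It also assumes the common convention of dividing Precision@K by K even when fewer than K items are returned.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k slots that hold a relevant item."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k  # assumption: divide by k, not by len(ranked[:k])


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k slots."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical data: a ranked list from the model and the user's relevant set.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "f", "z"}  # "z" was never recommended

for k in (1, 3, 6):
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    print(k, round(p, 2), round(r, 2))
# 1 1.0 0.25
# 3 0.67 0.5
# 6 0.5 0.75
```

On this toy list, Precision falls from 1.0 to 0.5 as K grows from 1 to 6, while Recall climbs from 0.25 to 0.75.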
2. NDCG: The Gold Standard
Normalized Discounted Cumulative Gain (NDCG) is often treated as the most important ranking metric for production systems. Unlike Precision, which treats every slot in the top K as equal, NDCG is rank-sensitive. It applies a logarithmic discount: an item at position #1 is worth significantly more than an item at position #10. The discounted total is then divided by the score of the ideal ordering, so a perfect ranking scores 1. This rewards the algorithm for placing its most relevant items at the very top, which matches the human habit of scanning lists from the top down.
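The discount and the normalization can be sketched in a few lines of Python. This is one common formulation, assumed here rather than taken from the text: linear gain with a `1 / log2(position + 1)` discount (some implementations use `2**gain - 1` as the gain instead). The function names are illustrative, and the ideal ranking is built from the same `gains` list, which assumes that list covers every relevant candidate.

```python
import math


def dcg_at_k(gains, k):
    """Sum of relevance gains, each divided by log2(position + 1)."""
    return sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains, k):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all
    return dcg_at_k(gains, k) / ideal


# Weight of a slot under the logarithmic discount.
print(1 / math.log2(1 + 1))   # position 1: 1.0
print(1 / math.log2(10 + 1))  # position 10: roughly 0.29

# The same single relevant item, placed first versus third.
print(ndcg_at_k([1, 0, 0], k=3))  # 1.0
print(ndcg_at_k([0, 0, 1], k=3))  # 0.5
```

Both example lists have the same Precision@3 (one hit in three slots), yet NDCG halves when the relevant item slides from the first slot to the third.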
3. Diversity, Novelty, and Serendipity
A system with 100% Precision might actually be a bad product. If a user likes 'Star Wars', such a system might recommend nothing but 'Star Wars' episodes 1-9. This is accurate but boring. Professional systems also track Diversity (are the items different from each other?) and Novelty (how unexpected or unknown is the recommendation to the user?). The ultimate goal is Serendipity: finding something the user didn't know they wanted but loves once they see it.
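The text does not fix a formula for these beyond-accuracy metrics, so the Python sketch below uses two common choices as assumptions: Diversity as the average pairwise Jaccard distance between items' genre sets (often called intra-list diversity), and Novelty as the mean self-information `-log2(popularity)`, where popularity is the share of users who have already interacted with the item. The item names, genres, and popularity values are made up for illustration.

```python
import math
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the overlap ratio of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, features):
    """Average pairwise distance between the recommended items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


def mean_novelty(items, popularity):
    """Average self-information in bits: rarer items score higher."""
    return sum(-math.log2(popularity[i]) for i in items) / len(items)


# Hypothetical catalogue: genre sets and the share of users who know each item.
features = {
    "sw4": {"scifi", "adventure"},
    "sw5": {"scifi", "adventure"},
    "sw6": {"scifi", "adventure"},
    "doc": {"documentary"},
    "noir": {"crime", "drama"},
}
popularity = {"sw4": 0.5, "sw5": 0.5, "sw6": 0.5, "doc": 0.125, "noir": 0.25}

franchise_only = ["sw4", "sw5", "sw6"]
mixed = ["sw4", "doc", "noir"]

print(intra_list_diversity(franchise_only, features))  # 0.0
print(intra_list_diversity(mixed, features))           # 1.0
print(mean_novelty(franchise_only, popularity))        # 1.0
print(mean_novelty(mixed, popularity))                 # 2.0
```

Serendipity is harder to score offline because it needs both unexpectedness and a confirmed positive reaction from the user. The sketch does not attempt it.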
