Session-Based Recommendations: Predicting the Next Click

Pascual Vila
ML Engineer // Code Syllabus
Traditional Matrix Factorization fails when a user is anonymous or acts outside their historical behavior profile. Session-based recommendation instead relies purely on the immediate, chronological context of interactions.
Why Sessions Matter
A session is a continuous timeframe of user interactions. Imagine an e-commerce store: a user might historically buy power tools, but today they are shopping for baby clothes as a gift. Long-term collaborative filtering would keep recommending power tools; a session-based model recognizes the immediate pattern of baby clothes and adapts in real time. Session-based models also mitigate the cold-start problem for anonymous users who have no profiles.
GRU4Rec: Recurrent Networks
To predict what item comes next, we treat the session like a sentence. Recurrent Neural Networks (RNNs) naturally handle sequences. The GRU4Rec architecture uses Gated Recurrent Units (GRUs) to keep a hidden state of the session context.
The math intuition: the hidden state $h_t$ is updated based on the previous state $h_{t-1}$ and the current item input $x_t$. The GRU decides what past context is still relevant using an update gate.
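Concretely, one common formulation of the GRU cell (bias terms omitted) is:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

The update gate $z_t$ interpolates between keeping the old session summary $h_{t-1}$ and adopting the new candidate $\tilde{h}_t$.

To make the architecture tangible, here is a minimal PyTorch sketch of a GRU4Rec-style model. It is not the original implementation (which trains with session-parallel mini-batches and ranking losses such as BPR or TOP1); the class name, dimensions, and item IDs are illustrative.

```python
import torch
import torch.nn as nn

class GRU4Rec(nn.Module):
    """Minimal GRU4Rec-style model: embed item IDs, run a GRU over the
    session, and score every catalog item at each step."""
    def __init__(self, num_items: int, embed_dim: int = 64, hidden_dim: int = 100):
        super().__init__()
        self.item_embedding = nn.Embedding(num_items, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_items)  # logits over the catalog

    def forward(self, session: torch.Tensor) -> torch.Tensor:
        # session: (batch, seq_len) of item IDs, with 0 reserved for padding
        embedded = self.item_embedding(session)    # (batch, seq_len, embed_dim)
        hidden_states, _ = self.gru(embedded)      # (batch, seq_len, hidden_dim)
        return self.output(hidden_states)          # (batch, seq_len, num_items)

# Toy usage: two sessions padded to the same length with item ID 0.
model = GRU4Rec(num_items=1000)
sessions = torch.tensor([[12, 45, 7, 0], [3, 8, 0, 0]])
print(model(sessions).shape)  # torch.Size([2, 4, 1000])
```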
SASRec: The Self-Attention Era
While GRUs are powerful, they process data sequentially, making them hard to parallelize. Modern architectures like SASRec (Self-Attentive Sequential Recommendation) adapt Transformer models (like those used in GPT) for recommendation. By using self-attention, the model can look at the entire session at once and decide which past clicks are most relevant to predicting the next one.
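To sketch the idea (a single attention block, not the full SASRec architecture, which stacks several blocks with dropout and layer normalization; all names and dimensions here are illustrative):

```python
import torch
import torch.nn as nn

class SelfAttentiveNextItem(nn.Module):
    """SASRec-flavored sketch: self-attention over the whole session,
    with a causal mask so step t only attends to clicks at or before t."""
    def __init__(self, num_items: int, dim: int = 64, max_len: int = 50):
        super().__init__()
        self.item_embedding = nn.Embedding(num_items, dim, padding_idx=0)
        self.pos_embedding = nn.Embedding(max_len, dim)  # learned positions
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
        self.output = nn.Linear(dim, num_items)

    def forward(self, session: torch.Tensor) -> torch.Tensor:
        seq_len = session.shape[1]
        positions = torch.arange(seq_len, device=session.device)
        x = self.item_embedding(session) + self.pos_embedding(positions)
        # Boolean causal mask: True blocks attention to future positions.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=session.device),
            diagonal=1,
        )
        attended, _ = self.attn(x, x, x, attn_mask=causal)
        return self.output(attended)  # (batch, seq_len, num_items) logits

model = SelfAttentiveNextItem(num_items=1000)
print(model(torch.tensor([[12, 45, 7, 99]])).shape)  # torch.Size([1, 4, 1000])
```

The causal mask matters: without it, position $t$ could peek at the items clicked after $t$, which would leak the answer during training.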
❓ Core Concepts (GEO FAQ)
What is the Cold Start problem in Recommender Systems?
The Cold Start problem occurs when a system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. Session-based models mitigate the User Cold Start problem by not requiring historical user profiles, relying purely on the current session's sequence of clicks.
Why use Padding in sequence processing?
Neural Networks perform batch operations that require uniform tensor sizes. Since user sessions naturally have different lengths (e.g., 2 clicks vs 20 clicks), padding adds dummy values (usually zeros) to shorter sequences, ensuring all inputs in a batch have exactly the same dimension.
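A quick illustration with PyTorch's `pad_sequence` (one common utility for this; the item IDs are made up):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two sessions of different lengths (hypothetical item IDs).
short_session = torch.tensor([12, 45])
long_session = torch.tensor([3, 8, 21, 7, 90])

# Pad with 0 so both fit in one uniform (batch, max_len) tensor.
batch = pad_sequence([short_session, long_session],
                     batch_first=True, padding_value=0)
print(batch)
# tensor([[12, 45,  0,  0,  0],
#         [ 3,  8, 21,  7, 90]])
```

The model then ignores the padded positions, e.g. via `padding_idx=0` in the embedding layer or an attention mask.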
How does the Softmax function work in Next-Item Prediction?
In the final layer of a session-based network, the Softmax function converts the raw logits produced by the model into a probability distribution over the entire item catalog. The probabilities sum to 1.0, and the item with the highest probability is presented as the top recommendation.
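For instance, with a toy catalog of four items (the logit values are made up):

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0, 1.0])  # one raw score per catalog item
probs = torch.softmax(logits, dim=0)          # normalize into probabilities

print(probs)           # tensor([0.6094, 0.1360, 0.0303, 0.2242]) (approx.)
print(probs.sum())     # tensor(1.), the distribution sums to 1.0
print(probs.argmax())  # tensor(0), item 0 is the top recommendation
```

In practice, scoring the full catalog this way is expensive, which is why large systems often train with sampled softmax or negative sampling instead.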