DEEP LEARNING /// EMBEDDINGS /// TWO-TOWER MODELS /// NEURAL CF ///

Deep Learning
For RecSys

Transcend matrix factorization. Harness neural networks to model complex, non-linear user behaviors and rich feature sets.


Lecturer: Matrix Factorization is great, but it struggles with non-linear relationships and rich side features. Enter Deep Learning.



Component: Embeddings

Embeddings convert categorical data (like User IDs) into dense vectors that can be processed by neural networks.

Validation Split

What happens if the `input_dim` of your embedding layer is smaller than the maximum User ID in your dataset?
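As a hint, the failure mode can be reproduced with a plain NumPy lookup table (hypothetical sizes; frameworks such as PyTorch raise a similar out-of-range error from `nn.Embedding` at lookup time):

```python
import numpy as np

# Hypothetical embedding table: input_dim=100 IDs, 8-dim vectors
input_dim, embed_dim = 100, 8
table = np.random.default_rng(0).normal(size=(input_dim, embed_dim))

vec = table[42]          # valid User ID -> its 8-dim vector
try:
    table[150]           # User ID >= input_dim: lookup is out of range
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)      # True: training would crash on this ID
```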



Deep Learning for Recommendations

Author

Pascual Vila

AI & Data Science Instructor // Code Syllabus

Matrix factorization captures linear user-item interactions well, but real-world preferences are rarely linear. Deep Learning enables recommender systems to model complex, non-linear patterns and to naturally integrate rich side features like text, images, and context.

The Core: Embeddings

The fundamental building block of any Deep Learning RecSys is the Embedding Layer. Unlike images or audio, recommendation data is highly categorical and sparse (e.g., millions of unique User IDs and Item IDs).

Embeddings solve this by mapping discrete IDs to continuous, dense vectors of fixed size (e.g., 64 or 128 dimensions). These vectors encapsulate semantic meaning: items with similar embeddings are conceptually similar in the latent space.
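A minimal NumPy sketch (hypothetical vocabulary size and dimensions) shows the mechanics: the layer is just a matrix whose rows are looked up by ID.

```python
import numpy as np

num_items, embed_dim = 1000, 64          # hypothetical sizes
rng = np.random.default_rng(42)
E = rng.normal(scale=0.01, size=(num_items, embed_dim))  # rows learned via backprop

item_ids = np.array([3, 17, 999])        # a batch of categorical IDs
vectors = E[item_ids]                    # one dense 64-dim vector per ID
print(vectors.shape)                     # (3, 64)
```

In a real model the table starts random, and backpropagation moves the rows so that similar items end up close together.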

Two-Tower Architectures

A very popular design at companies like Google and Pinterest is the Two-Tower Model. It consists of two separate neural networks (towers):

  • User Tower: Takes user IDs, demographics, and history to produce a final "User Representation Vector".
  • Item Tower: Takes item IDs, text descriptions, and metadata to produce an "Item Representation Vector".

During serving, item representations can be pre-computed and cached. A fast nearest-neighbor search (like FAISS) finds the items whose vectors have the highest dot product with the active user's vector.
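A sketch of that serving path, using brute-force dot products in NumPy as a stand-in for an approximate nearest-neighbor library like FAISS (all shapes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline: item-tower outputs, pre-computed and cached
item_vectors = rng.normal(size=(10_000, 64))

# Online: the user tower runs once per request and emits a single vector
user_vector = rng.normal(size=64)

# Retrieval: rank items by dot product with the user vector
scores = item_vectors @ user_vector
top_k = np.argsort(scores)[::-1][:5]     # indices of the 5 best-scoring items
```

At real scale, the exact `argsort` is replaced by an approximate index so retrieval stays sub-linear in catalog size.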

Neural Collaborative Filtering (NCF)

Instead of just using a dot product at the end, Neural CF concatenates user and item embeddings and feeds them through a Multi-Layer Perceptron (MLP). This allows the network to learn arbitrary interaction functions from data, rather than relying on a fixed linear dot product.
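A forward-pass sketch in NumPy (untrained weights, hypothetical layer sizes) to show the wiring:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
d = 32                                   # hypothetical embedding size
user_emb = rng.normal(size=d)
item_emb = rng.normal(size=d)

# NCF: concatenate user and item embeddings, then apply an MLP
x = np.concatenate([user_emb, item_emb])              # shape (64,)
W1, b1 = rng.normal(size=(64, 2 * d)), np.zeros(64)   # hidden layer
W2, b2 = rng.normal(size=(1, 64)), np.zeros(1)        # output layer

h = relu(W1 @ x + b1)                    # non-linearity: beyond a dot product
score = W2 @ h + b2                      # learned interaction score
```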


Watch out for Overfitting: Deep networks can easily memorize user-item interactions, especially with sparse datasets. Always use heavy regularization techniques like Dropout and L2 weight decay. Additionally, ensure your batch sizes are large enough to provide stable gradients during optimization.
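Dropout itself is simple to sketch. The version below is the standard "inverted" variant in NumPy (the behavior the major frameworks implement), which rescales at train time so inference needs no change:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero units with probability p, rescale survivors."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)          # rescaling preserves the expected value

h = np.ones(10_000)
out = dropout(h, p=0.5, rng=np.random.default_rng(0))
```

With `training=False` the function is the identity, which is why no rescaling is needed at serving time.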

❓ Frequently Asked Questions

Why use Deep Learning instead of traditional Matrix Factorization?

Traditional Matrix Factorization (like SVD) models the user-item interaction as a simple linear dot product of latent factors. Deep Learning (using MLPs and activation functions) can model highly complex, non-linear interactions. Furthermore, deep networks trivially incorporate heterogeneous side features (like item images, descriptions, or user context) into the model.

What is an Embedding Layer and how does it work?

An embedding layer is essentially a lookup table that maps discrete categorical variables (like a user ID) into a continuous vector of floats. Unlike one-hot encoding, which is massive and sparse, embeddings are dense and relatively small (e.g., 64 dimensions). The values in these vectors are learned during training via backpropagation, placing similar items close to each other in vector space.

How do you serve a Two-Tower model in production?

Serving a full neural network for every user-item pair is too slow. Two-Tower models solve this by pre-computing the "Item Tower" vectors offline and storing them in a vector index (built with a library like FAISS). At runtime, the "User Tower" processes the user's features live, outputs a vector, and runs a fast nearest-neighbor search against the cached item vectors.

RecSys Deep Learning Glossary

Embedding
A learned representation mapping discrete IDs to continuous, dense vectors in a lower-dimensional space.
Two-Tower Model
An architecture with two separate sub-networks for users and items, whose output vectors are combined via a dot product.
Dense Layer (MLP)
A fully connected neural network layer where each neuron is connected to every neuron in the preceding layer.
Non-linear Activation
A function (like ReLU or Sigmoid) applied to a layer's output to enable the network to learn complex patterns.
Cold Start Problem
The difficulty of recommending items to new users (or new items to users) because there is no historical interaction data.
Dot Product
A mathematical operation taking two equal-length sequences of numbers and returning a single number representing similarity.