DEEP Q-NETWORKS /// FUNCTION APPROXIMATION /// BELLMAN EQUATION ///

Intro To DQN

Escape the limitations of the Q-Table. Master Neural Network function approximation and build your first Deep Q-Network.

Neural Link Initiated

A.I.D.E: Tabular Q-Learning works great for simple games like Tic-Tac-Toe. But what happens when we play a video game with millions of pixels? Q-Tables run out of memory.


Architecture


Concept: The Q-Table Limit

A standard Q-table requires one row per state. When states are continuous (like coordinates) or massive (like images), the table grows beyond any feasible amount of memory.

System Check

Why do we replace Q-Tables with Neural Networks?



Deep Q-Networks: Solving the Curse of Dimensionality


RL Guide

Lead AI Instructor // Neural Syllabus

"Tabular Q-Learning is excellent until your agent needs to play a video game. You can't fit billions of pixel combinations into an array. We needed a function approximator. We needed Neural Networks."

The Problem: State Explosion

Standard Q-learning stores knowledge in a dictionary or matrix called a Q-Table. If you have 10 states and 4 actions, your table has 40 entries. Easy. But consider an Atari game screen (210 × 160 pixels, 128 colors per pixel): there are 128^33,600, or roughly 10^70,800, possible screens, astronomically more than the ~10^80 atoms in the observable universe. Storing one table row per screen in RAM is impossible. This is known as the Curse of Dimensionality.
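To make the scale concrete, here is a quick back-of-the-envelope check in Python, using the Atari screen figures from above (we compute the exponent in base 10, since the full integer would be absurdly large):

```python
# Rough count of distinct Atari screens.
# Assumes a 210 x 160 screen with 128 possible colors per pixel.
import math

pixels = 210 * 160                    # 33,600 pixels per frame
colors = 128                          # colors per pixel

# Number of distinct screens = colors ** pixels.
digits = pixels * math.log10(colors)  # exponent in base 10

print(f"Possible screens ~ 10^{digits:,.0f}")  # ~10^70,802
print("Atoms in the observable universe ~ 10^80")
```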

The Solution: Function Approximation

Instead of a table that says "if the state is exactly X, the Q-value is Y", we use a neural network. The network takes the state as input and outputs estimated Q-values for all possible actions. Because the network learns the *underlying patterns* of the game, it can generalize to states it has never seen before!
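Here is a minimal sketch of such a network, assuming PyTorch; the class name, the state dimension (4), and the action count (2) are illustrative placeholders (think of a CartPole-style environment), not a prescribed architecture:

```python
# A minimal Q-network sketch (assumes PyTorch).
# state_dim=4 and num_actions=2 are illustrative placeholders.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int = 4, num_actions: int = 2):
        super().__init__()
        # Input: a state vector. Output: one Q-value per action.
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

net = QNetwork()
state = torch.randn(1, 4)   # a batch containing one state
q_values = net(state)       # shape (1, 2): Q(s, a) for every action
print(q_values)
```

Note that the network outputs Q-values for *all* actions at once, so a single forward pass is enough to pick the greedy action.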

The Math: Training the DQN

We train the network using the same Bellman equation principles. We want our network to minimize the difference (loss) between its predicted Q-value and the "target" Q-value.

Target = r + γ · max_a′ Q(s′, a′)   (reward plus the discounted maximum Q-value of the next state)
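A single training step might look like the sketch below, reusing the illustrative QNetwork from above; the discount factor and the transition values are made up for the example:

```python
# One DQN training step (sketch, assumes PyTorch and the QNetwork above).
import torch
import torch.nn as nn

net = QNetwork(state_dim=4, num_actions=2)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
gamma = 0.99                                  # discount factor

# A single (state, action, reward, next_state, done) transition.
state      = torch.randn(1, 4)
action     = torch.tensor([[0]])              # index of the action taken
reward     = torch.tensor([1.0])
next_state = torch.randn(1, 4)
done       = torch.tensor([0.0])              # 1.0 if the episode ended

# Predicted Q-value for the action that was actually taken.
q_pred = net(state).gather(1, action).squeeze(1)

# Bellman target: r + gamma * max_a' Q(s', a'); no gradient flows
# through the target.
with torch.no_grad():
    q_next = net(next_state).max(dim=1).values
    target = reward + gamma * q_next * (1.0 - done)

# Minimize the squared difference between prediction and target.
loss = nn.functional.mse_loss(q_pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```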
Implementation Note

DQN Instability: Combining non-linear function approximators (neural networks) with bootstrapping (the Bellman update) and off-policy learning (Q-learning) is known as the "Deadly Triad". It often leads to training divergence. In the next lessons, we introduce Experience Replay and Target Networks to fix this!

Frequently Asked Questions

What is the difference between Q-Learning and Deep Q-Learning?

Q-Learning uses a discrete lookup table (Q-Table) to store the value of every state-action pair. Deep Q-Learning (DQN) replaces this table with a deep neural network, allowing the agent to handle continuous or massively large state spaces (like raw images) by generalizing across similar states.
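A tiny sketch of the contrast (the table entry is invented, and QNetwork is the illustrative class from earlier in this lesson):

```python
# Tabular vs. deep Q-learning (sketch).
import torch

# Tabular: one stored value per exact (state, action) pair; a state
# never seen before simply has no entry.
q_table = {("state_17", "LEFT"): 0.42}
print(q_table[("state_17", "LEFT")])

# Deep: the network maps ANY state vector to Q-value estimates,
# including states it never encountered during training.
unseen_state = torch.randn(1, 4)
print(QNetwork()(unseen_state))
```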

Why do we use MSE loss in Deep Q-Networks?

We use Mean Squared Error (MSE) because Q-learning is fundamentally a regression problem: we are predicting a continuous numerical value (the expected cumulative reward), not classifying a label. MSE directly measures how far our neural network's Q-value prediction is from the target Bellman value.
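As a toy illustration (the numbers are invented), the loss is just the mean of the squared prediction errors:

```python
import torch

q_pred   = torch.tensor([1.8, 0.3])   # network predictions (illustrative)
q_target = torch.tensor([2.0, 0.0])   # Bellman targets (illustrative)

mse = ((q_pred - q_target) ** 2).mean()
print(mse)   # tensor(0.0650): (0.04 + 0.09) / 2
```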

How many outputs does a DQN have?

A standard DQN has an output size equal to the action space dimension. For example, if an agent can move UP, DOWN, LEFT, or RIGHT, the final layer outputs a tensor of size 4. This allows the network to calculate the Q-values for all possible actions simultaneously in a single forward pass.
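For example, greedy action selection with the illustrative QNetwork above, built here with four outputs:

```python
# One forward pass, one Q-value per action (sketch).
import torch

actions = ["UP", "DOWN", "LEFT", "RIGHT"]
net = QNetwork(state_dim=4, num_actions=len(actions))

state = torch.randn(1, 4)
q_values = net(state)                  # shape (1, 4)
best = q_values.argmax(dim=1).item()   # greedy action index
print(actions[best], q_values)
```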

Architecture Glossary

DQN (Deep Q-Network)
An algorithm that uses a neural network to approximate the optimal action-value function, resolving the curse of dimensionality.
State Space
The complete set of all possible situations the agent can be in. In DQN, this is the input vector to the neural net.
Action Space
The set of all possible actions. In DQN, this defines the number of nodes in the final output layer.
Forward Pass
Pushing the state tensor through the network layers to calculate predicted Q-values.
Bellman Equation
The recursive equation used to formulate the Target Q-value for loss calculation.
MSE Loss
Mean Squared Error: The function used to measure the difference between the predicted Q-value and the target.