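Deep learning assumes that training samples are independent and identically distributed (i.i.d.). RL data is anything but: consecutive transitions are strongly correlated, and the data distribution shifts as the policy changes. Experience Replay and Target Networks are the tools that bridge this gap.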
1. Breaking the Correlation
In a normal RL loop, step 10 is very similar to step 11. If a neural network learns from these transitions in sequence, it overfits to the immediate situation and forgets what it learned elsewhere. Experience Replay addresses this by storing $(s, a, r, s')$ transitions in a large buffer (a 'memory pool'). During training, we sample a random batch from this pool, so each batch mixes transitions from many different moments. This makes the training data look much closer to the independent, identically distributed (i.i.d.) data that supervised learning expects, although the samples are still generated by earlier versions of the policy.
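The buffer described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Transition`, `ReplayBuffer`, `push`, and `sample` are hypothetical, and the `done` flag is an added assumption (the text only mentions $(s, a, r, s')$) that marks whether $s'$ ended the episode.

```python
import random
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    """One (s, a, r, s') step; `done` is an assumed extra field."""
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-size memory pool that overwrites its oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.storage: List[Transition] = []
        self.position = 0  # index of the next slot to write

    def push(self, state, action, reward, next_state, done) -> None:
        transition = Transition(state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # Buffer is full: overwrite the oldest transition.
            self.storage[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks up the temporal
        # ordering of consecutive steps.
        return random.sample(self.storage, batch_size)

    def __len__(self) -> int:
        return len(self.storage)
```

In a training loop, the agent would call `push` after every environment step and call `sample` only once `len(buffer)` is at least the batch size.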
2. Learning from the Past
Another major benefit of Experience Replay is data efficiency. In a purely online RL loop, each experience is used for a single update and then discarded. With a buffer, the agent can 're-study' its past successes and failures multiple times. This lets the model extract far more information from each interaction, which is critical in environments where gathering data is expensive (like real-world robotics).
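As a rough back-of-the-envelope illustration of that reuse, assume (these symbols and numbers are not from the text above) a first-in-first-out buffer of capacity $N$ that is already full, and one uniformly sampled batch of $B$ distinct transitions per environment step. A transition then stays in the buffer for $N$ steps and has a $B/N$ chance of being drawn at each one, so its expected number of replays is:

$$
\mathbb{E}[\text{replays}] = N \cdot \frac{B}{N} = B
$$

With a hypothetical $B = 32$, each transition would be used about 32 times on average instead of once. Updating less often than every step scales this number down proportionally.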
3. Target Networks
In DQN, we calculate our loss against a 'target': $Y = R + \gamma \max_{a'} Q(s', a')$. If we use our active model to calculate this target, the target changes every time we update the weights. This is like a dog chasing its own tail. A Target Network is a 'frozen' copy of the model used *only* to calculate the targets. Every few thousand steps, we 'sync' the target network with the active model by copying its weights, providing a stable goalpost for the learning process to aim for.
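The freeze-and-sync pattern can be sketched without any deep learning library. Everything here is an assumption made for illustration: `LinearQ` is a toy stand-in for a real network (one weight per action, scalar state), and `GAMMA`, `SYNC_EVERY`, `td_target`, and `train_step` are hypothetical names and values. The terminal-state case, where the target is just the reward, is also an addition not covered in the text.

```python
import copy
from typing import List

GAMMA = 0.99        # discount factor (assumed value)
SYNC_EVERY = 1000   # steps between target syncs (assumed value)


class LinearQ:
    """Toy Q-function: Q(s, a) = weights[a] * s for a scalar state s."""

    def __init__(self, n_actions: int) -> None:
        self.weights = [0.0] * n_actions

    def q_values(self, state: float) -> List[float]:
        return [w * state for w in self.weights]


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Y = R + gamma * max_a' Q(s', a'), or just R if s' is terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def train_step(online, target, step, s, a, r, s_next, done,
               lr=0.1, sync_every=SYNC_EVERY):
    # The target comes from the frozen copy, never from the active model.
    y = td_target(r, target.q_values(s_next), done)
    td_error = y - online.q_values(s)[a]
    # Gradient step on 0.5 * td_error**2, treating y as a constant.
    online.weights[a] += lr * td_error * s
    # Periodically copy the active weights into the frozen target.
    if (step + 1) % sync_every == 0:
        target.weights = list(online.weights)


online = LinearQ(n_actions=2)
target = copy.deepcopy(online)  # starts as an exact, independent copy
```

Between syncs, updates to `online` leave `target` untouched, so the value of $Y$ for a given transition stays fixed until the next copy.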
