Recurrent Neural Networks: Designing for Time

AI Core Faculty
Deep Learning // Code Syllabus
Standard MLPs process a snapshot of the world. RNNs process a movie. By introducing a memory loop, Recurrent Networks unlock the ability to understand context over time.
The Problem with Feedforward Networks
A traditional Dense network expects a fixed-size input and has zero concept of order. If you feed it the sentence "The cat sat on the mat", it treats it no differently than "mat the on sat cat The".
Sequence data (text, audio, sensor data, time series) is heavily dependent on order. To process this, we need a network architecture that maintains an internal state: a memory of what it has seen previously.
Anatomy of the Hidden State
The magic of the RNN lies in the Hidden State ($h_t$). At every timestep $t$, the network calculates its new memory state by combining the current input ($x_t$) with the memory from the previous step ($h_{t-1}$):

$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1})$
- $W_{xh}$: Weights applied to the current input.
- $W_{hh}$: Weights applied to the previous hidden state.
- $\tanh$: Activation function keeping values squished between -1 and 1, preventing the memory from exploding to infinity.
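The update above is just a few matrix multiplications and a `tanh`. A minimal NumPy sketch of one pass over a sequence (the dimensions and variable names are illustrative, not from the text; the bias term is omitted for simplicity):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh):
    """One recurrent update: mix the current input with the previous memory."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh)

rng = np.random.default_rng(0)
features, hidden = 50, 32
W_xh = rng.normal(0, 0.1, size=(features, hidden))  # weights for the current input
W_hh = rng.normal(0, 0.1, size=(hidden, hidden))    # weights for the previous state

h = np.zeros(hidden)               # memory starts empty
for t in range(20):                # walk a 20-step sequence one input at a time
    x_t = rng.normal(size=features)
    h = rnn_step(x_t, h, W_xh, W_hh)

print(h.shape)  # (32,)
```

Note that the same two weight matrices are reused at every timestep; only the hidden state changes.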
The Vanishing Gradient Problem
During training, we use an algorithm called Backpropagation Through Time (BPTT). The network is essentially "unrolled" for as many timesteps as exist in the sequence.
Because the same weight matrix $W_{hh}$ is multiplied repeatedly, gradients tend to shrink exponentially (vanish) or grow exponentially (explode). If gradients vanish, the network fails to learn long-term dependencies (e.g., remembering a word from the beginning of a long paragraph).
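The shrinkage is easy to see in a toy experiment: repeatedly multiply a gradient vector by a small $W_{hh}^\top$, as BPTT does, and watch its norm collapse. (The weight scale here is deliberately small; the $\tanh$ derivative, which is at most 1, would only shrink the gradient further.)

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 32
W_hh = rng.normal(0, 0.05, size=(hidden, hidden))  # small weights: every singular value < 1

grad = np.ones(hidden)      # stand-in for dLoss/dh at the final timestep
norms = []
for _ in range(50):         # BPTT over 50 timesteps multiplies by W_hh^T each step
    grad = W_hh.T @ grad
    norms.append(np.linalg.norm(grad))

print(f"after 1 step: {norms[0]:.3e}, after 50 steps: {norms[-1]:.3e}")
```

With larger weights the same loop explodes instead, which is why gradient clipping and gated architectures (LSTM/GRU) exist.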
Frequently Asked Questions
What is the difference between an RNN and a CNN?
CNNs (Convolutional Neural Networks): Designed for spatial data like images. They look for local patterns (edges, shapes) using sliding filters regardless of where they appear.
RNNs (Recurrent Neural Networks): Designed for sequential data over time. They process inputs step-by-step and maintain an internal memory state to understand the sequence's order and context.
Why does an RNN need 3D input data?
Standard networks take 2D data: `(batch_size, features)`. But since sequence data happens over time, we must add a third dimension to represent time.
The required shape is `(batch_size, timesteps, features)`. For example, 100 sentences, each 20 words long, where each word is a 50-dimensional vector = `(100, 20, 50)`.
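That shape is easy to verify with NumPy (the numbers mirror the sentence example above):

```python
import numpy as np

batch_size, timesteps, features = 100, 20, 50   # 100 sentences, 20 words, 50-dim word vectors
X = np.zeros((batch_size, timesteps, features))

print(X.shape)         # (100, 20, 50) -- the full batch
print(X[0].shape)      # (20, 50)      -- one sentence
print(X[0, 0].shape)   # (50,)         -- one word vector
```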
What does "return_sequences=True" do in Keras?
By default, `SimpleRNN` only returns the final calculated output after processing the entire sequence. If you want to stack multiple RNN layers, the next layer needs a sequence to process. Setting `return_sequences=True` forces the layer to return its output at every single timestep, producing a full sequence for the next layer.
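To make the difference concrete, here is a minimal NumPy re-implementation of a `SimpleRNN`-style forward pass (a sketch of the semantics, not the actual Keras internals; shapes and names are illustrative) showing how the flag changes the output shape:

```python
import numpy as np

def simple_rnn(X, W_xh, W_hh, return_sequences=False):
    """Basic RNN forward pass over a batch of sequences.

    X has shape (batch, timesteps, features). With return_sequences=True,
    every hidden state is returned: (batch, timesteps, units); otherwise
    only the final one: (batch, units).
    """
    batch, timesteps, _ = X.shape
    h = np.zeros((batch, W_hh.shape[0]))
    states = []
    for t in range(timesteps):
        h = np.tanh(X[:, t, :] @ W_xh + h @ W_hh)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20, 50))           # 100 sequences, 20 steps, 50 features
W_xh = rng.normal(0, 0.1, size=(50, 32))
W_hh = rng.normal(0, 0.1, size=(32, 32))

print(simple_rnn(X, W_xh, W_hh).shape)                          # (100, 32)
print(simple_rnn(X, W_xh, W_hh, return_sequences=True).shape)   # (100, 20, 32)
```

The last slice of the full sequence equals the default output, which is why a stacked RNN needs `return_sequences=True` on every layer except (usually) the last.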