LSTMs For Time Series: Capturing Memory
Standard feed-forward neural networks suffer from amnesia; they treat every data point independently. Time series data requires context. Long Short-Term Memory networks (LSTMs) were built specifically to carry the "context" of the past into the predictions of the future.
The Vanishing Gradient Solution
Standard Recurrent Neural Networks (RNNs) theoretically support sequential memory. However, in practice, as the sequence gets longer, the gradients used to update the network weights become vanishingly small. This prevents the network from learning long-term dependencies.
LSTMs solve this by introducing a Cell State: an internal conveyor belt that runs straight down the entire chain, with only minor linear interactions. Information can easily flow along it unchanged.
The Gate Architecture
LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates (a minimal code sketch follows the list below).
- Forget Gate: Looks at the previous hidden state and the current input, and outputs a number between 0 and 1 for each number in the cell state. 1 represents "keep this" while 0 represents "get rid of this."
- Input Gate: Decides what new information we're going to store in the cell state.
- Output Gate: Decides what the next hidden state should be. The hidden state contains information on previous inputs, and is also used for predictions.
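To make the gate arithmetic concrete, here is a minimal NumPy sketch of a single LSTM time step. The weight matrices, dimensions, and names (`lstm_step`, `W`, `U`, `b`) are illustrative assumptions, not values or APIs from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold per-gate parameters."""
    # Forget gate: 0 means "get rid of this", 1 means "keep this"
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate: how much of the new candidate enters the cell state
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    # Candidate values that could be written to the cell state
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    # Cell state update: the "conveyor belt" (forget some old, add some new)
    c_t = f_t * c_prev + i_t * c_hat
    # Output gate: decides what the next hidden state exposes
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy usage: 3 input features, 4 hidden units, random weights
rng = np.random.default_rng(0)
W = {g: rng.standard_normal((4, 3)) for g in "fico"}
U = {g: rng.standard_normal((4, 4)) for g in "fico"}
b = {g: np.zeros(4) for g in "fico"}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, U, b)
```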
Data Preprocessing Mandates
1. Scaling: Neural networks are highly sensitive to unscaled data. Always use MinMaxScaler or StandardScaler from scikit-learn; otherwise, large input values can destabilize training, slowing convergence or causing exploding gradients.
2. Windowing: You must transform your 1D array into overlapping sequences (e.g. `[X1, X2, X3] -> y1`).
3. 3D Tensor: LSTMs strictly require input arrays in the shape of [number_of_samples, time_steps_per_sample, number_of_features]. All three steps are sketched in code below.
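A minimal end-to-end sketch of all three steps, assuming a univariate series and a lookback window of 3 (the toy series and variable names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.arange(100, dtype=float)  # stand-in for your 1D series
lookback = 3

# 1. Scaling (in practice, fit the scaler on the training portion only)
scaler = MinMaxScaler()
scaled = scaler.fit_transform(series.reshape(-1, 1)).ravel()

# 2. Windowing: overlapping [X1, X2, X3] -> y1 pairs
X, y = [], []
for i in range(len(scaled) - lookback):
    X.append(scaled[i : i + lookback])
    y.append(scaled[i + lookback])
X, y = np.array(X), np.array(y)

# 3. 3D tensor: [samples, time_steps, features]; univariate, so features=1
X = X.reshape((X.shape[0], lookback, 1))
print(X.shape)  # (97, 3, 1)
```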
Deep Learning FAQ
Why use LSTM over ARIMA for forecasting?
ARIMA is excellent for linear relationships and requires the series to be made stationary, typically through differencing. LSTMs, however, can capture complex, non-linear relationships, don't strictly require stationary data, and can easily incorporate multivariate inputs (multiple feature columns), which is difficult for standard ARIMA.
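To illustrate the multivariate point: adding more feature columns only changes the last dimension of the input tensor. A minimal Keras sketch, assuming a lookback of 30 and 5 features (both arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# 5 features per time step, e.g. price, volume, and three indicators
model = keras.Sequential([
    keras.Input(shape=(30, 5)),  # (time_steps, features)
    layers.LSTM(32),
    layers.Dense(1),             # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
```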
What does return_sequences=True do in Keras?
If you stack multiple LSTM layers, you must pass return_sequences=True to all layers except the last one. This tells the LSTM to output the full sequence of hidden states for every time step (a 3D array) rather than just the final hidden state (a 2D array), which is what the next LSTM layer requires as input.
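A minimal stacked example (layer sizes are arbitrary), with the output shape of each layer noted:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(30, 1)),              # (time_steps, features)
    layers.LSTM(64, return_sequences=True),  # 3D: (batch, 30, 64)
    layers.LSTM(32),                         # 2D: (batch, 32), final state only
    layers.Dense(1),
])
model.summary()
```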
What is the "Lookback" window?
The lookback (or time steps) is how many previous periods of data you provide the model to predict the next period. For example, if you are predicting tomorrow's stock price based on the last 30 days, your lookback is 30. This defines the second dimension of your input tensor shape.
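Rather than hand-rolling the window loop, TensorFlow's Keras utilities include a helper for exactly this; a sketch assuming TensorFlow 2.x and a lookback of 30, mirroring the pattern in the Keras docs:

```python
import numpy as np
import tensorflow as tf

series = np.arange(100, dtype="float32")
lookback = 30

# The window starting at step i covers steps [i, i+29]; its target is step i+30
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data=series.reshape(-1, 1),  # (time, features) so windows come out 3D
    targets=series[lookback:],   # the value immediately after each window
    sequence_length=lookback,
    batch_size=32,
)

for inputs, targets in dataset.take(1):
    print(inputs.shape)  # (32, 30, 1): [samples, time_steps, features]
```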