Time Series Forecasting: Predicting the Future
In standard machine learning tasks, shuffling your data is a best practice. But in time series forecasting, shuffling destroys the signal: the order of the sequence is the most valuable feature you have.
The Problem with Standard Neural Networks
Traditional Multilayer Perceptrons (MLPs) or Convolutional Neural Networks (CNNs) assume that all inputs and outputs are independent of each other. If you are predicting the price of a stock on Friday, a standard MLP doesn't inherently care that Thursday's price was high.
To fix this, we need a network with memory. This is where Recurrent Neural Networks (RNNs) and their more advanced cousins, LSTMs (Long Short-Term Memory networks), come into play.
Windowing: Framing the Problem
Deep Learning frameworks like TensorFlow and PyTorch expect data to be explicitly labeled as Input (X) and Output (y). But a time series is just a single 1D list of numbers.
We must transform this list into a supervised learning problem using a technique called Windowing. We slide a "window" of a fixed size over the data. If the window size is 30 days, the features (X) are days 1-30, and the target (y) is day 31. We move one step forward, X becomes days 2-31, and y becomes day 32, and so on.
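The sliding window described above can be sketched in a few lines of NumPy. The function name `make_windows` and the toy series are illustrative, not part of any library:

```python
import numpy as np

def make_windows(series, window_size):
    """Slide a fixed-size window over a 1D series to build (X, y) pairs."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i : i + window_size])  # e.g. days 1-30
        y.append(series[i + window_size])      # e.g. day 31 (the target)
    return np.array(X), np.array(y)

series = np.arange(10)                 # toy series: 0..9
X, y = make_windows(series, window_size=3)
print(X.shape, y.shape)                # (7, 3) (7,)
print(X[0], y[0])                      # [0 1 2] 3
```

Note that each step forward shares most of its window with the previous one, which is exactly the overlap described in the text.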
Evaluating Regression
Because forecasting involves predicting a continuous numerical value, we cannot use classification metrics like Accuracy. Instead, we use error metrics that calculate the distance between our predicted value and the actual value.
- MSE (Mean Squared Error): Averages the squared differences. It heavily penalizes large errors.
- MAE (Mean Absolute Error): Averages the absolute differences. It provides a more linear interpretation of the error.
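A quick numeric sketch of how the two metrics treat the same set of errors (the price values here are made up for illustration):

```python
import numpy as np

y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 103.0,  99.0, 110.0])

errors = y_true - y_pred          # [-1, -1, 2, -5]
mse = np.mean(errors ** 2)        # squaring makes the 5-unit miss dominate
mae = np.mean(np.abs(errors))     # each unit of error counts equally
print(mse, mae)                   # 7.75 2.25
```

The single 5-unit miss contributes 25 of the 31 total squared error, which is why MSE is the better choice when large mistakes are disproportionately costly.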
Architecture Tips
Always scale your sequence data: neural networks converge much faster when inputs are small, typically between 0 and 1. Fit a MinMaxScaler on your training data before passing it into an LSTM, and remember to inverse transform your predictions to recover real-world values.
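A minimal sketch of that scale-then-inverse-transform round trip, written in plain NumPy to show the bookkeeping that sklearn's MinMaxScaler does for you. The prices and the 0.5 "prediction" are placeholders, not real model output:

```python
import numpy as np

train = np.array([110.0, 125.0, 98.0, 140.0])  # raw training prices

# Min-max scale to [0, 1] using TRAINING statistics only
lo, hi = train.min(), train.max()
scaled = (train - lo) / (hi - lo)

# ... the LSTM trains and predicts in this scaled space ...
scaled_pred = np.array([0.5])      # stand-in for a model prediction

# Inverse transform: map the prediction back to real price units
real_pred = scaled_pred * (hi - lo) + lo
print(real_pred)                   # [119.]
```

The crucial detail is that `lo` and `hi` come from the training set alone; reusing them at inference time prevents information from the test period leaking into the model.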
❓ Frequently Asked Questions
Why use LSTMs instead of basic RNNs for time series forecasting?
Basic RNNs: They suffer from the "vanishing gradient problem." If your sequence is very long (e.g., trying to predict today based on data from 100 days ago), the network "forgets" the early data because the gradients become too small to update the weights effectively.
LSTMs (Long Short-Term Memory): They solve this by using an internal mechanism called "gates" (Forget Gate, Input Gate, Output Gate) that explicitly decide what information is important enough to keep and what should be thrown away, allowing them to remember long-term dependencies.
How do I format data for an LSTM in Keras/TensorFlow?
LSTMs require a specific 3-dimensional input shape: (samples, time_steps, features).
- Samples: The total number of windows you created.
- Time Steps: The size of your window (e.g., 30 days).
- Features: The number of variables per time step (e.g., 1 if predicting just closing price, or 5 if using Open, High, Low, Close, Volume).
# Reshaping a 2D window array into the 3D shape an LSTM expects
# (univariate forecasting, so features = 1)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
What is the difference between univariate and multivariate time series forecasting?
Univariate: Predicting a single variable based entirely on its own historical values. For example, predicting tomorrow's temperature using only the past 30 days of temperatures.
Multivariate: Predicting a variable using multiple historical features. For example, predicting tomorrow's temperature using historical temperature, humidity, wind speed, and atmospheric pressure.
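In the data pipeline, the practical difference shows up in the last dimension of the 3D input shape. A sketch of multivariate windowing, using random stand-in weather data (the feature names and window size are illustrative):

```python
import numpy as np

# Toy multivariate series: 50 time steps x 4 features
# (temperature, humidity, wind speed, pressure)
rng = np.random.default_rng(0)
data = rng.random((50, 4))

window = 7
X, y = [], []
for i in range(len(data) - window):
    X.append(data[i : i + window])   # all 4 features for 7 days
    y.append(data[i + window, 0])    # target: next day's temperature only
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)              # (43, 7, 4) (43,)
```

The result is already in the `(samples, time_steps, features)` layout described above, with `features=4` instead of 1; no extra reshape is needed.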
