AUTOREGRESSIVE MODELS /// LAGS /// PACF /// FORECASTING ///

Autoregressive Models (AR)

Learn to forecast the future using the past. Master the mathematics of lags, partial autocorrelation, and build robust AR models in Python.


Lead DS: Time series forecasting relies on finding patterns over time. An Autoregressive (AR) model is the purest form of this: predicting tomorrow based on yesterday.


Architecture Path

UNLOCK NODES BY MASTERING AUTOREGRESSION.

Concept: Time Lags

Shifting temporal data backwards to use past states as predictive features.

Validation Node

In a daily time series, what does Lag 7 typically represent?


Quant Connect

Peer Review Network


Struggling with non-stationary data? Post your graphs and get feedback from fellow quants.

Autoregressive Models: Time Traveling through Data

Author

Dr. Pascual Vila

Lead Data Scientist // Code Syllabus

In linear regression, we predict Y using independent features X. In time series forecasting, an Autoregressive (AR) model predicts Y using its own past values as the independent features.
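As a concrete illustration of "past values as features", the series can be shifted so that each past state becomes a predictor column. A minimal sketch using pandas (the numbers are made up for illustration):

```python
import pandas as pd

# A short, hypothetical daily series.
y = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 13.5], name="y")

# Shift the series to turn past values into predictor columns.
frame = pd.DataFrame({
    "y_t": y,
    "lag_1": y.shift(1),   # yesterday's value
    "lag_2": y.shift(2),   # value from two days ago
}).dropna()                # the first rows have no history, so drop them

print(frame)
```

Each remaining row now pairs a target `y_t` with its own history, exactly the shape an AR model regresses on.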

Mathematical Foundation

The core assumption of an AR model is that the current value of the series, $Y_t$, can be explained as a linear combination of its previous $p$ values (lags), plus a constant $c$, and white noise $\epsilon_t$.

$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t$

Here, $\phi$ (phi) represents the coefficients (weights) that the model will learn during training using methods like Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE).
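To see OLS recover the $\phi$ coefficients, here is a sketch that simulates an AR(2) process with known weights (0.6 and -0.2, chosen arbitrarily) and fits them back with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a stationary AR(2): Y_t = 0.6*Y_{t-1} - 0.2*Y_{t-2} + eps_t
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

# OLS: regress Y_t on a constant and its two lags.
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]
c_hat, phi1_hat, phi2_hat = np.linalg.lstsq(X, target, rcond=None)[0]

print(round(phi1_hat, 2), round(phi2_hat, 2))  # estimates near 0.6 and -0.2
```

With enough observations, the fitted weights converge on the true generating coefficients, which is the sense in which the model "learns" $\phi$.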

Partial Autocorrelation (PACF)

How do we determine $p$ (how many lags to look back)? The standard Autocorrelation Function (ACF) is misleading here: a strong lag 1 effect propagates forward and artificially inflates the correlation at lag 2 and beyond.

The PACF isolates the direct correlation between $Y_t$ and $Y_{t-k}$ by removing the indirect effects of the intermediate lags. We typically choose $p$ based on the last lag in the PACF plot that significantly protrudes outside the confidence interval.

The Stationarity Prerequisite

Classical AR models demand that your data is stationary. This means the statistical properties (mean, variance) remain constant over time.

  • If data has a trend, the model's coefficients will be biased.
  • To fix this, we apply differencing: $Y_t' = Y_t - Y_{t-1}$.
  • Use the Augmented Dickey-Fuller (ADF) test to verify stationarity programmatically.

Model Architecture FAQ

Why use AR instead of standard Linear Regression?

Standard regression assumes observations are independent. Time series data violates this inherently because today's price is highly dependent on yesterday's price. AR models are mathematically designed to exploit this temporal dependence.

What does "White Noise" ($\epsilon_t$) mean in the equation?

It represents the random, unpredictable variations in the data that cannot be captured by past lags. A perfect AR model will result in residuals (errors) that resemble pure white noise, meaning all signal has been extracted.

Terminology Database

Lag
A shifted version of the time series dataset, used to represent past states.
Stationarity
A time series whose statistical properties (mean, variance) are constant over time.
PACF
Partial Autocorrelation Function: Measures direct correlation between a point and its lags.
AutoReg
Statsmodels class used to estimate autoregressive AR(p) models.