ARIMA and SARIMA: Classical Forecasting
Before jumping into complex deep learning models like LSTMs or Transformers, a data scientist must understand the statistical foundations. ARIMA and SARIMA provide interpretable, robust baselines for univariate time series forecasting.
The Foundation: Stationarity
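ARIMA-family models assume the series is stationary, meaning its mean, variance, and autocorrelation structure do not change over time. If your stock prices are always going up (a trend), the mean is changing, so the raw series is not stationary. The Integrated (I) component of ARIMA deals with this by differencing the data, that is, replacing each value with its change from the previous value, and repeating this d times until the result looks stationary.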
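To make the effect of differencing concrete, here is a minimal sketch using only the standard library. The `prices` series, its slope, and the helper names `difference` and `mean` are made up for illustration; real data would need a proper stationarity check rather than a comparison of two halves.

```python
import random


def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def mean(values):
    return sum(values) / len(values)


random.seed(0)
# Hypothetical data: an upward linear trend (slope 0.5) plus Gaussian noise.
prices = [100 + 0.5 * t + random.gauss(0, 1) for t in range(200)]

# With a trend, the mean depends on which window you look at.
first_half, second_half = prices[:100], prices[100:]
print("raw halves:        ", mean(first_half), mean(second_half))

# One round of differencing (what d=1 does) removes the linear trend,
# so both halves now have roughly the same mean (close to the slope).
diffed = difference(prices)
print("differenced halves:", mean(diffed[:100]), mean(diffed[100:]))
```

The differenced values are the period-to-period changes, so their average is close to the slope of the trend rather than drifting upward.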
Anatomy of ARIMA(p, d, q)
- AR (p) - AutoRegressive: Models the current observation as a linear function of its own p most recent (lagged) values.
- I (d) - Integrated: Differences the raw observations d times in order to make the time series stationary.
- MA (q) - Moving Average: Models the current observation as a linear function of the q most recent forecast errors (the random shocks the model failed to predict at earlier time steps).
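One way to see how the three components fit together is to simulate a series from them. The sketch below generates an ARIMA(1, 1, 1) process with the standard library; the function name `simulate_arima_111` and the coefficient values are illustrative assumptions, not estimates from any real data.

```python
import random


def simulate_arima_111(n, phi, theta, seed=0):
    """Simulate ARIMA(1, 1, 1): build an ARMA(1, 1) series, then integrate it once."""
    rng = random.Random(seed)
    w_prev = 0.0    # previous value of the stationary ARMA series
    eps_prev = 0.0  # previous random shock
    level = 0.0     # running sum, i.e. the observed (integrated) series
    arma, integrated = [], []
    for _ in range(n):
        eps = rng.gauss(0, 1)
        # AR(1) term: phi * previous value. MA(1) term: theta * previous shock.
        w = phi * w_prev + eps + theta * eps_prev
        # I (d=1): the observed series accumulates the ARMA values.
        level += w
        arma.append(w)
        integrated.append(level)
        w_prev, eps_prev = w, eps
    return arma, integrated


arma, y = simulate_arima_111(n=300, phi=0.6, theta=0.3)

# Differencing the observed series once recovers the stationary ARMA part.
recovered = [y[0]] + [y[t] - y[t - 1] for t in range(1, len(y))]
print(max(abs(a - b) for a, b in zip(arma, recovered)))  # tiny rounding error only
```

Fitting an ARIMA model runs this logic in reverse: difference d times, then estimate the AR and MA coefficients on what remains.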
Adding Seasonality: SARIMA
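Real-world data often has cycles: retail sales spike in December, electricity usage peaks in summer. A plain ARIMA model has no terms for patterns that repeat at a fixed period. SARIMA extends ARIMA by adding four seasonal parameters: (P, D, Q, s). P, D, and Q are the seasonal counterparts of p, d, and q, applied at multiples of the seasonal lag, and s is the length of the season measured in observations (for example, 12 for monthly data with a yearly cycle).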
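The seasonal differencing parameter D is the easiest of the four to illustrate by hand. The sketch below uses a made-up, noise-free monthly series (`sales`, with an invented `monthly_pattern` and slope) and a hypothetical helper `seasonal_difference`; it shows D = 1 with s = 12, followed by an ordinary d = 1 difference.

```python
def seasonal_difference(series, s):
    """Return series[t] - series[t - s]: the change versus the same point one season ago."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Hypothetical monthly effects that repeat every 12 observations (December spike last).
monthly_pattern = [0, 1, 3, 2, 4, 6, 8, 7, 5, 3, 2, 12]
s = 12

# Four years of made-up data: linear trend (slope 2) plus the repeating pattern.
sales = [100 + 2 * t + monthly_pattern[t % s] for t in range(48)]

# D = 1: comparing each month with the same month last year cancels the pattern.
# Every entry is 24, i.e. 12 periods times the slope of 2.
seasonal_diffed = seasonal_difference(sales, s)
print(seasonal_diffed[:5])

# d = 1 on top of that removes what is left of the trend: every entry is 0.
fully_diffed = [b - a for a, b in zip(seasonal_diffed, seasonal_diffed[1:])]
print(fully_diffed[:5])
```

Real series carry noise, so the differenced values would scatter around a constant instead of matching it exactly; P and Q then model whatever correlation remains at lags s, 2s, and so on.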
❓ AI Search Knowledge Base
How do I determine p and q in ARIMA?
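Use the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the differenced (stationary) series. The PACF plot helps determine the AR term (p) by identifying the lag after which it cuts off, meaning the values drop inside the significance band and stay there. The ACF plot helps determine the MA term (q) in the same way. These rules are cleanest for pure AR or pure MA processes: when both plots tail off gradually, the series likely needs both kinds of terms, and candidate (p, q) pairs are usually compared with an information criterion such as AIC.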
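To show what "cuts off" looks like numerically, the following sketch computes the sample ACF and PACF by hand for a simulated AR(1) series. The helpers `sample_acf` and `sample_pacf` are written here for illustration (the PACF uses the Durbin-Levinson recursion), and the coefficient 0.7 is an arbitrary choice. In practice you would more likely use the `plot_acf` and `plot_pacf` functions from statsmodels.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r[0], ..., r[max_lag]."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


# Simulate a hypothetical AR(1) process: x[t] = 0.7 * x[t-1] + noise.
random.seed(1)
x, prev = [], 0.0
for _ in range(5000):
    prev = 0.7 * prev + random.gauss(0, 1)
    x.append(prev)

acf_vals = sample_acf(x, 5)
pacf_vals = sample_pacf(x, 5)
print("ACF :", [round(v, 2) for v in acf_vals])   # decays gradually
print("PACF:", [round(v, 2) for v in pacf_vals])  # near zero after lag 1
```

For this AR(1) series the ACF shrinks geometrically while the PACF is large only at lag 1, which is the pattern that points to p = 1 and q = 0.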
What is the Augmented Dickey-Fuller (ADF) test?
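The ADF test is a statistical test used to determine whether a given time series is stationary. The null hypothesis states that the series has a unit root (is non-stationary). If the p-value is less than a chosen threshold (e.g., 0.05), we reject the null hypothesis and treat the series as stationary. A larger p-value means the test could not rule out a unit root, which is the usual cue to difference the series and test again.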
SARIMA vs ARIMA: When to use which?
Use standard ARIMA when your data exhibits trends but no pattern that repeats at a fixed period. Use SARIMA when your data has clear periodic fluctuations (like monthly temperature or quarterly earnings) because it adds AR, differencing, and MA terms at multiples of the seasonal lag s and so models the repeating pattern explicitly.