
Time Series: ARIMA and Decomposition

"What will next month's website traffic look like?" "When is the next sales peak?" Questions like these are time series problems, and time series are everywhere: stock prices, weather, heart rate, web visits. Forecasting them is one of the most in-demand skills in data science. Typical applications:

  • Stock and crypto prices: volatility forecasting and trading strategies
  • Meteorology: temperature anomalies, precipitation forecasting
  • Web analytics: traffic forecasting for infrastructure planning
  • Retail: demand forecasting for inventory management
  • Energy: electricity consumption forecasting for grid balancing

Prerequisites

  • Linear Regression

Time Series Decomposition

A **time series** is a sequence of observations ordered in time. Any series can be broken into three components: **trend** (long-term direction), **seasonality** (periodic fluctuations), and **noise** (random component). Additive model: Y(t) = Trend + Seasonality + Noise. Multiplicative model: Y(t) = Trend × Seasonality × Noise.
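The two models are linked by a logarithm. For strictly positive data, taking logs turns the multiplicative model into an additive one, which is why additive tools are often applied to a log-transformed series. A short sketch, where T, S, and N are shorthand introduced here for trend, seasonality, and noise:

```latex
\begin{aligned}
Y(t) &= T(t) \times S(t) \times N(t) \\
\log Y(t) &= \log T(t) + \log S(t) + \log N(t)
\end{aligned}
```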

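**When to use each model:** use the additive model when the seasonal amplitude stays roughly constant (web traffic). Use the multiplicative model when the amplitude grows with the trend (retail sales: December is always '3× the norm', not 'N units above'). STL (Seasonal-Trend decomposition using Loess) is a robust default for noisy real-world data; it is an additive method, so apply it to the log of the series when the seasonality is multiplicative.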
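The additive case can be illustrated with a minimal sketch of classical decomposition in plain Python: the trend is estimated with a centered moving average over one seasonal cycle, and the seasonal pattern is the average detrended value at each position in the cycle. The function names and the synthetic quarterly series are illustrative assumptions, not part of any library; in practice a library routine such as an STL implementation would normally be used instead.

```python
def centered_moving_average(y, period):
    """Trend estimate: centered moving average over one full seasonal cycle.

    For an even period a 2 x period average is used so the window stays
    centered. Positions where the window does not fit are set to None.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2 == 1:
            window = y[t - half : t + half + 1]
            trend[t] = sum(window) / period
        else:
            # Half weight on the two end points, full weight in between.
            inner = sum(y[t - half + 1 : t + half])
            trend[t] = (0.5 * y[t - half] + inner + 0.5 * y[t + half]) / period
    return trend


def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Assumes the series covers at least two full seasonal cycles.
    """
    trend = centered_moving_average(y, period)
    # Average the detrended values position by position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(len(y))]
    residual = [
        None if tr is None else value - tr - s
        for value, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic quarterly series (made up for illustration):
# linear trend plus a fixed seasonal pattern, no noise.
true_pattern = [5.0, -2.0, -4.0, 1.0]
series = [100.0 + 2.0 * t + true_pattern[t % 4] for t in range(24)]

trend, seasonal, residual = additive_decompose(series, period=4)
print(trend[2:6])      # trend estimate for the first interior points
print(seasonal[:4])    # recovered seasonal pattern for one cycle
```

Because the synthetic series has no noise, the recovered trend and seasonal pattern match the ones used to build it; on real data the residual component absorbs whatever the two estimates cannot explain.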

An online store's sales grow year over year, and December always brings about 3× the typical monthly volume (not a fixed number of extra units). Which decomposition model fits?

Stationarity and the Dickey-Fuller Test

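A **stationary series** has a constant mean and variance over time, and an autocovariance that depends only on the lag between observations, not on when they occur. Most real-world series are non-stationary because of trend or changing variance. The AR and MA parts of ARIMA assume stationarity, so a non-stationary series must be transformed first. **Differencing** is the main technique for achieving stationarity: Δy(t) = y(t) − y(t−1). The augmented Dickey-Fuller (ADF) test checks stationarity formally: its null hypothesis is that the series has a unit root (is non-stationary), so a small p-value (conventionally below 0.05) is evidence of stationarity.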
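The effect of differencing can be seen in a small sketch. The series below is a made-up quadratic trend, chosen so that one round of differencing still leaves a trend and a second round removes it (which would correspond to d = 2); the helper names are illustrative.

```python
def difference(y, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def mean(values):
    return sum(values) / len(values)


# Made-up series with a quadratic trend: the mean clearly drifts over time.
series = [0.5 * t * t + 3.0 * t + 10.0 for t in range(20)]

first_diff = difference(series)        # linear in t: still trending
second_diff = difference(first_diff)   # constant: trend fully removed

half = len(series) // 2
print(mean(series[:half]), mean(series[half:]))   # the two halves differ a lot
print(first_diff[:5])
print(second_diff[:5])
```

The same helper with `lag` set to the seasonal period (for example 12 for monthly data) gives seasonal differencing.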

**Spurious regression:** two unrelated non-stationary series can show a high correlation simply because both trend over time. A tongue-in-cheek example is GDP and the number of pirates in the 18th century. Before regressing one time series on another, difference both to stationarity, or test for cointegration and model the long-run relationship explicitly.
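A minimal sketch of the effect: two unrelated series that share nothing but an upward drift are almost perfectly correlated in levels, and the correlation largely disappears after differencing. The series are invented for illustration, and sinusoids are used as a deterministic stand-in for noise so that the result is reproducible.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def difference(y):
    """First difference: y[t] - y[t - 1]."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


# Two unrelated series that both drift upward. The sinusoids are a
# deterministic stand-in for noise (an assumption made for reproducibility).
n = 200
x = [0.5 * t + math.sin(1.0 * t) for t in range(n)]
y = [0.3 * t + math.cos(2.3 * t) for t in range(n)]

corr_levels = pearson(x, y)                          # dominated by the shared trend
corr_diffs = pearson(difference(x), difference(y))   # trend removed

print(round(corr_levels, 3), round(corr_diffs, 3))
```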

An ADF test on daily stock prices returns p = 0.87. What does this mean and what is the next step?

ARIMA Models: AR, MA, and Their Combinations

**ARIMA(p, d, q)** is a family of models for series that become stationary after differencing. **AR(p)**, autoregression: the current value depends on the p previous values. **MA(q)**, moving average: the current value depends on the q previous forecast errors. **d** is the number of times the series is differenced to reach stationarity (difference and re-run the ADF test until it rejects a unit root). p and q are chosen from the ACF (autocorrelation function) and PACF (partial autocorrelation function) plots.
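Written out, the components look as follows. The symbols are the usual textbook notation and are introduced here as assumptions: c and μ are constants, φ and θ are the AR and MA coefficients, ε(t) is white noise, and w(t) is the series after d rounds of differencing.

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):\quad & w_t = \Delta^{d} y_t, \qquad
  w_t = c + \sum_{i=1}^{p} \phi_i \, w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
\end{aligned}
```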

**Choosing p and q from plots:** for a pure MA(q) process the ACF cuts off sharply after lag q while the PACF decays gradually; for a pure AR(p) process the PACF cuts off sharply after lag p while the ACF decays gradually. In practice, use `pmdarima.auto_arima()` or grid-search over (p, d, q) by AIC/BIC. A modern alternative to ARIMA is Prophet (originally released by Facebook), designed for data with strong seasonality and holiday effects.
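To make the plot-reading rule concrete, here is a minimal sketch that computes the sample ACF and PACF by hand (the PACF through the Durbin-Levinson recursion) for a simulated AR(2) series. The coefficients 0.6 and −0.3, the seed, and the function names are assumptions chosen for illustration; a statistics library would normally supply these functions.

```python
import random


def acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divides by n)."""
    n = len(y)
    m = sum(y) / n
    dev = [v - m for v in y]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
        for k in range(max_lag + 1)
    ]


def pacf(y, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    result = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result


# Simulate a stationary AR(2) process: y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + noise.
rng = random.Random(42)
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + rng.gauss(0.0, 1.0))
y = y[100:]  # drop the burn-in period

print([round(v, 2) for v in acf(y, 5)])
print([round(v, 2) for v in pacf(y, 5)])
```

For an AR(2) series like this one, the PACF values beyond lag 2 should be close to zero while the ACF dies out gradually, which is the pattern that points to p = 2.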

An ADF test on monthly sales gives p = 0.03. The ACF decays slowly; the PACF cuts off sharply after lag 2. Which model is most appropriate?

Key Ideas

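  • Time series = Trend + Seasonality + Noise (additive) or Trend × Seasonality × Noise (multiplicative)
  • The AR and MA parts of ARIMA assume stationarity: constant mean and variance, autocovariance that depends only on the lag
  • ADF test: p < 0.05 → reject the unit-root hypothesis, treat the series as stationary; p ≥ 0.05 → difference the series and re-test
  • ARIMA(p,d,q): p from PACF, d = number of differences needed to pass the ADF test, q from ACF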
  • SARIMA adds seasonal components (P,D,Q)[s]
  • Validation: time-based train/test split, metrics MAE/RMSE/MAPE
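The validation point can be sketched in a few lines: hold out the most recent observations, forecast them from the earlier ones, and score the forecast. The seasonal-naive baseline, the helper names, and the quarterly data below are illustrative assumptions; any forecasting model would slot into the same split-and-score pattern.

```python
import math


def time_split(y, test_size):
    """Hold out the last test_size observations; never shuffle a time series."""
    return y[:-test_size], y[-test_size:]


def seasonal_naive(train, horizon, period):
    """Baseline forecast: repeat the last observed seasonal cycle."""
    last_cycle = train[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly data: trend of +2 per quarter plus a seasonal pattern.
pattern = [5.0, -2.0, -4.0, 1.0]
series = [100.0 + 2.0 * t + pattern[t % 4] for t in range(24)]

train, test = time_split(series, test_size=4)
forecast = seasonal_naive(train, horizon=4, period=4)

print(mae(test, forecast), rmse(test, forecast), round(mape(test, forecast), 2))
```

The seasonal-naive forecast ignores the trend, so its errors here come entirely from one year of trend growth; a model is worth using only if it beats a baseline like this on the held-out period.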

Connections to Other Methods

Time series analysis intersects with regression (ARIMAX = ARIMA + external regressors), machine learning (LSTM, Prophet, XGBoost on time-derived features), and spectral analysis (Fourier transform).

  • Linear Regression — ARIMAX extends ARIMA with external predictors, like regression
  • Bayesian Statistics — Bayesian Structural Time Series (BSTS) models by Google

Questions for Reflection

  • Take any public time series (exchange rate, city temperature). Run a decomposition. Which component dominates?
  • Why is ordinary cross-validation (random split) inappropriate for time series? What should be used instead?
  • Facebook Prophet vs ARIMA: in which situations is each model preferable?

Related Lessons

  • prob-13-clt