Probability Theory
Extreme Value Theory
Lesson objectives
- Understand the analogy: CLT for sums, Gnedenko's theorem for maxima
- Distinguish three limiting distributions: Gumbel, Fréchet, Weibull
- Know the generalized Pareto distribution (GPD) and its role in tail modeling
- Compute VaR and CVaR as measures of tail risk
- Apply EVT to rare events: floods, financial crashes, cyberattacks
Prerequisites
- Convergence of random variables
- CLT and laws of large numbers
- Basic distributions
The CLT describes typical values of sums. But in finance, engineering, and insurance, what matters are extreme events: the maximum loss, the worst 100-year flood, the peak server load. Extreme value theory is the 'CLT for maxima': whenever a suitably normalized maximum has a non-degenerate limit, that limit belongs to one of exactly three families, whatever the underlying distribution. This lets you extrapolate to rare events from the data you have.
- Finance: VaR (Value at Risk) and CVaR (expected shortfall), the tail-risk measures behind Basel III capital requirements for banks
- Reliability engineering: designing dams and bridges for 100- and 500-year flood events
- Cybersecurity: estimating maximum DDoS traffic
Three families and one theorem
Ronald Fisher and Leonard Tippett proposed the three-type hypothesis for limiting distributions of maxima in 1928. Boris Gnedenko gave a rigorous proof in 1943; the result, the Fisher-Tippett-Gnedenko theorem, is the analog of the CLT for extremes. Balkema and de Haan (1974) and Pickands (1975) independently proved that exceedances over high thresholds converge to the generalized Pareto distribution for any distribution in the domain of attraction of a GEV.
1. The Fisher-Tippett-Gnedenko Theorem
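Let $X_1, \ldots, X_n$ be i.i.d. and $M_n = \max(X_1, \ldots, X_n)$. If normalizing constants $a_n > 0$, $b_n$ exist such that
$$P\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x), \quad n \to \infty,$$
for a non-degenerate distribution function $G$, then $G$ must belong to one of three families, which the GEV unifies:
$$G_\xi(x) = \exp\left(-\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right), \quad 1 + \xi\,\frac{x-\mu}{\sigma} > 0.$$
**Three types of the GEV (generalized extreme value distribution):**
- Type I, Gumbel ($\xi = 0$, read as the limit $\xi \to 0$): $G_0(x) = \exp\left(-e^{-(x-\mu)/\sigma}\right)$, light tails
- Type II, Fréchet ($\xi > 0$): heavy power-law tails
- Type III, Weibull ($\xi < 0$): finite upper endpoint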
**Examples:**
- Maximum of N(0,1) samples: $\to$ Gumbel
- Maximum of Pareto($\alpha$) samples: $\to$ Fréchet, $\xi = 1/\alpha$
- Maximum of Uniform(0,1) samples: $\to$ Weibull
2. Generalized Pareto Distribution and Power Laws
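**Pickands-Balkema-de Haan theorem:** for a high threshold $u$, the distribution of the exceedance $X - u$ given $X > u$ converges to the GPD:
$$P(X - u \le y \mid X > u) \to H_{\xi,\sigma}(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}, \quad y \ge 0,$$
where the scale $\sigma = \sigma(u)$ depends on the threshold and $\xi$ is the same shape parameter as in the GEV. For $\xi = 0$ the formula is read as its limit, the exponential distribution $1 - e^{-y/\sigma}$.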
**Power law / Pareto distribution:** when $\xi > 0$:
$$P(X > x) \sim C \cdot x^{-\alpha}, \quad \alpha = 1/\xi$$
**Power law signatures:**
- Wealth distribution (Pareto law): 80% of wealth in 20% of hands corresponds to $\alpha \approx 1.16$
- City sizes (Zipf's law)
- Earthquake energies (Gutenberg-Richter law)
- Vertex degrees in the internet graph

**The danger of Fréchet tails:** variance is infinite when $\xi \ge 1/2$; mean is infinite when $\xi \ge 1$. With infinite variance the classical CLT does not apply, and mean-based statistics are unreliable.
3. VaR, CVaR, and Tail Risk Management
**VaR** (Value at Risk) $= F^{-1}(\alpha)$: the loss level exceeded with probability $1-\alpha$.
**CVaR** (Conditional VaR, Expected Shortfall): the mean loss beyond VaR. CVaR is more informative and is a coherent risk measure; VaR is not, because it can fail subadditivity.
**EVT for rare events:** with only $n = 1000$ observations, the $1/10000$ quantile lies beyond the sample, so it has to be extrapolated through a GPD fitted to the tail. Without a tail model such as EVT, the data say nothing beyond their own range.
**Key insight:** 99%-VaR under a normal tail $\ne$ 99%-VaR under a heavy tail. Financial crises are 'black swans': Fréchet-tail events that normal models systematically underestimate.
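The statement that VaR can fail subadditivity is easiest to see in a small discrete example. The sketch below is illustrative and not taken from the lesson: it assumes two hypothetical independent loans, each losing 100 with probability 0.04, and uses a helper named `discrete_var` (introduced here) that returns the smallest loss level $x$ with $P(L \le x) \ge \alpha$.

```python
import numpy as np

def discrete_var(losses, probs, level):
    """Smallest loss x with P(L <= x) >= level, for a discrete loss distribution."""
    order = np.argsort(losses)
    sorted_losses = np.asarray(losses, dtype=float)[order]
    cdf = np.cumsum(np.asarray(probs, dtype=float)[order])
    # Small tolerance guards against round-off in the cumulative sum
    idx = np.searchsorted(cdf, level - 1e-12)
    return sorted_losses[min(idx, len(sorted_losses) - 1)]

p_default = 0.04   # hypothetical default probability
loss = 100.0       # hypothetical loss given default
level = 0.95

# One loan: loss 0 with probability 0.96, loss 100 with probability 0.04
var_single = discrete_var([0.0, loss], [1 - p_default, p_default], level)

# Two independent loans: total loss is 0, 100 or 200
probs_sum = [(1 - p_default) ** 2,
             2 * p_default * (1 - p_default),
             p_default ** 2]
var_sum = discrete_var([0.0, loss, 2 * loss], probs_sum, level)

print(f"VaR of one loan:        {var_single:.0f}")
print(f"Sum of individual VaRs: {2 * var_single:.0f}")
print(f"VaR of the portfolio:   {var_sum:.0f}")
```

Each loan alone defaults with probability below 5%, so its 95% VaR is zero, yet the chance that at least one of the two defaults is $1 - 0.96^2 \approx 7.8\%$, which pushes the portfolio VaR above the sum of the individual VaRs.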
Fisher-Tippett-Gnedenko theorem and GEV families
The Fisher-Tippett-Gnedenko theorem is the CLT for maxima: if normalizing constants exist such that (M_n - b_n)/a_n converges in distribution, the limit must be a GEV with shape parameter xi. Three cases: xi=0 (Gumbel, light tails like Normal), xi>0 (Frechet, heavy tails like Pareto), xi<0 (Weibull, bounded support like Uniform).
The GEV unifies all three families: G_xi(x) = exp(-(1 + xi*(x-mu)/sigma)^(-1/xi)). Gumbel is the limiting case xi->0: G_0(x) = exp(-exp(-(x-mu)/sigma)). Every EVT application starts by estimating xi from data.
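A minimal sketch of the GEV formula above, written as a plain function so the three cases and the Gumbel limit can be checked numerically. The function name `gev_cdf` is introduced here for illustration. One practical detail to be aware of: SciPy's `genextreme` parameterizes the shape as `c = -xi`, so the sign flips relative to the convention used in this lesson.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV distribution function G_xi(x); xi = 0 is treated as the Gumbel limit."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    t = 1.0 + xi * z
    # Outside the support (t <= 0) the CDF is 0 for xi > 0 and 1 for xi < 0
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)

x = np.linspace(-1.5, 4.0, 12)

# SciPy uses c = -xi, so xi = 0.5 (Frechet) corresponds to c = -0.5
frechet_ok = np.allclose(gev_cdf(x, 0.5), genextreme.cdf(x, -0.5))
weibull_ok = np.allclose(gev_cdf(x, -0.5), genextreme.cdf(x, 0.5))

# As xi -> 0 the GEV approaches the Gumbel distribution
gap_to_gumbel = np.max(np.abs(gev_cdf(x, 1e-6) - gumbel_r.cdf(x)))

print("Frechet case matches SciPy:", frechet_ok)
print("Weibull case matches SciPy:", weibull_ok)
print(f"Max gap between xi=1e-6 and Gumbel: {gap_to_gumbel:.2e}")
```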
Normal distribution samples: to which GEV type does the normalized maximum converge?
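The normal distribution has a light tail that decays faster than any exponential. Block maxima from N(0,1) converge to Gumbel (xi=0). Normalization constants: a_n = 1/sqrt(2 log n), b_n = sqrt(2 log n) - (log log n + log 4π)/(2 sqrt(2 log n)). The leading term sqrt(2 log n) alone is not enough for centering: the correction is of order (log log n)/sqrt(log n), which is large compared with the scale a_n. Convergence is slow, with error of order 1/log n.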
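How much the centering constant matters can be checked without simulation. The sketch below is an illustrative check, not part of the lesson's code: for $n = 10^6$ standard normal variables it evaluates $P(M_n \le b_n)$ exactly and compares it with the Gumbel value $e^{-1} \approx 0.368$, once with the leading-order centering $\sqrt{2\log n}$ and once with the refined centering that subtracts $(\log\log n + \log 4\pi)/(2\sqrt{2\log n})$. The function name `prob_max_below` is introduced here.

```python
import numpy as np
from scipy.stats import norm

def prob_max_below(n, b_n, a_n, x=0.0):
    """P(M_n <= b_n + a_n * x) for the maximum of n i.i.d. N(0,1) variables."""
    # Work in log space: the CDF is extremely close to 1 at these arguments
    return np.exp(n * norm.logcdf(b_n + a_n * x))

n = 10**6
root = np.sqrt(2 * np.log(n))
a_n = 1 / root
b_leading = root
b_refined = root - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * root)

target = np.exp(-1.0)   # Gumbel CDF at x = 0
p_leading = prob_max_below(n, b_leading, a_n)
p_refined = prob_max_below(n, b_refined, a_n)

print(f"Gumbel target:          {target:.3f}")
print(f"Leading-order b_n only: {p_leading:.3f}")
print(f"Refined b_n:            {p_refined:.3f}")
```

The refined centering lands close to the Gumbel value, while the leading-order centering is still far off even at a million observations.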
Generalized Pareto distribution and threshold exceedances
The Pickands-Balkema-de Haan theorem: for any distribution in the domain of attraction of a GEV, exceedances over a high threshold u converge to the GPD: H_{xi,sigma}(y) = 1 - (1 + xi*y/sigma)^(-1/xi). When xi>0, the GPD is a Pareto distribution - power law tail P(X>x) ~ C*x^(-1/xi).
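For an exact Pareto tail the GPD is not just a limit but an identity, which makes a convenient sanity check. The sketch below assumes a hypothetical tail index `alpha = 2.5` and threshold `u = 3`, and uses SciPy's `pareto` (survival function $x^{-\alpha}$ for $x \ge 1$) and `genpareto` (shape `c` equal to $\xi$, same sign as in this lesson). The implied GPD parameters are $\xi = 1/\alpha$ and $\sigma = \xi u$.

```python
import numpy as np
from scipy.stats import pareto, genpareto

alpha = 2.5          # hypothetical Pareto tail index
u = 3.0              # hypothetical threshold
xi = 1 / alpha       # GPD shape
sigma = xi * u       # GPD scale implied by a Pareto tail at threshold u

y = np.linspace(0.0, 20.0, 9)

# Conditional excess probability P(X - u > y | X > u) from the Pareto tail
excess_sf = pareto.sf(u + y, alpha) / pareto.sf(u, alpha)

# GPD survival function (1 + xi*y/sigma)^(-1/xi)
gpd_sf = genpareto.sf(y, xi, scale=sigma)

max_gap = np.max(np.abs(excess_sf - gpd_sf))
print(f"Largest difference between excess tail and GPD: {max_gap:.2e}")
```

Note that the scale grows linearly with the threshold here, which is why $\sigma$ has to be re-estimated whenever $u$ changes.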
Power law signatures in practice: wealth distribution (Pareto 80/20, alpha~1.16), city sizes (Zipf law), earthquake energies (Gutenberg-Richter), internet graph degrees. When xi >= 0.5, variance is infinite; when xi >= 1, mean is infinite, and classical CLT-based statistics break down entirely.
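The figure alpha ≈ 1.16 for the 80/20 rule can be reproduced with a few lines of arithmetic. The sketch below assumes an exact Pareto distribution, for which the richest fraction $p$ holds a share $p^{\,1 - 1/\alpha}$ of the total (this requires $\alpha > 1$). The helper names `pareto_index_from_share` and `finite_moments` are introduced here for illustration.

```python
import math

def pareto_index_from_share(top_fraction, share):
    """Tail index alpha such that the top `top_fraction` holds `share` of the total.

    Solves share = top_fraction ** (1 - 1/alpha), valid for an exact Pareto law.
    """
    inv_alpha = 1 - math.log(share) / math.log(top_fraction)
    return 1 / inv_alpha

def finite_moments(xi):
    """For shape xi > 0 the k-th moment is finite only if k < 1/xi."""
    return {"mean": xi < 1, "variance": xi < 0.5}

alpha_8020 = pareto_index_from_share(0.20, 0.80)
xi_8020 = 1 / alpha_8020

print(f"alpha for the 80/20 rule: {alpha_8020:.3f}")
print(f"corresponding xi:         {xi_8020:.3f}")
print("finite moments:", finite_moments(xi_8020))
```

Under this assumption the 80/20 rule sits in the regime where the mean exists but the variance does not.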
Threshold exceedances above u converge to which distribution according to the Pickands-Balkema-de Haan theorem?
The Pickands-Balkema-de Haan (1974-1975) theorem establishes GPD as the universal limit for threshold exceedances. The shape parameter xi is the same as in the corresponding GEV - connecting block maxima and POT approaches.
VaR, CVaR, and tail risk quantification
VaR_alpha = F^(-1)(alpha): the loss level exceeded with probability 1-alpha. CVaR_alpha = E[X | X >= VaR_alpha]: the expected loss beyond VaR. For a GPD tail: CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi). CVaR is a coherent risk measure (subadditive); VaR is not.
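A minimal sketch of how a fitted GPD tail turns into VaR and CVaR numbers. It assumes the standard peaks-over-threshold quantile formula $\text{VaR}_p = u + \frac{\sigma}{\xi}\left[\left(\frac{1-p}{\zeta_u}\right)^{-\xi} - 1\right]$, where $\zeta_u = P(X > u)$ is the fraction of observations above the threshold, together with the CVaR formula quoted above. The function names and all parameter values are hypothetical. The check uses an exact Pareto loss, where the true quantile and tail mean are known in closed form.

```python
import numpy as np

def gpd_var(p, u, xi, sigma, tail_frac):
    """Quantile at level p from a GPD tail above threshold u.

    tail_frac is P(X > u), in practice estimated as (number of exceedances) / n.
    Valid for xi != 0 and p > 1 - tail_frac.
    """
    return u + (sigma / xi) * (((1 - p) / tail_frac) ** (-xi) - 1)

def gpd_cvar(p, u, xi, sigma, tail_frac):
    """Expected shortfall at level p; requires xi < 1 so that the mean exists."""
    var = gpd_var(p, u, xi, sigma, tail_frac)
    return var / (1 - xi) + (sigma - xi * u) / (1 - xi)

# Check against an exact Pareto loss with P(X > x) = x^(-alpha), x >= 1
alpha = 2.5
xi = 1 / alpha              # 0.4
u = 2.0                     # hypothetical threshold
sigma = xi * u              # exact GPD scale for a Pareto tail
tail_frac = u ** (-alpha)   # exact P(X > u)
p = 0.9999                  # the 1/10000 quantile

var_gpd = gpd_var(p, u, xi, sigma, tail_frac)
cvar_gpd = gpd_cvar(p, u, xi, sigma, tail_frac)

var_exact = (1 - p) ** (-1 / alpha)
cvar_exact = var_exact * alpha / (alpha - 1)

print(f"VaR:  GPD formula {var_gpd:.2f}, exact {var_exact:.2f}")
print(f"CVaR: GPD formula {cvar_gpd:.2f}, exact {cvar_exact:.2f}")
```

With real data, `xi`, `sigma` and `tail_frac` would be estimated from the exceedances, and the estimation error in `xi` dominates the uncertainty of the extrapolated quantile.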
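Normal models systematically underestimate tail risk. At the 99.9% level: N(0,1) VaR = 3.09, t(df=3) VaR = 10.21, more than 3x larger. Part of the gap is scale, since t(3) has variance 3, but even rescaled to unit variance its quantile is about 5.90, nearly twice the normal value. Financial crises are Frechet-tail events that Gaussian models cannot price correctly. Concern about tail risk is one reason the Basel III market-risk framework (FRTB) replaced 99% VaR with 97.5% expected shortfall.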
A bank loss GPD tail has xi=0.4. CVaR vs VaR ratio is approximately:
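For a GPD tail, CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi). At high confidence levels VaR grows without bound while the second term stays fixed, so CVaR/VaR approaches 1/(1-xi). With xi=0.4: CVaR ≈ VaR/0.6 ≈ 1.67*VaR. CVaR is finite only when xi < 1 (the mean exists). The 67% gap shows how much VaR understates expected loss in the tail.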
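The ratio 1/(1-xi) is a limiting value for far-out quantiles; at moderate levels the constant term in the GPD formula still contributes. The short sketch below uses hypothetical values `u = 1` and `sigma = 1` purely for illustration and evaluates the ratio implied by CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi) at increasingly large VaR levels.

```python
xi = 0.4
u, sigma = 1.0, 1.0    # hypothetical threshold and GPD scale

def cvar_over_var(var, u, xi, sigma):
    """CVaR/VaR implied by CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi)."""
    return 1 / (1 - xi) + (sigma - xi * u) / ((1 - xi) * var)

limit = 1 / (1 - xi)
ratios = {var: cvar_over_var(var, u, xi, sigma) for var in (5.0, 50.0, 500.0)}

for var, ratio in ratios.items():
    print(f"VaR = {var:6.1f}: CVaR/VaR = {ratio:.4f}")
print(f"Limit 1/(1-xi)          = {limit:.4f}")
```

When sigma > xi*u, as here, the ratio approaches the limit from above, so the 67% figure is a lower bound on the gap rather than an overstatement.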
| Parameter $\xi$ | Type | Name | Source tail |
|---|---|---|---|
| $\xi = 0$ | I | Gumbel | Light (normal, exponential) |
| $\xi > 0$ | II | Fréchet | Heavy (Pareto, Cauchy) |
| $\xi < 0$ | III | Weibull | Bounded (uniform, beta) |
Python: three families of limiting distributions
Maxima from different distributions converge to GEV
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gumbel_r, genextreme, pareto

np.random.seed(42)
n_block = 100    # observations per block
n_blocks = 5000  # number of block maxima

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1. Normal → Gumbel
samples_norm = np.random.randn(n_blocks, n_block)
block_maxima = samples_norm.max(axis=1)
root = np.sqrt(2 * np.log(n_block))
a_n = 1 / root
# Centering needs the second-order correction: with sqrt(2 log n) alone
# the normalized maxima sit well to the left of the Gumbel density
b_n = root - (np.log(np.log(n_block)) + np.log(4 * np.pi)) / (2 * root)
norm_max = (block_maxima - b_n) / a_n
axes[0].hist(norm_max, bins=50, density=True, alpha=0.7, label='Normalized maxima')
x_g = np.linspace(-3, 8, 300)
axes[0].plot(x_g, gumbel_r.pdf(x_g), 'r-', lw=2, label='Gumbel')
axes[0].set_title('N(0,1) → Gumbel (ξ=0)')
axes[0].legend(fontsize=8)

# 2. Pareto → Fréchet
alpha = 2.0
samples_pareto = pareto.rvs(alpha, size=(n_blocks, n_block))
block_maxima_p = samples_pareto.max(axis=1)
a_n_p = n_block**(1/alpha)
norm_max_p = block_maxima_p / a_n_p
xi_frechet = 1/alpha
axes[1].hist(norm_max_p, bins=50, density=True, alpha=0.7,
             range=(0, 10), label='Normalized maxima')
x_f = np.linspace(0.01, 10, 300)
# SciPy's genextreme uses shape c = -ξ. The limit exp(-x^(-α)) is the GEV
# with μ = 1 and σ = ξ, so loc and scale must be set explicitly.
axes[1].plot(x_f, genextreme.pdf(x_f, -xi_frechet, loc=1, scale=xi_frechet),
             'r-', lw=2, label=f'Fréchet (ξ={xi_frechet:.2f})')
axes[1].set_title(f'Pareto(α={alpha}) → Fréchet (ξ=1/α)')
axes[1].legend(fontsize=8)

# 3. Uniform → reversed Weibull
samples_unif = np.random.uniform(0, 1, size=(n_blocks, n_block))
block_maxima_u = samples_unif.max(axis=1)
a_n_u = 1 / n_block
norm_max_u = (block_maxima_u - 1) / a_n_u
axes[2].hist(norm_max_u, bins=50, density=True, alpha=0.7,
             range=(-5, 0.1), label='Normalized maxima')
x_w = np.linspace(-5, 0.1, 300)
# The limit exp(x) on x <= 0 is the GEV with ξ = -1 (c = 1), μ = -1, σ = 1
axes[2].plot(x_w, genextreme.pdf(x_w, 1, loc=-1), 'r-', lw=2, label='Weibull (ξ=-1)')
axes[2].set_title('Uniform(0,1) → Weibull (ξ<0)')
axes[2].legend(fontsize=8)

for ax in axes:
    ax.set_xlabel('Normalized maximum')
    ax.set_ylabel('Density')

plt.suptitle('Fisher-Tippett-Gnedenko theorem: three GEV families', fontsize=12)
plt.tight_layout()
plt.show()
```
$X_i \sim$ Cauchy(0,1). To which GEV type does the normalized maximum converge?
The Cauchy distribution has a power-law tail P(X>x) ~ 1/(πx) as x→∞ (tail index α=1). The shape parameter is ξ = 1/α = 1. So the maximum from Cauchy samples converges to a Fréchet distribution with ξ=1. The lack of a mean does not prevent EVT: the theorem needs a regularly varying tail, not finite moments.
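Both claims in this answer can be checked directly from the Cauchy survival function. The sketch below is illustrative: it compares the exact tail with $1/(\pi x)$, and then uses the scaling $a_n = n/\pi$ (an assumption stated here, with $b_n = 0$) to confirm that $n\,P(X > a_n t) \to t^{-1}$, which is the exponent of the Fréchet limit $\exp(-t^{-1})$.

```python
import numpy as np
from scipy.stats import cauchy

x = np.array([1e1, 1e2, 1e3, 1e4])

# Ratio of the exact tail to the power-law approximation 1/(pi*x)
tail_ratio = cauchy.sf(x) * np.pi * x

# With a_n = n/pi, n * P(X > a_n * t) should approach 1/t (alpha = 1)
n = 10**6
t = np.array([0.5, 1.0, 2.0])
frechet_exponent = n * cauchy.sf(n / np.pi * t)

print("tail ratio:", np.round(tail_ratio, 6))
print("n * P(X > a_n t):", np.round(frechet_exponent, 6))
print("1 / t:           ", 1 / t)
```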
Python: VaR and CVaR under light vs heavy tails
Why normal approximation is dangerous for risk
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t

np.random.seed(42)
n = 10_000
alpha = 0.99  # confidence level for VaR and CVaR

normal_returns = np.random.randn(n)
student_returns = np.random.standard_t(df=3, size=n)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: empirical tail probabilities with empirical VaR and CVaR
for data, label, color in [
    (normal_returns, 'N(0,1)', 'blue'),
    (student_returns, 't(df=3)', 'red'),
]:
    var = np.quantile(data, alpha)
    cvar = data[data >= var].mean()
    data_sorted = np.sort(data)
    emp_cdf = np.arange(1, n + 1) / n
    axes[0].plot(data_sorted, 1 - emp_cdf,
                 label=f'{label}: VaR={var:.2f}, CVaR={cvar:.2f}',
                 color=color, alpha=0.7)

# Theoretical tails for comparison
x = np.linspace(-5, 10, 1000)
axes[0].plot(x, 1 - norm.cdf(x), 'b--', lw=2, alpha=0.5, label='N(0,1) theory')
axes[0].plot(x, 1 - t.cdf(x, df=3), 'r--', lw=2, alpha=0.5, label='t(3) theory')
axes[0].set_xlim(0, 8)
axes[0].set_yscale('log')
axes[0].set_xlabel('x')
axes[0].set_ylabel('P(X > x)')
axes[0].set_title('Tail probabilities: light vs heavy tail')
axes[0].legend(fontsize=7)
axes[0].grid(True)

# Right panel: theoretical VaR as a function of the confidence level
alphas = np.linspace(0.90, 0.999, 50)
axes[1].plot(alphas, norm.ppf(alphas), 'b-', label='N(0,1) VaR', lw=2)
axes[1].plot(alphas, t.ppf(alphas, df=3), 'r-', label='t(3) VaR', lw=2)
axes[1].set_xlabel('Level α')
axes[1].set_ylabel('VaR_α')
axes[1].set_title('VaR: light vs heavy tail')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

print(f'99.9%-VaR: N(0,1)={norm.ppf(0.999):.2f}, t(3)={t.ppf(0.999, df=3):.2f}')
# N(0,1): 3.09, t(3): 10.21 - more than 3x larger!
```
A bank's daily loss has a GPD tail with $\xi = 0.4$. What does this imply for risk management?
With ξ=0.4 > 0 the tail is heavy (Fréchet type). Variance is finite since ξ < 0.5, and the mean is finite since ξ < 1. But at high confidence levels CVaR ≈ VaR/(1-ξ) = VaR/0.6, about 67% higher than VaR itself. A normal approximation severely underestimates extreme loss risk, which is why EVT is a standard tool for tail-risk estimation.
Extreme values - the tails of distributions
EVT completes the picture: CLT for the center, EVT for the tails.
- Concentration of measure: Hoeffding bounds tails from above; EVT describes their actual shape
- Information-theoretic methods: KL divergence between a normal and a heavy-tailed distribution quantifies the danger of the normal assumption
- Convergence of random variables: Gnedenko's theorem is convergence in distribution for maxima, the counterpart of the CLT
Summary
- **Gnedenko's theorem:** the normalized maximum converges to a GEV with parameter $\xi$ (Gumbel/Fréchet/Weibull)
- **Three types:** $\xi=0$ Gumbel (light tail), $\xi>0$ Fréchet (heavy, power law), $\xi<0$ Weibull (bounded)
- **GPD:** threshold exceedances follow the generalized Pareto distribution; power law when $\xi > 0$
- **VaR/CVaR:** VaR = quantile, CVaR = tail mean; EVT enables extrapolation beyond observed data
Questions for reflection
- Nassim Taleb's 'black swans' are rare extreme events with outsized impact. How does EVT formalize this concept? In what sense do distributions with ξ > 0 'generate' black swans?
- CVaR is coherent (subadditive); VaR is not. What does this mean practically - why can a portfolio have a higher VaR than the sum of its components' VaRs?
- Power laws appear everywhere - wealth, traffic, earthquakes. Is there a single mechanism generating them? How does preferential attachment (Barabási-Albert) produce power laws in networks?