Probability Theory

Extreme Value Theory

Lesson objectives

Understand the analogy: CLT for sums, Gnedenko's theorem for maxima
Distinguish three limiting distributions: Gumbel, Fréchet, Weibull
Know the generalized Pareto distribution (GPD) and its role in tail modeling
Compute VaR and CVaR as measures of tail risk
Apply EVT to rare events: floods, financial crashes, cyberattacks

Prerequisites

Convergence of random variables
CLT and laws of large numbers
Basic distributions

Convergence of Random Variables

The CLT describes typical values of sums. But in finance, engineering, and insurance, what matters are extreme events: the maximum loss, the worst 100-year flood, the peak server load. Extreme value theory is the 'CLT for maxima': whenever a properly normalized maximum has a non-degenerate limit, that limit belongs to one of exactly three families, whatever the underlying distribution. This lets you extrapolate to rare events beyond the range of the available data.

Finance: VaR (Value at Risk) and CVaR - Basel III regulatory requirements for banks
Reliability engineering: designing dams and bridges for 100- and 500-year flood events
Cybersecurity: estimating maximum DDoS traffic

Three families and one theorem

Ronald Fisher and Leonard Tippett proposed the three-type hypothesis for limiting distributions of maxima in 1928. Boris Gnedenko gave a rigorous proof in 1943 - the Fisher-Tippett-Gnedenko theorem, the exact analog of the CLT for extremes. Balkema and de Haan (1974) and Pickands (1975) independently proved that exceedances over high thresholds converge to the generalized Pareto distribution for any distribution in the domain of attraction of a GEV.

1. The Fisher-Tippett-Gnedenko Theorem

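Let $X_1, \ldots, X_n$ be i.i.d., $M_n = \max(X_1, \ldots, X_n)$. If normalizing constants $a_n > 0$, $b_n$ exist such that:

$$P\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x), \quad n \to \infty,$$

for a non-degenerate distribution function $G$, then $G$ must be one of three families (or their unification - the GEV):

$$G_\xi(x) = \exp\left(-\left(1 + \xi\,\frac{x - \mu}{\sigma}\right)^{-1/\xi}\right), \quad 1 + \xi\,\frac{x - \mu}{\sigma} > 0$$

**Three types of the GEV (generalized extreme value distribution):**
- $\xi = 0$ (Gumbel, read as the limit $\xi \to 0$): $G_0(x) = \exp\left(-e^{-(x-\mu)/\sigma}\right)$, light tails
- $\xi > 0$ (Fréchet): heavy, power-law tails
- $\xi < 0$ (Weibull): bounded support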

**Examples:**
- Maximum of N(0,1) samples: $\to$ Gumbel
- Maximum of Pareto($\alpha$) samples: $\to$ Fréchet, $\xi = 1/\alpha$
- Maximum of Uniform(0,1) samples: $\to$ Weibull

2. Generalized Pareto Distribution and Power Laws

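**Pickands-Balkema-de Haan theorem:** threshold exceedances converge to the GPD. For a high threshold $u$:

$$P(X - u \le y \mid X > u) \approx H_{\xi,\sigma}(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}, \quad y \ge 0,$$

where the scale $\sigma > 0$ depends on $u$ and the approximation becomes exact as $u$ approaches the upper end of the distribution.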

**Power law / Pareto distribution:** when $\xi > 0$:

$$P(X > x) \sim C \cdot x^{-\alpha}, \quad \alpha = 1/\xi$$

**Power law signatures:**
- Wealth distribution (Pareto law): 80% of wealth in 20% of hands - $\alpha \approx 1.16$
- City sizes (Zipf's law)
- Earthquake magnitudes (Gutenberg-Richter law)
- Vertex degrees in the internet graph

**The danger of Fréchet tails:** variance is infinite when $\xi \ge 1/2$; mean is infinite when $\xi \ge 1$. The CLT does not apply - mean-based statistics are unreliable.
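The tail index $\alpha = 1/\xi$ of a power law can be estimated directly from the largest observations. The sketch below uses the Hill estimator, a standard technique that this lesson does not otherwise cover; the function name `hill_estimator` and the choice `k=2_000` are illustrative assumptions, and in practice the result is sensitive to `k`.

```python
import numpy as np


def hill_estimator(sample, k):
    """Hill estimate of xi = 1/alpha from the k largest observations (positive data)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]  # descending order
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    if x[k] <= 0:
        raise ValueError("the (k+1)-th largest value must be positive")
    # Mean log-excess of the top k values over the (k+1)-th largest value.
    return float(np.mean(np.log(x[:k] / x[k])))


rng = np.random.default_rng(0)
alpha_true = 2.0
# rng.pareto draws Lomax samples; adding 1 gives a classical Pareto
# with P(X > x) = x**(-alpha) for x >= 1.
sample = rng.pareto(alpha_true, size=50_000) + 1.0

xi_hat = hill_estimator(sample, k=2_000)
print(f"xi_hat = {xi_hat:.3f}, alpha_hat = {1 / xi_hat:.3f}")
```

For data that are only approximately Pareto in the tail, it is common to plot the estimate against `k` and look for a stable region rather than trust a single value.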

3. VaR, CVaR, and Tail Risk Management

**VaR** (Value at Risk) $= F^{-1}(\alpha)$ - the loss level exceeded with probability $1-\alpha$.

**CVaR** (Conditional VaR, Expected Shortfall) - the mean loss beyond VaR. CVaR is more informative and coherent (VaR is not subadditive).

**EVT for rare events:** with only $n = 1000$ observations, the $1/10000$ tail quantile lies beyond the data, so it has to be extrapolated via the GPD. Without EVT, there's simply no information beyond the data range.

**Key insight:** 99%-VaR under a normal tail $\ne$ 99%-VaR under a heavy tail. Financial crises are 'black swans' - Fréchet tails that normal models systematically underestimate.
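The failure of subadditivity is easiest to see on a toy example. The sketch below is an illustration, not part of the lesson's data: two independent loans, each losing 100 with probability 4%, evaluated at the 95% level. The helpers `discrete_var` and `discrete_cvar` are hypothetical names written for this example.

```python
import numpy as np


def discrete_var(losses, probs, alpha):
    """VaR_alpha: the smallest loss x with P(L <= x) >= alpha."""
    order = np.argsort(losses)
    losses = np.asarray(losses, dtype=float)[order]
    cdf = np.cumsum(np.asarray(probs, dtype=float)[order])
    # Small tolerance guards against floating-point rounding in the cumulative sum.
    idx = min(int(np.searchsorted(cdf, alpha - 1e-12)), len(cdf) - 1)
    return float(losses[idx])


def discrete_cvar(losses, probs, alpha):
    """CVaR_alpha: the average loss over the worst (1 - alpha) probability mass."""
    order = np.argsort(losses)[::-1]  # worst losses first
    losses = np.asarray(losses, dtype=float)[order]
    probs = np.asarray(probs, dtype=float)[order]
    remaining, total = 1.0 - alpha, 0.0
    for loss, p in zip(losses, probs):
        take = min(p, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 0:
            break
    return total / (1.0 - alpha)


p, alpha = 0.04, 0.95
single = ([0.0, 100.0], [1 - p, p])
# Sum of two independent loans: 0, 1, or 2 defaults.
portfolio = ([0.0, 100.0, 200.0], [(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

var_single, var_port = discrete_var(*single, alpha), discrete_var(*portfolio, alpha)
cvar_single, cvar_port = discrete_cvar(*single, alpha), discrete_cvar(*portfolio, alpha)
print(f"VaR:  single={var_single:.1f}, portfolio={var_port:.1f}, sum of singles={2 * var_single:.1f}")
print(f"CVaR: single={cvar_single:.1f}, portfolio={cvar_port:.1f}, sum of singles={2 * cvar_single:.1f}")
```

Each loan alone defaults too rarely to register at the 95% level, but the chance that at least one of two defaults exceeds 5%, so the portfolio VaR is larger than the sum of the individual VaRs. CVaR averages over the whole tail and stays subadditive here.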

Fisher-Tippett-Gnedenko theorem and GEV families

The Fisher-Tippett-Gnedenko theorem is the CLT for maxima: if normalizing constants exist such that (M_n - b_n)/a_n converges in distribution, the limit must be a GEV with shape parameter xi. Three cases: xi=0 (Gumbel, light tails like Normal), xi>0 (Fréchet, heavy tails like Pareto), xi<0 (Weibull, bounded support like Uniform).

The GEV unifies all three families: G_xi(x) = exp(-(1 + xi*(x-mu)/sigma)^(-1/xi)). Gumbel is the limiting case xi->0: G_0(x) = exp(-exp(-(x-mu)/sigma)). Every EVT application starts by estimating xi from data.
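As a quick numerical check of the unified formula, the sketch below evaluates G_xi directly and confirms that a very small xi reproduces the Gumbel limit. The helper `gev_cdf` is a hypothetical name written for this example. It also compares against `scipy.stats.genextreme`, which parameterizes the shape as c = -xi, an easy sign convention to get wrong.

```python
import numpy as np
from scipy.stats import genextreme


def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV CDF G_xi(x); xi = 0 is handled as the Gumbel limit."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    t = 1.0 + xi * z
    # Outside the support (t <= 0) the CDF is 0 for xi > 0 and 1 for xi < 0.
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)  # avoid invalid powers outside the support
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)


x = np.linspace(-2.0, 5.0, 8)
gumbel_gap = np.max(np.abs(gev_cdf(x, 1e-6) - gev_cdf(x, 0.0)))
# SciPy's shape parameter is c = -xi, so the sign is flipped here.
scipy_gap = np.max(np.abs(gev_cdf(x, 0.3) - genextreme.cdf(x, -0.3)))
print(f"max |G_xi - G_0| at xi=1e-6: {gumbel_gap:.2e}")
print(f"max gap to scipy genextreme(c=-0.3): {scipy_gap:.2e}")
```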

Normal distribution samples: to which GEV type does the normalized maximum converge?

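The normal distribution has a light tail that decays faster than any exponential, so block maxima from N(0,1) converge to Gumbel (xi=0). Normalization constants, to leading order: b_n = sqrt(2 log n), a_n = 1/b_n. Convergence is slow, so for moderate n the second-order correction b_n = sqrt(2 log n) - (log log n + log 4π)/(2 sqrt(2 log n)) matters.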
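The centering constant sqrt(2 log n) is only a leading-order approximation. The sketch below compares it with the value that solves n·P(X > b_n) = 1 exactly and with the standard second-order correction. The function name `normal_location_constants` is a hypothetical helper for this comparison.

```python
import numpy as np
from scipy.stats import norm


def normal_location_constants(n):
    """Return (exact, leading_order, corrected) centering constants b_n for N(0,1)."""
    exact = norm.isf(1.0 / n)  # solves n * P(X > b_n) = 1
    leading = np.sqrt(2.0 * np.log(n))
    corrected = leading - (np.log(np.log(n)) + np.log(4.0 * np.pi)) / (2.0 * leading)
    return exact, leading, corrected


for n in (100, 10_000, 1_000_000):
    exact, leading, corrected = normal_location_constants(n)
    print(f"n={n:>9,}: exact={exact:.3f}  sqrt(2 log n)={leading:.3f}  corrected={corrected:.3f}")
```

The gap between the leading-order value and the exact one shrinks only slowly with n, which is why simulations with blocks of a few hundred observations should use the corrected constant.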

Generalized Pareto distribution and threshold exceedances

The Pickands-Balkema-de Haan theorem: for any distribution in the domain of attraction of a GEV, exceedances over a high threshold u converge to the GPD: H_{xi,sigma}(y) = 1 - (1 + xi*y/sigma)^(-1/xi). When xi>0, the GPD is a Pareto distribution - power law tail P(X>x) ~ C*x^(-1/xi).
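A minimal peaks-over-threshold sketch, under stated assumptions: the data are simulated from an exact Pareto tail (so the true answer is known), the threshold is set at the 95% empirical quantile purely for illustration, and the fit uses `scipy.stats.genpareto`, whose shape parameter `c` plays the role of xi.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
alpha_true = 2.0  # tail index, so xi = 1/alpha = 0.5
# Classical Pareto with P(X > x) = x**(-alpha), x >= 1 (Lomax draw shifted by 1).
losses = rng.pareto(alpha_true, size=100_000) + 1.0

u = np.quantile(losses, 0.95)        # threshold: an illustrative choice
excess = losses[losses > u] - u      # exceedances y = x - u

# Location is fixed at 0 because exceedances start at 0 by construction.
xi_hat, _, sigma_hat = genpareto.fit(excess, floc=0)
print(f"exceedances: {excess.size}, xi_hat={xi_hat:.3f}, sigma_hat={sigma_hat:.3f}")
print(f"theory for this Pareto: xi={1 / alpha_true:.3f}, sigma=u/alpha={u / alpha_true:.3f}")
```

For an exact Pareto the exceedances are exactly GPD with sigma = xi·u, which makes this a convenient sanity check. With real data the threshold choice is a bias-variance trade-off and usually needs diagnostic plots.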

Power law signatures in practice: wealth distribution (Pareto 80/20, alpha~1.16), city sizes (Zipf law), earthquake magnitudes (Gutenberg-Richter), internet graph degrees. When xi >= 0.5, variance is infinite; when xi >= 1, mean is infinite - classical CLT-based statistics break down entirely.

Threshold exceedances above u converge to which distribution according to the Pickands-Balkema-de Haan theorem?

The Pickands-Balkema-de Haan (1974-1975) theorem establishes GPD as the universal limit for threshold exceedances. The shape parameter xi is the same as in the corresponding GEV - connecting block maxima and POT approaches.

VaR, CVaR, and tail risk quantification

VaR_alpha = F^(-1)(alpha): the loss level exceeded with probability 1-alpha. CVaR_alpha = E[X | X >= VaR_alpha]: the expected loss beyond VaR. For a GPD tail: CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi). CVaR is a coherent risk measure (subadditive); VaR is not.
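The CVaR formula above can be turned into a small calculator. The sketch below also needs VaR itself, for which it uses the standard peaks-over-threshold quantile formula VaR = u + (sigma/xi)·(((1-alpha)/p_u)^(-xi) - 1), with p_u = P(X > u). That formula is not stated in this lesson and is brought in here as an assumption. The function name `gpd_var_cvar` is hypothetical.

```python
def gpd_var_cvar(u, sigma, xi, p_exceed, alpha):
    """Tail VaR and CVaR from a GPD fitted above threshold u.

    p_exceed is P(X > u). Requires 0 < xi < 1 and 1 - alpha < p_exceed.
    """
    if not 0.0 < xi < 1.0:
        raise ValueError("this sketch assumes 0 < xi < 1 (heavy tail, finite mean)")
    if not 0.0 < 1.0 - alpha < p_exceed:
        raise ValueError("alpha must lie beyond the threshold's probability level")
    var = u + sigma / xi * (((1.0 - alpha) / p_exceed) ** (-xi) - 1.0)
    cvar = var / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)
    return var, cvar


# Exact Pareto tail P(X > x) = x**(-2.5): above u the excess is GPD
# with xi = 0.4 and sigma = xi * u, so the true quantile is known in closed form.
u, xi = 2.0, 0.4
var, cvar = gpd_var_cvar(u, sigma=xi * u, xi=xi, p_exceed=u ** (-1 / xi), alpha=0.999)
print(f"VaR={var:.3f}, CVaR={cvar:.3f}, ratio={cvar / var:.3f}")
print(f"closed-form Pareto quantile: {0.001 ** (-xi):.3f}")
```

In this exact-Pareto case sigma - xi·u vanishes, so the ratio CVaR/VaR equals 1/(1-xi) exactly. For a general fitted GPD the second term is nonzero and the ratio only approaches 1/(1-xi) far in the tail.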

Normal models systematically underestimate tail risk. At the 99.9% level: N(0,1) VaR = 3.09, t(df=3) VaR = 10.21 - more than 3x larger. Financial crises are Fréchet-tail events that Gaussian models cannot price correctly. This is one reason the Basel market-risk framework moved internal models from VaR to expected shortfall (CVaR).

A bank loss GPD tail has xi=0.4. CVaR vs VaR ratio is approximately:

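For a GPD tail, CVaR = VaR/(1-xi) + (sigma - xi*u)/(1-xi). At high confidence levels VaR grows while the second term stays constant, so CVaR/VaR approaches 1/(1-xi). With xi=0.4: CVaR ≈ VaR/0.6 ≈ 1.67*VaR. CVaR is finite only when xi < 1 (mean exists). The 67% gap shows how much VaR understates expected loss in the tail.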

| Parameter $\xi$ | Type | Name | Tail of the source distribution |
|---|---|---|---|
| $\xi = 0$ | I | Gumbel | Light (normal, exponential) |
| $\xi > 0$ | II | Fréchet | Heavy (Pareto, Cauchy) |
| $\xi < 0$ | III | Weibull | Bounded (uniform, beta) |

Python: three families of limiting distributions

Maxima from different distributions converge to GEV

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gumbel_r, genextreme, pareto

np.random.seed(42)
n_block = 100     # observations per block
n_blocks = 5000   # number of block maxima

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1. Normal → Gumbel
samples_norm = np.random.randn(n_blocks, n_block)
block_maxima = samples_norm.max(axis=1)
b_lead = np.sqrt(2 * np.log(n_block))
# sqrt(2 log n) alone is too crude at n = 100, so add the second-order correction
b_n = b_lead - (np.log(np.log(n_block)) + np.log(4 * np.pi)) / (2 * b_lead)
a_n = 1 / b_lead
norm_max = (block_maxima - b_n) / a_n
axes[0].hist(norm_max, bins=50, density=True, alpha=0.7, label='Normalized maxima')
x_g = np.linspace(-3, 8, 300)
axes[0].plot(x_g, gumbel_r.pdf(x_g), 'r-', lw=2, label='Gumbel')
axes[0].set_title('N(0,1) → Gumbel (ξ=0)')
axes[0].legend(fontsize=8)

# 2. Pareto → Fréchet
alpha = 2.0
samples_pareto = pareto.rvs(alpha, size=(n_blocks, n_block))
block_maxima_p = samples_pareto.max(axis=1)
a_n_p = n_block**(1/alpha)
norm_max_p = block_maxima_p / a_n_p
xi_frechet = 1/alpha
axes[1].hist(norm_max_p, bins=50, density=True, alpha=0.7, range=(0, 10),
             label='Normalized maxima')
x_f = np.linspace(0.01, 10, 300)
# SciPy uses c = -ξ; loc=1, scale=ξ turns the GEV into the Fréchet limit exp(-x^(-α))
axes[1].plot(x_f, genextreme.pdf(x_f, -xi_frechet, loc=1, scale=xi_frechet),
             'r-', lw=2, label=f'Fréchet (ξ={xi_frechet:.2f})')
axes[1].set_title(f'Pareto(α={alpha}) → Fréchet (ξ=1/α)')
axes[1].legend(fontsize=8)

# 3. Uniform → reversed Weibull
samples_unif = np.random.uniform(0, 1, size=(n_blocks, n_block))
block_maxima_u = samples_unif.max(axis=1)
a_n_u = 1 / n_block
norm_max_u = (block_maxima_u - 1) / a_n_u
axes[2].hist(norm_max_u, bins=50, density=True, alpha=0.7, range=(-5, 0.1),
             label='Normalized maxima')
x_w = np.linspace(-5, 0.1, 300)
# c = 1 means ξ = -1; loc=-1 shifts the GEV so the limit is exp(x) on x <= 0
axes[2].plot(x_w, genextreme.pdf(x_w, 1, loc=-1), 'r-', lw=2, label='Weibull (ξ=-1)')
axes[2].set_title('Uniform(0,1) → Weibull (ξ<0)')
axes[2].legend(fontsize=8)

for ax in axes:
    ax.set_xlabel('Normalized maximum')
    ax.set_ylabel('Density')

plt.suptitle('Fisher-Tippett-Gnedenko theorem: three GEV families', fontsize=12)
plt.tight_layout()
plt.show()
```

$X_i \sim$ Cauchy(0,1). To which GEV type does the normalized maximum converge?

The Cauchy distribution has a power-law tail P(X>x) ~ 1/(πx) as x→∞ (tail index α=1). The shape parameter is ξ = 1/α = 1. So the maximum from Cauchy samples converges to a Fréchet distribution with ξ=1. The lack of a mean does not prevent EVT - Fréchet convergence requires only a regularly varying (power-law) tail, not finite moments.
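The Cauchy answer can be checked by simulation. Since P(X > x) ~ 1/(πx), scaling the block maximum by a_n = n/π should give the Fréchet limit exp(-1/x), whose quantile function is -1/log(p). The block size and number of blocks below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_block, n_blocks = 500, 10_000

# Block maxima of standard Cauchy samples.
maxima = rng.standard_cauchy(size=(n_blocks, n_block)).max(axis=1)

# Normalize by a_n = n / pi (no centering is needed in the Fréchet case).
normalized = maxima / (n_block / np.pi)

# Compare empirical quantiles with the Fréchet(alpha=1) quantiles -1/log(p).
for p in (0.25, 0.5, 0.75):
    empirical = np.quantile(normalized, p)
    print(f"p={p}: empirical={empirical:.3f}, Frechet={-1 / np.log(p):.3f}")
```

Quantiles are compared rather than means because the limiting distribution with ξ=1 has no finite mean, so a sample average of the maxima would not settle down.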

Python: VaR and CVaR under light vs heavy tails

Why normal approximation is dangerous for risk

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t

np.random.seed(42)
n = 10_000
alpha = 0.99

normal_returns = np.random.randn(n)
student_returns = np.random.standard_t(df=3, size=n)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: empirical tail probabilities, with VaR and CVaR in the legend
for data, label, color in [
    (normal_returns, 'N(0,1)', 'blue'),
    (student_returns, 't(df=3)', 'red'),
]:
    var = np.quantile(data, alpha)
    cvar = data[data >= var].mean()
    data_sorted = np.sort(data)
    emp_cdf = np.arange(1, n+1) / n
    axes[0].plot(data_sorted, 1 - emp_cdf,
                 label=f'{label}: VaR={var:.2f}, CVaR={cvar:.2f}',
                 color=color, alpha=0.7)

x = np.linspace(-5, 10, 1000)
axes[0].plot(x, 1 - norm.cdf(x), 'b--', lw=2, alpha=0.5, label='N(0,1) theory')
axes[0].plot(x, 1 - t.cdf(x, df=3), 'r--', lw=2, alpha=0.5, label='t(3) theory')
axes[0].set_xlim(0, 8)
axes[0].set_yscale('log')
axes[0].set_xlabel('x')
axes[0].set_ylabel('P(X > x)')
axes[0].set_title('Tail probabilities: light vs heavy tail')
axes[0].legend(fontsize=7)
axes[0].grid(True)

# Right panel: theoretical VaR as a function of the confidence level
alphas = np.linspace(0.90, 0.999, 50)
axes[1].plot(alphas, norm.ppf(alphas), 'b-', label='N(0,1) VaR', lw=2)
axes[1].plot(alphas, t.ppf(alphas, df=3), 'r-', label='t(3) VaR', lw=2)
axes[1].set_xlabel('Level α')
axes[1].set_ylabel('VaR_α')
axes[1].set_title('VaR: light vs heavy tail')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

print(f'99.9%-VaR: N(0,1)={norm.ppf(0.999):.2f}, t(3)={t.ppf(0.999, df=3):.2f}')
# N(0,1): 3.09, t(3): 10.21 - more than 3x larger!
```

A bank's daily loss has a GPD tail with $\xi = 0.4$. What does this imply for risk management?

With ξ=0.4 > 0: heavy Fréchet tail. Variance is finite since ξ < 0.5. Mean is finite since ξ < 1. But far in the tail CVaR ≈ VaR/(1-ξ) = VaR/0.6, about 67% higher than VaR itself. A normal approximation severely underestimates extreme loss risk, which is why EVT is a standard tool in risk management.

Extreme values - the tails of distributions

EVT completes the picture: CLT for the center, EVT for the tails.

Concentration of measure — Hoeffding bounds tails from above; EVT describes their actual shape
Information-theoretic methods — KL divergence between a normal and a heavy-tailed distribution quantifies the danger of the normal assumption
Convergence of random variables — Gnedenko's theorem is convergence in distribution for maxima - the complete analog of the CLT

Summary

**Gnedenko's theorem:** the normalized maximum converges to a GEV with parameter $\xi$ (Gumbel/Fréchet/Weibull)
**Three types:** $\xi=0$ Gumbel (light tail), $\xi>0$ Fréchet (heavy, power law), $\xi<0$ Weibull (bounded)
**GPD:** threshold exceedances follow the generalized Pareto distribution; power law when $\xi > 0$
**VaR/CVaR:** VaR = quantile, CVaR = tail mean; EVT enables extrapolation beyond observed data

Questions for reflection

Nassim Taleb's 'black swans' are rare extreme events with outsized impact. How does EVT formalize this concept? In what sense do distributions with ξ > 0 'generate' black swans?
CVaR is coherent (subadditive); VaR is not. What does this mean practically - why can a portfolio have a higher VaR than the sum of its components' VaRs?
Power laws appear everywhere - wealth, traffic, earthquakes. Is there a single mechanism generating them? How does preferential attachment (Barabási-Albert) produce power laws in networks?

Related lessons

stat-05-hypothesis