Probability Theory

Martingales

Цели урока

Understand martingales as the mathematical formalization of a fair game
Master Doob's Optional Stopping Theorem and its conditions
Solve the Gambler's Ruin problem using two martingales
Apply Doob's inequalities to bound tail probabilities
Recognize martingales in SGD, Azuma-Hoeffding, and Black-Scholes

Предварительные знания

Conditional expectation E[X|F]
Tower Property
Symmetric random walk
Sigma-algebras and filtrations

Conditional Expectation

The casino always wins against the doubling strategy. After each loss the bet doubles - on the first win everything is recovered plus the initial stake. The scheme seems foolproof. The expected wealth after any number of rounds equals the starting capital. Martingale theory is a rigorous proof of this for any strategy in any fair game.

Black-Scholes: stock price under risk-neutral measure is a martingale
SGD: noisy gradient is a martingale difference sequence
Azuma-Hoeffding: martingale concentration inequalities in online learning
Gambler's Ruin: P(reach N before 0) = k/N computed via martingale
Optional Stopping: why no exit strategy beats a fair casino

From gambling strategy to stochastic analysis

The word martingale is a French term for the doubling betting strategy known since the 18th century. Joseph Doob formalized the concept in 1953 using conditional expectation relative to a filtration. His book Stochastic Processes laid the foundation for modern probability theory. Martingales became the key tool in Ito stochastic calculus and financial mathematics - the Black-Scholes formula rests on the representation of option price as a martingale expectation.

What is a martingale

Formal definition

SGD converges because gradient noise forms a martingale: $\mathbb{E}[\nabla L(w_t) | w_0,...,w_{t-1}] = $ true gradient. Black-Scholes option pricing gives a martingale under risk-neutral measure. The filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots$ encodes accumulated information, and $\{M_n\}$ is a martingale if:

1. Adapted: M_n is F_n-measurable 2. Integrable: E[|M_n|] < inf for all n 3. Fair game: E[M_{n+1} | F_n] = M_n Consequence (Tower Property): E[M_n] = E[E[M_n | F_{n-1}]] = E[M_{n-1}] = ... = E[M_0] The expected value of a martingale is constant in time.

Sub- and supermartingales

Relaxing equality: $E[M_{n+1} \mid \mathcal{F}_n] \geq M_n$ is a **submartingale** (expected growth, e.g. wealth in a favorable game). If $\leq$ - a **supermartingale** (expected decay). Wealth in a casino with commission is a supermartingale.

Key examples

Process	Definition	Why it is a martingale
Symmetric random walk	$S_n = \xi_1 + \ldots + \xi_n$, $\xi_i \in \{-1,+1\}$	$E[S_{n+1}\|\mathcal{F}_n] = S_n + 0 = S_n$
Centered squared walk	$M_n = S_n^2 - n$	$E[S_{n+1}^2\|\mathcal{F}_n] = S_n^2 + 1$, subtract $n+1$
Exponential martingale	$M_n = e^{\theta S_n} / (\cosh\theta)^n$	$E[e^{\theta \xi}] = \cosh\theta$ cancels the growth

S_{n+1} = S_n + xi_{n+1}, therefore: S_{n+1}^2 = S_n^2 + 2*S_n*xi_{n+1} + xi_{n+1}^2 E[S_{n+1}^2 | F_n] = S_n^2 + 2*S_n*E[xi] + E[xi^2] = S_n^2 + 0 + 1 = S_n^2 + 1 E[S_{n+1}^2 - (n+1) | F_n] = S_n^2 + 1 - n - 1 = S_n^2 - n checked Key consequence: E[S_n^2] = E[M_n] + n = M_0 + n = n Variance of symmetric random walk after n steps = n.

Numpy: simulating the martingale property

Both outputs give ~0.0000 - the martingale property confirmed numerically.

Is $M_n = 2^{S_n}$ a martingale, where $S_n$ is a symmetric random walk?

Optional Stopping Theorem

Strategy: "play until winning 1000 dollars, then quit." Sounds reasonable. The Optional Stopping Theorem says: in a fair game this does not help - the expected state at exit equals the starting capital.

Conditions of the theorem

1. T is bounded: T <= N for some constant N (simplest) 2. E[T] < inf and |M_{n+1} - M_n| <= C (bounded increments) 3. E[sup_n |M_{n ^ T}|] < inf (dominated convergence) Without conditions the theorem is FALSE. Example of failure: doubling strategy gives T < inf a.s., but E[T] = inf - theorem does not apply.

Gambler's Ruin

A player starts with $k$ dollars, wins or loses 1 dollar per round with probability 1/2. The game ends at $N$ (win) or $0$ (ruin). Find $P(\text{win})$.

**Expected game duration** via the second martingale $M_n = S_n^2 - n$: $E[S_T^2 - T] = E[S_0^2] = k^2$ $E[T] = E[S_T^2] - k^2 = Nk - k^2 = k(N-k)$ With k=50, N=100: $E[T] = 50 \cdot 50 = 2500$ rounds.

Unfair game: a different martingale

If the coin is unfair ($p \neq 1/2$), the process $S_n$ is not a martingale. But there exists a function $f(S_n)$ that is.

With P(step +1) = p, P(step -1) = q = 1-p, p != 1/2: M_n = (q/p)^{S_n} is a martingale: E[M_{n+1} | F_n] = M_n * (p*(q/p) + q*(p/q)) = M_n * (q + p) = M_n checked By OST: (q/p)^N * p_k + 1 * (1-p_k) = (q/p)^k p_k = ((q/p)^k - 1) / ((q/p)^N - 1) With p=0.4, k=40, N=100: q/p = 1.5 p_k = (1.5^40 - 1) / (1.5^100 - 1) ≈ 0.0025 (one quarter of a percent!)

Even a small unfavorable bias (p=0.4 instead of 0.5) turns 10% odds into 0.25% with the same starting capital. This is exactly how casinos operate.

A player starts at k=25 with boundaries 0 and N=100 in a symmetric game. What is the win probability and expected game length?

Doob's Inequalities

Doob's inequalities answer: how far can a martingale wander in $n$ steps? This is the tool without which one cannot prove SGD convergence or justify generalization bounds in ML.

Three forms of the inequalities

Maximal inequality: P(max_{k<=n} M_k >= lambda) <= E[M_n^+] / lambda L^p inequality (p > 1): E[(max_{k<=n} M_k)^p] <= (p/(p-1))^p * E[M_n^p] L^2 case (p=2, most practical): E[max_{k<=n} M_k^2] <= 4 * E[M_n^2] The constant (p/(p-1))^p is called Doob's constant. For p=2: (2/1)^2 = 4. For p=4: (4/3)^4 ≈ 3.16.

The bound is often imprecise (2-3x larger than the empirical value). That is fine: it gives a **guaranteed** upper bound for any martingale without knowing the specific distribution.

Azuma-Hoeffding: concentration for bounded martingales

If the martingale increments are bounded: $|M_k - M_{k-1}| \leq c_k$ almost surely, then deviation from the starting value concentrates exponentially:

P(|M_n - M_0| >= t) <= 2 * exp(-t^2 / (2 * sum(c_k^2))) Comparison with Hoeffding for n i.i.d. variables with |X_i| <= c: Hoeffding: P(|S_n/n - mu| >= t) <= 2*exp(-2*n*t^2/c^2) Azuma: applies to DEPENDENT variables! Example - online learning: Regret after T rounds: R_T = sum(loss_t - loss*) R_T is a submartingale with |R_t - R_{t-1}| <= 1 => P(R_T >= t*sqrt(T)) <= 2*exp(-t^2/2)

Numpy: verifying the maximal inequality

For a martingale $M_n$ with $E[M_n^2] = 16$: what upper bound does Doob's $L^2$ inequality give for $E[\max_{k \leq n} M_k^2]$?

Martingales in ML and Finance

Martingales are not an abstract construction. Three concrete applications: SGD, online learning, Black-Scholes.

SGD: martingale structure of noise

Stochastic gradient descent: $w_{t+1} = w_t - \eta g_t$, where $g_t$ is the gradient on a random mini-batch. The noise $\varepsilon_t = g_t - \nabla L(w_t)$ is unbiased by condition:

E[g_t - nabla L(w_t) | w_0, ..., w_t] = 0 (unbiased mini-batch) Accumulated deviation: D_n = sum_{t=0}^{n-1} epsilon_t D_n is a martingale! By Azuma-Hoeffding: P(|D_n| >= lambda * sqrt(n)) <= 2*exp(-lambda^2/2) This is used in SGD convergence proofs: noise cancels in expectation, descent moves in the correct direction.

Black-Scholes: option price as martingale expectation

The Black-Scholes formula for a call option with strike $K$:

Real measure P: S_t = S_0 * exp(mu*t + sigma*W_t) dS_t = mu*S_t*dt + sigma*S_t*dW_t (trend mu!) Change of measure (Girsanov): mu -> r (risk-free rate) Under risk-neutral measure Q: e^{-rt}*S_t is a martingale dS_t = r*S_t*dt + sigma*S_t*dW_t^Q Option price with payoff H = max(S_T - K, 0): V_t = e^{-r(T-t)} * E^Q[max(S_T - K, 0) | F_t] This is the Black-Scholes formula! No arbitrage <=> existence of measure Q.

Interview: classic questions

Classic interview questions

**Q: Is a sum of i.i.d. zero-mean random variables a martingale?**

Yes. $E[S_{n+1} \mid \mathcal{F}_n] = S_n + E[X_{n+1}] = S_n + 0 = S_n$. Independence of $X_{n+1}$ from $\mathcal{F}_n$ is the key step. Simplest martingale in the ML context (SGD noise).

**Q: Why does the doubling strategy not guarantee a win?**

For the doubling strategy $E[T] = \infty$ - the OST condition is violated. With bounded capital $E[M_T] < E[M_0]$. OST: in a fair game without limits $E[M_T] = E[M_0]$ - no profit.

**Q: How are martingales connected to no-arbitrage in finance?**

Fundamental Theorem of Asset Pricing: no arbitrage if and only if a risk-neutral measure Q exists. Under Q discounted asset prices are martingales. Price of any derivative = $E^Q$[discounted payoff].

What is the meaning of the risk-neutral measure Q in the Black-Scholes model?

Martingales: bridge between probability and ML

Martingales connect classical probability, financial mathematics, and modern ML.

Brownian Motion — Next lesson: W(t) and W(t)^2 - t are continuous martingales; foundation of stochastic calculus
Markov Chains — Harmonic functions are martingales; OST applies to hitting times
Conditional Expectation — Previous lesson: E[X|F_n] is the core tool in the martingale definition
Stochastic Calculus — Ito stochastic integral is a martingale; Girsanov's theorem changes the measure

Итоги

**Martingale:** $E[M_{n+1} \mid \mathcal{F}_n] = M_n$ - expected value tomorrow equals today's
**Consequence:** $E[M_n] = E[M_0]$ - the average never changes
**Examples:** $S_n$ (random walk), $S_n^2 - n$, exponential martingale
**OST:** $E[M_T] = E[M_0]$ under suitable conditions on $T$
**Gambler's Ruin:** $P(\text{win} \mid S_0 = k) = k/N$ - direct consequence of OST
**Doob:** $E[\max_k M_k^2] \leq 4 E[M_n^2]$ - controls the maximum
**ML:** SGD noise forms MDS; Azuma-Hoeffding; risk-neutral measure in Black-Scholes

Вопросы для размышления

Why does the Optional Stopping Theorem prevent winning at a casino even with unlimited capital?
For which function f is M_n = f(S_n) a martingale when the coin is unfair?
How are martingales connected to the no-arbitrage condition in financial mathematics?
Why is E[T] < inf important in Doob's theorem, rather than just P(T < inf) = 1?

Связанные уроки

sp-01