Measure Theory
Disintegration of Measures
Elementary probability defines P(A|B) only when P(B) > 0. The Disintegration Theorem removes this restriction: it decomposes a Borel probability measure on a Polish space along the fibers of a measurable map, giving rigorous meaning to conditional distributions given Y = y even when P(Y = y) = 0.
- **AlphaGo and conditional probabilities:** DeepMind trained AlphaGo on tens of millions of Go positions; every move-selection step evaluates P(move | position). Mathematically this is the conditional measure μ_y delivered by disintegration
- **Spotify Bayesian models:** in 2023 Spotify processed 100 billion streams using Bayesian models built on P(genre | listening history) - formally a disintegration of the joint measure along the observation projection
- **Optimal transport:** a transport plan γ on X×Y disintegrates over X, giving for each x the distribution γ_x of destinations - the object behind barycentric projections and the gluing of transport plans
- **Diffusion models (DDPM, score matching):** the score function ∇ log p_t(x) is the gradient of the log marginal density p_t, and p_t is obtained by integrating the conditional densities p_t(x | x_0) of the noised sample against the data distribution - a disintegration of the joint law of (x_0, x_t) along x_0
Prerequisites
- Polish spaces: complete separable metric spaces (R^n, C([0,1]), l^2)
- Borel sigma-algebra and measurable maps
- Pushforward measure: nu = pi_* mu, nu(B) = mu(pi^{-1}(B))
- Conditional expectation E[f | G] as an L^2 projection
The Disintegration Theorem
Google DeepMind trained AlphaGo on tens of millions of Go positions, with conditional probability at the core of every move-selection step. The rigorous foundation is the Disintegration Theorem: a Borel probability measure on a Polish space can be decomposed along the fibers of a measurable map, yielding a family of probability measures, one for almost every fiber.
Disintegrating μ along π: X → Y produces a family {μ_y}. What does μ_y(π^{-1}(y)) = 1 mean?
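On a finite set the theorem reduces to elementary conditioning, which makes the fiber condition easy to inspect. The sketch below is illustrative only: the function name `disintegrate`, the six-point space, and the map x ↦ x mod 2 are assumptions chosen for the example, not part of the theorem.

```python
from collections import defaultdict
from fractions import Fraction

def disintegrate(mu, pi):
    """Disintegrate a finite discrete measure mu (dict: point -> mass) along pi.

    Returns (nu, fibers): nu is the pushforward pi_* mu, and fibers[y] is the
    conditional probability measure mu_y, defined only where nu(y) > 0.
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[pi(x)] += mass                 # nu(y) = mu(pi^{-1}(y))
    fibers = defaultdict(dict)
    for x, mass in mu.items():
        y = pi(x)
        if nu[y] > 0:
            fibers[y][x] = mass / nu[y]   # mu_y only charges points of pi^{-1}(y)
    return dict(nu), dict(fibers)

# Hypothetical example: X = {0, ..., 5} with unequal masses, pi(x) = x mod 2.
mu = {x: Fraction(x + 1, 21) for x in range(6)}
pi = lambda x: x % 2
nu, fibers = disintegrate(mu, pi)

# Each mu_y is a probability measure carried by its own fiber.
for y, mu_y in fibers.items():
    assert sum(mu_y.values()) == 1
    assert all(pi(x) == y for x in mu_y)

# Iterated integration: the integral of f against mu equals the
# nu-weighted sum of the fiber integrals of f against mu_y.
f = lambda x: x * x
lhs = sum(f(x) * m for x, m in mu.items())
rhs = sum(nu[y] * sum(f(x) * m for x, m in fibers[y].items()) for y in fibers)
assert lhs == rhs
```

Here μ_y(π^{-1}(y)) = 1 says that all of the mass of μ_y sits on points x with π(x) = y; exact rational arithmetic is used so the identities hold without rounding.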
Conditional Measures and Optimal Transport
Spotify processed 100 billion streams in 2023 using Bayesian models built on explicit conditional probability structures. The disintegration theorem gives rigorous meaning to P(A|B) even when P(B) = 0, resolving a fundamental gap left by elementary probability theory.
For a standard bivariate normal (X,Y) (zero means, unit variances) with correlation ρ, why is the conditional law of X given Y = y equal to μ_y = N(ρy, 1-ρ²)?
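One way to see the answer is to complete the square in the joint density. The derivation below assumes the standard case (zero means, unit variances, |ρ| < 1); the symbols p for the joint density and φ for the standard normal density are introduced here for illustration.

```latex
\begin{aligned}
p(x,y) &= \frac{1}{2\pi\sqrt{1-\rho^2}}
          \exp\!\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right), \\
x^2 - 2\rho x y + y^2 &= (x-\rho y)^2 + (1-\rho^2)\,y^2, \\
p(x,y) &= \underbrace{\frac{1}{\sqrt{2\pi(1-\rho^2)}}
          \exp\!\left(-\frac{(x-\rho y)^2}{2(1-\rho^2)}\right)}_{\text{density of } N(\rho y,\,1-\rho^2)\text{ in } x}
          \cdot
          \underbrace{\frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}}_{\varphi(y)}.
\end{aligned}
```

The second factor is the density of the law of Y, which is N(0,1), so the first factor is the density of the conditional measure on the fiber over y.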
Disintegration across measure theory and related fields
The Disintegration Theorem bridges abstract measure theory and applied conditional distributions. It unifies the iterated-integral formula, the Fubini-Tonelli theorem, and regular conditional distributions under one roof.
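- **Optimal transport:** disintegrating the plan γ ∈ Π(μ,ν) over the first coordinate gives the conditional plans γ_x, the basic object behind barycentric projections and the gluing of plans
- **Fubini-Tonelli theorem:** Fubini is the special case of a product measure μ_1 ⊗ μ_2 on X_1 × X_2: disintegrating along the projection onto the second coordinate gives conditional measures that do not depend on y, each being a copy of the first marginal μ_1 on the fiber X_1 × {y}
- **Bayesian inference:** the posterior P(θ|x) is the disintegration of the joint P(θ,x) along the observation x
- **Diffusion models:** denoising score matching works with the conditional densities p_t(x | x_0) of the noised sample given the clean one, which are formally the conditional measures from disintegrating the joint law of (x_0, x_t) along x_0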
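For a discrete transport plan, disintegration over the first coordinate is row normalization. The following is a minimal sketch under assumed data: the 3×2 plan `gamma`, the supports `xs` and `ys`, and the name `barycentric` are hypothetical and chosen only to keep the arithmetic exact.

```python
from fractions import Fraction as F

# Hypothetical discrete plan: gamma[i][j] is the mass moved from xs[i] to ys[j].
xs = [0, 1, 2]
ys = [0, 2]
gamma = [[F(2, 10), F(0)],
         [F(1, 10), F(3, 10)],
         [F(0),     F(4, 10)]]

mu = [sum(row) for row in gamma]          # first marginal (row sums)
nu = [sum(col) for col in zip(*gamma)]    # second marginal (column sums)

# Disintegration over the first coordinate: gamma_x = (row of gamma) / mu(x).
# Rows with mu(x) = 0 form a mu-null set and are skipped (left undefined).
gamma_x = {i: [g / mu[i] for g in row]
           for i, row in enumerate(gamma) if mu[i] > 0}

for i, cond in gamma_x.items():
    assert sum(cond) == 1                              # probability on destinations
    assert [c * mu[i] for c in cond] == gamma[i]       # gluing recovers the plan

# Barycentric projection: mean destination of the mass that starts at each x.
barycentric = {xs[i]: sum(c * y for c, y in zip(cond, ys))
               for i, cond in gamma_x.items()}
```

The same row normalization reads as a Bayesian posterior if the rows are indexed by observations and the columns by parameter values.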
Summary
- **Disintegration theorem:** for Polish spaces X and Y, a Borel map π: X → Y, and a Borel probability measure μ on X with pushforward ν = π_*μ, there exists a ν-a.e. unique family {μ_y} of probability measures, with μ_y concentrated on the fiber π^{-1}(y) for ν-a.e. y, such that μ = ∫ μ_y dν(y)
- **Iterated integration:** ∫_X f dμ = ∫_Y (∫_{π^{-1}(y)} f dμ_y) dν(y) for every μ-integrable f - generalizes Fubini to non-product measures
- **Regular conditional distribution:** P(A|Y=y) = mu_y(A) resolves conditioning on zero-probability events
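- **Bivariate Gaussian:** for a standard bivariate normal (X,Y) with correlation ρ, the conditional law of X given Y = y is μ_y = N(ρy, 1-ρ²) in closed form, a consequence of the closure of the Gaussian family under conditioning
- **OT application:** on R with the quadratic cost and an atomless source measure μ, the optimal plan is the monotone rearrangement, so disintegrating γ over x yields point masses γ_x = δ_{σ(x)} with σ nondecreasing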
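The point-mass structure of monotone transport can be checked on two small empirical samples with equal weights. This is a sketch under stated assumptions: the helper names `monotone_map` and `cost`, the sample values, and the quadratic cost are all chosen for illustration, and the brute-force comparison is only practical for very small n.

```python
from itertools import permutations

def monotone_map(xs, ys):
    """Pair the i-th smallest source point with the i-th smallest target point.

    Assumes xs and ys have equal length and distinct values, each point
    carrying mass 1/n. Returns a dict sigma with sigma[x] = destination of x.
    """
    return dict(zip(sorted(xs), sorted(ys)))

def cost(xs, targets):
    """Average squared displacement when xs[i] is sent to targets[i]."""
    return sum((x - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

# Hypothetical samples.
xs = [0.3, -1.0, 2.5, 1.1]
ys = [4.0, 0.0, 1.5, -2.0]

sigma = monotone_map(xs, ys)
monotone_cost = cost(xs, [sigma[x] for x in xs])

# Brute-force check over all bijections between the two samples.
best = min(cost(xs, perm) for perm in permutations(ys))
assert abs(monotone_cost - best) < 1e-12
```

Each source point x has exactly one destination σ(x), so the conditional plan over x is the Dirac mass at σ(x), and σ is nondecreasing by construction.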