Measure Theory

Disintegration of Measures

Elementary probability defines P(A|B) only when P(B) > 0. The Disintegration Theorem removes this restriction: it decomposes a measure along the fibers of a measurable map into a family of conditional measures, giving rigorous meaning to conditional distributions even when P(B) = 0.

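**AlphaGo and conditional probabilities:** DeepMind trained AlphaGo's supervised policy network on about 30 million positions from human games (KGS Go Server); every move-selection step uses P(move | position). Since positions form a finite set, this is the elementary (discrete) special case of the conditional measure mu_y delivered by disintegration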
**Spotify Bayesian models:** in 2023 Spotify processed 100 billion streams using Bayesian models built on P(genre | listening history) - formally a disintegration of the joint measure along the observation projection
**Optimal transport:** a transport plan γ on X×Y disintegrates over X, giving for each x the distribution γ_x of destinations; this is how transport maps are read off from plans, for example via the barycentric projection x ↦ ∫ y dγ_x(y)
**Diffusion models (DDPM, score matching):** the score function ∇ log p_t(x) is the gradient of the log marginal density p_t(x) = ∫ p_t(x | x_0) p_0(dx_0), obtained by integrating the conditional (noising) kernels p_t(· | x_0) against the data distribution p_0 at a fixed noise level t
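A minimal numerical sketch of the marginal-as-mixture formula p_t(x) = ∫ p_t(x | x_0) p_0(dx_0), assuming a DDPM-style kernel p_t(x | x_0) = N(√ᾱ_t x_0, 1 - ᾱ_t) and a data distribution p_0 equal to the empirical measure of a few 1-D points. The function names, the dataset, and the value of `alpha_bar` are illustrative assumptions, not any library's API. The marginal score comes out as a posterior-weighted average of the conditional scores ∇ log p_t(x | x_0), which can be checked against a finite difference of log p_t.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def marginal_density(x, data, alpha_bar):
    """p_t(x): the kernel p_t(x | x0) integrated against the empirical measure of `data`.

    Assumes the DDPM-style kernel p_t(x | x0) = N(sqrt(alpha_bar) * x0, 1 - alpha_bar).
    """
    scale, var = math.sqrt(alpha_bar), 1.0 - alpha_bar
    return sum(gaussian_pdf(x, scale * x0, var) for x0 in data) / len(data)


def marginal_score(x, data, alpha_bar):
    """grad_x log p_t(x) as a posterior-weighted average of conditional scores.

    Conditional score: grad_x log p_t(x | x0) = -(x - sqrt(alpha_bar) * x0) / (1 - alpha_bar).
    Weights: posterior of x0 given x_t = x, proportional to p_t(x | x0).
    """
    scale, var = math.sqrt(alpha_bar), 1.0 - alpha_bar
    weights = [gaussian_pdf(x, scale * x0, var) for x0 in data]
    total = sum(weights)
    return sum(w * (-(x - scale * x0) / var) for w, x0 in zip(weights, data)) / total


if __name__ == "__main__":
    data = [-1.0, 0.5, 2.0]  # hypothetical 1-D "dataset"
    alpha_bar = 0.6          # hypothetical noise level
    x, h = 0.3, 1e-5
    fd = (math.log(marginal_density(x + h, data, alpha_bar))
          - math.log(marginal_density(x - h, data, alpha_bar))) / (2 * h)
    print(fd, marginal_score(x, data, alpha_bar))  # the two numbers should agree closely
```

The identity used is ∇p_t / p_t = Σ_i p_t(x | x_0^i) ∇ log p_t(x | x_0^i) / Σ_i p_t(x | x_0^i), which is exactly what denoising score matching exploits: it only ever needs the Gaussian conditional scores.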

Prerequisites

Polish spaces: complete separable metric spaces (R^n, C([0,1]), l^2)
Borel sigma-algebra and measurable maps
Pushforward measure: nu = pi_* mu, nu(B) = mu(pi^{-1}(B))
Conditional expectation E[f | G] as an L^2 projection
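The pushforward and the conditional expectation can both be made concrete on a finite space. This sketch uses hypothetical helper names and stores measures as dicts mapping points to `Fraction` masses, so the identities hold exactly. It computes ν = π_*μ and E[f | σ(π)], the μ-weighted average of f over each fiber, which is the L^2(μ) projection onto functions of the form g∘π.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, pi):
    """nu = pi_* mu for a finite measure mu given as {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[pi(x)] += mass
    return dict(nu)


def measure_of(m, subset):
    """m(subset) for a finite measure m given as {point: mass}."""
    return sum((mass for point, mass in m.items() if point in subset), Fraction(0))


def conditional_expectation(f, mu, pi):
    """E[f | sigma(pi)]: on each fiber pi^{-1}(y), the mu-weighted average of f.

    Only defined at points whose fiber has positive mass.
    """
    nu = pushforward(mu, pi)
    weighted = defaultdict(Fraction)
    for x, mass in mu.items():
        weighted[pi(x)] += f(x) * mass

    def ef(x):
        y = pi(x)
        return weighted[y] / nu[y]

    return ef
```

For example, with μ uniform on {0, ..., 5} and π(x) = x mod 2, the pushforward puts mass 1/2 on each parity, and E[x² | parity] is 20/3 on the even fiber and 35/3 on the odd one. The residual f - E[f | σ(π)] is orthogonal in L^2(μ) to every g∘π, which is the projection property.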

The Disintegration Theorem

AlphaGo's supervised policy network was trained on about 30 million positions from human games, and every move-selection step relies on a conditional distribution over moves. The rigorous foundation for conditioning in general is the Disintegration Theorem: a probability measure on a Polish space can be decomposed along the fibers of a measurable map, yielding a family of probability measures, one per fiber.

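The disintegration of μ along π: X → Y produces a family {μ_y}. What does μ_y(π^{-1}(y)) = 1 mean? It says that μ_y is concentrated on the fiber over y: for ν-almost every y, the conditional measure puts all of its mass on the points that π sends to y.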
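On a finite space the disintegration can be written down directly, which makes the fiber condition concrete: μ_y({x}) = μ({x}) / ν({y}) for every x in π^{-1}(y). This is a sketch with hypothetical function names, using `Fraction` masses so that μ_y(π^{-1}(y)) = 1, the reconstruction μ = Σ_y ν(y) μ_y, and the iterated-integral formula hold exactly. The theorem's real content is the uncountable case, where ν({y}) = 0 and this division is unavailable.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(mu, pi):
    """Disintegrate a finite measure mu = {x: mass} along the map pi.

    Returns (nu, kernels): nu = pi_* mu, and kernels[y] = mu_y is a
    probability measure supported on the fiber pi^{-1}(y). Fibers with
    nu(y) == 0 are skipped, since mu_y may be chosen arbitrarily there.
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[pi(x)] += mass
    kernels = defaultdict(dict)
    for x, mass in mu.items():
        y = pi(x)
        if nu[y] > 0:
            kernels[y][x] = mass / nu[y]
    return dict(nu), dict(kernels)


def iterated_integral(f, nu, kernels):
    """Discrete form of the integral over Y of (integral of f d mu_y) d nu(y)."""
    return sum(nu[y] * sum(f(x) * p for x, p in k.items()) for y, k in kernels.items())
```

For instance, take μ on the grid {0, 1, 2} × {0, 1} with μ({(i, j)}) = (i + j + 1)/15 and π the first coordinate: each kernel μ_i lives on the column {i} × {0, 1}, and ∫ f dμ equals `iterated_integral(f, nu, kernels)` for every f.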

Conditional Measures and Optimal Transport

Large-scale Bayesian recommendation models, such as Spotify's, rest on explicit conditional probability structures. When the conditioning variable is continuous, every single event {Y = y} has probability zero, so the elementary formula P(A|B) = P(A∩B)/P(B) breaks down; the disintegration theorem fills exactly this gap.

For a bivariate normal (X,Y) with standard normal marginals and correlation rho (|rho| < 1), why is the conditional law of X given Y = y equal to mu_y = N(rho*y, 1-rho^2)?
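One way to answer, sketched under the assumption that both marginals are standard normal and |ρ| < 1: factor the joint density into a density in x that depends on y, times the marginal density of Y. Here μ is the joint law on R², π(x, y) = y, ν = π_*μ, and each fiber R × {y} is identified with R.

```latex
% Assumptions: E[X] = E[Y] = 0, Var X = Var Y = 1, Corr(X, Y) = \rho, |\rho| < 1.
\begin{align*}
p(x, y) &= \frac{1}{2\pi\sqrt{1-\rho^2}}
  \exp\!\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right) \\
x^2 - 2\rho x y + y^2 &= (x - \rho y)^2 + (1-\rho^2)\,y^2 \\
\Rightarrow\; p(x, y) &=
  \underbrace{\frac{1}{\sqrt{2\pi(1-\rho^2)}}
  \exp\!\left(-\frac{(x - \rho y)^2}{2(1-\rho^2)}\right)}_{\text{density of } N(\rho y,\,1-\rho^2) \text{ in } x}
  \cdot
  \underbrace{\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}}_{\text{density of } \nu = N(0,1)} \\
\mu(A \times B) &= \int_B \mu_y(A)\, d\nu(y),
  \qquad \mu_y = N(\rho y,\, 1-\rho^2)
\end{align*}
```

By the ν-a.e. uniqueness in the disintegration theorem, this factorization identifies μ_y. The equivalent probabilistic argument writes X = ρY + √(1-ρ²) Z with Z ~ N(0,1) independent of Y, so fixing Y = y leaves ρy + √(1-ρ²) Z.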

Disintegration across measure theory and related fields

The Disintegration Theorem bridges abstract measure theory and applied conditional distributions. It unifies the iterated-integral formula, the Fubini-Tonelli theorem, and regular conditional distributions under one roof.

**Optimal transport:** disintegrating a plan γ ∈ Π(μ,ν) over the first coordinate gives the conditional plans γ_x, from which transport maps can be extracted in numerical OT (e.g. by barycentric projection)
**Fubini-Tonelli theorem:** Fubini is the product-measure special case: for μ = α ⊗ β on Y × Z with π the projection onto Y, μ_y = δ_y ⊗ β, so every conditional measure is a copy of the second factor β
**Bayesian inference:** the posterior P(θ|x) is the disintegration of the joint law P(θ,x) along the projection (θ,x) ↦ x onto the observation
**Diffusion models:** denoising score matching trains on the conditional densities p_t(x_t | x_0), which are the conditional measures obtained by disintegrating the joint law of (x_0, x_t) over x_0
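For finitely supported μ and ν, a plan is a nonnegative matrix γ[i][j] with row sums μ and column sums ν, and disintegrating over the first coordinate is just row normalization. The sketch below (hypothetical function names; `Fraction` masses for exact arithmetic) computes the conditional plans γ_x and the barycentric projection x ↦ ∫ y dγ_x(y), and illustrates the Fubini case: for the independent coupling μ ⊗ ν, every row normalizes to ν.

```python
from fractions import Fraction


def conditional_plans(gamma):
    """Disintegrate a discrete plan over its first coordinate.

    gamma[i][j] is the mass sent from source point i to target point j.
    Returns (mu, rows): mu[i] = sum_j gamma[i][j] is the first marginal and
    rows[i] = gamma_{x_i}, the row divided by its sum (None when mu[i] == 0,
    where the conditional plan may be chosen arbitrarily).
    """
    mu = [sum(row) for row in gamma]
    rows = [[g / m for g in row] if m > 0 else None for row, m in zip(gamma, mu)]
    return mu, rows


def barycentric_projection(gamma, targets):
    """Mean destination of each source point: x_i -> sum_j gamma_{x_i}(j) * targets[j]."""
    _, rows = conditional_plans(gamma)
    return [None if row is None else sum(p * t for p, t in zip(row, targets))
            for row in rows]


def product_plan(a, b):
    """The independent coupling a (x) b of two discrete marginals."""
    return [[ai * bj for bj in b] for ai in a]
```

With `a = [1/4, 3/4]` and `b = [1/2, 1/3, 1/6]` (as `Fraction`s), `conditional_plans(product_plan(a, b))` returns the first marginal `a` and rows all equal to `b`, which is the discrete shadow of Fubini. A non-product plan gives rows that vary with x.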

Summary

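**Disintegration theorem:** for Polish spaces X and Y, a Borel map π: X → Y, and a Borel probability measure μ on X with ν = π_*μ, there exists a ν-a.e. unique family {μ_y} of Borel probability measures on X, with y ↦ μ_y(A) Borel for every Borel A, concentrated on the fibers (μ_y(π^{-1}(y)) = 1 for ν-a.e. y) and such that μ = ∫_Y μ_y dν(y), i.e. μ(A) = ∫_Y μ_y(A) dν(y)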
**Iterated integration:** for every Borel f: X → [0, ∞] (or f ∈ L^1(μ)), ∫_X f dμ = ∫_Y (∫_{π^{-1}(y)} f dμ_y) dν(y); this generalizes Fubini to non-product measures
**Regular conditional distribution:** P(A | π = y) := μ_y(A) defines conditioning on the event {π = y} even when it has probability zero; the definition is determined only up to a ν-null set of y
**Bivariate Gaussian:** for standard normal marginals with correlation ρ, μ_y = N(ρy, 1-ρ²) is the conditional measure in closed form, a consequence of the closure of the Gaussian family under conditioning
**OT application:** on R with quadratic cost and an atomless source μ, the optimal plan γ is induced by the monotone map σ = F_ν^{-1} ∘ F_μ, so disintegrating γ over x yields point masses γ_x = δ_{σ(x)} for μ-a.e. x
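A discrete analogue of monotone transport on R, under stated assumptions: μ and ν are uniform empirical measures on n points each, the source points are distinct, and the cost is quadratic. Here μ has atoms, so this is not the atomless setting, but an optimal plan can still be taken to be a permutation, and pairing the k-th smallest source with the k-th smallest target is optimal by the rearrangement inequality. Each γ_x is then a point mass δ_{σ(x)}. The helper names are hypothetical; the brute-force search is only for checking small cases.

```python
import math
from itertools import permutations


def monotone_plan(xs, ys):
    """Pair the k-th smallest source point with the k-th smallest target point.

    Assumes len(xs) == len(ys), uniform weights 1/n, and distinct source points.
    Returns sigma as a dict x -> sigma(x), so each gamma_x is the point mass at sigma(x).
    """
    if len(xs) != len(ys):
        raise ValueError("need the same number of source and target points")
    return dict(zip(sorted(xs), sorted(ys)))


def quadratic_cost(sigma):
    """(1/n) * sum of (x - sigma(x))^2 for the plan induced by the map sigma."""
    return sum((x - y) ** 2 for x, y in sigma.items()) / len(sigma)


def wasserstein2(xs, ys):
    """W_2 between the two uniform empirical measures, via the monotone plan."""
    return math.sqrt(quadratic_cost(monotone_plan(xs, ys)))


def brute_force_cost(xs, ys):
    """Minimum quadratic cost over all permutations (exponential: small n only)."""
    return min(sum((x - y) ** 2 for x, y in zip(xs, perm)) / len(xs)
               for perm in permutations(ys))
```

Sorting costs O(n log n), whereas the brute-force search is O(n!·n); on small inputs the two costs agree, which is a quick sanity check of the monotone rule.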