Probability Theory
Infinite-Dimensional Probability
Lesson objectives
- Understand Gaussian measures on Banach spaces through one-dimensional projections
- Master Minlos's theorem and the role of nuclearity of the covariance operator
- Analyze the construction of Wiener space via Kolmogorov's extension and continuity theorems
- Connect Fernique's theorem and the Cameron-Martin theorem to modern ML
Prerequisites
- Hilbert and Banach spaces
- Weak convergence of probability measures
- Gaussian distributions and covariance matrices
- Compact and nuclear operators
How does one build a probability measure on the space of all continuous functions, and why do DALL-E and Stable Diffusion need it?
- **Diffusion models:** DALL-E and Stable Diffusion generate images by reversing a Gaussian diffusion process in image or latent space
- **Quantum field theory:** Feynman path integrals are formal integrals over spaces of particle trajectories; their Euclidean (imaginary-time) versions are genuine measures on path space, such as Wiener measure
- **Neural Network Gaussian Process:** an infinitely wide network at random initialization converges to a Gaussian process with the NNGP kernel; the related NTK describes its training by gradient descent
- **Gaussian Process Regression:** a measure on function space for Bayesian ML - the standard in Bayesian optimization
Gaussian Measures on Banach Spaces
In finite dimensions a standard Gaussian is defined by its density with respect to Lebesgue measure, proportional to exp(-|x|^2/2). In infinite dimensions there is no Lebesgue measure and hence no such density; a Gaussian measure is instead characterized by requiring every one-dimensional projection (its image under each continuous linear functional) to be Gaussian. This is the mathematical foundation of the Neural Network Gaussian Process, the limiting model of infinitely wide neural networks.
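The projection characterization can be written out as a worked definition. The following is a sketch in standard notation, all of which is assumed here rather than taken from the lesson: E is a separable Banach space, E^* its dual, gamma a Borel probability measure on E, and q its covariance form. Only the centered case is shown.

```latex
% Centered Gaussian measure on a separable Banach space E (notation assumed)
\[
\gamma \text{ is centered Gaussian}
\iff
\ell_{\#}\gamma = N\bigl(0, \sigma_\ell^2\bigr) \text{ on } \mathbb{R}
\quad \text{for every } \ell \in E^{*}, \ \sigma_\ell^2 \ge 0 .
\]

% Equivalent statement through the characteristic functional
\[
\widehat{\gamma}(\ell)
  = \int_E e^{\, i\,\ell(x)} \, \gamma(dx)
  = \exp\!\Bigl(-\tfrac12\, q(\ell,\ell)\Bigr),
\qquad
q(\ell,\ell') = \int_E \ell(x)\,\ell'(x)\,\gamma(dx).
\]
```

In R^n with gamma = N(0, C) and ell(x) = <a, x>, the form reduces to q(ell, ell) = <Ca, a>, which recovers the familiar characteristic function of a Gaussian vector.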
A standard Gaussian vector X = (X_1, X_2, ...) with iid N(0,1) coordinates has ||X||^2 = sum X_i^2 = infinity almost surely. The product measure therefore exists on R^infty but assigns measure zero to the Hilbert space l^2, so a 'standard' Gaussian measure on l^2 does not exist; one needs weighted norms or a covariance operator with finite trace.
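The divergence of the norm, and its cure by a finite-trace covariance, can be checked numerically. The sketch below is illustrative only: the helper name `squared_norm_partial_sums`, the truncation level `n`, and the choice of weights 1/i (covariance eigenvalues 1/i^2) are assumptions made for this example.

```python
import numpy as np

def squared_norm_partial_sums(weights, rng):
    """Cumulative sums of (w_i * xi_i)^2 for iid standard normal xi_i."""
    xi = rng.standard_normal(weights.shape[0])
    return np.cumsum((weights * xi) ** 2)

rng = np.random.default_rng(0)
n = 100_000                      # truncation level (assumed for illustration)
idx = np.arange(1, n + 1)

# Identity covariance: the squared norm grows like n (law of large numbers).
unweighted = squared_norm_partial_sums(np.ones(n), rng)

# Weights 1/i give covariance eigenvalues 1/i^2, whose trace pi^2/6 is finite,
# so the partial sums converge almost surely.
weighted = squared_norm_partial_sums(1.0 / idx, rng)

print("unweighted ||X||^2 / n:", unweighted[-1] / n)
print("weighted   ||X||^2    :", weighted[-1])
print("weighted tail beyond i = 1000:", weighted[-1] - weighted[999])
```

The first ratio stays near 1, so the unweighted squared norm itself grows without bound, while the weighted sum stabilizes after the first few hundred coordinates.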
Why does Minlos's theorem require nuclearity of the covariance operator?
Wiener Space
Norbert Wiener in 1923 constructed a probability measure on the space C([0,1]) of continuous functions, turning Brownian motion into a rigorous mathematical object. Today path-space measures of this kind underlie diffusion generative models: the noising process behind DALL-E 3 and Stable Diffusion can be viewed as a measure on C([0,1]; R^d), the space of continuous paths in the image (or latent) space R^d.
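Karhunen-Loeve construction of Wiener measure: W_t = sqrt(2) * sum_{n>=1} xi_n * sin((n - 1/2)*pi*t) / ((n - 1/2)*pi), where xi_n are iid N(0,1). This is the expansion in eigenfunctions of the covariance operator with kernel min(s,t), a standard tool in infinite-dimensional probability. (Replacing n - 1/2 by n gives the Brownian bridge rather than Brownian motion.)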
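A truncated Karhunen-Loeve sum is easy to simulate. The sketch below uses the standard eigenpairs of the Brownian covariance min(s,t) on [0,1], namely e_n(t) = sqrt(2) sin((n - 1/2) pi t) with eigenvalue ((n - 1/2) pi)^(-2). The function names and truncation levels are hypothetical choices for this example.

```python
import numpy as np

def kl_brownian_paths(n_terms, t, n_paths, rng):
    """Sample truncated Karhunen-Loeve approximations of Brownian motion on [0, 1].

    Returns an array of shape (len(t), n_paths).
    """
    n = np.arange(1, n_terms + 1)
    freq = (n - 0.5) * np.pi                                  # 1 / sqrt(eigenvalue)
    basis = np.sqrt(2.0) * np.sin(np.outer(t, freq)) / freq   # (len(t), n_terms)
    xi = rng.standard_normal((n_terms, n_paths))              # iid N(0, 1) coefficients
    return basis @ xi

def kl_covariance(n_terms, s, t):
    """Truncated Mercer sum sum_n lambda_n e_n(s) e_n(t); tends to min(s, t)."""
    n = np.arange(1, n_terms + 1)
    freq = (n - 0.5) * np.pi
    return np.sum(2.0 * np.sin(freq * s) * np.sin(freq * t) / freq**2)

rng = np.random.default_rng(1)
t = np.array([0.0, 0.5, 1.0])
paths = kl_brownian_paths(200, t, 5000, rng)

print("Mercer sum at (0.3, 0.7):", kl_covariance(2000, 0.3, 0.7))  # near min = 0.3
print("sample variance at t = 1:", paths[2].var())                 # near 1
```

The deterministic Mercer sum checks the eigen-expansion itself, while the sample variance checks that the simulated paths have Var(W_1) close to 1, up to truncation and Monte Carlo error.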
What is the Holder exponent of a Brownian path almost surely?
Gaussian Concentration and Cameron-Martin
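A remarkable property of Gaussian measures on Banach spaces is that the norm has Gaussian tails and concentrates strongly. Fernique's theorem states that E exp(alpha ||X||^2) is finite for some alpha > 0, so P(||X|| > r) decays like exp(-alpha r^2); the Borell-TIS inequality sharpens this to concentration of ||X|| around its mean, the analog of classical Gaussian concentration in R^n. Bounds of this type are among the tools used in generalization analysis of infinitely wide neural networks in NTK theory.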
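For reference, the two results named in this section's title can be stated as formulas. This is a sketch in standard notation, all assumed here: gamma is a centered Gaussian measure on a separable Banach space E, H is its Cameron-Martin space, gamma_h(A) = gamma(A - h) is the shifted measure, and \widehat{h} is the measurable linear functional associated with h.

```latex
% Fernique: square-exponential integrability of the norm, hence Gaussian tails
\[
\exists\, \alpha > 0 : \quad
C := \int_E e^{\alpha \|x\|^2} \, \gamma(dx) < \infty ,
\qquad\text{hence}\qquad
\gamma\bigl(\|x\| > r\bigr) \le C\, e^{-\alpha r^2} \quad (r \ge 0).
\]

% Cameron-Martin: density of the shifted measure for h in H
\[
\frac{d\gamma_h}{d\gamma}(x)
  = \exp\!\Bigl(\widehat{h}(x) - \tfrac12 \|h\|_H^2\Bigr),
\qquad h \in H .
\]
```

The tail bound follows from the integrability statement by Markov's inequality. In one dimension with gamma = N(0,1), the Cameron-Martin density reduces to exp(hx - h^2/2), the ratio of the N(h,1) and N(0,1) densities.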
Infinite-dimensional probability underlies modern ML
Probability measures on function spaces unite random processes, functional analysis, and Bayesian methods.
- Wiener space - Wiener measure on C([0,1]) is the canonical probability space for Brownian motion, built via Kolmogorov's extension and continuity theorems
- Malliavin calculus - The Malliavin derivative differentiates functionals on Wiener space along directions in the Cameron-Martin space, the 'smooth' directions of the measure
- Neural Tangent Kernel - An infinitely wide network at initialization is a Gaussian process (with the NNGP kernel); in the NTK regime, gradient-descent training acts like kernel regression with the NTK, which can be viewed as shifting the Gaussian measure by an element of the kernel's Hilbert space H
- Diffusion models - Stable Diffusion and DALL-E work with measures on path spaces of denoising trajectories
Summary
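- **Cylindrical measures:** defined consistently on finite-dimensional projections; Kolmogorov's theorem extends them to R^infty, while countable additivity on a Hilbert or Banach space needs extra conditions
- **Minlos's theorem:** on a separable Hilbert space, the characteristic functional exp(-<Cx, x>/2) defines a Gaussian measure exactly when C is nuclear (trace class)
- **No Lebesgue measure:** no nonzero sigma-finite translation-invariant Borel measure exists on an infinite-dimensional separable Banach space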
- **Cameron-Martin:** the shift gamma_h of gamma by h is absolutely continuous with respect to gamma iff h lies in the Cameron-Martin space H
- **Dichotomy:** for h outside H, gamma_h and gamma are mutually singular
- **Fernique:** Gaussian (square-exponential) tails for the norm; Gaussian isoperimetry due to Borell and Sudakov-Tsirelson
- **Applications:** Wiener space, NTK, diffusion models, Gaussian processes
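The Cameron-Martin criterion can be made concrete in the simplest diagonal case. The sketch below assumes gamma is the product of N(0, lam_i) with lam_i = 1/i^2, for which the Cameron-Martin norm is ||h||_H^2 = sum h_i^2 / lam_i; the function name and the two example shifts are hypothetical choices for illustration.

```python
import numpy as np

def cameron_martin_sq_norm_partial_sums(h, lam):
    """Partial sums of ||h||_H^2 = sum_i h_i^2 / lam_i for gamma = prod_i N(0, lam_i)."""
    return np.cumsum(h**2 / lam)

n = 100_000                          # truncation level (assumed for illustration)
i = np.arange(1, n + 1, dtype=float)
lam = 1.0 / i**2                     # eigenvalues of a trace-class covariance

h_inside = 1.0 / i**2                # h_i^2 / lam_i = 1/i^2, summable: h is in H
h_outside = 1.0 / i**1.5             # h is in l^2, but h_i^2 / lam_i = 1/i diverges

inside = cameron_martin_sq_norm_partial_sums(h_inside, lam)
outside = cameron_martin_sq_norm_partial_sums(h_outside, lam)

print("h_inside : partial sum at n =", inside[-1])    # converges to pi^2 / 6
print("h_outside: partial sum at n =", outside[-1])   # grows like log n
```

The second shift is a perfectly good element of l^2, yet its Cameron-Martin norm is infinite, so by the dichotomy the shifted measure is singular with respect to gamma.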
When is a shift of a Gaussian measure gamma by vector h absolutely continuous with respect to gamma?