Probability Theory
Exchangeability and de Finetti's Theorem
Цели урока
- Understand exchangeability as a weak generalization of independence, invariant to permutations
- Master de Finetti's theorem via the martingale limit of frequencies
- Connect exchangeability to Bayesian statistics and the existence of a prior
- Analyze the Chinese Restaurant Process and Dirichlet Process
Предварительные знания
- Martingales and the martingale convergence theorem
- Conditional expectation and conditional independence
- Measure theory and spaces of probability measures
- Law of large numbers
If the order of patients in a clinical trial does not matter but the outcomes are not independent, what is the structure of their joint distribution?
- **Bayesian statistics:** de Finetti's theorem mathematically justifies the Bayesian approach - the hidden parameter theta emerges as a consequence of exchangeability
- **Permutation-invariant neural networks:** Set Transformer and DeepSets explicitly build permutation-invariant functions - exchangeability baked into the architecture
- **Knowledge representation:** exchangeable partitions in clustering (Chinese Restaurant Process, Indian Buffet Process) in nonparametric Bayesian statistics
- **Finance:** exchangeable default models in CDOs; joint loss distribution over a mortgage portfolio is modeled via exchangeability
Definition of Exchangeability
A researcher tests a drug on 100 patients. A natural assumption: the order of patients does not matter, but outcomes are not independent (they are linked through unknown drug efficacy). This is exchangeability - symmetry weaker than independence. Today exchangeability is built into Transformers through permutation-invariant attention on sets of tokens.
In neural networks, permutation-invariant attention (Set Transformer, DeepSets) explicitly uses exchangeability: the output does not depend on input token order. Architectures for point clouds (PointNet) are built on this invariance.
How does exchangeability differ from independence and identical distribution (iid)?
The de Finetti Theorem
In 1931 Bruno de Finetti proved a theorem that overturned the foundations of probability. He was a finitist and subjectivist who saw probability as a degree of belief. Yet his theorem shows: if observations are exchangeable, an objective parameter theta exists mathematically as the limit of frequencies, and the prior pi is the unique way to describe them. This is the mathematical justification of the Bayesian approach.
de Finetti's theorem requires INFINITE sequences. For finite exchangeable sequences the decomposition need not exist. Counterexample: the uniform distribution on permutations of (1,0,0),(0,1,0),(0,0,1) is exchangeable but not a mixture of iid.
What does de Finetti's theorem guarantee for an infinite exchangeable sequence?
Applications: Bayesian Statistics and Nonparametrics
Modern nonparametric Bayesian methods grow out of de Finetti's theorem. Chinese Restaurant Process (CRP) and Dirichlet Process - infinite-dimensional generalizations that allow modeling clusterings with unknown number of classes. At Stitch Fix and Spotify such models are used for user segmentation: the number of segments is determined automatically by the data rather than set in advance.
Exchangeability connects probability, statistics, and philosophy
de Finetti's theorem bridges subjective and frequentist views of probability via the algebraic structure of exchangeability.
- Bayesian statistics — The prior pi is the unique way to describe exchangeable observations; the posterior is the conditional of theta given X
- Nonparametric Bayes — Chinese Restaurant Process (CRP) and Dirichlet Process are infinite-dimensional extensions of de Finetti for random partitions
- Martingales — Frequencies form a martingale converging to theta - this is the martingale proof of de Finetti's theorem
- Infinite-dimensional probability — The measure pi on the parameter space is a probability measure on an infinite-dimensional space of measures
Итоги
- **Exchangeability:** (X_1,...,X_n) =^d (X_{pi(1)},...,X_{pi(n)}) for all permutations pi
- **iid vs exchangeability:** iid implies exchangeability; the converse fails (Polya urn)
- **de Finetti's theorem:** an infinite exchangeable sequence is a mixture of iid with prior pi
- **Parameter theta** = limit of frequencies - emerges as a limit, not a given
- **Bayesian inference:** the prior pi is unique; updating is the conditional theta | X_1, ..., X_n
- **Nonparametrics:** CRP and Dirichlet Process generalize de Finetti to nonparametric models
- **Finite exchangeability:** does not imply the de Finetti decomposition - infinite sequences are needed
Why is the Dirichlet Process useful in nonparametric Bayesian statistics?