Measure Theory
Sigma-Algebras and Measurable Sets
The probability that a number drawn uniformly at random from $[0,1]$ is rational equals zero - even though there are infinitely many rationals. Riemann's integral cannot even produce this answer: the indicator function of the rationals on $[0,1]$ (Dirichlet's function) is not Riemann integrable, so the Riemann integral cannot compute the length of the set of rationals. Lebesgue's integral (1902) can - and the answer is zero. Countably many points, measure zero. This is measure theory: the language in which modern probability, the expected loss of a neural network, and Gaussian processes are all written. Without sigma-algebras, trying to assign a probability to every subset leads to contradictions.
- **Neural networks and expectation:** the expected loss of a neural network is $\mathbb{E}[L(\theta)] = \int L(\theta, x) \, d\mu(x)$ - a Lebesgue integral with respect to the data distribution $\mu$. It is not a Riemann integral, because $\mu$ need not have a density: it can be discrete (an empirical sample) or concentrated on a low-dimensional subset.
- **Kolmogorov's probability:** the σ-algebra of events specifies which questions about a random experiment can be assigned a probability. Gaussian processes in ML are probability measures on infinite-dimensional function spaces - measure theory all the way down.
- **Financial mathematics:** filtrations (increasing families of σ-algebras) model the arrival of information in the market - what a trader knows at time $t$. Diffusion models (DDPM, Stable Diffusion) rest on the same foundation: in the continuous-time limit they can be formulated as stochastic differential equations, whose theory is built on filtrations.
- **Quantum mechanics:** operator algebras (von Neumann algebras, often described as noncommutative measure theory) describe observables - quantities that can physically be measured. The probabilities of a measurement's outcomes form a probability measure on a σ-algebra of possible outcomes.
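For a discrete measure $\mu = \sum_i w_i \delta_{x_i}$, the integral $\int L(\theta, x)\, d\mu(x)$ reduces to the weighted sum $\sum_i w_i L(\theta, x_i)$; with the empirical measure (weight $1/n$ on each of $n$ samples) it is exactly the familiar average training loss. The sketch below assumes a hypothetical scalar parameter `theta` and a squared loss; `integrate_discrete` and `squared_loss` are illustrative names, not a library API.

```python
from fractions import Fraction

def integrate_discrete(f, atoms):
    """Integral of f against a discrete measure mu = sum_i w_i * delta_{x_i}.

    atoms: list of (x_i, w_i) pairs with w_i >= 0.
    For such a measure the Lebesgue integral is sum_i w_i * f(x_i).
    """
    return sum(w * f(x) for x, w in atoms)

def squared_loss(theta, x):
    # Hypothetical per-example loss for a scalar "model" theta.
    return (theta - x) ** 2

data = [Fraction(1), Fraction(2), Fraction(4)]
# Empirical measure: weight 1/n on each observed data point.
empirical = [(x, Fraction(1, len(data))) for x in data]

theta = Fraction(2)
expected_loss = integrate_discrete(lambda x: squared_loss(theta, x), empirical)
print(expected_loss)  # 5/3
```

Exact fractions are used only so the result is easy to verify by hand: the per-example losses are 1, 0 and 4, and their average is 5/3.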
Sigma Algebra
The goal: assign a 'length' to subsets of the real line, so that the length of $[a,b]$ is $b-a$, length does not change under translation, and there are no contradictions. The naive idea: measure every subset. Vitali showed in 1905 that this is impossible - there are subsets of $\mathbb{R}$ for which any translation-invariant, countably additive assignment of length leads to a contradiction. A sigma-algebra is the answer: a collection of subsets on which measurement is internally consistent.
**Sigma-algebra** (σ-algebra) F over a set X is a family of subsets of X satisfying three axioms:
1. **Empty set:** ∅ ∈ F
2. **Closure under complements:** if A ∈ F, then X \ A ∈ F
3. **Closure under countable unions:** if A₁, A₂, A₃, ... ∈ F, then ∪ᵢ Aᵢ ∈ F
From the three axioms, more follows immediately: $X \in F$ (since $X = X \setminus \emptyset$, $\emptyset \in F$, and $F$ is closed under complements). $F$ is also closed under **countable intersections**, by De Morgan: $\bigcap_i A_i = \left(\bigcup_i A_i^c\right)^c$, and under differences, since $A \setminus B = A \cap B^c$. Three axioms are enough to derive closure under all the usual countable set operations.
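On a finite set the axioms can be checked mechanically. A minimal sketch, assuming X is finite: then F is a finite family, so any countable union of its members is really a finite union, and closure under pairwise unions suffices. `is_sigma_algebra` is a hypothetical helper written for this illustration.

```python
from itertools import combinations

def is_sigma_algebra(X, F):
    """Check the three sigma-algebra axioms for a family F of subsets of a finite set X.

    On a finite X the family F is finite, so countable unions of its members
    reduce to finite unions, and closure under pairwise unions is enough.
    """
    X = frozenset(X)
    F = {frozenset(A) for A in F}
    if any(not A <= X for A in F):
        return False                      # members must be subsets of X
    if frozenset() not in F:
        return False                      # axiom 1: empty set
    if any(X - A not in F for A in F):
        return False                      # axiom 2: complements
    if any(A | B not in F for A, B in combinations(F, 2)):
        return False                      # axiom 3 (finite case): unions
    return True

X = {1, 2, 3}
F_good = [set(), {1}, {2, 3}, {1, 2, 3}]
F_bad = [set(), {1}, {2}, {1, 2, 3}]     # {2, 3} = X \ {1} is missing

print(is_sigma_algebra(X, F_good))  # True
print(is_sigma_algebra(X, F_bad))   # False
```

`F_good` also illustrates the derived properties: it contains X itself, and the intersection of any two of its members is again a member.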
Émile Borel and the Birth of the Theory
At the turn of the 20th century, Émile Borel and Henri Lebesgue faced a problem: the Riemann integral could not handle many important functions. Borel introduced the concept of 'measurable sets' in 1898, and Lebesgue built a complete theory of measure and integration in 1902. Sigma-algebras became the formal foundation on which all of modern probability theory and integration rest.
Why **countable** unions specifically? Every subset of $\mathbb{R}$ is the union of its points, and singletons are harmless (each has length zero). So if a σ-algebra containing all singletons were closed under arbitrary (uncountable) unions, it would contain every subset - including the Vitali set. Countability is exactly the boundary that preserves both flexibility and consistency.
Measurable
Given a sigma-algebra $F$, the definition is immediate: a set $A$ is **measurable** if $A \in F$. The pair $(X, F)$ is a **measurable space**. Add a measure $\mu: F \to [0, \infty]$ (with $\mu(\emptyset) = 0$ and countable additivity on disjoint sets) and you get a **measure space** $(X, F, \mu)$. Kolmogorov's probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is exactly a measure space with $\mathbb{P}(\Omega) = 1$.
**Measurable space** - a pair (X, F), where X is a set and F is a σ-algebra over X. Elements of F are called **measurable sets**. If a measure μ: F → [0, ∞] is also given, the triple (X, F, μ) is called a **measure space**.
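A minimal sketch of a finite probability space, assuming a fair six-sided die: $\Omega = \{1, \ldots, 6\}$, $F$ is the power set (all 64 subsets), and $\mathbb{P}$ sums point masses of $1/6$. On a finite space countable additivity reduces to finite additivity, which the loop checks for every pair of disjoint events. The helper names `power_set` and `P` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

Omega = frozenset(range(1, 7))          # outcomes of one die roll

def power_set(S):
    """All subsets of S: the largest sigma-algebra on a finite set."""
    items = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

F = power_set(Omega)
weights = {w: Fraction(1, 6) for w in Omega}   # fair die (an assumption)

def P(A):
    """Probability measure defined by summing point masses."""
    return sum((weights[w] for w in A), Fraction(0))

assert P(frozenset()) == 0 and P(Omega) == 1
# Additivity on every pair of disjoint events; on a finite space
# countable additivity reduces to this finite case.
for A in F:
    for B in F:
        if not (A & B):
            assert P(A | B) == P(A) + P(B)

evens = frozenset({2, 4, 6})
print(len(F), P(evens))   # 64 1/2
```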
Key point: **not all subsets** of $X$ need to be measurable. This is not a limitation - it is a necessity. Giuseppe Vitali proved in 1905 that if a length is assigned to every subset of $\mathbb{R}$ in a translation-invariant, σ-additive way, with $[0,1]$ getting length 1, an irresolvable contradiction arises.
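**The Vitali set** is constructed as follows: partition [0, 1] into equivalence classes by x ~ y ⟺ x - y ∈ ℚ. By the axiom of choice, pick one representative from each class. The resulting set V cannot be consistently given a length. Its rational translates V + q (q ∈ ℚ) are pairwise disjoint, and there are countably many of them. If λ(V) = 0, then, since these translates cover ℝ, λ(ℝ) = 0 (contradiction); if λ(V) > 0, then the infinitely many translates with q ∈ ℚ ∩ [0, 1] all lie in [0, 2], so λ([0,2]) = ∞ (contradiction too).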
| Property | Measurable set | Non-measurable set |
|---|---|---|
| Belongs to F? | Yes | No |
| Can be assigned a measure? | Yes | No (contradiction) |
| Example in ℝ | Any interval, open/closed set | Vitali set |
| Requires axiom of choice? | No | Yes (for construction) |
In practice, non-measurable sets are exotic - constructing them requires the axiom of choice. Every set that actually arises in analysis, physics, and ML is measurable. The sigma-algebra is not a formality - it is the exact boundary beyond which the whole idea of integration falls apart.
Borel
In practice one works with a specific sigma-algebra on $\mathbb{R}$ - the **Borel** σ-algebra $B(\mathbb{R})$. This is the smallest σ-algebra containing all open sets. Borel introduced it in 1898 - four years before Lebesgue built his theory of integration on top of it. Two people, one revolution.
**Borel σ-algebra** B(ℝ) = σ(τ), where τ is the topology on ℝ (the collection of all open sets). This means: B(ℝ) is the smallest σ-algebra containing all open subsets of ℝ.
What is in $B(\mathbb{R})$? Start with open sets, apply the σ-algebra axioms: complements of open sets are closed; countable unions of closed sets are $F_\sigma$; their complements are $G_\delta$; and so on. The hierarchy goes deep - but covers every set encountered in analysis, probability, and ML.
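$B(\mathbb{R})$ is a **strict subset** of the σ-algebra of Lebesgue measurable sets. Every Borel set is Lebesgue measurable, but not the other way around: the Lebesgue σ-algebra also contains every subset of a Borel set of measure zero. The Cantor set has measure zero and $2^{\mathfrak{c}}$ subsets ($\mathfrak{c}$ being the cardinality of $\mathbb{R}$), all Lebesgue measurable, while $B(\mathbb{R})$ contains only $\mathfrak{c}$ sets, so most of those subsets are not Borel. For ML practice, this distinction is irrelevant: everything encountered is Borel.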
| Type of set | Example | Borel? | Lebesgue measurable? |
|---|---|---|---|
| Open interval | (0, 1) | Yes | Yes |
| Closed segment | [0, 1] | Yes | Yes |
| Countable set | ℚ ∩ [0,1] | Yes | Yes |
| Cantor set | C | Yes (closed) | Yes |
| Subset of C | Any subset of the Cantor set | Not necessarily | Yes (measure 0) |
| Vitali set | V | No | No |
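The table lists the Cantor set as Borel with measure zero. A short sketch of why: $C$ is the intersection of the stages of the middle-thirds construction, each a finite union of closed intervals (hence Borel), and stage $n$ has total length $(2/3)^n$, so $\lambda(C) \le (2/3)^n$ for every $n$, which forces $\lambda(C) = 0$. The code computes the stages with exact fractions; `cantor_stage` is a hypothetical helper.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals remaining after n steps of removing open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left closed third
            next_intervals.append((b - third, b))   # keep the right closed third
        intervals = next_intervals
    return intervals

for n in range(5):
    stage = cantor_stage(n)
    total = sum((b - a for a, b in stage), Fraction(0))
    print(n, len(stage), total)   # stage index, number of intervals, total length
```

Stages 0 through 4 have 1, 2, 4, 8 and 16 intervals with total lengths 1, 2/3, 4/9, 8/27 and 16/81.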
In probability theory, $B(\mathbb{R})$ is the foundation of everything. When one writes $\mathbb{P}(X \in B)$, $B$ is a Borel set; a random variable $X$ is by definition measurable, so the event $\{X \in B\} = X^{-1}(B)$ belongs to $\mathcal{F}$ and its probability is well-defined. A Gaussian process in ML is a probability measure on a function space - and Borel sigma-algebras are needed at every step.
Generating
$B(\mathbb{R})$ contains uncountably many sets - listing them all is impossible. But a σ-algebra can be specified by a small 'seed'. Such a seed is called a **generating class**: a (typically much smaller) family of sets that uniquely determines the σ-algebra.
**Generated σ-algebra** σ(C) for a family of sets C is the smallest σ-algebra containing C. It exists and is unique: the power set of X is one σ-algebra containing C, and the intersection of all σ-algebras containing C is again a σ-algebra. The family C is called a **generating class**.
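On a finite set, σ(C) can also be computed bottom-up rather than as an intersection: start from C and ∅, and keep adding complements and pairwise unions until nothing new appears. A minimal sketch, assuming X is finite; on an infinite set like ℝ this naive iteration does not terminate after finitely many rounds. `generate_sigma_algebra` is a hypothetical helper.

```python
def generate_sigma_algebra(X, C):
    """Smallest sigma-algebra on a finite set X containing every set in C.

    Start from C plus the empty set and repeatedly close under complements
    and pairwise unions. On a finite X the fixed point is reached after
    finitely many rounds; every set added is forced, so the result is sigma(C).
    """
    X = frozenset(X)
    family = {frozenset()} | {frozenset(A) for A in C}
    while True:
        new = {X - A for A in family}
        new |= {A | B for A in family for B in family}
        if new <= family:
            return family
        family |= new

sigma = generate_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(A) for A in sigma))   # [[], [1], [1, 2, 3], [2, 3]]
```

A single set {1} generates only four sets, while the two generators {1} and {2} already generate all 8 subsets of {1, 2, 3}.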
Key result: $B(\mathbb{R})$ is generated by many different classes. Specifying all open sets is unnecessary - half-lines $(-\infty, a]$ suffice. This is not a coincidence: the CDF $F(a) = \mathbb{P}(X \leq a)$ is exactly the measure evaluated on this generating class.
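Why does this matter? To prove that a property holds for all sets in $B(\mathbb{R})$, it often suffices to check it on a generating class: by Dynkin's π-λ theorem, this works when the generating class is closed under finite intersections (a π-system) and the sets with the property form a λ-system. Half-lines form a π-system, since $(-\infty, a] \cap (-\infty, b] = (-\infty, \min(a,b)]$. Many proofs in probability theory use exactly this trick. The uniqueness theorem for measures is a direct consequence: two probability measures that agree on a π-system agree on the σ-algebra it generates, so a distribution on $\mathbb{R}$ is determined by its CDF.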
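The statement that the CDF is the measure evaluated on half-lines can be made concrete: since $(a, b] = (-\infty, b] \setminus (-\infty, a]$, every half-open interval gets probability $F(b) - F(a)$, and finite disjoint unions of such intervals follow by additivity. A sketch assuming a normal distribution, whose CDF is written with Python's `math.erf`; the function names are illustrative.

```python
import math

def normal_cdf(a, mu=0.0, sigma=1.0):
    """F(a) = P(X <= a) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((a - mu) / (sigma * math.sqrt(2.0))))

def prob_half_open(F, a, b):
    """P(a < X <= b) = F(b) - F(a), since (a, b] = (-inf, b] minus (-inf, a]."""
    return F(b) - F(a)

def prob_disjoint_union(F, intervals):
    """P(X in a union of disjoint (a_i, b_i]) by finite additivity."""
    return sum(prob_half_open(F, a, b) for a, b in intervals)

print(round(prob_half_open(normal_cdf, -1, 1), 4))                    # 0.6827
print(round(prob_disjoint_union(normal_cdf, [(-2, -1), (1, 2)]), 4))  # 0.2718
```

Only values of $F$ are ever used: the measure of every set built from half-lines is pinned down by the CDF.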
In higher dimensions, $B(\mathbb{R}^n)$ is generated by rectangles $(a_1,b_1) \times \ldots \times (a_n, b_n)$. This is the foundation for joint distributions of random variables - and for joint distributions over neural network weights in Bayesian ML. Sigma-algebras answer: 'what can be measured.' Generating classes answer: 'how to describe this concisely.'
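The same idea works in two dimensions with the joint CDF $F(x, y) = \mathbb{P}(X \le x, Y \le y)$. Half-open rectangles $(a_1, b_1] \times (a_2, b_2]$ also generate $B(\mathbb{R}^2)$, and their probabilities follow from $F$ by inclusion-exclusion: $F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2)$. The sketch assumes, purely for illustration, that $X$ and $Y$ are independent standard normals, so the result can be cross-checked against a product of marginal probabilities.

```python
import math

def normal_cdf(a):
    # Standard normal CDF: Phi(a) = P(X <= a).
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def joint_cdf(x, y):
    # Hypothetical joint distribution: X, Y independent standard normals,
    # so F(x, y) = P(X <= x, Y <= y) = Phi(x) * Phi(y).
    return normal_cdf(x) * normal_cdf(y)

def prob_rectangle(F, a1, b1, a2, b2):
    """P(a1 < X <= b1, a2 < Y <= b2) by inclusion-exclusion on the joint CDF."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

p = prob_rectangle(joint_cdf, -1, 1, 0, 2)
# Under independence this must equal the product of marginal interval probabilities.
q = (normal_cdf(1) - normal_cdf(-1)) * (normal_cdf(2) - normal_cdf(0))
print(abs(p - q) < 1e-12)   # True
```

The inclusion-exclusion formula itself does not depend on independence; independence is only assumed here to get a closed-form check.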
**Misconception:** All subsets of ℝ are measurable - the σ-algebra is just mathematical overhead.
**Reality:** There exist Lebesgue non-measurable subsets of ℝ. The sigma-algebra is not overhead - it is a necessary restriction for measure consistency.
Vitali (1905): if a measure on $\mathbb{R}$ is translation-invariant, σ-additive, and assigns $[0,1]$ a finite positive value, it cannot be defined on all subsets. Without a σ-algebra, $\mathbb{E}[L(\theta)]$ cannot be rigorously defined as an integral over data - the entire idea of an expected loss collapses. This is not abstraction - it is the foundation of ML.
Key Ideas
- **Sigma-algebra** F - a family of subsets that contains ∅ and is closed under complements and countable unions. Three axioms determine which sets can be measured without contradiction
- **Measurable set** - an element of the σ-algebra. Not all subsets of $\mathbb{R}$ are measurable: the Vitali set (1905) shows that measuring every subset contradicts translation invariance combined with countable additivity
- **$B(\mathbb{R})$** - the Borel σ-algebra generated by open sets. Contains all 'ordinary' sets; strictly smaller than the σ-algebra of Lebesgue measurable sets
- **Generating class** - a concise description of a σ-algebra: B(ℝ) = σ({(-∞, a] : a ∈ ℝ}). The CDF of a distribution is exactly the measure evaluated on this generating class
Related Topics
Sigma-algebras are the foundation for the following topics:
- Lebesgue Measure — A specific measure defined on the σ-algebra of Lebesgue measurable sets
- Measurable Functions — Functions compatible with σ-algebras - the preimage of a Borel set is measurable
Questions for Reflection
- Why does the definition of a σ-algebra use countable (rather than finite or arbitrary) unions?
- If the axiom of choice were false, would non-measurable sets exist?
- How is the Borel σ-algebra related to the concept of 'information' in probability theory?