Measure Theory

Sigma-Algebras and Measurable Sets

The probability that a number drawn uniformly at random from $[0,1]$ is rational equals zero, even though there are infinitely many rationals. The Riemann integral cannot even pose the question: the indicator function of the rationals is not Riemann integrable, so Riemann's theory assigns no length to the set of rationals. The Lebesgue integral, introduced in 1902, can - and the answer is zero. Infinitely many points, measure zero. This is measure theory: the language in which modern probability, the expected loss of a neural network, and Gaussian processes are all written. Without sigma-algebras, probability runs into contradictions.
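Why the answer is zero can be sketched with a covering argument. The enumeration $q_1, q_2, \ldots$ of the rationals in $[0,1]$ and the tolerance $\varepsilon > 0$ are notation introduced here for illustration; $\lambda$ denotes Lebesgue measure.

```latex
\[
\mathbb{Q} \cap [0,1] \subseteq \bigcup_{n=1}^{\infty} I_n,
\qquad
I_n = \left( q_n - \frac{\varepsilon}{2^{n+1}},\; q_n + \frac{\varepsilon}{2^{n+1}} \right),
\qquad
\lambda\bigl(\mathbb{Q} \cap [0,1]\bigr) \le \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n}} = \varepsilon .
\]
```

Since $\varepsilon$ can be taken arbitrarily small, the measure of the rationals in $[0,1]$ must be zero.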

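**Neural networks and expectation:** the expected loss of a neural network is $\mathbb{E}_{x \sim \mu}[L(\theta, x)] = \int L(\theta, x) \, d\mu(x)$ - a Lebesgue integral with respect to a measure $\mu$ on the data. The Riemann integral is not enough, because $\mu$ need not have a density: it can be discrete, concentrated on a lower-dimensional set, or a mixture of both.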
**Kolmogorov's probability:** the σ-algebra of events defines which questions about a random experiment are well-posed. Gaussian Processes in ML are probability measures on infinite-dimensional function spaces - measure theory all the way down.
**Financial mathematics:** filtrations (growing families of σ-algebras) model the arrival of information to the market - what a trader knows at time $t$. Diffusion models (DDPM, Stable Diffusion) can be formulated through stochastic differential equations, which are built on exactly this foundation.
**Quantum mechanics:** operator algebras (a noncommutative analogue of measure theory) describe observables - quantities that can physically be measured. For each observable, the measurement probabilities form a probability measure on a σ-algebra of possible outcomes.

Sigma Algebra

The goal: assign a 'length' to subsets of the real line, so that the length of $[a,b]$ is $b-a$ and there are no contradictions. The naive idea: measure everything. Vitali in 1905 showed this is impossible: if length must be translation-invariant and countably additive, there are subsets of $\mathbb{R}$ for which any assignment of length leads to a contradiction. A sigma-algebra is the answer: a collection of subsets for which measurement is internally consistent.

**Sigma-algebra** (σ-algebra) F over a set X is a family of subsets of X satisfying three axioms:
1. **Empty set:** ∅ ∈ F
2. **Closure under complements:** if A ∈ F, then X \ A ∈ F
3. **Closure under countable unions:** if A₁, A₂, A₃, ... ∈ F, then ∪ᵢ Aᵢ ∈ F
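On a finite set the three axioms can be checked mechanically, which makes small examples easy to test. The following is a minimal sketch; the function name `is_sigma_algebra` is hypothetical, and the reduction of countable unions to pairwise unions is valid only because the family is finite.

```python
from itertools import combinations


def is_sigma_algebra(X, family):
    """Check the sigma-algebra axioms for a family of subsets of a FINITE set X.

    Hypothetical helper for illustration; not a library function.
    """
    X = frozenset(X)
    F = {frozenset(A) for A in family}

    # Every member must be a subset of X.
    if any(not A <= X for A in F):
        return False
    # Axiom 1: the empty set belongs to F.
    if frozenset() not in F:
        return False
    # Axiom 2: closure under complements.
    if any(X - A not in F for A in F):
        return False
    # Axiom 3: for a finite family, closure under countable unions
    # reduces to closure under pairwise unions.
    return all(A | B in F for A, B in combinations(F, 2))


X = {1, 2, 3}
print(is_sigma_algebra(X, [set(), {1}, {2, 3}, X]))  # True
print(is_sigma_algebra(X, [set(), {1}, {2}, X]))     # False: {2, 3} is missing
```

The second family fails because the complement of {1} is not included.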

From the three axioms, more follows immediately: $X \in F$ (since $X = X \setminus \emptyset$, and $\emptyset \in F$ with F closed under complements). And F is closed under **countable intersections** - by De Morgan: $\bigcap_i A_i = \left(\bigcup_i A_i^c\right)^c$. Closure under finite unions and intersections follows as well, by padding a finite list with copies of $\emptyset$ (or of $X$). Nothing beyond the three axioms is needed.

Émile Borel and the Birth of the Theory

At the beginning of the 20th century, Émile Borel and Henri Lebesgue faced a problem: the Riemann integral could not handle many important functions. Borel introduced the concept of 'measurable sets' in 1898, and Lebesgue built a complete measure theory in 1902. Sigma-algebras became the formal foundation on which all modern probability theory and integration rests.

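Why **countable** unions specifically? Finite unions are too weak: limits, and sets such as $\mathbb{Q} = \bigcup_n \{q_n\}$, are built from infinitely many pieces. Arbitrary (uncountable) unions are too strong: as soon as every single point is measurable, every subset $A = \bigcup_{x \in A} \{x\}$ would be measurable too, the sigma-algebra would collapse into all subsets, and the Vitali set could no longer be excluded. Countability is the boundary that preserves both flexibility and consistency.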

Measurable

Given a sigma-algebra $F$, the definition is immediate: a set $A$ is **measurable** if $A \in F$. The pair $(X, F)$ is a **measurable space**. Add a measure $\mu: F \to [0, \infty]$, that is, a countably additive set function with $\mu(\emptyset) = 0$, and get a **measure space** $(X, F, \mu)$. Kolmogorov's probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is exactly a measure space with $\mathbb{P}(\Omega) = 1$.

**Measurable space** - a pair (X, F), where X is a set and F is a σ-algebra over X. Elements of F are called **measurable sets**. If a measure μ: F → [0, ∞] is also given, the triple (X, F, μ) is called a **measure space**.
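A finite example makes the triple (X, F, μ) concrete. The sketch below assumes a fair six-sided die, takes F to be all subsets, and defines μ by summing point weights; the names `make_measure` and `die` are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def make_measure(weights):
    """Return mu(A) = sum of the point weights in A.

    Hypothetical helper: defines a measure on all subsets of a finite set.
    """
    def mu(A):
        return sum((weights[x] for x in A), Fraction(0))
    return mu


outcomes = set(range(1, 7))                       # X: faces of a die
die = make_measure({x: Fraction(1, 6) for x in outcomes})

even, low = {2, 4, 6}, {1}                        # two disjoint measurable sets
print(die(set()))                                 # 0
print(die(even | low) == die(even) + die(low))    # True: additivity on disjoint sets
print(die(outcomes))                              # 1, so this is a probability measure
```

Exact fractions are used instead of floats so that the additivity check is not affected by rounding.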

Key point: **not all subsets** of $X$ need to be measurable. This is not a limitation - it is a necessity. Giuseppe Vitali in 1905 proved: if a length with $\lambda([a,b]) = b - a$ is assigned to every subset of $\mathbb{R}$ in a translation-invariant, sigma-additive way, an irresolvable contradiction arises.

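**The Vitali set** is constructed as follows: partition [0, 1] into equivalence classes by x ~ y ⟺ x - y ∈ ℚ. By the axiom of choice, pick one representative from each class. The resulting set V cannot be consistently given a length. Its translates V + q by rational numbers q are pairwise disjoint, all have the same length as V, and together cover ℝ. If λ(V) = 0, then countable additivity gives λ(ℝ) = 0 (contradiction); if λ(V) > 0, then the infinitely many translates with q ∈ ℚ ∩ [0, 1] all lie inside [0, 2], so λ([0,2]) = ∞ (contradiction too).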
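The two contradictions can be written out as a short derivation. This is a sketch under the stated assumptions: $\lambda$ is translation-invariant, countably additive, defined on every subset, and satisfies $\lambda([a,b]) = b - a$; the notation $V_q = V + q$ is introduced here.

```latex
\begin{align*}
& V_q = V + q, \quad q \in \mathbb{Q}, \qquad \lambda(V_q) = \lambda(V)
  \quad \text{(translation invariance)}, \qquad V_q \cap V_{q'} = \emptyset \ \text{for } q \neq q'. \\[4pt]
& \mathbb{R} = \bigsqcup_{q \in \mathbb{Q}} V_q
  \;\Longrightarrow\;
  \lambda(\mathbb{R}) = \sum_{q \in \mathbb{Q}} \lambda(V),
  \quad \text{so } \lambda(V) = 0 \text{ gives } \lambda(\mathbb{R}) = 0,
  \text{ contradicting } 1 = \lambda([0,1]) \le \lambda(\mathbb{R}). \\[4pt]
& \bigsqcup_{q \in \mathbb{Q} \cap [0,1]} V_q \subseteq [0,2]
  \;\Longrightarrow\;
  \sum_{q \in \mathbb{Q} \cap [0,1]} \lambda(V) \le \lambda([0,2]) = 2,
  \quad \text{so } \lambda(V) > 0 \text{ gives } \infty \le 2.
\end{align*}
```

Both cases fail, so no value of $\lambda(V)$ is consistent with the assumptions.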

Property	Measurable set	Non-measurable set
Belongs to F?	Yes	No
Can be assigned a measure?	Yes	No (contradiction)
Example in ℝ	Any interval, open/closed set	Vitali set
Requires axiom of choice?	No	Yes (for construction)

In practice, non-measurable sets are exotic - they require the axiom of choice to construct. Essentially every set encountered in analysis, physics, and ML is measurable. The sigma-algebra is not a formality - it is the exact boundary beyond which the whole idea of integration falls apart.

Why are not all subsets of ℝ Lebesgue measurable?

Borel

In practice one works with a specific sigma-algebra on $\mathbb{R}$ - the **Borel** σ-algebra $B(\mathbb{R})$. This is the smallest σ-algebra containing all open sets. Borel introduced it in 1898 - four years before Lebesgue built his theory of integration on top of it. Two people, one revolution.

**Borel σ-algebra** B(ℝ) = σ(τ), where τ is the topology on ℝ (the collection of all open sets). This means: B(ℝ) is the smallest σ-algebra containing all open subsets of ℝ.

What is in $B(\mathbb{R})$? Start with open sets, apply the σ-algebra axioms: complements of open sets are closed; countable unions of closed sets are $F_\sigma$; their complements are $G_\delta$; and so on. The hierarchy goes deep - but covers every set encountered in analysis, probability, and ML.

$B(\mathbb{R})$ is a **strict subset** of the σ-algebra of Lebesgue measurable sets. Every Borel set is Lebesgue measurable, but not the other way around - Lebesgue adds subsets of zero-measure Borel sets. For ML practice, this distinction is irrelevant: everything encountered is Borel.

Type of set	Example	Borel?	Lebesgue measurable?
Open interval	(0, 1)	Yes	Yes
Closed segment	[0, 1]	Yes	Yes
Countable set	ℚ ∩ [0,1]	Yes	Yes
Cantor set	C	Yes (closed)	Yes
Subset of C	Any subset of the Cantor set	Not necessarily	Yes (measure 0)
Vitali set	V	No	No
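The 'measure 0' entries for the Cantor set can be illustrated numerically. The sketch below assumes the standard middle-thirds construction; the helper names `cantor_step` and `total_length` are hypothetical. After n steps the remaining closed intervals have total length (2/3)^n, which tends to zero.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third of every closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))


intervals = [(Fraction(0), Fraction(1))]
lengths = []
for _ in range(5):
    intervals = cantor_step(intervals)
    lengths.append(total_length(intervals))

print(lengths[0])   # 2/3
print(lengths[-1])  # 32/243, i.e. (2/3)**5
```

The Cantor set is contained in every one of these finite unions, so its Lebesgue measure is at most (2/3)^n for every n, hence zero.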

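In probability theory, $B(\mathbb{R})$ is the foundation of everything. A random variable is a measurable map $X: \Omega \to \mathbb{R}$: for every Borel set $B$, the preimage $X^{-1}(B) = \{\omega : X(\omega) \in B\}$ belongs to $\mathcal{F}$. So when one says 'the random variable $X$ takes values in $B$', one means a Borel set $B$, and this is exactly what makes $\mathbb{P}(X \in B)$ well-defined. A Gaussian Process in ML is a probability measure on a function space - and Borel sigma-algebras are needed at every step.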
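On a finite sample space the expression P(X ∈ B) can be computed directly as the probability of a preimage. The sketch below assumes two fair dice with X equal to their sum; `preimage` and `prob` are hypothetical helper names.

```python
from fractions import Fraction

# Sample space: ordered pairs of faces of two fair dice (an assumed example).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]


def preimage(rv, B):
    """Return the event {w in omega : rv(w) in B}."""
    return {w for w in omega if rv(w) in B}


def prob(rv, B):
    """P(rv in B), computed as the probability of the preimage of B."""
    return sum((P[w] for w in preimage(rv, B)), Fraction(0))


print(len(preimage(X, {7})))  # 6 outcomes give a sum of 7
print(prob(X, {7, 11}))       # 2/9
```

Here every subset of the sample space is an event, so every preimage is automatically measurable; on uncountable spaces that is precisely the condition a random variable has to satisfy.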

Generating

$B(\mathbb{R})$ contains uncountably many sets - listing them all is impossible. But specifying a σ-algebra via a small 'seed' is entirely possible. Such a seed is called a **generating class**: a compact family of sets from which the σ-algebra is uniquely reconstructed.

**Generated σ-algebra** σ(C) for a family of sets C is the smallest σ-algebra containing C. It exists and is unique (as the intersection of all σ-algebras containing C). The family C is called a **generating class**.
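For a finite set, σ(C) can be computed by repeatedly closing C under complements and unions until nothing new appears. This is a minimal sketch that works only for finite X; the function name `generate_sigma_algebra` is hypothetical.

```python
def generate_sigma_algebra(X, C):
    """Smallest sigma-algebra over a FINITE set X containing every set in C.

    Hypothetical helper: iterates complements and pairwise unions to a fixed point.
    """
    X = frozenset(X)
    F = {frozenset(), X} | {frozenset(A) for A in C}
    while True:
        new = {X - A for A in F} | {A | B for A in F for B in F}
        if new <= F:          # nothing new: F is closed, so we are done
            return F
        F |= new


X = {1, 2, 3, 4}
small = generate_sigma_algebra(X, [{1}])
print(sorted(sorted(A) for A in small))
# [[], [1], [1, 2, 3, 4], [2, 3, 4]]

larger = generate_sigma_algebra(X, [{1}, {2}])
print(len(larger))  # 8: all unions of the atoms {1}, {2}, {3, 4}
```

The loop terminates because a finite set has only finitely many subsets. For infinite families such as the open sets of ℝ no such enumeration is available, which is why σ(C) is defined as an intersection instead.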

Key result: $B(\mathbb{R})$ is generated by many different classes. Specifying all open sets is unnecessary - half-lines $(-\infty, a]$ suffice. This is not a coincidence: the CDF $F(a) = \mathbb{P}(X \leq a)$ is exactly the measure evaluated on this generating class.
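The link between half-lines and the CDF can be made concrete: since $(a, b] = (-\infty, b] \setminus (-\infty, a]$, the probability of an interval is $F(b) - F(a)$. The sketch below assumes a normal distribution purely as an example; `normal_cdf` and `prob_interval` are hypothetical names, and only `math.erf` from the standard library is used.

```python
import math


def normal_cdf(a, mean=0.0, std=1.0):
    """F(a) = P(X <= a) for a normal random variable (assumed example)."""
    return 0.5 * (1.0 + math.erf((a - mean) / (std * math.sqrt(2.0))))


def prob_interval(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a), using only values of the CDF on half-lines."""
    return cdf(b) - cdf(a)


p = prob_interval(normal_cdf, -1.0, 1.0)
print(round(p, 4))  # 0.6827
```

Knowing the measure on the half-lines is therefore enough to recover it on intervals, and from there on all Borel sets.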

Why does this matter? To prove that two measures agree on all of $B(\mathbb{R})$, it often suffices to check that they agree on a generating class, provided the class is closed under finite intersections and the measures are finite (or σ-finite on the class). This is the π-λ (Dynkin) argument, and a great many proofs in probability theory use exactly this trick. The uniqueness part of the measure extension theorem is a direct consequence.

In higher dimensions, $B(\mathbb{R}^n)$ is generated by rectangles $(a_1,b_1) \times \ldots \times (a_n, b_n)$. This is the foundation for joint distributions of random variables - and for joint distributions over neural network weights in Bayesian ML. Sigma-algebras answer: 'what can be measured.' Generating classes answer: 'how to describe this compactly.'

Common misconception: all subsets of ℝ are measurable - the σ-algebra is just mathematical overhead.

In fact, there exist Lebesgue non-measurable subsets of ℝ. The sigma-algebra is not overhead - it is a necessary restriction for measure consistency.

Vitali (1905): a measure on $\mathbb{R}$ that is translation-invariant, σ-additive, and assigns length $b - a$ to every interval $[a,b]$ cannot be defined on all subsets. Without a σ-algebra, one cannot write $\mathbb{E}[L(\theta)]$ as an integral over data - the entire idea of an expected loss collapses. This is not abstraction - it is the foundation of ML.

Key Ideas

**Sigma-algebra** F - a family of subsets that contains ∅ and is closed under complements and countable unions. Three axioms determine which sets can be measured without contradiction
**Measurable set** - an element of the σ-algebra. Not all subsets of $\mathbb{R}$ are measurable: the Vitali set (1905) shows that measuring everything contradicts translation invariance combined with countable additivity
**$B(\mathbb{R})$** - the Borel σ-algebra generated by open sets. Contains all 'ordinary' sets; strictly smaller than the σ-algebra of Lebesgue measurable sets
**Generating class** - a compact description of a σ-algebra: B(ℝ) = σ({(-∞, a] : a ∈ ℝ}). The CDF of a distribution is exactly the measure evaluated on this generating class

Questions for Reflection

Why does the definition of a σ-algebra use countable (rather than finite or arbitrary) unions?
If the axiom of choice were false, would non-measurable sets exist?
How is the Borel σ-algebra related to the concept of 'information' in probability theory?