Measure Theory

Ergodic Theory

MCMC (Markov chain Monte Carlo) rests on the ergodic theorem. For an ergodic Markov chain whose stationary distribution is the target, the shift on the space of trajectories is a measure-preserving ergodic map, so the time average of a function along a trajectory converges to its integral with respect to the target distribution. Much of computational Bayesian statistics relies on this fact.

MCMC (Metropolis-Hastings, NUTS): the ergodic theorem guarantees that time averages along the chain converge
Statistical physics: the ergodic hypothesis is a foundational assumption of statistical mechanics and thermodynamics
Equidistribution of {nα}: Weyl's theorem, which corresponds to the ergodicity of irrational rotations
Shannon's source coding theorem: optimal code length ≈ entropy (via an ergodic theorem for information sources)

Measure-Preserving Maps

A shuffle feature (as in Netflix or a music player) gives an everyday picture of the central notion: a fair shuffle is a permutation of the items, and every permutation preserves the uniform measure, so it is a measure-preserving map. A map T: (X, 𝒜, μ) → (X, 𝒜, μ) is called **measure-preserving** if μ(T⁻¹(E)) = μ(E) for all E ∈ 𝒜 (equivalently: μ ∘ T⁻¹ = μ). Intuitively: T 'shuffles' the points of the space without changing the measure of sets. Examples: rotations of the circle, translations of the torus, expanding maps.

**Examples of measure-preserving maps:**
1. **Circle rotation:** T(x) = x + α (mod 1) on [0,1) with Lebesgue measure. Every rotation is measure-preserving.
2. **Bernoulli shift:** the left shift T on {0,1}^ℤ (two-sided infinite binary sequences) with the product measure. This is a model of 'coin flips in time'.
3. **Doubling map:** T(x) = 2x (mod 1) on [0,1) with Lebesgue measure. Measure-preserving, but non-invertible!
4. **Hamiltonian flow:** in mechanics, the flow along a Hamiltonian vector field. Liouville's theorem: phase space volume is preserved.
5. **Gauss map:** T(x) = {1/x} (fractional part of 1/x) on (0,1). Measure-preserving for the Gauss measure dx/((1+x) ln 2), but not for Lebesgue measure. Connection with continued fractions!
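The definition uses the preimage T⁻¹(E), not the image T(E), and the doubling map shows why. The following is a minimal sketch, with hypothetical helper names, that checks μ(T⁻¹(E)) = μ(E) for an interval under the doubling map (Lebesgue measure) and for the Gauss map (Gauss measure normalized by ln 2). For the Gauss map the preimage of [0, b) is the union over n ≥ 1 of the intervals (1/(n+b), 1/n], which the code truncates to finitely many terms, so the agreement is only approximate.

```python
import math

def doubling_preimage(a, b):
    """Preimage of [a, b), 0 <= a < b <= 1, under T(x) = 2x mod 1:
    two disjoint intervals, each of half the length."""
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

def lebesgue(intervals):
    """Total length of a list of disjoint intervals (lo, hi)."""
    return sum(hi - lo for lo, hi in intervals)

def gauss_measure(lo, hi):
    """Gauss measure of [lo, hi]: integral of dx / ((1 + x) ln 2)."""
    return math.log2((1 + hi) / (1 + lo))

def gauss_preimage_measure(b, n_branches=100_000):
    """Gauss measure of {x : {1/x} < b}, summed over the first n_branches
    intervals (1/(n + b), 1/n]; the truncation makes this approximate."""
    return sum(gauss_measure(1 / (n + b), 1 / n) for n in range(1, n_branches + 1))

# Doubling map: the preimage of [0.2, 0.7) has the same total length, 0.5.
# (The forward image behaves differently: T maps [0, 0.25) onto [0, 0.5).)
print(lebesgue(doubling_preimage(0.2, 0.7)))

# Gauss map: the two numbers agree up to the truncation error.
print(gauss_preimage_measure(0.5), gauss_measure(0.0, 0.5))
```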

A map T preserves the measure μ. What does this mean formally?

Ergodicity: Only Trivial Invariant Sets

A measure-preserving map T is called **ergodic** if, for every E ∈ 𝒜, T⁻¹(E) = E ⟹ μ(E) ∈ {0, μ(X)} (invariant sets are trivial: measure 0 or full measure). Intuition: the system 'mixes' so thoroughly that no nontrivial part of the space is invariant under T.

**Ergodicity: examples and counterexamples**

**Ergodic:**
- Circle rotation by an irrational angle (α ∉ ℚ): the orbit {nα mod 1} is dense in [0,1)
- Doubling map T(x) = 2x mod 1 on [0,1) with Lebesgue measure
- Bernoulli shift on {0,1}^ℤ with the product measure

**NOT ergodic:**
- Circle rotation by a rational angle α = p/q ∈ ℚ (in lowest terms): each orbit is finite (q points), so there are many invariant sets
- T = identity map: every set is invariant

**Practical characterization:** T is ergodic ↔ every T-invariant function f is constant (a.e.): f∘T = f ⟹ f = const μ-a.e.
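The contrast between rational and irrational rotations can be seen numerically. The sketch below is illustrative only (the set E and the helper names are choices made for this example). It uses E = [0,1/6) ∪ [1/3,1/2) ∪ [2/3,5/6), which has Lebesgue measure 1/2 and is invariant under rotation by 1/3. For α = 1/3 the fraction of time an orbit spends in E is 1 or 0 depending on the starting point. For an irrational α (here a floating-point approximation of √2 − 1) it is close to 1/2.

```python
import math
from fractions import Fraction

def time_average(f, alpha, x0, n_steps):
    """Average of f over the orbit x0, x0 + alpha, x0 + 2*alpha, ... (mod 1)."""
    return sum(f((x0 + n * alpha) % 1) for n in range(n_steps)) / n_steps

def in_E(x):
    """Indicator of E = [0,1/6) ∪ [1/3,1/2) ∪ [2/3,5/6), i.e. {x : 3x mod 1 < 1/2}."""
    return 1 if (3 * x) % 1 < 0.5 else 0

third = Fraction(1, 3)  # exact rational arithmetic avoids rounding drift

# Rational rotation: the answer depends on the starting point, never 1/2.
print(time_average(in_E, third, Fraction(1, 10), 3000))  # orbit stays inside E
print(time_average(in_E, third, Fraction(1, 4), 3000))   # orbit stays outside E

# Irrational rotation (float approximation): close to the measure of E.
print(time_average(in_E, math.sqrt(2) - 1, 0.1, 100_000))
```

Because E is invariant with measure strictly between 0 and 1, it witnesses directly that the rotation by 1/3 fails the definition of ergodicity.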

Why is circle rotation by α = 1/3 not ergodic?

Birkhoff's Theorem: Time = Space

**Birkhoff's Ergodic Theorem (1931):** If T is a measure-preserving map on a probability space (X, 𝒜, μ) and f ∈ L¹(μ), then the time averages converge μ-a.e. and in L¹: lim_{N→∞} (1/N) Σ_{n=0}^{N-1} f(Tⁿx) = E[f|𝒥](x), where 𝒥 is the σ-algebra of T-invariant sets. **If T is ergodic,** then E[f|𝒥] = ∫f dμ = const, i.e., **time average = space average**.
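For the rotation T(x) = x + α (mod 1) started at x = 0 and the function f(x) = e^{2πikx} with k a nonzero integer, the time average is a geometric sum, so it can be bounded by hand: |(1/N) Σ_{n=0}^{N-1} e^{2πikαn}| ≤ 1 / (N |sin(πkα)|) whenever kα is not an integer. For irrational α this tends to 0 = ∫f dμ. The sketch below, with a hypothetical function name and a floating-point stand-in for √2, compares the computed averages with that bound, and shows a rational case where the average does not decay.

```python
import cmath
import math

def weyl_average(alpha, k, n_terms):
    """Absolute value of (1/N) * sum_{n<N} exp(2*pi*i*k*n*alpha)."""
    total = sum(cmath.exp(2j * math.pi * k * n * alpha) for n in range(n_terms))
    return abs(total) / n_terms

alpha = math.sqrt(2)  # float approximation of an irrational number
for k in (1, 2, 3):
    for n_terms in (100, 10_000):
        bound = 1 / (n_terms * abs(math.sin(math.pi * k * alpha)))
        print(k, n_terms, weyl_average(alpha, k, n_terms), bound)

# Rational case: with alpha = 1/3 and k = 3 every term equals 1 (up to
# rounding), so the average stays near 1 instead of tending to 0.
print(weyl_average(1 / 3, 3, 1000))
```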

**The meaning of Birkhoff's theorem:**
Time average: ⟨f⟩_time = lim_{N→∞} (f(x) + f(Tx) + ... + f(T^{N-1}x)) / N
Space average: ⟨f⟩_space = ∫_X f dμ
For ergodic T: ⟨f⟩_time = ⟨f⟩_space (for a.e. starting point x)

**Applications:**
- **Statistical physics:** The ergodic hypothesis is a foundational assumption of statistical mechanics. Time average = ensemble average.
- **Number theory:** Weyl's theorem: (1/N) Σ_{n=0}^{N-1} e^{2πi·nα·k} → 0 for irrational α and every nonzero integer k (equidistribution)
- **Data compression:** Shannon-McMillan-Breiman theorem: optimal code length per symbol ≈ source entropy
- **ML:** Markov chains: for an ergodic chain, MCMC time averages converge to expectations under the stationary distribution by the ergodic theorem
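The MCMC application can be illustrated with a small random-walk Metropolis sampler. This is a sketch under assumptions chosen for the example, not a production sampler. The target is a standard normal distribution, the test function is f(x) = x² (whose integral against the target is 1), and the step size, chain length, and function names are arbitrary choices. The time average along one trajectory, started away from the bulk of the distribution, should land near 1.

```python
import math
import random

def metropolis_time_average(f, log_density, x0, n_steps, step_size, rng):
    """Time average of f along a random-walk Metropolis chain.

    log_density is the log of the (unnormalized) target density.
    """
    x = x0
    log_p = log_density(x)
    total = 0.0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        total += f(x)  # a rejected proposal repeats the current state
    return total / n_steps

def standard_normal_log_density(x):
    return -0.5 * x * x  # unnormalized; the constant cancels in the ratio

rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = metropolis_time_average(
    lambda x: x * x, standard_normal_log_density,
    x0=3.0, n_steps=100_000, step_size=2.0, rng=rng,
)
print(estimate)  # should be close to 1, the second moment of N(0, 1)
```

If the chain were not ergodic, for example if the proposal could never cross between two separated regions of the target, the time average would depend on the starting point, just as in the non-ergodic case of Birkhoff's theorem.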

What does Birkhoff's ergodic theorem state for ergodic T?

Key Ideas

Measure-preserving T: μ(T⁻¹(E)) = μ(E) - the measure is invariant under T
Ergodicity: T⁻¹(E) = E ⟹ μ(E) = 0 or μ(X) (only trivial invariants)
Irrational rotation - ergodic; rational rotation - not (finite orbits)
Birkhoff's theorem: (1/N)Σf(Tⁿx) → ∫f dμ (a.e.) when T is ergodic
Time average = space average (ergodic hypothesis in physics)
MCMC = ergodic sampling: time averages → integrals

Questions for Reflection

What is the difference between the ergodic hypothesis in physics and Birkhoff's mathematical theorem?
Why do MCMC algorithms require ergodicity of the Markov chain? What happens when it fails?
How is Birkhoff's theorem related to the Strong Law of Large Numbers for i.i.d. random variables?

Related Lessons

prob-05-independence
