Measure Theory
Abstract Integration Theory
The Riemann integral cannot handle the indicator function of the rationals $\mathbf{1}_{\mathbb{Q}}$ on $[0,1]$: it is discontinuous everywhere, so its upper and lower sums never agree. The Lebesgue integral answers immediately: $\int_{[0,1]} \mathbf{1}_{\mathbb{Q}} \, d\mu = \mu(\mathbb{Q} \cap [0,1]) = 0$. This is not a technical trick but a fundamental extension that makes modern probability theory and signal processing possible.
- $L^2$ spaces: functions in $L^2$ form a Hilbert space - the foundation of quantum mechanics, signal processing, and neural networks with weight-norm regularization.
- The Dominated Convergence Theorem (DCT) justifies differentiating under the integral sign when computing gradients of expectations in REINFORCE and other policy gradient algorithms.
- Holder's inequality $\|fg\|_1 \leq \|f\|_p \|g\|_q$ underlies generalization bounds in statistical learning theory and information-theoretic inequalities.
Lesson objectives
- Construct the Lebesgue integral via simple functions and explain its advantages over the Riemann integral
- Apply the Dominated Convergence Theorem to exchange limits and integrals
- Prove Holder's inequality and use it to analyze $L^p$ spaces
Prerequisites
- Measure theory: sigma-algebras and measures
- Measurable functions and their properties
- Riemann integral and its limitations
Construction of the Lebesgue integral
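The integral is built in three steps. Step 1: for a simple function $\phi = \sum_{k=1}^n c_k \mathbf{1}_{A_k}$ (measurable $A_k$, $c_k \geq 0$) define $\int \phi \, d\mu = \sum_{k=1}^n c_k \mu(A_k)$ (with the convention $0 \cdot \infty = 0$). Step 2: for a nonnegative measurable $f$, define $\int f \, d\mu$ as the supremum of $\int \phi \, d\mu$ over simple functions $0 \leq \phi \leq f$. Step 3: in the general case write $f = f^+ - f^-$, where $f^+ = \max(f,0)$ and $f^- = \max(-f,0)$, and set $\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$ whenever at least one of the two integrals is finite; $f$ is integrable when both are.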
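The first two steps can be illustrated on a finite measure space, where every quantity is computable exactly. The sketch below is an illustration under assumptions, not part of the construction itself: the four-point space, the weights `mu`, the function `f`, and the helper names `integrate_simple` and `staircase` are all hypothetical. It uses the standard staircase approximation $\phi_n = \min(\lfloor 2^n f \rfloor / 2^n, n)$, a sequence of simple functions that increases to $f$.

```python
from fractions import Fraction
from math import floor

# Hypothetical finite measure space: four points with weights mu({x}).
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

# A nonnegative function (on a finite space every function is measurable).
f = {"a": Fraction(1, 3), "b": Fraction(5, 2), "c": Fraction(0), "d": Fraction(7)}


def integrate_simple(phi, mu):
    """Step 1: integral of a simple function, the sum of c_k * mu(A_k).

    phi maps points to values; A_k = {x : phi(x) = c_k} are its level sets.
    """
    level_sets = {}
    for x, c in phi.items():
        level_sets[c] = level_sets.get(c, Fraction(0)) + mu[x]
    return sum(c * m for c, m in level_sets.items())


def staircase(f, n):
    """Simple function phi_n = min(floor(2^n f) / 2^n, n), which satisfies phi_n <= f."""
    return {x: min(Fraction(floor(v * 2**n), 2**n), Fraction(n)) for x, v in f.items()}


# Step 2: the integrals of phi_n increase towards the integral of f.
approximations = [integrate_simple(staircase(f, n), mu) for n in range(1, 9)]

# On a finite space f is itself simple, so its integral is a finite sum.
exact = sum(f[x] * mu[x] for x in f)

for n, value in enumerate(approximations, start=1):
    print(n, value, float(exact - value))
```

Exact rational arithmetic is used so that the monotone increase of the approximating integrals is visible without floating-point noise. On $[0,1]$ with Lebesgue measure the same recipe applies, but computing $\mu(A_k)$ for the level sets is no longer a finite sum.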
Three key convergence theorems: (1) Monotone Convergence (MCT): $0 \leq f_n \nearrow f$ a.e. implies $\int f_n \to \int f$. (2) Fatou's Lemma: for $f_n \geq 0$, $\int \liminf f_n \leq \liminf \int f_n$. (3) Dominated Convergence (DCT): $f_n \to f$ a.e. and $|f_n| \leq g$ a.e. for some $g \in L^1$ implies $f \in L^1$ and $\int f_n \to \int f$. DCT is the most widely applicable of the three: it needs neither monotonicity nor nonnegativity, only an integrable dominating function.
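A standard counterexample shows why the dominating function cannot be dropped. On $[0,1]$ with Lebesgue measure, $f_n = n \mathbf{1}_{(0,1/n)}$ converges to $0$ pointwise, yet $\int f_n = 1$ for every $n$. Any dominating $g$ would have to satisfy $g \geq \sup_n f_n$, and the integral of that envelope grows like the harmonic series. The sketch below computes these quantities exactly for step functions; the helper `step_integral` and the cutoff `N` are hypothetical names chosen for the illustration.

```python
from fractions import Fraction


def step_integral(height, length):
    """Lebesgue integral of height * 1_A when the set A has measure `length`."""
    return Fraction(height) * Fraction(length)


N = 50

# Undominated sequence f_n = n * 1_(0, 1/n): pointwise limit 0, integrals stay at 1.
escaping = [step_integral(n, Fraction(1, n)) for n in range(1, N + 1)]

# Dominated sequence h_n = 1_(0, 1/n) <= 1, and 1 is integrable on [0, 1]:
# the integrals tend to 0, the integral of the pointwise limit.
dominated = [step_integral(1, Fraction(1, n)) for n in range(1, N + 1)]

# sup_n f_n equals k on [1/(k+1), 1/k), so its integral over [1/(N+1), 1)
# is a partial sum of the harmonic series and is unbounded as N grows.
envelope = sum(
    step_integral(k, Fraction(1, k) - Fraction(1, k + 1)) for k in range(1, N + 1)
)

print(escaping[-1], dominated[-1], float(envelope))
```

The same sequence shows that Fatou's inequality can be strict: $\int \liminf f_n = 0$ while $\liminf \int f_n = 1$.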
$L^p$ spaces and Holder's inequality
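For $1 \leq p < \infty$, the space $L^p(\mu) = \{f \text{ measurable} : \|f\|_p = (\int |f|^p \, d\mu)^{1/p} < \infty\}$. At $p=2$ this is a Hilbert space with inner product $\langle f, g \rangle = \int fg \, d\mu$ (for real-valued functions; in the complex case one factor is conjugated). Holder's inequality: $\int |fg| \, d\mu \leq \|f\|_p \|g\|_q$ for $1 < p, q < \infty$ with $1/p + 1/q = 1$. At $p=q=2$ this is the Cauchy-Schwarz inequality. Proof: if either norm is $0$ or $\infty$ the claim is immediate; otherwise apply Young's inequality $ab \leq a^p/p + b^q/q$ (for $a, b \geq 0$) pointwise to $a = |f|/\|f\|_p$ and $b = |g|/\|g\|_q$, then integrate. The right-hand side integrates to $1/p + 1/q = 1$, which gives $\int |fg| \, d\mu \leq \|f\|_p \|g\|_q$.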
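Holder's inequality and its equality case can be checked numerically on a discrete measure space, where the integrals are weighted sums. This is a sanity check under assumptions, not a proof: the weights, the random test functions, and the helper names `lp_norm`, `l1_of_product`, and `holder_gap` are hypothetical.

```python
import random

random.seed(0)

# Hypothetical discrete measure space: 200 atoms with positive weights.
weights = [random.uniform(0.1, 2.0) for _ in range(200)]
f = [random.gauss(0.0, 1.0) for _ in weights]
g = [random.gauss(0.0, 1.0) for _ in weights]


def lp_norm(h, p):
    """||h||_p = (integral of |h|^p d mu)^(1/p) for the weighted counting measure."""
    return sum(w * abs(v) ** p for w, v in zip(weights, h)) ** (1.0 / p)


def l1_of_product(u, v):
    """Integral of |u * v| d mu."""
    return sum(w * abs(a * b) for w, a, b in zip(weights, u, v))


def holder_gap(u, v, p):
    """Right side minus left side of Holder's inequality; q is conjugate to p."""
    q = p / (p - 1.0)
    return lp_norm(u, p) * lp_norm(v, q) - l1_of_product(u, v)


# The gap should be nonnegative for every exponent p > 1.
gaps = {p: holder_gap(f, g, p) for p in (1.5, 2.0, 3.0, 10.0)}

# Equality case: |g|^q proportional to |f|^p, for example g = |f|^(p - 1).
p = 3.0
g_extremal = [abs(v) ** (p - 1.0) for v in f]
equality_gap = holder_gap(f, g_extremal, p)

print(gaps, equality_gap)
```

The extremal choice $g = |f|^{p-1}$ makes Young's inequality an equality pointwise after normalization, which is why the gap vanishes up to rounding error.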
Elements of $L^p$ are equivalence classes of functions agreeing almost everywhere, not individual functions. Changing a function on a set of measure zero does not change its $L^p$ norm. This matters: the question 'value of $f$ at point $x$' is not well-posed for an element of $L^p$ without additional regularity assumptions (such as continuity).
Henri Lebesgue published his integral in 1902 at age 27 in his doctoral thesis 'Integral, Length, Area'. Simultaneously, Emile Borel was developing measure theory. The priority dispute between Lebesgue and Borel lasted years. Frigyes Riesz systematized $L^p$ spaces in 1910. Lebesgue's convergence theorems rigorously justified Fourier series expansions - the main analytical tool for engineers of that era.
The Lebesgue Integral and Convergence Theorems
Henri Lebesgue's 1902 construction handles functions, such as the indicator function of the rationals, that Riemann integration cannot. Modern probability theory and the Hilbert space formulation of quantum mechanics are built on this framework.
Why does the DCT require the dominator g to be in L¹(μ)?
Lp Spaces and the Holder Inequality
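Stefan Banach's 1932 monograph systematized the theory of complete normed spaces, of which the Lp spaces are central examples. L²([0,1]) is an infinite-dimensional separable Hilbert space; its completeness is the Riesz-Fischer theorem. The Lp scale unifies functional analysis, harmonic analysis, and probability in one framework.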
Why does the embedding L²([0,1]) ⊂ L¹([0,1]) hold, and does it still hold on a space of infinite measure such as ℝ?
DCT in policy gradient computations
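In REINFORCE we need $\nabla_\theta \mathbb{E}_{\pi_\theta}[R] = \mathbb{E}_{\pi_\theta}[R \nabla_\theta \log \pi_\theta]$. Formally: $\nabla_\theta \int R(\tau)\pi_\theta(\tau)\,d\tau = \int R(\tau)\nabla_\theta \pi_\theta(\tau)\,d\tau = \int R(\tau)\pi_\theta(\tau)\nabla_\theta \log \pi_\theta(\tau)\,d\tau$, where the last step uses $\nabla_\theta \pi_\theta = \pi_\theta \nabla_\theta \log \pi_\theta$. DCT justifies the first step, moving $\nabla_\theta$ inside the integral: we need a dominating function with $|R(\tau)\nabla_\theta \pi_\theta(\tau)| \leq g(\tau)$ for all $\theta$ in a neighborhood of the point of interest and $\int g \, d\tau < \infty$. This typically holds for bounded rewards and smooth policies whose gradients admit such an integrable bound near $\theta$.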
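The identity can be checked on a toy problem. The sketch below is a hypothetical one-step example, not taken from any particular REINFORCE implementation: a single action $a \in \{0,1\}$, a policy $\pi_\theta(a=1) = \sigma(\theta)$, and fixed bounded rewards. Here the integral is a finite sum, so the interchange is automatic; the code only verifies the identity that DCT justifies in the continuous case. All names (`REWARD`, `score`, `reinforce_estimate`, and so on) are assumptions made for the illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical one-step problem: action a in {0, 1}, pi_theta(a=1) = sigmoid(theta).
REWARD = {0: -1.0, 1: 2.0}  # bounded rewards


def expected_reward(theta):
    p = sigmoid(theta)
    return p * REWARD[1] + (1.0 - p) * REWARD[0]


def score(theta, a):
    """d/dtheta log pi_theta(a): equals 1 - p for a = 1 and -p for a = 0."""
    p = sigmoid(theta)
    return (1.0 - p) if a == 1 else -p


def score_function_gradient(theta):
    """E[R * d/dtheta log pi_theta], computed exactly over both actions."""
    p = sigmoid(theta)
    return p * REWARD[1] * score(theta, 1) + (1.0 - p) * REWARD[0] * score(theta, 0)


def reinforce_estimate(theta, n_samples, rng):
    """Monte Carlo version: average of R(a) * score(a) over sampled actions."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        total += REWARD[a] * score(theta, a)
    return total / n_samples


theta = 0.3
eps = 1e-6

# Left side: derivative of the expectation, by central finite differences.
finite_difference = (expected_reward(theta + eps) - expected_reward(theta - eps)) / (2 * eps)

# Right side: expectation of reward times score, exact and sampled.
exact = score_function_gradient(theta)
estimate = reinforce_estimate(theta, 200_000, random.Random(0))

print(finite_difference, exact, estimate)
```

The finite-difference derivative of the expectation and the exact score-function expectation should agree to numerical precision, while the sampled estimate agrees only up to Monte Carlo error.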
Summary
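- The Lebesgue integral is constructed through simple functions and extends the Riemann integral: every Riemann integrable function on a bounded interval is Lebesgue integrable with the same value, and many more functions become integrable.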
- DCT justifies moving limits inside integrals when an integrable dominating function exists.
- $L^p$ spaces are normed spaces of functions identified up to almost-everywhere equality; Holder's inequality bounds the $L^1$ norm of a product by the $L^p$ and $L^q$ norms of its factors for conjugate exponents.
Connections to other topics
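In probability theory, the Lebesgue integral is expectation: $\mathbb{E}[X] = \int X \, dP$. The Radon-Nikodym theorem (mt-28) extends this: a probability density is the Radon-Nikodym derivative of the distribution with respect to a reference measure. In functional analysis, for $1 < p < \infty$ the dual of $L^p$ is $L^q$ with $1/p + 1/q = 1$: Holder's inequality shows that every $g \in L^q$ defines a bounded linear functional on $L^p$, and the Radon-Nikodym theorem shows that there are no others.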
- Mt 28 — related
Questions for reflection
- The Dirichlet function $\mathbf{1}_{\mathbb{Q}}$ is not Riemann integrable, but $\int \mathbf{1}_{\mathbb{Q}} \, d\lambda = 0$ under Lebesgue (rationals are countable, hence measure zero). How does this example illustrate the difference between partitioning the $x$-axis (Riemann) versus partitioning the $y$-axis (Lebesgue)?