Functional Analysis
L² and Lebesgue Spaces: Infinite-Dimensional Geometry of Signals
L²[0,1] is a Hilbert space of infinite dimension. The Fourier decomposition is an orthogonal basis in L². Every MP3 file, every quantum mechanical wave function, every kernel in SVM - all are points in L². Understanding L² means understanding the mathematics behind half of modern ML.
- **MP3/AAC:** discrete cosine transform (DCT) is a variant of FFT. Fourier series + Parseval identity: store only significant coefficients. 10:1 compression with nearly inaudible loss.
- **Gaussian Processes (sklearn, GPyTorch):** kernel = inner product in RKHS (infinite-dimensional L²). Kernel ridge regression = projection in RKHS.
- **Quantum computing:** wave function ψ ∈ L²(R³), ‖ψ‖₂ = 1. All of quantum mechanics is linear algebra in L².
L² - an Infinite-Dimensional Hilbert Space
L²[0,1] is a Hilbert space of **infinite dimension**. The Fourier decomposition is an orthogonal basis in this space. Every signal, every quantum mechanical wave function, every kernel in machine learning - all are points in L².
Start with Lp spaces: for p ≥ 1, the space Lp(Ω) is the completion of measurable functions under the p-norm ‖f‖_p. L² is special: at p = 2 the norm comes from an inner product, and L² becomes a Hilbert space.
**Lp space**: L^p(Ω) = {f: Ω → R | ‖f‖_p < ∞} - measurable functions with finite p-norm; equivalence classes coinciding a.e. **Lp norm**: ‖f‖_p = (∫_Ω |f(x)|^p dx)^{1/p} - at p=1: sum of moduli; p=2: root mean square; p→∞: ‖f‖_∞ = ess sup|f(x)|. **L² space**: L²(Ω) with ‖f‖_2 = √(∫_Ω |f(x)|² dx) - the only Lp space with Euclidean geometry (orthogonality, projections, bases). **Inner product in L²**: ⟨f, g⟩ = ∫_Ω f(x)g(x) dx - generates the norm via ‖f‖₂ = √⟨f,f⟩; f,g orthogonal when ⟨f,g⟩ = 0. **Riesz-Fischer theorem**: L² is complete - limits of Cauchy sequences remain in L², distinguishing it from the space of Riemann-integrable functions.
```python import numpy as np # L2 and Lp norms: geometry in function space n = 1000 # discretization of [0,1] x = np.linspace(0, 1, n) dx = 1 / n # Two signals f = np.sin(2 * np.pi * x) # sine g = np.cos(2 * np.pi * x) # cosine # L2 norms norm_f = np.sqrt(np.trapz(f**2, x)) # ~= 0.707 = 1/sqrt(2) norm_g = np.sqrt(np.trapz(g**2, x)) # ~= 0.707 print(f"||sin||_2 = {norm_f:.4f}") print(f"||cos||_2 = {norm_g:.4f}") # Inner product: sin perp cos inner_fg = np.trapz(f * g, x) print(f"<sin, cos> = {inner_fg:.6f}") # ~= 0.0 (orthogonal!) # L1 vs L2 vs Linf norms norm_f_L1 = np.trapz(np.abs(f), x) # ~= 0.637 = 2/pi norm_f_Linf = np.max(np.abs(f)) # = 1.0 print(f"||sin||_1 = {norm_f_L1:.4f}, ||sin||_2 = {norm_f:.4f}, ||sin||_inf = {norm_f_Linf:.4f}") ```
**ML application:** quantum wave functions ψ ∈ L²(R³), normalized so ‖ψ‖₂ = 1. Kernel SVM: the kernel function K(x,y) = ⟨φ(x), φ(y)⟩_H is an inner product in an infinite-dimensional RKHS - an L²-type inner product. Neural network feature maps f: R^n → L².
Why is L² the only Lp space that possesses geometry (a notion of angle between functions)? What is the fundamental difference from L¹?
At p=2 the norm ‖f‖₂ = √⟨f,f⟩ arises from the inner product ⟨f,g⟩ = ∫fg, enabling orthogonality and angle between functions. No other Lp space (p≠2) admits an inner product generating its norm, so the Pythagorean theorem ‖f+g‖₂² = ‖f‖₂² + 2⟨f,g⟩ + ‖g‖₂² holds only in L².
Fourier Series: Orthogonal Basis in L²
The Fourier series is the expansion of a function f ∈ L²[0,2π] in the orthonormal basis {eⁱⁿˣ/√(2π)}. This is not just a convenient trick - it is the unique orthogonal basis of L² from trigonometric functions. FFT (Fast Fourier Transform) is the fast computation algorithm used in every smartphone, MP3 player, and wireless module.
```python import numpy as np # Fourier series: signal approximation N_points = 1024 x = np.linspace(0, 2*np.pi, N_points, endpoint=False) # Square wave (jump at pi) f = np.where(x < np.pi, 1.0, -1.0) # FFT - fast algorithm for Fourier coefficients F = np.fft.fft(f) / N_points # coefficients cn freqs = np.fft.fftfreq(N_points, d=1/N_points) # Reconstruction from N_terms harmonics for N_terms in [5, 10, 50]: mask = np.abs(freqs) <= N_terms F_trunc = np.where(mask, F, 0) f_approx = np.real(np.fft.ifft(F_trunc * N_points)) error = np.sqrt(np.mean((f - f_approx)**2)) # L2 error print(f"N={N_terms} harmonics: L2 error = {error:.4f}") # N=5: 0.3415, N=10: 0.2474, N=50: 0.1164 - slow convergence at jumps (Gibbs) ```
**Gibbs phenomenon:** at discontinuities, the Fourier series does not converge uniformly - a 'ringing' of ~9% of the jump height appears, which does not vanish as N→∞. Solution: wavelet decomposition (another orthogonal L² basis) - used in JPEG-2000 instead of DCT.
Why do the Fourier coefficients of a smooth function f ∈ C∞ decay rapidly (fast cₙ decrease), while those of a function with discontinuities decay slowly?
For f ∈ Cᵏ, integration by parts k times in cₙ = ∫f(x)e⁻ⁱⁿˣ dx transfers derivatives onto eⁱⁿˣ, yielding cₙ = O(1/nᵏ). A discontinuity prevents integration by parts (boundary terms survive), limiting decay to O(1/n). This slow decay explains the Gibbs phenomenon: many high-frequency harmonics are needed to reproduce sharp jumps.
Parseval's Identity: Energy is Conserved
Parseval's identity is the most beautiful equality in signal theory: the total energy of a signal in the time domain equals the sum of squared amplitudes of all harmonics. This is the Pythagorean theorem for infinite-dimensional space.
```python import numpy as np # Parseval's identity: numerical verification N = 512 x = np.linspace(0, 2*np.pi, N, endpoint=False) # Test function: mixture of frequencies f = 3*np.sin(x) + 1.5*np.cos(2*x) + 0.7*np.sin(5*x) # Energy in time domain energy_time = np.trapz(f**2, x) print(f"||f||_2^2 (time domain): {energy_time:.4f}") # Fourier coefficients via FFT C = np.fft.fft(f) / N energy_freq = 2*np.pi * np.sum(np.abs(C)**2) print(f"||f||_2^2 (frequency domain): {energy_freq:.4f}") print(f"Relative error: {abs(energy_time - energy_freq)/energy_time:.2e}") # Error < 1e-12 - Parseval's identity holds exactly! # Analytical check: ||3sin||^2 + ||1.5cos||^2 + ||0.7sin||^2 = pi(9+2.25+0.49) theory = np.pi * (9 + 2.25 + 0.49) print(f"Analytical: {theory:.4f}") ```
**ML application: WaveNet (DeepMind).** Audio generation (Google Assistant, text-to-speech) - neural network in the frequency domain. Parseval's identity guarantees: training in L²-norm on Fourier coefficients is equivalent to training on the time-domain signal - choose whichever representation is convenient.
Parseval's identity states that the Fourier transform preserves the L²-norm. What does this mean for the FFT operator as a linear transformation?
Parseval's identity ‖f‖₂² = 2π·Σ|cₙ|² directly states that the Fourier transform F preserves the L²-norm (up to 1/√(2π) scaling). Unitarity F*F = I means the DFT matrix U satisfies U*U = I - an orthogonal transformation, i.e., a rotation/reflection in Hilbert space that preserves inner products, lengths, and angles.
L² in Machine Learning: RKHS and Kernels
Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions with a reproducing kernel. This is the foundation of kernel methods in ML: SVM, Gaussian Processes, kernel PCA. Mercer's theorem connects kernels to L² spaces.
The kernel K(x,y) = ⟨φ(x), φ(y)⟩_H is the inner product in some Hilbert space H. The kernel trick: work with inner products without computing φ(x) explicitly (which may be infinite-dimensional).
```python import numpy as np # Gaussian kernel as an inner product in L² # K(x,y) = exp(-||x-y||^2 / (2sigma^2)) = <phi(x), phi(y)> in infinite-dim RKHS def gaussian_kernel(X, Y, sigma=1.0): """K(x,y) = exp(-||x-y||^2 / 2*sigma^2)""" X2 = np.sum(X**2, axis=1, keepdims=True) Y2 = np.sum(Y**2, axis=1, keepdims=True) dist2 = X2 + Y2.T - 2 * X @ Y.T return np.exp(-dist2 / (2 * sigma**2)) np.random.seed(42) X_train = np.random.randn(50, 2) # 50 points in R^2 X_test = np.random.randn(10, 2) # Gram matrix - discrete version of a HS operator K_train = gaussian_kernel(X_train, X_train) print(f"Gram matrix: {K_train.shape}, symmetric: {np.allclose(K_train, K_train.T)}") print(f"PSD (all lambda > 0): {np.all(np.linalg.eigvalsh(K_train) > -1e-10)}") # Kernel ridge regression: w* = (K + lambda*I)^{-1} y np.random.seed(0) y = np.sin(X_train[:, 0]) + 0.1*np.random.randn(50) lambda_reg = 0.1 alpha = np.linalg.solve(K_train + lambda_reg * np.eye(50), y) # Prediction on test set K_test = gaussian_kernel(X_test, X_train) y_pred = K_test @ alpha print(f"Predictions (first 3): {y_pred[:3].round(3)}") ```
**Mercer's theorem:** a symmetric continuous PSD kernel K(x,y) decomposes as K(x,y) = Σₙ λₙ φₙ(x)φₙ(y), where φₙ are orthonormal functions (eigenfunctions of the integral operator T_K), λₙ ≥ 0. This L² decomposition bridges kernels and spectral theory.
In Gaussian Process regression, the prediction f*(x) = K(x, X_train)·(K_train + σ²I)^{-1}·y. How does the L² interpretation explain why the prediction is 'smooth' near training points?
The Gaussian RKHS implicitly penalizes all derivatives of f via ‖f‖_H, selecting the smoothest function consistent with training data. Near a training point xᵢ, K(x,xᵢ) = exp(-‖x-xᵢ‖²/2σ²) ≈ 1, so the prediction is strongly correlated with y(xᵢ). Far away, K(x,xᵢ) → 0 and the influence vanishes - smoothness is enforced by the RKHS geometry, not just regularization.
Key Ideas
- **Lp(Ω)**: completion under norm ‖f‖_p = (∫|f|^p)^{1/p}. L² is special: norm from inner product ⟨f,g⟩ = ∫fg
- **Riesz-Fischer theorem**: L² is complete - limits of Cauchy sequences remain in L²
- **Fourier series**: f = Σcₙeⁱⁿˣ, cₙ = ⟨f, eⁱⁿˣ⟩. Orthogonal basis {eⁱⁿˣ/√(2π)} in L²[0,2π]
- **Parseval's identity**: ‖f‖₂² = 2π·Σ|cₙ|² - energy is preserved under Fourier transform
- **RKHS**: K(x,y) = ⟨φ(x),φ(y)⟩_H - kernel trick as inner product in (infinite-dimensional) L²-space
- **Mercer's theorem**: K(x,y) = Σλₙφₙ(x)φₙ(y) - spectral decomposition of the kernel via L² basis
Related Topics
L² spaces are the foundation of spectral theory and kernel methods:
- Spectral Theory — The Laplace operator -Δ is diagonalized in L² via the Fourier basis: spectral theorem = generalized Fourier analysis
- Compact Operators — Integral operators with kernel K ∈ L²(Ω×Ω) are Hilbert-Schmidt operators; Mercer's theorem follows from their spectrum
Вопросы для размышления
- The Fourier transform is a unitary operator L² → l². What does this mean for numerical analysis: why is it better to process a signal in the frequency domain rather than the time domain?
- Kernel SVM in RKHS with a Gaussian kernel finds a separating hyperplane in infinite-dimensional space. Why does this not lead to overfitting with proper regularization?
- Parseval's identity says: ‖f‖₂² = Σ|cₙ|². How is this used in signal compression (MP3, JPEG) to decide which coefficients can be discarded?