Functional Analysis
Spectral Theory of Operators
Google PageRank = dominant eigenvector of a 50 billion × 50 billion matrix. Found in 50 power iteration steps - spectral theory in action. PCA = spectral theorem for the covariance matrix. Quantum mechanics = physics of self-adjoint operators. Three different fields, one mathematics.
- **Google PageRank:** power iteration on a graph with 50 billion vertices. Spectral gap λ₂/λ₁ ≈ 0.85 (damping factor) - convergence speed. ~50 iterations.
- **PCA and kernel PCA:** spectral theorem for the symmetric covariance matrix. Kernel PCA (sklearn) - spectral theorem for the integral operator with kernel K.
- **Quantum chemistry:** DFT (Density Functional Theory) - finding eigenfunctions of the Hamiltonian H = -∇² + V(x). All chemistry of bonds through the spectrum of H.
Spectrum: Three Types, One PageRank
Google PageRank is the dominant eigenvector of a 50 billion × 50 billion matrix. Found via power iteration - spectral theory in action. Understanding this requires just one concept: what is the spectrum of an operator.
For a bounded operator A: X → X on a Banach space: the **resolvent set** ρ(A) = {λ ∈ C : (λI - A)⁻¹ exists and is bounded}. The **spectrum** σ(A) = C \ ρ(A). In finite dimensions: σ(A) = set of eigenvalues. In infinite dimensions - three distinct components.
**Spectrum of an operator**: σ(A) = C \ ρ(A), ρ(A) = {λ: (A−λI)⁻¹ ∈ B(X)} - closed subset of C; for bounded A: σ(A) ⊆ {λ: |λ| ≤ ‖A‖}; spectral radius r(A) = lim ‖Aⁿ‖^{1/n}. **Point (eigenvalue) spectrum**: σ_p(A) = {λ: ker(A−λI) ≠ 0} - eigenvalues; in finite dimensions σ(A) = σ_p(A). **Continuous spectrum**: σ_c(A) = {λ: ker=0, Im(A−λ) dense, (A−λ)⁻¹ unbounded} - example: multiplication operator M_x on L²[0,1] has σ(M_x) = [0,1]. **Residual spectrum**: σ_r(A) = {λ: ker=0, Im(A−λ) not dense} - for self-adjoint operators σ_r = ∅; characteristic of non-self-adjoint operators. **Resolvent operator**: R(λ) = (A−λI)⁻¹, λ ∈ ρ(A) - analytic B(X)-valued function; Hilbert identity: R(λ)−R(μ) = (λ−μ)R(λ)R(μ); ‖R(λ)‖ → ∞ as λ approaches σ(A).
```python import numpy as np # Three spectrum types: illustrated # 1. 4x4 matrix: only point spectrum A = np.array([[2, 1, 0, 0], [0, 3, 1, 0], [0, 0, 3, 0], [0, 0, 0, 5]], dtype=float) evals = np.linalg.eigvals(A) print(f"4x4 matrix, sigma(A) = sigma_p(A) = {sorted(np.real(evals))}") # 2. Multiplication operator M_x on discretized L^2[0,1] # M_x f(x) = x*f(x): sigma(M_x) = [0,1] - continuous spectrum N = 100 x_grid = np.linspace(0, 1, N) M_x = np.diag(x_grid) # discretization of multiplication operator evals_Mx = np.linalg.eigvalsh(M_x) print(f"Multiplication operator M_x: min lambda = {evals_Mx[0]:.3f}, max lambda = {evals_Mx[-1]:.3f}") print(f"sigma(M_x) ~= [{evals_Mx[0]:.3f}, {evals_Mx[-1]:.3f}] - continuous spectrum") # 3. Discretization of -d^2/dx^2 on [0,pi]: sigma = {n^2 : n=1,2,...} h = np.pi / (N + 1) diag_main = 2 * np.ones(N) / h**2 diag_off = -np.ones(N-1) / h**2 L = np.diag(diag_main) + np.diag(diag_off, 1) + np.diag(diag_off, -1) evals_L = np.sort(np.linalg.eigvalsh(L)) print(f"\n-d^2/dx^2 on [0,pi]: first 5 lambda = {np.round(evals_L[:5], 3)}") print(f"Expected n^2: {[1, 4, 9, 16, 25]}") ```
**PCA as spectral theory:** the covariance matrix Σ = (1/n)·X^T·X is self-adjoint (symmetric). Its spectrum σ(Σ) - variances along principal components. Eigenvectors - directions of maximum variation. Spectral theory = PCA in infinite dimensions.
The multiplication operator M_x f(x) = x·f(x) on L²[0,1] has σ(M_x) = [0,1], but no eigenvalues. Explain: how can λ ∈ [0,1] be in the spectrum if (M_x - λI)f = (x-λ)f ≠ 0 for any f ≠ 0?
(M_x − λI)f = (x−λ)f(x) = 0 a.e. only if f = 0, so the operator is injective - λ is not an eigenvalue. However, the inverse [(M_x − λI)⁻¹g](x) = g(x)/(x−λ) is unbounded: functions g concentrated near x=λ map to arbitrarily large images. This is the continuous spectrum: injectivity holds, bounded inverse does not - impossible in finite dimensions.
Self-Adjoint Operators: Real Spectrum
The Hamiltonian in quantum mechanics must be self-adjoint - otherwise measurable energies would be complex. The covariance matrix in PCA is symmetric - otherwise principal components would not be orthogonal. Self-adjointness is the fundamental condition for physical correctness.
```python import numpy as np # Self-adjoint vs non-self-adjoint: spectrum comparison np.random.seed(42) B = np.random.randn(6, 6) A_sa = (B + B.T) / 2 # self-adjoint (symmetric) A_non = B # not self-adjoint ev_sa = np.linalg.eigvals(A_sa) ev_non = np.linalg.eigvals(A_non) print("Self-adjoint: imaginary parts of eigenvalues:") print(f" max |Im(lambda)| = {np.max(np.abs(np.imag(ev_sa))):.2e}") # ~= 0 print("Non-self-adjoint: imaginary parts:") print(f" max |Im(lambda)| = {np.max(np.abs(np.imag(ev_non))):.4f}") # > 0 # PCA: covariance matrix is self-adjoint np.random.seed(1) X = np.random.randn(100, 4) # 100 points in R^4 X[:, 1] *= 3 # second component more variable Sigma = X.T @ X / 100 # covariance matrix ev_pca, evec_pca = np.linalg.eigh(Sigma) # for symmetric matrices print(f"\nPCA eigenvalues (variances): {np.round(ev_pca, 3)}") print(f"Orthogonality of PCs: V^T V = I? error={np.max(np.abs(evec_pca.T @ evec_pca - np.eye(4))):.2e}") ```
**For self-adjoint A:** ‖A‖ = r(A) (operator norm = spectral radius). This makes numerical computations stable. For non-symmetric matrices ‖A‖ >> r(A) is possible - a source of numerical instability.
The covariance matrix Σ = X^T X / n in PCA is symmetric and positive semi-definite. How do these two properties together guarantee that PCA components exist and are orthogonal?
Symmetry (Σ = Σᵀ, i.e., self-adjoint) guarantees: all eigenvalues are real and eigenvectors for distinct eigenvalues are orthogonal - the spectral theorem yields Σ = VΛVᵀ with orthogonal V. Positive semi-definiteness (xᵀΣx ≥ 0) ensures λᵢ ≥ 0, so principal component variances are non-negative. Together: PCA exists, is orthogonal, and gives real non-negative explained variances.
The Spectral Theorem: Diagonalization in Infinite Dimensions
The spectral theorem is the central result of operator theory. Every bounded self-adjoint operator is unitarily equivalent to a multiplication operator. This is infinite-dimensional diagonalization. The Fourier transform is the spectral theorem for -d²/dx².
```python import numpy as np from scipy import linalg # Spectral theorem: functions of operators # f(A) = U * diag(f(lambda_i)) * U^T np.random.seed(42) B = np.random.randn(5, 5) A = B + B.T # self-adjoint (symmetric) # Spectral decomposition: A = U Lambda U^T lambdas, U = np.linalg.eigh(A) # Matrix functions via spectral decomposition def matrix_func(A_evals, U, f): return U @ np.diag(f(A_evals)) @ U.T # Matrix exponential: exp(A) = U diag(e^lambda_i) U^T exp_A_spectral = matrix_func(lambdas, U, np.exp) exp_A_scipy = linalg.expm(A) print(f"exp(A) via spectrum vs scipy: error = {np.max(np.abs(exp_A_spectral - exp_A_scipy)):.2e}") # Power iteration for PageRank: finding the dominant eigenvector np.random.seed(0) A_pos = np.abs(np.random.randn(6, 6)) A_pos = A_pos / A_pos.sum(axis=0) # columns sum to 1 (Markov matrix) v = np.ones(6) / 6 for i in range(50): v_new = A_pos @ v v_new /= v_new.sum() if np.linalg.norm(v_new - v) < 1e-10: print(f"Power iteration converged in {i} iterations") break v = v_new print(f"PageRank: {np.round(v, 4)}") ```
**Fourier = spectral theorem for -d²/dx².** The operator -d²/dx² on L²(R) is self-adjoint. The Fourier transform U = F 'diagonalizes' it: F(-d²/dx²)F^{-1} = M_{|ξ|²} (multiplication operator by |ξ|²). Fourier transforming a differential operator gives multiplication by its symbol - the foundation of all PDE theory.
Why is power iteration used to compute the largest eigenvalue of a large sparse matrix rather than direct diagonalization?
Direct diagonalization (e.g., np.linalg.eigh) costs O(n³) operations and O(n²) memory - infeasible for 10⁹×10⁹ matrices like the web graph. Power iteration only multiplies the sparse matrix by a vector (O(nnz) per step) and stores just the current vector O(n). It converges to the dominant eigenvector in O(1/spectral_gap) steps. Google uses ~50 iterations for PageRank on billions of nodes.
Key Ideas
- **Spectrum σ(A) = C \ ρ(A)**: three types - point (eigenvalues), continuous (unbounded inverse), residual (non-dense image)
- **Self-adjoint A = A***: σ(A) ⊆ R, σ_r = ∅, eigenvectors are orthogonal
- **Spectral theorem**: A = ∫λ dE(λ), f(A) = ∫f(λ) dE(λ) - functional calculus
- **Fourier = spectral theorem for -d²/dx²**: F(-d²/dx²)F^{-1} = M_{|ξ|²}
- **Power iteration**: xₖ₊₁ = A·xₖ / ‖A·xₖ‖ → dominant eigenvector. Rate: (λ₂/λ₁)ᵏ
Related Topics
Spectral theory is at the heart of functional analysis and ML:
- L² and Lebesgue Spaces — The spectral theorem diagonalizes operators in L²; Fourier analysis is the special case for -d²/dx²
- Compact Operators — For compact self-adjoint operators the spectral theorem is especially strong: countable spectrum, ONB from eigenfunctions
Вопросы для размышления
- The Fourier transform diagonalizes the differentiation operator. What does the wavelet transform diagonalize? Why are wavelets better for non-smooth signals?
- The PageRank algorithm uses damping factor d = 0.85. This changes the spectrum of the transition matrix: λ₁ = 1, λ₂ ≤ d = 0.85. How does the damping factor affect the number of iterations for convergence?
- In quantum computing: finding the smallest eigenvalue of a Hamiltonian (ground state energy) is QMA-hard. What quantum algorithms attempt to solve it, and how does the spectral theorem relate?