Measure Theory

Duality and the Riesz Representation Theorem

How does a neural network 'see' data? What is an SVM kernel geometrically? Duality theory answers: each observation is a functional in the hypothesis space, the dual optimization finds coefficients for these functionals, and RKHS makes this space concrete and computable.

**SVMs and kernel methods:** the kernel trick works because SVMs operate in an RKHS where evaluation functionals are bounded; the dual variables αᵢ are weights on these functionals
**Neural Tangent Kernel:** infinitely wide networks train like kernel methods in an RKHS, providing theoretical convergence and generalization guarantees
**Gaussian processes:** a GP is a measure on a function space, and its kernel generates an RKHS in which the GP searches for the optimum

Предварительные знания

Lp Spaces

Dual Space and Bounded Linear Functionals

Every normed space X has a **dual space** X* - the collection of all bounded linear functionals on X. Understanding duality is the key to interpreting dual optimization problems and to grasping what a 'linear model' means in an infinite-dimensional feature space.

**Bounded linear functional:** a linear map φ: X → ℝ is bounded if: ‖φ‖_{X*} = sup { |φ(f)| : ‖f‖_X ≤ 1 } < ∞ **Dual space X*** = { all bounded linear functionals on X } with the operator norm ‖φ‖_{X*}. X* is always complete (Banach), even if X is not.

**Dual problem in optimization:** in SVMs the primal problem (finding a hyperplane) transforms into the dual (finding support vectors). This is duality theory in action: each support vector corresponds to a functional in the dual of the feature space, and the dual variables αᵢ are its coordinates.

The norm of a dual functional ‖φ‖_{X*} = sup_{‖f‖≤1} |φ(f)| measures the 'sensitivity' of the functional to unit perturbations. In ML this quantifies how much the model's prediction can change in response to a unit-norm perturbation of the input.

Is the functional φ(f) = f(0) bounded on L²([0,1])?

The Dual of Lp is Lq

The main result on Lp duality: the dual of Lp is Lq, where 1/p + 1/q = 1. Every bounded linear functional on Lp has an integral representation via a function in Lq.

**Riesz Representation Theorem (for Lp):** let 1 ≤ p < ∞ and 1/p + 1/q = 1. Then: (Lp(μ))* ≅ Lq(μ) More precisely: for every φ ∈ (Lp)* there exists a unique g ∈ Lq such that: φ(f) = ∫ f · g dμ for all f ∈ Lp and ‖φ‖_{(Lp)*} = ‖g‖_q. **Exception:** (L¹)* = L∞, but (L∞)* strictly contains L¹.

**Special case p=2:** L² is self-dual! Its dual is again L². This follows from L² being a Hilbert space: the inner product ⟨f,g⟩ = ∫fg dμ provides the isomorphism with the dual. This property sets L² apart from all other Lp spaces.

**Statistics application:** in ordinary least squares regression (an L²-problem), the estimator β = (X^TX)^{-1}X^Ty is the dual-space solution. The primal problem lives in the parameter space; the dual lives in the observation space. The matrix X^TX mediates between them.

The dual space of L³([0,1]) is:

Reproducing Kernel Hilbert Spaces (RKHS)

The Riesz theorem for L² has an important extension: in special Hilbert spaces called **RKHS** (Reproducing Kernel Hilbert Spaces), the point evaluation functional f ↦ f(x₀) becomes bounded. This makes RKHS the natural function space for machine learning theory.

**RKHS:** a Hilbert space H of functions f: X → ℝ is called a reproducing kernel Hilbert space if for every x ∈ X the evaluation functional φ_x: f ↦ f(x) is bounded on H. By the Riesz theorem: there exists a unique **reproducing kernel** k: X×X → ℝ such that: f(x) = ⟨f, k(x,·)⟩_H for all f ∈ H, x ∈ X The function k(x,·) is the 'dual object' corresponding to the point x in H.

**Representer Theorem:** the optimal solution in RKHS when minimizing the regularized functional Σ L(f(xᵢ), yᵢ) + λ‖f‖²_H has the form f* = Σᵢ αᵢ k(xᵢ, ·). This is the theoretical justification of the kernel trick in SVMs: instead of working in the infinite-dimensional H, it suffices to solve a finite-dimensional system over the training points.

RKHS is the bridge between measure theory and practical ML. Examples: the Gaussian kernel generates the space of infinitely differentiable functions; the polynomial kernel generates polynomial functions; the linear kernel gives ordinary Euclidean space. Choosing a kernel = choosing a hypothesis space.

Why is f ↦ f(x₀) bounded in an RKHS but not in L²?

Functional Analysis of Neural Networks

Dual space theory actively informs our understanding of neural networks. In the infinite-width limit, neural networks connect to RKHS via the Neural Tangent Kernel, and neural network regularization can be interpreted through norms in function spaces.

**Neural Tangent Kernel (NTK):** as network width → ∞ with small learning rate, the network behaves like a kernel method in an RKHS with kernel: k_NTK(x, x') = E_θ[∇_θ f(x,θ) · ∇_θ f(x',θ)] Training is equivalent to minimizing the squared loss in this RKHS, and the weights converge to the kernel regression solution.

**Barron norm:** for single-hidden-layer networks a specific norm is defined that controls the approximation capacity. Functions with small Barron norm can be approximated by a network with polynomially many neurons, independent of input dimension - a theoretical explanation for why neural networks escape the curse of dimensionality.

The dual space in ML is the space of 'data valuations'. In kernel methods each training point xᵢ contributes a functional k(xᵢ, ·), and the dual variables αᵢ are their weights. Support vectors in SVMs are precisely the training points whose αᵢ are nonzero - the most informative functionals.

The Neural Tangent Kernel connects infinitely wide neural networks to:

Key Ideas

**Dual space X*:** bounded linear functionals on X; the dual of Lp is Lq for 1/p+1/q=1; L² is self-dual
**Riesz theorem for Lp:** every bounded functional on Lp has the form φ(f) = ∫fg dμ for a unique g ∈ Lq
**RKHS:** a Hilbert space where evaluation functionals are bounded; the reproducing kernel k implements f(x) = ⟨f, k(x,·)⟩
**Applications:** SVM kernel trick, Neural Tangent Kernel, Bayesian optimization, the representer theorem

Вопросы для размышления

Why is L² self-dual while L¹ is not? What does self-duality mean geometrically for the unit ball?
In SVMs the dual variables αᵢ measure the 'importance' of each training point. How does this relate to the notion of a bounded functional in the dual space?
The Gaussian RKHS contains infinitely differentiable functions. How does this constrain what an infinitely wide neural network can learn in the NTK regime?

Связанные уроки

top-04

Measure Theory

Duality and the Riesz Representation Theorem

**SVMs and kernel methods:** the kernel trick works because SVMs operate in an RKHS where evaluation functionals are bounded; the dual variables αᵢ are weights on these functionals
**Neural Tangent Kernel:** infinitely wide networks train like kernel methods in an RKHS, providing theoretical convergence and generalization guarantees
**Gaussian processes:** a GP is a measure on a function space, and its kernel generates an RKHS in which the GP searches for the optimum

Предварительные знания

Lp Spaces

Dual Space and Bounded Linear Functionals

Is the functional φ(f) = f(0) bounded on L²([0,1])?

The Dual of Lp is Lq

The main result on Lp duality: the dual of Lp is Lq, where 1/p + 1/q = 1. Every bounded linear functional on Lp has an integral representation via a function in Lq.

The dual space of L³([0,1]) is:

Reproducing Kernel Hilbert Spaces (RKHS)

Why is f ↦ f(x₀) bounded in an RKHS but not in L²?

Functional Analysis of Neural Networks

The Neural Tangent Kernel connects infinitely wide neural networks to:

Key Ideas

**Dual space X*:** bounded linear functionals on X; the dual of Lp is Lq for 1/p+1/q=1; L² is self-dual
**Riesz theorem for Lp:** every bounded functional on Lp has the form φ(f) = ∫fg dμ for a unique g ∈ Lq
**RKHS:** a Hilbert space where evaluation functionals are bounded; the reproducing kernel k implements f(x) = ⟨f, k(x,·)⟩
**Applications:** SVM kernel trick, Neural Tangent Kernel, Bayesian optimization, the representer theorem

Вопросы для размышления

Why is L² self-dual while L¹ is not? What does self-duality mean geometrically for the unit ball?
In SVMs the dual variables αᵢ measure the 'importance' of each training point. How does this relate to the notion of a bounded functional in the dual space?
The Gaussian RKHS contains infinitely differentiable functions. How does this constrain what an infinitely wide neural network can learn in the NTK regime?

Связанные уроки

top-04

Duality and the Riesz Representation Theorem

Предварительные знания

Dual Space and Bounded Linear Functionals

The Dual of Lp is Lq

Reproducing Kernel Hilbert Spaces (RKHS)

Functional Analysis of Neural Networks

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Duality and the Riesz Representation Theorem

Предварительные знания

Dual Space and Bounded Linear Functionals

The Dual of Lp is Lq

Reproducing Kernel Hilbert Spaces (RKHS)

Functional Analysis of Neural Networks

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки