Topology
Metric Spaces
Why does Q-learning find the optimal policy? Why does Newton's method converge quadratically? Why is Wasserstein GAN more stable than vanilla GAN? The answer is always the same: the Banach fixed-point theorem in a metric space. Metric spaces are the language in which convergence is written. The Bellman operator with gamma = 0.99 is a contraction with factor 0.99: after 1000 steps the error shrinks by roughly 4.3 * 10^-5. That is why RL works. That is why the metric matters.
- **Reinforcement learning:** the Bellman operator is a contraction with factor gamma; Banach guarantees a unique optimal value function and convergence of value iteration from any initialization
- **Wasserstein GAN:** the Wasserstein metric on distributions combined with a Lipschitz constraint on the discriminator makes training a contraction - hence more stable than KL-based GAN
- **Newton's method:** quadratic convergence follows from the method being a near-zero contraction in a neighborhood of the root
- **Lipschitz networks:** the constraint ||f(x)-f(y)|| <= L||x-y|| in WGAN and robust ML is a metric compatibility requirement in the spirit of Banach contractions
- **Frechet distance (FID):** a metric on the space of curves used in trajectory similarity and generative model evaluation
Предварительные знания
Metric Space Definition
Distance is intuitive. A metric space makes it rigorous with axioms. A **pair (X, d)**, where d: X x X -> R is a metric satisfying four rules: (M1) d(x,y) >= 0; (M2) d(x,y) = 0 if and only if x = y; (M3) d(x,y) = d(y,x); (M4) d(x,z) <= d(x,y) + d(y,z). Break any rule and the algorithms that rely on the metric break with it.
**Cosine similarity is not a metric.** The function 1 - cos(x,y) violates M2 without normalization: two non-zero vectors with the same direction give distance 0 but are not equal. This is exactly why kNN with cosine similarity behaves differently from kNN with Euclidean distance - they live in different metric spaces.
| Metric | Formula | ML Application |
|---|---|---|
| Euclidean (L2) | sqrt(sum(xi-yi)^2) | KNN, k-means, embedding similarity |
| Manhattan (L1) | sum|xi-yi| | LASSO regularization, sparse features |
| Chebyshev (L-inf) | max|xi-yi| | Adversarial attacks (epsilon-ball in L-inf) |
| Wasserstein | inf E[d(X,Y)] over coupling | WGAN - more stable than KL divergence |
| Frechet | metric on curves | Trajectory similarity, motion planning |
Different metrics can generate the same topology - such metrics are called **equivalent**. L1, L2, and L-inf on R^n are equivalent. What changes when the metric changes: the shape of balls and, as a consequence, the notion of nearest neighbor.
Is d(x, y) = (x - y)^2 a valid metric on R?
Balls and Open Sets
The **open ball** of radius r > 0 centered at x: B(x, r) = {y : d(x, y) < r}. Closed ball: B_closed(x, r) = {y : d(x, y) <= r}. Open balls form a basis for the topology: a set U is open if and only if every point of U has some ball centered there that fits entirely inside U.
**A ball need not look round.** In the discrete metric (d(x,y) = 0 if x=y, else 1): B(x, 0.5) = {x} (a single point), B(x, 2) = X (the whole space). In a graph metric, a ball is the set of vertices within graph distance r. The shape of the ball is entirely determined by the metric.
**Adversarial attacks and ball shape.** An L-inf epsilon-ball perturbation means: change each pixel by at most epsilon - a hypercube in pixel space. An L2 attack is a hypersphere. An L1 attack is a cross-polytope. WGAN uses balls in the Wasserstein metric - completely different geometry. The choice of metric is the choice of what counts as an attack.
In the discrete metric (d(x,y) = 0 if x=y, else 1), which statement is true?
Convergence and Completeness
A sequence (x_n) **converges** to x if d(x_n, x) -> 0 as n -> infinity: for every eps > 0 there exists N such that d(x_n, x) < eps for all n > N. Limits are unique - a consequence of the Hausdorff property. The same epsilon-N language as ordinary calculus, but now the distance is an arbitrary metric.
**Cauchy sequence:** (x_n) is Cauchy if d(x_n, x_m) -> 0 as n, m -> infinity. Every convergent sequence is Cauchy. The converse holds only in **complete** spaces. **Completeness:** (X, d) is complete if every Cauchy sequence converges. Completeness = no holes for iterative algorithms to fall into.
| Space | Complete? | Notes |
|---|---|---|
| R | Yes | Completion of Q |
| Q | No | x_n -> sqrt(2), but sqrt(2) is not in Q |
| C([0,1]) with sup norm | Yes | Banach space |
| C([0,1]) with L2 norm | No | Limit can be a discontinuous function |
| L^2([0,1]) | Yes | Hilbert space; backbone of quantum mechanics and deep learning |
| (0,1) subset R | No | x_n = 1/n -> 0 but 0 is not in (0,1) |
Completeness determines whether an iterative algorithm works in a given space. Numerical methods are analyzed in complete spaces - Banach and Hilbert. The space L^2(R) is the backbone of signal processing, quantum mechanics, and gradient flow analysis for neural networks.
Is C([0,1]) with the metric d(f,g) = integral_0^1 |f(x)-g(x)| dx complete?
The Banach Fixed-Point Theorem
**Banach's theorem:** If (X, d) is a complete metric space and T: X -> X is a **contraction** (there exists 0 <= q < 1 such that d(Tx, Ty) <= q * d(x,y) for all x, y), then T has a unique fixed point x* = T(x*), and the iterates x_{n+1} = T(x_n) converge to x* from any starting point.
**Convergence rate:** d(x_n, x*) <= q^n / (1-q) * d(x_1, x_0). Geometric speed: each step multiplies the error by q < 1. The Bellman operator with discount factor gamma = 0.99 is a contraction with factor 0.99. After 1000 steps the error shrinks by a factor of 0.99^1000, approximately 4.3 * 10^-5. That is why RL agents converge.
**Contractions in ML and numerical analysis.** The Bellman operator (RL, gamma < 1) - contraction in the sup norm. Newton's method on strongly convex functions - contraction near the root, hence quadratic convergence. Wasserstein GAN with a Lipschitz constraint (||f(x)-f(y)|| <= L||x-y||) - enforces the discriminator to be a contraction. Jacobi and Gauss-Seidel for linear systems - contractions under diagonal dominance. Banach's theorem explains them all.
Banach's theorem delivers three things at once: existence of a solution, uniqueness, and a constructive algorithm with a convergence rate bound. That triple is exactly what a practitioner needs when designing an iterative method - not just 'a solution exists' but 'here is the algorithm and here is the guarantee'.
Value iteration converges to the optimal V* because the Bellman operator T is a contraction with factor gamma < 1. What does Banach's theorem guarantee?
Key Ideas
- **Metric d:** four axioms (non-negativity, identity, symmetry, triangle); L1/L2/L-inf are equivalent metrics on R^n with different ball shapes; cosine similarity is not a metric
- **Balls B(x,r):** basis for the topology; ball shape is entirely determined by the metric; adversarial epsilon-balls in L-inf, L2, L1 are different objects
- **Completeness:** every Cauchy sequence converges; R and L^2 are complete, Q and C([0,1]) with L2 norm are not; completeness means no holes for iterative algorithms
- **Banach theorem:** a contraction T on complete (X,d) has a unique fixed point; iterates T^n(x_0) -> x* at geometric speed q^n; Bellman, Newton, Jacobi all live here
Related Topics
Metric spaces are a special case of topological spaces with extra structure for analyzing convergence:
- Compactness — In metric spaces compactness is equivalent to sequential compactness (Bolzano-Weierstrass)
- Manifolds — A Riemannian manifold is a metric space with smooth structure; geodesics generalize straight lines
Вопросы для размышления
- Prove that a closed subspace of a complete metric space is itself complete. What does this imply for closed convex sets in L^2?
- Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n). Under what condition is it a contraction, and why does it converge quadratically rather than linearly?
- C([0,1]) with the sup norm is complete; with the L2 norm it is not. What happens if only polynomials of degree at most n are considered?
Связанные уроки
- top-03 — Compactness and connectedness - the previous lesson
- top-05 — Next step: topological groups and homogeneous spaces
- fa-01 — Banach space = complete normed space; the metric comes from the norm
- calc-01-sequences — Sequence convergence uses the same epsilon-N language
- cvx-01 — Convex optimization algorithms are analyzed as contractions in complete spaces
- la-01-vectors-intro
- stats-21