Topology

Metric Spaces

Why does Q-learning find the optimal policy? Why does Newton's method converge quadratically? Why is Wasserstein GAN more stable than vanilla GAN? The answer is always the same: the Banach fixed-point theorem in a metric space. Metric spaces are the language in which convergence is written. The Bellman operator with gamma = 0.99 is a contraction with factor 0.99: after 1000 steps the error shrinks by roughly 4.3 * 10^-5. That is why RL works. That is why the metric matters.

**Reinforcement learning:** the Bellman operator is a contraction with factor gamma; Banach guarantees a unique optimal value function and convergence of value iteration from any initialization
**Wasserstein GAN:** the Wasserstein metric on distributions combined with a Lipschitz constraint on the discriminator makes training a contraction - hence more stable than KL-based GAN
**Newton's method:** quadratic convergence follows from the method being a near-zero contraction in a neighborhood of the root
**Lipschitz networks:** the constraint ||f(x)-f(y)|| <= L||x-y|| in WGAN and robust ML is a metric compatibility requirement in the spirit of Banach contractions
**Frechet distance (FID):** a metric on the space of curves used in trajectory similarity and generative model evaluation

Предварительные знания

Connectedness and Compactness

Metric Space Definition

Distance is intuitive. A metric space makes it rigorous with axioms. A **pair (X, d)**, where d: X x X -> R is a metric satisfying four rules: (M1) d(x,y) >= 0; (M2) d(x,y) = 0 if and only if x = y; (M3) d(x,y) = d(y,x); (M4) d(x,z) <= d(x,y) + d(y,z). Break any rule and the algorithms that rely on the metric break with it.

**Cosine similarity is not a metric.** The function 1 - cos(x,y) violates M2 without normalization: two non-zero vectors with the same direction give distance 0 but are not equal. This is exactly why kNN with cosine similarity behaves differently from kNN with Euclidean distance - they live in different metric spaces.

Metric	Formula	ML Application
Euclidean (L2)	sqrt(sum(xi-yi)^2)	KNN, k-means, embedding similarity
Manhattan (L1)	sum\|xi-yi\|	LASSO regularization, sparse features
Chebyshev (L-inf)	max\|xi-yi\|	Adversarial attacks (epsilon-ball in L-inf)
Wasserstein	inf E[d(X,Y)] over coupling	WGAN - more stable than KL divergence
Frechet	metric on curves	Trajectory similarity, motion planning

Different metrics can generate the same topology - such metrics are called **equivalent**. L1, L2, and L-inf on R^n are equivalent. What changes when the metric changes: the shape of balls and, as a consequence, the notion of nearest neighbor.

Is d(x, y) = (x - y)^2 a valid metric on R?

Balls and Open Sets

The **open ball** of radius r > 0 centered at x: B(x, r) = {y : d(x, y) < r}. Closed ball: B_closed(x, r) = {y : d(x, y) <= r}. Open balls form a basis for the topology: a set U is open if and only if every point of U has some ball centered there that fits entirely inside U.

**A ball need not look round.** In the discrete metric (d(x,y) = 0 if x=y, else 1): B(x, 0.5) = {x} (a single point), B(x, 2) = X (the whole space). In a graph metric, a ball is the set of vertices within graph distance r. The shape of the ball is entirely determined by the metric.

**Adversarial attacks and ball shape.** An L-inf epsilon-ball perturbation means: change each pixel by at most epsilon - a hypercube in pixel space. An L2 attack is a hypersphere. An L1 attack is a cross-polytope. WGAN uses balls in the Wasserstein metric - completely different geometry. The choice of metric is the choice of what counts as an attack.

In the discrete metric (d(x,y) = 0 if x=y, else 1), which statement is true?

Convergence and Completeness

A sequence (x_n) **converges** to x if d(x_n, x) -> 0 as n -> infinity: for every eps > 0 there exists N such that d(x_n, x) < eps for all n > N. Limits are unique - a consequence of the Hausdorff property. The same epsilon-N language as ordinary calculus, but now the distance is an arbitrary metric.

**Cauchy sequence:** (x_n) is Cauchy if d(x_n, x_m) -> 0 as n, m -> infinity. Every convergent sequence is Cauchy. The converse holds only in **complete** spaces. **Completeness:** (X, d) is complete if every Cauchy sequence converges. Completeness = no holes for iterative algorithms to fall into.

Space	Complete?	Notes
R	Yes	Completion of Q
Q	No	x_n -> sqrt(2), but sqrt(2) is not in Q
C([0,1]) with sup norm	Yes	Banach space
C([0,1]) with L2 norm	No	Limit can be a discontinuous function
L^2([0,1])	Yes	Hilbert space; backbone of quantum mechanics and deep learning
(0,1) subset R	No	x_n = 1/n -> 0 but 0 is not in (0,1)

Completeness determines whether an iterative algorithm works in a given space. Numerical methods are analyzed in complete spaces - Banach and Hilbert. The space L^2(R) is the backbone of signal processing, quantum mechanics, and gradient flow analysis for neural networks.

Is C([0,1]) with the metric d(f,g) = integral_0^1 |f(x)-g(x)| dx complete?

The Banach Fixed-Point Theorem

**Banach's theorem:** If (X, d) is a complete metric space and T: X -> X is a **contraction** (there exists 0 <= q < 1 such that d(Tx, Ty) <= q * d(x,y) for all x, y), then T has a unique fixed point x* = T(x*), and the iterates x_{n+1} = T(x_n) converge to x* from any starting point.

**Convergence rate:** d(x_n, x*) <= q^n / (1-q) * d(x_1, x_0). Geometric speed: each step multiplies the error by q < 1. The Bellman operator with discount factor gamma = 0.99 is a contraction with factor 0.99. After 1000 steps the error shrinks by a factor of 0.99^1000, approximately 4.3 * 10^-5. That is why RL agents converge.

**Contractions in ML and numerical analysis.** The Bellman operator (RL, gamma < 1) - contraction in the sup norm. Newton's method on strongly convex functions - contraction near the root, hence quadratic convergence. Wasserstein GAN with a Lipschitz constraint (||f(x)-f(y)|| <= L||x-y||) - enforces the discriminator to be a contraction. Jacobi and Gauss-Seidel for linear systems - contractions under diagonal dominance. Banach's theorem explains them all.

Banach's theorem delivers three things at once: existence of a solution, uniqueness, and a constructive algorithm with a convergence rate bound. That triple is exactly what a practitioner needs when designing an iterative method - not just 'a solution exists' but 'here is the algorithm and here is the guarantee'.

Value iteration converges to the optimal V* because the Bellman operator T is a contraction with factor gamma < 1. What does Banach's theorem guarantee?

Key Ideas

**Metric d:** four axioms (non-negativity, identity, symmetry, triangle); L1/L2/L-inf are equivalent metrics on R^n with different ball shapes; cosine similarity is not a metric
**Balls B(x,r):** basis for the topology; ball shape is entirely determined by the metric; adversarial epsilon-balls in L-inf, L2, L1 are different objects
**Completeness:** every Cauchy sequence converges; R and L^2 are complete, Q and C([0,1]) with L2 norm are not; completeness means no holes for iterative algorithms
**Banach theorem:** a contraction T on complete (X,d) has a unique fixed point; iterates T^n(x_0) -> x* at geometric speed q^n; Bellman, Newton, Jacobi all live here

Вопросы для размышления

Prove that a closed subspace of a complete metric space is itself complete. What does this imply for closed convex sets in L^2?
Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n). Under what condition is it a contraction, and why does it converge quadratically rather than linearly?
C([0,1]) with the sup norm is complete; with the L2 norm it is not. What happens if only polynomials of degree at most n are considered?

Связанные уроки

top-03 — Compactness and connectedness - the previous lesson
top-05 — Next step: topological groups and homogeneous spaces
fa-01 — Banach space = complete normed space; the metric comes from the norm
calc-01-sequences — Sequence convergence uses the same epsilon-N language
cvx-01 — Convex optimization algorithms are analyzed as contractions in complete spaces
la-01-vectors-intro
stats-21

Topology

Metric Spaces

**Reinforcement learning:** the Bellman operator is a contraction with factor gamma; Banach guarantees a unique optimal value function and convergence of value iteration from any initialization
**Wasserstein GAN:** the Wasserstein metric on distributions combined with a Lipschitz constraint on the discriminator makes training a contraction - hence more stable than KL-based GAN
**Newton's method:** quadratic convergence follows from the method being a near-zero contraction in a neighborhood of the root
**Lipschitz networks:** the constraint ||f(x)-f(y)|| <= L||x-y|| in WGAN and robust ML is a metric compatibility requirement in the spirit of Banach contractions
**Frechet distance (FID):** a metric on the space of curves used in trajectory similarity and generative model evaluation

Предварительные знания

Connectedness and Compactness

Metric Space Definition

Metric	Formula	ML Application
Euclidean (L2)	sqrt(sum(xi-yi)^2)	KNN, k-means, embedding similarity
Manhattan (L1)	sum\|xi-yi\|	LASSO regularization, sparse features
Chebyshev (L-inf)	max\|xi-yi\|	Adversarial attacks (epsilon-ball in L-inf)
Wasserstein	inf E[d(X,Y)] over coupling	WGAN - more stable than KL divergence
Frechet	metric on curves	Trajectory similarity, motion planning

Is d(x, y) = (x - y)^2 a valid metric on R?

Balls and Open Sets

In the discrete metric (d(x,y) = 0 if x=y, else 1), which statement is true?

Convergence and Completeness

Space	Complete?	Notes
R	Yes	Completion of Q
Q	No	x_n -> sqrt(2), but sqrt(2) is not in Q
C([0,1]) with sup norm	Yes	Banach space
C([0,1]) with L2 norm	No	Limit can be a discontinuous function
L^2([0,1])	Yes	Hilbert space; backbone of quantum mechanics and deep learning
(0,1) subset R	No	x_n = 1/n -> 0 but 0 is not in (0,1)

Is C([0,1]) with the metric d(f,g) = integral_0^1 |f(x)-g(x)| dx complete?

The Banach Fixed-Point Theorem

Value iteration converges to the optimal V* because the Bellman operator T is a contraction with factor gamma < 1. What does Banach's theorem guarantee?

Key Ideas

**Metric d:** four axioms (non-negativity, identity, symmetry, triangle); L1/L2/L-inf are equivalent metrics on R^n with different ball shapes; cosine similarity is not a metric
**Balls B(x,r):** basis for the topology; ball shape is entirely determined by the metric; adversarial epsilon-balls in L-inf, L2, L1 are different objects
**Completeness:** every Cauchy sequence converges; R and L^2 are complete, Q and C([0,1]) with L2 norm are not; completeness means no holes for iterative algorithms
**Banach theorem:** a contraction T on complete (X,d) has a unique fixed point; iterates T^n(x_0) -> x* at geometric speed q^n; Bellman, Newton, Jacobi all live here

Вопросы для размышления

Prove that a closed subspace of a complete metric space is itself complete. What does this imply for closed convex sets in L^2?
Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n). Under what condition is it a contraction, and why does it converge quadratically rather than linearly?
C([0,1]) with the sup norm is complete; with the L2 norm it is not. What happens if only polynomials of degree at most n are considered?

Связанные уроки

top-03 — Compactness and connectedness - the previous lesson
top-05 — Next step: topological groups and homogeneous spaces
fa-01 — Banach space = complete normed space; the metric comes from the norm
calc-01-sequences — Sequence convergence uses the same epsilon-N language
cvx-01 — Convex optimization algorithms are analyzed as contractions in complete spaces
la-01-vectors-intro
stats-21

Metric Spaces

Предварительные знания

Metric Space Definition

Balls and Open Sets

Convergence and Completeness

The Banach Fixed-Point Theorem

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Metric Spaces

Предварительные знания

Metric Space Definition

Balls and Open Sets

Convergence and Completeness

The Banach Fixed-Point Theorem

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки