Topology

Connectedness and Compactness

In 2023 a team at Ayasdi analyzed breast cancer records - thousands of patients, hundreds of parameters. Classical clustering found nothing. Then came TDA: build a Vietoris-Rips complex and watch connected components merge as the scale grows - persistent homology. The result: a new tumor subtype with high survival rates, invisible to k-means and PCA. The mathematical foundation of that discovery is connectedness and compactness. H_0 homology counts the connected components. A finite point cloud is trivially compact, and that finiteness is what keeps the barcode finite and lets the algorithm terminate. Abstractions deliver.

**TDA / Giotto-TDA** - persistent homology on medical and genomic data: connected components (H_0) and cycles (H_1) as topological features; a finite point cloud (a trivially compact space) yields a finite persistence barcode
**Manifold hypothesis (UMAP, t-SNE)** - data is assumed to lie on a compact path-connected manifold in R^n; local connectedness is used to build a fuzzy topological structure before projecting to 2D
**Extreme value theorem in ML** - if the hyperparameter search domain is compact (closed and bounded) and the loss is continuous on it, a loss minimum is guaranteed to exist; this is one reason search ranges are usually given as closed intervals

Prerequisites

Continuity and Homeomorphism

Connectedness

In 2023, the company Ayasdi applied Topological Data Analysis to medical records and found a subtype of breast cancer that had escaped every classical method. The tool: persistent homology. Take a point cloud, build a Vietoris-Rips complex, watch connected components emerge and merge as the scale grows. H_0 homology is literally the count of connected components. The entire machinery starts from a single definition.
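To make "count of connected components at a given scale" concrete, here is a minimal sketch in plain Python. It assumes the convention that two points are joined when their distance is at most `eps` (some libraries use `2 * eps` instead). The function name `count_components` and the sample cloud are illustrative and not taken from any TDA library.

```python
from itertools import combinations
from math import dist


def count_components(points, eps):
    """Count connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


# Hypothetical cloud: one cluster of three points, one cluster of two.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
for eps in (0.05, 0.5, 10.0):
    print(eps, count_components(cloud, eps))
```

As `eps` grows, the count can only go down: components merge but never split, which is the behavior the H_0 barcode records.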

A space is **connected** if it cannot be split into two disjoint non-empty open sets. Equivalently: X is connected if the only subsets that are simultaneously open and closed (clopen) are the empty set and X itself. No middle ground.

**Three equivalent definitions of connectedness:**
1. X is not the union of two disjoint non-empty open sets.
2. The only clopen subsets of X are the empty set and X.
3. Every continuous function f: X -> {0, 1}, where {0, 1} carries the discrete topology, is constant.
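For a finite space the clopen criterion can be checked by brute force. The sketch below assumes the topology is given explicitly as a list of open sets; `is_connected` and the two example topologies are hypothetical names chosen for illustration.

```python
def is_connected(space, opens):
    """Return True if the only clopen subsets are the empty set and the whole space."""
    space = frozenset(space)
    opens = {frozenset(u) for u in opens}
    for u in opens:
        complement = space - u
        # u is clopen exactly when its complement is open as well.
        if u and complement and complement in opens:
            return False
    return True


X = {1, 2, 3}
# Nested open sets: no proper non-empty open set has an open complement.
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
# Here {1} and {2, 3} are both open, so they split X.
split = [set(), {1}, {2, 3}, {1, 2, 3}]

print(is_connected(X, nested))
print(is_connected(X, split))
```

The function does not verify that `opens` really is a topology; it only applies the clopen test to whatever family it is given.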

The key consequence, used throughout mathematics: **the continuous image of a connected space is connected**. If f: X -> Y is continuous and X is connected, then f(X) is connected in Y. This immediately gives the intermediate value theorem: [a,b] is connected, so f([a,b]) is a connected subset of R, that is, an interval, and therefore f takes every value between f(a) and f(b). In TDA: if data lies on a connected manifold, its image under any continuous projection is also connected. UMAP only approximates a continuous map, so for it this is a guiding heuristic rather than a theorem.
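The intermediate value theorem is what makes bisection work: if a continuous function changes sign on [a, b], a root must lie in between. A minimal sketch follows; the function name `bisect_root` and the tolerance are illustrative choices.

```python
def bisect_root(f, a, b, tol=1e-10):
    """Locate a root of a continuous f on [a, b], given that f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        # Keep the half-interval on which the sign change survives.
        if fa * fm <= 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2


# x^3 - 2 is negative at 0 and positive at 2, so it must vanish in between.
root = bisect_root(lambda x: x**3 - 2, 0.0, 2.0)
print(root)
```

Without continuity the guarantee disappears: a function that jumps over zero can change sign without ever taking the value 0.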

Space	Connected?	Why
R	Yes	The connected subsets of R are exactly the intervals, and R is one of them
[0,1]	Yes	Interval
(0,1) union (2,3)	No	Two disjoint open intervals
Q (rationals)	No	Q = (Q ∩ (-∞, √2)) ∪ (Q ∩ (√2, +∞))
R^n	Yes	Any two points are joined by a line segment

Q (rational numbers) with the topology induced from R:

Path Connectedness

The manifold hypothesis in ML states: natural data (images, text, audio) does not fill all of R^40000 but lies on or near a compact manifold of much lower dimension, on the order of 50-100. UMAP and t-SNE exploit this - both are usually motivated by assuming the manifold is **path-connected**: between any two data points there exists a continuous path along the surface. This is strictly stronger than connectedness.

**Path-connected => connected** (always). **Connected => path-connected** (not always!). Counterexample: the topologist's sine curve, the graph of sin(1/x) for 0 < x <= 1 together with the segment {0}×[-1,1]. It is connected but not path-connected. One of the most counterintuitive constructions in analysis.

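For manifolds (open subsets of R^n, surfaces) connectedness and path connectedness coincide, because every manifold is locally path-connected and a connected, locally path-connected space is path-connected. Pathological examples like the topologist's sine curve are not manifolds, so they are ruled out by the manifold hypothesis itself. This is why, under that hypothesis, ML practice can treat "connected" and "path-connected" as interchangeable.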

Property	Definition	Status
Connected	No partition into two disjoint non-empty open sets	Weaker than path-connected
Path-connected	Between any two points there exists a path gamma: [0,1] -> X	Path-connected => connected
Locally connected	Every point has connected neighborhoods	Does not follow from connectedness
Simply connected	Path-connected + every loop is contractible	Stronger than path-connected

The topologist's sine curve is connected but not path-connected. Why is there no path from (0, 0) to (1, sin 1)?

Compactness

Here is what happens inside Giotto-TDA when analyzing medical data: a Vietoris-Rips complex is built from thousands of points, then the radius grows from zero to its maximum. The computation is guaranteed to terminate. The reason: a finite point cloud produces only finitely many simplices, so only finitely many topological events can occur. Compactness is the topological form of this finiteness - a space is compact when **every open cover has a finite subcover** - and every finite space is compact. For an infinite, non-compact input the persistence barcode could contain infinitely many bars.
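The finiteness argument can be seen directly for H_0. The sketch below is not the Giotto-TDA API; it is a plain-Python illustration that processes edges in order of length (Kruskal's algorithm with union-find) and records one bar per merge. It assumes the convention that a bar dies at the length of the merging edge; some libraries report half of that value. The name `h0_barcode` is hypothetical.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Return the H_0 bars (birth, death) of the Vietoris-Rips filtration of a finite cloud."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges by increasing length, mimicking the growing scale.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies at this scale
    bars.append((0.0, inf))  # the last surviving component never dies
    return bars


cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
bars = h0_barcode(cloud)
print(len(bars), bars)
```

A non-empty cloud of n points always yields exactly n bars in H_0: n - 1 finite ones and a single infinite one. The loop runs over finitely many edges, so termination is immediate.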

An **open cover** of X is a collection of open sets {U_alpha} whose union contains X. **Compactness**: for ANY open cover there exist finitely many U_1, ..., U_n that already cover X. This is a statement about all covers, not one specific cover.

The Weierstrass extreme value theorem follows: a continuous function on a compact set attains its maximum and minimum. In ML terms: if the search domain for parameters is compact and the loss is continuous on it, a loss minimum is guaranteed to exist. This is why optimization problems are often stated on closed bounded domains rather than all of R^n.
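A small numerical contrast may help here. The sketch below uses a hypothetical quadratic loss on the closed interval [0, 1], where a minimizer exists and a grid search finds it, and then the loss x -> x on the non-compact domain (0, 1], where values get arbitrarily close to 0 without ever reaching it. The names `grid_min` and `quadratic` are illustrative.

```python
def grid_min(loss, a, b, steps):
    """Approximate the minimizer of a continuous loss on the closed interval [a, b]."""
    xs = [a + (b - a) * k / steps for k in range(steps + 1)]
    best = min(xs, key=loss)
    return best, loss(best)


def quadratic(lr):
    # Hypothetical loss with its minimum at lr = 0.3.
    return (lr - 0.3) ** 2


best, value = grid_min(quadratic, 0.0, 1.0, 1000)
print(best, value)

# On (0, 1] the loss x -> x has infimum 0, but no point of the domain attains it:
# every sampled value is strictly positive, however small x becomes.
shrinking = [1 / 10**k for k in range(1, 6)]
print(shrinking)
```

The first search succeeds because the endpoints belong to the domain; in the second case the only candidate for a minimizer, x = 0, has been removed.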

Space	Compact?	Why
[0, 1]	Yes	Heine-Borel: closed + bounded in R
(0, 1)	No	Cover {(1/n, 1)} has no finite subcover
R	No	Cover {(-n, n)} has no finite subcover
S^n (n-sphere)	Yes	Closed bounded subset of R^{n+1}
Z (discrete)	No	The cover by singletons {n} is infinite and has no finite subcover
{1,...,n} (finite)	Yes	Every finite space is compact
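The row for (0, 1) can be checked mechanically: whichever finitely many sets (1/n, 1) are picked, some point of (0, 1) is missed. A minimal sketch with exact rational arithmetic, where `uncovered_point` is an illustrative helper:

```python
from fractions import Fraction


def uncovered_point(indices):
    """Given finitely many sets (1/n, 1), return a point of (0, 1) that none of them contains."""
    largest = max(indices)
    # The union of the chosen sets is (1/largest, 1), so 1/(largest + 1) is left out.
    return Fraction(1, largest + 1)


chosen = [2, 5, 17, 1000]
x = uncovered_point(chosen)
missed_by_all = all(not (Fraction(1, n) < x < 1) for n in chosen)
print(x, missed_by_all)
```

The full infinite family does cover (0, 1), since every x > 0 satisfies x > 1/n for some n. The failure appears only when the family is cut down to finitely many members.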

(0, 1) is not compact. Which cover has no finite subcover?

Heine-Borel and Continuous Maps

The cover definition is abstract. For R^n there is a practical criterion - the **Heine-Borel theorem**: a subset of R^n is compact if and only if it is **closed AND bounded**. This is the working tool: checking closedness and boundedness is far easier than inspecting all possible covers.

**Heine-Borel is a theorem about R^n, not about metric spaces in general!** In an arbitrary metric space, closed + bounded does not imply compact. Example: R with the metric d(x,y) = min(|x-y|, 1) - the whole space is bounded (diameter = 1) and closed, but NOT compact, because this metric induces the usual topology of R. Infinity can hide in the topology, not in the distances.
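The counterexample can be probed numerically. In the sketch below, `d` is the bounded metric from the text and the first 50 integers stand in for the full sequence 0, 1, 2, ...; the cutoff of 50 is an arbitrary choice.

```python
def d(x, y):
    """Bounded metric on R that induces the usual topology."""
    return min(abs(x - y), 1.0)


integers = range(50)
# Collect every distance between two distinct integers.
pairwise = {d(m, n) for m in integers for n in integers if m != n}
print(pairwise)
```

Every pair of distinct integers sits at distance exactly 1, so the sequence of integers has no Cauchy subsequence and therefore no convergent one. In a metric space that rules out compactness, even though no two points of the space are more than 1 apart.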

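Two key facts about continuous maps. First: **the continuous image of a connected space is connected** - this proves the Bolzano intermediate value theorem. Second: **the continuous image of a compact space is compact** - this gives the Weierstrass theorem immediately. Both follow from the definition of continuity: preimages of open sets are open, so a disconnection or an open cover of f(X) pulls back to one of X. Not every topological property is preserved this way; openness and closedness of sets, for example, are not.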

Result	Statement
Weierstrass theorem	A continuous real-valued f on a compact set attains its max and min
Bolzano-Weierstrass	In a compact metric space, every sequence has a convergent subsequence
Lebesgue number lemma	Every open cover of a compact metric space has a Lebesgue number delta > 0
Compact in Hausdorff	A compact subset of a Hausdorff space is closed
Continuous bijection from compact	A continuous bijection from a compact space onto a Hausdorff space is automatically a homeomorphism

Common misconception: a connected space is the same as a path-connected space

Path-connected always implies connected, but not the other way around. The topologist's sine curve is connected but not path-connected. The graph of sin(1/x) accumulates on the segment {0}×[-1,1], so the whole set is the closure of a connected set and hence connected, but no continuous path joins (0,0) to a point on the graph.
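The obstruction to a path can be seen numerically: arbitrarily close to x = 0 the graph of sin(1/x) still reaches both +1 and -1, so the height has no limit as x -> 0. The helper names `peak` and `trough` below are illustrative.

```python
from math import pi, sin


def peak(k):
    """x-coordinate with sin(1/x) = 1; tends to 0 as k grows."""
    return 1 / (pi / 2 + 2 * pi * k)


def trough(k):
    """x-coordinate with sin(1/x) = -1; tends to 0 as k grows."""
    return 1 / (3 * pi / 2 + 2 * pi * k)


for k in (1, 10, 100):
    # Both x-values shrink toward 0 while the heights stay near +1 and -1.
    print(peak(k), sin(1 / peak(k)), trough(k), sin(1 / trough(k)))
```

A continuous path arriving at (0, 0) along the graph would force the height to settle near 0 as x shrinks, which these two sequences show is impossible.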

Connectedness means the space cannot be cut into two disjoint non-empty open pieces. Path-connected means there is a continuous curve between any two points. The first is weaker: a space can be impossible to cut while still oscillating too wildly near some of its points to admit a path through them.

Is the set {1, 1/2, 1/3, 1/4, ...} in R (without 0) compact?

Key Ideas

**Connectedness**: cannot be split into two disjoint non-empty open sets; continuous image of a connected space is connected - giving the intermediate value theorem
**Path-connected**: between any two points there is a path; strictly stronger than connected (topologist's sine curve is the counterexample); manifold hypothesis assumes path-connectedness
**Compactness**: every open cover has a finite subcover; guarantees max/min of continuous functions (Weierstrass) and, in metric spaces, convergent subsequences (Bolzano-Weierstrass); finite point clouds are compact and give finite TDA barcodes
**Heine-Borel**: in R^n, compact = closed + bounded; fails in general metric spaces

Questions for Reflection

Why does the Heine-Borel theorem fail for arbitrary metric spaces? Build a counterexample using the metric d(x,y) = min(|x-y|, 1).
The Cantor set is a closed subset of [0,1]. Is it compact? Connected? Path-connected? Justify each answer.
Persistent homology (Giotto-TDA) builds a barcode on a finite point cloud. Why does compactness guarantee that the barcode contains only finitely many intervals?

Related Lessons

top-02 — Open sets and homeomorphisms are the language of connectedness
top-04 — Metric spaces give geometric meaning to compactness
top-05 — Fundamental group counts holes in connected spaces
fa-01 — Compact operators on Banach spaces - direct generalization
calc-01-sequences — Sequential compactness is Bolzano-Weierstrass in topological form
calc-15-convergence
calc-14-improper