Topology
Connectedness and Compactness
In 2023 a team at Ayasdi analyzed breast cancer records - thousands of patients, hundreds of parameters. Classical clustering found nothing. Then came TDA: build a Vietoris-Rips complex, watch connected components merge and split as the scale grows - persistent homology. The result: a new tumor subtype with high survival rates, invisible to k-means and PCA. The mathematical foundation of that discovery is connectedness and compactness. H_0 homology is the count of connected components. Compactness guarantees the algorithm terminates. Abstractions deliver.
- **TDA / Giotto-TDA** - persistent homology on medical and genomic data: connected components (H_0) and cycles (H_1) as topological features; compactness of the finite point cloud guarantees a finite persistence barcode
- **Manifold hypothesis (UMAP, t-SNE)** - data lies on a compact path-connected manifold in R^n; local connectedness is used to build a fuzzy topological structure before projecting to 2D
- **Extreme value theorem in ML** - if the hyperparameter search domain is compact (closed and bounded), a loss minimum is guaranteed to exist; this is why learning rate schedules often operate on closed intervals
Предварительные знания
Connectedness
In 2023, the company Ayasdi applied Topological Data Analysis to medical records and found a subtype of breast cancer that had escaped every classical method. The tool: persistent homology. Take a point cloud, build a Vietoris-Rips complex, watch connected components emerge and merge as the scale grows. H_0 homology is literally the count of connected components. The entire machinery starts from a single definition.
A space is **connected** if it cannot be split into two non-empty open sets. Formally: X is connected if the only subsets that are simultaneously open and closed (clopen) are the empty set and X itself. No middle ground.
**Three equivalent definitions of connectedness:** 1. X is not the union of two non-empty disjoint open sets. 2. The only clopen subsets are the empty set and X. 3. Every continuous function f: X -> {0, 1} with the discrete topology is constant.
The key consequence, used throughout mathematics: **the continuous image of a connected space is connected**. If f: X -> Y is continuous and X is connected, then f(X) is connected in Y. This immediately gives the intermediate value theorem: a continuous function on [a,b] takes every value between f(a) and f(b). In TDA: if data lies on a connected manifold, its projection via UMAP is also connected.
| Space | Connected? | Why |
|---|---|---|
| R | Yes | Intervals are the only connected subsets of R |
| [0,1] | Yes | Interval |
| (0,1) union (2,3) | No | Two disjoint open intervals |
| Q (rationals) | No | Q = (Q ∩ (-∞, √2)) ∪ (Q ∩ (√2, +∞)) |
| R^n | Yes | Any two points are joined by a line segment |
Q (rational numbers) with the topology induced from R:
Path Connectedness
The manifold hypothesis in ML states: natural data (images, text, audio) does not live in all of R^40000 but on a compact manifold of dimension 50-100. UMAP and t-SNE exploit this - both assume the manifold is **path-connected**: between any two data points there exists a continuous path along the surface. This is strictly stronger than connectedness.
**Path-connected => connected** (always). **Connected => path-connected** (not always!). Counterexample: the topologist's sine curve - connected but not path-connected. One of the most counterintuitive constructions in analysis.
For manifolds (open subsets of R^n, surfaces) connectedness and path connectedness coincide. Pathological examples like the topologist's sine curve do not arise in ML. This is exactly why the manifold hypothesis holds in practice: real data lies on manifolds, not on theoretical pathologies.
| Property | Definition | Status |
|---|---|---|
| Connected | No partition into two non-empty open sets | Weaker than path-connected |
| Path-connected | Between any two points there exists a path gamma: [0,1] -> X | Path-connected => connected |
| Locally connected | Every point has connected neighborhoods | Does not follow from connectedness |
| Simply connected | Path-connected + every loop is contractible | Stronger than path-connected |
The topologist's sine curve is connected but not path-connected. Why is there no path from (0, 0) to (1, sin 1)?
Compactness
Here is what happens inside Giotto-TDA when analyzing medical data: a Vietoris-Rips complex is built from thousands of points, then the radius grows from zero to its maximum. At every step the algorithm is guaranteed to terminate. The reason: compactness. A finite point cloud gives a compact complex - one where **every open cover has a finite subcover**. Without this property the persistence barcode would be infinite.
An **open cover** of X is a collection of open sets {U_alpha} whose union contains X. **Compactness**: for ANY open cover there exist finitely many U_1, ..., U_n that already cover X. This is a statement about all covers, not one specific cover.
The Weierstrass theorem follows: a continuous function on a compact set attains its maximum and minimum. In ML this is the extreme value theorem - if the search domain for parameters is compact, a loss minimum is guaranteed to exist. This is why optimization problems are often stated on closed bounded domains rather than all of R^n.
| Space | Compact? | Why |
|---|---|---|
| [0, 1] | Yes | Heine-Borel: closed + bounded in R |
| (0, 1) | No | Cover {(1/n, 1)} has no finite subcover |
| R | No | Cover {(-n, n)} has no finite subcover |
| S^n (n-sphere) | Yes | Closed bounded subset of R^{n+1} |
| Z (discrete) | No | Infinitely many singleton opens - no finite subcover |
| {1,...,n} (finite) | Yes | Every finite space is compact |
(0, 1) is not compact. Which cover has no finite subcover?
Heine-Borel and Continuous Maps
The cover definition is abstract. For R^n there is a practical criterion - the **Heine-Borel theorem**: a subset of R^n is compact if and only if it is **closed AND bounded**. This is the working tool: checking closedness and boundedness is far easier than inspecting all possible covers.
**Heine-Borel works ONLY in R^n!** In a general metric space, closed + bounded does not imply compact. Example: R with the metric d(x,y) = min(|x-y|, 1) - the whole space is bounded (diameter = 1) and closed, but NOT compact. Infinity can hide in the topology, not in the distances.
Two key facts about continuous maps. First: **the continuous image of a connected space is connected** - this proves the Bolzano intermediate value theorem. Second: **the continuous image of a compact space is compact** - this gives the Weierstrass theorem immediately. Both hold because continuous maps preserve topological properties.
| Property | Consequence of compactness |
|---|---|
| Weierstrass theorem | A continuous f on a compact set attains its max and min |
| Bolzano-Weierstrass | Every sequence has a convergent subsequence |
| Lebesgue number lemma | Every open cover has a Lebesgue number delta > 0 |
| Compact in Hausdorff | A compact subset is closed |
| Continuous bijection from compact | Automatically a homeomorphism (into Hausdorff) |
A connected space is the same as a path-connected space
Path-connected always implies connected, but not the other way around. The topologist's sine curve is connected but not path-connected. The curve sin(1/x) approaches the segment {0}×[-1,1], ensuring connectedness via closure, but no continuous path from (0,0) to a point on the graph exists.
Connectedness means the space cannot be cut into two open pieces. Path-connected means there is a continuous curve between any two points. The first is weaker: a space can be uncut-able while still being too winding near boundary points to admit any path.
Is the set {1, 1/2, 1/3, 1/4, ...} in R (without 0) compact?
Key Ideas
- **Connectedness**: cannot be split into two non-empty open sets; continuous image of a connected space is connected - giving the intermediate value theorem
- **Path-connected**: between any two points there is a path; strictly stronger than connected (topologist's sine curve is the counterexample); manifold hypothesis assumes path-connectedness
- **Compactness**: every open cover has a finite subcover; guarantees max/min (Weierstrass), convergent subsequences (Bolzano), finite TDA barcodes
- **Heine-Borel**: in R^n, compact = closed + bounded; fails in general metric spaces
Related Topics
Connectedness and compactness are the main invariants for distinguishing spaces:
- Continuity and Homeomorphism — Connectedness and compactness are preserved under homeomorphisms - tool for distinguishing spaces
- Topological Spaces — Connectedness and compactness are defined via open sets
- Metric Spaces — In metric spaces compactness equals sequential compactness
- Fundamental Group — Connectedness is a prerequisite for a non-trivial fundamental group
Вопросы для размышления
- Why does the Heine-Borel theorem fail for arbitrary metric spaces? Build a counterexample using the metric d(x,y) = min(|x-y|, 1).
- The Cantor set is a closed subset of [0,1]. Is it compact? Connected? Path-connected? Justify each answer.
- Persistent homology (Giotto-TDA) builds a barcode on a finite point cloud. Why does compactness guarantee that the barcode contains only finitely many intervals?
Связанные уроки
- top-02 — Open sets and homeomorphisms are the language of connectedness
- top-04 — Metric spaces give geometric meaning to compactness
- top-05 — Fundamental group counts holes in connected spaces
- fa-01 — Compact operators on Banach spaces - direct generalization
- calc-01-sequences — Sequential compactness is Bolzano-Weierstrass in topological form
- calc-15-convergence
- calc-14-improper