Topology
Persistent Homology
How do you find the "holes" in a cloud of a million data points? How does one distinguish a real loop from noise? Persistent homology answers these questions, turning abstract ideas from Morse theory into a practical tool for data analysis.
- **Cancer cell analysis:** Topological signatures of histological images help distinguish subtypes of breast cancer, in some reported cases more accurately than classical methods
- **Neural networks and learning:** Topological regularization adds a penalty for unwanted topology of activations, improving generalization and interpretability
- **Molecular shape analysis:** Persistent homology describes the 3D shape of proteins via topological descriptors for drug discovery tasks
Prerequisites
Filtrations and Persistence
A 2011 topological data analysis (TDA) study identified a new breast cancer subtype by applying topological methods to expression data on roughly 25,000 genes. **Persistent homology** studies how topological features of a space are "born" and "die" as a scale parameter changes. At its core is the notion of a **filtration**: a nested sequence of spaces, each contained in the next.
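In symbols, a filtration and the maps it induces on homology can be written as follows. This is a sketch in standard notation; the names K_i for the spaces, H_k for degree-k homology, and f_k^{i,j} for the induced maps are notational assumptions, not taken from the text above.

```latex
\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n = K,
\qquad
H_k(K_0) \to H_k(K_1) \to \cdots \to H_k(K_n)

% For i \le j, the inclusion K_i \hookrightarrow K_j induces a linear map
f_k^{i,j} \colon H_k(K_i) \to H_k(K_j),
\qquad
H_k^{i,j} = \operatorname{im} f_k^{i,j}
```

The image H_k^{i,j} consists of the classes that already exist at step i and are still alive at step j. A class that first appears at step i and becomes trivial, or merges with an older class, at step j is recorded as the pair (i, j).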
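In practice the Čech complex is expensive to compute, since it requires testing whether balls around the points share a common point. The Vietoris-Rips complex VR_ε(P) is a cheaper alternative: a simplex is included whenever all of its vertices are pairwise within distance ≤ ε, so only pairwise distances are needed. It approximates the Čech complex: with Č_ε built from balls of radius ε/2, we have Č_ε ⊂ VR_ε ⊂ Č_{2ε}.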
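The Vietoris-Rips rule can be sketched directly from pairwise distances. The function below is a minimal brute-force illustration in Python: the name `vietoris_rips` and the `max_dim` parameter are hypothetical, and enumerating all vertex subsets is practical only for small point sets.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim=2):
    """Return the simplices of VR_eps(points) up to dimension max_dim.

    Each simplex is a tuple of vertex indices in increasing order.
    """
    n = len(points)
    # Pairs of vertices within distance eps of each other.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= eps
    }
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices span a (k - 1)-simplex
        for verts in combinations(range(n), k):
            # Include the simplex only if every pair of its vertices is close.
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)
    return simplices


# Corners of the unit square: sides have length 1, diagonals about 1.414.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = vietoris_rips(square, 1.0)  # vertices and the four sides: a loop
large = vietoris_rips(square, 1.5)  # diagonals and triangles fill the loop
```

Because the distance condition only gets easier to satisfy as ε grows, every simplex of `small` also appears in `large`. That nesting is what makes the family of Rips complexes a filtration.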
What does persistent homology study, compared to ordinary homology?
Barcodes and Persistence Diagrams
Persistent homology is described by two equivalent representations: the **barcode** and the **persistence diagram**. Both encode (birth, death) pairs: the scales at which each topological feature appears and disappears. A barcode draws each pair as an interval, while a persistence diagram plots it as a point in the plane.
Key idea: **persistence** = death - birth. Features with large persistence are a reliable topological signal. Features with small persistence are likely noise or discretization artifacts.
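For connected components (degree-0 homology) of a Vietoris-Rips filtration, the (birth, death) pairs can be computed with a union-find structure. The sketch below assumes that setting only; the names `h0_barcode` and `persistent_features` and the threshold value are hypothetical, and higher-degree features such as loops need a full boundary-matrix reduction, which is not shown here.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Return (birth, death) pairs for connected components.

    Every point is born at scale 0. A component dies at the length of
    the edge that merges it into another one; one component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process edges from shortest to longest, as the scale grows.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, float("inf")))  # the component that survives
    return bars


def persistent_features(bars, threshold):
    """Keep only bars whose persistence (death - birth) exceeds threshold."""
    return [(b, d) for b, d in bars if d - b > threshold]


# Two tight clusters, far apart from each other.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
bars = h0_barcode(points)
signal = persistent_features(bars, 1.0)
```

On this toy input the short bars come from points merging inside each cluster, and the two bars that pass the threshold correspond to the two clusters themselves.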
What does a point close to the diagonal y=x in a persistence diagram represent?
The Stability Theorem
The key property of persistent homology is **stability**: small changes in the data lead to small changes in the persistence diagram. This makes persistent homology suitable for working with noisy data.
The stability theorem guarantees that persistence diagrams can be used as input features for machine learning: a small amount of noise in the data will not destroy the topological signals. This distinguishes TDA from "brittle" topological invariants.
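The distance in which diagrams are compared is usually the bottleneck distance: the best matching between the points of two diagrams, where any point may also be matched to the diagonal, scored by its worst-matched pair. The following is a brute-force sketch for tiny diagrams only. The function name `bottleneck_distance` is hypothetical, and the factorial-time search over all matchings is an illustration, not how libraries compute it.

```python
from itertools import permutations


def bottleneck_distance(dgm_a, dgm_b):
    """Brute-force bottleneck distance between two small diagrams.

    Diagrams are lists of finite (birth, death) pairs.
    """
    n, m = len(dgm_a), len(dgm_b)
    # Pad each side with None slots that stand for the diagonal.
    left = list(dgm_a) + [None] * m
    right = list(dgm_b) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0  # diagonal matched to diagonal costs nothing
        if p is None:
            return (q[1] - q[0]) / 2  # sup-norm distance of q to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = float("inf")
    for perm in permutations(range(n + m)):
        worst = max(
            (cost(left[i], right[j]) for i, j in enumerate(perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best


clean = [(0.0, 1.0)]
noisy = [(0.0, 1.2), (0.4, 0.5)]  # shifted bar plus a short-lived feature
d = bottleneck_distance(clean, noisy)
```

The short-lived point (0.4, 0.5) is cheap to match to the diagonal, so the distance is governed by the shift of the long bar. This is the sense in which low-persistence features contribute little to the distance between diagrams.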
What does the stability theorem for persistent homology guarantee?
Persistent Homology in Practice
Persistent homology has found applications in a diverse range of fields, from data analysis to neuroscience. Its key advantage: it works with data of arbitrary shape without assuming linearity or convexity.
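A persistence diagram is a multiset of points whose size varies from one dataset to the next, so it cannot be fed to a neural network directly: it first has to be "vectorized" into a fixed-length representation. Methods include persistence images, persistence landscapes, and vectorizations based on tropical geometry. Each approach preserves part of the diagram's structure.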
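As one concrete vectorization, a persistence landscape turns each pair (b, d) into a "tent" function max(0, min(t − b, d − t)) and defines the k-th landscape as the k-th largest tent value at each t. Sampling it on a fixed grid gives a fixed-length vector. The sketch below assumes finite (birth, death) pairs; the function name `persistence_landscape` and the grid are illustrative choices.

```python
def persistence_landscape(diagram, k, grid):
    """Sample the k-th landscape function (k = 1 is the outermost) on grid."""
    values = []
    for t in grid:
        # Height of each feature's tent at t, largest first.
        tents = sorted(
            (max(0.0, min(t - b, d - t)) for b, d in diagram),
            reverse=True,
        )
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values


diagram = [(0.0, 2.0), (0.5, 1.5)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
layer1 = persistence_landscape(diagram, 1, grid)  # [0.0, 0.5, 1.0, 0.5, 0.0]
layer2 = persistence_landscape(diagram, 2, grid)  # [0.0, 0.0, 0.5, 0.0, 0.0]

# Concatenating a few layers gives a fixed-length feature vector.
features = layer1 + layer2
```

The vector length depends only on the grid size and the number of layers kept, not on how many points the diagram contains, which is what makes the result usable as a model input.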
Why are points near the diagonal of a persistence diagram typically interpreted as noise?
Key Ideas
- **Filtration:** a nested sequence of spaces; persistent homology tracks the birth and death of cycles along it
- **Barcode / persistence diagram:** pairs (birth, death); points far from the diagonal = topological signal
- **Stability theorem:** d_B(Dgm(f), Dgm(g)) ≤ ‖f − g‖∞, where d_B is the bottleneck distance; this gives robustness to noise
- **Applications:** TDA in ML, molecular analysis, neuroscience, porous material analysis
Related Topics
Persistent homology bridges topology and computational mathematics:
- Homology: persistent homology is a parametric family of ordinary homology groups, organized by a filtration
- Morse Theory: filtering a space by the sublevel sets of a Morse function changes its topology only at critical points, where handles are attached; persistence pairs these critical points
- Topological Data Analysis: TDA is the applied realization of persistent homology for real-world data analysis
Questions for Reflection
- How does one choose a "persistence threshold" to separate signal from noise in real data?
- Why does the algorithm for computing persistent homology have cubic complexity, and how can this be addressed?
- Can persistent homology be used as a "distance" between shapes? Which metric properties does it satisfy?