Topology
Topological Data Analysis
How does an algorithm find a hole in a molecule? How is a loop detected in neural firing data? How can the 'shape' of two point clouds be compared? TDA turns these intuitive questions into algorithms: persistent homology computes a stable 'topological barcode' of the data.
- **Drug discovery:** the shape of protein binding pockets is characterized by persistent homology; Gudhi/Ripser are used in drug discovery pipelines
- **Neuroscience:** hippocampal place cells encode space; TDA detects topological invariants of neural codes
- **Computer vision:** Mapper on CNN features reveals latent space structure; TDA as an explainability tool
Предварительные знания
The Vietoris-Rips Complex
The **Vietoris-Rips complex VR(X, ε)** is a simplicial complex built from a point cloud X: vertices are points of X, and a simplex [x₀, ..., xₖ] is added whenever d(xᵢ, xⱼ) ≤ ε for all pairs i, j. At ε=0 only vertices exist; as ε → ∞ the full simplex appears.
Choosing ε is the main problem: too small and the graph fragments; too large and everything collapses into one simplex. **Persistent homology** resolves this by building VR for all values of ε simultaneously and tracking when topological features are born and die.
For a point cloud sampled from a circle S¹ at the right scale ε, what Betti numbers do we expect?
Persistent Homology
**Persistent homology** studies how Hₙ(VR(X, ε)) changes as ε increases. Each topological feature is **born** at ε_birth and **dies** at ε_death. The pair (ε_birth, ε_death) is a **persistence pair**. Long-lived features are signal; short-lived ones are noise.
The **persistence diagram** is the set of points (b, d) in R² for each persistence pair. Points far from the diagonal (b = d) correspond to significant features. Points near the diagonal are noise. The **barcode** is an equivalent representation: horizontal bars [b, d].
On a persistence diagram, points far from the diagonal represent:
The Mapper Algorithm and Wasserstein Distance
The **Mapper algorithm** (Singh, Memoli, Carlsson 2007) builds a topological skeleton of data. Steps: 1. Choose a 'filter' f: X → R (e.g., projection onto the first PCA component). 2. Cover the image of f with overlapping intervals. 3. Cluster the preimage of each interval. 4. Connect clusters that share points.
The **Wasserstein distance** between two persistence diagrams is a metric stable under small perturbations of the data: W_p(D₁, D₂) = inf_{γ: D₁→D₂} (Σ ||p − γ(p)||_∞^p)^{1/p}, where the infimum is over all bijections (with the diagonal as a 'garbage bin'). This allows comparing topological shapes of different datasets.
**TDA libraries:** Gudhi (Python/C++) - full toolkit: VR, alpha complexes, cubical homology, Mapper, distances. Ripser (C++/Python) - fast persistent homology. scikit-tda - scikit-learn integration. Applications: drug discovery (molecule pocket shape), neuroscience (place cell topology), materials science (pore analysis).
A small Wasserstein distance between two persistence diagrams means:
TDA Applications
TDA has found applications across diverse fields. **Biomedicine:** tumor shape analysis (persistent homology distinguishes malignant from benign), protein pocket detection (β₁ identifies binding sites). **Neuroscience:** loops in hippocampal spike data encode spatial maps (place cells). **Materials:** pore analysis (β₂ counts closed voids in porous media).
| Domain | TDA Tool | What is Detected |
|---|---|---|
| Medicine | Persistent homology of images | Tumor shape, β₁ in microstructure |
| Neuroscience | VR on spike trains | Place cells, topology of neural codes |
| Finance | Persistent homology of time series | Market cycle loops, volatility structure |
| Materials science | Cubical homology | Pores, β₂ = closed voids |
| NLP | Persistence on word embeddings | Semantic clusters, analogy cycles |
| Computer vision | Mapper on CNN features | Latent space structure |
Why is H₁ (persistent 1D holes) useful for analyzing molecules?
Key Ideas
- **VR(X, ε):** simplices from points within distance ε; bridge from point cloud to topology
- **Persistent homology:** (birth, death) for each hole as ε varies; robust to noise
- **Persistence diagram:** points far from diagonal = significant features; near diagonal = noise
- **Mapper:** topological skeleton via filter + clustering; Wasserstein = metric on diagrams
Related Topics
TDA connects pure topology to real data:
- Homology — Persistent homology = parametric version of simplicial homology
- Topology in ML — TDA used in ML as feature engineering, regularization, and neural network analysis
Вопросы для размышления
- The stability theorem for persistent homology says: if data is perturbed by eps, the persistence diagram changes by at most eps (in Bottleneck distance). Why does this make TDA practical for noisy data?
- Alpha complexes are an alternative to Vietoris-Rips. What are their memory and time advantages? When is VR preferable?
- How can TDA be applied to a time series (e.g., a stock index)? Which filtration and complex fit best, and which topological features deserve attention?