Topology

Topological Data Analysis

How does an algorithm find a hole in a molecule? How is a loop detected in neural firing data? How can the 'shape' of two point clouds be compared? TDA turns these intuitive questions into algorithms: persistent homology computes a stable 'topological barcode' of the data.

**Drug discovery:** the shape of protein binding pockets is characterized by persistent homology; Gudhi/Ripser are used in drug discovery pipelines
**Neuroscience:** hippocampal place cells encode space; TDA detects topological invariants of neural codes
**Computer vision:** Mapper on CNN features reveals latent space structure; TDA as an explainability tool

Предварительные знания

de Rham Cohomology

The Vietoris-Rips Complex

The **Vietoris-Rips complex VR(X, ε)** is a simplicial complex built from a point cloud X: vertices are points of X, and a simplex [x₀, ..., xₖ] is added whenever d(xᵢ, xⱼ) ≤ ε for all pairs i, j. At ε=0 only vertices exist; as ε → ∞ the full simplex appears.

Choosing ε is the main problem: too small and the graph fragments; too large and everything collapses into one simplex. **Persistent homology** resolves this by building VR for all values of ε simultaneously and tracking when topological features are born and die.

For a point cloud sampled from a circle S¹ at the right scale ε, what Betti numbers do we expect?

Persistent Homology

**Persistent homology** studies how Hₙ(VR(X, ε)) changes as ε increases. Each topological feature is **born** at ε_birth and **dies** at ε_death. The pair (ε_birth, ε_death) is a **persistence pair**. Long-lived features are signal; short-lived ones are noise.

The **persistence diagram** is the set of points (b, d) in R² for each persistence pair. Points far from the diagonal (b = d) correspond to significant features. Points near the diagonal are noise. The **barcode** is an equivalent representation: horizontal bars [b, d].

On a persistence diagram, points far from the diagonal represent:

The Mapper Algorithm and Wasserstein Distance

The **Mapper algorithm** (Singh, Memoli, Carlsson 2007) builds a topological skeleton of data. Steps: 1. Choose a 'filter' f: X → R (e.g., projection onto the first PCA component). 2. Cover the image of f with overlapping intervals. 3. Cluster the preimage of each interval. 4. Connect clusters that share points.

The **Wasserstein distance** between two persistence diagrams is a metric stable under small perturbations of the data: W_p(D₁, D₂) = inf_{γ: D₁→D₂} (Σ ||p − γ(p)||_∞^p)^{1/p}, where the infimum is over all bijections (with the diagonal as a 'garbage bin'). This allows comparing topological shapes of different datasets.

**TDA libraries:** Gudhi (Python/C++) - full toolkit: VR, alpha complexes, cubical homology, Mapper, distances. Ripser (C++/Python) - fast persistent homology. scikit-tda - scikit-learn integration. Applications: drug discovery (molecule pocket shape), neuroscience (place cell topology), materials science (pore analysis).

A small Wasserstein distance between two persistence diagrams means:

TDA Applications

TDA has found applications across diverse fields. **Biomedicine:** tumor shape analysis (persistent homology distinguishes malignant from benign), protein pocket detection (β₁ identifies binding sites). **Neuroscience:** loops in hippocampal spike data encode spatial maps (place cells). **Materials:** pore analysis (β₂ counts closed voids in porous media).

Domain	TDA Tool	What is Detected
Medicine	Persistent homology of images	Tumor shape, β₁ in microstructure
Neuroscience	VR on spike trains	Place cells, topology of neural codes
Finance	Persistent homology of time series	Market cycle loops, volatility structure
Materials science	Cubical homology	Pores, β₂ = closed voids
NLP	Persistence on word embeddings	Semantic clusters, analogy cycles
Computer vision	Mapper on CNN features	Latent space structure

Why is H₁ (persistent 1D holes) useful for analyzing molecules?

Key Ideas

**VR(X, ε):** simplices from points within distance ε; bridge from point cloud to topology
**Persistent homology:** (birth, death) for each hole as ε varies; robust to noise
**Persistence diagram:** points far from diagonal = significant features; near diagonal = noise
**Mapper:** topological skeleton via filter + clustering; Wasserstein = metric on diagrams

Вопросы для размышления

The stability theorem for persistent homology says: if data is perturbed by eps, the persistence diagram changes by at most eps (in Bottleneck distance). Why does this make TDA practical for noisy data?
Alpha complexes are an alternative to Vietoris-Rips. What are their memory and time advantages? When is VR preferable?
How can TDA be applied to a time series (e.g., a stock index)? Which filtration and complex fit best, and which topological features deserve attention?

Связанные уроки

ml-19-pca