Topology

Topological Data Analysis

How does an algorithm find a hole in a molecule? How is a loop detected in neural firing data? How can the 'shape' of two point clouds be compared? TDA turns these intuitive questions into algorithms: persistent homology computes a stable 'topological barcode' of the data.

  • **Drug discovery:** the shape of protein binding pockets is characterized by persistent homology; Gudhi/Ripser are used in drug discovery pipelines
  • **Neuroscience:** hippocampal place cells encode space; TDA detects topological invariants of neural codes
  • **Computer vision:** Mapper on CNN features reveals latent space structure; TDA as an explainability tool

Предварительные знания

  • de Rham Cohomology

The Vietoris-Rips Complex

The **Vietoris-Rips complex VR(X, ε)** is a simplicial complex built from a point cloud X: vertices are points of X, and a simplex [x₀, ..., xₖ] is added whenever d(xᵢ, xⱼ) ≤ ε for all pairs i, j. At ε=0 only vertices exist; as ε → ∞ the full simplex appears.

Choosing ε is the main problem: too small and the graph fragments; too large and everything collapses into one simplex. **Persistent homology** resolves this by building VR for all values of ε simultaneously and tracking when topological features are born and die.

For a point cloud sampled from a circle S¹ at the right scale ε, what Betti numbers do we expect?

Persistent Homology

**Persistent homology** studies how Hₙ(VR(X, ε)) changes as ε increases. Each topological feature is **born** at ε_birth and **dies** at ε_death. The pair (ε_birth, ε_death) is a **persistence pair**. Long-lived features are signal; short-lived ones are noise.

The **persistence diagram** is the set of points (b, d) in R² for each persistence pair. Points far from the diagonal (b = d) correspond to significant features. Points near the diagonal are noise. The **barcode** is an equivalent representation: horizontal bars [b, d].

On a persistence diagram, points far from the diagonal represent:

The Mapper Algorithm and Wasserstein Distance

The **Mapper algorithm** (Singh, Memoli, Carlsson 2007) builds a topological skeleton of data. Steps: 1. Choose a 'filter' f: X → R (e.g., projection onto the first PCA component). 2. Cover the image of f with overlapping intervals. 3. Cluster the preimage of each interval. 4. Connect clusters that share points.

The **Wasserstein distance** between two persistence diagrams is a metric stable under small perturbations of the data: W_p(D₁, D₂) = inf_{γ: D₁→D₂} (Σ ||p − γ(p)||_∞^p)^{1/p}, where the infimum is over all bijections (with the diagonal as a 'garbage bin'). This allows comparing topological shapes of different datasets.

**TDA libraries:** Gudhi (Python/C++) - full toolkit: VR, alpha complexes, cubical homology, Mapper, distances. Ripser (C++/Python) - fast persistent homology. scikit-tda - scikit-learn integration. Applications: drug discovery (molecule pocket shape), neuroscience (place cell topology), materials science (pore analysis).

A small Wasserstein distance between two persistence diagrams means:

TDA Applications

TDA has found applications across diverse fields. **Biomedicine:** tumor shape analysis (persistent homology distinguishes malignant from benign), protein pocket detection (β₁ identifies binding sites). **Neuroscience:** loops in hippocampal spike data encode spatial maps (place cells). **Materials:** pore analysis (β₂ counts closed voids in porous media).

DomainTDA ToolWhat is Detected
MedicinePersistent homology of imagesTumor shape, β₁ in microstructure
NeuroscienceVR on spike trainsPlace cells, topology of neural codes
FinancePersistent homology of time seriesMarket cycle loops, volatility structure
Materials scienceCubical homologyPores, β₂ = closed voids
NLPPersistence on word embeddingsSemantic clusters, analogy cycles
Computer visionMapper on CNN featuresLatent space structure

Why is H₁ (persistent 1D holes) useful for analyzing molecules?

Key Ideas

  • **VR(X, ε):** simplices from points within distance ε; bridge from point cloud to topology
  • **Persistent homology:** (birth, death) for each hole as ε varies; robust to noise
  • **Persistence diagram:** points far from diagonal = significant features; near diagonal = noise
  • **Mapper:** topological skeleton via filter + clustering; Wasserstein = metric on diagrams

Related Topics

TDA connects pure topology to real data:

  • Homology — Persistent homology = parametric version of simplicial homology
  • Topology in ML — TDA used in ML as feature engineering, regularization, and neural network analysis

Вопросы для размышления

  • The stability theorem for persistent homology says: if data is perturbed by eps, the persistence diagram changes by at most eps (in Bottleneck distance). Why does this make TDA practical for noisy data?
  • Alpha complexes are an alternative to Vietoris-Rips. What are their memory and time advantages? When is VR preferable?
  • How can TDA be applied to a time series (e.g., a stock index)? Which filtration and complex fit best, and which topological features deserve attention?

Связанные уроки

  • ml-19-pca
Topological Data Analysis

0

1

Sign In