Topology
Stability Theorem: Why Persistence is Robust to Noise
Цели урока
- Understand the bottleneck-distance definition and the role of the diagonal
- State the stability theorem and recognise its conditions
- Distinguish situations calling for Wasserstein vs bottleneck
- Apply stability to estimate TDA-feature reproducibility
Предварительные знания
- Persistent homology and barcodes
- Vietoris-Rips and Cech filtrations
- Critical points and tame functions
- L_p norms on function spaces
Without stability TDA would be mathematical exotica with no noisy-data applications.
- **Biomedical imaging**: repeat fMRI/CT/MRI yields reproducible barcodes for one patient
- **Drug screening**: molecular conformations classified by topological invariants without sensitivity to thermal noise
- **Anomaly detection**: bottleneck distance acts as a quality-control metric in industrial tomography
- **ML robustness**: activation persistence diagrams compared for clean vs adversarial inputs
The 2007 proof
The paper Stability of Persistence Diagrams in Discrete & Computational Geometry was a point of no return for TDA. Before this publication, persistence was elegant mathematics with unclear practical status: it was unclear whether the theory applied to real data at all. The authors showed that bottleneck distance between diagrams is bounded by the L_infinity norm of the difference of generating functions. Edelsbrunner and Carlsson then extended the result to Rips filtrations on point clouds, and TDA finally became an applicable engineering discipline. In 2009 Chazal et al. obtained an algebraic version of stability for arbitrary persistence modules.
Bottleneck Distance: A Metric on Diagrams
A protein structure measured twice by independent crystallography setups differs by nanometers. A point cloud of one thousand tumor cells looks slightly different in every biopsy. If persistent homology produces a barcode and each new measurement shifts that barcode entirely, the whole TDA discipline becomes useless for biology. In the mid-2000s, TDA faced exactly this existential question: is the tool stable under noise at all?
To answer, one must first measure distance between two persistence diagrams. A diagram is a multiset of points (b_i, d_i) above the diagonal birth = death. Points on the diagonal represent trivial features - born and dying at the same instant. Any reasonable metric must allow a point of one diagram to be matched either to a point of the other or to its projection onto the diagonal - meaning the feature disappears or appears.
The name bottleneck highlights the essence: the metric is a max over pairs, not a sum. One badly matched point determines the entire distance, even if a thousand others match perfectly.
Geometric intuition
Two diagrams of H_1 from separate biopsies
First diagram has points (0.10, 1.20), (0.30, 0.90), (0.20, 0.25). Second diagram has (0.12, 1.18), (0.31, 0.88), and a point (0.21, 0.23) near the diagonal. Optimal matching: the first two pairs match each other, the third point of each diagram matches the diagonal. Bottleneck distance is the maximum shift among these matches.
Bottleneck distance compares topological signatures of distinct neural-network architectures: compute the persistence diagram of layer activations on a test set, then compare networks by d_B. Architectures with low bottleneck distance learn similar features.
The norm inside bottleneck distance is L_infinity on R^2, not Euclidean. Distance between (b1, d1) and (b2, d2) equals max(|b1 - b2|, |d1 - d2|). This reflects the nature of sublevel-set filtration: a level shift of epsilon moves birth and death simultaneously by the same amount.
Why does the bottleneck-distance definition augment diagrams with the full diagonal?
Persistence diagrams can have differing numbers of off-diagonal points. Without the diagonal, no bijection exists. Adding the diagonal with infinite multiplicity lets excess features pair up with trivial ones.
Stability Theorem: Statement and Meaning
December 2005. David Cohen-Steiner, Herbert Edelsbrunner, and John Harer finish a manuscript stating an inequality that still grounds all of TDA today. The idea is almost embarrassingly simple: if two scalar functions f and g are close in L_infinity, their persistence diagrams are close in bottleneck distance. A small data perturbation produces a small topological-portrait perturbation.
Geometric reformulation: if two point clouds are epsilon-close in Hausdorff distance, their Vietoris-Rips persistence diagrams are epsilon-close in bottleneck distance. This extends the theorem from functions to actual datasets.
Controlled noise
Signal corruption
Let f(x) be a point-cloud density function, and g(x) = f(x) + epsilon * noise(x) where noise has amplitude 0.05. Then L_infinity distance is at most 0.05, hence bottleneck distance between diagrams is also at most 0.05. Long-lived features with persistence above 0.10 are guaranteed to survive - none vanish, none spurious ones appear.
For functions without tameness (infinitely oscillating critical points such as sin(1/x) near zero), the theorem in its original form fails. In practice this is rare, but careful application to theoretical function spaces requires checking.
Stability justifies applying TDA to noisy production-system data: IoT sensors, biomedical measurements, photos from various cameras. Long-lived topological features become genuine object properties, not artefacts of one particular measurement.
The proof rests on a simple idea: consider the sublevel-set filtration of f and the parallel filtration of g, offset by epsilon = L_infinity-norm of the difference. Every homology class living in the f-filtration over [b, d] lives in the g-filtration no later than b - epsilon and no earlier than d + epsilon. This bounds the shift of each diagram point.
What does stability mean practically for biological data?
Stability bounds the maximum shift of diagram points. A feature with persistence p and shift below p/2 will not vanish: its point cannot approach the diagonal. This is exactly noise robustness of biological topological signatures.
Wasserstein Stability and Lipschitz Variants
Bottleneck distance sees only the worst match. Sometimes the whole diagram matters: if one feature shifted strongly while a thousand shifted slightly, bottleneck cannot distinguish this from the case where only one shifted. Wasserstein metrics give a more sensitive estimate by summing individual shifts.
Algebraic stability (Chazal, de Silva, Glisse, Oudot, 2009): every 1-Lipschitz bound on interleaving distance for persistence modules yields a bound on bottleneck distance. This generalises the classical 2007 theorem to abstract modules.
Choosing p
When to use which metric
Anomaly detection on a production line: one strongly shifted point signals defect, smaller shifts are noise. Bottleneck (W_infinity) is ideal. Comparing two molecular shapes: full similarity matters, not just worst case. W_2 provides a richer distance. Searching duplicates in a 3D-scan dataset: W_1 is most robust to registration error.
Computing W_p for two diagrams with n points is an optimal-transport problem of complexity O(n^3) via the Hungarian algorithm or O(n^{2.5}) with auction methods. For n above several thousand, approximations are needed (sliced Wasserstein, entropic regularization).
Differentiability of sliced Wasserstein enabled neural-network training with topological losses. The PersLay and TopNet architectures embed persistence diagrams directly in the forward pass, minimising a W-metric between batch diagram and template.
Practically: at p=1 algebraic stability does not hold in general - an extra condition on total persistence is required. At p=2 the stability inequality holds almost always, but with a constant depending on the diameter of function support. At p=infinity the classical 2007 theorem holds with no extra conditions.
Which metric is best for anomaly detection where a single strongly deviating feature matters?
Anomaly detection wants to spot one rare deviation against a backdrop of normality. Bottleneck does not dilute the outlier with other matches and returns a large value the moment one feature shifts strongly. W_1 and W_2 would dampen the outlier among thousands of normal matches.
Applications: Why TDA Can Be Trusted
The stability theorem is not abstract decoration. It is a quality certificate for every downstream TDA application. Without it, persistence would be no different from a random feature: pretty, but unpredictable. With it, TDA becomes an engineering discipline with stability guarantees and error estimates.
Three application types where stability is critical: (1) noisy sensor data where each measurement yields a slightly different curve, (2) biological experiments with natural sample variability, (3) approximate simplicial-complex computations when the full Cech complex is intractable.
fMRI: measurement reproducibility
Two scans of one brain
A patient receives fMRI scans one hour apart. Raw activity time series differ due to thermal noise, head motion, and scanner drift. Hausdorff distance between activity point clouds is around 0.03 normalized signal units. By the stability theorem, bottleneck distance between persistence diagrams is also at most 0.03. Features with persistence above 0.10 (corresponding to real functional brain networks) are guaranteed to reproduce. Without stability, the discovered topology could not be claimed as a brain property rather than a measurement artefact.
Adversarial perturbations alter neural-network activations on input. If the topology of the decision boundary before and after an attack has a large bottleneck distance, the model is catastrophically sensitive. Comparing persistence diagrams of clean vs noisy classifications quantifies adversarial robustness.
The theorem provides an upper bound, not a guarantee that a feature retains its position. If a feature lies near the diagonal (short persistence) and noise is comparable in size, it disappears completely. Production systems therefore filter by minimum persistence.
Drug discovery
Comparing molecular conformations
A molecule in solution constantly switches conformations. Molecular dynamics generates thousands of 3D structures of one protein. Persistence diagrams of each conformation differ slightly. Stability bounds the bottleneck distance between diagrams by the average conformational RMSD. This allows clustering conformations by topological class and discovering functionally distinct protein states.
In production TDA, stability is usually invoked implicitly: users of Gudhi, Ripser, or Giotto-TDA libraries seldom think about the theorem. Yet stability is exactly what justifies the cloud to simplicial complex to persistence diagram to vectorization to ML-model pipeline as reproducible.
Why does stability give TDA an edge over classical shape descriptors for biomedical data?
The practical edge is a quantitative guarantee that identical objects yield identical features and distinct objects yield distinct features. Without stability, these guarantees would need empirical verification on each dataset.
Where stability appears next
The stability theorem is the root of trust for every downstream TDA application in ML and biology.
- Bottleneck and Wasserstein in depth — Related topic
- Mapper advanced — Related topic
- TDA in neural networks — Related topic
- TDA for time series — Related topic
Key ideas
- Bottleneck distance - a metric on persistence diagrams via optimal matching with the diagonal
- Stability theorem: d_B(Dgm(f), Dgm(g)) <= ||f - g||_infinity for tame functions
- Wasserstein metrics span a sensitivity spectrum from worst-case to total deviation
- Stability is the certificate of TDA applicability on noisy real-world data
- Algebraic stability extends to arbitrary persistence modules
Вопросы для размышления
- Which feature of modern production TDA systems directly relies on the 2007 stability theorem?
- Why is bottleneck distance sensitive to outliers while Wasserstein is not?
- How would the status of TDA differ if a stability analogue did not exist?