Information Geometry

Persistent Homology for Data

Clustering and PCA miss loops, tunnels, and voids in data. Topological data analysis via persistent homology captures multi-scale shape features and is provably stable to noise: the stability theorem guarantees small data perturbations produce small changes in topological summaries.

New breast cancer subtype (Ayasdi, 2013): persistent homology found a subtype with 100% survival rate invisible to standard clustering - topological loops in genomic data flagged a new structure
Neural network topology: persistent homology of weight space correlates with generalization ability and adversarial robustness; topological regularization improves model training
Ripser (2016): computes persistent homology for 10,000 points in R^{100} in 2 seconds; used in structural bioinformatics for protein conformational space analysis
Financial time series: persistence diagrams change at market regime transitions - topological features provide leading indicators of volatility spikes

Предварительные знания

Simplicial complex homology and Betti numbers
Hausdorff distance between sets
Basic algebraic topology

Persistent Homology and Persistence Diagrams

Persistent homology tracks topological features of a point cloud across all scales simultaneously. A filtration is built - a nested sequence of simplicial complexes K_0 subset K_1 subset ... subset K_n indexed by scale t. Topological features (connected components, loops, voids) are born at scale b and die at scale d > b. Each birth-death pair (b, d) defines a point on the persistence diagram. Persistence = d - b measures feature longevity: large means real structure, small means noise.

Wasserstein distance W_p(D_1, D_2)^p = inf_gamma sum_{p in D_1} ||p - gamma(p)||^p is an alternative to bottleneck distance. More sensitive to small features. Differentiable almost everywhere, making it usable as a loss function in topology-regularized learning tasks.

What does a point on the persistence diagram far from the diagonal mean?

Persistence = d - b is the scale range over which the feature exists. A large value means real structure (a cycle or component). Points near the diagonal have persistence near zero: they are born and die quickly, characteristic of noise. The stability theorem guarantees this separation is robust to small data perturbations.

Topological Features in Machine Learning

To apply TDA in ML pipelines, persistence diagrams must be converted to vectors. Three main approaches: persistence landscape (L^2 function), persistent image (2D histogram), and topological losses (Wasserstein-regularized). Each preserves different aspects of topological information with different degrees of differentiability for optimization.

Giotto-TDA, Gudhi, and Ripser are Python TDA libraries. Giotto-TDA integrates with scikit-learn and PyTorch for end-to-end differentiable learning with topological features. Ripser computes H_0 and H_1 for 10,000 points in R^100 in seconds using sparse distance matrices and algorithmic reductions from discrete Morse theory.

Why do we need persistent images or landscapes instead of directly using persistence diagrams in ML?

A persistence diagram is a multiset of points (b_i, d_i). It cannot be directly fed into an SVM or neural network as a fixed-size input. Persistent image and landscape convert it to a vector in R^n or L^2 where standard ML tools work. The stability theorem guarantees that the vectorization inherits the stability: small data perturbations give small image changes.

Morse Theory and the Foundation of Persistence

Persistent homology is a computational extension of Morse theory from smooth functions to arbitrary point cloud data. For a smooth function f: M to R, Morse theory relates critical points to topological changes in sublevel sets. Persistence captures exactly the same information: each birth-death pair corresponds to a pair of critical points, with persistence = difference in critical values.

Discrete Morse theory (Forman, 1998) extends Morse theory to CW complexes and simplicial complexes. It provides an efficient reduction algorithm: collapse simplices in gradient pairs to produce a smaller complex with the same homology. Ripser exploits discrete Morse reductions internally to achieve its speed - the theoretical backbone enabling TDA at scale.

What is the connection between Morse cancellation and noise filtering in TDA?

The Morse cancellation theorem says: a birth-death pair with persistence epsilon can be cancelled by a perturbation of size O(epsilon) to f. So features with small persistence are removable by small function changes - they are noise. Features with large persistence require large perturbations to remove - they are signal. This is the mathematical foundation of the signal/noise separation in TDA.

Connections to other topics

Persistent homology connects algebraic topology with machine learning and statistics via stable, computable topological invariants.

Algebraic topology — Related topic
Metric geometry — Related topic
Machine learning features — Related topic
Morse theory — Related topic

Итоги

Rips filtration K_t: simplicial complex from pairwise distances, growing monotonically with scale t
Persistence diagram: points (b_i, d_i) above the diagonal; pers = d_i - b_i is the feature lifetime
Far from diagonal = long-lived topological feature (real structure); near diagonal = noise
Stability theorem: d_B(Dgm(f), Dgm(g)) <= ||f - g||_inf - robustness to data perturbations
Persistence landscape lambda_k in L^2: vectorization of diagrams for statistics and machine learning

Persistent Homology and Persistence Diagrams

What does a point on the persistence diagram far from the diagonal mean?

Topological Features in Machine Learning

Why do we need persistent images or landscapes instead of directly using persistence diagrams in ML?

Morse Theory and the Foundation of Persistence

What is the connection between Morse cancellation and noise filtering in TDA?

Итоги

Rips filtration K_t: simplicial complex from pairwise distances, growing monotonically with scale t

Persistence diagram: points (b_i, d_i) above the diagonal; pers = d_i - b_i is the feature lifetime

Far from diagonal = long-lived topological feature (real structure); near diagonal = noise

Stability theorem: d_B(Dgm(f), Dgm(g)) <= ||f - g||_inf - robustness to data perturbations

Persistence landscape lambda_k in L^2: vectorization of diagrams for statistics and machine learning