Information Geometry
e-connection and m-connection: dual geometry
Amari (1982) showed that on a family of probability distributions, for example the Gaussians, there are two distinct notions of a straight line: exponential (e) geodesics and mixture (m) geodesics, which are dual to each other with respect to the Fisher metric. The EM algorithm, widely used to fit mixture models, can be read as an alternation of two projections in these geometries.
- Natural-gradient methods such as K-FAC precondition gradients with a Kronecker-factored approximation of the Fisher information matrix, the metric underlying the e/m dual structure, and have been applied to training neural networks.
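- Variational inference in probabilistic programming (Pyro, Stan) minimizes KL(q || p) over a tractable, often factorized, family; in Amari's terminology this is an e-projection, whereas the m-projection (minimizing KL(p || q)) corresponds to moment matching.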
Exponential (e) connection
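In 1982 Shun-ichi Amari showed that a statistical manifold carries two canonical affine connections: the exponential connection (alpha = +1) and the mixture connection (alpha = -1). They are dual with respect to the Fisher information metric, and on an exponential family both are flat, which gives a dually flat structure. This structure underlies the projection view of the EM algorithm. Neural network optimizers such as K-FAC precondition gradients with an approximation of the Fisher information matrix, the metric of this geometry.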
The e-geodesic between two distributions p and q in an exponential family is...
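As a minimal sketch of the two kinds of straight line, the following compares them on categorical distributions, where both can be written in closed form. It assumes the standard definitions: an m-geodesic interpolates probabilities linearly, and an e-geodesic interpolates log-probabilities linearly and then renormalizes. The function names and the example distributions p and q are hypothetical, chosen only for illustration.

```python
import math

def m_geodesic(p, q, t):
    """Mixture geodesic: linear interpolation of the probabilities."""
    return [(1 - t) * pi + t * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential geodesic: linear interpolation of log-probabilities,
    followed by renormalization so the result sums to one."""
    w = [math.exp((1 - t) * math.log(pi) + t * math.log(qi))
         for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

# Illustrative distributions over three outcomes (all entries positive).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

mid_m = m_geodesic(p, q, 0.5)  # arithmetic mean of p and q
mid_e = e_geodesic(p, q, 0.5)  # normalized geometric mean of p and q

print("m-midpoint:", mid_m)
print("e-midpoint:", mid_e)
```

The two midpoints differ, which is the sense in which the two geometries disagree about what "straight" means; both curves share the same endpoints.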
Geometry of the EM algorithm
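The EM algorithm (Dempster, Laird, Rubin, 1977) alternates two steps. Amari showed in 1995 that they can be read as projections: the E-step corresponds to an e-projection onto the data manifold (the distributions consistent with the observed data), and the M-step to an m-projection onto the parametric model family. Neither step increases the KL divergence between the two sets, so the likelihood never decreases. The KL Pythagorean theorem characterizes each projection; convergence to a global maximum is not guaranteed.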
The E-step of the EM algorithm geometrically corresponds to...
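The alternation can be made concrete with a small sketch, assuming a two-component one-dimensional Gaussian mixture with unit variances. One step updates the distribution over the latent labels given the data, and the other refits the model parameters to the resulting expected statistics. The data, the initial parameters and the function names are hypothetical; the only property being illustrated is that the log-likelihood never decreases.

```python
import math

def normal_pdf(x, mu):
    """Density of a unit-variance Gaussian with mean mu."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def log_likelihood(xs, w, mu0, mu1):
    """Log-likelihood of the mixture (1 - w) N(mu0, 1) + w N(mu1, 1)."""
    return sum(math.log((1 - w) * normal_pdf(x, mu0) + w * normal_pdf(x, mu1))
               for x in xs)

def em_step(xs, w, mu0, mu1):
    # E-step: posterior probability that each point came from component 1
    # (a projection onto distributions consistent with the observed data).
    r = [w * normal_pdf(x, mu1)
         / ((1 - w) * normal_pdf(x, mu0) + w * normal_pdf(x, mu1))
         for x in xs]
    # M-step: refit the parameters to the expected sufficient statistics
    # (a projection back onto the model family).
    n1 = sum(r)
    n0 = len(xs) - n1
    new_mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
    new_mu0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n0
    return n1 / len(xs), new_mu0, new_mu1

# Illustrative data: two well-separated clusters.
xs = [-2.1, -1.9, -2.4, -1.6, 1.8, 2.2, 2.5, 1.7, 2.0]
params = (0.5, -0.5, 0.5)  # initial (w, mu0, mu1), chosen arbitrarily

history = [log_likelihood(xs, *params)]
for _ in range(20):
    params = em_step(xs, *params)
    history.append(log_likelihood(xs, *params))

print("final (w, mu0, mu1):", params)
print("log-likelihood trace:", history)
```

Printing the trace shows a non-decreasing sequence, which is the observable counterpart of the projection argument.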
Key results
- e-connection (+1) has straight geodesics in natural (theta) coordinates.
- m-connection (-1) has straight geodesics in expectation (eta, also written mu) coordinates.
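- The KL Pythagorean theorem underlies the monotone (non-decreasing) likelihood of EM; it does not by itself guarantee a global maximum.
- E-step = e-projection onto the data manifold; M-step = m-projection onto the model family (Amari's convention).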
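The Pythagorean relation can be checked numerically on a small case. The sketch below assumes the family of independent (product) distributions on two binary variables, which is an exponential family, and uses the fact that the KL(p || q)-minimizing member of that family (the m-projection in Amari's convention) is the product of the marginals of p. The joint distribution p, the comparison point r and the helper names are hypothetical.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(a, b):
    """Independent joint with P(x=1) = a and P(y=1) = b,
    flattened as [p00, p01, p10, p11]."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

# Illustrative correlated joint distribution over (x, y).
p = [0.4, 0.1, 0.2, 0.3]

# Projection of p onto the independent family: product of its marginals.
a_star = p[2] + p[3]  # P(x=1)
b_star = p[1] + p[3]  # P(y=1)
q_star = product(a_star, b_star)

# Any other member of the independent family.
r = product(0.8, 0.25)

lhs = kl(p, r)
rhs = kl(p, q_star) + kl(q_star, r)
print("KL(p || r)                  =", lhs)
print("KL(p || q*) + KL(q* || r)   =", rhs)
```

The two printed values agree up to floating-point error for any choice of r in the independent family, which is the Pythagorean decomposition for this projection.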