Information Geometry

Dually flat manifolds and Bregman divergences

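The dually flat structure introduced by Amari (1985) gives one geometric framework for KL divergence, the EM algorithm, exponential families, and mirror descent. AlphaFold 2 (DeepMind, 2021) offers only a loose analogy: its pipeline ends by relaxing predicted structures toward physically valid conformations, a projection-like step, but the published method does not formulate it as a Bregman projection.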

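  • Adagrad and its descendants (RMSProp, Adam) are often read as crude, diagonal approximations to preconditioning by the Fisher metric of this geometry, i.e. to the natural gradient.
  • Explicit natural gradient, which preconditions the gradient by the inverse Fisher metric (a metric, not a connection), can reduce iteration counts, by 30-40% on some language-model training tasks.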

Dually flat structure

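A manifold with a Riemannian metric g and two affine connections that are dual with respect to g, the e-connection and the m-connection, is dually flat if both connections are flat. It then has coordinates theta that are affine for the e-connection and coordinates eta that are affine for the m-connection, together with convex potential functions psi(theta) and phi(eta) linked by a Legendre transform: eta = grad psi(theta), theta = grad phi(eta), and psi(theta) + phi(eta) - <theta, eta> = 0. Every exponential family is dually flat, with theta the natural parameters, eta the expectation parameters, psi the log-partition function and phi the negative entropy (up to a base-measure term). The converse does not hold: mixture families are also dually flat, and more generally any strictly convex potential psi induces a dually flat structure. In 2019 DeepMind applied this structure to analyze policy spaces in RL.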
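The Legendre link between the two coordinate systems can be checked on the simplest exponential family. A minimal sketch, assuming the Bernoulli family as the example (psi(theta) = log(1 + e^theta), eta = sigmoid(theta), phi(eta) the negative binary entropy); the function names are illustrative, not from any library:

```python
import numpy as np

# Bernoulli family as a dually flat example (an illustrative choice).
def psi(theta):
    # Log-partition function log(1 + e^theta), computed stably.
    return np.logaddexp(0.0, theta)

def grad_psi(theta):
    # eta = psi'(theta) = sigmoid(theta): the expectation parameter.
    return 1.0 / (1.0 + np.exp(-theta))

def phi(eta):
    # Legendre dual of psi: eta log eta + (1 - eta) log(1 - eta).
    return eta * np.log(eta) + (1.0 - eta) * np.log1p(-eta)

def grad_phi(eta):
    # theta = phi'(eta) = logit(eta), the inverse map of grad_psi.
    return np.log(eta) - np.log1p(-eta)

thetas = np.linspace(-4.0, 4.0, 9)
etas = grad_psi(thetas)
# Fenchel-Young equality: psi(theta) + phi(eta) - theta * eta = 0 on dual pairs.
gap = psi(thetas) + phi(etas) - thetas * etas
print(np.max(np.abs(gap)))                      # ~ machine precision
print(np.max(np.abs(grad_phi(etas) - thetas)))  # grad phi inverts grad psi
```

Both printed values are at rounding-error level, confirming that grad psi and grad phi are mutually inverse and that the two potentials satisfy the Legendre relation.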

The Fisher information matrix on a dually flat manifold equals...
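For a concrete case, the relation Fisher metric = Hess(psi) = [Hess(phi)]^{-1} listed under the key results can be checked numerically on the Bernoulli family (an assumption made for illustration). The sketch computes the Fisher information directly from its definition E[score^2] and compares it with central finite-difference second derivatives of psi and phi; the helper names and the step size h are illustrative choices.

```python
import numpy as np

def sigmoid(theta):
    return 1.0 / (1.0 + np.exp(-theta))

def psi(theta):
    # Bernoulli log-partition function.
    return np.logaddexp(0.0, theta)

def phi(eta):
    # Legendre dual of psi (negative binary entropy).
    return eta * np.log(eta) + (1.0 - eta) * np.log1p(-eta)

def fisher_theta(theta):
    # Fisher information E[score^2], summing over the outcomes x in {0, 1}.
    # For the Bernoulli family the score d/dtheta log p(x | theta) is x - eta.
    eta = sigmoid(theta)
    outcomes = np.array([0.0, 1.0])
    probs = np.array([1.0 - eta, eta])
    return float(np.sum(probs * (outcomes - eta) ** 2))

def second_derivative(f, x, h=1e-4):
    # Central finite difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

theta = 0.7
eta = sigmoid(theta)
g = fisher_theta(theta)
hess_psi = second_derivative(psi, theta)   # Hessian in theta coordinates
hess_phi = second_derivative(phi, eta)     # Hessian in eta coordinates
print(g, hess_psi, 1.0 / hess_phi)         # three estimates of the same metric
```

Analytically all three equal eta(1 - eta); the finite-difference values agree to about seven digits.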

Projection theorem and mirror descent

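Projection onto an e-flat subset (the m-projection) minimizes KL(p||q) over q in that subset, while projection onto an m-flat subset (the e-projection) minimizes KL(q||p) over q in that subset. Mirror descent (Nemirovsky and Yudin, 1983) takes each gradient step in the dual coordinates grad phi(x) and then returns to the constraint set by a Bregman projection, i.e. by minimizing D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> over feasible x. A loosely analogous projection onto valid conformations ends the AlphaFold 2 pipeline, though it is not formulated there as a Bregman projection.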
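Minimizing KL(p||q) over an exponential family reduces to moment matching: the minimizer q* has E_q*[x] = E_p[x], and for every r in the family KL(p||r) = KL(p||q*) + KL(q*||r) (the generalized Pythagorean relation). A minimal sketch, assuming a made-up target distribution on {0, 1, 2, 3} and the one-parameter family q_theta(x) proportional to exp(theta x); Newton's method is just one convenient way to solve the 1-D convex problem.

```python
import numpy as np

xs = np.arange(4.0)                    # support {0, 1, 2, 3}
p = np.array([0.1, 0.5, 0.1, 0.3])     # hypothetical target, not in the family

def q(theta):
    # Exponential family q_theta(x) proportional to exp(theta * x): e-flat.
    w = np.exp(theta * xs)
    return w / w.sum()

def kl(a, b):
    # KL(a || b) for strictly positive distributions on xs.
    return float(np.sum(a * np.log(a / b)))

def m_project(p, iters=50):
    # Minimize KL(p || q_theta) over theta with Newton's method:
    # first derivative E_q[x] - E_p[x], second derivative Var_q[x] > 0.
    target = p @ xs
    theta = 0.0
    for _ in range(iters):
        qt = q(theta)
        mean = qt @ xs
        var = qt @ (xs - mean) ** 2
        theta -= (mean - target) / var
    return theta

theta_star = m_project(p)
q_star = q(theta_star)
r = q(-0.5)                                        # another member of the family
print(q_star @ xs, p @ xs)                         # matched first moments
print(kl(p, r), kl(p, q_star) + kl(q_star, r))     # Pythagorean relation
```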

Mirror descent with potential phi(x) = sum x_i log x_i (the negative Shannon entropy) on the simplex is...
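A standard consequence of this potential: since grad phi(x) = log x + 1, a gradient step in the dual coordinates is additive in log x, and the Bregman (KL) projection back onto the simplex is a plain normalization, which gives the multiplicative exponentiated-gradient update. A minimal sketch, assuming a hypothetical objective f(x) = 0.5 ||x - c||^2 whose minimizer c lies inside the simplex and a step size of 1 chosen for this example:

```python
import numpy as np

def exponentiated_gradient(grad, x0, step, iters):
    # Mirror descent on the probability simplex with potential
    # phi(x) = sum_i x_i log x_i (negative Shannon entropy).
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Gradient step in the dual coordinates grad phi(x) = log x + 1;
        # the constant +1 cancels after normalization, so it is omitted.
        logits = np.log(x) - step * grad(x)
        # Map back with exp, then Bregman (KL) projection onto the simplex,
        # which for this potential is just normalization.
        w = np.exp(logits - logits.max())  # shift for numerical stability
        x = w / w.sum()
    return x

c = np.array([0.2, 0.3, 0.5])  # hypothetical minimizer inside the simplex

def grad_f(x):
    # Gradient of f(x) = 0.5 * ||x - c||^2.
    return x - c

x_hat = exponentiated_gradient(grad_f, np.full(3, 1.0 / 3.0), step=1.0, iters=200)
print(x_hat)
```

Every iterate stays strictly inside the simplex without any explicit clipping, which is the practical appeal of this potential over Euclidean projected gradient descent.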

Key results

  • A dually flat manifold has two potentials psi and phi linked by a Legendre transform.
  • Fisher metric = Hess(psi) in theta coordinates = [Hess(phi)]^{-1} in eta coordinates.
  • Bregman divergence generalizes KL divergence and squared Euclidean distance.
  • Mirror descent = gradient steps in the dual coordinates grad phi, followed by Bregman projections with potential phi.
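The Bregman divergence of a convex potential phi is D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. A minimal sketch checking the two special cases in the list: phi(x) = 0.5 ||x||^2 gives half the squared Euclidean distance, and the negative entropy gives KL(x||y) for vectors on the simplex. The function names and test vectors are illustrative.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

def half_sq_norm(v):
    return 0.5 * np.dot(v, v)

def half_sq_norm_grad(v):
    return v

def neg_entropy(v):
    return np.sum(v * np.log(v))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d_euclid = bregman(half_sq_norm, half_sq_norm_grad, x, y)
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)
print(d_euclid, 0.5 * np.sum((x - y) ** 2))   # equal
print(d_kl, np.sum(x * np.log(x / y)))        # equal, since x and y sum to 1
```

For the entropy case the general identity is D_phi(x, y) = sum x_i log(x_i / y_i) - sum x_i + sum y_i, so the last two terms cancel exactly when both vectors lie on the simplex.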