Riemann Curvature Tensor
Nickel and Kiela embedded the WordNet noun hierarchy (82k nouns, deep hierarchy) into the Poincaré ball in 2017, and embeddings with only a few hyperbolic dimensions outperformed 200-dimensional Euclidean ones. The same Riemann tensor that describes how spacetime bends around a black hole also explains why hyperbolic space packs semantic trees so efficiently.
- **General relativity:** Einstein's equations G_μν = 8πT_μν (in units with G = c = 1). Gravity is the curvature of 4D spacetime.
- **Poincaré embeddings:** embedding WordNet in H² reduces distortion from 50% to 10% versus Euclidean space of the same dimension
- **Riemannian optimization:** curvature affects convergence: in K < 0 spaces nearby geodesics spread apart, in K > 0 spaces they focus, and both effects change how far a gradient step can safely go
Prerequisites
The Riemann Curvature Tensor
The **Riemann tensor** measures how covariant differentiation fails to commute: R(X,Y)Z = ∇ₓ∇ᵧZ − ∇ᵧ∇ₓZ − ∇_{[X,Y]}Z. When curvature vanishes, differentiation order does not matter - the space is 'flat.'
In coordinates: Rˡₖᵢⱼ = ∂ᵢΓˡⱼₖ − ∂ⱼΓˡᵢₖ + ΓˡᵢₘΓᵐⱼₖ − ΓˡⱼₘΓᵐᵢₖ. This is a (1,3)-tensor. In n dimensions: n²(n²−1)/12 independent components (for n=4: 20).
| Space | Sectional K | Riemann tensor |
|---|---|---|
| Rⁿ (Euclidean) | 0 | R = 0 (flat) |
| Sⁿ(r) (sphere) | 1/r² | R_{ijkl} = (1/r²)(g_{ik}g_{jl}−g_{il}g_{jk}) |
| Hⁿ(r) (hyperbolic) | −1/r² | R_{ijkl} = −(1/r²)(g_{ik}g_{jl}−g_{il}g_{jk}) |
| General Riemannian | varies with point and plane | R varies from point to point (generally non-zero) |
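As a sanity check on the sphere row of the table, the coordinate formula for Rˡₖᵢⱼ can be evaluated numerically. The sketch below is illustrative only: it assumes spherical coordinates (θ, φ) on S²(r), hard-codes the two non-zero Christoffel symbols of the round metric, and approximates ∂Γ by central differences. The names `christoffel`, `riemann`, and `sectional_curvature` are hypothetical helpers, not part of any library.

```python
import math

def christoffel(theta):
    """Return G[l][i][j] = Γ^l_{ij} for the round 2-sphere (0 = theta, 1 = phi).

    The symbols do not depend on the radius r or on phi.
    """
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)               # Γ^θ_{φφ}
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)   # Γ^φ_{θφ} = Γ^φ_{φθ}
    return G

def riemann(theta, h=1e-5):
    """Return R[l][k][i][j] = R^l_{kij} from the coordinate formula."""
    G = christoffel(theta)
    G_plus, G_minus = christoffel(theta + h), christoffel(theta - h)

    def dG(i, l, j, k):
        # ∂_i Γ^l_{jk}; only the theta-derivative (i = 0) is non-zero here.
        if i != 0:
            return 0.0
        return (G_plus[l][j][k] - G_minus[l][j][k]) / (2.0 * h)

    R = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    for l in range(2):
        for k in range(2):
            for i in range(2):
                for j in range(2):
                    quad = sum(G[l][i][m] * G[m][j][k] - G[l][j][m] * G[m][i][k]
                               for m in range(2))
                    R[l][k][i][j] = dG(i, l, j, k) - dG(j, l, i, k) + quad
    return R

def sectional_curvature(theta, r):
    """K of the (theta, phi) plane: R_{θφθφ} / (g_θθ g_φφ − g_θφ²)."""
    g_tt, g_pp = r ** 2, (r * math.sin(theta)) ** 2   # diagonal metric, g_θφ = 0
    R_lower = g_tt * riemann(theta)[0][1][0][1]        # lower the first index
    return R_lower / (g_tt * g_pp)

print(sectional_curvature(theta=1.0, r=2.0))  # close to 1/r² = 0.25
```

The result does not depend on θ, which is what constant curvature means; changing `r` rescales it as 1/r².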
The Riemann tensor R(X,Y)Z measures the non-commutativity of covariant derivatives. What is the physical/geometric meaning?
Sectional Curvature, Ricci, and Scalar Curvature
**Sectional curvature** K(X,Y) = R(X,Y,Y,X)/(|X|²|Y|²−(X·Y)²), where R(X,Y,Z,W) = g(R(X,Y)Z, W) and X·Y = g(X,Y), is the Gaussian curvature of the surface swept out by geodesics tangent to the plane spanned by X and Y. It generalizes Gaussian curvature to higher dimensions: for n = 2 there is only one such plane, and K(X,Y) coincides with the Gaussian curvature K.
**Ricci tensor** Ric(X,Y) = tr(Z ↦ R(Z,X)Y) - a contraction of the Riemann tensor. **Scalar curvature** S = tr(g⁻¹ Ric) - a further contraction. In general relativity: Einstein equations G_μν = Ric_μν − (S/2)g_μν = 8πT_μν.
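The two contractions can be traced through on a constant-curvature tensor. This is a minimal sketch, assuming an orthonormal frame (so g_{ij} = δ_{ij}) and the index convention Ric_{jl} = Σᵢ R_{ijil}; the function names are illustrative.

```python
def constant_curvature_riemann(n, K):
    """R[i][j][k][l] = K (δ_ik δ_jl − δ_il δ_jk) in an orthonormal frame."""
    def d(a, b):
        return 1.0 if a == b else 0.0
    return [[[[K * (d(i, k) * d(j, l) - d(i, l) * d(j, k))
               for l in range(n)] for k in range(n)]
             for j in range(n)] for i in range(n)]

def ricci(R):
    """First contraction: Ric[j][l] = sum over i of R[i][j][i][l]."""
    n = len(R)
    return [[sum(R[i][j][i][l] for i in range(n)) for l in range(n)]
            for j in range(n)]

def scalar(Ric):
    """Second contraction: the trace of the Ricci tensor."""
    return sum(Ric[j][j] for j in range(len(Ric)))

R = constant_curvature_riemann(n=4, K=-1.0)   # a 4-dimensional space with K = −1
Ric = ricci(R)
print(Ric[0][0], scalar(Ric))                 # -3.0 -12.0
```

Each contraction discards information: the Ricci tensor here is just (n−1)K times the identity, and the scalar curvature is a single number.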
**Killing-Hopf theorem:** a complete connected Riemannian manifold with constant sectional curvature K is a quotient of one of three model spaces: the sphere Sⁿ for K > 0, Euclidean space Rⁿ for K = 0 (flat tori are examples of such quotients), and hyperbolic space Hⁿ for K < 0. In ML: Poincaré embeddings use Hⁿ (K < 0) for hierarchical data.
Scalar curvature of the unit 3-sphere S³ (radius r = 1) equals:
Einstein Equations and Constant-Curvature Spaces
**Einstein's equations (1915):** G_μν ≡ Ric_μν − (S/2)g_μν = (8πG/c⁴) T_μν, where G on the right-hand side is Newton's constant (setting G = c = 1 gives the short form 8πT_μν). The left side is the geometry of spacetime (Einstein tensor); the right side is the distribution of energy and momentum. Wheeler: 'Spacetime tells matter how to move; matter tells spacetime how to curve.'
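One consequence worth spelling out is what the equations say in vacuum. Taking the trace with g^{μν} in four dimensions is a standard manipulation; the symbol T ≡ g^{μν}T_{μν} is introduced here only for this step:

```latex
g^{\mu\nu} G_{\mu\nu} = S - \tfrac{1}{2}\cdot 4\,S = -S = \frac{8\pi G}{c^{4}}\,T
\quad\Longrightarrow\quad
\mathrm{Ric}_{\mu\nu} = \frac{8\pi G}{c^{4}}\left(T_{\mu\nu} - \tfrac{1}{2}\,T\,g_{\mu\nu}\right).
```

Where T_μν = 0 this gives Ric_μν = 0, yet the full Riemann tensor can still be non-zero: the region outside a black hole is Ricci-flat but curved.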
In ML, hyperbolic spaces (K < 0) embed hierarchical structures with low distortion: trees 'grow exponentially' and Hⁿ accommodates them naturally. **Poincaré disk model:** H² ≅ {x ∈ R²: |x| < 1} with metric ds² = 4/(1−|x|²)² Σ dxᵢ².
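The disk metric above leads to the closed-form distance d(x,y) = arcosh(1 + 2|x−y|²/((1−|x|²)(1−|y|²))), which is the distance used in Poincaré embeddings. A minimal sketch follows; the function name `poincare_distance` is illustrative, and the inputs are assumed to lie strictly inside the unit disk.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points of the open unit disk (curvature −1)."""
    def sq(v):
        return sum(c * c for c in v)
    diff = [a - b for a, b in zip(x, y)]
    arg = 1.0 + 2.0 * sq(diff) / ((1.0 - sq(x)) * (1.0 - sq(y)))
    return math.acosh(arg)

origin = (0.0, 0.0)
for rho in (0.5, 0.9, 0.99, 0.999):
    # Points close to the boundary are far from the origin in the hyperbolic metric.
    print(rho, round(poincare_distance(origin, (rho, 0.0)), 3))
```

From the origin the distance reduces to 2·artanh(ρ), which diverges as ρ → 1: the disk looks bounded in Euclidean terms but has infinite room near its edge.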
**Poincaré embeddings (Nickel & Kiela, 2017):** embedding WordNet into H² achieves ~10% distortion versus ~50% in Euclidean space of the same dimension. The reason: volume in Hⁿ grows as e^{(n-1)r}, matching the exponential growth of tree branching.
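The growth argument can be made concrete for n = 2, where a geodesic disk of radius r in H² (K = −1) has area 2π(cosh r − 1) ≈ πeʳ. The sketch below compares that with the Euclidean area πr² and with the node count of a complete binary tree. Placing one tree level per unit of radius is an assumption made purely for illustration, and all function names are hypothetical.

```python
import math

def hyperbolic_disk_area(r):
    """Area of a geodesic disk of radius r in H² with K = −1."""
    return 2.0 * math.pi * (math.cosh(r) - 1.0)

def euclidean_disk_area(r):
    """Area of a disk of radius r in the Euclidean plane."""
    return math.pi * r * r

def binary_tree_nodes(depth):
    """Number of nodes in a complete binary tree of the given depth."""
    return 2 ** (depth + 1) - 1

for depth in (2, 5, 10, 20):
    r = float(depth)  # assumption: one unit of radius per tree level
    nodes = binary_tree_nodes(depth)
    print(depth, nodes,
          euclidean_disk_area(r) / nodes,    # area per node shrinks toward zero
          hyperbolic_disk_area(r) / nodes)   # area per node keeps growing
```

In the Euclidean plane the area available per node collapses as the tree deepens, so nodes must crowd together and distances distort; in H² the available area outpaces the node count.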
Why is hyperbolic space Hⁿ better suited for embedding hierarchical data (trees) than Euclidean Rⁿ?
Key Ideas
- **Riemann tensor** R(X,Y)Z = ∇ₓ∇ᵧZ − ∇ᵧ∇ₓZ − ∇_{[X,Y]}Z measures non-commutativity of covariant differentiation
- **Hierarchy:** R_{ijkl} → Ric_{ij} (contraction) → S (scalar). For the sphere Sⁿ(r) of radius r: K=1/r², Ric=(n−1)K·g, S=n(n−1)K
- **Einstein equations:** Ric − (S/2)g = 8πT. Geometry (the Einstein tensor, built from Ricci and scalar curvature) equals matter (stress-energy tensor)
- **Constant-curvature spaces:** K>0 sphere, K=0 Euclidean, K<0 hyperbolic Hⁿ (used in Poincaré embeddings)
Related Topics
The Riemann tensor is the apex of the differential geometry hierarchy:
- Connections and Covariant Derivative: the Riemann tensor measures the non-commutativity of ∇ₓ and ∇ᵧ
- Gauss-Bonnet Theorem: integrating the Gaussian curvature over a closed surface gives a topological invariant, 2πχ(M); the Chern-Gauss-Bonnet theorem extends this to even dimensions using a polynomial in the full curvature tensor
- Differential Geometry in ML: hyperbolic neural networks, Poincaré GNNs, natural gradient (the Fisher information matrix as a Riemannian metric)
Questions for Reflection
- The second Bianchi identity (∇_X R)(Y,Z) + (∇_Y R)(Z,X) + (∇_Z R)(X,Y) = 0 implies ∇_μ G^μν = 0, which through Einstein's equations forces ∇_μ T^μν = 0 in GR - conservation of energy-momentum. How does a geometric identity produce a physical conservation law?
- Poincaré embeddings place hierarchies in H². Why can a 2-dimensional hyperbolic space often outperform a 100-dimensional Euclidean space? What happens to the capacity of Hⁿ as the curvature magnitude |K| = 1/r² increases (the radius r decreases)?
- In Riemannian optimization, curvature K affects convergence: K > 0 makes geodesics converge (sphere), K < 0 makes them diverge (hyperbolic). How should this influence the choice of learning rate in Riemannian gradient descent?
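To experiment with the last question, here is a minimal sketch of one Riemannian gradient step in the Poincaré disk. It assumes the common recipe of rescaling the Euclidean gradient by the inverse metric factor (1 − |θ|²)²/4 and then pulling the point back inside the disk. The name `riemannian_sgd_step`, the margin `EPS`, and the exact form of the projection are assumptions for illustration, not a reference implementation.

```python
import math

EPS = 1e-5  # assumed margin that keeps points strictly inside the unit disk

def riemannian_sgd_step(theta, euclid_grad, lr):
    """One retraction-style gradient step in the Poincaré disk.

    The Euclidean gradient is rescaled by the inverse metric (1 − |θ|²)² / 4,
    and the result is projected back into the open disk if it overshoots.
    """
    norm_sq = sum(t * t for t in theta)
    scale = (1.0 - norm_sq) ** 2 / 4.0
    new = [t - lr * scale * g for t, g in zip(theta, euclid_grad)]
    norm = math.sqrt(sum(t * t for t in new))
    if norm >= 1.0:
        new = [t / norm * (1.0 - EPS) for t in new]
    return new

grad = [1.0, 0.0]
for start in ([0.0, 0.0], [0.9, 0.0], [0.99, 0.0]):
    after = riemannian_sgd_step(start, grad, lr=0.1)
    # The same gradient and learning rate move the point less near the boundary.
    print(start, abs(after[0] - start[0]))
```

The rescaling makes the effective Euclidean step depend on position, so a single learning rate behaves very differently at the center of the disk and near its edge.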