Optimal Transport
Flow Matching and Continuous OT
Flow Matching (Lipman et al., Meta AI, 2022) trains generative models that can sample in tens of ODE steps, where the original DDPM sampler uses 1000. The key is near-straight trajectories motivated by optimal transport, in place of the curved, noisy paths of diffusion sampling.
- **Stable Diffusion 3, Flux:** trained with rectified-flow / flow-matching objectives on straight-line paths (SD3 pairs noise and data independently, without an explicit OT coupling); SD3 is typically sampled with about 28 steps where SDXL is typically run with 50 or more, at better quality
- **Molecular generation:** FoldFlow, FrameFlow, and AlphaFlow apply flow matching to 3D protein structure generation - straight paths in conformation space (RFdiffusion and AlphaFold 3 use diffusion; flow matching is a more recent alternative)
- **Speech synthesis:** Voicebox (Meta), E2 TTS - flow matching generates speech in a handful of iterations vs hundreds for diffusion-based models
Prerequisites
Continuous Flows and the Continuity Equation
A generative model is a map from a simple distribution (Gaussian noise) to a complex one (images, speech, molecules). Instead of a single-step mapping, we can parameterize a continuous family of maps indexed by time t ∈ [0, 1]: a **flow**.
A **flow** is defined by an ODE:
dx/dt = v_t(x), with x(0) ~ p_0 (noise) and x(1) ~ p_1 (data).
The ODE solution defines the flow map φ_t: φ_t(x_0) = x_t. The **continuity equation** describes how the density ρ_t of x_t evolves:
∂ρ_t/∂t + div(ρ_t · v_t) = 0
Given v_t and the initial density ρ_0 = p_0, the density ρ_t is uniquely determined. Conversely, given the endpoints ρ_0 = p_0 and ρ_1 = p_1, the task is to find a v_t that transports one into the other.
The continuity equation is a conservation law: probability mass is neither created nor destroyed, only transported. Learning a generative model therefore reduces to finding a velocity field v_t that transforms p_0 into p_1.
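The conservation law can be checked numerically on a case with a closed-form answer. The sketch below assumes a 1D affine flow x_t = σ_t·x_0 + μ_t carrying N(0, 1) to N(M, S²), with σ_t = 1 - t + t·S and μ_t = t·M; the values of `M` and `S` and all function names are illustrative choices, not part of the lesson. It evaluates ∂ρ_t/∂t + ∂(ρ_t·v_t)/∂x with central finite differences and should return values close to zero.

```python
import math

# Hypothetical target parameters: p_0 = N(0, 1), p_1 = N(M, S^2).
M, S = 2.0, 0.5

def sigma(t):
    """Standard deviation of rho_t under the affine flow."""
    return 1.0 - t + t * S

def mu(t):
    """Mean of rho_t under the affine flow."""
    return t * M

def rho(t, x):
    """Density of x_t = sigma(t) * x_0 + mu(t) with x_0 ~ N(0, 1)."""
    s = sigma(t)
    z = (x - mu(t)) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

def v(t, x):
    """Velocity of the particle currently at x: d/dt of sigma(t) * x_0 + mu(t)."""
    x0 = (x - mu(t)) / sigma(t)
    return M + (S - 1.0) * x0

def continuity_residual(t, x, h=1e-4):
    """Finite-difference estimate of d(rho)/dt + d(rho * v)/dx at (t, x)."""
    d_rho_dt = (rho(t + h, x) - rho(t - h, x)) / (2.0 * h)
    flux_right = rho(t, x + h) * v(t, x + h)
    flux_left = rho(t, x - h) * v(t, x - h)
    d_flux_dx = (flux_right - flux_left) / (2.0 * h)
    return d_rho_dt + d_flux_dx

for t, x in [(0.2, -0.5), (0.5, 1.0), (0.8, 1.7)]:
    print(f"t={t:.1f} x={x:+.1f} residual={continuity_residual(t, x):.2e}")
```

The residual is not exactly zero only because of finite-difference truncation; shrinking `h` moderately reduces it further.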
What does the continuity equation ∂ρ_t/∂t + div(ρ_t · v_t) = 0 state?
Flow Matching: Learning the Velocity Field
How is a neural network v_θ(t, x) trained so that the resulting flow transports noise into data? Directly minimizing the distance between ρ_1 and p_1 requires simulating the ODE at every training step, which is computationally prohibitive. Lipman et al. (2022) proposed a simulation-free alternative.
**Conditional Flow Matching (CFM):** for each pair (x_0, x_1), with x_0 ~ p_0 and x_1 ~ p_1, construct the conditional path x_t = (1-t)x_0 + t*x_1, whose conditional velocity is u_t(x_t | x_0, x_1) = x_1 - x_0.
Training objective:
L_CFM(θ) = E_{t, x_0, x_1} ||v_θ(t, x_t) - (x_1 - x_0)||², with t ~ U[0, 1] and x_t = (1-t)x_0 + t*x_1.
No ODE simulation is required. Lipman et al. proved that L_CFM and the marginal Flow Matching loss have the same gradient with respect to θ, so they share the same minimizer: the marginal velocity field u_t(x) = E[x_1 - x_0 | x_t = x].
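The objective above translates almost line by line into code. The following is a minimal sketch for scalar samples in plain Python; `cfm_loss`, the point-mass toy data, and the two candidate velocity functions are hypothetical names introduced for illustration, and a real implementation would use a neural network and an autodiff framework in place of the hand-written `model` callables.

```python
import random

def cfm_loss(model, x0_batch, x1_batch, rng):
    """Monte Carlo estimate of the CFM loss for scalar samples."""
    total = 0.0
    for x0, x1 in zip(x0_batch, x1_batch):
        t = rng.random()                 # t ~ Uniform[0, 1)
        x_t = (1.0 - t) * x0 + t * x1    # point on the straight conditional path
        target = x1 - x0                 # conditional velocity for this pair
        total += (model(t, x_t) - target) ** 2
    return total / len(x0_batch)

# Toy setup (an assumption for illustration): noise is N(0, 1), data is a point mass at C.
C = 3.0
rng = random.Random(0)
x0_batch = [rng.gauss(0.0, 1.0) for _ in range(256)]
x1_batch = [C] * 256

def exact_velocity(t, x):
    # With all data at C, x_t = (1 - t) x_0 + t C, so x_1 - x_0 = (C - x_t) / (1 - t).
    return (C - x) / (1.0 - t)

def zero_velocity(t, x):
    # A deliberately bad model that never moves the sample.
    return 0.0

print("loss, exact field:", cfm_loss(exact_velocity, x0_batch, x1_batch, random.Random(1)))
print("loss, zero field: ", cfm_loss(zero_velocity, x0_batch, x1_batch, random.Random(1)))
```

The loss only needs samples of t, x_0, and x_1; at no point is the ODE integrated during training.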
The core insight: instead of the intractable marginal velocity field, train on the tractable conditional velocity for specific pairs (x_0, x_1). The two losses differ only by a constant that does not depend on θ, so they share the same minimizer. This turns Flow Matching into a plain regression problem, as simple to train as diffusion models and without score-matching derivations.
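The claim that regressing on conditional velocities recovers the marginal field can be tested where the marginal field is known in closed form. The sketch below assumes independent 1D Gaussians, p_0 = N(0, 1) and p_1 = N(M, S²), for which E[x_1 - x_0 | x_t = x] is exactly linear in x. A least-squares line fitted at one fixed time stands in for the neural network; `M`, `S`, `T`, and the function names are illustrative assumptions.

```python
import random

M, S = 2.0, 0.5   # hypothetical data distribution p_1 = N(M, S^2)
T = 0.3           # fixed time at which the velocity is fitted

def marginal_velocity_coeffs(t):
    """Closed form of E[x_1 - x_0 | x_t = x] = intercept + slope * x
    for independent x_0 ~ N(0, 1) and x_1 ~ N(M, S^2)."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    slope = (t * S * S - (1.0 - t)) / var_t
    intercept = M - slope * t * M
    return intercept, slope

def fit_conditional_targets(t, n, rng):
    """Least-squares fit of the conditional targets x_1 - x_0 against x_t."""
    xs, us = [], []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = rng.gauss(M, S)
        xs.append((1.0 - t) * x0 + t * x1)   # sample on the conditional path
        us.append(x1 - x0)                   # conditional velocity target
    mean_x = sum(xs) / n
    mean_u = sum(us) / n
    cov = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us)) / n
    var = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov / var
    return mean_u - slope * mean_x, slope

fitted = fit_conditional_targets(T, 20000, random.Random(0))
exact = marginal_velocity_coeffs(T)
print("fitted (intercept, slope):", fitted)
print("exact  (intercept, slope):", exact)
```

Each individual target x_1 - x_0 is noisy, yet the regression averages them into the marginal velocity, which is the field the sampler actually integrates.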
What is the main computational advantage of Conditional Flow Matching over directly minimizing the distance between ρ_1 and p_1?
OT Flow Matching: Straight-Line Paths
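**OT-FM:** replace the independent coupling (x_0 drawn independently of x_1) with the OT coupling π*, the minimum-transport-cost pairing that Brenier's theorem characterizes for quadratic cost. This pairs up nearby points; in practice π* is usually approximated within each minibatch. Each conditional path x_t = (1-t)x_0 + t*x_1 is a straight line under any coupling; what changes is that OT-coupled paths rarely cross, so the marginal flow obtained by averaging them stays nearly straight, with minimal curvature. **Why this matters:** the velocity along each trajectory is almost constant in time, so numerical ODE integration needs very few steps: generation in roughly 10-30 steps, versus the 1000 of the original DDPM sampler.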
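The effect of the coupling is easiest to see in one dimension, where the OT coupling for quadratic cost is the monotone matching: sort both samples and pair them in order. The sketch below compares that pairing with an independent (random) pairing on toy Gaussian samples; the sample sizes, distributions, and function names are assumptions made for illustration. In higher dimensions sorting no longer applies, and a minibatch assignment solver such as `scipy.optimize.linear_sum_assignment` is one possible substitute.

```python
import random

def pairing_cost(pairs):
    """Mean squared displacement of a pairing (the quadratic transport cost)."""
    return sum((x1 - x0) ** 2 for x0, x1 in pairs) / len(pairs)

def count_crossings(pairs):
    """Count pairs of straight paths that meet at some t in (0, 1).
    In 1D, two linear paths cross iff their order flips between t=0 and t=1."""
    crossings = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (a0, a1), (b0, b1) = pairs[i], pairs[j]
            if (a0 - b0) * (a1 - b1) < 0:
                crossings += 1
    return crossings

rng = random.Random(0)
n = 200
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]   # samples from p_0
data = [rng.gauss(2.0, 0.5) for _ in range(n)]    # samples from a toy p_1

independent_pairs = list(zip(noise, data))               # random pairing
ot_pairs = list(zip(sorted(noise), sorted(data)))        # 1D OT (monotone) pairing

for name, pairs in [("independent", independent_pairs), ("OT (sorted)", ot_pairs)]:
    print(f"{name:12s} cost={pairing_cost(pairs):.3f} crossings={count_crossings(pairs)}")
```

Crossing conditional paths force the learned marginal velocity to average conflicting directions at the crossing point, which is what bends the resulting trajectories; the sorted pairing removes the crossings entirely in this 1D case.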
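Flow matching on straight-line paths underlies Stable Diffusion 3 (SD3), Flux, and other production models, although SD3's rectified-flow training pairs noise and data independently instead of using an explicit OT coupling. The advantage over DDPM goes beyond speed: straighter paths can make interpolation in latent space more interpretable, which is valuable for image editing and guided generation.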
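The link between path straightness and step count can be checked on a toy problem. The sketch below assumes 1D Gaussians, N(0, 1) to N(M, S²), where both marginal velocity fields are known in closed form: `v_independent` arises from independent coupling and `v_ot` from the OT coupling. In this special case both fields produce the same endpoint map x_0 ↦ M + S·x_0 and differ only in how curved the trajectory is in time. All names and parameter values are illustrative, and the result says nothing quantitative about image models.

```python
M, S = 2.0, 0.5   # hypothetical target N(M, S^2); the source is N(0, 1)

def v_independent(t, x):
    """Marginal velocity E[x_1 - x_0 | x_t = x] under independent coupling."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    slope = (t * S * S - (1.0 - t)) / var_t
    return M + slope * (x - t * M)

def v_ot(t, x):
    """Marginal velocity under the OT coupling x_1 = M + S * x_0."""
    return M + (S - 1.0) * (x - t * M) / (1.0 - t + t * S)

def euler(v, x0, n_steps):
    """Integrate dx/dt = v(t, x) from t=0 to t=1 with fixed-step Euler."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += dt * v(k * dt, x)
    return x

x0 = 1.0
exact = M + S * x0   # endpoint of the true trajectory for both fields
for n_steps in (1, 4, 16, 64):
    err_ind = abs(euler(v_independent, x0, n_steps) - exact)
    err_ot = abs(euler(v_ot, x0, n_steps) - exact)
    print(f"steps={n_steps:3d} error independent={err_ind:.4f} error OT={err_ot:.2e}")
```

Along an OT trajectory the velocity never changes, so a single Euler step already lands on the exact endpoint, while the independent-coupling field needs many steps to follow its curved trajectory.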
Why does the OT coupling in OT-FM reduce the number of generation steps compared to independent coupling?
Summary
- **Flow** dx/dt = v_t(x) defines a continuous map p_0 → p_1; the continuity equation ∂ρ/∂t + div(ρv) = 0 links the velocity field to density evolution
- **Conditional Flow Matching:** train v_θ by regressing on conditional velocities u_t(x_t | x_0, x_1) = x_1 - x_0 without ODE simulation; the loss has the same minimizer as the intractable marginal Flow Matching loss and avoids the per-step ODE simulation that direct minimization would need
- **OT-FM:** OT coupling pairs close points → conditional paths rarely cross → nearly straight marginal trajectories → generation in roughly 10-30 steps vs 1000 for the original DDPM sampler; straight-path flow matching is used in SD3, Flux, Voicebox
Related Topics
Flow Matching bridges optimal transport and modern generative modeling:
- Brenier's Theorem and OT: OT-FM uses the OT coupling characterized by Brenier's theorem to construct (x_0, x_1) pairs with minimum total transport cost
- Diffusion Models (DDPM): FM is the natural evolution of diffusion, with the same noise-to-data principle, but sampling follows a deterministic ODE instead of DDPM's stochastic reverse process, with straighter paths and fewer steps
Questions for Reflection
- Linear paths x_t = (1-t)x_0 + t*x_1 are the simplest choice but not the only one. What other path families are available and what trade-offs do they introduce?
- OT-FM with W_2 coupling gives straight paths in the original data space. What happens when OT is applied in a latent space instead of pixel space?
- Flow Matching uses a deterministic ODE; DDPM samples with a stochastic process (an SDE in the continuous-time view). Are there tasks where the stochasticity is essential and FM would be at a disadvantage?