Optimal Transport

Flow Matching and Continuous OT

2022: Flow Matching (Lipman et al., Meta AI) trains generative models that can sample in tens of ODE steps, where the original DDPM diffusion sampler used about 1000. The key is near-straight trajectories motivated by optimal transport rather than the curved, noisy paths of diffusion.

**Stable Diffusion 3, Flux:** built on flow matching with straight-line (rectified flow) paths; SD3 generates in 28 steps what SDXL needs 50+ steps for, at better quality
**Molecular generation:** FoldFlow, FrameFlow, and AlphaFlow apply flow matching to 3D protein structure generation - straight paths in conformation space (RFDiffusion and AlphaFold 3 are pure diffusion; FM models are the next generation)
**Speech synthesis:** Voicebox (Meta), E2 TTS - flow matching generates speech in a handful of iterations vs hundreds for diffusion-based models

Prerequisites

Brenier's Theorem

Continuous Flows and the Continuity Equation

A generative model is a map from a simple distribution (Gaussian noise) to a complex one (images, speech, molecules). Instead of a single-step mapping, we can parameterize a continuous family of maps indexed by time t ∈ [0, 1]: a **flow**.

A **flow** is defined by an ODE: dx/dt = v_t(x), with x(0) ~ p_0 (noise) and x(1) ~ p_1 (data). The ODE solution defines the flow map φ_t: φ_t(x_0) = x_t.

The **continuity equation** describes how the density evolves:

∂ρ_t/∂t + div(ρ_t · v_t) = 0

Given v_t, the density ρ_t is uniquely determined. Given endpoints ρ_0 and ρ_1, the task is to find a v_t that does the job.

The continuity equation is a conservation law: mass (probability) is neither created nor destroyed, only transported. Given the velocity field v_t, the evolution of the density is determined uniquely. Learning a generative model reduces to finding v_t that transforms p_0 into p_1.
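To make the link between a velocity field and the density it transports concrete, here is a minimal sketch in Python with NumPy. The field v_t(x) = x / (1 + t) is an assumption chosen purely for illustration (it is not from the lesson): its exact flow map is φ_t(x_0) = (1 + t) x_0, so it should carry p_0 = N(0, 1) to N(0, 4). The helper names `velocity` and `euler_flow` are hypothetical.

```python
import numpy as np

def velocity(t, x):
    # Hypothetical velocity field chosen for illustration: v_t(x) = x / (1 + t).
    return x / (1.0 + t)

def euler_flow(x0, n_steps):
    # Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler steps.
    x = x0.copy()
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(k * h, x)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)  # samples from p_0 = N(0, 1)
x1 = euler_flow(x0, n_steps=10)    # the same samples transported to t = 1

# Probability mass is only moved, never created: every sample is still there,
# and the spread of the cloud doubles, as the exact flow map predicts.
print("std at t=0:", x0.std())
print("std at t=1:", x1.std())
```

No sample is added or dropped along the way, which is the particle-level picture of the conservation law. For this particular linear field the Euler update happens to reproduce the exact flow map, so the transported standard deviation is twice the initial one.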

What does the continuity equation ∂ρ_t/∂t + div(ρ_t · v_t) = 0 state?

Flow Matching: Learning the Velocity Field

How is a neural network v_θ(t, x) trained so that the resulting flow transports noise into data? Directly minimizing the distance between ρ_1 and p_1 requires simulating the ODE at every training step, which is computationally prohibitive. Lipman et al. (2022) proposed a smarter approach.

**Conditional Flow Matching (CFM):** for each pair (x_0, x_1), construct the conditional path x_t = (1-t)x_0 + t*x_1, whose conditional velocity is u_t(x_t | x_0, x_1) = x_1 - x_0 (equivalently (x_1 - x_t)/(1-t) when conditioning on x_1 alone).

Training objective: L_CFM(θ) = E_{t, x_0, x_1} ||v_θ(t, x_t) - (x_1 - x_0)||², with t ~ U[0, 1] and x_t = (1-t)x_0 + t*x_1. No ODE simulation is required.

Lipman et al. proved that L_CFM and the marginal Flow Matching loss have the same gradient with respect to θ, so they share the same minimizers: same optimum, cheaper loss.

The core insight: instead of the intractable marginal velocity field, train on the tractable conditional velocity for specific pairs (x_0, x_1). The two losses share the same global minimum. This makes Flow Matching as simple to train as diffusion models - without complex score matching formulas.
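The objective can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the "data" distribution is a point mass at c = 3 (chosen because the marginal velocity field is then known in closed form, v(t, x) = (c - x) / (1 - t)), and the names `cfm_batch`, `cfm_loss`, `v_exact`, and `v_zero` are hypothetical. In practice v would be a neural network trained by gradient descent on the same loss.

```python
import numpy as np

def cfm_batch(x0, x1, t):
    # Point on the linear conditional path and its constant target velocity.
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return x_t, target

def cfm_loss(v, x0, x1, t):
    # Monte Carlo estimate of E || v(t, x_t) - (x1 - x0) ||^2. No ODE is solved.
    x_t, target = cfm_batch(x0, x1, t)
    return np.mean((v(t, x_t) - target) ** 2)

rng = np.random.default_rng(0)
n, c = 10_000, 3.0
x0 = rng.standard_normal(n)         # noise samples from p_0
x1 = np.full(n, c)                  # toy "data": a point mass at c (assumption)
t = rng.uniform(0.0, 0.99, size=n)  # stay away from t = 1 to avoid dividing by 0

def v_exact(t, x):
    # Closed-form marginal field for the point-mass target.
    return (c - x) / (1.0 - t)

def v_zero(t, x):
    # A deliberately wrong field, for comparison.
    return np.zeros_like(x)

loss_exact = cfm_loss(v_exact, x0, x1, t)
loss_zero = cfm_loss(v_zero, x0, x1, t)
print("loss of exact field:", loss_exact)
print("loss of zero field: ", loss_zero)
```

Each training step only needs one sample of (t, x_0, x_1) and one evaluation of v, which is why the cost per step is comparable to a regression problem. For general data the loss at the optimum is not zero, because several pairs can pass through the same x_t; the point-mass target is the special case where it vanishes.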

What is the main computational advantage of Conditional Flow Matching over directly minimizing the distance between ρ_1 and p_1?

OT Flow Matching: Straight-Line Paths

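**OT-FM:** replace the independent coupling (x_0 drawn independently of x_1) with the OT coupling π* (minimum transport cost; by Brenier's theorem it is induced by a deterministic map for the quadratic cost). This pairs up nearby points. Each conditional path x_t = (1-t)x_0 + t*x_1 is a straight line under any coupling; what the OT coupling changes is that these lines rarely cross, so the marginal trajectories of the learned flow are themselves nearly straight, with minimal curvature. **Why this matters:** along each trajectory the velocity is almost constant, so numerical ODE integration needs very few steps: generation in 10-30 steps vs 1000 for DDPM.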
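The effect of the coupling can be seen on a one-dimensional toy batch. In 1D, the optimal coupling for squared cost between two equal-size samples is the monotone one: the i-th smallest noise point is matched to the i-th smallest data point. The sketch below uses that fact; the distributions, batch size, and the helper `count_crossings` are assumptions made for illustration. In higher dimensions a batch-level coupling would instead come from an assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x0 = rng.standard_normal(n)              # noise batch
x1 = 4.0 + 0.5 * rng.standard_normal(n)  # toy data batch (assumption)

# Independent coupling: pair samples in the order they were drawn.
cost_indep = np.mean((x1 - x0) ** 2)

# 1D OT coupling for squared cost: sort both sides and pair by rank.
x0_ot, x1_ot = np.sort(x0), np.sort(x1)
cost_ot = np.mean((x1_ot - x0_ot) ** 2)

def count_crossings(a, b):
    # Straight paths a_i -> b_i and a_j -> b_j cross iff their order flips.
    da = a[:, None] - a[None, :]
    db = b[:, None] - b[None, :]
    return int(np.sum(da * db < 0) // 2)

crossings_indep = count_crossings(x0, x1)
crossings_ot = count_crossings(x0_ot, x1_ot)

print("mean squared displacement:", cost_indep, "(independent) vs", cost_ot, "(OT)")
print("crossing path pairs:", crossings_indep, "(independent) vs", crossings_ot, "(OT)")
```

Where paths cross, the marginal velocity at the crossing point is an average of conflicting directions, which bends the learned trajectories. With the rank-based pairing no two paths cross, so that averaging disappears in this toy setting.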

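Straight-path flow matching underlies Stable Diffusion 3 (SD3), Flux, and other production models; these use linear (rectified flow) paths with independently paired noise and data rather than an explicit OT coupling. The advantage over DDPM goes beyond speed: straight paths mean more interpretable interpolation in latent space, which is valuable for image editing and guided generation.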
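The step-count argument can be checked numerically with an explicit Euler solver. The two velocity fields below are hand-picked assumptions, not trained models: `v_straight` moves every point along a straight line to a fixed target c at constant speed, while `v_curved` rotates points by a quarter turn along a circular arc. The function and variable names are hypothetical.

```python
import numpy as np

def euler(v, x0, n_steps):
    # Explicit Euler integration of dx/dt = v(t, x) from t = 0 to t = 1.
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * v(k * h, x)
    return x

c = np.array([3.0, 1.0])

def v_straight(t, x):
    # Straight-line trajectories from x0 to c, traversed at constant velocity.
    return (c - x) / (1.0 - t)

def v_curved(t, x):
    # Quarter-turn rotation about the origin: trajectories are circular arcs.
    return (np.pi / 2) * np.array([-x[1], x[0]])

x0 = np.array([1.0, 0.0])
exact = {"straight": c, "curved": np.array([0.0, 1.0])}
fields = {"straight": v_straight, "curved": v_curved}

# Endpoint error of Euler integration for 1, 10, and 100 steps.
errors = {
    name: {n: float(np.linalg.norm(euler(v, x0, n) - exact[name])) for n in (1, 10, 100)}
    for name, v in fields.items()
}
for name, errs in errors.items():
    print(name, errs)
```

Along a straight trajectory the velocity does not change, so a single Euler step already lands on the endpoint up to rounding. The curved field needs many small steps for its error to shrink, which is the situation samplers for strongly curved trajectories are in.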

Why does the OT coupling in OT-FM reduce the number of generation steps compared to independent coupling?

Summary

**Flow** dx/dt = v_t(x) defines a continuous map p_0 → p_1; the continuity equation ∂ρ/∂t + div(ρv) = 0 links the velocity field to density evolution
**Conditional Flow Matching:** train v_θ on conditional velocities u_t(x_t | x_0, x_1) = x_1 - x_0 without ODE simulation; the loss has the same gradient in θ as the marginal Flow Matching loss, at a fraction of the cost of simulation-based training
**OT-FM:** OT coupling pairs close points → conditional paths rarely cross → nearly straight trajectories with nearly constant velocity → generation in 10-30 steps vs 1000 for DDPM; straight-path flow matching underlies SD3, Flux, Voicebox

Questions for Reflection

Linear paths x_t = (1-t)x_0 + t*x_1 are the simplest choice but not the only one. What other path families are available and what trade-offs do they introduce?
OT-FM with W_2 coupling gives straight paths in the original data space. What happens when OT is applied in a latent space instead of pixel space?
Flow Matching samples with a deterministic ODE; DDPM sampling is stochastic (a discretized SDE). Are there tasks where the stochasticity is essential and FM would be at a disadvantage?

Related Lessons

calc-01-sequences
