Optimal Transport
Causal OT and Adapted Wasserstein Distance
Lesson objectives
- Define causal couplings via filtration constraints and explain why they generalize martingale couplings
- Compute the adapted Wasserstein distance via the nested distance backward recursion
- Apply causal OT to time series model comparison, counterfactual generation, and Granger causality measurement
Prerequisites
- Martingale OT and semi-static hedging duality
- Filtrations and adapted stochastic processes
- Granger causality and its relationship to conditional independence
- Wasserstein distances for distributions on path spaces
Standard Wasserstein distance is blind to time. Two stock price models with identical daily return distributions but completely different serial correlations have W2 = 0 between their daily marginals, yet they produce wildly different option prices, risk measures, and stress scenarios. Even when W2 is computed between whole paths, it ignores when information becomes available. Causal OT fixes this blindness.
- Time series generative models (neural SDEs, diffusion-based): AW distance is the correct evaluation metric capturing temporal dynamics, not just marginal fit
- Stress testing in banking: generating adversarial economic scenarios via causal transport preserves temporal causality between macro variables
- Drug clinical trials: causal OT provides counterfactual outcomes for patients in the control arm - what would have happened under treatment
- Climate attribution: causal transport measures how much observed climate trajectories deviate from counterfactual no-emission scenarios
Nested Distance to Adapted Wasserstein
The adapted metric for stochastic processes was introduced as the 'nested distance' by Pflug and Pichler (2012) in the context of stochastic programming - optimizing decisions under uncertainty over time requires a metric that respects the information structure. Independently, Backhoff-Veraguas et al. (2020) established the adapted Wasserstein distance on continuous-time path spaces and proved its topological equivalence to weak convergence of adapted processes. Bartl, Beiglböck, and Pammer (2021) developed the causal OT duality theory. The connection to machine learning - time series evaluation, counterfactual generation, neural SDEs - emerged from 2021 onwards and is an active research area.
Causal Couplings: Transport Respecting Information Flow
A coupling of two stochastic processes X and Y is causal (from X to Y) if, at every time t, the law of Y up to time t given the whole path of X depends only on X up to time t. In other words, the coupling never uses the future of X to decide the present of Y. This condition - causality - is a filtration constraint that standard OT completely ignores.
Every martingale coupling is causal (the martingale property implies the causal constraint for the specific case of drift-free processes), but causal couplings are more general - they apply to any adapted processes, not just martingales.
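In symbols, the causality constraint can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the lesson text: discrete time with horizon T, laws \mu and \nu of X and Y, \mathrm{Cpl}(\mu,\nu) for the set of all couplings, and X_{1:t} for the path up to time t.

```latex
\[
\pi \in \mathrm{Cpl}(\mu,\nu) \text{ is causal from } X \text{ to } Y
\iff
\pi\bigl(Y_{1:t} \in \cdot \,\big|\, X_{1:T}\bigr)
  = \pi\bigl(Y_{1:t} \in \cdot \,\big|\, X_{1:t}\bigr)
\quad \text{for all } t = 1,\dots,T-1 .
\]
% Equivalently: under \pi, Y_{1:t} is conditionally independent of X_{t+1:T} given X_{1:t}.
% \pi is bicausal if the same holds with the roles of X and Y exchanged.
```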
What is the key difference between standard OT couplings and causal couplings for stochastic processes?
Standard OT couples two distributions with no notion of time: the plan may match paths using their entire trajectories, including the future. Causal OT for processes requires that the transport plan does not use future information - reflecting that in real stochastic systems, actions must be adapted to the current information.
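For finite two-period processes, the "no future information" requirement can be checked directly. The sketch below is illustrative only: the helper names (`marginal`, `is_causal`, `is_bicausal`) and the two example couplings are hypothetical, and it assumes a coupling is given as a dictionary from paths (x1, x2, y1, y2) to exact probabilities. It tests whether Y1 is conditionally independent of X2 given X1.

```python
from collections import defaultdict
from fractions import Fraction as F


def marginal(pi, idx):
    """Sum a joint pmf over all coordinates except those listed in idx."""
    out = defaultdict(F)
    for path, p in pi.items():
        out[tuple(path[i] for i in idx)] += p
    return out


def is_causal(pi):
    """pi maps (x1, x2, y1, y2) to a probability.

    Causal from X to Y (two periods) means Y1 is conditionally independent
    of X2 given X1, i.e. p(x1, x2, y1) * p(x1) == p(x1, x2) * p(x1, y1).
    """
    p_x1 = marginal(pi, (0,))
    p_x = marginal(pi, (0, 1))
    p_x1y1 = marginal(pi, (0, 2))
    p_xy1 = marginal(pi, (0, 1, 2))
    for (x1, x2), pxx in p_x.items():
        for (a, y1), pxy in p_x1y1.items():
            if a == x1 and p_xy1.get((x1, x2, y1), 0) * p_x1[(x1,)] != pxx * pxy:
                return False
    return True


def is_bicausal(pi):
    """Causal from X to Y and, after swapping roles, from Y to X."""
    swapped = {(y1, y2, x1, x2): p for (x1, x2, y1, y2), p in pi.items()}
    return is_causal(pi) and is_causal(swapped)


# Toy processes (assumed for illustration):
# X: X1 = 0, then X2 = +1 or -1 with probability 1/2.
# Y: Y1 = +1 or -1 with probability 1/2, then Y2 = Y1.
independent = {(0, x2, y1, y1): F(1, 4) for x2 in (-1, 1) for y1 in (-1, 1)}
peeking = {(0, s, s, s): F(1, 2) for s in (-1, 1)}  # Y1 copies X2, the future of X

print(is_causal(independent), is_bicausal(independent))
print(is_causal(peeking), is_bicausal(peeking))
```

Both couplings have the same marginals, so standard OT treats them as equally admissible. Only the independent one passes the check, because the peeking coupling sets Y1 from a value of X that is not yet known at time 1.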
The Adapted Wasserstein Distance
The adapted Wasserstein distance (AW) is the OT cost minimized over bicausal couplings (couplings that are causal in both directions) instead of all couplings. It metrizes weak convergence of adapted processes and is sensitive to the filtration structure - properties that standard Wasserstein distances lack.
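Because bicausal couplings are a subset of all couplings, AW is always at least as large as the standard W between the path laws, and the topology it induces is strictly finer. Two Gaussian processes with identical marginals at every time but different correlation structures have W = 0 between those marginals but AW > 0. This matters for time series model comparison.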
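Written out, the definition and its two-period backward recursion look roughly as follows. This is a sketch under assumed notation not used in the lesson text: \mathrm{Cpl}_{\mathrm{bc}}(\mu,\nu) is the set of bicausal couplings, \mu_1 and \nu_1 are the laws of X_1 and Y_1, and \mu_{x_1}, \nu_{y_1} are the conditional laws of X_2 given X_1 = x_1 and of Y_2 given Y_1 = y_1.

```latex
\[
\mathcal{AW}_p(\mu,\nu)^p
  = \inf_{\pi \in \mathrm{Cpl}_{\mathrm{bc}}(\mu,\nu)}
    \mathbb{E}_\pi\Bigl[\sum_{t=1}^{T} |X_t - Y_t|^p\Bigr]
\]
% Two periods (T = 2): solve an ordinary OT problem between conditional laws first,
\[
V(x_1,y_1) = W_p^p\bigl(\mu_{x_1}, \nu_{y_1}\bigr),
\]
% then an ordinary OT problem at time 1 with the continuation value added to the cost.
\[
\mathcal{AW}_p(\mu,\nu)^p
  = \inf_{\gamma \in \mathrm{Cpl}(\mu_1,\nu_1)}
    \int \bigl[\, |x_1 - y_1|^p + V(x_1,y_1) \bigr]\, \gamma(dx_1,dy_1)
\]
```

For longer horizons the same step is repeated backward in time, conditioning on the pair of histories (x_{1:t}, y_{1:t}).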
When can two stochastic processes have zero standard Wasserstein distance but positive adapted Wasserstein distance?
Applied time by time, standard W only compares marginal distributions. If two processes have the same marginals at every time t but different temporal correlations (e.g., AR(1) vs. i.i.d.), each of these marginal distances is 0, but AW > 0 because AW compares the laws of whole paths and its couplings must respect the filtration.
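A toy computation makes the answer concrete. This is a minimal sketch, not a production method: the function names (`ot_cost`, `adapted_w2_sq`, `path_w2_sq`, `time2_marginal`), the two example pairs, and `eps = 0.1` are all assumptions chosen for illustration. To stay dependency-free, the inner OT solver is a brute force over permutations that only works for tiny measures with rational weights; a real implementation would use a linear-programming or dedicated OT solver instead.

```python
from fractions import Fraction as F
from functools import reduce
from itertools import permutations
from math import gcd, sqrt


def ot_cost(mu, nu, cost):
    """Exact discrete OT cost by brute force (toy sizes only).

    mu and nu map atoms to Fraction weights. Both are expanded into n equally
    weighted atoms; for such measures an optimal plan is a permutation.
    """
    dens = [w.denominator for w in list(mu.values()) + list(nu.values())]
    n = reduce(lambda a, b: a * b // gcd(a, b), dens, 1)
    xs = [x for x, w in mu.items() for _ in range(int(w * n))]
    ys = [y for y, w in nu.items() for _ in range(int(w * n))]
    return min(sum(cost(x, y) for x, y in zip(xs, p)) for p in permutations(ys)) / n


def sq(a, b):
    return (a - b) ** 2


def adapted_w2_sq(mu1, mu2, nu1, nu2):
    """Two-period nested recursion: OT between conditional laws first,
    then OT at time 1 with that continuation value added to the cost."""
    def cost_to_go(x1, y1):
        return sq(x1, y1) + ot_cost(mu2[x1], nu2[y1], sq)
    return ot_cost(mu1, nu1, cost_to_go)


def path_law(m1, m2):
    """Joint law of (first step, second step) from a start law and a kernel."""
    return {(a, b): m1[a] * m2[a][b] for a in m1 for b in m2[a]}


def path_w2_sq(mu1, mu2, nu1, nu2):
    """Standard W2^2 between path laws, with no causality constraint."""
    return ot_cost(path_law(mu1, mu2), path_law(nu1, nu2),
                   lambda p, q: sq(p[0], q[0]) + sq(p[1], q[1]))


def time2_marginal(m1, m2):
    """Law of the second coordinate alone."""
    out = {}
    for a in m1:
        for b, w in m2[a].items():
            out[b] = out.get(b, F(0)) + m1[a] * w
    return out


half = F(1, 2)
coin = {-1: half, 1: half}

# Pair 1: i.i.d. steps vs. a perfectly persistent process (same marginals).
iid = (coin, {-1: coin, 1: coin})
sticky = (coin, {-1: {-1: F(1)}, 1: {1: F(1)}})
w_marg = ot_cost(time2_marginal(*iid), time2_marginal(*sticky), sq)
aw_sq_1 = adapted_w2_sq(*iid, *sticky)

# Pair 2: nearly identical paths, but Y reveals its final value at time 1.
eps = 0.1
x = ({0.0: F(1)}, {0.0: {-1.0: half, 1.0: half}})
y = ({-eps: half, eps: half}, {-eps: {-1.0: F(1)}, eps: {1.0: F(1)}})
w_path = sqrt(path_w2_sq(*x, *y))
aw = sqrt(adapted_w2_sq(*x, *y))

print(w_marg, aw_sq_1, w_path, aw)
```

In the first pair the time-2 marginals coincide, so the marginal W2 is 0 while the squared AW is 2. In the second pair even the path-space W2 is only eps = 0.1, yet AW is sqrt(2 + eps^2), about 1.418. The cheap path-space plan matches X's final value to Y's first step, which a bicausal coupling is not allowed to do.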
Causal OT in Time Series and Counterfactual Analysis
Causal OT provides the correct framework for comparing time series models, generating counterfactual scenarios, and measuring causal influence in dynamical systems. AW distance is the right metric when temporal structure matters.
Backhoff-Veraguas, Bartl, Beiglböck, and Eder (2020) showed that, in discrete time, AW induces the same topology as the other established notions of weak convergence for adapted processes - which supports its use as the canonical metric for stochastic process comparison.
Why is the adapted Wasserstein distance more appropriate than standard W2 for comparing time series generative models?
Two generative models might have identical marginal distributions at every time point but very different temporal structure (correlations, volatility clustering). W2 cannot distinguish them; AW can because the bicausal constraint forces the coupling to respect the filtration.
Causality as a Transport Constraint
Causal OT unifies optimal transport theory with the theory of stochastic processes by embedding the filtration structure into the coupling constraints. Where standard OT is timeless geometry, causal OT is temporal geometry - it measures not just how far apart two distributions are, but how far apart they are when the transport must respect the arrow of time. This is the correct framework for any machine learning task involving sequential data.
Summary
- A coupling is causal (from X to Y) if, given the past of X, the past of Y is conditionally independent of the future of X - filtration compatibility
- The adapted Wasserstein distance AW minimizes expected path cost over bicausal couplings; it is always >= the standard W between the path laws, and hence >= W on terminal marginals
- AW is computed via the nested distance backward recursion: solve standard OT at each time step conditioned on the past
- Causal OT enables time series model selection (use AW not W), counterfactual generation, and rigorous Granger causality measurement
Questions for reflection
- The adapted Wasserstein distance uses bicausal couplings (causal in both directions). Why is symmetry needed for AW to be a metric? Is there a causal (one-directional) transport distance, and what would it measure?
- For Markov processes, is the adapted W2 computable in polynomial time? What is the recursive structure that makes it tractable compared to the general path-space case?
- Causal OT counterfactuals require solving an infinite-dimensional optimization. In practice, neural causal transport (LSTM-based) is used. What guarantees can be given that the neural map is actually causal, and how would one verify this empirically?
Related lessons
- ot-27 — Causal OT generalizes martingale OT by replacing the martingale condition with a general causality constraint
- ot-01-monge — Causal OT is built on the Monge-Kantorovich framework extended with filtration constraints
- ot-29 — Unbalanced causal transport handles mass creation/destruction in temporal processes
- ot-26-multi-marginal