Differential Equations
Ito Stochastic Differential Equations
Lesson Objectives
- Define Brownian motion and understand its quadratic variation
- Derive Ito's formula and apply it to geometric Brownian motion
- Connect SDEs to the Fokker-Planck equation for the probability density
- Understand how score-based diffusion models and RLHF use stochastic calculus
Prerequisites
- Probability theory (Gaussian processes)
- ODE (Cauchy problem)
- Functional analysis
How does the chaotic collision of water molecules with a pollen grain lead to a formula pricing 9.2 trillion dollars of derivatives daily?
- Black-Scholes formula (1973) - a direct consequence of Ito's formula for asset prices, Nobel Prize 1997
- Score-based diffusion models (Stable Diffusion, DALL-E 3) - reverse SDE from noise to image
- RLHF for ChatGPT - stochastic control over the space of language model policies
- Molecular dynamics in drug discovery: Langevin SDEs for protein conformational changes
Ito, Einstein, and the Birth of Stochastic Calculus
Robert Brown observed the erratic motion of pollen particles in 1827, Louis Bachelier applied the idea to stock prices in 1900, and Einstein explained the physics in 1905. The rigorous calculus for such processes came later: Kiyoshi Ito created it while working in Japan during World War II, and his 1944 paper went almost unnoticed at first. By the 1950s Western mathematicians arrived at similar ideas independently. Today Ito's formula is among the most widely used results in applied mathematics: finance, physics, biology, and machine learning all depend on it.
Brownian Motion and the Stochastic Integral
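Einstein's 1905 explanation of the pollen motion Brown had observed in 1827 is that a suspended particle undergoes on the order of 10^21 collisions per second with water molecules, and the net effect of these collisions is a random displacement. Norbert Wiener gave the rigorous mathematical model in 1923: the Wiener process W_t, with W_0 = 0, continuous paths, and independent Gaussian increments W_t - W_s ~ N(0, t - s). In 1944 Kiyoshi Ito defined the integral with respect to such processes, int H dW = lim sum H_{t_k} * Delta W_k with Delta W_k = W_{t_{k+1}} - W_{t_k} and the integrand evaluated at the left endpoint of each interval, opening stochastic calculus.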
The Stratonovich integral (an alternative to Ito) evaluates the integrand at the midpoint of each interval instead of the left endpoint: int H o dW = lim sum H_{(t_k+t_{k+1})/2} * Delta W_k. It satisfies the ordinary chain rule, but unlike the Ito integral it is in general not a martingale. Stratonovich integrals arise naturally in physics; Ito integrals are standard in finance.
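The difference between the two conventions can be seen numerically on the simplest nontrivial integrand, H = W. The following is a minimal sketch, assuming NumPy and a simulated path on a fine grid chosen so that interval midpoints are grid points; the function name and parameters are illustrative. The left-point sum should approach (W_T^2 - T)/2 and the midpoint sum W_T^2/2, so the two differ by about T/2.

```python
import numpy as np

def ito_and_stratonovich(T=1.0, n_steps=100_000, seed=0):
    """Approximate int_0^T W dW with left-point (Ito) and midpoint (Stratonovich) sums."""
    rng = np.random.default_rng(seed)
    # Simulate on a grid twice as fine as the partition, so that the
    # midpoint of every partition interval is itself a grid point.
    dt_fine = T / (2 * n_steps)
    dW_fine = rng.normal(0.0, np.sqrt(dt_fine), size=2 * n_steps)
    W_fine = np.concatenate(([0.0], np.cumsum(dW_fine)))

    W_left = W_fine[0:-1:2]   # W at t_k
    W_mid = W_fine[1::2]      # W at (t_k + t_{k+1}) / 2
    W_right = W_fine[2::2]    # W at t_{k+1}
    dW = W_right - W_left     # increment over each partition interval

    ito = np.sum(W_left * dW)
    stratonovich = np.sum(W_mid * dW)
    return ito, stratonovich, W_fine[-1]

if __name__ == "__main__":
    ito, strat, W_T = ito_and_stratonovich()
    print("Ito sum:          ", ito, " vs (W_T^2 - T)/2 =", (W_T**2 - 1.0) / 2)
    print("Stratonovich sum: ", strat, " vs W_T^2/2       =", W_T**2 / 2)
```

The Stratonovich value W_T^2/2 is what the ordinary chain rule would give; the Ito value carries the extra -T/2 correction.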
Why is (dW_t)^2 = dt and not zero?
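One way to build intuition for this question is to compute the sum of squared increments directly. The sketch below assumes NumPy; the function names and step counts are illustrative choices. For a Brownian path each increment has size of order sqrt(dt), so the squared increments sum to roughly T however fine the partition is, whereas for a smooth path (here f(t) = t) the same sum shrinks like 1/n.

```python
import numpy as np

def brownian_quadratic_variation(T=1.0, n_steps=1_000_000, seed=0):
    """Sum of squared Brownian increments over a partition of [0, T]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # increments ~ N(0, dt)
    return np.sum(dW**2)

def smooth_quadratic_variation(T=1.0, n_steps=1_000_000):
    """Same sum for the smooth path f(t) = t, whose increments all equal dt."""
    dt = T / n_steps
    return n_steps * dt**2

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n,
              "Brownian:", brownian_quadratic_variation(n_steps=n),
              "smooth:", smooth_quadratic_variation(n_steps=n))
```

The Brownian column should stay close to T = 1 as n grows, which is the content of [W,W]_t = t.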
Ito's Formula and Applications
Ito's formula is the chain rule for stochastic processes. The key difference from deterministic calculus: an extra term (1/2)*f''*sigma^2*dt appears from the quadratic variation. Without it the Black-Scholes formula - pricing derivatives worth 9.2 trillion dollars daily - would be wrong.
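As a worked instance, here is the standard application to geometric Brownian motion with f(S) = log S. The notation S_t, mu, sigma follows the lesson; note that the diffusion coefficient of S is sigma*S, so the correction term uses sigma^2 S^2.

```latex
\begin{aligned}
dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t, \qquad
f(S) = \log S,\quad f'(S) = \frac{1}{S},\quad f''(S) = -\frac{1}{S^2},\\
d(\log S_t) &= f'(S_t)\,dS_t + \tfrac{1}{2} f''(S_t)\,\sigma^2 S_t^2\,dt \\
&= \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dW_t,\\
S_t &= S_0 \exp\!\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right).
\end{aligned}
```

Dropping the second-order term would give drift mu for log S, which contradicts the exact solution in the last line.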
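The Ito integral requires the integrand to be non-anticipating (adapted): its value at time t may depend on W_s only for s <= t, never on future values of the Wiener process. If it depends on the future, the integral is undefined in the Ito sense. In physics the Stratonovich convention is sometimes used instead; when sigma depends on the state, the same formal equation then describes a process with a different drift.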
Why does the drift of log(S_t) equal mu - sigma^2/2 and not mu?
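A quick Monte Carlo check can make the answer concrete. This is a sketch under assumed parameters (mu = 0.1, sigma = 0.4, T = 1, Euler-Maruyama time stepping with NumPy); none of these values come from the lesson. The sample mean of log S_T should land near (mu - sigma^2/2)*T = 0.02, while the sample mean of S_T itself should land near exp(mu*T).

```python
import numpy as np

def simulate_gbm(mu=0.1, sigma=0.4, s0=1.0, T=1.0,
                 n_steps=500, n_paths=20_000, seed=0):
    """Euler-Maruyama simulation of dS = mu*S dt + sigma*S dW; returns S_T per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        s = s + mu * s * dt + sigma * s * dW
    return s

if __name__ == "__main__":
    mu, sigma, T = 0.1, 0.4, 1.0
    s_T = simulate_gbm(mu=mu, sigma=sigma, T=T)
    print("mean of log S_T:", np.log(s_T).mean(),
          " Ito prediction:", (mu - 0.5 * sigma**2) * T)
    print("mean of S_T:    ", s_T.mean(),
          " exp(mu*T):     ", np.exp(mu * T))
```

The average price grows at rate mu, but the typical (log-scale) path grows more slowly because volatility penalizes compounded returns.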
Fokker-Planck Equation and Stationary Distributions
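Instead of tracking individual trajectories, one can describe the evolution of the probability density p(x,t). The Fokker-Planck equation (Kolmogorov forward equation) is the PDE for p(x,t) corresponding to an SDE: for dX = mu(X,t)dt + sigma(X,t)dW it reads dp/dt = -d/dx[mu*p] + (1/2)*d^2/dx^2[sigma^2*p]. Its right-hand side is the adjoint of the generator mu*f_x + (1/2)*sigma^2*f_xx that appears in Ito's formula, and this is the sense in which the two are dual: every SDE of this form determines an FPE, and the FPE determines the law of the process at each time. Score-based diffusion models (Stable Diffusion, DALL-E) build on this duality.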
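The link between trajectories and densities can be checked on the Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW, chosen here as an illustrative example that the lesson does not name. Its Fokker-Planck equation dp/dt = d/dx[theta*x*p] + (sigma^2/2)*d^2p/dx^2 has the stationary solution N(0, sigma^2/(2*theta)). The sketch below, assuming NumPy and Euler-Maruyama stepping, simulates many paths and compares the empirical variance at a late time with that prediction.

```python
import numpy as np

def simulate_ou(theta=1.0, sigma=1.0, x0=2.0, T=5.0,
                dt=0.01, n_paths=20_000, seed=0):
    """Euler-Maruyama paths of dX = -theta*X dt + sigma dW; returns X_T per path."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.full(n_paths, x0)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + (-theta * x) * dt + sigma * dW
    return x

if __name__ == "__main__":
    theta, sigma = 1.0, 1.0
    x_T = simulate_ou(theta=theta, sigma=sigma)
    print("empirical mean:    ", x_T.mean(), " (stationary mean 0)")
    print("empirical variance:", x_T.var(),
          " Fokker-Planck stationary variance:", sigma**2 / (2 * theta))
```

All paths start from the same point x0, yet the histogram of X_T forgets it: the ensemble relaxes to the stationary density of the PDE.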
Score-based diffusion models (Song et al. 2020) train a neural network to approximate the score function nabla_x log p_t(x) of the noised data density. Sampling then amounts to solving the reverse-time SDE numerically, from pure noise back to data. The samplers in systems such as Stable Diffusion and DALL-E 3 can be viewed as discretizations of this procedure.
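The reverse SDE can be illustrated without any neural network when the data distribution is Gaussian, because the score is then known in closed form. The sketch below makes several assumptions that are not in the lesson: one-dimensional data N(M0, S0^2), a forward SDE dx = -(BETA/2)*x dt + sqrt(BETA) dW with constant BETA, and Euler-Maruyama stepping backward in time. The exact score stands in for the trained network.

```python
import numpy as np

BETA = 2.0          # constant noise rate of the forward SDE (assumed)
M0, S0 = 3.0, 0.5   # mean and std of the toy "data" distribution (assumed)
T_END = 5.0         # forward time at which p_t is close to N(0, 1)

def marginal(t):
    """Mean and variance of p_t when the data are N(M0, S0^2)."""
    decay = np.exp(-0.5 * BETA * t)
    mean = M0 * decay
    var = S0**2 * decay**2 + (1.0 - decay**2)
    return mean, var

def score(x, t):
    """Exact score d/dx log p_t(x); a trained network would approximate this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def sample_reverse_sde(n_samples=20_000, n_steps=1_000, seed=0):
    """Integrate dx = [f - g^2 * score] dt + g dW from t = T_END down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = T_END / n_steps
    x = rng.normal(0.0, 1.0, size=n_samples)  # start from pure noise
    for k in range(n_steps):
        t = T_END - k * dt
        drift = -0.5 * BETA * x - BETA * score(x, t)  # f(x,t) - g(t)^2 * score
        noise = np.sqrt(BETA * dt) * rng.normal(size=n_samples)
        x = x - drift * dt + noise  # one step backward in time
    return x

if __name__ == "__main__":
    samples = sample_reverse_sde()
    print("sample mean:", samples.mean(), " target:", M0)
    print("sample std: ", samples.std(), " target:", S0)
```

Starting from standard normal noise, the samples should end up with mean near M0 and standard deviation near S0, which is the data distribution the forward process destroyed.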
What does the Fokker-Planck equation describe compared to the SDE?
Stochastic Control and RLHF
The Pontryagin principle and the HJB equation extend naturally to stochastic systems, where the controlled dynamics are the SDE dX = f(X,u)dt + sigma(X)dW. Compared with the deterministic case, the stochastic HJB equation gains a second-order term (sigma^2/2)*V_xx that comes from the Ito correction. This framework provides the mathematical viewpoint behind RLHF (Reinforcement Learning from Human Feedback), the method used to train ChatGPT.
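Written out for a finite-horizon problem, the equation takes the following form. The running cost \ell and terminal cost \Phi are assumed notation, since the lesson does not name them; f and sigma are the coefficients of the controlled SDE above.

```latex
\begin{aligned}
V(x,t) &= \min_{u}\; \mathbb{E}\!\left[\int_t^T \ell(X_s,u_s)\,ds + \Phi(X_T)\,\middle|\, X_t = x\right],\\
-\frac{\partial V}{\partial t} &= \min_{u}\Big\{\ell(x,u) + f(x,u)\,V_x\Big\}
 + \frac{\sigma(x)^2}{2}\,V_{xx}, \qquad V(x,T) = \Phi(x).
\end{aligned}
```

Because sigma does not depend on u here, the second-order term sits outside the minimization; setting sigma = 0 recovers the deterministic HJB equation.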
RLHF and Stochastic Control
ChatGPT as a solution to a stochastic control problem
RLHF trains a language model through a policy pi(a|s), a probability distribution over the next token. This is a stochastic control problem: state s = context, action a = token, reward = human preference score. The PPO algorithm (Schulman et al. 2017) increases the expected reward by stochastic gradient ascent on the policy parameters. What it ascends is not the expected-reward functional J directly but a clipped surrogate objective, which is a pessimistic (lower) bound on the unclipped surrogate and keeps each update close to the previous policy.
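The clipped surrogate from the PPO paper is short enough to write down directly. The following is a minimal NumPy sketch of the objective value only, with no gradient computation and no language model; the function name, the toy probabilities, and the advantage values are illustrative assumptions.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A), r = pi_new/pi_old."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

if __name__ == "__main__":
    # Two sampled tokens: the new policy raised the probability of the first
    # (positive advantage) and lowered the probability of the second (negative).
    logp_old = np.log(np.array([0.5, 0.5]))
    logp_new = np.log(np.array([0.75, 0.25]))
    advantages = np.array([1.0, -1.0])

    ratio = np.exp(logp_new - logp_old)
    print("unclipped surrogate:", np.mean(ratio * advantages))
    print("clipped surrogate:  ", ppo_clip_objective(logp_new, logp_old, advantages))
```

In this toy case the clipped value (about 0.2) is below the unclipped one (about 0.5): once the probability ratio leaves the interval [1 - eps, 1 + eps], the objective stops rewarding further movement in that direction.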
Why does the optimal control for stochastic LQR coincide with the deterministic LQR gain?
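A scalar example shows the mechanism. The sketch below assumes the system dx = (a*x + b*u) dt + sigma dW with additive noise, the long-run average cost of q*x^2 + r*u^2, and linear feedback u = -K*x; the parameter values and function names are illustrative. Under feedback K the state is an Ornstein-Uhlenbeck process with stationary variance sigma^2/(2*(b*K - a)), so the average cost is available in closed form and can be minimized over K on a grid.

```python
import numpy as np

def riccati_gain(a, b, q, r):
    """Positive root P of 2*a*P - (b^2/r)*P^2 + q = 0 and the gain K = b*P/r."""
    P = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
    return b * P / r, P

def average_cost(K, a, b, q, r, sigma):
    """Long-run average of q*x^2 + r*u^2 under u = -K*x (requires b*K > a)."""
    closed_loop_rate = b * K - a
    stationary_var = sigma**2 / (2.0 * closed_loop_rate)
    return (q + r * K**2) * stationary_var

def best_gain_on_grid(a, b, q, r, sigma):
    """Brute-force minimizer of the average cost over stabilizing gains."""
    gains = np.linspace(a / b + 0.05, 6.0, 4951)
    costs = average_cost(gains, a, b, q, r, sigma)
    return gains[np.argmin(costs)]

if __name__ == "__main__":
    a, b, q, r = 1.0, 1.0, 1.0, 1.0
    K_star, P = riccati_gain(a, b, q, r)
    print("deterministic Riccati gain:", K_star)
    for sigma in (0.1, 1.0, 3.0):
        print("sigma =", sigma,
              " best gain on grid:", best_gain_on_grid(a, b, q, r, sigma),
              " cost at K*:", average_cost(K_star, a, b, q, r, sigma),
              " sigma^2 * P:", sigma**2 * P)
```

The noise level enters the cost only as the overall factor sigma^2, so it changes the achievable cost (sigma^2 * P at the optimum) but not the gain that achieves it.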
Connections to Other Areas
Stochastic calculus is the mathematical foundation of financial mathematics, diffusion models, and stochastic control.
- Diffusion Models (Stable Diffusion) — Related topic
- Financial Mathematics — Related topic
- RLHF and PPO — Related topic
- Molecular Dynamics — Related topic
Summary
- Brownian motion has quadratic variation [W,W]_t = t, implying (dW)^2 = dt
- Ito's formula for dX = mu dt + sigma dW: df(X_t) = f_x dX + (1/2) f_xx sigma^2 dt - the extra term comes from the quadratic variation
- The FPE is the density-level counterpart of an SDE: it describes how p(x,t) evolves and underlies score-based diffusion models
- Stochastic LQR with additive noise uses the same Riccati matrix and feedback gain as deterministic LQR - the noise adds only a constant to the cost
Questions for Reflection
- What is the difference between Ito and Stratonovich integrals, and when is each preferable?
- Why do score-based diffusion models use a reverse SDE rather than simply inverting the forward process?
- How does the Girsanov change-of-measure theorem allow option pricing without knowing the real drift mu?
Related Lessons
- diff-equations-28 - Markov semigroups are the transition operators of SDE solutions; their infinitesimal generators are the operators that appear in Ito's formula
- de-27-schrodinger - Stochastic quantization connects SDEs to the Schrodinger equation
- de-26-optimal-control - Stochastic control generalizes the Pontryagin principle to SDEs