Differential Equations

Ito Stochastic Differential Equations

Lesson Objectives

Define Brownian motion and understand its quadratic variation
Derive Ito's formula and apply it to geometric Brownian motion
Connect SDEs to the Fokker-Planck equation for the probability density
Understand how score-based diffusion models and RLHF use stochastic calculus

Prerequisites

Probability theory (Gaussian processes)
Ordinary differential equations (Cauchy problem)
Functional analysis

Wave Equation

How does the chaotic collision of water molecules with a pollen grain lead to a formula pricing 9.2 trillion dollars of derivatives daily?

Black-Scholes formula (1973) - a direct consequence of Ito's formula for asset prices, Nobel Prize 1997
Score-based diffusion models (Stable Diffusion, DALL-E 3) - reverse SDE from noise to image
RLHF for ChatGPT - stochastic control over the space of language model policies
Molecular dynamics in drug discovery: Langevin SDEs for protein conformational changes

Ito, Einstein, and the Birth of Stochastic Calculus

Robert Brown observed the motion in 1827, Louis Bachelier applied the idea to stock prices in 1900, and Einstein explained the physics in 1905. The rigorous calculus for such processes, however, was created by Kiyoshi Ito, working in Japan during World War II. His 1944 paper went almost unnoticed at first, and by the 1950s Western mathematicians had arrived at similar ideas independently. Today Ito's formula is one of the most cited results in applied mathematics: finance, physics, biology, and machine learning all depend on it.

Brownian Motion and the Stochastic Integral

Robert Brown observed the chaotic motion of pollen particles in water in 1827. Einstein explained it in 1905: a particle undergoes on the order of 10^21 collisions per second with water molecules. Norbert Wiener gave the rigorous mathematical model in 1923: the Wiener process W_t, a continuous process with independent Gaussian increments, W_t - W_s ~ N(0, t - s). Kiyoshi Ito defined the integral with respect to such processes in 1944, founding stochastic calculus.

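The Ito integral evaluates the integrand at the left endpoint of each interval: int H dW = lim sum H_{t_k} * Delta W_k, where Delta W_k = W_{t_{k+1}} - W_{t_k}. The Stratonovich integral (an alternative to Ito) uses the midpoint instead: int H o dW = lim sum H_{(t_k+t_{k+1})/2} * Delta W_k. The Stratonovich integral satisfies the ordinary chain rule but, unlike the Ito integral, is not in general a martingale. Stratonovich integrals arise naturally in physics; Ito integrals are standard in finance.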
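
The difference between the two conventions can be seen numerically on the simplest example, the integral of W against dW over [0, T]. The sketch below is an illustration, not part of the original lesson: the function name ito_vs_stratonovich and all parameter values are assumptions. It simulates W on a grid of half-steps so that midpoint values are available, then forms the left-endpoint and midpoint sums. In the limit the left-endpoint sum tends to (W_T^2 - T)/2 and the midpoint sum to W_T^2/2.

```python
import numpy as np

def ito_vs_stratonovich(T=1.0, n=200_000, seed=0):
    """Approximate the integral of W dW on [0, T] with two Riemann-type sums."""
    rng = np.random.default_rng(seed)
    half = T / (2 * n)  # half-step, so that interval midpoints lie on the grid
    increments = rng.normal(0.0, np.sqrt(half), size=2 * n)
    W = np.concatenate(([0.0], np.cumsum(increments)))
    left, mid, right = W[0:-1:2], W[1::2], W[2::2]
    dW = right - left                 # increments over the n full intervals
    ito = np.sum(left * dW)           # integrand at the left endpoint
    strat = np.sum(mid * dW)          # integrand at the time midpoint
    return ito, strat, W[-1]

T = 1.0
ito, strat, w_T = ito_vs_stratonovich(T=T)
print("Ito sum:          ", ito, " vs (W_T^2 - T)/2 =", (w_T**2 - T) / 2)
print("Stratonovich sum: ", strat, " vs W_T^2/2       =", w_T**2 / 2)
```

The gap between the two sums is close to T/2, which is half the quadratic variation of W on [0, T].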

Why is (dW_t)^2 = dt and not zero?
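
One way to approach this question is to compute the mean and variance of the sum of squared increments on a uniform partition of [0, T] into n intervals. The short derivation below is a sketch that uses only the Gaussian increment property of the Wiener process; the uniform partition is an assumption made for simplicity.

```latex
\begin{aligned}
\Delta W_k &= W_{t_{k+1}} - W_{t_k} \sim \mathcal{N}(0, \Delta t), \qquad \Delta t = T/n,\\
\mathbb{E}\Big[\sum_{k=0}^{n-1} (\Delta W_k)^2\Big] &= n\,\Delta t = T,\\
\operatorname{Var}\Big[\sum_{k=0}^{n-1} (\Delta W_k)^2\Big]
  &= n\big(\mathbb{E}[(\Delta W_k)^4] - (\Delta t)^2\big)
   = n\big(3(\Delta t)^2 - (\Delta t)^2\big) = \frac{2T^2}{n} \;\longrightarrow\; 0 .
\end{aligned}
```

The sum therefore converges to the deterministic value T in mean square. For a smooth path x(t) with bounded derivative the same sum is of order n * (Delta t)^2 and tends to zero, which is why the second-order term is absent from ordinary calculus.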

Ito's Formula and Applications

Ito's formula is the chain rule for stochastic processes. The key difference from deterministic calculus is an extra term (1/2)*f''*sigma^2*dt, which comes from the quadratic variation (dW)^2 = dt and does not vanish as the time step shrinks. Without it the Black-Scholes formula, which prices derivatives worth 9.2 trillion dollars daily, would be wrong.
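
As a worked example, the formula can be applied to geometric Brownian motion with f = log. The derivation below is a sketch for the time-independent case, assuming constant mu and sigma in the asset price model and a twice continuously differentiable f.

```latex
\begin{aligned}
dX_t &= \mu(X_t)\,dt + \sigma(X_t)\,dW_t,\\
df(X_t) &= f'(X_t)\,dX_t + \tfrac12 f''(X_t)\,\sigma(X_t)^2\,dt.\\[4pt]
\text{GBM: } dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t,\qquad
f(S)=\log S,\quad f'(S)=\tfrac{1}{S},\quad f''(S)=-\tfrac{1}{S^2},\\
d\log S_t &= \frac{1}{S_t}\,dS_t - \frac{1}{2S_t^2}\,\sigma^2 S_t^2\,dt
           = \Big(\mu - \frac{\sigma^2}{2}\Big)dt + \sigma\,dW_t,\\
S_t &= S_0 \exp\!\Big(\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma W_t\Big).
\end{aligned}
```

The term -sigma^2/2 in the drift of log S_t is exactly the Ito correction: the diffusion coefficient sigma*S_t squared, multiplied by f''/2.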

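The Ito integral, and therefore Ito's formula, requires the integrand to be non-anticipating (adapted): at time t, sigma may depend on the path of W only up to time t, never on its future values. If it depends on the future, the integral is undefined in the Ito sense. In physics the Stratonovich convention is sometimes used instead; when sigma depends on the state, it gives a different result for the same formal SDE.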
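
The non-anticipating requirement has a direct numerical counterpart. The Euler-Maruyama scheme evaluates the coefficients at the current state, before the next Brownian increment is drawn, so it discretizes the SDE in the Ito sense. The sketch below applies it to geometric Brownian motion dS = mu*S*dt + sigma*S*dW. The function name simulate_gbm and the parameter values are assumptions chosen for illustration. It estimates the drift of log(S_t) and the growth rate of the mean E[S_t].

```python
import numpy as np

def simulate_gbm(mu=0.1, sigma=0.4, s0=1.0, T=1.0,
                 n_steps=500, n_paths=50_000, seed=0):
    """Euler-Maruyama paths of dS = mu*S*dt + sigma*S*dW; returns S_T per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, s0)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Coefficients use the current S only (left endpoint, non-anticipating).
        S = S + mu * S * dt + sigma * S * dW
    return S

mu, sigma, s0, T = 0.1, 0.4, 1.0, 1.0
S_T = simulate_gbm(mu=mu, sigma=sigma, s0=s0, T=T)
log_drift = (np.mean(np.log(S_T)) - np.log(s0)) / T    # estimates mu - sigma^2/2
mean_growth = (np.log(np.mean(S_T)) - np.log(s0)) / T  # estimates mu
print("drift of log S:", log_drift, " theory:", mu - sigma**2 / 2)
print("growth of E[S]:", mean_growth, " theory:", mu)
```

Up to Monte Carlo and discretization error, the first estimate sits near mu - sigma^2/2 and the second near mu: the mean grows at rate mu while a typical path grows more slowly.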

Why does the drift of log(S_t) equal mu - sigma^2/2 and not mu?

Fokker-Planck Equation and Stationary Distributions

Instead of tracking individual trajectories, one can describe the evolution of the probability density p(x,t). The Fokker-Planck equation (FPE, also called the Kolmogorov forward equation) is the PDE for p(x,t) that corresponds to an SDE. It is dual to Ito's formula in the sense that the Fokker-Planck operator is the formal adjoint of the generator appearing in Ito's formula, so every SDE with sufficiently regular coefficients determines an FPE. Score-based diffusion models (Stable Diffusion, DALL-E) build on this duality.
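
For reference, the one-dimensional form of the equation and a standard example of a stationary distribution are sketched below. The Ornstein-Uhlenbeck process and its parameter theta are an illustrative choice, not taken from the lesson text, and the coefficients are assumed smooth enough for the density to exist.

```latex
\begin{aligned}
dX_t &= f(X_t)\,dt + \sigma(X_t)\,dW_t
\quad\Longrightarrow\quad
\frac{\partial p}{\partial t}
  = -\frac{\partial}{\partial x}\big(f(x)\,p\big)
    + \frac12\,\frac{\partial^2}{\partial x^2}\big(\sigma(x)^2\,p\big).\\[4pt]
\text{Ornstein-Uhlenbeck: } dX_t &= -\theta X_t\,dt + \sigma\,dW_t,\quad \theta>0:\\
0 &= \theta x\,p_\infty + \frac{\sigma^2}{2}\,p_\infty'
\quad\Longrightarrow\quad
p_\infty(x) \propto \exp\!\Big(-\frac{\theta x^2}{\sigma^2}\Big),
\quad\text{i.e. } \mathcal{N}\Big(0,\ \frac{\sigma^2}{2\theta}\Big).
\end{aligned}
```

The stationary density balances the drift flux toward the origin against the diffusive flux away from it.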

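Score-based diffusion models (Song et al. 2020) train a neural network to approximate the score function nabla_x log p_t(x), where p_t is the density of the data after being noised for time t by a forward SDE. Sampling is then realized by solving the reverse SDE from pure noise to data; the reverse SDE depends on the data only through this score. This is the principle behind samplers such as those used in Stable Diffusion and DALL-E 3.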
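
The reverse SDE can be tried out without any neural network when the data distribution is Gaussian, because the score is then available in closed form. The sketch below makes several assumptions that are not in the lesson text: one-dimensional data distributed as N(m, s^2), the forward SDE dX = -(1/2) X dt + dW, and a plain Euler-Maruyama discretization in reverse time. The reverse-time drift used is f - g^2 * score, with f(x) = -x/2 and g = 1.

```python
import numpy as np

def sample_reverse_sde(m=2.0, s=0.5, T=8.0, n_steps=1000,
                       n_samples=20_000, seed=0):
    """Draw samples from N(m, s^2) by integrating the reverse SDE from noise."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps

    def score(x, t):
        # Forward marginal p_t = N(m*exp(-t/2), s^2*exp(-t) + 1 - exp(-t)).
        mean_t = m * np.exp(-0.5 * t)
        var_t = s**2 * np.exp(-t) + 1.0 - np.exp(-t)
        return -(x - mean_t) / var_t

    x = rng.normal(0.0, 1.0, size=n_samples)  # p_T is close to N(0, 1)
    for k in range(n_steps):
        t = T - k * dt
        drift = -0.5 * x - score(x, t)        # f - g^2 * score with g = 1
        z = rng.normal(size=n_samples)
        x = x - drift * dt + np.sqrt(dt) * z  # one step backward in time
    return x

samples = sample_reverse_sde()
print("sample mean:", samples.mean(), " target: 2.0")
print("sample std: ", samples.std(), " target: 0.5")
```

In a trained diffusion model the function score would be replaced by the network's output; everything else in the loop stays structurally the same.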

What does the Fokker-Planck equation describe compared to the SDE?

Stochastic Control and RLHF

The Pontryagin principle and the HJB equation extend naturally to stochastic systems, in which the controlled state follows the SDE dX = f(X,u)dt + sigma(X)dW. Compared with the deterministic case, the stochastic HJB equation for the value function V gains the second-order term (sigma^2/2)*V_xx. This is the mathematical foundation of RLHF (Reinforcement Learning from Human Feedback), the method used to train ChatGPT.
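
The effect of the extra term is easiest to see in a scalar linear-quadratic problem. The sketch below is an assumed example, not taken from the lesson: dynamics dx = (a*x + b*u)dt + sigma*dW with b > 0, a long-run average cost of q*x^2 + r*u^2, and linear feedback u = -k*x. With V = P*x^2 the term (sigma^2/2)*V_xx equals sigma^2*P, a constant, so the Riccati equation q + 2*a*P - (b^2/r)*P^2 = 0 does not involve sigma. The code checks this by brute-force search over k for several noise levels.

```python
import numpy as np

def riccati_gain(a, b, q, r):
    """Positive root P of q + 2aP - (b^2/r)P^2 = 0 and the gain K = bP/r."""
    P = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
    return b * P / r, P

def average_cost(k, a, b, q, r, sigma):
    """Long-run average of q*x^2 + r*u^2 under u = -k*x (requires b*k > a)."""
    stationary_var = sigma**2 / (2.0 * (b * k - a))
    return (q + r * k**2) * stationary_var

def best_gain_by_search(a, b, q, r, sigma, n_grid=200_001):
    """Grid search over stabilizing gains; returns the cost-minimizing k."""
    ks = np.linspace(a / b + 1e-3, a / b + 10.0, n_grid)
    return ks[np.argmin(average_cost(ks, a, b, q, r, sigma))]

a, b, q, r = 1.0, 1.0, 1.0, 1.0
K, P = riccati_gain(a, b, q, r)
searched = {s: best_gain_by_search(a, b, q, r, s) for s in (0.1, 1.0, 3.0)}
for s, k in searched.items():
    print(f"sigma={s}: searched gain {k:.4f}, Riccati gain {K:.4f}, "
          f"cost {average_cost(k, a, b, q, r, s):.4f}, sigma^2*P {s**2 * P:.4f}")
```

The searched gain matches the Riccati gain at every noise level, while the optimal cost scales as sigma^2*P.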

RLHF and Stochastic Control

ChatGPT as a solution to a stochastic control problem

RLHF trains a language model through a policy pi(a|s), a probability distribution over the next token. This is a stochastic control problem: state s = context, action a = token, reward = human preference score. The PPO algorithm (Schulman et al. 2017) increases the expected reward by gradient ascent on the policy parameters. Mathematically this is stochastic gradient ascent on a clipped surrogate objective, which serves as a pessimistic lower bound on the unclipped surrogate for the stochastic functional J.
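
The idea of gradient ascent on policy parameters can be illustrated on a toy problem. The sketch below is not PPO and not an RLHF pipeline: it uses the simpler REINFORCE estimator with a batch-mean baseline, a single fixed context, three possible "tokens", and made-up reward values standing in for preference scores. All names and numbers are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def train_policy(rewards=(0.1, 0.5, 1.0), steps=500, batch=64,
                 lr=0.5, seed=0):
    """Stochastic gradient ascent on expected reward for a softmax policy."""
    rng = np.random.default_rng(seed)
    rewards = np.asarray(rewards)
    n_actions = len(rewards)
    theta = np.zeros(n_actions)               # logits of pi(a)
    for _ in range(steps):
        pi = softmax(theta)
        actions = rng.choice(n_actions, size=batch, p=pi)
        r = rewards[actions]
        advantage = r - r.mean()               # baseline reduces variance
        onehot = np.eye(n_actions)[actions]
        # grad of log pi(a) with respect to the logits is onehot(a) - pi
        grad = (advantage[:, None] * (onehot - pi)).mean(axis=0)
        theta += lr * grad                     # ascent step on expected reward
    return softmax(theta)

pi_final = train_policy()
print("final policy:", np.round(pi_final, 3))
```

The policy concentrates on the highest-reward action. PPO adds importance ratios and clipping on top of this basic estimator, and RLHF pipelines typically add a penalty that keeps the policy close to a reference model.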

Why does the optimal control for stochastic LQR coincide with the deterministic LQR gain?

Connections to Other Areas

Stochastic calculus is the mathematical foundation of financial mathematics, diffusion models, and stochastic control.

Diffusion Models (Stable Diffusion)
Financial Mathematics
RLHF and PPO
Molecular Dynamics

Summary

Brownian motion has quadratic variation [W,W]_t = t, implying (dW)^2 = dt
Ito's formula: df(X_t) = f_x dX + (1/2) f_xx sigma^2 dt, where the second term is the correction from quadratic variation
The FPE is dual to the SDE: it describes density evolution and underlies score-based diffusion models
Stochastic LQR with additive noise uses the same Riccati matrix as deterministic LQR; the noise adds only a constant cost

Questions for Reflection

What is the difference between Ito and Stratonovich integrals, and when is each preferable?
Why do score-based diffusion models use a reverse SDE rather than simply inverting the forward process?
How does the Girsanov change-of-measure theorem allow option pricing without knowing the real drift mu?

Related Lessons

diff-equations-28 - Solutions of SDEs define Markov semigroups, whose infinitesimal generators are the differential operators appearing in Ito's formula
de-27-schrodinger - Stochastic quantization connects SDEs to the Schrodinger equation
de-26-optimal-control - Stochastic control generalizes the Pontryagin principle to SDEs