Stochastic Control
How does SpaceX land a rocket in strong winds? How does Tesla's Autopilot keep a car in its highway lane? Problems like these belong to stochastic control: the mathematics of steering a system well when it is driven by random disturbances. The LQG regulator, in particular, is a standard tool in aircraft autopilot and stabilization design.
- **Aviation:** autopilots and stabilization systems; the LQG regulator, with partial observations handled by a Kalman filter
- **Finance:** Merton's portfolio optimization; stochastic control with logarithmic utility (an HJB equation with an explicit solution)
- **Robotics / RL:** PPO, SAC, and actor-critic algorithms; approximate solution of Bellman's equation without a model of the system
Prerequisites
Problem Formulation
Landing a rocket such as SpaceX's Falcon 9 can be posed as a stochastic optimal control problem: keep the expected quadratic cost E[∫(u² + x²)dt] (a proxy for control effort and deviation from the target) small while touching down close to the target, e.g. within 10 m. **Stochastic optimal control** is the problem of choosing a control policy u(t) that minimizes a cost functional when the system dynamics contain random disturbances.
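For reference, here is one standard way to write the problem. The notation is assumed: drift f, diffusion σ, and running cost L match the HJB formula under Key Ideas, while the terminal cost g, horizon T, and Brownian motion W are introduced here for illustration.

```latex
dX(t) = f\big(X(t), u(t)\big)\,dt + \sigma\big(X(t), u(t)\big)\,dW(t), \qquad X(0) = x_0,

J(u) = \mathbb{E}\!\left[\int_0^T L\big(X(t), u(t)\big)\,dt + g\big(X(T)\big)\right] \;\longrightarrow\; \min_{u}.

% Quadratic landing-type cost (illustrative): L(x, u) = u^2 + x^2, \quad g \equiv 0.
```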
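- Feedback (Markov) control: u(t) = π(t, X(t)), which depends only on the current time and state. By Bellman's principle of optimality, an optimal control can (under standard conditions) be sought among such policies.
- Open-loop control: u(t) = φ(t), a deterministic function of time fixed in advance. It is simpler to compute, but it cannot react to the noise realized along the way.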
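The gap between the two kinds of control can be measured on a toy example. This is a minimal Monte Carlo sketch under assumptions chosen for illustration: scalar dynamics dX = u dt + σ dW and cost E[∫₀ᵀ(X² + u²)dt]. For this problem the Riccati equation gives P(t) = tanh(T - t), so the optimal feedback is u = -tanh(T - t)X. The open-loop competitor is the optimal plan for σ = 0, applied without correction. The function name `compare_feedback_open_loop` and all parameter values are hypothetical.

```python
import math
import random


def compare_feedback_open_loop(T=2.0, sigma=1.0, x0=1.0, n_steps=200, n_paths=4000, seed=0):
    """Monte Carlo estimate of E[int_0^T (X^2 + u^2) dt] for dX = u dt + sigma dW.

    Feedback:  u = -tanh(T - t) * X(t)              (uses the realized state)
    Open loop: u(t) = -x0 * sinh(T - t) / cosh(T)   (optimal plan for sigma = 0)
    Both strategies are driven by the same Brownian increments.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    gains = [math.tanh(T - k * dt) for k in range(n_steps)]
    plan = [-x0 * math.sinh(T - k * dt) / math.cosh(T) for k in range(n_steps)]
    total_fb = total_ol = 0.0
    for _ in range(n_paths):
        x_fb = x_ol = x0
        for k in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            u_fb = -gains[k] * x_fb          # reacts to the noise seen so far
            u_ol = plan[k]                   # fixed in advance
            total_fb += (x_fb * x_fb + u_fb * u_fb) * dt
            total_ol += (x_ol * x_ol + u_ol * u_ol) * dt
            x_fb += u_fb * dt + sigma * dw   # Euler-Maruyama step
            x_ol += u_ol * dt + sigma * dw
    return total_fb / n_paths, total_ol / n_paths


T, sigma, x0 = 2.0, 1.0, 1.0
fb_mc, ol_mc = compare_feedback_open_loop(T, sigma, x0)
# Closed-form expected costs for this particular example
fb_exact = x0 ** 2 * math.tanh(T) + sigma ** 2 * math.log(math.cosh(T))
ol_exact = x0 ** 2 * math.tanh(T) + sigma ** 2 * T ** 2 / 2
print(f"feedback : MC {fb_mc:.3f}, exact {fb_exact:.3f}")
print(f"open loop: MC {ol_mc:.3f}, exact {ol_exact:.3f}")
```

Both strategies pay the same deterministic cost x0²·tanh(T). The noise adds σ²·ln cosh(T) under feedback and σ²T²/2 open loop, and ln cosh(T) ≤ T²/2. Feedback wins because it corrects the accumulated noise instead of letting it grow like a Brownian motion.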
The Hamilton-Jacobi-Bellman Equation
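The **HJB equation** is a nonlinear PDE for the value function V(t, x), the optimal expected cost-to-go from state x at time t. Its solution yields the optimal control in feedback form: at each (t, x), u* is the minimizer of the expression inside the HJB equation, which depends on V through its derivatives ∂V/∂x (and ∂²V/∂x² when the noise depends on u).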
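Here are the value function, the HJB equation with its terminal condition, and the resulting feedback law, written out for a scalar state. L, f, σ are as in the Key Ideas formula; the terminal cost g and horizon T are notation assumed here. The final comment shows, for an illustrative cost quadratic in u, how the minimizer depends on V only through ∂V/∂x.

```latex
V(t, x) = \min_{u}\; \mathbb{E}\!\left[\int_t^T L\big(X(s), u(s)\big)\,ds + g\big(X(T)\big) \,\middle|\, X(t) = x\right],

-\frac{\partial V}{\partial t} = \min_{u}\left\{ L(x, u) + f(x, u)\,\frac{\partial V}{\partial x} + \tfrac12 \sigma^2(x, u)\,\frac{\partial^2 V}{\partial x^2} \right\},
\qquad V(T, x) = g(x),

u^*(t, x) = \arg\min_{u}\left\{ L(x, u) + f(x, u)\,\frac{\partial V}{\partial x} + \tfrac12 \sigma^2(x, u)\,\frac{\partial^2 V}{\partial x^2} \right\}.

% Illustrative case: L = q x^2 + r u^2, f = a x + b u, \sigma constant
%   \min_u \{ r u^2 + b u \,\partial_x V \} \;\Rightarrow\; u^* = -\frac{b}{2r}\,\frac{\partial V}{\partial x}.
```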
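An alternative approach, Pontryagin's maximum principle, formulates necessary optimality conditions through the Hamiltonian H(x, p, u) = L(x, u) + p·f(x, u): the costate variable p(t) satisfies dp = -∂H/∂x dt, and the optimal control minimizes H at every instant (for a cost-minimization problem the principle is a minimum principle). This is the deterministic form; in the stochastic version, p solves a backward SDE and H also contains a diffusion term. Connection to HJB: p(t) = ∂V/∂x(t, X(t)) wherever V is smooth.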
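Here is a quick consistency check in the deterministic scalar case (σ = 0), assuming the illustrative choices L = q x² + r u² and f = a x + b u. The minimum principle, combined with p = ∂V/∂x for the ansatz V = P(t)x², reproduces the LQ feedback law and the scalar Riccati equation.

```latex
H(x, p, u) = q x^2 + r u^2 + p\,(a x + b u),
\qquad \frac{\partial H}{\partial u} = 0 \;\Rightarrow\; u^* = -\frac{b}{2r}\,p,
\qquad \dot p = -\frac{\partial H}{\partial x} = -2 q x - a p.

% Substitute p(t) = \partial V/\partial x = 2 P(t)\,x(t), so u^* = -(b/r) P x and \dot x = (a - b^2 P / r)\,x:
2\dot P x + 2 P\left(a - \frac{b^2}{r} P\right) x = -2 q x - 2 a P x
\;\Longrightarrow\; -\dot P = q + 2 a P - \frac{b^2}{r} P^2 .
```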
LQG Regulator: Explicit Solution
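The **Linear-Quadratic-Gaussian (LQG) problem** is the most important class of problems in which the HJB equation can be solved in closed form: the value function is quadratic and the optimal control is linear in the state. It is a cornerstone of modern (state-space) control theory.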
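The closed form comes from a quadratic ansatz. Assume the standard LQ data: matrices A, B, Q, R (R positive definite), terminal weight Q_T, and noise matrix Σ, all introduced here for illustration. Substituting V(t, x) = xᵀP(t)x + c(t) into the HJB equation gives the following.

```latex
dX = (A X + B u)\,dt + \Sigma\,dW, \qquad
J = \mathbb{E}\!\left[\int_0^T \big(X^\top Q X + u^\top R u\big)\,dt + X(T)^\top Q_T\,X(T)\right],

\min_u \{\, u^\top R u + 2 u^\top B^\top P x \,\} \;\Rightarrow\; u^* = -R^{-1} B^\top P(t)\, x = -K(t)\, x,

-\dot P = A^\top P + P A - P B R^{-1} B^\top P + Q, \qquad P(T) = Q_T,

-\dot c = \operatorname{tr}\!\big(\Sigma \Sigma^\top P\big), \qquad c(T) = 0.
```

The noise enters only through c(t). The gain K(t) therefore coincides with that of the deterministic problem, which is one reason the LQ case is so tractable.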
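Under partial observation, Y(t) = CX(t) + noise, the optimal control separates into two independent parts (the separation theorem): 1) a Kalman filter computes the estimate X̂(t) = E[X(t) | Y(s), s ≤ t]; 2) the LQ regulator applies u* = -K(t)X̂(t), with the same gain as in the fully observed problem. The filter and the regulator can therefore be designed independently.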
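Here is a minimal discrete-time, scalar sketch of the separation principle. It relies on assumptions not in the text: an unstable plant x' = a x + b u + w with a = 1.1, measurement y = c x + v, illustrative noise variances, and steady-state gains instead of time-varying ones. The LQ gain is computed from (a, b, q, r) alone and the Kalman gain from (a, c, noise variances) alone. For comparison, a naive controller feeds the raw measurement y/c into the same gain. All function names are hypothetical.

```python
import math
import random


def lqr_gain(a, b, q, r, iters=500):
    """Steady-state LQR gain for x' = a x + b u + w, stage cost q x^2 + r u^2 (Riccati iteration)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def kalman_gain(a, c, w_var, v_var, iters=500):
    """Steady-state Kalman gain and posterior error variance for y = c x + v."""
    s = w_var                                   # prior (predicted) error variance
    for _ in range(iters):
        post = s * v_var / (c * c * s + v_var)  # measurement update
        s = a * a * post + w_var                # time update
    gain = s * c / (c * c * s + v_var)
    return gain, (1.0 - gain * c) * s


def simulate(a, b, c, q, r, w_var, v_var, K, kf_gain, use_filter, n_steps=50_000, seed=1):
    """Average stage cost and mean squared estimation error of the control u = -K * x_hat."""
    rng = random.Random(seed)                   # same seed -> same noise for both controllers
    x = x_prior = 0.0
    cost = err2 = 0.0
    for _ in range(n_steps):
        y = c * x + rng.gauss(0.0, math.sqrt(v_var))
        if use_filter:
            x_hat = x_prior + kf_gain * (y - c * x_prior)  # Kalman measurement update
        else:
            x_hat = y / c                                  # naive: trust the raw measurement
        u = -K * x_hat                                     # certainty-equivalent LQ control
        cost += q * x * x + r * u * u
        err2 += (x - x_hat) ** 2
        x = a * x + b * u + rng.gauss(0.0, math.sqrt(w_var))
        x_prior = a * x_hat + b * u                        # Kalman time update (prediction)
    return cost / n_steps, err2 / n_steps


a, b, c, q, r, w_var, v_var = 1.1, 1.0, 1.0, 1.0, 1.0, 0.1, 1.0
K = lqr_gain(a, b, q, r)                          # ignores the noise statistics
kf_gain, post_var = kalman_gain(a, c, w_var, v_var)  # ignores the cost weights
lqg_cost, lqg_err = simulate(a, b, c, q, r, w_var, v_var, K, kf_gain, use_filter=True)
naive_cost, naive_err = simulate(a, b, c, q, r, w_var, v_var, K, kf_gain, use_filter=False)
print(f"K = {K:.3f}, Kalman gain = {kf_gain:.3f}, predicted posterior variance = {post_var:.3f}")
print(f"LQG  : cost/step {lqg_cost:.3f}, estimation error {lqg_err:.3f}")
print(f"naive: cost/step {naive_cost:.3f}, estimation error {naive_err:.3f}")
```

The empirical estimation error of the filtered controller should match the variance the filter predicts, and it is well below the raw measurement noise. This is why the two independently designed pieces, combined, outperform the naive loop.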
Connection to Reinforcement Learning
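Reinforcement learning (RL) can be viewed as stochastic control, usually in discrete time, without a known model: the agent learns from sampled transitions instead of solving equations for given f and σ. The Bellman equation, whose continuous-time counterpart is the HJB equation, underlies Q-learning, actor-critic methods, and PPO.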
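The link between the two equations can be made explicit by a formal, non-rigorous limit. The steps are: write Bellman's equation on a time grid with step Δt and stage cost L·Δt, expand the expectation with Itô's formula, and let Δt → 0. V is assumed smooth, and the scalar notation of the Key Ideas formula is used.

```latex
% Discrete-time Bellman equation with time step \Delta t:
V(t, x) = \min_u \Big\{ L(x, u)\,\Delta t + \mathbb{E}\big[ V\big(t + \Delta t, X(t + \Delta t)\big) \,\big|\, X(t) = x \big] \Big\}

% Ito's formula for dX = f\,dt + \sigma\,dW (the dW term has zero mean):
\mathbb{E}\big[ V\big(t + \Delta t, X(t + \Delta t)\big) \,\big|\, X(t) = x \big]
  = V(t, x) + \Big( \partial_t V + f\,\partial_x V + \tfrac12 \sigma^2\,\partial_{xx} V \Big)\Delta t + o(\Delta t)

% Subtract V(t, x), divide by \Delta t, let \Delta t \to 0:
-\partial_t V = \min_u \Big\{ L + f\,\partial_x V + \tfrac12 \sigma^2\,\partial_{xx} V \Big\}
```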
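In the linear-quadratic problem with unknown dynamics (A and B unknown), one can run LQR policy iteration without a model (model-free LQR). At each iteration the quadratic value (or Q-) function of the current policy is estimated from data by least-squares temporal differences (LSTD), and then K is updated. Under suitable conditions (a stabilizing initial gain, sufficiently exciting data) the iteration converges to the optimal gain K* without identifying A and B.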
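This scheme can be sketched for a scalar discrete-time system under assumptions introduced for illustration:

- a discounted cost Σγᵗ(q x² + r u²);
- a quadratic Q-function with features [x², xu, u², 1];
- Gaussian exploration added to the actions;
- a stabilizing initial gain K0.

This is the Q-function variant of policy iteration (LSTD-Q), used here in place of a value-function estimate. The parameters a and b appear only inside the simulated environment. The model-based discounted Riccati gain is computed solely as a reference. Names such as `lstdq` and `model_free_lqr` are hypothetical.

```python
import random


def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(vec)
    m = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def features(x, u):
    # Q(x, u) = th0*x^2 + th1*x*u + th2*u^2 + th3 for a scalar linear system
    return [x * x, x * u, u * u, 1.0]


def lstdq(K, env_step, q, r, gamma, n_samples, rng, explore=1.0):
    """LSTD-Q estimate of the Q-function of the policy u = -K x from sampled transitions."""
    A = [[0.0] * 4 for _ in range(4)]
    rhs = [0.0] * 4
    x = 0.0
    for _ in range(n_samples):
        u = -K * x + rng.gauss(0.0, explore)      # exploratory behaviour policy
        cost = q * x * x + r * u * u              # observed stage cost
        x_next = env_step(x, u)                   # black box: the learner never sees a, b
        phi = features(x, u)
        phi_next = features(x_next, -K * x_next)  # next action from the evaluated policy
        for i in range(4):
            rhs[i] += phi[i] * cost
            for j in range(4):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
        x = x_next
    return solve(A, rhs)


def model_free_lqr(env_step, q, r, gamma, K0, n_iters=5, n_samples=20_000, seed=0):
    """Policy iteration: evaluate u = -K x with LSTD-Q, then improve K greedily."""
    rng = random.Random(seed)
    K = K0                                        # assumed stabilizing
    for _ in range(n_iters):
        th = lstdq(K, env_step, q, r, gamma, n_samples, rng)
        K = th[1] / (2.0 * th[2])                 # argmin_u (th1*x*u + th2*u^2) = -th1*x / (2*th2)
    return K


def riccati_gain(a, b, q, r, gamma, iters=1000):
    """Model-based reference: discounted discrete-time Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    return gamma * a * b * p / (r + gamma * b * b * p)


a, b, q, r, gamma = 1.2, 1.0, 1.0, 1.0, 0.95
env_rng = random.Random(42)


def env_step(x, u):
    # Dynamics hidden from the learner: x' = a x + b u + w, w ~ N(0, 0.1^2)
    return a * x + b * u + env_rng.gauss(0.0, 0.1)


K_learned = model_free_lqr(env_step, q, r, gamma, K0=0.5)
K_star = riccati_gain(a, b, q, r, gamma)
print(f"model-free K = {K_learned:.4f}, Riccati K* = {K_star:.4f}")
```

Because the true Q-function of a linear policy on this system is exactly quadratic plus a constant, the four features represent it without approximation error. The LSTD estimate is then consistent, and the remaining gap to K* comes from sampling noise only.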
Key Ideas
- **Value function V(t,x)**: optimal expected cost-to-go from state x at time t; satisfies the HJB equation
- **HJB equation**: nonlinear PDE: -∂V/∂t = min_u{L + f·∂V/∂x + ½σ²·∂²V/∂x²}
- **LQG**: explicit solution via the Riccati equation: u* = -K(t)X, K(t) = R⁻¹BᵀP(t); under partial observation X is replaced by the Kalman estimate X̂
- **RL = HJB without a model**: Q-learning and actor-critic methods numerically approximate Bellman's equation
Related Topics
Stochastic control bridges SDEs, martingales, and ML:
- Stochastic Differential Equations: system dynamics are given by SDEs; Itô's lemma is used to derive the HJB equation
- Martingales: under the optimal strategy, V(t, X(t)) + ∫₀ᵗ L(X(s), u(s)) ds is a martingale (and a submartingale under any other strategy)
- Financial Mathematics: Merton's portfolio problem, stochastic control with HJB
Questions for Reflection
- How does the HJB equation simplify in the deterministic case (σ = 0)? What is the Hamilton-Jacobi equation?
- Why is the separation theorem practically important? What would happen without it?
- How would one formulate an insulin pump control problem for a diabetic patient in HJB terms?