Stochastic Processes

Stochastic Control

How does SpaceX land rockets in a storm? How does Tesla's Autopilot hold a highway lane? Both rely on ideas from stochastic control: the mathematics of steering a system toward a goal when its dynamics are driven by random disturbances. Its classical workhorse, the LQG regulator, is a standard design tool for aircraft autopilots and stabilization systems.

  • **Aviation:** autopilot and stabilization systems; an LQG regulator handles partial observations through the Kalman filter
  • **Finance:** Merton's portfolio optimization; stochastic control with logarithmic utility, where the HJB equation has an explicit solution
  • **Robotics / RL:** PPO, SAC, and Actor-Critic algorithms; numerical solution of Bellman's equation without a system model

Prerequisites

  • Stochastic Differential Equations

Problem Formulation

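Landing a SpaceX Falcon 9 can be posed as a stochastic optimal control problem: minimize an expected quadratic cost such as E[∫(u² + x²)dt], which penalizes control effort u (a proxy for fuel) and deviation x from the target, while touching down within about 10 m of the target. In general, **stochastic optimal control** is the problem of choosing a control policy u(t) that minimizes a cost functional J(u) = E[∫ L(X(t), u(t)) dt] when the system dynamics dX = f(X, u)dt + σ dW contain random disturbances W.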

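  • **Feedback (Markov) control:** u(t) = π(t, X(t)) depends only on the current time and state. For Markov dynamics with a fully observed state, Bellman's principle shows that no policy using the full history can do better.
  • **Open-loop control:** u(t) = ū(t) is a deterministic function of time fixed in advance. It is simpler to compute but cannot react to the disturbances actually realized.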
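The difference between the two policy types can be seen in a small Monte Carlo experiment. The sketch below is illustrative only: the scalar model dX = u dt + σ dW, the cost ∫(X² + u²)dt, and all parameter values are assumptions chosen for simplicity, not taken from the lesson. The feedback law u = -x is the steady-state LQ gain for this assumed model, and the open-loop law u = -e^(-t) is what would be (approximately) optimal if there were no noise.

```python
import math
import random

def expected_cost(policy, n_paths=500, T=5.0, dt=0.01, sigma=0.5, x0=1.0, seed=0):
    """Monte Carlo estimate of E[integral of (X^2 + u^2) dt] for dX = u dt + sigma dW."""
    rng = random.Random(seed)  # same seed for every policy: common random numbers
    n_steps = int(round(T / dt))
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            u = policy(k * dt, x)
            cost += (x * x + u * u) * dt
            # Euler-Maruyama step of the assumed SDE
            x += u * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths

def feedback(t, x):
    """Feedback policy: reacts to the current state."""
    return -x

def open_loop(t, x):
    """Open-loop policy: precomputed for the noise-free path, ignores x."""
    return -math.exp(-t)

feedback_cost = expected_cost(feedback)
open_loop_cost = expected_cost(open_loop)
print(f"feedback:  {feedback_cost:.3f}")
print(f"open-loop: {open_loop_cost:.3f}")
```

Under these assumptions the continuous-time expected costs work out to roughly 2.1 for the feedback law and 4.1 for the open-loop law. The open-loop controller never corrects the accumulated noise, so its state variance grows linearly in time.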

Why are feedback control policies u(t) = π(t, X(t)) preferred in stochastic control?

The Hamilton-Jacobi-Bellman Equation

The **HJB equation** is a nonlinear PDE for the value function V(t, x). Its solution yields the optimal control via the gradient of V.
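To make the role of the gradient concrete, here is a worked sketch for a scalar controlled diffusion. The model dX = (aX + bu)dt + σ dW and the running cost qx² + ru² (with r > 0) are illustrative assumptions, not part of the lesson text.

```latex
% Assumed scalar model: dX = (aX + bu)\,dt + \sigma\,dW, running cost q x^2 + r u^2, r > 0
-\frac{\partial V}{\partial t}
  = \min_u \left\{ q x^2 + r u^2 + (a x + b u)\frac{\partial V}{\partial x}
      + \tfrac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial x^2} \right\}

% First-order condition in u:  2 r u + b\,\partial V/\partial x = 0
u^*(t, x) = -\frac{b}{2r}\,\frac{\partial V}{\partial x}(t, x)

% Substituting u^* back removes the minimization:
-\frac{\partial V}{\partial t}
  = q x^2 + a x \frac{\partial V}{\partial x}
    - \frac{b^2}{4r}\left(\frac{\partial V}{\partial x}\right)^2
    + \tfrac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial x^2}
```

The squared gradient term is what makes the PDE nonlinear, and the optimal control is read off directly from ∂V/∂x once V is known.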

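An alternative approach is Pontryagin's maximum principle, which formulates necessary optimality conditions through the Hamiltonian H(x, p, u) = L(x,u) + p·f(x,u). Here p(t) is the costate variable. In the deterministic case it satisfies dp = -∂H/∂x dt; in the stochastic case the costate equation becomes a backward SDE with an additional martingale term, and H gains a diffusion term. Connection to HJB: when V is smooth, p(t) = ∂V/∂x(t, X(t)) along the optimal trajectory.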

What mathematical object is the 'price' of state x in the HJB equation?

LQG Regulator: Explicit Solution

The **Linear-Quadratic-Gaussian (LQG) problem** is the best-known class in which the HJB equation can be solved analytically: a quadratic ansatz for V reduces the PDE to a Riccati ordinary differential equation. It is a cornerstone of optimal control theory.
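As a minimal sketch of that reduction, assume a scalar model dX = (aX + bu)dt + σ dW with cost E[∫₀ᵀ (qX² + ru²)dt] and zero terminal cost (all symbols and values here are illustrative assumptions). Substituting V(t, x) = P(t)x² + c(t) into the HJB equation gives the Riccati ODE -dP/dt = 2aP - (b²/r)P² + q with P(T) = 0, and the feedback gain K(t) = (b/r)P(t). The code integrates this ODE backward in time with a plain Euler scheme.

```python
import math

def solve_riccati(a, b, q, r, p_terminal, T, n_steps):
    """Integrate -dP/dt = 2aP - (b^2/r)P^2 + q backward from P(T) = p_terminal."""
    dt = T / n_steps
    P = [0.0] * (n_steps + 1)
    P[n_steps] = p_terminal
    for k in range(n_steps, 0, -1):
        rhs = 2.0 * a * P[k] - (b * b / r) * P[k] ** 2 + q
        P[k - 1] = P[k] + dt * rhs  # explicit Euler step backward in time
    return P

# Hypothetical parameters: an unstable open-loop system (a > 0)
a, b, q, r = 0.5, 1.0, 1.0, 1.0
P = solve_riccati(a, b, q, r, p_terminal=0.0, T=10.0, n_steps=10000)

# Time-varying feedback gain, u*(t, x) = -K(t) x
K = [(b / r) * p for p in P]

# Far from the horizon, P(t) should approach the stationary (algebraic Riccati) root
P_inf = (r / (b * b)) * (a + math.sqrt(a * a + b * b * q / r))
print(f"P(0) = {P[0]:.6f}, stationary root = {P_inf:.6f}, K(0) = {K[0]:.6f}")
```

Note that σ does not appear in the Riccati equation: in this model the noise level changes the achieved cost (through c(t)) but not the optimal gain.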

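Under partial observation, Y(t) = CX(t) + noise, the optimal control separates into two steps (the separation theorem): 1) a Kalman filter computes the estimate X̂(t) = E[X(t) | Y(s), s ≤ t]; 2) the LQ regulator applies u* = -K·X̂(t), with the same gain K as in the fully observed problem. The filter and the regulator can therefore be designed independently.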
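The two-step structure can be sketched in a few lines. For simplicity the example uses a discrete-time scalar analogue, x[k+1] = a·x[k] + b·u[k] + w, y[k] = c·x[k] + v, rather than the continuous-time model above. The plant, the noise variances W and V, and the cost weights are hypothetical values chosen for illustration.

```python
import random

# Hypothetical scalar plant (unstable without control, since a > 1)
a, b, c = 1.05, 1.0, 1.0
q, r = 1.0, 1.0          # state and control cost weights
W, V = 0.04, 0.25        # process and measurement noise variances

# Step 1 (regulator): iterate the discrete Riccati equation.
# Only a, b, q, r are used; the noise variances play no role here.
P = q
for _ in range(500):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K = a * b * P / (r + b * b * P)

# Step 2 (filter): a scalar Kalman filter feeds the regulator.
# Only a, b, c, W, V are used; the cost weights play no role here.
rng = random.Random(1)
x, x_hat, S = 1.0, 0.0, 1.0   # true state, prior mean, prior variance
n_steps, sq_err, sq_state = 5000, 0.0, 0.0
for _ in range(n_steps):
    y = c * x + rng.gauss(0.0, V ** 0.5)      # noisy measurement
    G = S * c / (c * c * S + V)               # Kalman gain
    x_hat += G * (y - c * x_hat)              # measurement update
    S *= 1.0 - G * c
    u = -K * x_hat                            # certainty-equivalent control
    sq_err += (x - x_hat) ** 2
    sq_state += x * x
    x = a * x + b * u + rng.gauss(0.0, W ** 0.5)  # true plant step
    x_hat = a * x_hat + b * u                 # time update (prediction)
    S = a * a * S + W

mse_estimate = sq_err / n_steps
mean_sq_state = sq_state / n_steps
print(f"K = {K:.4f}, estimation MSE = {mse_estimate:.4f}, mean x^2 = {mean_sq_state:.4f}")
```

The regulator gain and the filter gain are computed in separate blocks that share only the plant parameters, which is the practical content of the separation theorem.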

In the LQG problem, the Riccati equation is used to find:

Connection to Reinforcement Learning

Reinforcement learning (RL) is stochastic control without a known model. The HJB equation and Bellman equation underlie Q-learning, Actor-Critic, and PPO.
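The link between the two equations can be sketched with a short derivation. It assumes a smooth value function V and dynamics dX = f dt + σ dW with running cost L, and it is a heuristic argument rather than a rigorous proof.

```latex
% Bellman's principle over a short step \Delta t:
V(t, x) = \min_u \, \mathbb{E}\!\left[ L(x, u)\,\Delta t
          + V\big(t + \Delta t,\, X_{t+\Delta t}\big) \,\middle|\, X_t = x \right]

% Ito's lemma for the expected value of V one step ahead:
\mathbb{E}\!\left[ V\big(t + \Delta t,\, X_{t+\Delta t}\big) \right]
  = V + \left( \frac{\partial V}{\partial t} + f\,\frac{\partial V}{\partial x}
      + \tfrac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial x^2} \right)\Delta t + o(\Delta t)

% Subtract V, divide by \Delta t, and let \Delta t \to 0:
-\frac{\partial V}{\partial t}
  = \min_u \left\{ L + f\,\frac{\partial V}{\partial x}
      + \tfrac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial x^2} \right\}
```

RL algorithms work with the first line, replacing the expectation by sampled transitions, so they never need f or σ explicitly.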

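In the LQ problem with unknown dynamics (A and B unknown), policy iteration for the Linear Quadratic Regulator (model-free LQR) can be used. At each iteration the quadratic cost P of the current gain K is estimated from data by least squares (LSTD), and K is then updated from that estimate. Starting from a stabilizing gain and with sufficiently exciting data, the iteration converges to the optimal K* without knowledge of A and B.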
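A minimal sketch of this idea follows, under strong simplifying assumptions: a scalar, discrete-time, noise-free plant x' = a·x + b·u with cost q·x² + r·u², and a quadratic Q-function fitted by least squares on the Bellman equation (in the spirit of LSTD-Q). Because the gain update must not use a or b, the quantity estimated here is the Q-function rather than P alone. All names and parameter values are hypothetical, and the learner touches the plant only through `step`.

```python
import random

def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

A_TRUE, B_TRUE = 1.05, 1.0   # hidden from the learner
q, r = 1.0, 1.0

def step(x, u):
    """Black-box plant: the learner only observes the next state."""
    return A_TRUE * x + B_TRUE * u

def features(x, u):
    return [x * x, x * u, u * u]   # Q(x, u) = theta . features(x, u)

def evaluate_policy(K, rng, n_samples=200):
    """Fit Q_K from Q_K(x, u) = cost + Q_K(x', -K x') by least squares."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        u = -K * x + rng.uniform(-1.0, 1.0)   # exploration around the policy
        x_next = step(x, u)
        cost = q * x * x + r * u * u
        psi = [f0 - f1 for f0, f1 in
               zip(features(x, u), features(x_next, -K * x_next))]
        for i in range(3):
            rhs[i] += psi[i] * cost
            for j in range(3):
                A[i][j] += psi[i] * psi[j]
    return solve_linear(A, rhs)

rng = random.Random(0)
K = 0.5                                   # initial gain, assumed stabilizing
for _ in range(10):
    theta = evaluate_policy(K, rng)       # policy evaluation from data
    K = theta[1] / (2.0 * theta[2])       # policy improvement: argmin_u Q(x, u)

# Reference value computed WITH the model, used only to check the result
P = q
for _ in range(500):
    P = q + A_TRUE**2 * P - (A_TRUE * B_TRUE * P) ** 2 / (r + B_TRUE**2 * P)
K_star = A_TRUE * B_TRUE * P / (r + B_TRUE**2 * P)
print(f"learned K = {K:.6f}, model-based K* = {K_star:.6f}")
```

With process noise the Bellman equation holds only in expectation, so a practical implementation would need an extra constant feature or discounting, plus more data. The noise-free setting is used here only to keep the example short.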

The HJB equation is the continuous-time analogue of which equation from RL?

Key Ideas

  • **Value function V(t,x)**: optimal cost from time t; satisfies the HJB equation
  • **HJB equation**: a nonlinear PDE, -∂V/∂t = min_u{L + f·∂V/∂x + ½σ²·∂²V/∂x²}
  • **LQG**: explicit solution via the Riccati equation: u* = -K(t)X, K(t) = R⁻¹BᵀP(t)
  • **RL = HJB without a model**: Q-learning and Actor-Critic numerically approximate Bellman's equation

Related Topics

Stochastic control connects SDEs, martingales, and financial mathematics:

  • Stochastic Differential Equations: system dynamics are given by SDEs; Itô's lemma is used to derive the HJB equation
  • Martingales: under the optimal strategy, V(t, X(t)) plus the running cost accumulated so far is a martingale; under any other admissible strategy it is a submartingale
  • Financial Mathematics: Merton's portfolio problem is a stochastic control problem solved with the HJB equation

Questions for Reflection

  • How does the HJB equation degenerate in the deterministic case (σ = 0)? What is the Hamilton-Jacobi equation?
  • Why is the separation theorem practically important? What would happen without it?
  • How would one formulate an insulin pump control problem for a diabetic patient in HJB terms?

Related Lessons

  • calc-19-gradient