Stochastic Processes

Stochastic Control

How does SpaceX land rockets in a storm? How does Tesla's Autopilot hold a highway lane? Behind both is stochastic control: the mathematics of choosing good actions when the dynamics are random. LQG-type regulators are a standard tool in aircraft autopilot and stabilization design.

**Aviation:** autopilot and stabilization systems; an LQG regulator working from partial observations through the Kalman filter
**Finance:** Merton's portfolio optimization; stochastic control with logarithmic utility (an HJB equation with an explicit solution)
**Robotics / RL:** PPO, SAC, and other actor-critic algorithms, which approximate the solution of Bellman's equation numerically without a system model

Prerequisites

Stochastic Differential Equations

Problem Formulation

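A powered landing such as that of the SpaceX Falcon 9 can be posed as a stochastic optimal control problem: minimize an expected cost such as E[∫(u² + x²)dt], which penalizes control effort u and deviation x, while touching down within about 10 m of the target. **Stochastic optimal control** is the problem of choosing a control policy u(t) that minimizes a cost functional when the system dynamics contain random disturbances. Throughout, the state follows a controlled SDE dX = f(X, u)dt + σ dW and the cost is J = E[∫L(X, u)dt].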

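Feedback (Markov) control: u(t) = π(t, X(t)) depends only on the current time and state. By Bellman's principle of optimality, nothing is lost by restricting to this class when the state is fully observed. Open-loop control: u(t) is a deterministic function of time, fixed in advance. It is simpler to compute but cannot react to the disturbances that actually occur.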
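The difference between the two classes can be illustrated with a small Monte Carlo sketch. Everything here is an assumption chosen for illustration and not taken from the lesson: the toy system dX = u dt + σ dW, the cost ∫(x² + u²)dt, the feedback law u = -x, and an open-loop control u(t) = -x0·e^(-t) that replays what the feedback law would do if there were no noise. Both controllers see the same noise realizations.

```python
import numpy as np

def simulate_costs(x0=1.0, sigma=1.0, T=5.0, n_steps=500, n_paths=2000, seed=0):
    """Euler-Maruyama comparison of feedback and open-loop control (toy example)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x_fb = np.full(n_paths, x0)   # state under feedback control
    x_ol = np.full(n_paths, x0)   # state under open-loop control
    cost_fb = np.zeros(n_paths)
    cost_ol = np.zeros(n_paths)
    for i in range(n_steps):
        t = i * dt
        u_fb = -x_fb                  # feedback: reacts to the realized state
        u_ol = -x0 * np.exp(-t)       # open loop: fixed function of time
        cost_fb += (x_fb**2 + u_fb**2) * dt
        cost_ol += (x_ol**2 + u_ol**2) * dt
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)   # shared Brownian increments
        x_fb = x_fb + u_fb * dt + sigma * dw
        x_ol = x_ol + u_ol * dt + sigma * dw
    return cost_fb.mean(), cost_ol.mean()

feedback_cost, open_loop_cost = simulate_costs()
print(f"feedback cost:  {feedback_cost:.2f}")
print(f"open-loop cost: {open_loop_cost:.2f}")
```

In this setup the open-loop state accumulates noise like a random walk, while the feedback law keeps pulling the state back toward zero, so the feedback cost should come out noticeably lower.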

Why are feedback control policies u(t) = π(t, X(t)) preferred in stochastic control?

The Hamilton-Jacobi-Bellman Equation

The **HJB equation** is a nonlinear PDE for the value function V(t, x). Its solution yields the optimal control via the gradient of V.
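How the gradient of V produces the control can be seen in a short worked case. The setup is an assumption made for illustration, not something stated in the lesson: a scalar state with control-affine drift f(x, u) = a(x) + b(x)u, constant noise intensity σ, and running cost L(x, u) = q(x) + ½ r u² with r > 0. Minimizing the bracket in the HJB equation over u gives the control explicitly in terms of ∂V/∂x, and substituting it back leaves a PDE in V alone:

```latex
\begin{aligned}
-\frac{\partial V}{\partial t}
  &= \min_u \Big\{ q(x) + \tfrac12 r u^2
     + \big(a(x) + b(x)\,u\big)\frac{\partial V}{\partial x}
     + \tfrac12 \sigma^2 \frac{\partial^2 V}{\partial x^2} \Big\}, \\
0 &= r u^* + b(x)\frac{\partial V}{\partial x}
  \;\Longrightarrow\;
  u^*(t,x) = -\frac{b(x)}{r}\,\frac{\partial V}{\partial x}(t,x), \\
-\frac{\partial V}{\partial t}
  &= q(x) + a(x)\frac{\partial V}{\partial x}
     - \frac{b(x)^2}{2r}\Big(\frac{\partial V}{\partial x}\Big)^2
     + \tfrac12 \sigma^2 \frac{\partial^2 V}{\partial x^2}.
\end{aligned}
```

The quadratic term in ∂V/∂x in the last line is what makes the equation nonlinear.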

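An alternative approach is Pontryagin's maximum principle, which states necessary conditions for optimality through the Hamiltonian H(x, p, u) = L(x,u) + p·f(x,u), where p(t) is the costate variable. In the deterministic case the costate satisfies dp = -∂H/∂x dt; in the stochastic case this becomes a backward SDE with an additional martingale term. Connection to HJB: along the optimal trajectory, p(t) = ∂V/∂x(t, X(t)).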

What mathematical object is the 'price' of state x in the HJB equation?

LQG Regulator: Explicit Solution

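The **Linear-Quadratic-Gaussian (LQG) problem** is the best-known class in which the HJB equation can be solved in closed form. With linear dynamics, quadratic cost, and Gaussian noise, the value function is quadratic in x, and the PDE reduces to a Riccati equation for its coefficient matrix P(t). It is a cornerstone of state-space control theory.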
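As a minimal sketch of what the explicit solution looks like, take a scalar system dX = (aX + bu)dt + σ dW with cost E[∫(qx² + ru²)dt] and no terminal cost. The scalar setting, the parameter values, and the function name `riccati_backward` are assumptions for illustration. Under this convention the Riccati equation reads -dP/dt = 2aP - b²P²/r + q with P(T) = 0, and the time-varying gain is K(t) = bP(t)/r. The code integrates it backward from T with explicit Euler steps and compares P(0) with the steady-state (algebraic) solution.

```python
import math

def riccati_backward(a, b, q, r, T, n_steps=10000):
    """Integrate the scalar Riccati ODE backward from P(T) = 0; return P(0) and K(t)."""
    dt = T / n_steps
    P = 0.0                      # terminal condition: no terminal cost
    gains = [b * P / r]          # K(T)
    for _ in range(n_steps):
        # One explicit Euler step backward in time.
        P += dt * (2.0 * a * P - (b * P) ** 2 / r + q)
        gains.append(b * P / r)
    gains.reverse()              # gains[0] is K(0), gains[-1] is K(T)
    return P, gains

a, b, q, r = 0.5, 1.0, 1.0, 1.0
P0, gains = riccati_backward(a, b, q, r, T=10.0)

# Positive root of the algebraic Riccati equation 2aP - b^2 P^2 / r + q = 0.
P_inf = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
print(f"P(0) = {P0:.6f}, steady state = {P_inf:.6f}")
```

Far from the terminal time the gain is essentially constant, which is why a fixed gain is often used in practice on long horizons.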

Under partial observation, Y(t) = CX(t) + noise, the optimal controller separates into two parts (the separation theorem): 1) a Kalman filter computes the estimate X̂(t) = E[X(t) | Y(s), s ≤ t]; 2) the LQ regulator applies u* = -K·X̂(t). The filter and the regulator can be designed independently.
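The separation structure can be sketched for a scalar system. The model is an assumption for illustration: dX = (aX + bu)dt + σ dW observed through dY = cX dt + ρ dV, with steady-state gains and hypothetical parameter values. The regulator gain K depends only on (a, b, q, r) and the filter gain G only on (a, c, σ, ρ), which is the independence the theorem describes. The simulation uses a simple Euler discretization of the Kalman-Bucy filter.

```python
import numpy as np

def lqg_gains(a, b, q, r, c, sigma, rho):
    """Steady-state regulator gain K and filter gain G for a scalar system."""
    # Control Riccati equation: 2aP - (b^2/r) P^2 + q = 0 (positive root).
    P = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
    # Filter Riccati equation: 2aS - (c^2/rho^2) S^2 + sigma^2 = 0 (positive root).
    S = rho**2 * (a + np.sqrt(a**2 + c**2 * sigma**2 / rho**2)) / c**2
    K = b * P / r           # regulator: uses only (a, b, q, r)
    G = S * c / rho**2      # filter: uses only (a, c, sigma, rho)
    return P, S, K, G

def simulate_lqg(a=0.5, b=1.0, q=1.0, r=1.0, c=1.0, sigma=0.5, rho=0.2,
                 x0=1.0, T=10.0, n_steps=1000, n_paths=1000, seed=0):
    """Return the final mean squared state and mean squared estimation error."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    _, _, K, G = lqg_gains(a, b, q, r, c, sigma, rho)
    x = np.full(n_paths, x0)      # true state, never seen by the controller
    x_hat = np.zeros(n_paths)     # estimate, deliberately started at a wrong value
    for _ in range(n_steps):
        u = -K * x_hat                                  # control uses the estimate only
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)      # process noise increment
        dv = rng.normal(0.0, np.sqrt(dt), n_paths)      # observation noise increment
        dy = c * x * dt + rho * dv                      # observation increment
        x_hat = x_hat + (a * x_hat + b * u) * dt + G * (dy - c * x_hat * dt)
        x = x + (a * x + b * u) * dt + sigma * dw
    return np.mean(x**2), np.mean((x - x_hat)**2)

P, S, K, G = lqg_gains(0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.2)
mean_sq_state, mean_sq_error = simulate_lqg()
print(f"K = {K:.3f}, G = {G:.3f}")
print(f"E[x^2] ~ {mean_sq_state:.3f}, E[(x - x_hat)^2] ~ {mean_sq_error:.3f}")
```

With a = 0.5 the uncontrolled system is unstable, yet the controller keeps the state bounded although it only ever sees the noisy observations.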

In the LQG problem, the Riccati equation is used to find:

Connection to Reinforcement Learning

Reinforcement learning (RL) can be viewed as stochastic control without a known model: the policy is learned from sampled transitions instead of being derived from the dynamics. The HJB and Bellman equations underlie Q-learning, Actor-Critic, and PPO.
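The link between the two equations can be made concrete with a heuristic derivation. It assumes the dynamics dX = f(X, u)dt + σ dW, running cost L(x, u), a smooth value function, and a control held constant over a short step Δt. Start from the dynamic programming recursion over one step, expand the expectation with Itô's formula, subtract V(t, x), divide by Δt, and let Δt → 0:

```latex
\begin{aligned}
V(t,x) &= \min_u \Big\{ L(x,u)\,\Delta t
   + \mathbb{E}\big[ V(t+\Delta t, X_{t+\Delta t}) \,\big|\, X_t = x \big] \Big\}, \\
\mathbb{E}\big[ V(t+\Delta t, X_{t+\Delta t}) \,\big|\, X_t = x \big]
  &= V + \Big( \frac{\partial V}{\partial t} + f\,\frac{\partial V}{\partial x}
     + \tfrac12 \sigma^2 \frac{\partial^2 V}{\partial x^2} \Big)\Delta t + o(\Delta t), \\
-\frac{\partial V}{\partial t}
  &= \min_u \Big\{ L + f\,\frac{\partial V}{\partial x}
     + \tfrac12 \sigma^2 \frac{\partial^2 V}{\partial x^2} \Big\}.
\end{aligned}
```

The first line is the one-step recursion that RL algorithms approximate from samples; the last line is its continuous-time limit.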

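When the dynamics of the LQ problem are unknown (A and B unknown), a model-free LQR based on policy iteration can be used. At each iteration the quadratic value (or Q-) function of the current gain K is estimated from data, for example by LSTD, and K is then updated from that estimate. Given a stabilizing initial gain and sufficiently exciting data, the iteration converges to the optimal gain K* without knowledge of A and B.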
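A minimal sketch of this loop, under simplifying assumptions that are not in the lesson: a scalar discrete-time system x' = ax + bu without process noise, stage cost qx² + ru², and a plain least-squares fit of the Bellman equation for the Q-function (an LSTD-style estimate, not a full LSTD implementation). The names `step`, `features`, and `evaluate_policy` are hypothetical. The learner only calls `step`; the true a and b are used elsewhere only to check the result.

```python
import numpy as np

A_TRUE, B_TRUE = 1.2, 1.0    # hidden from the learner; used only inside step()
Q_COST, R_COST = 1.0, 1.0

def step(x, u):
    """Black-box system: the learner sees transitions, not A_TRUE and B_TRUE."""
    return A_TRUE * x + B_TRUE * u

def features(x, u):
    """Quadratic features, so that Q(x, u) = theta . (x^2, x*u, u^2)."""
    return np.stack([x * x, x * u, u * u], axis=-1)

def evaluate_policy(K, rng, n_samples=200):
    """Fit the Q-function of the policy u = -K x from sampled transitions."""
    x = rng.normal(size=n_samples)
    u = -K * x + rng.normal(size=n_samples)      # exploration noise on the action
    x_next = step(x, u)
    cost = Q_COST * x**2 + R_COST * u**2
    # Bellman equation: Q(x, u) = cost + Q(x', -K x'), linear in theta.
    phi = features(x, u)
    phi_next = features(x_next, -K * x_next)
    theta, *_ = np.linalg.lstsq(phi - phi_next, cost, rcond=None)
    return theta

def policy_iteration(K0=1.0, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    K = K0                                       # must be stabilizing: |a - b*K0| < 1
    for _ in range(n_iters):
        theta = evaluate_policy(K, rng)
        K = theta[1] / (2.0 * theta[2])          # minimize Q(x, u) over u
    return K

K_learned = policy_iteration()

# Reference gain from the discrete-time Riccati recursion (uses the true model).
P = 1.0
for _ in range(500):
    P = Q_COST + A_TRUE**2 * P - (A_TRUE * B_TRUE * P) ** 2 / (R_COST + B_TRUE**2 * P)
K_star = A_TRUE * B_TRUE * P / (R_COST + B_TRUE**2 * P)
print(f"learned K = {K_learned:.6f}, model-based K* = {K_star:.6f}")
```

With process noise the plain regression becomes biased, which is where instrumental-variable estimators such as LSTD proper come in; the noise-free case is used here only to keep the sketch short.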

The HJB equation is the continuous-time analogue of which equation from RL?

Key Ideas

**Value function V(t,x)**: optimal cost from time t; satisfies the HJB equation
**HJB equation**: nonlinear PDE: -∂V/∂t = min_u{L + f·∂V/∂x + ½σ²·∂²V/∂x²}
**LQG**: explicit solution via Riccati equation: u* = -K(t)X, K = R⁻¹B^T P
**RL = HJB without a model**: Q-learning and Actor-Critic numerically approximate the solution of Bellman's equation

Questions for Reflection

How does the HJB equation reduce in the deterministic case (σ = 0)? What is the Hamilton-Jacobi equation?
Why is the separation theorem practically important? What would happen without it?
How would one formulate an insulin pump control problem for a diabetic patient in HJB terms?

Related Lessons

calc-19-gradient
