Stochastic Processes

Continuous-Time Markov Chains

AWS Lambda cold start: a container wakes up, handles one request, then dies. Mean lifetime follows a Poisson process. The request queue is an M/M/c birth-death process. AWS engineers solve $\pi Q = 0$ for M/M/c - find the stationary queue-length distribution - and read off optimal provisioned concurrency. Kolmogorov 1931 -> AWS 2024. The same mathematics runs Folding@home (protein conformation simulations) and COVID-19 epidemiological models.

  • **ML infrastructure:** M/M/c queues for GPU cluster autoscaling - Microsoft AzureML, Google Vertex AI; Gillespie algorithm for biochemical simulations in AlphaFold protein dynamics
  • **Reinforcement learning:** Markov Decision Process in continuous time - CTMC with control; Hamilton-Jacobi-Bellman equation is the continuous-time analogue of Kolmogorov equations; policy gradient in continuous-time RL
  • **Reliability engineering:** component failure and repair modeled as CTMC; Erlang-B/C formulas in call-routing algorithms and SRE p99-latency optimization

Предварительные знания

  • Discrete-Time Markov Chains

CTMC and the Q-Matrix

AWS Lambda cold start. A container wakes up, handles one request, then dies. The mean lifetime follows a Poisson process. The queue of incoming requests is an M/M/c birth-death process. AWS engineers solve a single equation - $\pi Q = 0$ - to find optimal provisioned concurrency and shave p99 latency. The same mathematics drives Folding@home when simulating protein conformations and epidemiological models of COVID-19. That is continuous-time - CTMC.

In a DTMC, transitions happen at steps 0, 1, 2, ... In a **CTMC** (Continuous-Time Markov Chain) the system transitions at any moment of continuous time $t \geq 0$. Instead of a probability matrix $P$, one uses a **Q-matrix** (infinitesimal generator) that encodes transition rates. Physical meaning: $q_{ij} \cdot dt$ is the probability of jumping from state $i$ to state $j$ over a small interval $dt$.

**Q-matrix:** $q_{ij} \geq 0$ for $i \neq j$ - transition rate from $i$ to $j$. Diagonal: $q_{ii} = -\sum_{j \neq i} q_{ij}$, so every row sums to zero. Mean holding time in state $i$ equals $1/|q_{ii}|$.

The Gillespie algorithm (1977) - exact stochastic simulation of biochemical reactions - is precisely a CTMC simulator. AlphaFold uses it for protein dynamics: each protein conformation is a state, transition rates form the Q-matrix. Three decades later the same mathematics drives computational biology and drug discovery at scale.

Why are the diagonal elements of the Q-matrix negative?

Birth-Death Processes

A **birth-death process** is a CTMC where transitions are only possible between adjacent states: $n \to n+1$ (birth at rate $\lambda_n$) and $n \to n-1$ (death at rate $\mu_n$). A modest construction - yet the entire queueing theory powering ML infrastructure rests on it. Microsoft AzureML and Google Vertex AI use M/M/c queues for GPU cluster autoscaling: solve for the minimum number of GPUs $c$ to keep warm under peak training load.

The textbook example is the **M/M/1 queue**: jobs arrive as a Poisson process at rate $\lambda$ and are served at rate $\mu$. State $n$ = number of jobs in the system. Utilization $\rho = \lambda/\mu < 1$ is the stability condition. As $\rho \to 1$, the mean queue length $L = \rho/(1-\rho)$ explodes nonlinearly.

Queue modellambda_nmu_nApplication
M/M/1lambda (const)mu (const)Single GPU / single server
M/M/clambda (const)min(n,c)*muGPU cluster (AzureML, Vertex AI)
M/M/1/Klambda (n<K), 0 (n=K)mu (const)Bounded request buffer
M/M/inflambda (const)n*muServerless (each request gets own worker)

**Stability condition for M/M/1:** $\rho = \lambda/\mu < 1$. When $\rho \geq 1$ the queue grows without bound - no stationary distribution exists. M/M/c is stable when $\rho = \lambda/(c\mu) < 1$ - exactly the calculation behind choosing minimum GPU count $c$ for peak load.

In an M/M/1 queue with $\lambda=4$ and $\mu=5$, the utilization and mean number of jobs are:

Kolmogorov Equations and Stationary Distribution

1931. Kolmogorov derives two equations describing how CTMC probabilities evolve. Ninety-three years later, AWS Lambda optimizers solve the same thing - and call it cost optimization. **Forward equations:** $\frac{dP(t)}{dt} = P(t)Q$ - how probabilities propagate forward in time. **Backward equations:** $\frac{dP(t)}{dt} = QP(t)$ - the backward view. In steady state $\frac{dP}{dt} = 0$, giving $\pi Q = 0$.

**Stationary equation:** $\pi Q = 0$, $\sum_i \pi_i = 1$. Physical meaning: for each state $j$, total flow in equals total flow out. Markov Decision Processes in continuous-time RL are CTMCs with control; the Hamilton-Jacobi-Bellman equation is the continuous-time analogue of $\pi Q = 0$.

For birth-death processes $\pi Q = 0$ reduces to **detailed balance**: $\pi_n \lambda_n = \pi_{n+1} \mu_{n+1}$. Flow from $n$ to $n+1$ equals flow back from $n+1$ to $n$. This recurrence has an analytic closed form - no matrix algebra needed. For M/M/1: $\pi_n = (1-\rho)\rho^n$ - a geometric distribution.

M/M/1 formulaExpressionMeaning
Utilizationrho = lambda/muFraction of time server is busy
P(n jobs)pi_n = (1-rho)*rho^nGeometric distribution
Mean in systemL = rho/(1-rho)Little's law: L = lambda*W
Mean timeW = 1/(mu-lambda)Time from arrival to departure

Erlang and Kolmogorov: from telephone to AWS

1909. Danish engineer Agner Erlang builds the first CTMC model for the Copenhagen Telephone Company. Question: how many operators are needed to keep fewer than 1% of calls waiting more than one minute? The Erlang-B and Erlang-C formulas are still embedded in every call-routing algorithm. Kolmogorov provides the rigorous mathematical foundation in 1931. The unit of telephone traffic "erlang" - and every modern SRE p99-latency metric - are descendants of one formula.

DTMC and CTMC are the same thing, just in different time domains

CTMC is not a DTMC with faster steps. It requires exponentially distributed holding times (memoryless property). The Q-matrix encodes rates, not probabilities. Without the memoryless property there is no Markov property in continuous time.

The exponential distribution is the only continuous distribution with the memoryless property: $\mathbb{P}(T > s+t \mid T > s) = \mathbb{P}(T > t)$. If holding times are not exponential (e.g., deterministic service times), how long one has already waited carries information about the future - violating the Markov property.

What does $\pi Q = 0$ mean for a CTMC?

Key Ideas

  • **Q-matrix** (generator): $q_{ij} \geq 0$ for $i \neq j$, $q_{ii} = -\sum_{j \neq i} q_{ij}$, rows sum to zero; $P(t) = e^{Qt}$; mean holding time in state $i$ equals $1/|q_{ii}|$
  • **Birth-death process**: transitions only between adjacent states; M/M/c models GPU-cluster autoscaling; stability condition $\rho = \lambda/(c\mu) < 1$
  • **Kolmogorov equations**: forward $dP/dt = PQ$, backward $dP/dt = QP$; stationary $\pi Q = 0$; detailed balance $\pi_n \lambda_n = \pi_{n+1} \mu_{n+1}$ for birth-death
  • **Memoryless property**: exponential distribution is the only continuous distribution satisfying $\mathbb{P}(T>s+t|T>s)=\mathbb{P}(T>t)$; this is what makes the Markov property possible in continuous time

Related Topics

CTMC bridges theory and infrastructure:

  • Discrete-Time Markov Chains — CTMC generalizes DTMC to continuous time; Q-matrix of rates replaces P-matrix of probabilities
  • Stochastic Processes: Definitions — CTMC is the canonical example of a stationary ergodic process with the Markov property
  • MCMC and Sampling — Metropolis-Hastings and HMC construct ergodic CTMCs with the desired posterior as stationary distribution

Вопросы для размышления

  • An AWS Lambda optimizer solves $\pi Q = 0$ for M/M/c with five states (0..4 warm containers). Which Q-matrix parameter governs cold-start speed, and which governs throughput? How does shifting $\mu$ change the optimal container count?
  • Real GPU job durations are not exponential - they may be deterministic (fixed batch size) or log-normal. How does this affect M/M/1 accuracy? What is an M/G/1 queue and why is it harder to analyze?
  • At $\rho = 0.99$ the mean M/M/1 queue length is 99. The system is stable, but p99 latency is catastrophic. What two levers does an SRE engineer have - and how does detailed balance explain why adding a second server (M/M/2) reduces the queue nonlinearly?

Связанные уроки

  • sp-02 — CTMC extends DTMC - Q-matrix replaces P-matrix
  • sp-01 — Markov property and stochastic process foundations
  • sp-04 — Martingales and stopping times - next layer of stochastic analysis
  • sp-12 — MCMC builds on ergodic chains with target stationary distribution
  • prob-18-poisson — Poisson process is a degenerate CTMC with a single state
  • prob-17 — Parallel view of Markov chains from the probability course
Continuous-Time Markov Chains

0

1

Sign In