Stochastic Processes
Queueing Theory
Цели урока
- Derive the M/M/1 stationary distribution and key performance metrics
- Apply the Pollaczek-Khinchine formula for M/G/1 with general service
- Use Little's law to analyze any stable system
- Analyze queueing networks via Jackson's theorem
Предварительные знания
- Poisson processes
- Renewal theory
- Markov chains
Server load at 90% sounds fine. But M/M/1 says: at rho = 0.9 the mean queue length is 8.1. At rho = 0.5 it is 0.5. Sixteen times more at twice the load. Queueing theory is the mathematics of unexpected catastrophes.
- AWS: SLA optimization and autoscaling via M/M/1 and Little's law
- LLM inference: M/G/1 with heavy tails explains P99 latencies
- Netflix CDN: capacity planning through Jackson networks
- Kubernetes: autoscaling based on rho and Little's law
From telephony to cloud computing
Agner Erlang solved the Danish telephone company's staffing problem in 1909. The Erlang-C formula is still used in call centers today. Felix Pollaczek (1930) and Alexander Khinchine (1932) independently derived the P-K formula for M/G/1. John Little proved his universal law in 1961 with an elegant probabilistic argument. Jackson discovered product-form for networks in 1957.
M/M/1 Model and Little's Law
AWS manages 1.5 million servers. Every request is a job in a queue. Load rho = 0.9 sounds safe: 90% utilization. But L = rho/(1-rho) = 9. Nine requests waiting where rho = 0.5 had one. The nonlinearity kills.
SLA calculation for an API service
Applying M/M/1 to engineering decisions
An API service handles 800 RPS (lambda = 800) with mean processing time 1 ms (mu = 1000). Load rho = 0.8. Mean response time W = 1/(1000 - 800) = 5 ms. Adding 20% more load (rho = 0.96) raises W to 25 ms. Five times slower from a 20% load increase. Queue nonlinearity in action.
Little's law applies to any stable system - from M/M/1 to a Kubernetes cluster. Given lambda and L, W = L/lambda is instant without knowing the distributions.
What does Little's law L = lambda * W express?
Little's law L = lambda * W is a fundamental result for any stable service system. Its proof uses only the law of large numbers.
M/G/1 Queue and the Pollaczek-Khinchine Formula
Real services are not M/M/1. LLM API processing time has a heavy tail: most requests are fast, rare ones are very slow. This is M/G/1 with high coefficient of variation. Latencies skyrocket - even at the same mean load.
| Model | C_S^2 | L_q (at rho=0.8) | W_q / E[S] |
|---|---|---|---|
| M/D/1 (deterministic) | 0 | 1.60 | 2.0 |
| M/M/1 (exponential) | 1 | 3.20 | 4.0 |
| M/G/1 (log-normal sigma=1) | ~1.7 | ~4.3 | ~5.4 |
| M/G/1 (Pareto, alpha=2.5) | ~6.7 | ~11.7 | ~14.6 |
Heavy tails in service time distribution catastrophically increase latencies. An LLM inference endpoint with P99 = 30 sec at P50 = 0.5 sec has C_S^2 >> 1 - queue length can be 10x that of M/M/1 at the same load.
In M/G/1 at rho = 0.8 and E[S] = 1, but C_S^2 = 4 (instead of 1 for M/M/1). How does L_q change?
P-K formula: L_q = rho^2 * (1 + C_S^2) / (2*(1-rho)). At C_S^2=4 and rho=0.8: L_q = 0.64 * 5 / 0.4 = 8, vs 3.2 for M/M/1. A factor of 2.5.
Queueing Networks and Jackson's Theorem
A microservices architecture is a network of queues. A request passes through auth, API gateway, database, and cache. Each service is a queue. Jackson's theorem (1957) states something remarkable: under certain conditions, the network decomposes into independent M/M/1 queues.
Netflix uses Jackson's theorem to analyze its CDN infrastructure: edge nodes, origin servers, encoding pipelines. Each level is a Jackson network node. The stationary distribution is computed analytically - no simulation.
Jackson's theorem requires Poisson arrivals and exponential service times. Violating these requires approximate methods: decomposition, heavy-traffic approximations (diffusion approximations).
Jackson's theorem: stationary distribution of the network is a product of marginals. What does this mean for an engineer?
Product-form means the stationary distribution factorizes. Engineering implication: each node can be analyzed independently as M/M/1 with effective load lambda_j.
Connections to other topics
Queueing theory links probabilistic analysis, optimization, and system design
- Cloud systems — Related topic
- Renewal theory — Related topic
- Poisson processes — Related topic
- Diffusion approximations — Related topic
Итоги
- M/M/1: L = rho/(1-rho) grows nonlinearly - at rho=0.9 the queue is 9x longer than at rho=0.5
- Little's law L = lambda * W is universal for any stable system
- M/G/1 P-K formula: high C_S^2 proportionally amplifies latencies
- Jackson networks: product-form distribution allows analytic treatment of complex architectures
Вопросы для размышления
- Why is Little's law valid for any distributions while the P-K formula depends on E[S^2]?
- How does the M/G/1 analysis change when jobs arrive in batches?
- Why does Jackson's theorem break down with non-exponential service times?