Stochastic Processes

Queueing Theory

Цели урока

  • Derive the M/M/1 stationary distribution and key performance metrics
  • Apply the Pollaczek-Khinchine formula for M/G/1 with general service
  • Use Little's law to analyze any stable system
  • Analyze queueing networks via Jackson's theorem

Предварительные знания

  • Poisson processes
  • Renewal theory
  • Markov chains
  • Poisson processes
  • Renewal theory

Server load at 90% sounds fine. But M/M/1 says: at rho = 0.9 the mean queue length is 8.1. At rho = 0.5 it is 0.5. Sixteen times more at twice the load. Queueing theory is the mathematics of unexpected catastrophes.

  • AWS: SLA optimization and autoscaling via M/M/1 and Little's law
  • LLM inference: M/G/1 with heavy tails explains P99 latencies
  • Netflix CDN: capacity planning through Jackson networks
  • Kubernetes: autoscaling based on rho and Little's law

From telephony to cloud computing

Agner Erlang solved the Danish telephone company's staffing problem in 1909. The Erlang-C formula is still used in call centers today. Felix Pollaczek (1930) and Alexander Khinchine (1932) independently derived the P-K formula for M/G/1. John Little proved his universal law in 1961 with an elegant probabilistic argument. Jackson discovered product-form for networks in 1957.

M/M/1 Model and Little's Law

AWS manages 1.5 million servers. Every request is a job in a queue. Load rho = 0.9 sounds safe: 90% utilization. But L = rho/(1-rho) = 9. Nine requests waiting where rho = 0.5 had one. The nonlinearity kills.

SLA calculation for an API service

Applying M/M/1 to engineering decisions

An API service handles 800 RPS (lambda = 800) with mean processing time 1 ms (mu = 1000). Load rho = 0.8. Mean response time W = 1/(1000 - 800) = 5 ms. Adding 20% more load (rho = 0.96) raises W to 25 ms. Five times slower from a 20% load increase. Queue nonlinearity in action.

Little's law applies to any stable system - from M/M/1 to a Kubernetes cluster. Given lambda and L, W = L/lambda is instant without knowing the distributions.

What does Little's law L = lambda * W express?

Little's law L = lambda * W is a fundamental result for any stable service system. Its proof uses only the law of large numbers.

M/G/1 Queue and the Pollaczek-Khinchine Formula

Real services are not M/M/1. LLM API processing time has a heavy tail: most requests are fast, rare ones are very slow. This is M/G/1 with high coefficient of variation. Latencies skyrocket - even at the same mean load.

ModelC_S^2L_q (at rho=0.8)W_q / E[S]
M/D/1 (deterministic)01.602.0
M/M/1 (exponential)13.204.0
M/G/1 (log-normal sigma=1)~1.7~4.3~5.4
M/G/1 (Pareto, alpha=2.5)~6.7~11.7~14.6

Heavy tails in service time distribution catastrophically increase latencies. An LLM inference endpoint with P99 = 30 sec at P50 = 0.5 sec has C_S^2 >> 1 - queue length can be 10x that of M/M/1 at the same load.

In M/G/1 at rho = 0.8 and E[S] = 1, but C_S^2 = 4 (instead of 1 for M/M/1). How does L_q change?

P-K formula: L_q = rho^2 * (1 + C_S^2) / (2*(1-rho)). At C_S^2=4 and rho=0.8: L_q = 0.64 * 5 / 0.4 = 8, vs 3.2 for M/M/1. A factor of 2.5.

Queueing Networks and Jackson's Theorem

A microservices architecture is a network of queues. A request passes through auth, API gateway, database, and cache. Each service is a queue. Jackson's theorem (1957) states something remarkable: under certain conditions, the network decomposes into independent M/M/1 queues.

Netflix uses Jackson's theorem to analyze its CDN infrastructure: edge nodes, origin servers, encoding pipelines. Each level is a Jackson network node. The stationary distribution is computed analytically - no simulation.

Jackson's theorem requires Poisson arrivals and exponential service times. Violating these requires approximate methods: decomposition, heavy-traffic approximations (diffusion approximations).

Jackson's theorem: stationary distribution of the network is a product of marginals. What does this mean for an engineer?

Product-form means the stationary distribution factorizes. Engineering implication: each node can be analyzed independently as M/M/1 with effective load lambda_j.

Connections to other topics

Queueing theory links probabilistic analysis, optimization, and system design

  • Cloud systems — Related topic
  • Renewal theory — Related topic
  • Poisson processes — Related topic
  • Diffusion approximations — Related topic

Итоги

  • M/M/1: L = rho/(1-rho) grows nonlinearly - at rho=0.9 the queue is 9x longer than at rho=0.5
  • Little's law L = lambda * W is universal for any stable system
  • M/G/1 P-K formula: high C_S^2 proportionally amplifies latencies
  • Jackson networks: product-form distribution allows analytic treatment of complex architectures

Вопросы для размышления

  • Why is Little's law valid for any distributions while the P-K formula depends on E[S^2]?
  • How does the M/G/1 analysis change when jobs arrive in batches?
  • Why does Jackson's theorem break down with non-exponential service times?

Связанные уроки

  • sp-21 — Queueing theory builds on renewal process structure
  • sp-17 — Poisson arrivals are the foundation of the M/M/1 model
  • sp-23 — Point processes generalize arrival models
  • sp-15 — Markov chains underpin the M/M/1 analysis
Queueing Theory

0

1

Sign In