Probability Theory
Poisson Process
Cloudflare processes 41.7 million DNS requests per second. Each request arrives at a random moment - yet the aggregate stream obeys precise mathematical laws. The Poisson process describes this chaos with a single parameter $\lambda$.
- Queuing systems (AWS SQS, Kafka): inter-arrival times follow the exponential distribution. The rate $\lambda$ is estimated by MLE as $\hat\lambda = 1/\bar t$ - the reciprocal of the sample mean inter-arrival time.
- Neural spike trains in BCI: neuron firings are modeled as a Poisson process with rate $\lambda$ (spikes/sec). Estimating $\lambda$ from electrophysiology recordings is the first step in decoding motor intentions.
- DDoS detection: a sudden increase in estimated $\hat\lambda$ above the baseline is an anomaly signal. Hawkes processes extend the Poisson model to capture self-excitation (traffic bursts generating more bursts).
Цели урока
- State the three axioms of a Poisson process and derive the Poisson distribution of event counts
- Prove the memoryless property of the exponential distribution and link it to waiting times
- Estimate the intensity $\lambda$ by MLE and construct a confidence interval
Предварительные знания
- Discrete distributions: binomial and Poisson
- Exponential distribution and its properties
- Fundamentals of probabilistic inference: MLE
Three axioms and their consequences
A counting process $N(t)$ is a Poisson process with intensity $\lambda$ if: (1) $N(0)=0$; (2) independent increments - counts in non-overlapping intervals are independent; (3) stationary increments - the distribution of $N(t+s)-N(t)$ depends only on $s$; (4) ordinarity - $P(N(h)\geq 2) = o(h)$ as $h \to 0$. These axioms imply: $N(t) \sim \mathrm{Pois}(\lambda t)$, i.e. $P(N(t)=k) = e^{-\lambda t}(\lambda t)^k/k!$.
The Hawkes process $\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$ extends the Poisson process: each event temporarily increases the rate by $\phi(t-t_i)$. Used for social media cascades, earthquake aftershocks, and HFT order flow. The process is stationary when $\|\phi\|_1 < 1$.
The MLE estimate $\hat\lambda = 1/\bar t$ is biased for small samples: $\mathbb{E}[1/\bar t] \neq 1/\mathbb{E}[\bar t]$ due to nonlinearity. The unbiased estimate is $(n-1)/\sum t_i$ (for $n > 1$). For $n > 30$ the difference is negligible; for small $n$ use Bayesian inference with a Gamma prior.
Definition and Three Axioms
Cloudflare handles 3.6 trillion DNS requests per day, averaging 41.7 million requests per second. The count of requests arriving in any time window obeys a Poisson process, a mathematical structure that describes random event streams where events are rare, independent, and can occur at any instant. Three axioms characterize it precisely.
In ML, event count modeling underpins recommendation systems (clicks as Poisson arrivals), anomaly detection (sudden rate changes), and network security (DDoS detection via rate estimation). The Poisson process is the canonical prior for count data in Bayesian models.
A server receives requests at rate lambda = 100/s. What is the probability that exactly 3 requests arrive in a 20ms window?
The answer follows directly from the definition and properties of the object under consideration.
Waiting Time Distribution
Knowing how many events occur in a window is useful. Knowing how long to wait for the next event is equally critical. In queuing systems at Google or AWS, the inter-arrival time between API calls determines whether buffers overflow. For a Poisson process with rate lambda, these waiting times obey the exponential distribution.
In neural data analysis, inter-spike intervals of Poisson neurons follow the exponential distribution. Fitting lambda to recorded spike trains gives a firing rate estimate used in brain-computer interfaces and computational neuroscience models.
Prove that the exponential distribution is the unique continuous memoryless distribution.
The answer follows directly from the definition and properties of the object under consideration.
ML Applications and Rate Estimation
Spotify processes 600 billion events per day from user interactions. Each stream, skip, and pause is modeled as a point process. The Poisson process provides the baseline model; deviations from it (e.g., burstiness, time-varying rates) motivate extensions like Cox processes and Hawkes processes, widely used in recommendation systems.
The Hawkes process extends the Poisson process by making the rate self-exciting: each event temporarily increases the probability of future events. This model captures social media cascades, earthquake aftershocks, and high-frequency trading order flow, and is actively used in production recommendation ranking systems.
Given 1000 inter-arrival times with sample mean 0.05s, compute the MLE of lambda and a 95% confidence interval.
The answer follows directly from the definition and properties of the object under consideration.
Cloudflare traffic analysis
Rate $\lambda = 41.7 \times 10^6$ requests/sec. In a 0.1 ms window: $\mu = \lambda t = 4170$. By CLT: $N(t) \approx \mathcal{N}(4170, 4170)$, std $\approx 64.6$. Probability of fewer than 4000 requests: $P(N < 4000) \approx \Phi(-2.63) \approx 0.4\%$. Monitoring systems use exactly such z-score deviations for anomaly detection.
Итоги
- The Poisson process is defined by three axioms, implying the Poisson distribution for event counts and the exponential distribution for waiting times.
- MLE of intensity: $\hat\lambda = 1/\bar t$, SE $= \lambda/\sqrt{n}$.
- The exponential distribution is the unique continuous memoryless distribution.
Connections to other topics
Brownian motion (prob-19) is the continuous analogue: Gaussian increments instead of Poisson. Bayesian networks (prob-27) use conditional Poisson models for event data. Hawkes processes, widely used in recommendation systems, are built on top of the Poisson foundation.
- Prob 19 — related
- Prob 27 — related
Вопросы для размышления
- A KS test shows a low $p$-value for inter-arrival times fitted with an exponential distribution. This suggests non-homogeneity. How does the thinning algorithm (Lewis-Shedler) allow simulating a non-homogeneous Poisson process with time-varying intensity $\lambda(t)$?