Discrete Mathematics
Random Variables
"O(n log n) on average" for quicksort is not hand-waving - it is a mathematically precise statement about E[number of comparisons] as a random variable. Hash table load factor is E[chain length] by linearity. Retry policies in distributed systems model as geometric random variables. SLA P99 latency is a tail probability bounded by Chebyshev. Random variables give one the exact language to reason about all of this.
- **Quicksort analysis:** E[comparisons] = O(n log n) proven via indicator variables and linearity of expectation - each pair of elements is compared with probability 2/(j-i+1)
- **Hash tables:** E[chain length] = load factor n/m by linearity; at load factor 0.75, expected O(1) lookup; above 1.0, performance degrades sharply
- **Retry logic:** number of attempts until success follows Geometric(p), so E[attempts]=1/p. Exponential backoff is designed around this expectation
- **SLA design:** Chebyshev bounds let one guarantee P99 latency from mean and variance alone, without assuming a specific distribution
Предварительные знания
Random Variables: PMF, E[X], Var[X]
Cloudflare filters 72 billion requests per day using Bloom filters , a 1% false positive rate requires only 10 bits per element, making random variable theory the core of modern network security. A random variable X is a function X: Omega -> R mapping outcomes to numbers. A discrete random variable takes a finite or countable set of values. The three core descriptors are the PMF (probability mass function), expected value E[X], and variance Var[X].
**Core definitions:** - **PMF:** P(X=k) = probability X takes value k. All values sum to 1. - **Expected value:** E[X] = sum_k k * P(X=k) - the probability-weighted average - **Variance:** Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2 - measures spread - **Standard deviation:** sigma = sqrt(Var[X])
When we say quicksort runs in O(n log n) on average, we mean E[number of comparisons] = O(n log n), where the expectation is over all possible pivot choices. The random variable here is the number of comparisons for a fixed input array when pivot selection is random.
The shortcut formula Var[X] = E[X^2] - (E[X])^2 is almost always faster to compute than expanding E[(X-mu)^2] directly. Compute E[X] and E[X^2] separately, then subtract.
A random variable X has P(X=0)=1/2, P(X=2)=1/4, P(X=4)=1/4. What is E[X]?
Linearity of Expectation
Linearity of expectation is one of the most powerful tools in probability: E[X + Y] = E[X] + E[Y] and E[cX] = c*E[X] hold for ANY random variables X and Y, even dependent ones. This lets one decompose complex expected values into simple indicator variables.
**Linearity of E[]:** - E[X + Y] = E[X] + E[Y] (for ANY X, Y, regardless of dependence) - E[c*X] = c*E[X] for constant c - E[X_1 + ... + X_n] = E[X_1] + ... + E[X_n] Caution: E[X*Y] = E[X]*E[Y] ONLY when X and Y are independent.
The indicator variable technique is a standard proof strategy: to find E[number of X-events], define X_i = 1 if event i occurs, 0 otherwise. Then X = sum X_i, and by linearity E[X] = sum P(X_i = 1). This works even when the events are dependent, which is the key insight.
Expected hash table chain length equals the load factor n/m directly by linearity. Each of the n elements lands in any given bucket independently with probability 1/m. So E[elements in bucket k] = n * (1/m) = n/m = alpha.
A hash table has m=200 slots and n=50 elements. What is the expected chain length at any given slot?
Binomial and Geometric Distributions
Two fundamental discrete distributions: the **binomial** B(n,p) counts successes in n independent trials with success probability p. The **geometric** Geom(p) counts trials until the first success. Both appear constantly in algorithm analysis and systems engineering.
**Binomial B(n,p):** P(X=k) = C(n,k) * p^k * (1-p)^(n-k) - E[X] = n*p - Var[X] = n*p*(1-p) **Geometric Geom(p):** P(X=k) = (1-p)^(k-1) * p for k = 1, 2, 3, ... - E[X] = 1/p - Var[X] = (1-p)/p^2 - **Memoryless property:** P(X > s+t | X > s) = P(X > t)
Retry logic in microservices: if each attempt succeeds with probability p, the number of attempts follows Geom(p), and E[attempts] = 1/p. With exponential backoff, the expected total wait time is the sum of E[wait before attempt k] times P(first success on attempt k). The geometric distribution is exactly the right model here.
The memoryless property means past failures do NOT increase the probability of future success. If a server has been down for 10 minutes, the probability of it coming back in the next minute is the same as when it first went down (under a geometric/exponential model). Real hardware failures often violate this with aging effects.
A service call succeeds with probability p=0.25. What is the expected number of attempts until the first success?
Markov and Chebyshev Inequalities
Markov's and Chebyshev's inequalities give upper bounds on tail probabilities without knowing the full distribution. They are the workhorses of algorithm analysis when it is necessary to prove that a randomized algorithm is unlikely to run much longer than its expected time.
**Markov's inequality** (for X >= 0): P(X >= a) <= E[X] / a **Chebyshev's inequality:** P(|X - E[X]| >= t) <= Var[X] / t^2 Equivalently: P(|X - mu| >= k*sigma) <= 1/k^2 Chebyshev is tighter than Markov: it uses both E[X] and Var[X], while Markov uses only E[X].
Markov and Chebyshev give weak but distribution-free bounds. In practice, tighter **Chernoff bounds** are used for sums of independent 0/1 variables, giving exponentially decaying tail probabilities. These are the reason we can confidently say a randomized algorithm that runs in O(n log n) expected time rarely takes more than O(n log^2 n) time.
Rule of thumb from Chebyshev: P(within 2 sigma) >= 3/4 (75%), P(within 3 sigma) >= 8/9 (89%). For normally distributed data these bounds are much tighter (~95% and ~99.7%), but Chebyshev applies to ANY distribution with finite variance.
E[response time] = 80ms. By Markov's inequality, what is the upper bound on P(response time >= 400ms)?
Key Ideas
- **E[X] = sum k*P(X=k):** probability-weighted average. Var[X] = E[X^2] - (E[X])^2.
- **Linearity of E[]:** E[X+Y] = E[X]+E[Y] for ANY X, Y. Combine with indicator variables to compute complex expected values.
- **Binomial B(n,p):** E[X]=np, Var[X]=np(1-p). Number of successes in n independent trials.
- **Geometric Geom(p):** E[X]=1/p, memoryless. Number of trials until first success - the right model for retry logic.
- **Markov/Chebyshev:** distribution-free tail bounds. Markov uses only E[X]; Chebyshev uses both E[X] and Var[X] for tighter bounds.
Related Topics
Random variables bridge probability theory and algorithm analysis:
- Discrete Probability — Random variables are built on the probability space and axioms covered in the probability lesson
- Discrete Math in CS — Probabilistic algorithms are analyzed using expected running time and Chernoff bounds (extensions of Markov/Chebyshev)
- Discrete Math in Interviews — Expected value calculations via indicator variables are classic interview problems at FAANG companies
Вопросы для размышления
- Linearity E[X+Y]=E[X]+E[Y] holds even for dependent variables, but E[X*Y]=E[X]*E[Y] requires independence. Can one think of an example where confusing these leads to a wrong algorithm analysis?
- A hash table rehashes when load factor exceeds 0.75. Using geometric expectations, estimate how many insertions trigger a rehash on average if the table starts at load factor 0.5 and each insertion has a small chance of pushing it over the threshold.
- Chebyshev says P(|X - mu| >= 2*sigma) <= 0.25. Under what circumstances is this bound tight (nearly achieved), and under what circumstances is the real probability much smaller?