Probability Theory

Generating Functions & MGF

Цели урока

  • Understand the probability generating function (PGF) as a distribution encoding
  • Master the moment generating function (MGF) and its properties
  • Extract moments by differentiating the MGF
  • Use the MGF uniqueness theorem to identify distributions
  • Compute MGFs numerically in Python and compare distributions

Предварительные знания

  • Expected value E[X]
  • Variance and moments Var[X]
  • Power series and derivatives
  • Expected Value
  • Variance

Suppose E[X³⁰] is required - the thirtieth moment! Computing it directly is a nightmare. But here's the trick: **encode the entire distribution in one function** - the generating function. Then any moment is extracted by simple differentiation. It's like having a master key to all the properties of a distribution.

  • Algorithm analysis: average depth of a search tree
  • Genetics: number of mutations per generation
  • Queueing theory: number of customers in a system
  • Insurance: aggregate claim amounts over a period
  • Finance: characteristic functions of asset returns

From Tricks to Theory

Laplace used generating functions as a computational trick for working with binomial coefficients in 1812. Chebyshev and his school turned them into a rigorous tool of probability theory. Moment generating functions became the foundation for proving the CLT via characteristic functions - Lyapunov's 1901 method.

Generating Functions & MGF

Generating functions are a remarkable bridge between algebra and probability. They **encode an entire distribution in a single function** and let you extract the expectation, variance, and any moments by differentiation. They are essentially the 'Fourier transform' for probabilities.

Two main types: the **probability generating function** (PGF) $G_X(z) = \mathbb{E}[z^X]$ for discrete $X \in \{0,1,2,\dots\}$, and the **moment-generating function** (MGF) $M_X(t) = \mathbb{E}[e^{tX}]$ for general $X$. A key property of the MGF: $M_{X+Y}(t) = M_X(t)\cdot M_Y(t)$ for independent $X, Y$.

Generating functions are standard tools in random-process analysis, large-deviation theory, and proofs of the CLT. In ML they appear in SGD convergence analysis and in models with latent variables.

The moment-generating function (MGF) $M_X(t) = \mathbb{E}[e^{tX}]$ is useful because:

The MGF encodes all moments in one function; differentiation extracts any of them. The MGF of an independent sum equals the product of MGFs - a powerful tool for analyzing sums.

1. Probability Generating Function (PGF)

1. Probability Generating Function (PGF)

For a **discrete** random variable X taking values in {0, 1, 2, ...}, the probability generating function is:

It's simply a power series where the coefficient of $z^k$ is $P(X=k)$. All information about the distribution is packed into one function!

**Key properties of PGF:** - $G_X(1) = 1$ (probabilities sum to 1) - $G_X'(1) = E[X]$ (first derivative at z=1 gives the mean) - $G_X''(1) = E[X(X-1)]$ - factorial moment

PGF for Bernoulli

X ~ Bernoulli(p)

$P(X=0) = 1-p$, $P(X=1) = p$ $G_X(z) = (1-p) \cdot z^0 + p \cdot z^1 = 1 - p + pz$ **Checking the mean:** $G_X'(z) = p$ $E[X] = G_X'(1) = p$ ✓ **PGF for Binomial Bin(n, p)** - product of n independent Bernoullis: $G_X(z) = (1 - p + pz)^n$

PGF for Poisson

X ~ Poisson(λ)

$G_X(z) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} z^k = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda z)^k}{k!} = e^{-\lambda} \cdot e^{\lambda z}$ $G_X(z) = e^{\lambda(z-1)}$ **Mean:** $G_X'(z) = \lambda e^{\lambda(z-1)}$ $E[X] = G_X'(1) = \lambda$ ✓

If G_X(z) = (0.3 + 0.7z)⁵, then E[X] = ?

This is the PGF of Binomial(5, 0.7). G_X'(z) = 5·0.7·(0.3+0.7z)⁴. E[X] = G_X'(1) = 5·0.7·1 = 3.5. Or simply: E[Bin(n,p)] = np = 5·0.7 = 3.5.

2. Moment Generating Function (MGF)

2. Moment Generating Function (MGF)

For arbitrary (not only discrete) random variables, we use the **MGF - moment generating function**:

Why $e^{tX}$? Expand in a Taylor series:

**The MGF stores all moments as Taylor coefficients!**

MGFs of Standard Distributions

DistributionMGF $M_X(t)$
Bernoulli(p)$(1-p) + pe^t$
Binomial(n,p)$(1-p+pe^t)^n$
Poisson(λ)$e^{\lambda(e^t - 1)}$
Normal(μ,σ²)$e^{\mu t + \sigma^2 t^2/2}$
Exponential(λ)$\frac{\lambda}{\lambda - t}$, for $t < \lambda$

What is the MGF of N(0,1) evaluated at t=0?

M_X(0) = E[e^{0·X}] = E[1] = 1 for ANY distribution. Substituting t=0 into e^{μt + σ²t²/2} = e^0 = 1 ✓

3. Extracting Moments by Differentiation

3. Extracting Moments by Differentiation

The key trick: the **n-th moment** is obtained as the n-th derivative of the MGF evaluated at t=0:

Moments of the normal distribution

X ~ N(μ, σ²)

$M_X(t) = e^{\mu t + \sigma^2 t^2/2}$ **First derivative:** $M_X'(t) = (\mu + \sigma^2 t) \cdot e^{\mu t + \sigma^2 t^2/2}$ $E[X] = M_X'(0) = \mu \cdot e^0 = \mu$ ✓ **Second derivative:** $M_X''(0) = \sigma^2 + \mu^2$ $E[X^2] = \sigma^2 + \mu^2$ **Check:** $Var[X] = E[X^2] - (E[X])^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$ ✓

**Uniqueness theorem for MGF:** if $M_X(t) = M_Y(t)$ for all $t$ in a neighborhood of 0, then X and Y have the same distribution. The MGF **completely determines** the distribution!

Python: computing MGFs

Symbolic and numerical MGF computations

```python import numpy as np from scipy import stats import matplotlib.pyplot as plt # Numerical MGF via Monte Carlo def empirical_mgf(samples, t_values): """Estimate E[e^{tX}] from a sample""" return [np.mean(np.exp(t * samples)) for t in t_values] # Normal N(2, 9) mu, sigma = 2, 3 samples = np.random.normal(mu, sigma, size=100_000) t_vals = np.linspace(-0.5, 0.5, 100) # Theoretical MGF: exp(mu*t + sigma^2*t^2/2) theory = np.exp(mu * t_vals + sigma**2 * t_vals**2 / 2) # Empirical MGF empirical = empirical_mgf(samples, t_vals) plt.plot(t_vals, theory, 'b-', label='Theoretical MGF') plt.plot(t_vals, empirical, 'r--', label='Empirical MGF') plt.legend() plt.title('MGF of N(2, 9)') plt.show() # Moment extraction via numerical differentiation dt = 1e-5 E_X = (empirical_mgf(samples, [dt])[0] - 1) / dt # M'(0) print(f'E[X] ≈ {E_X:.3f} (theory: {mu})') # Via MGF formula E_X2 = mu**2 + sigma**2 print(f'E[X²] = {E_X2} (= σ² + μ² = {sigma**2} + {mu**2})') print(f'Var[X] = E[X²] - E[X]² = {E_X2 - mu**2}') ```

If M_X(t) = e^{2t + 8t²}, then Var[X] = ?

The MGF of N(μ, σ²) = e^{μt + σ²t²/2}. Here μt=2t → μ=2, and σ²t²/2=8t² → σ²/2=8 → σ²=16. Var[X] = σ² = 16.

Practice

Practice

Find the PGF for the geometric distribution P(X=k) = (1-p)^{k-1}·p, k=1,2,... Compute E[X] via the PGF.

G_X(z) = pz·Σ_{k=0}^∞ ((1-p)z)^k = pz/(1-(1-p)z) G_X'(z) = [p(1-(1-p)z) + pz(1-p)] / (1-(1-p)z)² = p / (1-(1-p)z)² E[X] = G_X'(1) = p / (1-(1-p))² = p/p² = 1/p ✓

Prove that if X ~ Poisson(λ₁) and Y ~ Poisson(λ₂) are independent, then X+Y ~ Poisson(λ₁+λ₂), using MGFs.

M_{X+Y}(t) = M_X(t) · M_Y(t) (independence) = e^{λ₁(e^t-1)} · e^{λ₂(e^t-1)} = e^{(λ₁+λ₂)(e^t-1)} This is the MGF of Poisson(λ₁+λ₂)! By the uniqueness theorem, X+Y ~ Poisson(λ₁+λ₂). ✓

X ~ Exponential(λ). Find E[X²] and Var[X] via the MGF without computing any integrals.

M_X(t) = λ/(λ-t) M_X'(t) = λ/(λ-t)² → E[X] = M_X'(0) = 1/λ M_X''(t) = 2λ/(λ-t)³ → E[X²] = M_X''(0) = 2/λ² Var[X] = E[X²] - (E[X])² = 2/λ² - 1/λ² = 1/λ² Std dev: σ = 1/λ = E[X] - a characteristic property of the exponential!

$X \sim Exp(\lambda)$ has $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$. What is $Var[X]$ via the MGF?

$M'(t) = \lambda/(\lambda - t)^2 \Rightarrow E[X] = 1/\lambda$. $M''(t) = 2\lambda/(\lambda - t)^3 \Rightarrow E[X^2] = 2/\lambda^2$. Thus $Var[X] = 2/\lambda^2 - 1/\lambda^2 = 1/\lambda^2$. The MGF replaces integration with differentiation.

Generating Functions - Bridge to Advanced Theory

MGFs and PGFs open the door to deep results in probability theory.

  • Characteristic functions — φ_X(t) = E[e^{itX}] - work for all distributions (even without MGF)
  • Central Limit Theorem — Proved via the log of the MGF (cumulant generating function)
  • Conditional expectation — Next lesson: E[X|Y] - a generalization of expectation
  • Queueing theory — PGF for number of customers in M/G/1 queues

Итоги

  • **PGF:** $G_X(z) = E[z^X] = \sum p_k z^k$ for discrete distributions
  • **MGF:** $M_X(t) = E[e^{tX}]$ - works for any distribution (when finite)
  • **Moment extraction:** $E[X^n] = M_X^{(n)}(0)$ - just the n-th derivative at 0
  • **Uniqueness:** the MGF completely determines the distribution
  • **Sum of independents:** $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$ - product of MGFs
  • **Applications:** proving the CLT, identifying distributions, computing moments

Вопросы для размышления

  • Why is the MGF called the 'moment generating function'? How does the Taylor series link the MGF and moments?
  • Why does the MGF of the Cauchy distribution not exist - and what does that say about the distribution?
  • How does the property M_{X+Y} = M_X·M_Y for independent variables simplify the proof of the CLT?
  • In which problems is the PGF more convenient than the MGF, and vice versa?

Связанные уроки

  • stat-27-graphical-models
Generating Functions & MGF

0

1

Sign In