Probability Theory

Generating Functions & MGF

Цели урока

Understand the probability generating function (PGF) as a distribution encoding
Master the moment generating function (MGF) and its properties
Extract moments by differentiating the MGF
Use the MGF uniqueness theorem to identify distributions
Compute MGFs numerically in Python and compare distributions

Предварительные знания

Expected value E[X]
Variance and moments Var[X]
Power series and derivatives

Suppose E[X³⁰] is required - the thirtieth moment! Computing it directly is a nightmare. But here's the trick: **encode the entire distribution in one function** - the generating function. Then any moment is extracted by simple differentiation. It's like having a master key to all the properties of a distribution.

Algorithm analysis: average depth of a search tree
Genetics: number of mutations per generation
Queueing theory: number of customers in a system
Insurance: aggregate claim amounts over a period
Finance: characteristic functions of asset returns

From Tricks to Theory

Laplace used generating functions as a computational trick for working with binomial coefficients in 1812. Chebyshev and his school turned them into a rigorous tool of probability theory. Moment generating functions became the foundation for proving the CLT via characteristic functions - Lyapunov's 1901 method.

Generating Functions & MGF

Generating functions are a remarkable bridge between algebra and probability. They **encode an entire distribution in a single function** and let you extract the expectation, variance, and any moments by differentiation. They are essentially the 'Fourier transform' for probabilities.

Two main types: the **probability generating function** (PGF) $G_X(z) = \mathbb{E}[z^X]$ for discrete $X \in \{0,1,2,\dots\}$, and the **moment-generating function** (MGF) $M_X(t) = \mathbb{E}[e^{tX}]$ for general $X$. A key property of the MGF: $M_{X+Y}(t) = M_X(t)\cdot M_Y(t)$ for independent $X, Y$.

Generating functions are standard tools in random-process analysis, large-deviation theory, and proofs of the CLT. In ML they appear in SGD convergence analysis and in models with latent variables.

The moment-generating function (MGF) $M_X(t) = \mathbb{E}[e^{tX}]$ is useful because:

The MGF encodes all moments in one function; differentiation extracts any of them. The MGF of an independent sum equals the product of MGFs - a powerful tool for analyzing sums.

1. Probability Generating Function (PGF)

For a **discrete** random variable X taking values in {0, 1, 2, ...}, the probability generating function is:

It's simply a power series where the coefficient of $z^k$ is $P(X=k)$. All information about the distribution is packed into one function!

**Key properties of PGF:** - $G_X(1) = 1$ (probabilities sum to 1) - $G_X'(1) = E[X]$ (first derivative at z=1 gives the mean) - $G_X''(1) = E[X(X-1)]$ - factorial moment

PGF for Bernoulli

X ~ Bernoulli(p)

$P(X=0) = 1-p$, $P(X=1) = p$ $G_X(z) = (1-p) \cdot z^0 + p \cdot z^1 = 1 - p + pz$ **Checking the mean:** $G_X'(z) = p$ $E[X] = G_X'(1) = p$ ✓ **PGF for Binomial Bin(n, p)** - product of n independent Bernoullis: $G_X(z) = (1 - p + pz)^n$

PGF for Poisson

X ~ Poisson(λ)

$G_X(z) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} z^k = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda z)^k}{k!} = e^{-\lambda} \cdot e^{\lambda z}$ $G_X(z) = e^{\lambda(z-1)}$ **Mean:** $G_X'(z) = \lambda e^{\lambda(z-1)}$ $E[X] = G_X'(1) = \lambda$ ✓

If G_X(z) = (0.3 + 0.7z)⁵, then E[X] = ?

This is the PGF of Binomial(5, 0.7). G_X'(z) = 5·0.7·(0.3+0.7z)⁴. E[X] = G_X'(1) = 5·0.7·1 = 3.5. Or simply: E[Bin(n,p)] = np = 5·0.7 = 3.5.

2. Moment Generating Function (MGF)

For arbitrary (not only discrete) random variables, we use the **MGF - moment generating function**:

Why $e^{tX}$? Expand in a Taylor series:

**The MGF stores all moments as Taylor coefficients!**

MGFs of Standard Distributions

Distribution	MGF $M_X(t)$
Bernoulli(p)	$(1-p) + pe^t$
Binomial(n,p)	$(1-p+pe^t)^n$
Poisson(λ)	$e^{\lambda(e^t - 1)}$
Normal(μ,σ²)	$e^{\mu t + \sigma^2 t^2/2}$
Exponential(λ)	$\frac{\lambda}{\lambda - t}$, for $t < \lambda$

What is the MGF of N(0,1) evaluated at t=0?

M_X(0) = E[e^{0·X}] = E[1] = 1 for ANY distribution. Substituting t=0 into e^{μt + σ²t²/2} = e^0 = 1 ✓

3. Extracting Moments by Differentiation

The key trick: the **n-th moment** is obtained as the n-th derivative of the MGF evaluated at t=0:

Moments of the normal distribution

X ~ N(μ, σ²)

$M_X(t) = e^{\mu t + \sigma^2 t^2/2}$ **First derivative:** $M_X'(t) = (\mu + \sigma^2 t) \cdot e^{\mu t + \sigma^2 t^2/2}$ $E[X] = M_X'(0) = \mu \cdot e^0 = \mu$ ✓ **Second derivative:** $M_X''(0) = \sigma^2 + \mu^2$ $E[X^2] = \sigma^2 + \mu^2$ **Check:** $Var[X] = E[X^2] - (E[X])^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$ ✓

**Uniqueness theorem for MGF:** if $M_X(t) = M_Y(t)$ for all $t$ in a neighborhood of 0, then X and Y have the same distribution. The MGF **completely determines** the distribution!

Python: computing MGFs

Symbolic and numerical MGF computations

```python import numpy as np from scipy import stats import matplotlib.pyplot as plt # Numerical MGF via Monte Carlo def empirical_mgf(samples, t_values): """Estimate E[e^{tX}] from a sample""" return [np.mean(np.exp(t * samples)) for t in t_values] # Normal N(2, 9) mu, sigma = 2, 3 samples = np.random.normal(mu, sigma, size=100_000) t_vals = np.linspace(-0.5, 0.5, 100) # Theoretical MGF: exp(mu*t + sigma^2*t^2/2) theory = np.exp(mu * t_vals + sigma**2 * t_vals**2 / 2) # Empirical MGF empirical = empirical_mgf(samples, t_vals) plt.plot(t_vals, theory, 'b-', label='Theoretical MGF') plt.plot(t_vals, empirical, 'r--', label='Empirical MGF') plt.legend() plt.title('MGF of N(2, 9)') plt.show() # Moment extraction via numerical differentiation dt = 1e-5 E_X = (empirical_mgf(samples, [dt])[0] - 1) / dt # M'(0) print(f'E[X] ≈ {E_X:.3f} (theory: {mu})') # Via MGF formula E_X2 = mu**2 + sigma**2 print(f'E[X²] = {E_X2} (= σ² + μ² = {sigma**2} + {mu**2})') print(f'Var[X] = E[X²] - E[X]² = {E_X2 - mu**2}') ```

If M_X(t) = e^{2t + 8t²}, then Var[X] = ?

The MGF of N(μ, σ²) = e^{μt + σ²t²/2}. Here μt=2t → μ=2, and σ²t²/2=8t² → σ²/2=8 → σ²=16. Var[X] = σ² = 16.

Practice

Find the PGF for the geometric distribution P(X=k) = (1-p)^{k-1}·p, k=1,2,... Compute E[X] via the PGF.

G_X(z) = pz·Σ_{k=0}^∞ ((1-p)z)^k = pz/(1-(1-p)z) G_X'(z) = [p(1-(1-p)z) + pz(1-p)] / (1-(1-p)z)² = p / (1-(1-p)z)² E[X] = G_X'(1) = p / (1-(1-p))² = p/p² = 1/p ✓

Prove that if X ~ Poisson(λ₁) and Y ~ Poisson(λ₂) are independent, then X+Y ~ Poisson(λ₁+λ₂), using MGFs.

M_{X+Y}(t) = M_X(t) · M_Y(t) (independence) = e^{λ₁(e^t-1)} · e^{λ₂(e^t-1)} = e^{(λ₁+λ₂)(e^t-1)} This is the MGF of Poisson(λ₁+λ₂)! By the uniqueness theorem, X+Y ~ Poisson(λ₁+λ₂). ✓

X ~ Exponential(λ). Find E[X²] and Var[X] via the MGF without computing any integrals.

M_X(t) = λ/(λ-t) M_X'(t) = λ/(λ-t)² → E[X] = M_X'(0) = 1/λ M_X''(t) = 2λ/(λ-t)³ → E[X²] = M_X''(0) = 2/λ² Var[X] = E[X²] - (E[X])² = 2/λ² - 1/λ² = 1/λ² Std dev: σ = 1/λ = E[X] - a characteristic property of the exponential!

$X \sim Exp(\lambda)$ has $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$. What is $Var[X]$ via the MGF?

$M'(t) = \lambda/(\lambda - t)^2 \Rightarrow E[X] = 1/\lambda$. $M''(t) = 2\lambda/(\lambda - t)^3 \Rightarrow E[X^2] = 2/\lambda^2$. Thus $Var[X] = 2/\lambda^2 - 1/\lambda^2 = 1/\lambda^2$. The MGF replaces integration with differentiation.

Generating Functions - Bridge to Advanced Theory

MGFs and PGFs open the door to deep results in probability theory.

Characteristic functions — φ_X(t) = E[e^{itX}] - work for all distributions (even without MGF)
Central Limit Theorem — Proved via the log of the MGF (cumulant generating function)
Conditional expectation — Next lesson: E[X|Y] - a generalization of expectation
Queueing theory — PGF for number of customers in M/G/1 queues

Итоги

**PGF:** $G_X(z) = E[z^X] = \sum p_k z^k$ for discrete distributions
**MGF:** $M_X(t) = E[e^{tX}]$ - works for any distribution (when finite)
**Moment extraction:** $E[X^n] = M_X^{(n)}(0)$ - just the n-th derivative at 0
**Uniqueness:** the MGF completely determines the distribution
**Sum of independents:** $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$ - product of MGFs
**Applications:** proving the CLT, identifying distributions, computing moments

Вопросы для размышления

Why is the MGF called the 'moment generating function'? How does the Taylor series link the MGF and moments?
Why does the MGF of the Cauchy distribution not exist - and what does that say about the distribution?
How does the property M_{X+Y} = M_X·M_Y for independent variables simplify the proof of the CLT?
In which problems is the PGF more convenient than the MGF, and vice versa?

Связанные уроки

stat-27-graphical-models

Probability Theory

Generating Functions & MGF

Цели урока

Understand the probability generating function (PGF) as a distribution encoding
Master the moment generating function (MGF) and its properties
Extract moments by differentiating the MGF
Use the MGF uniqueness theorem to identify distributions
Compute MGFs numerically in Python and compare distributions

Предварительные знания

Expected value E[X]
Variance and moments Var[X]
Power series and derivatives

Algorithm analysis: average depth of a search tree
Genetics: number of mutations per generation
Queueing theory: number of customers in a system
Insurance: aggregate claim amounts over a period
Finance: characteristic functions of asset returns

From Tricks to Theory

Generating Functions & MGF

Generating functions are standard tools in random-process analysis, large-deviation theory, and proofs of the CLT. In ML they appear in SGD convergence analysis and in models with latent variables.

The moment-generating function (MGF) $M_X(t) = \mathbb{E}[e^{tX}]$ is useful because:

The MGF encodes all moments in one function; differentiation extracts any of them. The MGF of an independent sum equals the product of MGFs - a powerful tool for analyzing sums.

1. Probability Generating Function (PGF)

For a **discrete** random variable X taking values in {0, 1, 2, ...}, the probability generating function is:

It's simply a power series where the coefficient of $z^k$ is $P(X=k)$. All information about the distribution is packed into one function!

**Key properties of PGF:** - $G_X(1) = 1$ (probabilities sum to 1) - $G_X'(1) = E[X]$ (first derivative at z=1 gives the mean) - $G_X''(1) = E[X(X-1)]$ - factorial moment

PGF for Bernoulli

X ~ Bernoulli(p)

PGF for Poisson

X ~ Poisson(λ)

If G_X(z) = (0.3 + 0.7z)⁵, then E[X] = ?

This is the PGF of Binomial(5, 0.7). G_X'(z) = 5·0.7·(0.3+0.7z)⁴. E[X] = G_X'(1) = 5·0.7·1 = 3.5. Or simply: E[Bin(n,p)] = np = 5·0.7 = 3.5.

2. Moment Generating Function (MGF)

For arbitrary (not only discrete) random variables, we use the **MGF - moment generating function**:

Why $e^{tX}$? Expand in a Taylor series:

**The MGF stores all moments as Taylor coefficients!**

MGFs of Standard Distributions

Distribution	MGF $M_X(t)$
Bernoulli(p)	$(1-p) + pe^t$
Binomial(n,p)	$(1-p+pe^t)^n$
Poisson(λ)	$e^{\lambda(e^t - 1)}$
Normal(μ,σ²)	$e^{\mu t + \sigma^2 t^2/2}$
Exponential(λ)	$\frac{\lambda}{\lambda - t}$, for $t < \lambda$

What is the MGF of N(0,1) evaluated at t=0?

M_X(0) = E[e^{0·X}] = E[1] = 1 for ANY distribution. Substituting t=0 into e^{μt + σ²t²/2} = e^0 = 1 ✓

3. Extracting Moments by Differentiation

The key trick: the **n-th moment** is obtained as the n-th derivative of the MGF evaluated at t=0:

Moments of the normal distribution

X ~ N(μ, σ²)

**Uniqueness theorem for MGF:** if $M_X(t) = M_Y(t)$ for all $t$ in a neighborhood of 0, then X and Y have the same distribution. The MGF **completely determines** the distribution!

Python: computing MGFs

Symbolic and numerical MGF computations

If M_X(t) = e^{2t + 8t²}, then Var[X] = ?

The MGF of N(μ, σ²) = e^{μt + σ²t²/2}. Here μt=2t → μ=2, and σ²t²/2=8t² → σ²/2=8 → σ²=16. Var[X] = σ² = 16.

Practice

Find the PGF for the geometric distribution P(X=k) = (1-p)^{k-1}·p, k=1,2,... Compute E[X] via the PGF.

G_X(z) = pz·Σ_{k=0}^∞ ((1-p)z)^k = pz/(1-(1-p)z) G_X'(z) = [p(1-(1-p)z) + pz(1-p)] / (1-(1-p)z)² = p / (1-(1-p)z)² E[X] = G_X'(1) = p / (1-(1-p))² = p/p² = 1/p ✓

Prove that if X ~ Poisson(λ₁) and Y ~ Poisson(λ₂) are independent, then X+Y ~ Poisson(λ₁+λ₂), using MGFs.

X ~ Exponential(λ). Find E[X²] and Var[X] via the MGF without computing any integrals.

$X \sim Exp(\lambda)$ has $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$. What is $Var[X]$ via the MGF?

Generating Functions - Bridge to Advanced Theory

MGFs and PGFs open the door to deep results in probability theory.

Characteristic functions — φ_X(t) = E[e^{itX}] - work for all distributions (even without MGF)
Central Limit Theorem — Proved via the log of the MGF (cumulant generating function)
Conditional expectation — Next lesson: E[X|Y] - a generalization of expectation
Queueing theory — PGF for number of customers in M/G/1 queues

Итоги

**PGF:** $G_X(z) = E[z^X] = \sum p_k z^k$ for discrete distributions
**MGF:** $M_X(t) = E[e^{tX}]$ - works for any distribution (when finite)
**Moment extraction:** $E[X^n] = M_X^{(n)}(0)$ - just the n-th derivative at 0
**Uniqueness:** the MGF completely determines the distribution
**Sum of independents:** $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$ - product of MGFs
**Applications:** proving the CLT, identifying distributions, computing moments

Вопросы для размышления

Why is the MGF called the 'moment generating function'? How does the Taylor series link the MGF and moments?
Why does the MGF of the Cauchy distribution not exist - and what does that say about the distribution?
How does the property M_{X+Y} = M_X·M_Y for independent variables simplify the proof of the CLT?
In which problems is the PGF more convenient than the MGF, and vice versa?

Связанные уроки

stat-27-graphical-models