Probability Theory

The Normal Distribution

1733: Abraham de Moivre derives the bell as the limit of the binomial. 1809: Gauss rediscovers it while reducing errors in astronomical observations. 1810: Laplace proves the first general form of the CLT. 1986: Motorola engineer Bill Smith turns this distribution into Six Sigma - manufacture so that the specification limits sit 6 standard deviations from the target; allowing for the conventional 1.5σ long-term drift of the process mean, that works out to 3.4 defects per million opportunities. Motorola survived; the method became an industry benchmark.

  • Quality control: Six Sigma, component tolerances - Boeing, Toyota, GE
  • Biology: height, weight, IQ - approximately bell-shaped, largely because of the CLT
  • Finance: the Black-Scholes option pricing model assumes log-normal prices
  • Physics: thermal motion of molecules (each velocity component is normal; Maxwell-Boltzmann distribution)
  • ML: neural network weight initialization; BatchNorm rescales activations to roughly zero mean and unit variance

Prerequisites

  • Continuous distributions and probability density
  • Expected value and variance
  • The concept of standard deviation
  • Continuous Distributions
  • Variance

N(μ, σ²): the bell and its parameters

In 1733, Abraham de Moivre showed in "Approximatio ad summam terminorum binomii" that the binomial distribution takes a bell shape for large n. That was the first appearance of the normal law - 76 years before Gauss rediscovered it in 1809 in "Theoria Motus" while modeling errors in astronomical observations. Laplace proved the first general form of the CLT in 1810; Lyapunov gave a rigorous proof in 1901.

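The normal distribution X ~ N(μ, σ²) is defined by two parameters: **μ** (mu) - the center of the bell, and **σ** (sigma) - its width (σ² is the variance). The density function: $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$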

**Symmetry rule:** Mean = median = mode = μ. The curve is symmetric about μ. Doubling σ makes the bell twice as wide and half as tall - the area under the curve always equals 1.
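The scaling claim can be checked numerically. The sketch below is only an illustration: the helper names `normal_pdf` and `area_under_pdf` are made up for this example, and the area is approximated with a simple midpoint sum rather than computed exactly.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_pdf(mu: float, sigma: float, half_width: float = 10.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the area over mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

peak_narrow = normal_pdf(0.0, mu=0.0, sigma=1.0)  # height of the bell at its center
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)    # same center, doubled sigma

print(f"peak ratio: {peak_narrow / peak_wide:.3f}")       # doubling sigma halves the peak
print(f"area, sigma=1: {area_under_pdf(0.0, 1.0):.6f}")   # stays at 1
print(f"area, sigma=2: {area_under_pdf(0.0, 2.0):.6f}")   # stays at 1
```

The peak sits at x = μ, where the density equals 1/(σ√(2π)), so the height is inversely proportional to σ.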

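The **68-95-99.7 rule** is the main tool for working with the normal distribution mentally, without tables:

  • about 68% of values fall within μ ± σ
  • about 95% fall within μ ± 2σ
  • about 99.7% fall within μ ± 3σ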
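The three percentages in the rule's name can be reproduced from the error function. This is a minimal sketch, and `prob_within` is a hypothetical helper name. It relies on the identity P(|X - μ| ≤ kσ) = erf(k/√2), which holds for any normal X.

```python
import math

def prob_within(k: float) -> float:
    """P(|X - mu| <= k * sigma) for a normal X, independent of mu and sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```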

IQ ~ N(100, 225), meaning μ=100, σ=15. Approximately what percentage of people have IQ above 130?
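One way to reason about this question: 130 is exactly μ + 2σ, so about 95% of values lie between 70 and 130, and by symmetry roughly half of the remaining 5% lies above 130. The sketch below compares that mental estimate with the value from Python's standard-library `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)     # sigma = sqrt(225) = 15

rule_of_thumb = (1.0 - 0.95) / 2.0    # half of what falls outside mu +/- 2 sigma
p_above_130 = 1.0 - iq.cdf(130)       # upper-tail probability from the CDF

print(f"mental estimate: {rule_of_thumb:.3f}")   # 0.025
print(f"from the CDF:    {p_above_130:.4f}")     # 0.0228
```

The small gap between 2.5% and about 2.3% arises because the "95%" in the rule is a rounded value.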

The Central Limit Theorem: why normal is everywhere

The normal distribution does not appear because the world is inherently "normal". It appears because we often observe **sums** or **averages** of many independent quantities. The Central Limit Theorem (CLT) explains this phenomenon.

**CLT:** Let X₁, X₂, ..., Xₙ be independent, identically distributed random variables with finite mean μ and variance σ². Then as n → ∞: $$\sqrt{n}\,\left(\bar{X}_n - \mu\right) \xrightarrow{d} N\left(0,\, \sigma^2\right), \qquad \bar{X}_n = \frac{X_1 + \ldots + X_n}{n}$$ so for large n the sample mean is approximately N(μ, σ²/n), regardless of the shape of the original distribution!

This is why the normal distribution appears in nature and science: human height is a sum of genetic and environmental factors; measurement errors are a sum of many small inaccuracies; electronic noise is a superposition of thermal fluctuations.

The CLT says the sample mean of n=100 observations from Exp(λ=2) (μ=0.5, σ=0.5) has approximate distribution:
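Plugging into the CLT gives mean μ = 0.5 and standard deviation σ/√n = 0.5/10 = 0.05, that is, approximately N(0.5, 0.0025). A small simulation can make this plausible. The sketch below is illustrative: the function name, the seed, and the number of trials are arbitrary choices, and the printed values will only be close to the theoretical ones, not equal to them.

```python
import random
import statistics

def sample_means(n: int, trials: int, lam: float, seed: int = 0) -> list[float]:
    """Draw `trials` samples of size n from Exp(lam) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]

means = sample_means(n=100, trials=5_000, lam=2.0)

center = statistics.fmean(means)   # theory: mu = 0.5
spread = statistics.stdev(means)   # theory: sigma / sqrt(n) = 0.05
# Share of sample means within one theoretical standard deviation of mu (theory: about 0.68)
within_one_sd = sum(abs(m - 0.5) <= 0.05 for m in means) / len(means)

print(f"mean of sample means: {center:.4f}")
print(f"std of sample means:  {spread:.4f}")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```

The exponential distribution is strongly skewed, yet the means of 100 observations already behave much like a normal variable.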

Z-tests, confidence intervals, log-normal

Standardization z = (x - μ)/σ converts any N(μ, σ²) to the standard N(0,1). This means one Φ(z) table covers all normal distributions. The z-score measures how unusual a value is.

**When normal does not fit:** incomes (heavy right tail, log-normal is better), time to failure (exponential/Weibull), event counts (Poisson), proportions (beta distribution). Always check a QQ-plot before applying normal-based methods.
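A QQ-plot compares sorted data against the quantiles a normal distribution would produce; points on a straight line suggest normality. The sketch below is a numeric stand-in for the plot, under stated assumptions: `qq_correlation` is a hypothetical helper, the "heights" and "incomes" are simulated with made-up parameters, and the correlation of the QQ points is used only as a rough summary of straightness, not as a formal test.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_correlation(sample: list[float]) -> float:
    """Pearson correlation between sorted data and standard normal quantiles.

    Values very close to 1 mean the QQ-plot points lie near a straight line.
    """
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = fmean(theoretical), fmean(ordered)
    sxy = sum((a - mx) * (b - my) for a, b in zip(theoretical, ordered))
    sxx = sum((a - mx) ** 2 for a in theoretical)
    syy = sum((b - my) ** 2 for b in ordered)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
heights = [rng.gauss(170, 8) for _ in range(2_000)]           # simulated, normal by construction
incomes = [rng.lognormvariate(10, 1) for _ in range(2_000)]   # simulated, heavy right tail

r_heights = qq_correlation(heights)
r_incomes = qq_correlation(incomes)
r_log_incomes = qq_correlation([math.log(v) for v in incomes])

print(f"heights:      {r_heights:.4f}")      # close to 1
print(f"incomes:      {r_incomes:.4f}")      # clearly below 1
print(f"log(incomes): {r_log_incomes:.4f}")  # close to 1 again
```

Taking logarithms straightens the income QQ points, which is what a log-normal model predicts.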

X ~ N(50, 100). The standardized value of x = 70 is z = ?
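Here σ = √100 = 10, so z = (70 - 50)/10 = 2. The sketch below carries out the same standardization in code and checks that Φ(z) from the standard normal matches the probability computed directly from N(50, 100). The helper name `z_score` is an assumption for illustration; `NormalDist` is from Python's standard library.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

z = z_score(70, mu=50, sigma=10)                # variance 100 means sigma = 10
phi_z = NormalDist().cdf(z)                     # Phi(2) from the standard normal
direct = NormalDist(mu=50, sigma=10).cdf(70)    # same probability without standardizing

print(f"z = {z}")                        # z = 2.0
print(f"Phi(z)      = {phi_z:.4f}")      # 0.9772
print(f"P(X <= 70)  = {direct:.4f}")     # 0.9772
```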

Summary

  • **N(μ, σ²):** bell-shaped, symmetric, median = mode = mean = μ
  • **68-95-99.7 rule:** fractions within 1σ, 2σ, 3σ of the mean
  • **CLT:** for large n, the mean of n iid variables is approximately N(μ, σ²/n) - explains why normal appears in nature
  • **Z-score:** z = (x - μ)/σ - standardizes to work with the Φ(z) table
  • **Six Sigma:** specification limits at 6σ from the target; with the conventional 1.5σ shift of the process mean, about 3.4 defects per million
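The Six Sigma figure can be traced to a normal tail probability. A perfectly centered process with limits at ±6σ would produce only about 0.002 defects per million; the widely quoted 3.4 follows from the Six Sigma convention of allowing the process mean to drift 1.5σ toward one limit, which leaves 4.5σ of margin. The sketch below is a rough check under that assumption, and `defects_per_million` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()

def defects_per_million(sigma_level: float, shift: float = 1.5) -> float:
    """Tail beyond the nearer limit after the mean drifts by `shift` sigmas.

    The far-side tail is ignored because it is negligibly small here.
    """
    return (1.0 - std_normal.cdf(sigma_level - shift)) * 1e6

dpmo_shifted = defects_per_million(6.0)                     # tail beyond 4.5 sigma
dpmo_centered = 2.0 * (1.0 - std_normal.cdf(6.0)) * 1e6     # both tails, no drift

print(f"with 1.5 sigma shift: {dpmo_shifted:.2f} per million")    # 3.40
print(f"perfectly centered:   {dpmo_centered:.4f} per million")   # 0.0020
```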

The normal distribution - the foundation of all statistics

Understanding the normal distribution connects probability theory to practical statistical methods.

  • Central Limit Theorem — Explains why the normal appears everywhere
  • Confidence Intervals — Constructed using z-critical values
  • Hypothesis Testing — Z-tests and t-tests are built on the normal

Questions for reflection

  • How can a company reduce σ in its manufacturing process? What does that require technically?
  • Why are incomes NOT normally distributed, while human heights are? What distinguishes these two situations?
  • A z-score of 4 for an athlete's performance - how rare is that? How many people in a million have such a z-score?

Related lessons

  • ml-06-linear-regression
  • ml-33-gan
  • stat-06-t-test
  • stat-04-confidence