Digital Signal Processing
Sampling and the Nyquist-Shannon Theorem
Lesson Objectives
- Understand sampling as taking discrete measurements and its connection to spectral periodicity
- Explain aliasing and why it is irreversible
- State the Nyquist-Shannon theorem and apply it to practical problems
- Calculate SNR using the formula 6.02B + 1.76 dB
Prerequisites
MP3 killed the CD industry. JPEG killed film photography. AAC in AirPods delivers 256 kbps instead of 1411 kbps, about 5.5x less data. All of this runs on transforms from the Fourier family: the MDCT in MP3 and AAC and the DCT in JPEG are close relatives of the Discrete Fourier Transform. Fast algorithms in the style of Cooley-Tukey (1965) made computing them cheap enough for everyday use. But before compressing, the signal must be correctly sampled.
- **Spotify/Apple Music:** all digital music at 44.1 kHz is a direct consequence of the Nyquist theorem (fmax = 20 kHz)
- **AirPods (Apple H2 chip):** 1-bit delta-sigma ADC at 5 MHz -> digital filter -> 256 kbps AAC with 11 ms latency
- **Automotive LiDAR:** 100K points/sec at 905 nm -> ADC at 1+ GHz for time-of-flight measurements
- **Medical ultrasound:** probe frequency 1-20 MHz -> fs = 40-100 MHz: the Nyquist theorem at the megahertz scale
- **ECG (Apple Watch):** fs = 512 Hz with maximum cardiac signal frequency ~150 Hz: fs is about 3.4x f_max, or about 1.7x the Nyquist rate
James Cooley and John Tukey: the algorithm that changed the world
In 1965, James Cooley and John Tukey published "An Algorithm for the Machine Calculation of Complex Fourier Series" in **Mathematics of Computation**. The paper introduced what is now called the Fast Fourier Transform (FFT): the computational cost of the DFT dropped from O(N^2) to O(N log N), which made real-time audio and image processing feasible. It was later discovered that Gauss had devised an equivalent algorithm around 1805 but never published it.
Sampling
Before a signal can be compressed, it must be **sampled**: continuous sound is captured as discrete measurements at equal time intervals.
Mathematically: x[n] = x(nT), where T is the sampling period and f_s = 1/T is the sampling rate. In the frequency domain, sampling multiplies by a Dirac comb, which causes the spectrum to **repeat periodically** with period f_s. This periodicity is exactly where aliasing hides.
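The definition x[n] = x(nT) can be sketched in a few lines of Python. This is a minimal illustration rather than a model of real ADC hardware; the helper name `sample` and the 1 kHz test tone are assumptions chosen for the example.

```python
import math

def sample(x, fs, duration):
    """Return the samples x[n] = x(n*T) with T = 1/fs, for 0 <= n*T < duration."""
    T = 1.0 / fs                           # sampling period
    n_samples = int(round(fs * duration))  # N = fs * duration
    return [x(n * T) for n in range(n_samples)]

# Hypothetical test signal: a 1 kHz sine, sampled at 8 kHz for 1 ms.
def tone(t):
    return math.sin(2 * math.pi * 1000 * t)

samples = sample(tone, fs=8000, duration=0.001)
print(len(samples))   # number of samples taken in 1 ms
```

With fs = 8 kHz the tone is captured at 8 points per cycle, well above the 2 points per cycle that the sampling theorem asks for.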
| Sampling Rate | Standard | Application |
|---|---|---|
| 8 kHz | Telephone (G.711) | Voice 300-3400 Hz, VoIP |
| 44.1 kHz | CD Audio | Music up to 20 kHz, Spotify, Apple Music |
| 48 kHz | DVD/Professional | Video production, AirPods AAC |
| 96/192 kHz | Hi-Res Audio | Studio mastering, Tidal MQA |
| MHz to GHz | Medical/Radar | Ultrasound (fs = 40-100 MHz), FMCW radar, LiDAR (1+ GHz) |
**44100 Hz is not a random number!** It was chosen as 2 * 20000 plus a margin for the audible range. More precisely: 44100 = 2^2 * 3^2 * 5^2 * 7^2, a number with many divisors that fits the line and field rates of both PAL and NTSC video, which mattered because early digital audio was recorded on video tape. CD Audio has used 44.1 kHz since 1982, and Apple Music keeps the rate for CD compatibility.
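The factorization is easy to check by trial division. The sketch below also reproduces the commonly cited video-tape arithmetic; the figures of 3 samples per video line and 245 (NTSC) or 294 (PAL) usable lines per field are taken from that standard explanation, not from this lesson, and `prime_factors` is a helper written for the example.

```python
def prime_factors(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # whatever is left is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factors(44100))        # {2: 2, 3: 2, 5: 2, 7: 2}

# Assumed PCM-adaptor layout: 3 samples stored per usable video line.
ntsc_rate = 3 * 245 * 60           # 245 lines per field, 60 fields per second
pal_rate = 3 * 294 * 50            # 294 lines per field, 50 fields per second
print(ntsc_rate, pal_rate)         # 44100 44100
```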
A signal with maximum frequency 8 kHz is sampled at fs = 16 kHz. How many samples per second?
Aliasing
**Aliasing** is catastrophic: when f_s < 2 * f_max, spectrum copies overlap and high frequencies masquerade as low ones. The information is **permanently destroyed**. Not slightly distorted - destroyed. No amount of post-processing can recover it.
Wheels in cinema. A camera shoots at 24 frames/sec. When a wheel spins faster than 12 revolutions/sec it appears to rotate backwards or stand still. This is aliasing - the **stroboscopic effect**. The same mathematics appears in audio DSP: a high frequency impersonates a low one.
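The wagon-wheel effect can be put into numbers with one line of modular arithmetic. This is a simplified sketch that assumes a wheel with a single visible mark (real spoked wheels repeat more often per revolution); `apparent_rate` is a name made up for the example.

```python
def apparent_rate(true_rate, fps):
    """Apparent rotation rate in rev/s of a wheel with one visible mark,
    filmed at `fps` frames per second. Negative means backwards."""
    half = fps / 2
    # Wrap the true rate into the interval [-fps/2, fps/2).
    return (true_rate + half) % fps - half

for rate in (10, 23, 24, 25):
    print(rate, "rev/s looks like", apparent_rate(rate, fps=24), "rev/s")
```

At 24 frames/sec, a wheel turning at 23 rev/s appears to creep backwards at 1 rev/s, and one turning at exactly 24 rev/s appears to stand still.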
**Anti-aliasing filter** - an analog low-pass filter applied BEFORE the ADC. It removes frequencies above f_s/2, preventing spectral overlap. Some form of it sits in front of practically every ADC, and many digital cameras place an optical low-pass filter in front of the sensor for the same reason. In computer graphics, anti-aliasing solves the same problem for spatial aliasing on diagonal lines.
| Signal freq | fs | Alias frequency | Result |
|---|---|---|---|
| 5 Hz | 100 Hz | 5 Hz | Correct |
| 45 Hz | 100 Hz | 45 Hz | Correct (borderline) |
| 55 Hz | 100 Hz | 45 Hz | Aliasing! 55->45 |
| 80 Hz | 100 Hz | 20 Hz | Aliasing! 80->20 |
| 100 Hz | 100 Hz | 0 Hz | Aliasing! Signal = const |
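The alias column of the table follows from two steps: reduce the frequency modulo fs (the spectrum repeats with period fs), then fold the result into [0, fs/2]. A minimal sketch for real sinusoids, with `alias_frequency` as an illustrative helper name:

```python
def alias_frequency(f, fs):
    """Frequency in [0, fs/2] at which a real sinusoid of frequency f
    appears after sampling at rate fs."""
    r = f % fs               # the spectrum repeats with period fs
    return min(r, fs - r)    # fold into the Nyquist band [0, fs/2]

for f in (5, 45, 55, 80, 100):
    print(f, "Hz ->", alias_frequency(f, fs=100), "Hz")
```

The loop reproduces the rows above: 5 and 45 Hz are unchanged, 55 Hz folds to 45 Hz, 80 Hz to 20 Hz, and 100 Hz to 0 Hz.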
Aliasing is irreversible: after sampling, 80 Hz and 20 Hz produce identical samples. The only protection is filtering BEFORE sampling. This is exactly why the anti-aliasing filter must be analog, not digital.
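The claim that 80 Hz and 20 Hz produce identical samples at fs = 100 Hz can be checked numerically. The sketch below uses cosines; with sines the two sequences would match up to a sign flip, which is equally indistinguishable as a frequency.

```python
import math

fs = 100                 # sampling rate in Hz
n_values = range(50)     # first 50 sample indices

# 0.8*n turns and 0.2*n turns add up to n whole turns, so the cosines coincide.
high = [math.cos(2 * math.pi * 80 * n / fs) for n in n_values]
low = [math.cos(2 * math.pi * 20 * n / fs) for n in n_values]

max_diff = max(abs(a - b) for a, b in zip(high, low))
print(max_diff)          # only floating-point rounding remains
```

Once the samples are equal, nothing computed from them can tell which of the two tones was present.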
A 900 Hz signal is sampled at fs = 1000 Hz. What frequency will the system see?
Nyquist-Kotelnikov-Shannon Theorem
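**Sampling theorem:** a band-limited signal with maximum frequency f_max can be EXACTLY reconstructed from its samples if f_s > 2 * f_max. The rate 2 * f_max is the **Nyquist rate**. The inequality is strict: at exactly f_s = 2 * f_max, a sinusoid at f_max can be sampled at its zero crossings and vanish. This is not an empirical rule - it is a mathematical theorem with a proof.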
Reconstruction via sinc interpolation: x(t) = sum(x[n] * sinc((t - nT)/T)), where sinc(x) = sin(pi * x) / (pi * x). Each sample generates a sinc pulse, and their sum exactly recreates the original signal. In practice the ideal sinc filter is physically unrealizable - hence oversampling.
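The interpolation formula can be tried directly. The sketch below is an illustration under stated assumptions: a 1 Hz sine sampled at 8 Hz, and a finite block of 1001 samples instead of the infinite sum, so the result is close to the original but not exact. The names `sinc` and `reconstruct` are written for the example.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_n, T, t):
    """Evaluate sum over n of x[n] * sinc((t - n*T)/T) for a finite block
    of samples whose first index is first_n."""
    return sum(x * sinc((t - (first_n + k) * T) / T)
               for k, x in enumerate(samples))

# Assumed example: f0 = 1 Hz, fs = 8 Hz (T = 0.125 s), n = -500..500.
T = 0.125
f0 = 1.0
first_n = -500
samples = [math.sin(2 * math.pi * f0 * n * T) for n in range(first_n, 501)]

t = 0.0625   # halfway between samples n = 0 and n = 1
estimate = reconstruct(samples, first_n, T, t)
exact = math.sin(2 * math.pi * f0 * t)
print(abs(estimate - exact))   # small but nonzero: the sum is truncated
```

The slow 1/x decay of the sinc tails is why the truncation error shrinks only gradually as more samples are included, and part of why practical systems prefer oversampling with shorter filters.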
| Parameter | Formula | Meaning |
|---|---|---|
| Nyquist rate | f_N = 2*f_max | Lower bound on fs (fs must exceed it) |
| Sampling period | T = 1/f_s | Interval between samples |
| Samples over T_sig | N = f_s * T_sig | Amount of data |
| Frequency resolution | delta_f = f_s / N = 1 / T_sig | Spacing between DFT bins (smallest distinguishable frequency difference) |
| Nyquist band | [0, f_s/2] | Unambiguously representable frequencies |
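The table formulas can be evaluated together for a concrete case. The sketch below assumes f_max = 20 kHz, fs = 44.1 kHz and a 2-second recording; the function name and dictionary keys are illustrative.

```python
def sampling_parameters(f_max, fs, t_sig):
    """Evaluate the table formulas; frequencies in Hz, t_sig in seconds."""
    n = fs * t_sig
    return {
        "nyquist_rate": 2 * f_max,     # f_N = 2*f_max
        "period": 1 / fs,              # T = 1/f_s
        "num_samples": n,              # N = f_s * T_sig
        "resolution": fs / n,          # delta_f = f_s / N
        "nyquist_band": (0, fs / 2),   # [0, f_s/2]
    }

params = sampling_parameters(f_max=20_000, fs=44_100, t_sig=2)
for name, value in params.items():
    print(name, "=", value)
```

Note that the frequency resolution depends only on the recording length: 2 seconds of signal gives 0.5 Hz bins regardless of fs.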
Three authors of one theorem
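The theorem is credited to three people: Nyquist (1928, Bell Labs), Kotelnikov (1933, Moscow), Shannon (1949, Bell Labs). Nyquist's 1928 result concerned how many independent pulses per second a band-limited channel can carry; Kotelnikov and Shannon each stated and proved the sampling theorem itself. In Russia it is the Kotelnikov theorem; in the English-speaking world, the Nyquist-Shannon theorem. All three worked in telecommunications: maximizing information over a bandwidth-limited channel. Shannon linked it to information theory through the channel capacity formula C = B * log2(1 + SNR), where B is the bandwidth in Hz and SNR is a linear power ratio (not dB).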
**In practice: oversampling.** CD Audio: fmax = 20 kHz, fs = 44.1 kHz (ratio 2.2). The margin leaves room for a physically realizable anti-aliasing filter, since an ideal rectangular (brick-wall) filter is impossible. Studio recording uses 96-192 kHz for headroom during processing.
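How much room the filter gets can be estimated as the gap between the top of the audio band and fs/2. This is a simplified view of the filter specification, and `transition_band` is a helper named for the example.

```python
def transition_band(fs, f_max):
    """Width in Hz between the top of the signal band and fs/2."""
    return fs / 2 - f_max

for fs in (44_100, 48_000, 96_000):
    print(fs, "Hz ->", transition_band(fs, f_max=20_000), "Hz")
```

At 44.1 kHz the filter has only about 2 kHz to go from passing to blocking; at 96 kHz it has 28 kHz, which allows a much gentler and cheaper analog filter.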
A signal contains frequencies from 0 to 4 kHz. What is the minimum sampling rate?
Quantization
After time-domain sampling the values are still real-valued. **Quantization** rounds them to the nearest level from a finite set. B-bit quantization gives 2^B levels spaced by a step Delta. The **quantization error** lies in [-Delta/2, Delta/2] and is usually modeled as uniformly distributed over that interval, with variance Delta^2/12: this is "quantization noise".
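A uniform quantizer fits in a few lines. The sketch below assumes a mid-rise design (2^B cells, each represented by its midpoint) and inputs in [-1, 1); real converters differ in detail, and `quantize` is a name chosen for the example.

```python
import math

def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer for inputs in [-full_scale, full_scale).
    Returns the midpoint of the cell containing x (one of 2**bits values)."""
    levels = 2 ** bits
    delta = 2 * full_scale / levels              # step size Delta
    index = math.floor(x / delta)                # which cell x falls in
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip out-of-range input
    return (index + 0.5) * delta                 # cell midpoint

bits = 3
delta = 2 / 2 ** bits
xs = [-1 + 2 * k / 1000 for k in range(1000)]    # test inputs covering [-1, 1)
errors = [x - quantize(x, bits) for x in xs]
print(max(abs(e) for e in errors), "<=", delta / 2)
```

With 3 bits there are 8 output values and the error never exceeds half a step, which is the bound stated above.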
**Golden SNR formula:** SNR = 6.02 * B + 1.76 dB for a full-scale sine wave. Each additional bit gives ~6 dB. 16-bit CD gives ~98 dB (often quoted as ~96 dB, which is 6.02 * 16 without the 1.76 dB sine-wave term): from a whisper to a loud concert. 24-bit studio gives ~146 dB, exceeding the dynamic range of human hearing.
| Bit depth | Levels | SNR (dB) | Application |
|---|---|---|---|
| 8 bits | 256 | ~50 | Telephone G.711, old video games |
| 16 bits | 65,536 | ~98 | CD Audio, Spotify 44.1 kHz |
| 24 bits | 16,777,216 | ~146 | Studio recording, Tidal FLAC |
| 32 bits float | ~7 digits precision | ~150+ | DAW processing (Ableton, Pro Tools) |
| 1 bit (DeltaSigma) | 2 | depends on oversampling | DSD (SACD), AirPods H2 chip |
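The SNR column can be compared against a direct measurement. The sketch below rounds a full-scale sine to a grid with step 2 / 2^B and computes signal power over error power; the tone frequency and the sample count are arbitrary choices for the example, and the measured values are only expected to land close to the formula, not on it.

```python
import math

def theoretical_snr_db(bits):
    """SNR of an ideal B-bit quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76

def measured_snr_db(bits, n_samples=50_000):
    """Quantize a full-scale sine and measure signal power / noise power in dB."""
    delta = 2.0 / 2 ** bits                  # step size for a [-1, 1] range
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        # 0.1234567 cycles per sample: does not repeat within n_samples
        x = math.sin(2 * math.pi * 0.1234567 * n)
        q = round(x / delta) * delta         # nearest quantization level
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print(bits, "bits:", theoretical_snr_db(bits), "dB (formula),",
          round(measured_snr_db(bits), 1), "dB (measured)")
```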
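**Oversampling + noise shaping** - delta-sigma ADCs sample at very high rates with as little as 1 bit, then a digital filter converts the stream to the standard format. DSD (Direct Stream Digital) stores such a stream directly: 1 bit at 2.8224 MHz (64 * 44.1 kHz), with noise shaping pushing the quantization noise above the audible band so that the in-band SNR is higher than 16-bit at 44.1 kHz.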
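How a 1-bit stream can carry a fine-grained value is easiest to see with a first-order delta-sigma modulator. The sketch below is the textbook first-order loop, not the design of any particular chip or of DSD encoders, and it uses a plain average as a stand-in for the digital decimation filter.

```python
def delta_sigma_1bit(inputs):
    """First-order delta-sigma modulator: maps inputs in (-1, 1)
    to a stream of +1.0 / -1.0 values."""
    integrator = 0.0
    feedback = 0.0            # previous output, subtracted from the input
    bits = []
    for x in inputs:
        integrator += x - feedback                      # accumulate the running error
        feedback = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        bits.append(feedback)
    return bits

# Assumed test input: a constant level of 0.3, heavily oversampled.
stream = delta_sigma_1bit([0.3] * 10_000)
average = sum(stream) / len(stream)   # crude low-pass (decimation) filter
print(average)                        # close to the input level 0.3
```

Each output is only +1 or -1, but the feedback loop keeps the accumulated error bounded, so the density of +1 values tracks the input and averaging recovers it.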
Sampling + quantization = **analog-to-digital conversion (ADC)**. The entire digital audio world: microphone -> ADC -> numbers -> DSP -> DAC -> speaker. The Nyquist-Shannon theorem guarantees that sampling at a sufficient rate loses no information; quantization is the step that adds a bounded amount of noise. AAC in AirPods compresses 1411 kbps down to 256 kbps: that is codec work on top of the theorem.
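The 1411 kbps figure is simply the product of the CD parameters (44.1 kHz, 16 bits, 2 channels). A short check, with `pcm_bitrate` as an illustrative helper:

```python
def pcm_bitrate(fs, bits, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return fs * bits * channels

cd_bps = pcm_bitrate(fs=44_100, bits=16, channels=2)
ratio = cd_bps / 256_000      # compared with a 256 kbps AAC stream
print(cd_bps, "bits/s, compression ratio", ratio)
```

The uncompressed stream is 1,411,200 bits per second, so a 256 kbps codec carries roughly 5.5 times less data.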
Misconception: 44.1 kHz is sufficient for any sound
44.1 kHz is sufficient for the audible range (up to ~20 kHz). Ultrasound (medical, echolocation), vibration analysis, and radar require sampling rates in the hundreds of kHz to MHz range.
Nyquist-Shannon: fs > 2 * fmax. For human hearing (20 kHz), 44.1 kHz is enough. Medical ultrasound (1-20 MHz) requires ADCs running at tens of MHz (40-100 MHz). Even in audio, studio mastering at 96-192 kHz provides headroom when applying nonlinear effects.
CD Audio: 16 bits, 44.1 kHz, stereo. How many bytes in one minute?
Key Ideas
- **Sampling:** x[n] = x(nT), spectrum repeats periodically with period fs
- **Aliasing:** when fs < 2*fmax, spectra overlap. Information is permanently destroyed
- **Nyquist-Shannon theorem:** fs > 2*fmax is sufficient for exact reconstruction of a band-limited signal
- **Anti-aliasing filter:** analog LPF before the ADC - the only protection against aliasing
- **Quantization:** B bits -> SNR = 6.02B + 1.76 dB. 16-bit CD: ~98 dB (~96 dB without the 1.76 dB sine-wave term)
- **Oversampling:** in practice fs is chosen above 2*fmax (2.2x for CD, more in the studio) because an ideal filter is unrealizable
Related Topics
Sampling is the bridge between the analog and digital worlds:
- Signals and Systems — Discrete signals are the result of sampling continuous ones
- Z-Transform — Z-transform analyzes discrete signals obtained after sampling
Questions for Reflection
- Why must the anti-aliasing filter be analog (before the ADC) rather than digital (after)?
- Hi-Res Audio (96/192 kHz) - marketing or a real improvement? Humans cannot hear above 20 kHz; why use fs > 44.1 kHz?
- Delta-Sigma modulation (1 bit, very high fs) - how can 1 bit produce quality better than 16 bits?
Related Lessons
- dsp-01 — Signals, spectra and Fourier transform are introduced in the previous lesson
- dsp-03 — Z-transform analyzes discrete signals obtained after sampling
- dsp-04 — Digital filters operate on discrete signals
- it-02 — Kotelnikov theorem and Shannon channel capacity formula - the same mathematics
- calc-01-sequences — Sequences and discrete functions - the mathematical foundation of discrete signals
- prob-11-normal — Quantization noise modeled as uniform distribution - connection to probability theory
- prob-06-random-vars