Algebra
Complex Numbers
MP3, JPEG, 5G, radar, MRI - all run on FFT. FFT runs on complex numbers. $e^{i\theta} = \cos\theta + i\sin\theta$ - Euler's formula - is not a textbook abstraction. It is the computational infrastructure of the digital world. In LLaMA and Mistral, it encodes the position of every token.
- **FFT and compression**: MP3, JPEG, AVIF - complex spectral analysis removes perceptually insignificant information. Every file downloaded from the internet
- **RoPE in transformers**: LLaMA, Mistral, Gemma encode token position via $e^{im\theta}$ - multiplication by a complex rotator
- **ComplexNet for MRI**: complex-valued neural network weights process k-space data directly, preserving phase information
Предварительные знания
i² = -1: geometry, not magic
**1572. Bologna.** Rafael Bombelli was solving $x^3 = 15x + 4$. Cardano had a formula for cubic equations - plug in numbers, get a root. Bombelli plugged them in. An intermediate step demanded $\sqrt{-121}$. Any reasonable 16th-century mathematician would have stopped there. Bombelli did not stop. He computed as if nothing was wrong - and obtained $x = 4$, a real root that checks out immediately: $4^3 = 64 = 15 \cdot 4 + 4$. Imaginary numbers were not invented as an end in themselves. They emerged because the path to a real answer ran through them.
The number line is a line. Complex numbers extend it to a plane. Real numbers - the horizontal axis. Imaginary - the vertical axis. $i = \sqrt{-1}$ is a 90-degree rotation. Multiplying by $i$ twice: two 90-degree rotations = 180 degrees = multiplication by $-1$. Not magic - geometry. $i^2 = -1$ is the statement that two 90-degree rotations compose to a 180-degree rotation.
**A complex number** $z = a + bi$, where $a, b \in \mathbb{R}$: - $\text{Re}(z) = a$ - real part (horizontal) - $\text{Im}(z) = b$ - imaginary part (vertical) - $|z| = \sqrt{a^2 + b^2}$ - modulus (distance from origin) - $\bar{z} = a - bi$ - conjugate (reflection across the real axis) - $z \cdot \bar{z} = a^2 + b^2 = |z|^2$ - the key identity
What is $|3 + 4i|$?
Arithmetic and the geometry of multiplication
Addition is component-wise, like vectors. Multiplication is where it gets interesting. $(a + bi)(c + di) = (ac - bd) + (ad + bc)i$. Expand like ordinary algebra, replace $i^2 = -1$. But geometrically: multiplying two complex numbers **multiplies their moduli** and **adds their arguments**. Lengths multiply, angles add. This is rotation and scaling in one operation - exactly what a $2 \times 2$ rotation matrix does, but in a single complex multiply.
**Division** via the conjugate: $\frac{a+bi}{c+di} = \frac{(a+bi)(c-di)}{c^2+d^2}$. The denominator becomes real: $|z|^2 = z \cdot \bar{z}$. The same technique appears in the normalization layer of ComplexNet - complex-valued neural networks for MRI and radar signal processing.
**Multiplying by $e^{i\theta}$** is a pure rotation by angle $\theta$, the modulus unchanged. This is the core of Rotary Position Embedding (RoPE) - the positional encoding used in LLaMA, Mistral, Gemma. Token position is encoded as a rotation angle in the complex plane.
What is $(2+3i)(1-i)$?
Euler's formula and roots of unity
The most celebrated formula in mathematics: $e^{i\theta} = \cos\theta + i\sin\theta$. At $\theta = \pi$: $e^{i\pi} + 1 = 0$ - all five fundamental constants in one equation. But more important than beauty is utility: the formula converts rotation by angle $\theta$ into multiplication by $e^{i\theta}$. Computationally cheaper than a $2 \times 2$ matrix. And it is the foundation of FFT.
The $n$-th roots of unity $\omega_n = e^{2\pi i / n}$ are equally spaced on the unit circle: $n$ points separated by angle $2\pi/n$ each. The orthogonality property: $\sum_{k=0}^{n-1} \omega_n^{jk} = n$ if $j \equiv 0 \pmod{n}$, else 0. This algebraic fact is what allows FFT to decompose a signal into its frequency components.
**RoPE (Rotary Position Embedding)** - the positional encoding in LLaMA, Mistral, Gemma. Token position $m$ is encoded by multiplying the embedding vector by $e^{im\theta}$ in the complex plane. This gave better long-context extrapolation than the sinusoidal encoding of the original Transformer. Euler's formula is not an academic artifact. It is the infrastructure of every modern LLM.
By De Moivre's theorem, $(\cos 60° + i \sin 60°)^3$ equals:
FFT, RoPE, and complex-valued networks
**FFT (Cooley-Tukey, 1965)** is one of the most important algorithms in the history of CS. Before it, the discrete Fourier transform required $O(n^2)$ operations. Cooley and Tukey noticed that evaluating a polynomial at $n$ roots of unity recursively splits into two evaluations at $n/2$ roots. The result: $O(n \log n)$. The difference between $n^2$ and $n \log n$ at $n = 10^6$ is from one hour to one second.
| Application | How complex math is used | Scale |
|---|---|---|
| MP3/AAC compression | MDCT via FFT - isolates inaudible frequencies | Billions of devices |
| JPEG/AVIF | DCT of 8x8 blocks - discrete analogue of FFT | Every image on the web |
| 5G OFDM | Signal = IFFT(complex symbols), demodulation = FFT | Every smartphone |
| MRI scanning | k-space data is the complex Fourier transform of the cross-section | Medical diagnostics |
| RoPE in transformers | $e^{im\theta}$ - token position as rotation angle | LLaMA, Mistral, Gemma |
**Random Fourier features (Rahimi-Recht 2007)** bridge kernel methods and neural networks. Bochner's theorem: any stationary kernel (RBF, Matérn) can be written as an integral over $e^{i\omega^T x}$. Sample random frequencies $\omega$ - get a linear approximation to a nonlinear kernel. This scaled SVMs to millions of points and provided theoretical connections to the first layer of a neural network.
How many roots (in $\mathbb{C}$, counting multiplicity) does $x^5 - 1$ have?
Key ideas
- **$z = a + bi$**: the complex plane extends the number line to 2D. $|z| = \sqrt{a^2+b^2}$, $z\bar{z} = |z|^2$
- **Multiplication = rotation + scaling**: moduli multiply, arguments add. Multiplying by $e^{i\theta}$ is a pure rotation
- **Euler's formula** $e^{i\theta} = \cos\theta + i\sin\theta$: the foundation of FFT, RoPE in transformers, and Fourier features
- **Fundamental Theorem**: a degree-$n$ polynomial has exactly $n$ roots in $\mathbb{C}$. Real polynomials have complex-conjugate pairs
What comes next
Complex numbers thread through the rest of the course and modern ML systems:
- Polynomials — Real polynomial roots can be complex - the motivation for extending from R to C
- Eigenvalues and Eigenvectors — Complex eigenvalues signal rotational dynamics. Spectral radius determines RNN stability
- Sequences — Geometric series in C: foundation of spectral analysis and FFT convergence proofs
Вопросы для размышления
- Why does FFT work best when signal length is a power of two? What happens with arbitrary lengths, as in numpy.fft.fft?
- Complex numbers were invented to solve $x^2 + 1 = 0$. How did they end up essential in signal processing - a seemingly unrelated domain?
- RoPE encodes token position as a rotation angle in the complex plane. Why does this give better long-context extrapolation than the sinusoidal encoding of the original Transformer?
Связанные уроки
- alg-04 — Polynomials over R have complex roots - the motivation for extending to C
- alg-10 — Complex eigenvalues of matrices: analyzing RNN dynamics and diffusion models
- calc-01-sequences — Geometric series in C: the foundation of spectral analysis and FFT
- prob-04-bayes — Random Fourier features (Rahimi-Recht 2007) connect kernel methods to neural nets via complex exponentials
- la-01-vectors-intro