Cryptography
Number Theory for Cryptography
Цели урока
- Understand how the prime number theorem makes RSA key generation practical
- Explain the asymmetry of difficulty between multiplication and factorization that RSA relies on
- Apply Fermat's and Euler's theorems to modular arithmetic computations
- Understand the role of Euler's totient in RSA key generation and why computing it requires factorization
Предварительные знания
2048 bits. That's the RSA key guarding an HTTPS connection right now. RSA-829 (829 bits) fell in 2020 after 2700 core-years of compute. RSA-2048 still stands. Between them sits no linear gap but a subexponential chasm in difficulty. The math that makes this possible boils down to one equation: $n = p \cdot q$.
- **RSA key generation** - every time a browser establishes an HTTPS connection, the server presents a certificate with an RSA key built from two large primes. The prime number theorem guarantees generation takes milliseconds
- **Cryptocurrencies** - Bitcoin and Ethereum use elliptic-curve cryptography whose security relies on analogous number-theory problems. Euler's totient and Fermat's theorem underlie finite-field arithmetic
- **Post-quantum cryptography** - NIST standardized CRYSTALS-Kyber and CRYSTALS-Dilithium in 2024 because Shor's algorithm on a quantum computer breaks factorization in polynomial time
Rivest, Shamir, Adleman - and a letter from the past
In 1977 three MIT scientists published the RSA algorithm. Rivest cracked the math in a single night after a party. Shamir and Adleman wrestled with the proofs. The paper was ready inside a week. Three centuries earlier - in 1640 - Pierre de Fermat wrote his theorem in a letter margin without a proof. Euler proved it a hundred years later. RSA put both to work 337 years after Fermat. Number theory has no expiration date.
Prime Numbers
Every HTTPS request kicks off with a handshake where the server presents an RSA-2048 key - the product of two random 1024-bit primes. Over the past 25 years, supercomputers cracked RSA-768 (2009, 2 years of compute) and RSA-829 (2020, 2700 core-years). RSA-2048 still stands untouched. The foundation: prime numbers and their properties.
A prime is a natural number greater than 1 divisible only by 1 and itself. The **fundamental theorem of arithmetic**: every natural number greater than 1 factors into primes in exactly one way. 84 = 2 * 2 * 3 * 7 - no other way exists. Primes are the atoms of integers, and every public-key cryptosystem rides on their properties.
How many primes exist? Infinitely many - Euclid proved it over 2300 years ago. The **prime number theorem** (proved in 1896): the number of primes up to n is approximately n / ln(n). Among 1024-bit numbers, roughly every 710th is prime (ln(2^1024) ~ 710). Generating large random primes is a practical task: pick a random odd number of the right length and run Miller-Rabin.
**Miller-Rabin Test - Probabilistic Primality Test:** Deterministic testing (trial division up to sqrt(n)) is hopeless for 1024-bit numbers. Miller-Rabin steps in instead: 1. Pick random a in [2, n-2] 2. Check a specific property of a relative to n 3. Property fails - n is composite (100% certain) 4. Property holds - n is **probably** prime Each round shrinks the error probability by a factor of 4. After 40 rounds: < 2^(-80) - far below the probability of a hardware fault.
Real cryptographic libraries (OpenSSL, libsodium) layer on extra checks: the number cannot sit too close to another prime (defense against Fermat's factorization attack), and p-1 must contain a large prime factor (defense against Pollard's p-1 algorithm). Every RSA key's security rides on the quality of these numbers.
RSA-2048 uses two 1024-bit primes. Why does generating such large primes take less than a second?
Factorization
Multiplying two 1024-bit primes takes microseconds. The reverse - recovering those factors from their product - has no known efficient algorithm on classical hardware. This **asymmetry of difficulty** (easy forward, brutal in reverse) is RSA's bedrock. Not impossible - just practically infeasible.
**Pollard rho** rides the birthday paradox: the sequence $x_{i+1} = x_i^2 + 1 \bmod n$ forms a cycle, and the cycle entry point is tied to a factor of n. Complexity O(n^(1/4)) - radically faster than trial division, still exponential at RSA scale. Each jump in the progression - trial division to Pollard to GNFS - is a revolution, not an evolution.
**RSA Factoring Challenge - a competition in factorization:** RSA Laboratories published challenge numbers: - **RSA-129** (426 bits, 1977) - factored in 1994, 17 years - **RSA-768** (768 bits, 2009) - 2 years on a cluster of hundreds of machines - **RSA-250** (829 bits, 2020) - ~2700 core-years of computation - **RSA-2048** (2048 bits) - **unfactored**, prize USD 200,000 GNFS estimate for RSA-2048: ~2^112 operations. At 10^18 ops/sec that is ~10^15 years - 100,000 times the age of the universe. Factorization is not impossible - it is **practically infeasible** at sufficient key size.
Two problems must not be confused: **primality testing** (given n, is it prime?) and **factorization** (given n, what are its factors?). Primality testing sits in class P (polynomial AKS algorithm, 2002). Factorization is conjectured harder, but no formal proof rules out a polynomial algorithm. That "conjectured" is RSA's Achilles' heel: a fast factorization algorithm would topple the digital world overnight. Nobody has come close in decades of research.
RSA-250 (829 bits) was factored in 2020 using ~2700 core-years. RSA-2048 remains unbroken. Why does doubling the key length make the problem incomparably harder?
Fermat's Little Theorem and Euler's Theorem
**Fermat's Little Theorem** (1640): if p is prime and a is not divisible by p, then $a^{p-1} \equiv 1 \pmod{p}$. Example: $2^6 = 64 \equiv 1 \pmod{7}$. Or $3^4 = 81 \equiv 1 \pmod{5}$. No mathematical curiosity - this is the engine behind fast modular exponentiation, primality tests, and the RSA correctness proof. Fermat scribbled it in 1640 in a letter margin. Proof not included.
Fermat's theorem only works for prime moduli. RSA uses a composite modulus n = p*q. **Euler's theorem** generalizes to any modulus: if gcd(a, n) = 1, then $a^{\varphi(n)} \equiv 1 \pmod{n}$, where $\varphi(n)$ is Euler's totient. For a prime p: $\varphi(p) = p-1$ and Euler's theorem collapses back to Fermat's. For n = p*q: $\varphi(n) = (p-1)(q-1)$ - this value plugs straight into RSA key generation.
**Carmichael Numbers - the Fermat test trap:** Fermat test: if $a^{n-1} \not\equiv 1 \pmod{n}$, then n is definitely composite. But composite numbers exist that pass the Fermat test for EVERY a! These are **Carmichael numbers**: 561 = 3 * 11 * 17 is the smallest. For every a with gcd(a, 561) = 1, $a^{560} \equiv 1 \pmod{561}$. Hence the plain Fermat test is unreliable. Miller-Rabin replaces it - catching composite numbers (including Carmichael numbers) with probability >= 3/4 per round.
**Fast modular exponentiation** - Python's pow(a, e, n) - uses square-and-multiply, reducing $a^e \bmod n$ to $O(\log e)$ multiplications. Euler's theorem cuts the exponent down first: $a^e \bmod n = a^{e \bmod \varphi(n)} \bmod n$ when gcd(a, n) = 1. Critical for RSA where exponents run thousands of bits long.
561 = 3 * 11 * 17 passes the Fermat test for all a coprime to 561. What follows from this?
Euler's Totient Function
**Euler's totient $\varphi(n)$** counts the integers from 1 to n that are coprime to n. $\varphi(12) = 4$ because only 4 numbers in {1, 2, ..., 12} are coprime to 12: {1, 5, 7, 11}. For a prime p: $\varphi(p) = p - 1$, since every number from 1 to p-1 is coprime to p. The totient function is the bridge from Euler's theorem to RSA in practice. Without it, RSA looks like a random sequence of operations with no explanation of why encryption is invertible.
Why $\varphi(pq) = (p-1)(q-1)$? Out of n = pq numbers, subtract those NOT coprime to n. Numbers divisible by p: q of them. Numbers divisible by q: p of them. pq is divisible by both - counted twice. Inclusion-exclusion gives: $\varphi(n) = n - q - p + 1 = (p-1)(q-1)$. This formula is **critical**: with the factorization in hand, $\varphi(n)$ is a one-liner. Without it, computing $\varphi(n)$ is just as hard as factoring n.
**Can phi(n) be computed without factorization?** Given phi(n) and n, recovering p and q is straightforward: - $\varphi(n) = n - p - q + 1$, so $p + q = n - \varphi(n) + 1$ - Sum and product of p and q in hand - solve $x^2 - (p+q) \cdot x + n = 0$ So computing $\varphi(n)$ is **computationally equivalent** to factoring n. A fast way to compute $\varphi(n)$ without knowing p and q would break RSA.
**The totient function is the bridge between number theory and RSA.** No $\varphi(n)$, no private key d (d = e^(-1) mod phi(n)). Computing $\varphi(n)$ demands the factorization. Factorization is computationally infeasible. This chain of dependencies is the mathematical foundation guarding every bank transaction, email, and medical record.
Large numbers cannot be factored in principle - RSA is absolutely secure forever
A quantum computer running Shor's algorithm can theoretically factor any number in polynomial time - which is exactly why post-quantum cryptography is being developed
Shor's algorithm (1994) solves factorization in O((log n)^3) on a quantum computer. Current quantum computers are too small (thousands of logical qubits needed), but progress is happening. NIST standardized post-quantum algorithms in 2024 (CRYSTALS-Kyber, CRYSTALS-Dilithium). RSA security is not eternal - it is grounded in current technological limitations.
In RSA the public key is (e, n) where n = p*q. Why does knowing n not allow computing the private key d?
Key Takeaways
- **Primes - the atoms of arithmetic:** the fundamental theorem guarantees unique factorization, and the prime number theorem (~n/ln(n)) ensures sufficient density even among 1024-bit numbers - generation takes milliseconds thanks to Miller-Rabin
- **Factorization - RSA's computational barrier:** the best classical algorithm (GNFS) is subexponential, RSA-2048 would require ~2^112 operations. Primality testing is in class P, factorization presumably is not. This asymmetry is the foundation of public-key cryptography
- **Fermat and Euler - RSA's mathematical engine:** $a^{\varphi(n)} \equiv 1 \pmod{n}$ guarantees encryption is invertible: $m^{e \cdot d} \equiv m \pmod{n}$. Carmichael numbers are why Miller-Rabin is needed over the simple Fermat test
- **Euler's totient - the keystone:** $\varphi(pq) = (p-1)(q-1)$ is used to compute the private key $d = e^{-1} \bmod \varphi(n)$, and computing $\varphi(n)$ without factoring n is computationally equivalent to breaking RSA
Related Topics
Number theory is the mathematical foundation of asymmetric cryptography, linking abstract algebra to practical security:
- RSA Mathematics — Direct application of all four concepts: primes for key generation, Euler's totient for computing the private key, Euler's theorem to prove correctness
- Modular Arithmetic — The previous lesson laid the groundwork: mod operations, GCD, multiplicative inverse. Number theory builds prime properties and the totient on top of that foundation
- Elliptic Curve Mathematics — An RSA alternative built on a different number-theory structure - the group of points on an elliptic curve. If RSA relies on factorization hardness, ECC relies on discrete logarithm hardness
- Quantum Threat — Shor's algorithm destroys factorization hardness, reducing it to polynomial on a quantum computer
Вопросы для размышления
- RSA security rests on the assumption that factorization is computationally hard - but this is not formally proved. How should one assess the risk that someone finds a polynomial factorization algorithm?
- Miller-Rabin is probabilistic. Is it acceptable to use probabilistic algorithms in cryptography, or is only the deterministic AKS test sufficient?
- Euler's totient phi(n) encodes all information about the factorization of n. If an oracle could instantly compute phi(n), which other cryptosystems beyond RSA would be broken?
Связанные уроки
- nt-01 — Number theory is the foundation of RSA modular arithmetic
- nt-02 — Extended Euclidean algorithm computes the inverse element mod phi(n)
- sec-03 — JWT authentication relies on RSA/ECDSA signatures
- bc-03-merkle — Blockchain uses the same elliptic-curve number theory
- dm-01 — Discrete math is the shared language for Fermat's theorem and group theory
- alg-01 — Big-O analysis explains why GNFS is subexponential
- qc-01 — Shor's algorithm on a quantum computer breaks factorization in polynomial time
- nt-03