Cryptography

Number Theory for Cryptography

Цели урока

  • Understand how the prime number theorem makes RSA key generation practical
  • Explain the asymmetry of difficulty between multiplication and factorization that RSA relies on
  • Apply Fermat's and Euler's theorems to modular arithmetic computations
  • Understand the role of Euler's totient in RSA key generation and why computing it requires factorization

Предварительные знания

  • Modular Arithmetic

2048 bits. That's the RSA key guarding an HTTPS connection right now. RSA-829 (829 bits) fell in 2020 after 2700 core-years of compute. RSA-2048 still stands. Between them sits no linear gap but a subexponential chasm in difficulty. The math that makes this possible boils down to one equation: $n = p \cdot q$.

  • **RSA key generation** - every time a browser establishes an HTTPS connection, the server presents a certificate with an RSA key built from two large primes. The prime number theorem guarantees generation takes milliseconds
  • **Cryptocurrencies** - Bitcoin and Ethereum use elliptic-curve cryptography whose security relies on analogous number-theory problems. Euler's totient and Fermat's theorem underlie finite-field arithmetic
  • **Post-quantum cryptography** - NIST standardized CRYSTALS-Kyber and CRYSTALS-Dilithium in 2024 because Shor's algorithm on a quantum computer breaks factorization in polynomial time

Rivest, Shamir, Adleman - and a letter from the past

In 1977 three MIT scientists published the RSA algorithm. Rivest cracked the math in a single night after a party. Shamir and Adleman wrestled with the proofs. The paper was ready inside a week. Three centuries earlier - in 1640 - Pierre de Fermat wrote his theorem in a letter margin without a proof. Euler proved it a hundred years later. RSA put both to work 337 years after Fermat. Number theory has no expiration date.

Prime Numbers

Every HTTPS request kicks off with a handshake where the server presents an RSA-2048 key - the product of two random 1024-bit primes. Over the past 25 years, supercomputers cracked RSA-768 (2009, 2 years of compute) and RSA-829 (2020, 2700 core-years). RSA-2048 still stands untouched. The foundation: prime numbers and their properties.

A prime is a natural number greater than 1 divisible only by 1 and itself. The **fundamental theorem of arithmetic**: every natural number greater than 1 factors into primes in exactly one way. 84 = 2 * 2 * 3 * 7 - no other way exists. Primes are the atoms of integers, and every public-key cryptosystem rides on their properties.

How many primes exist? Infinitely many - Euclid proved it over 2300 years ago. The **prime number theorem** (proved in 1896): the number of primes up to n is approximately n / ln(n). Among 1024-bit numbers, roughly every 710th is prime (ln(2^1024) ~ 710). Generating large random primes is a practical task: pick a random odd number of the right length and run Miller-Rabin.

**Miller-Rabin Test - Probabilistic Primality Test:** Deterministic testing (trial division up to sqrt(n)) is hopeless for 1024-bit numbers. Miller-Rabin steps in instead: 1. Pick random a in [2, n-2] 2. Check a specific property of a relative to n 3. Property fails - n is composite (100% certain) 4. Property holds - n is **probably** prime Each round shrinks the error probability by a factor of 4. After 40 rounds: < 2^(-80) - far below the probability of a hardware fault.

Real cryptographic libraries (OpenSSL, libsodium) layer on extra checks: the number cannot sit too close to another prime (defense against Fermat's factorization attack), and p-1 must contain a large prime factor (defense against Pollard's p-1 algorithm). Every RSA key's security rides on the quality of these numbers.

RSA-2048 uses two 1024-bit primes. Why does generating such large primes take less than a second?

Factorization

Multiplying two 1024-bit primes takes microseconds. The reverse - recovering those factors from their product - has no known efficient algorithm on classical hardware. This **asymmetry of difficulty** (easy forward, brutal in reverse) is RSA's bedrock. Not impossible - just practically infeasible.

**Pollard rho** rides the birthday paradox: the sequence $x_{i+1} = x_i^2 + 1 \bmod n$ forms a cycle, and the cycle entry point is tied to a factor of n. Complexity O(n^(1/4)) - radically faster than trial division, still exponential at RSA scale. Each jump in the progression - trial division to Pollard to GNFS - is a revolution, not an evolution.

**RSA Factoring Challenge - a competition in factorization:** RSA Laboratories published challenge numbers: - **RSA-129** (426 bits, 1977) - factored in 1994, 17 years - **RSA-768** (768 bits, 2009) - 2 years on a cluster of hundreds of machines - **RSA-250** (829 bits, 2020) - ~2700 core-years of computation - **RSA-2048** (2048 bits) - **unfactored**, prize USD 200,000 GNFS estimate for RSA-2048: ~2^112 operations. At 10^18 ops/sec that is ~10^15 years - 100,000 times the age of the universe. Factorization is not impossible - it is **practically infeasible** at sufficient key size.

Two problems must not be confused: **primality testing** (given n, is it prime?) and **factorization** (given n, what are its factors?). Primality testing sits in class P (polynomial AKS algorithm, 2002). Factorization is conjectured harder, but no formal proof rules out a polynomial algorithm. That "conjectured" is RSA's Achilles' heel: a fast factorization algorithm would topple the digital world overnight. Nobody has come close in decades of research.

RSA-250 (829 bits) was factored in 2020 using ~2700 core-years. RSA-2048 remains unbroken. Why does doubling the key length make the problem incomparably harder?

Fermat's Little Theorem and Euler's Theorem

**Fermat's Little Theorem** (1640): if p is prime and a is not divisible by p, then $a^{p-1} \equiv 1 \pmod{p}$. Example: $2^6 = 64 \equiv 1 \pmod{7}$. Or $3^4 = 81 \equiv 1 \pmod{5}$. No mathematical curiosity - this is the engine behind fast modular exponentiation, primality tests, and the RSA correctness proof. Fermat scribbled it in 1640 in a letter margin. Proof not included.

Fermat's theorem only works for prime moduli. RSA uses a composite modulus n = p*q. **Euler's theorem** generalizes to any modulus: if gcd(a, n) = 1, then $a^{\varphi(n)} \equiv 1 \pmod{n}$, where $\varphi(n)$ is Euler's totient. For a prime p: $\varphi(p) = p-1$ and Euler's theorem collapses back to Fermat's. For n = p*q: $\varphi(n) = (p-1)(q-1)$ - this value plugs straight into RSA key generation.

**Carmichael Numbers - the Fermat test trap:** Fermat test: if $a^{n-1} \not\equiv 1 \pmod{n}$, then n is definitely composite. But composite numbers exist that pass the Fermat test for EVERY a! These are **Carmichael numbers**: 561 = 3 * 11 * 17 is the smallest. For every a with gcd(a, 561) = 1, $a^{560} \equiv 1 \pmod{561}$. Hence the plain Fermat test is unreliable. Miller-Rabin replaces it - catching composite numbers (including Carmichael numbers) with probability >= 3/4 per round.

**Fast modular exponentiation** - Python's pow(a, e, n) - uses square-and-multiply, reducing $a^e \bmod n$ to $O(\log e)$ multiplications. Euler's theorem cuts the exponent down first: $a^e \bmod n = a^{e \bmod \varphi(n)} \bmod n$ when gcd(a, n) = 1. Critical for RSA where exponents run thousands of bits long.

561 = 3 * 11 * 17 passes the Fermat test for all a coprime to 561. What follows from this?

Euler's Totient Function

**Euler's totient $\varphi(n)$** counts the integers from 1 to n that are coprime to n. $\varphi(12) = 4$ because only 4 numbers in {1, 2, ..., 12} are coprime to 12: {1, 5, 7, 11}. For a prime p: $\varphi(p) = p - 1$, since every number from 1 to p-1 is coprime to p. The totient function is the bridge from Euler's theorem to RSA in practice. Without it, RSA looks like a random sequence of operations with no explanation of why encryption is invertible.

Why $\varphi(pq) = (p-1)(q-1)$? Out of n = pq numbers, subtract those NOT coprime to n. Numbers divisible by p: q of them. Numbers divisible by q: p of them. pq is divisible by both - counted twice. Inclusion-exclusion gives: $\varphi(n) = n - q - p + 1 = (p-1)(q-1)$. This formula is **critical**: with the factorization in hand, $\varphi(n)$ is a one-liner. Without it, computing $\varphi(n)$ is just as hard as factoring n.

**Can phi(n) be computed without factorization?** Given phi(n) and n, recovering p and q is straightforward: - $\varphi(n) = n - p - q + 1$, so $p + q = n - \varphi(n) + 1$ - Sum and product of p and q in hand - solve $x^2 - (p+q) \cdot x + n = 0$ So computing $\varphi(n)$ is **computationally equivalent** to factoring n. A fast way to compute $\varphi(n)$ without knowing p and q would break RSA.

**The totient function is the bridge between number theory and RSA.** No $\varphi(n)$, no private key d (d = e^(-1) mod phi(n)). Computing $\varphi(n)$ demands the factorization. Factorization is computationally infeasible. This chain of dependencies is the mathematical foundation guarding every bank transaction, email, and medical record.

Large numbers cannot be factored in principle - RSA is absolutely secure forever

A quantum computer running Shor's algorithm can theoretically factor any number in polynomial time - which is exactly why post-quantum cryptography is being developed

Shor's algorithm (1994) solves factorization in O((log n)^3) on a quantum computer. Current quantum computers are too small (thousands of logical qubits needed), but progress is happening. NIST standardized post-quantum algorithms in 2024 (CRYSTALS-Kyber, CRYSTALS-Dilithium). RSA security is not eternal - it is grounded in current technological limitations.

In RSA the public key is (e, n) where n = p*q. Why does knowing n not allow computing the private key d?

Key Takeaways

  • **Primes - the atoms of arithmetic:** the fundamental theorem guarantees unique factorization, and the prime number theorem (~n/ln(n)) ensures sufficient density even among 1024-bit numbers - generation takes milliseconds thanks to Miller-Rabin
  • **Factorization - RSA's computational barrier:** the best classical algorithm (GNFS) is subexponential, RSA-2048 would require ~2^112 operations. Primality testing is in class P, factorization presumably is not. This asymmetry is the foundation of public-key cryptography
  • **Fermat and Euler - RSA's mathematical engine:** $a^{\varphi(n)} \equiv 1 \pmod{n}$ guarantees encryption is invertible: $m^{e \cdot d} \equiv m \pmod{n}$. Carmichael numbers are why Miller-Rabin is needed over the simple Fermat test
  • **Euler's totient - the keystone:** $\varphi(pq) = (p-1)(q-1)$ is used to compute the private key $d = e^{-1} \bmod \varphi(n)$, and computing $\varphi(n)$ without factoring n is computationally equivalent to breaking RSA

Related Topics

Number theory is the mathematical foundation of asymmetric cryptography, linking abstract algebra to practical security:

  • RSA Mathematics — Direct application of all four concepts: primes for key generation, Euler's totient for computing the private key, Euler's theorem to prove correctness
  • Modular Arithmetic — The previous lesson laid the groundwork: mod operations, GCD, multiplicative inverse. Number theory builds prime properties and the totient on top of that foundation
  • Elliptic Curve Mathematics — An RSA alternative built on a different number-theory structure - the group of points on an elliptic curve. If RSA relies on factorization hardness, ECC relies on discrete logarithm hardness
  • Quantum Threat — Shor's algorithm destroys factorization hardness, reducing it to polynomial on a quantum computer

Вопросы для размышления

  • RSA security rests on the assumption that factorization is computationally hard - but this is not formally proved. How should one assess the risk that someone finds a polynomial factorization algorithm?
  • Miller-Rabin is probabilistic. Is it acceptable to use probabilistic algorithms in cryptography, or is only the deterministic AKS test sufficient?
  • Euler's totient phi(n) encodes all information about the factorization of n. If an oracle could instantly compute phi(n), which other cryptosystems beyond RSA would be broken?

Связанные уроки

  • nt-01 — Number theory is the foundation of RSA modular arithmetic
  • nt-02 — Extended Euclidean algorithm computes the inverse element mod phi(n)
  • sec-03 — JWT authentication relies on RSA/ECDSA signatures
  • bc-03-merkle — Blockchain uses the same elliptic-curve number theory
  • dm-01 — Discrete math is the shared language for Fermat's theorem and group theory
  • alg-01 — Big-O analysis explains why GNFS is subexponential
  • qc-01 — Shor's algorithm on a quantum computer breaks factorization in polynomial time
  • nt-03
Number Theory for Cryptography

0

1

Sign In