Discrete Mathematics
Functions: Injections, Surjections, Bijections
Every 'hash collision' error or 'duplicate key' violation in a database has mathematics behind it. SHA-256's collision resistance does not come from clever code: it comes from a codomain so vast that the pigeonhole guarantee only kicks in after an astronomically large number of attempts.
- **Hash tables (HashMap):** open addressing and chaining handle unavoidable collisions once load factor exceeds ~0.7
- **Cryptography (AES, RSA):** encryption functions must be bijections; without bijectivity, decryption is mathematically undefined
- **Databases (UNIQUE constraint):** schema-level enforcement of injectivity for selected columns
- **Birthday attacks on passwords:** without salting, identical passwords hash identically, making all accounts vulnerable simultaneously
Предварительные знания
Functions: Domain, Codomain, Image
A function f: A → B is a binary relation on A × B such that every element a ∈ A is paired with exactly one element f(a) ∈ B. Three distinct concepts must not be conflated: the domain A (where inputs come from), the codomain B (the declared output type), and the image f(A) = {f(a) | a ∈ A} (where outputs actually land).
**Three separate notions:** - **Domain:** the set of valid inputs-A - **Codomain:** the declared output set-B (like a function's return type) - **Image (range):** the set of values actually produced-f(A) ⊆ B, possibly a strict subset
The distinction between image and codomain matters in type systems. A TypeScript function returning number does not return every possible number: only a subset. Dependent types (in Idris or Agda) allow specifying the exact image in the signature, turning a runtime invariant into a compile-time guarantee.
Two functions f, g: A → B are equal (f = g) when f(a) = g(a) for all a ∈ A, regardless of their implementation. But injectivity and surjectivity depend on the codomain, so the codomain must be stated explicitly.
For f: ℝ → ℝ, f(x) = x², what is the image of f?
Injections, Surjections, and Bijections
These three kinds of functions differ in how they 'cover' the codomain and how 'uniquely' they map the domain. They determine invertibility and allow comparison of infinite set sizes.
**Three definitions:** - **Injection (one-to-one):** f(a) = f(b) ⟹ a = b. Distinct inputs produce distinct outputs: no collisions. - **Surjection (onto):** ∀b ∈ B ∃a ∈ A: f(a) = b. Every codomain element is reached. - **Bijection:** both injection and surjection simultaneously. A perfect one-to-one correspondence.
| Property | Condition | Image vs Codomain | Invertibility |
|---|---|---|---|
| Injection | f(a)=f(b) ⟹ a=b | Image ⊆ Codomain | Left inverse exists |
| Surjection | ∀b ∃a: f(a)=b | Image = Codomain | Right inverse exists |
| Bijection | both hold | Image = Codomain, no repeats | Full inverse f⁻¹ exists |
Hash functions ideally should be injective (no collisions), but for a finite codomain ({0..2³²-1}) and an infinite domain (all strings), an injection is mathematically impossible. MD5 and SHA-256 are not injections; their collision resistance is a computational approximation of injectivity.
Pigeonhole principle: if |A| > |B| and f: A → B, then f cannot be injective. A hash table with 1000 buckets and 1001 keys is guaranteed to have a collision, regardless of how good the hash function is.
AES encryption maps {0,1}¹²⁸ to {0,1}¹²⁸ for a fixed key. Which property must it satisfy?
Composition and Inverse Functions
Given f: A → B and g: B → C, their composition (g ∘ f): A → C is defined by (g ∘ f)(a) = g(f(a)). Apply f first, then g. An inverse function f⁻¹: B → A exists if and only if f is a bijection.
**Composition properties:** - If f and g are both injective, so is g ∘ f - If f and g are both surjective, so is g ∘ f - If f and g are both bijective, so is g ∘ f, and (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹
In category theory, functions are morphisms and composition is the primary operation. Functors, monads (Promise, Maybe, IO) generalize functions while preserving compositional structure. The identity (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ is used in cryptography to invert compound ciphers.
f: A → B is injective. g: B → C is surjective. What can be said about g ∘ f?
The Pigeonhole Principle
The pigeonhole principle: if n+1 objects are placed into n boxes, at least one box contains two or more objects. Formally: if |A| > |B|, there is no injection f: A → B.
**Generalized pigeonhole:** if kn+r objects (0 < r ≤ n) are placed into n boxes, at least one box contains at least k+1 objects. This gives a lower bound on the maximum load of any box.
The birthday paradox: among 23 people, the probability that two share a birthday exceeds 50%. This is the pigeonhole principle applied probabilistically. Without password salting, birthday attacks work against an entire database at once, since unsalted identical passwords produce identical hashes.
LeetCode 287 'Find the Duplicate Number': the existence of a duplicate in an array of n+1 numbers from [1..n] is proven by pigeonhole. Finding it in O(1) space uses Floyd's cycle detection on the function graph, but the existence proof is pure discrete math.
A hash function returns a 32-bit value (2³² ≈ 4 billion possible outputs). How many files must be hashed to guarantee a collision?
Key Ideas
- **Function f: A → B:** a single-valued mapping; domain, codomain, and image are three distinct things
- **Injection** (no collisions): f(a)=f(b) ⟹ a=b; impossible when |A| > |B|
- **Surjection** (onto): image = codomain; every element of B is reached
- **Bijection** = injection + surjection: f⁻¹ exists, |A| = |B|; required for encryption
- **Pigeonhole principle:** |A| > |B| guarantees a collision for any f: A → B
Related Topics
Functions connect sets and underlie most mathematical structures in CS:
- Relations — A function is a binary relation with the extra uniqueness condition: each input maps to exactly one output
- Modular Arithmetic — RSA uses bijective functions in the ring ℤₙ; hash tables use x mod n as a surjection onto bucket indices
- Combinatorics — Counting injections gives permutations P(n,k); counting bijections gives n!; counting surjections uses Stirling numbers
Вопросы для размышления
- Python's built-in hash() is not injective by design, and its values change between interpreter restarts (PYTHONHASHSEED). How does this affect practical collision resistance, and why is this intentional?
- TypeScript's return type number describes the codomain but not the image. How do branded types or newtype patterns (e.g., UserId instead of number) let you narrow the effective codomain? Give a concrete example.
- OAuth access token generation must behave like an injection. What happens if two users receive the same token? Which techniques guarantee uniqueness at scale?