Discrete Mathematics

Functions: Injections, Surjections, Bijections

Every 'hash collision' error or 'duplicate key' violation in a database has mathematics behind it. SHA-256's collision resistance does not come from clever code: it comes from a codomain so vast that the pigeonhole guarantee only kicks in after an astronomically large number of attempts.

**Hash tables (HashMap):** open addressing and chaining handle unavoidable collisions once load factor exceeds ~0.7
**Cryptography (AES, RSA):** encryption functions must be bijections; without bijectivity, decryption is mathematically undefined
**Databases (UNIQUE constraint):** schema-level enforcement of injectivity for selected columns
**Birthday attacks on passwords:** without salting, identical passwords hash identically, making all accounts vulnerable simultaneously

Предварительные знания

Relations and Partial Orders

Functions: Domain, Codomain, Image

A function f: A → B is a binary relation on A × B such that every element a ∈ A is paired with exactly one element f(a) ∈ B. Three distinct concepts must not be conflated: the domain A (where inputs come from), the codomain B (the declared output type), and the image f(A) = {f(a) | a ∈ A} (where outputs actually land).

**Three separate notions:** - **Domain:** the set of valid inputs-A - **Codomain:** the declared output set-B (like a function's return type) - **Image (range):** the set of values actually produced-f(A) ⊆ B, possibly a strict subset

The distinction between image and codomain matters in type systems. A TypeScript function returning number does not return every possible number: only a subset. Dependent types (in Idris or Agda) allow specifying the exact image in the signature, turning a runtime invariant into a compile-time guarantee.

Two functions f, g: A → B are equal (f = g) when f(a) = g(a) for all a ∈ A, regardless of their implementation. But injectivity and surjectivity depend on the codomain, so the codomain must be stated explicitly.

For f: ℝ → ℝ, f(x) = x², what is the image of f?

Injections, Surjections, and Bijections

These three kinds of functions differ in how they 'cover' the codomain and how 'uniquely' they map the domain. They determine invertibility and allow comparison of infinite set sizes.

**Three definitions:** - **Injection (one-to-one):** f(a) = f(b) ⟹ a = b. Distinct inputs produce distinct outputs: no collisions. - **Surjection (onto):** ∀b ∈ B ∃a ∈ A: f(a) = b. Every codomain element is reached. - **Bijection:** both injection and surjection simultaneously. A perfect one-to-one correspondence.

Property	Condition	Image vs Codomain	Invertibility
Injection	f(a)=f(b) ⟹ a=b	Image ⊆ Codomain	Left inverse exists
Surjection	∀b ∃a: f(a)=b	Image = Codomain	Right inverse exists
Bijection	both hold	Image = Codomain, no repeats	Full inverse f⁻¹ exists

Hash functions ideally should be injective (no collisions), but for a finite codomain ({0..2³²-1}) and an infinite domain (all strings), an injection is mathematically impossible. MD5 and SHA-256 are not injections; their collision resistance is a computational approximation of injectivity.

Pigeonhole principle: if |A| > |B| and f: A → B, then f cannot be injective. A hash table with 1000 buckets and 1001 keys is guaranteed to have a collision, regardless of how good the hash function is.

AES encryption maps {0,1}¹²⁸ to {0,1}¹²⁸ for a fixed key. Which property must it satisfy?

Composition and Inverse Functions

Given f: A → B and g: B → C, their composition (g ∘ f): A → C is defined by (g ∘ f)(a) = g(f(a)). Apply f first, then g. An inverse function f⁻¹: B → A exists if and only if f is a bijection.

**Composition properties:** - If f and g are both injective, so is g ∘ f - If f and g are both surjective, so is g ∘ f - If f and g are both bijective, so is g ∘ f, and (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹

In category theory, functions are morphisms and composition is the primary operation. Functors, monads (Promise, Maybe, IO) generalize functions while preserving compositional structure. The identity (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ is used in cryptography to invert compound ciphers.

f: A → B is injective. g: B → C is surjective. What can be said about g ∘ f?

The Pigeonhole Principle

The pigeonhole principle: if n+1 objects are placed into n boxes, at least one box contains two or more objects. Formally: if |A| > |B|, there is no injection f: A → B.

**Generalized pigeonhole:** if kn+r objects (0 < r ≤ n) are placed into n boxes, at least one box contains at least k+1 objects. This gives a lower bound on the maximum load of any box.

The birthday paradox: among 23 people, the probability that two share a birthday exceeds 50%. This is the pigeonhole principle applied probabilistically. Without password salting, birthday attacks work against an entire database at once, since unsalted identical passwords produce identical hashes.

LeetCode 287 'Find the Duplicate Number': the existence of a duplicate in an array of n+1 numbers from [1..n] is proven by pigeonhole. Finding it in O(1) space uses Floyd's cycle detection on the function graph, but the existence proof is pure discrete math.

A hash function returns a 32-bit value (2³² ≈ 4 billion possible outputs). How many files must be hashed to guarantee a collision?

Key Ideas

**Function f: A → B:** a single-valued mapping; domain, codomain, and image are three distinct things
**Injection** (no collisions): f(a)=f(b) ⟹ a=b; impossible when |A| > |B|
**Surjection** (onto): image = codomain; every element of B is reached
**Bijection** = injection + surjection: f⁻¹ exists, |A| = |B|; required for encryption
**Pigeonhole principle:** |A| > |B| guarantees a collision for any f: A → B

Вопросы для размышления

Python's built-in hash() is not injective by design, and its values change between interpreter restarts (PYTHONHASHSEED). How does this affect practical collision resistance, and why is this intentional?
TypeScript's return type number describes the codomain but not the image. How do branded types or newtype patterns (e.g., UserId instead of number) let you narrow the effective codomain? Give a concrete example.
OAuth access token generation must behave like an injection. What happens if two users receive the same token? Which techniques guarantee uniqueness at scale?

Связанные уроки

plt-06-lambda-calculus