Discrete Mathematics
Combinatorics: Counting the Space of Possibilities
GPT-4 beam search at k=5: at each step, 50257 token candidates. Over 1000 tokens - (50257)^1000 possible paths. More than atoms in the universe raised to the power of 70. Combinatorial explosion is a real constraint every LLM navigates with heuristics. And an 8-character alphanumeric password gives 62^8 - roughly 2×10^14. A 20-digit PIN gives 10^20, 457 times larger. Combinatorics does not tolerate intuition.
- **Password security:** entropy is log2(number of options), computed via the product rule. 62^8 vs 10^20 - the numbers decide
- **Birthday attack on hashes:** among C(n,2) hash pairs the collision probability spikes at n ~ 2^(m/2). Why MD5 (2^64) is broken and SHA-256 (2^128) is not
- **Hyperparameter search:** 10 parameters x 10 values = 10^10 combinations. RandomSearch samples C(n,k) subspaces and outperforms GridSearch with the same budget
Предварительные знания
Product and Sum Rules: the foundation of counting
GPT-4 beam search at width k=5 considers 5 candidate tokens from a vocabulary of 50257 at each step. For a sequence of 1000 tokens - that is (50257)^1000 theoretical paths. A number so large it does not fit in standard notation. Yet the model responds within seconds. The product rule works in both directions: it explains the combinatorial explosion and how to circumvent it.
**Product rule (AND):** sequential independent choices multiply. |A × B| = |A| · |B| **Sum rule (OR):** mutually exclusive alternatives add. |A ∪ B| = |A| + |B|, when A ∩ B = ∅
Password length matters more than alphabet complexity - a direct consequence of the product rule. Adding one character multiplies the total count by the entire alphabet size. Expanding the alphabet from 62 to 72 characters at length 8 yields only (72/62)^8, roughly 3.3 times more options. Adding one character with the same alphabet multiplies by 62.
**Practical rule:** the logarithm of the number of options is its entropy in bits. 2^64, roughly 1.8 × 10^19, is the approximate lower bound for brute-force resistance at modern hardware speeds. A 12-character random password from a 62-character alphabet yields around 71 bits - sufficient for most threat models.
How many 4-digit PIN codes exist (each digit from 0 to 9)?
Permutations: n! ways to order
1654. Pascal and Fermat exchange letters about a gambling problem: how many ways can a certain sequence of outcomes occur? The first systematic counting of orderings. Three hundred and seventy years later the same formulas count the search space of neural networks, routes for TSP solvers, and permutation invariants in attention mechanisms.
**Permutation of n elements:** P(n) = n! = n · (n-1) · (n-2) · ... · 2 · 1 **k-permutation from n** (k objects from n, order matters): P(n, k) = n! / (n-k)! = n · (n-1) · ... · (n-k+1) Mnemonic: **order matters** - use permutations.
| Task | Formula | Example |
|---|---|---|
| All orderings of n elements | n! | TSP routes: (n-1)! options |
| k from n (order matters) | P(n,k) = n!/(n-k)! | Top-3 from 10: P(10,3)=720 |
| With repetition | n^k | k-digit PIN: 10^k |
| n elements with repeated groups | n!/(n1!·n2!·...) | "MISSISSIPPI": 11!/(4!·4!·2!·1!) |
**n! grows catastrophically fast.** At n=20 there are roughly 2.4 × 10^18 routes. At 10^9 operations per second that is 3.8 billion years. This is why TSP, optimal scheduling, and exhaustive hyperparameter search are all infeasible by brute force - heuristics, evolutionary algorithms, and random search exist precisely because of this.
How many ways are there to assign 3 distinct roles (tech lead, reviewer, tester) to a team of 8 people?
Combinations C(n,k): when order does not matter
Hyperparameter search is a canonical combinatorics problem. A model with 10 hyperparameters, each with 10 values - GridSearch evaluates 10^10 combinations. RandomSearch samples randomly from this space without replacement. It turns out random search finds good solutions faster on average - a mathematical fact following from the geometry of C(n,k) spaces (Bergstra and Bengio, 2012).
**Number of combinations:** C(n, k) = n! / (k! · (n-k)!) Read as "n choose k" or "binomial coefficient". Relationship to permutations: C(n, k) = P(n, k) / k! Dividing by k! because k! permutations of the same group represent one combination.
| Situation | Formula | Order matters? |
|---|---|---|
| Choose k from n, no repetition | C(n,k) | No |
| Assign k roles from n people | P(n,k) | Yes |
| k from n, repetition allowed | C(n+k-1, k) | No (stars & bars) |
| All orderings of n elements | n! | Yes |
**Pascal's triangle:** C(n,k) = C(n-1,k-1) + C(n-1,k). Each element is either included in the subset (C(n-1,k-1)) or excluded (C(n-1,k)). This recurrence is the core of dynamic programming for counting problems. The same structure appears in attention head analysis in transformer interpretability.
A team has 6 people. How many unique pairs can be formed for pair programming?
Applications: entropy, beam search, birthday attack
An 8-character alphanumeric password yields 62^8, roughly 2.18 × 10^14 options. A 20-digit PIN yields 10^20. The second is 457 times larger. Intuition says: 20 characters must be more secure. Combinatorics says: look at the base of the exponent, not just the exponent. This is exactly where most corporate password policies fail.
Data augmentation in computer vision is another combinatorial space. For a 224×224 image: random crops give C(n,k) options, horizontal flips add a factor of 2, rotations cover 360 discrete degrees, brightness shifts span a continuous range. The number of unique augmentations is effectively unbounded - which is exactly why models trained with augmentation do not overfit even after hundreds of epochs.
"Random search is worse than systematic - it might miss the best point"
For problems with many hyperparameters, RandomSearch is statistically more efficient than GridSearch at the same iteration budget
Most hyperparameters contribute little to model performance. RandomSearch covers more values of the important dimensions by sampling them independently - this is a mathematical result from Bergstra and Bengio 2012, not a heuristic.
How many ways are there to buy 4 drinks choosing from 3 types (repeats allowed)?
Key Ideas
- **Product rule:** sequential independent choices multiply. Sum rule: mutually exclusive alternatives add. This explains beam search cost and password entropy
- **Permutations:** when order matters. P(n,k) = n!/(n-k)!. n! grows faster than any exponential - TSP is infeasible by brute force beyond n=15
- **Combinations C(n,k):** when order does not matter. C(n,k) = P(n,k)/k!. Birthday attack: 50% collision probability at n ~ sqrt(2N)
- **Stars and bars C(n+k-1,k):** combinations with repetition - distribute k identical objects into n bins. Gives the DP memoization table size for resource allocation
- **RandomSearch > GridSearch** for many hyperparameters - a mathematical consequence of the geometry of combinatorial spaces, not a heuristic
Related Topics
Combinatorics underlies probability, cryptography, and algorithm analysis:
- Proof Techniques — The binomial theorem and formulas for C(n,k) are proved by induction
- Generating Functions — Ordinary generating functions encode combinatorial sequences as polynomial coefficients
- Inclusion-Exclusion Principle — PIE computes |A∪B∪...| through alternating sums of binomial coefficients
Вопросы для размышления
- Designing an API token generation system: what alphabet size and length should be chosen so that tokens are not exhausted in 10 years at 10^6 generations per day with 128-bit security?
- What is the difference between 'choose 3 features from 10 for a sprint' (combination) and 'assign priority 1, 2, 3 to three features from 10' (permutation)?
- Why does RandomSearch outperform GridSearch when hyperparameter count is large - can this be explained through C(n,k) geometry?
Связанные уроки
- prob-02-combinatorics — Probabilistic combinatorics: birthday problem, random graphs, probabilistic method
- prob-04-bayes — Bayesian updates require counting event spaces - direct application of C(n,k)
- dm-06 — Generating functions encode combinatorial sequences as polynomial coefficients
- dm-07 — Inclusion-exclusion principle is built on binomial coefficients
- calc-01-sequences — Factorial and binomial growth - limiting behavior studied via sequences
- alg-01-big-o