Probability Theory
Combinatorics: the art of counting
Цели урока
- Master two golden rules: multiplication (AND) and addition (OR)
- Understand the difference between permutations, arrangements, and combinations
- Learn to choose the right formula for each situation
- Apply combinatorics to compute probabilities
Предварительные знания
- The concept of probability and sample space
- Factorial: $n! = 1 \cdot 2 \cdot 3 \cdot \ldots \cdot n$
GPT-4 picks the next token from a vocabulary of 50,257 options. One inference step - thousands of such choices in sequence. The number of possible responses to a single prompt exceeds the number of atoms in the observable universe. Yet the model finds a good answer in milliseconds - because probability collapses this explosion into something tractable. But to collapse it, the explosion must first be counted. Combinatorics is the counting.
- **LLM vocabulary:** GPT-4 vocabulary is 50,257 tokens. Beam search with width 10 explores $50257^{10}$ paths - combinatorics explains why pruning is non-negotiable
- **BPE tokenization:** Byte Pair Encoding builds its vocabulary by merging the most frequent pairs - $C_n^2$ candidates per iteration, classic combination counting
- **Passwords:** 8 characters from 62 (a-z, A-Z, 0-9) = $62^8 \approx 218$ trillion combinations - arrangement with repetitions
- **Lottery "6 from 49":** $C_{49}^6 = 13\,983\,816$ combinations. Jackpot probability: $0.000007\%$. This is why people don't win
- **System reliability:** the probability of exactly $k$ failures among $n$ independent components is computed using $C_n^k$ - the formula behind every SLA in AWS and Azure
From magic squares to Big Data
Combinatorics was studied as far back as ancient China (magic squares) and India. But its systematic treatment was given by Jacob Bernoulli in "Ars Conjectandi" (1713) - the very book where probability theory was born. The irony: he died 8 years before publication and never saw how his work changed mathematics. Today combinatorics is the foundation of cryptography, machine learning, and data analysis.
Combinatorics: the art of counting
**The handshake problem:** 30 guests at a party, each shakes hands with everyone else exactly once. How many handshakes? First instinct: $30 \times 30 = 900$. Wrong. Why - becomes clear in a few minutes.
Combinatorics is the mathematics of counting possibilities. The number of paths in beam search, the number of neural network configurations, the number of hash collisions - all of it is combinatorics. Two rules build everything else.
Why does the handshake problem with 30 guests give $\binom{30}{2}=435$ handshakes, not $30\times 30 = 900$?
$30\times 30$ counts ordered pairs and self-handshakes. The real count: $\binom{30}{2} = \frac{30\cdot 29}{2} = 435$ - choose 2 people from 30 without order.
Two golden rules: "AND" vs "OR"
Two golden rules: "AND" vs "OR"
Before formulas - two fundamental rules. They explain why some problems are solved by multiplying and others by adding, and why confusing the two gives answers off by a factor of thousands.
✖️ Multiplication rule: when "AND"
If action A **AND** action B both must be performed, and A can be done in $m$ ways while B can be done in $n$ ways, then together: $m \times n$ ways.
Morning outfit
Choosing clothes
5 shirts **AND** 3 pairs of trousers. How many outfits? $5 \times 3 = 15$ outfits Why multiply? For each shirt all 3 trouser options are available - the choices are independent. Add 4 pairs of shoes: $5 \times 3 \times 4 = 60$ outfits. The same rule governs the number of neural network configurations: if layer 1 has $m$ possible states and layer 2 has $n$, a two-layer network has $m \times n$ distinct configurations.
➕ Addition rule: when "OR"
If action A **OR** action B (not both!) is to be chosen, and A can be done in $m$ ways while B can be done in $n$ ways, then: $m + n$ ways.
Getting to the office
Alternatives
3 buses **OR** 2 trolleybuses go to the office. $3 + 2 = 5$ ways **Why add?** Because these are *alternatives* - one option is taken or the other, not both at once.
"AND" in the problem - multiply. "OR" - add. Works in 90% of cases. The remaining 10% is when alternatives overlap, which calls for the inclusion-exclusion formula from lesson 01.
A menu has 4 salads, 5 main courses, and 3 desserts. How many ways are there to order a meal of salad, main AND dessert?
The key word is "AND". A salad AND a main AND a dessert must each be chosen. Each choice is independent → multiply.
Permutations: arranging EVERYTHING
Permutations: arranging EVERYTHING
A **permutation** is when we take ALL objects and arrange them in a specific order.
Consider a queue of $n$ people. How many ways can they be arranged?
**Why factorial?** For the first position - $n$ choices. For the second - $(n-1)$ (one is already taken). For the third - $(n-2)$. And so on.
A photo of friends
4 people in a row
Ann, Bob, Carl, Dana stand in a row for a photo. $P_4 = 4! = 4 \times 3 \times 2 \times 1 = 24$ ways Logic: - Any of the 4 can be first - Any of the remaining 3 can be second - Any of the remaining 2 can be third - The last one is the only one left
$5! = 120$ $10! = 3\,628\,800$ (3.6 million) $20! \approx 2.4 \times 10^{18}$ - more than the number of seconds since the Big Bang $52!$ (a deck of cards) $\approx 8 \times 10^{67}$ - more than atoms in the observable Universe This is precisely why brute-forcing all token sequences in an LLM is hopeless - the space $50257^n$ explodes faster than any computer could handle. Beam search, temperature sampling, nucleus sampling - all are strategies for navigating this explosion without enumerating every path.
Permutations with repetitions
What if some objects are **identical**? Then permuting identical objects among themselves doesn't produce a new result.
Anagrams of the word "ABBA"
How many distinct permutations?
The word "ABBA" - 4 letters: A (2 times), B (2 times) If all were distinct: $4! = 24$ But the two A's can be swapped in $2!$ ways, and the two B's also in $2!$ ways - this doesn't change the word! $P = \frac{4!}{2! \cdot 2!} = \frac{24}{4} = 6$ Verification: AABB, ABAB, ABBA, BAAB, BABA, BBAA - exactly 6!
Spotify wants to shuffle a playlist of 10 tracks. How many orderings exist?
This is a classic permutation: arrange all 10 tracks in a specific order. First track - 10 options, second - 9, and so on. Total $10!$
Arrangements: choosing PART with order
Arrangements: choosing PART with order
An **arrangement** is when we choose $k$ objects from $n$, and **order matters**.
Medal standings
A race of 10 athletes
How many ways can gold, silver, and bronze be awarded among 10 athletes? $A_{10}^3 = 10 \times 9 \times 8 = 720$ ways **Order matters!** "Smith-gold, Jones-silver" ≠ "Jones-gold, Smith-silver"
Arrangements WITH repetitions
If elements can be **reused** (digits in a PIN, characters in a password):
4-digit PIN
How many combinations?
Digits: 0-9 (n = 10), length: 4 (k = 4) $\bar{A}_{10}^4 = 10^4 = 10\,000$ codes From 0000 to 9999 - exactly 10,000 options. A GPU exhausts the entire space in under a millisecond. That is why a 4-digit PIN is a speed bump, not a lock. Eight characters from 62 symbols - $62^8 \approx 218$ trillion - is a different universe entirely.
"Arrangements and permutations are the same thing"
A permutation is an arrangement of ALL elements ($k = n$)
Permutation $P_n = n!$ is the special case of arrangement when $k = n$: $A_n^n = n!/(n-n)! = n!/0! = n!$. When $k < n$ it is an arrangement. The distinction matters in practice: beam search selects $k$ tokens from a vocabulary of size $n$ in order - that is $A_n^k$, not $n!$.
A password of 8 characters (a-z, A-Z, 0-9 = 62 symbols), repetitions allowed. How many options?
Arrangement with repetitions: each of the 8 positions has 62 options. Total $62^8$ ≈ 218 trillion. That's why an 8-character password is considered secure!
Combinations: order does NOT matter
Combinations: order does NOT matter
A **combination** is when we choose $k$ objects from $n$, and **order does NOT matter** (only the composition).
Lottery "6 from 49"
Chance of hitting the jackpot
Six numbers must be guessed out of 49. Order doesn't matter - only which ones came up. $C_{49}^6 = \frac{49!}{6! \cdot 43!} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{720} = 13\,983\,816$ **Jackpot probability:** $\frac{1}{13\,983\,816} \approx 0.000007\%$ To guarantee a win, 14 million tickets would be required!
Project team
Choosing 3 from 10
How many ways can a team of 3 be chosen from a group of 10? $C_{10}^3 = \frac{10!}{3! \cdot 7!} = \frac{10 \times 9 \times 8}{6} = 120$ ways The team has no roles - {Ann, Bob, Carl} = {Carl, Ann, Bob}.
$C_n^k = C_n^{n-k}$ Why? Choosing $k$ elements to **take** is the same as choosing $n-k$ elements to **leave**. $C_{10}^3 = C_{10}^7 = 120$ $C_{100}^{98} = C_{100}^2 = 4950$ **Quick tip:** If $k$ is close to $n$, compute via $n-k$!
"If I'm not sure, I'll use combinations - the formula is simpler"
Combinations vs arrangements depends ONLY on whether order matters
One question settles it: if the chosen elements are rearranged, does the result change? YES (medals, positions, tokens in a sentence) - arrangement. NO (lottery, team without roles, set of features) - combination. Naive Bayes treats words in an email as a combination: order is irrelevant, only which words appear matters.
We are choosing a president, vice-president, and treasurer from 10 people. Which formula to use?
Order MATTERS! President ≠ treasurer. The same 3 people can fill the positions in different ways. Therefore arrangement, not combination.
📋 How to choose the formula?
📋 How to choose the formula?
**Step 1:** Does order matter? - YES → arrangement or permutation - NO → combination **Step 2:** Are repetitions allowed? - YES → with repetitions ($n^k$) - NO → without repetitions **Step 3:** Taking all or part? - ALL → permutation $n!$ - PART → arrangement $A_n^k$ or combination $C_n^k$
| What we do | Formula | Example |
|---|---|---|
| All in a row | $P_n = n!$ | Queue, playlist shuffle |
| k from n, order matters | $A_n^k$ | Medals, positions |
| k from n, with repetitions | $n^k$ | PIN, password |
| k from n, order doesn't matter | $C_n^k$ | Lottery, team |
Back to handshakes: 30 guests, each shakes hands with everyone else. How many handshakes?
A handshake is a pair of people. Order does NOT matter: "Ann-Bob" = "Bob-Ann". Therefore combination! $C_{30}^2 = \frac{30 \times 29}{2} = 435$. The mistake "$30 \times 30$" counts each handshake twice and includes handshakes with oneself.
Combinatorics in probability
Combinatorics in probability
Classical probability: $P(A) = \frac{|A|}{|\Omega|}$. Elegant - but how are $|A|$ and $|\Omega|$ counted when there are millions of outcomes? That is where combinatorics takes over. Without it, the formula handles little more than "draw a ball from an urn".
Poker: probability of a pair
One of the most common hands
A pair - exactly two cards of the same rank (two aces, two sevens, etc.) **Favorable outcomes:** 1. Choose the rank of the pair: $C_{13}^1 = 13$ 2. Choose 2 suits from 4: $C_4^2 = 6$ 3. Choose 3 ranks for the remaining cards: $C_{12}^3 = 220$ 4. For each - 1 suit from 4: $4^3 = 64$ Total: $13 \times 6 \times 220 \times 64 = 1\,098\,240$ **Total 5-card hands:** $C_{52}^5 = 2\,598\,960$ $P(\text{pair}) = \frac{1\,098\,240}{2\,598\,960} \approx 42.3\%$ A pair is the most common non-trivial hand!
Computing the probability of a pair in a random 5-card hand, which expression corresponds to the numerator $|A|$?
The numerator $|A|$ counts favorable hands: fix the pair's rank, choose 2 of 4 suits, then add 3 distinct ranks each with one of 4 suits. $C_{52}^5$ is the denominator $|\Omega|$.
Practice
Practice
How many ways are there to choose a class president and a deputy from a group of 25 people?
$A_{25}^2 = 25 \times 24 = 600$ ways
An urn contains 7 white and 5 black balls. 3 balls are drawn simultaneously. What is the probability that all three are white?
Favorable (3 white from 7): $C_7^3 = 35$ Total (any 3 from 12): $C_{12}^3 = 220$ $P = \frac{35}{220} = \frac{7}{44} \approx 15.9\%$
A license plate has 3 letters (from 12 available) + 3 digits. Letters and digits can repeat. How many plates are possible?
Letters: $12^3 = 1728$ options Digits: $10^3 = 1000$ options Total: $1728 \times 1000 = 1\,728\,000$ plates
How many ways can 8 rooks be placed on a chessboard so that no rook attacks another?
One rook on each row, one on each column. First rook: 8 columns Second: 7 (one taken) Third: 6 ...and so on $8! = 40\,320$ ways *This is a classic problem equivalent to permutations of numbers 1-8!*
A lottery requires guessing 6 numbers out of 49. What is the probability of hitting the jackpot by matching all 6?
Order of numbers does not matter and repeats are forbidden, so combinations apply. One favorable outcome out of $C_{49}^6 = 13{,}983{,}816$ total. Arrangements inflate the denominator by $6!$, and $49^6$ wrongly allows repeats.
Combinatorics is everywhere
This is the foundation for many areas:
- Conditional Probability — Counting under additional conditions
- Binomial Distribution — $C_n^k$ - binomial coefficients
- Cryptography — Estimating the complexity of key brute-forcing
- Machine Learning — Combinatorial complexity of feature spaces
Key formulas
- **Multiplication (AND):** $m \times n$ - independent sequential choices multiply. The number of system configurations is the product of per-component counts
- **Addition (OR):** $m + n$ - mutually exclusive alternatives add
- **Permutations:** $P_n = n!$ - all $n$ objects in order. Factorial growth crushes any exponential
- **Arrangements:** $A_n^k = n!/(n-k)!$ - $k$ from $n$, order matters. Beam search, passwords
- **With repetitions:** $n^k$ - each position is independent. Token vocabulary to the power of sequence length
- **Combinations:** $C_n^k$ - $k$ from $n$, order irrelevant. Binomial coefficients, BPE vocabulary, system reliability
Вопросы для размышления
- Go back to the handshake problem. Why is $30 \times 30$ wrong here?
- Why do lottery organizers use combinations rather than arrangements?
- An 8-digit password vs an 8-character (letters+digits) password - how many times stronger is the second?
- In poker there are $C_{52}^5 \approx 2.6$ million 5-card hands. How many 7-card hands are there?