Probability Theory
Event Independence
Lesson goals
- Understand the formal definition of independence through the formula
- Learn to test event independence in practice
- Distinguish independence from mutual exclusivity - a critical difference!
- Master the multiplication rule for independent events
- Recognize and avoid the "gambler's fallacy"
Prerequisites
- Conditional probability P(A|B) - understanding the formula
- The multiplication rule for dependent events
- Bayes' theorem (helpful, but not required)
Pairwise independence does not imply mutual independence. Example: flip two fair coins $X_1, X_2$, each 0 or 1 with equal probability, and define $X_3 = X_1 \oplus X_2$ (XOR). Then $X_3$ is also a fair bit, and any two of the three variables are independent. But all three together are not: knowing $X_1$ and $X_2$ completely determines $X_3$. This is more than a theoretical curiosity: checking independence pair by pair can miss dependence that only shows up among three or more variables.
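The XOR example can be verified by brute force. A minimal sketch, assuming exact arithmetic with Python's `fractions` module: it enumerates the four equally likely outcomes, then compares joint probabilities with products of marginals, first for every pair and then for all three variables together. The helper names (`prob`, `pairwise_independent`, `mutually_independent`) are our own, chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Two fair bits X1, X2 and their XOR X3: four equally likely outcomes (x1, x2, x3).
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
P_OUTCOME = Fraction(1, len(outcomes))


def prob(condition):
    """Probability that condition(outcome) holds under the uniform distribution."""
    return sum((P_OUTCOME for o in outcomes if condition(o)), Fraction(0))


def pairwise_independent():
    """Check P(Xi=a, Xj=b) == P(Xi=a) * P(Xj=b) for every pair (i, j) and values (a, b)."""
    for i, j in combinations(range(3), 2):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            if joint != prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b):
                return False
    return True


def mutually_independent():
    """Check P(X1=a, X2=b, X3=c) == P(X1=a) * P(X2=b) * P(X3=c) for every (a, b, c)."""
    for values in product((0, 1), repeat=3):
        joint = prob(lambda o: o == values)
        product_of_marginals = Fraction(1)
        for k in range(3):
            product_of_marginals *= prob(lambda o: o[k] == values[k])
        if joint != product_of_marginals:
            return False
    return True


print(pairwise_independent())  # True
print(mutually_independent())  # False: e.g. P(1, 1, 1) = 0, but the product is 1/8
```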
- **Dropout:** dropout masks are assumed independent across neurons - this assumption underlies the interpretation of dropout as approximate Bayesian inference
- **Batch normalization:** assumes independence of examples within the batch - correlated batches (e.g. sorted by class) break this and reduce effectiveness
- **Naive Bayes:** assumes conditional independence of features given the class - a naive assumption that works surprisingly well in practice
- **A/B test independence:** users must be independent for p-values to be valid. Network effects violate this and make results unreliable
- **Cryptography:** every key bit must be independent of the previous ones. Pseudorandomness is the engineering approximation of independence
The night the roulette wheel broke psychology
The events of August 18, 1913 went down in history as "Le Grande". The probability of 26 blacks in a row on a single-zero wheel is $(18/37)^{26} \approx 1/137{,}000{,}000$; the often-quoted "1 in 67 million" is roughly the chance of a run of 26 of either color. But the casino didn't "cheat": streaks happen. Psychologists later named the gamblers' error the **"gambler's fallacy"**. People were looking for "fairness" where none existed: the roulette wheel has no memory, and every spin is independent. The irony: the probability of red on spin 27 was the same as on spin 1, $18/37 \approx 48.6\%$. There is no "compensation" for a black streak.
Event Independence
**August 18, 1913, the Casino de Monte-Carlo.** Black comes up on the roulette wheel. Then again. After the 10th black, players flood to bet on red: *"the universe must restore balance"*.
Black comes up for the 15th time. The 20th. Bets grow. People pawn their watches. The casino earns millions. On the 26th spin - black again. The probability of 26 blacks in a row: about 1 in 137 million. On spin 27, red was exactly as likely as on spin 1.
The roulette wheel has no memory. That is formalized in a single word: **independence**.
Events $A$ and $B$ are independent if:
$$P(A \cap B) = P(A) \cdot P(B)$$
The product of probabilities is the formal definition of independence. Equivalently, when $P(B) > 0$: $P(A \mid B) = P(A)$, so knowing $B$ does not change the probability of $A$. Disjointness ($A \cap B = \emptyset$) is a different concept.
📐 What does "independent events" mean?
Intuitively: event $B$ **does not affect** the probability of event $A$. Formally (for $P(B) > 0$):
$$P(A \mid B) = P(A)$$
**In plain words:** knowing whether $B$ occurred **does not change** our estimate of the probability of $A$. Information about $B$ is useless for predicting $A$.
🪙 Two coin flips
Classic independence check
$A$ = "first flip is heads", $B$ = "second flip is heads".
Sample space: {HH, HT, TH, TT}, all equally likely.
$P(A) = 2/4 = 1/2$ (HH and HT)
$P(B) = 2/4 = 1/2$ (HH and TH)
$P(A \cap B) = 1/4$ (only HH)
**Check:** $P(A) \cdot P(B) = 1/2 \cdot 1/2 = 1/4 = P(A \cap B)$ ✓
The events are independent! The outcome of the first flip has no effect on the second.
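The same check can be run mechanically by listing the sample space. This is a sketch with hypothetical helper names (`P`, `A`, `B`); events are modeled as predicates on outcome strings, and `Fraction` keeps the arithmetic exact so the equality test is not spoiled by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT, each with probability 1/4.
omega = ["".join(flips) for flips in product("HT", repeat=2)]
p_each = Fraction(1, len(omega))


def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum((p_each for w in omega if event(w)), Fraction(0))


A = lambda w: w[0] == "H"          # first flip is heads
B = lambda w: w[1] == "H"          # second flip is heads
A_and_B = lambda w: A(w) and B(w)

print(P(A), P(B), P(A_and_B))      # 1/2 1/2 1/4
print(P(A_and_B) == P(A) * P(B))   # True: the product rule holds
print(P(A_and_B) / P(B) == P(A))   # True: P(A|B) = P(A)
```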
Independence is an assumption made in the model. The coin is assumed independent of previous flips. Dropout masks are assumed independent across neurons. Users in an A/B test are assumed independent. In real systems, perfect independence rarely holds exactly. The question is always whether the dependence is small enough to be negligible for the task at hand.
For independent events A and B, P(A|B) = P(A). What does this mean in practice?
$P(A|B) = P(A)$ literally means: "the probability of A given B" = "the probability of A with no conditions". Information about B is **useless** for predicting A. It doesn't matter whether B happened or not - our estimate of P(A) doesn't change. This is the essence of independence: events "don't know" about each other.
⚠️ Independence ≠ Mutual Exclusivity!
This is **the most common confusion** in probability theory. Let's settle it once and for all.
| | Mutually exclusive (disjoint) | Independent |
|---|---|---|
| Definition | $A \cap B = \emptyset$ | $P(A \cap B) = P(A) \cdot P(B)$ |
| Can they occur together? | ❌ NO | ✅ YES (whenever both are possible) |
| Do they affect each other? | ✅ Strongly! (one excludes the other) | ❌ NO |
| $P(A \cap B)$ | $= 0$ | $= P(A) \cdot P(B)$, which is $> 0$ if both are possible |
🎲 Mutually exclusive, but DEPENDENT
Rolling a die
$A$ = "rolled 1", $B$ = "rolled 6".
**Mutually exclusive?** ✅ Yes - a single roll cannot show both 1 and 6: $P(A \cap B) = 0$.
**Independent?** ❌ NO! If $A$ occurred (rolled 1), then $B$ definitely didn't: $P(B|A) = 0 \neq P(B) = 1/6$.
Knowing $A$ **completely determines** $B$ - that's the strongest possible dependence!
🎲 Independent, NOT mutually exclusive
"Even" and "greater than 2"
$A$ = "rolled even" = {2, 4, 6}, $P(A) = 1/2$
$B$ = "rolled greater than 2" = {3, 4, 5, 6}, $P(B) = 4/6 = 2/3$
**Can they occur together?** ✅ Yes - $A \cap B$ = {4, 6}, $P(A \cap B) = 2/6 = 1/3$
**Independent?** Let's check: $P(A) \cdot P(B) = 1/2 \cdot 2/3 = 1/3 = P(A \cap B)$ ✅ Yes, independent! Knowing "greater than 2" doesn't change the probability of even (indeed $P(A \mid B) = 2/4 = 1/2$).
**Myth:** if events cannot occur simultaneously, they are independent (they don't affect each other).
**Fact:** mutually exclusive events (unless one of them has probability 0) are ALWAYS dependent - and very strongly so!
If $A$ and $B$ are mutually exclusive, then $P(A \cap B) = 0$. But $P(A) \cdot P(B) > 0$ (if both events are possible). So $P(A \cap B) \neq P(A) \cdot P(B)$ - that is dependence. Moreover, it is maximum negative dependence: if one occurred, the other is guaranteed not to have. Mutual exclusivity is the opposite of independence, not a synonym.
We roll a die. A = "even" = {2,4,6}, B = "less than 4" = {1,2,3}. Are these events independent?
$A = \{2,4,6\}$, $P(A) = 3/6 = 1/2$
$B = \{1,2,3\}$, $P(B) = 3/6 = 1/2$
$A \cap B = \{2\}$, $P(A \cap B) = 1/6$
$P(A) \cdot P(B) = 1/2 \cdot 1/2 = 1/4$, but $P(A \cap B) = 1/6 \neq 1/4$.
The events are **dependent**! Knowing "less than 4" reduces the probability of even to $P(A \mid B) = 1/3$ (only 2 out of {1, 2, 3} is even).
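The three die examples in this section (1 vs 6, even vs greater than 2, even vs less than 4) can be checked side by side. A small sketch, assuming events are represented as Python sets of faces; `P` and `check` are illustrative helpers, not library functions. Mutual exclusivity and independence are tested separately: the first asks whether the intersection is empty, the second whether the product rule holds.

```python
from fractions import Fraction

FACES = {1, 2, 3, 4, 5, 6}  # fair die: each face has probability 1/6


def P(event):
    """Probability of a set of faces on a fair die."""
    return Fraction(len(event & FACES), len(FACES))


def check(A, B):
    """Return (mutually_exclusive, independent) for two events given as sets of faces."""
    mutually_exclusive = not (A & B)
    independent = P(A & B) == P(A) * P(B)
    return mutually_exclusive, independent


print(check({1}, {6}))                 # (True, False): P(A and B) = 0, but P(A)P(B) = 1/36
print(check({2, 4, 6}, {3, 4, 5, 6}))  # (False, True): 1/3 == 1/2 * 2/3
print(check({2, 4, 6}, {1, 2, 3}))     # (False, False): 1/6 != 1/4
```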
✖️ The multiplication rule for independent events
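For **independent** events, computing $P(A \cap B)$ becomes trivial - just multiply:
$$P(A \cap B) = P(A) \cdot P(B)$$
For $n$ mutually independent events this extends to $P(A_1 \cap \ldots \cap A_n) = P(A_1) \cdot \ldots \cdot P(A_n)$.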
Compare this with the general formula for **dependent** events, which requires conditional probabilities:
$$P(A \cap B) = P(A) \cdot P(B \mid A)$$
🪙 5 heads in a row
Simple multiplication
What is the probability of getting 5 heads in a row with a fair coin? Flips are **independent**, $P(\text{heads}) = 1/2$. $$P(\text{5 heads}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32} \approx 3.1\%$$ Rare, but not impossible. On average this happens once every 32 series of 5 flips.
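A quick simulation can confirm the $1/32$ figure empirically. This sketch assumes Python's `random.Random` as the source of independent fair flips (a pseudorandom approximation of independence) with a fixed seed for reproducibility; `estimate_five_heads` is a hypothetical helper.

```python
import random


def estimate_five_heads(trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(5 heads in a row) with a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        # Five independent flips; rng.random() < 0.5 counts as heads.
        if all(rng.random() < 0.5 for _ in range(5)):
            hits += 1
    return hits / trials


exact = 0.5 ** 5
print(exact)                          # 0.03125
print(estimate_five_heads(200_000))   # close to 0.03125
```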
🔧 Reliability of a series system
Real-world application
A system of 3 components works only if **all three** are working.
Reliabilities: $R_1 = 0.95$, $R_2 = 0.98$, $R_3 = 0.99$. Failures are **independent**.
$$P(\text{system works}) = 0.95 \cdot 0.98 \cdot 0.99 \approx 0.922$$
Even with highly reliable components (95-99%), the system works only about 92% of the time! **Engineering takeaway:** for critical systems, use **parallel** redundancy (duplication).
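Under the independence assumption, series and parallel reliability both reduce to products. A minimal sketch: `series` and `parallel` are illustrative helpers (not from a reliability library), and `math.prod` requires Python 3.8+. The parallel call uses two hypothetical 0.95 components to show the effect of duplication.

```python
from math import prod


def series(reliabilities):
    """All components must work; with independent failures, multiply reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one component must work: 1 - P(all fail), with independent failures."""
    return 1 - prod(1 - r for r in reliabilities)


print(round(series([0.95, 0.98, 0.99]), 4))  # 0.9217
print(round(parallel([0.95, 0.95]), 4))      # 0.9975
```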
The probability of rain on any given day is 30% (independently). What is the probability of rain on at least one day out of 7?
$P(\text{no rain on a day}) = 0.7$
$P(\text{no rain all week}) = 0.7^7 \approx 0.082$
$P(\text{at least 1 rainy day}) = 1 - 0.082 \approx \mathbf{91.8\%}$
**The complement trick:** it is often easier to compute $P(\text{none})$ than to enumerate all the cases for "at least one".
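The complement trick and direct enumeration give the same answer, which this sketch checks for the rain example. `at_least_one` and `at_least_one_brute_force` are illustrative names; the brute-force version sums over all $2^7 = 128$ rain/no-rain patterns, which shows why the complement is so much less work.

```python
from itertools import product


def at_least_one(p: float, n: int) -> float:
    """P(event happens at least once in n independent trials), via the complement."""
    return 1 - (1 - p) ** n


def at_least_one_brute_force(p: float, n: int) -> float:
    """Sum P(pattern) over every rain/no-rain pattern with at least one rainy day."""
    total = 0.0
    for pattern in product((True, False), repeat=n):
        if any(pattern):
            prob = 1.0
            for rainy in pattern:
                prob *= p if rainy else 1 - p  # independent days multiply
            total += prob
    return total


print(round(at_least_one(0.3, 7), 3))              # 0.918
print(round(at_least_one_brute_force(0.3, 7), 3))  # 0.918
```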
🎰 The Gambler's Fallacy
Back to Monte Carlo. Why did gamblers **believe** red "had to" come up?
The **gambler's fallacy** is the belief that after a streak of identical outcomes, the probability of the opposite result **increases** - as if the universe "needs to restore balance".
🎡 Roulette after 5 blacks
Flawed vs. correct reasoning
**Flawed:** "Black came up 5 times! The law of large numbers says there must be balance. I'm betting on red!"
**Correct:** Every spin is **independent**. The roulette wheel doesn't remember previous results. $P(\text{red on spin 6}) = 18/37 \approx 48.6\%$, exactly the same probability as on **any** spin. "Balance" is restored not by red becoming more likely, but by the early black streak being diluted over a long series.
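The claim that a black streak does not make red more likely can be checked by simulation. A sketch under stated assumptions: a single-zero wheel with 18 red, 18 black and 1 zero pocket, spins drawn independently with `random.Random.choice`, and a hypothetical helper `red_rate_after_black_streak` that looks only at spins immediately following at least `streak` consecutive blacks.

```python
import random

RED, BLACK, ZERO = "R", "B", "0"
WHEEL = [RED] * 18 + [BLACK] * 18 + [ZERO]  # single-zero (European) wheel


def red_rate_after_black_streak(spins: int, streak: int = 5, seed: int = 1) -> float:
    """Fraction of red results on spins that follow at least `streak` consecutive blacks."""
    rng = random.Random(seed)
    run = 0        # number of consecutive blacks ending at the previous spin
    followers = 0  # spins that came right after a long enough black streak
    reds = 0
    for _ in range(spins):
        result = rng.choice(WHEEL)  # each spin is independent of all previous ones
        if run >= streak:
            followers += 1
            if result == RED:
                reds += 1
        run = run + 1 if result == BLACK else 0  # red or zero breaks the streak
    return reds / followers if followers else float("nan")


print(round(18 / 37, 3))                     # 0.486
print(red_rate_after_black_streak(500_000))  # also close to 0.486
```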
If events are **dependent** - previous outcomes DO genuinely affect what comes next! **Example:** Drawing cards **without replacement**. If all 4 aces have been drawn - the probability of an ace on the next draw is 0, not 4/52. **Example:** In poker (no jokers), visible cards affect the probabilities of the remaining ones. The gambler's fallacy is an error **at roulette, with coins, with dice**. Not in poker or blackjack!
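Drawing without replacement is the standard example of genuine dependence. This sketch computes the exact probability that the next card is an ace, given how many aces and cards have already been removed from a standard 52-card deck; `p_ace_next` is an illustrative helper. With replacement, the answer would stay at $4/52 = 1/13$ every time.

```python
from fractions import Fraction

DECK, ACES = 52, 4


def p_ace_next(aces_drawn: int, cards_drawn: int) -> Fraction:
    """P(next card is an ace) after cards have left a 52-card deck without replacement."""
    return Fraction(ACES - aces_drawn, DECK - cards_drawn)


print(p_ace_next(0, 0))   # 1/13 (= 4/52, fresh deck)
print(p_ace_next(1, 1))   # 1/17 (= 3/51, after one ace)
print(p_ace_next(4, 10))  # 0 (all aces already gone)
```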
There is also the **reverse fallacy**: the belief that a winning streak will continue ("he's on fire!"). For a long time it was thought that the "hot hand" in basketball didn't exist - just the same error in reverse. **But!** Modern research has shown that for some athletes and situations the effect can be real - through psychology, confidence, and changes in defensive behavior. The lesson: don't confuse **mathematical independence** (roulette) with **real life** (where events can be linked through psychology).
A coin has landed heads 10 times in a row. What is the probability of tails on the 11th flip?
For a **fair** coin, each flip is independent. The previous 10 heads have no effect on the 11th flip. $P(\text{tails on flip 11}) = 50\%$ **However**, 10 heads in a row is a very rare event $(1/1024)$. If it happened, it's reasonable to **question** whether the coin is fair! Perhaps $P(\text{tails}) \neq 50\%$ for this particular coin. The answer "depends on whether it's fair" would be correct if the question were about a real-world situation. But for an **ideal** fair coin - 50%.
🔀 Conditional Independence
Sometimes events are **dependent** unconditionally but become **independent** given a third event $C$. This is called **conditional independence**:
$$P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$$
☂️ Umbrella and sunglasses
Classic example of conditional independence
$A$ = "Alice brought an umbrella", $B$ = "Bob is wearing sunglasses", $C$ = "It is sunny today".
**Unconditionally**, $A$ and $B$ are **dependent** - both depend on the weather. If Alice brought an umbrella, it's probably cloudy, and Bob probably isn't wearing sunglasses.
**Given "it is sunny"**, $A$ and $B$ may become **independent**: each person makes their own decision, and the weather is already known.
$$P(A \cap B | C) = P(A|C) \cdot P(B|C)$$
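Conditional independence can be made concrete with a small joint distribution. The probabilities below are **hypothetical**, chosen only for illustration: the joint is built as $P(C)\,P(A \mid C)\,P(B \mid C)$, so $A$ and $B$ are independent given the weather by construction, and the sketch then shows that they are dependent once the weather is unknown.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: the numbers are illustrative, not from data.
p_sunny = F(1, 2)
p_umbrella = {True: F(1, 10), False: F(7, 10)}  # P(Alice has umbrella | sunny?)
p_glasses = {True: F(8, 10), False: F(2, 10)}   # P(Bob wears sunglasses | sunny?)

# Joint P(C, A, B) = P(C) P(A|C) P(B|C): A and B are independent given C by construction.
joint = {}
for c, a, b in product((True, False), repeat=3):
    pc = p_sunny if c else 1 - p_sunny
    pa = p_umbrella[c] if a else 1 - p_umbrella[c]
    pb = p_glasses[c] if b else 1 - p_glasses[c]
    joint[(c, a, b)] = pc * pa * pb


def P(pred):
    """Probability of an event given as a predicate on (c, a, b)."""
    return sum((p for k, p in joint.items() if pred(*k)), F(0))


# Unconditional check: weather unknown.
pA = P(lambda c, a, b: a)
pB = P(lambda c, a, b: b)
pAB = P(lambda c, a, b: a and b)
print(pAB, pA * pB)    # 11/100 1/5 -> dependent

# Conditional check: given that it is sunny.
pC = P(lambda c, a, b: c)
pA_C = P(lambda c, a, b: c and a) / pC
pB_C = P(lambda c, a, b: c and b) / pC
pAB_C = P(lambda c, a, b: c and a and b) / pC
print(pAB_C, pA_C * pB_C)  # 2/25 2/25 -> independent given C
```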
Naive Bayes assumes conditional independence of features given the class: $$P(w_1, w_2, \ldots \mid \text{spam}) = P(w_1 \mid \text{spam}) \cdot P(w_2 \mid \text{spam}) \cdot \ldots$$ Words in a text are clearly not independent, but the assumption works surprisingly well. Dropout relies on a related simplification: its masks are sampled independently for each neuron. Conditional independence is the key structure in Bayesian networks and graphical models, which underpin modern probabilistic ML.
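Here is a minimal Naive Bayes sketch using the factorization above. The class priors and word probabilities are **hypothetical** numbers, and the vocabulary is limited to three words (an unseen word would raise a `KeyError`; practical implementations add smoothing). Scores are summed in log space, which avoids underflow when many small probabilities are multiplied.

```python
from math import exp, log

# Hypothetical priors and per-class word probabilities (illustrative numbers only).
priors = {"spam": 0.4, "ham": 0.6}
p_word = {
    "spam": {"free": 0.30, "win": 0.20, "meeting": 0.01},
    "ham": {"free": 0.02, "win": 0.01, "meeting": 0.15},
}


def posterior(words):
    """Naive Bayes: P(class | words) proportional to P(class) * product of P(w | class)."""
    log_scores = {
        c: log(priors[c]) + sum(log(p_word[c][w]) for w in words)
        for c in priors
    }
    top = max(log_scores.values())  # subtract the max before exp for numerical stability
    unnormalized = {c: exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


print(posterior(["free", "win"]))  # spam gets about 0.995
print(posterior(["meeting"]))      # ham dominates
```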
Two students' exam results are unconditionally dependent (both prepared from the same materials). Under what condition might they become independent?
The results **correlate** through a shared hidden factor: the level of preparation (and the exam's difficulty). Once we **know** each student's level of preparation (and the difficulty, if it varies), the results can become conditionally independent: each student sits their own exam, given their known level. $$P(A \geq 4, B \geq 4 \mid \text{levels}) = P(A \geq 4 \mid \text{level of A}) \cdot P(B \geq 4 \mid \text{level of B})$$ This is an instance of **d-separation** in Bayesian networks: conditioning on a common cause blocks the path through it, so if it is the only connection, its effects become independent.
🏋️ Practice
A shooter hits a target with probability 0.8. Shots are independent. What is the probability of hitting at least once in 3 shots?
$P(\text{miss}) = 0.2$
$P(\text{3 misses in a row}) = 0.2^3 = 0.008$
$P(\text{at least 1 hit}) = 1 - 0.008 = \mathbf{0.992 = 99.2\%}$
Almost certain to hit at least once!
Two dice are rolled. Are these events independent: A = "the first die shows 6" and B = "the sum of the dice is 7"?
$P(A) = 1/6$ (first die shows 6)
$P(B) = 6/36 = 1/6$, pairs summing to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
$A \cap B$: first die = 6 **and** sum = 7 → second die = 1. That is the single pair (6, 1): $P(A \cap B) = 1/36$
**Check:** $P(A) \cdot P(B) = 1/6 \cdot 1/6 = 1/36 = P(A \cap B)$ ✅
The events are **independent**! Surprising but true.
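Enumerating all 36 outcomes confirms the result and also shows that it depends on the specific sum. In this sketch (`P`, `A`, `B`, `C` are illustrative names), "sum is 7" is independent of the first die because every first-die value leaves exactly one completing second-die value, while "sum is 8" is not: a first die of 1 makes 8 impossible.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
rolls = list(product(range(1, 7), repeat=2))


def P(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


A = lambda r: r[0] == 6         # first die shows 6
B = lambda r: r[0] + r[1] == 7  # sum is 7
C = lambda r: r[0] + r[1] == 8  # sum is 8, for contrast

print(P(A), P(B), P(lambda r: A(r) and B(r)))     # 1/6 1/6 1/36
print(P(lambda r: A(r) and B(r)) == P(A) * P(B))  # True: independent
print(P(lambda r: A(r) and C(r)) == P(A) * P(C))  # False: 1/36 vs 5/216
```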
A system consists of 4 components: two connected in series (both must work), then that block is connected in parallel with a second identical block. Each component has reliability 0.9. Failures are independent. What is the system's reliability?
**Step 1:** Reliability of one series block: $R_{\text{block}} = 0.9 \cdot 0.9 = 0.81$
**Step 2:** Probability of block failure: $Q_{\text{block}} = 1 - 0.81 = 0.19$
**Step 3:** For the parallel connection, the system works if at least one block works: $R_{\text{system}} = 1 - Q_{\text{block}}^2 = 1 - 0.19^2 = 1 - 0.0361 = 0.9639 \approx \mathbf{0.964}$
Parallel redundancy boosted reliability from 81% to 96.4%!
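The series-parallel answer can be cross-checked by simulating the four components directly. A sketch, assuming independent failures drawn with Python's `random` module and a hypothetical helper `simulate_system`; the exact value $1 - (1 - 0.9^2)^2 = 0.9639$ is printed alongside.

```python
import random


def simulate_system(trials: int, r: float = 0.9, seed: int = 2) -> float:
    """Monte Carlo estimate: two series blocks of two components each, blocks in parallel."""
    rng = random.Random(seed)
    working = 0
    for _ in range(trials):
        # Four independent components: True means the component works.
        c = [rng.random() < r for _ in range(4)]
        block1 = c[0] and c[1]  # series: both must work
        block2 = c[2] and c[3]
        if block1 or block2:    # parallel: at least one block must work
            working += 1
    return working / trials


exact = 1 - (1 - 0.9 * 0.9) ** 2
print(round(exact, 4))           # 0.9639
print(simulate_system(200_000))  # close to 0.964
```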
A server is replicated across three independent instances, each failing on a given day with probability $0.05$. What is the probability that at least one instance stays up for the day?
At least one survives iff not all fail. With independent failures $P(\text{all fail}) = 0.05^3$, so the answer is $1 - 0.05^3 = 0.999875$. Parallel redundancy sharply boosts fault tolerance.
Independence - the foundation of statistics
Most statistical methods assume independence of observations.
- Random Variables - independent random variables, the basis of many distributions
- Law of Large Numbers - its classical forms assume independent observations
- Central Limit Theorem - the normalized sum of many independent, identically distributed random variables with finite variance tends to a normal distribution
- Markov Chains - a relaxation: dependence only on the previous state
- Bayesian Networks - conditional independence as structure
Key ideas
- **Independence:** $P(A \cap B) = P(A) \cdot P(B)$ - knowing $B$ does not change the probability of $A$. The XOR trick - $X_3 = X_1 \oplus X_2$ - shows that pairwise independence does not imply mutual independence
- **Independence ≠ mutual exclusivity:** mutually exclusive events are maximally dependent - one occurring rules out the other, $P(A \cap B) = 0 \neq P(A) \cdot P(B)$
- **Dropout as Bayesian inference:** each neuron is dropped independently with probability $p$. Because the masks are independent, training with dropout can be viewed as approximately averaging over exponentially many thinned subnetworks, which underlies its interpretation as approximate Bayesian model averaging
- **Gambler's fallacy:** independent events have no memory. The roulette wheel after 26 blacks is still $18/37$ red. Monte Carlo, August 1913 - gamblers lost millions on this error
- **Conditional independence:** $P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$ - the backbone of Naive Bayes, Bayesian networks, and graphical models. Words are not independent, but conditional on the class they are treated as if they were - and it works
Questions for reflection
- 🎰 Back to Monte Carlo: if you had been there on August 18, 1913, how would you have convinced the crowd that red was no more likely after a long run of blacks?
- 🧠 Why are people so prone to the "gambler's fallacy"? What psychological need lies behind it?
- 🏀 How can it be tested whether a particular basketball player has a "hot hand"? What data is needed?
- 🔐 Why is the independence of bits in a random key critically important for cryptography?