Logic

The Base Rate Fallacy

A suspected terrorist is caught, and a lie detector test shows 'guilty.' The detector's accuracy is 90%. Would you convict? Now suppose terrorists are 1 in a million and 10% of innocent people also 'fail' the test. If everyone were tested, more than 99.99% of those flagged would be innocent: about 100,000 innocent people fail per million tested, against roughly one real terrorist. One number - the base rate - changes everything.
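The arithmetic behind this example can be checked by counting people instead of juggling percentages. The sketch below assumes that "90% accurate" means the test flags 90% of real terrorists; the variable names are illustrative, and the population size is chosen only to match the 1-in-a-million base rate.

```python
# Natural-frequency sketch of the lie-detector example.
# Assumption: "90% accurate" = the test flags 90% of real terrorists.
# The 10% false-alarm rate for innocent people is taken from the text.
population = 1_000_000
terrorists = 1                       # base rate: 1 in a million
innocents = population - terrorists

hit_rate = 0.90          # P(flagged | terrorist), assumed
false_alarm_rate = 0.10  # P(flagged | innocent), from the text

flagged_terrorists = terrorists * hit_rate
flagged_innocents = innocents * false_alarm_rate

# Share of flagged people who are actually innocent.
share_innocent = flagged_innocents / (flagged_terrorists + flagged_innocents)

print(f"Flagged innocents per million: {flagged_innocents:,.0f}")
print(f"Share of flagged people who are innocent: {share_innocent:.4%}")
```

Under these assumptions, roughly 100,000 innocent people are flagged for every single terrorist, which is why the share of innocent people among those flagged exceeds 99.99%.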

**Medicine:** doctors overestimate the probability of rare diagnoses when symptoms look 'typical.' A medical student sees a patient with a headache and thinks 'tumor,' even though the vast majority of headaches have benign causes such as migraine.
**Investing:** 'This startup will definitely take off' - but about 90% of startups fail. A founder's compelling story doesn't override the base statistics.
**Forensics:** a DNA match found by searching a database of millions means something entirely different from a match against a specific suspect who was identified by other evidence.

Ignoring the Base Rate

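In a famous experiment by Kahneman and Tversky, participants are told that a group contains 70 engineers and 30 lawyers. One person is chosen at random, and all we know is: 'likes mathematical puzzles, uninterested in politics, neat and orderly.' What is the probability this person is an engineer? Most people give a high answer such as '90%.' The telling part is that they give almost the same answer when the group is described as 30 engineers and 70 lawyers: the **prior** probability (70% in one case, 30% in the other) barely affects the judgment.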

**Base rate neglect** - a cognitive bias in which people ignore statistical information about the prevalence of a phenomenon and instead focus on vivid individual characteristics. It was described by Daniel Kahneman and Amos Tversky in the 1970s.

Why does the brain ignore base rates? Because a **vivid story** captures attention, while **dry statistics** do not. The description 'likes puzzles, neat' conjures an image of an engineer, and we 'forget' to ask: how common are engineers in this group in the first place?

**The representativeness heuristic:** we judge probability by how much a description 'resembles' a typical member of a category. But resemblance is not the same as probability. Even if a description perfectly fits an engineer, what matters is how many engineers are in the sample.
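To see how much the group composition should matter, here is a minimal sketch of Bayes' rule in odds form applied to the engineer/lawyer setup. The likelihood ratio of 4 (the description being four times as likely for an engineer as for a lawyer) is a made-up illustrative value, and the 30% case is included only as a contrast; neither number comes from the experiment itself.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical value: the description is 4x as likely for an engineer
# as for a lawyer. This is an assumption for illustration only.
LIKELIHOOD_RATIO = 4.0

for share_engineers in (0.70, 0.30):
    p = posterior(share_engineers, LIKELIHOOD_RATIO)
    print(f"{share_engineers:.0%} engineers in the group -> "
          f"P(engineer | description) = {p:.0%}")
```

With the same description, the estimate comes out at about 90% for the 70% group and about 63% for the 30% group. How well the description 'resembles' an engineer is identical in both cases; only the base rate differs.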

At a university, 90% of students are in the humanities and 10% in STEM. A random student is described as 'loves programming, plays chess.' What is the approximate probability this student is in STEM?

The False Positive Problem

When the base rate is low, even a highly accurate test can produce more **false positives** than true positives. This is a counterintuitive but critically important fact for medicine, security, and law.

**Test error matrix:**

| | Actually ill | Actually healthy |
|---|---|---|
| Test positive | True positive (TP) | False positive (FP) |
| Test negative | False negative (FN) | True negative (TN) |

**PPV (Positive Predictive Value)** = TP / (TP + FP) - the probability of having the disease given a positive test
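It can help to rewrite PPV in terms of the base rate. The symbols below are not in the original text and are introduced only for this sketch: $p$ is the prevalence (the base rate), $Se = TP/(TP+FN)$ is the sensitivity, and $Sp = TN/(TN+FP)$ is the specificity.

```latex
\mathrm{PPV} = \frac{TP}{TP + FP}
             = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}
```

When $p$ is very small, the term $(1 - Sp)(1 - p)$ in the denominator dominates, so PPV stays low even for a test with high sensitivity and specificity.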

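This explains why you cannot blindly screen an entire population for rare diseases. Take a disease prevalence of 1 in 10,000 and a test that is 99% accurate (here meaning it detects 99% of the ill and clears 99% of the healthy). Per 10,000 people screened there is roughly 1 true positive against about 100 false positives, so about 99% of positive results are false. Every false positive means stress, unnecessary procedures, and sometimes real harm to health.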
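The figure for this scenario can be reproduced in a few lines. This is a minimal sketch that assumes "99% accurate" means 99% sensitivity and 99% specificity; the function name `ppv` and its parameters are illustrative.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value: P(ill | positive test)."""
    true_pos = prevalence * sensitivity                    # ill and flagged
    false_pos = (1.0 - prevalence) * (1.0 - specificity)   # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Assumption: "99% accurate" = 99% sensitivity and 99% specificity.
value = ppv(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"PPV = {value:.2%}")
```

Under these assumptions, fewer than 1 in 100 positive results belongs to a person who is actually ill.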

**The solution:** test not everyone, but high-risk groups with an elevated base rate. If a person has symptoms or risk factors, their 'prior' is higher, and a positive test becomes far more informative.
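The effect of a higher prior can be sketched by holding the test fixed and varying only the prevalence. The three groups and their prevalence values are hypothetical, and the test is again assumed to have 99% sensitivity and 99% specificity.

```python
def ppv(prevalence: float, sensitivity: float = 0.99,
        specificity: float = 0.99) -> float:
    """P(ill | positive test) for a given prevalence (prior)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical priors for three groups; illustrative values only.
priors = {
    "general population": 1 / 10_000,
    "risk factors present": 1 / 100,
    "symptoms present": 1 / 10,
}

results = {group: ppv(p) for group, p in priors.items()}
for group, value in results.items():
    print(f"{group:<22} PPV = {value:.1%}")
```

With these illustrative numbers, the same positive result means roughly a 1% chance of disease in the general population, 50% in the group with risk factors, and over 90% in the group with symptoms.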

Why does mass screening for a rare disease produce mostly false positive results?

The Screening Paradox

**The screening paradox:** mass testing for rare diseases can cause more harm than good. Why? Because the harm from false positives (stress, biopsies, surgeries) is multiplied by their enormous number, while the benefit of finding rare true cases is limited.

**Example: breast cancer screening**

According to research data, per 1,000 women over 10 years of screening:

- 1 life saved (from early detection)
- 100+ false alarms (follow-up examinations)
- 5–15 unnecessary biopsies
- 0–2 cases of overdiagnosis (treatment of cancer that would never have caused harm)

Conclusion: there is benefit, but less than intuition suggests.

**Overdiagnosis** - detecting a 'disease' that would never have caused harm. Some cancers grow so slowly that the patient will die of old age first. But upon learning the diagnosis, they will receive treatment with all of its side effects.

This paradox doesn't mean screening is useless. It means that the decision to screen must account for **all** consequences: both lives saved and harm from false alarms. Informed patient consent should include these numbers.

A country introduces mass screening for a rare disease X. The number of diagnoses increases 10-fold, but mortality does not change. What most likely happened?

How to Use Base Rates

Ignoring base rates is a cognitive error, but **blindly following** them is also a problem. The right approach is to **weigh** statistics and individual characteristics in accordance with Bayes' theorem.

**Rules for working with base rates:**

1. **Always ask:** how common is this in general?
2. **Don't ignore the description:** individual characteristics do change probability.
3. **Assess diagnosticity:** how well does the characteristic distinguish between groups?
4. **Find the reference class:** which group does this case belong to?
5. **Update sequentially:** each new fact adjusts the estimate.
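Sequential updating can be sketched as repeated multiplication of odds by a likelihood ratio. The example starts from the 10% success rate implied by '90% of startups fail'. Every likelihood ratio below is a made-up illustrative value, and the sketch assumes the pieces of evidence are independent of one another given the outcome.

```python
def update(probability: float, likelihood_ratio: float) -> float:
    """One Bayesian update in odds form."""
    odds = probability / (1.0 - probability)
    odds *= likelihood_ratio
    return odds / (1.0 + odds)


# Start from the base rate: about 10% of startups succeed.
estimate = 0.10

# (evidence, likelihood ratio) pairs; all ratios are hypothetical.
# A ratio above 1 favors success, a ratio below 1 favors failure.
evidence = [
    ("strong team", 2.0),
    ("growing market", 1.5),
    ("no paying customers yet", 0.5),
]

for label, ratio in evidence:
    estimate = update(estimate, ratio)
    print(f"after '{label}': {estimate:.1%}")
```

The estimate moves with each fact but stays anchored to the base rate: even encouraging evidence leaves it far from 'will definitely take off' unless the evidence is very diagnostic.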

**The reference class problem:** which group should a case be assigned to? Suppose you want to estimate whether a person you see in a library is a farmer. Is the right comparison group 'all adults' (many farmers) or 'library visitors' (few farmers)? The choice of reference class changes the prior. There is no single correct answer - it is a matter of judgment.

**Practical tip:** when you hear a vivid story (in the news, in an argument), ask yourself: 'How often does this actually happen?' Plane crashes make the news; car accidents don't. But flying is safer than driving. The base rate is your anchor to reality.

**Myth:** Statistics don't apply to individual cases - every case is unique.

**Reality:** Statistics provide a baseline estimate; individual characteristics then adjust that estimate.

Yes, every case is unique, but that doesn't mean base rates are useless. They give a starting point (prior) from which we update based on individual features. Ignoring statistics means treating your case as 'special' without justification.

You hear that a friend's startup 'will definitely take off' - the idea is unique, the team is strong, the market is growing. What question should you ask with base rates in mind?

Key Takeaways

**Base rate neglect:** we ignore 'boring' statistics and focus on vivid details
**The false positive paradox:** with rare events, even an accurate test can produce more false alarms than true detections
**The screening paradox:** mass testing for rare diseases can do more harm than good
**The right approach:** use the base rate as the starting point, then update using individual characteristics

Questions for Reflection

Recall a decision you made based on a vivid story (a news item, a friend's advice). What was the actual base rate of success or failure?
How would your fears (terrorism, plane crashes, rare diseases) change if you always kept base rates in mind?
In which areas are you inclined to see yourself as a 'special case' to which statistics don't apply?

Related Lessons

prob-04-bayes