Statistics
Hypothesis Testing: How p-values Killed 64,000 Studies
In 2015, the Open Science Collaboration reported in Science its attempt to replicate 100 psychology studies - only 36% held up. Amgen scientists tried to reproduce 53 landmark cancer papers - only 6 of them (about 11%) were confirmed. The p-value crisis reshaped how tech companies like Airbnb and Spotify run experiments today.
- Replication crisis 2015: 64 of 100 re-tested psychology studies failed independent replication
- FDA drug approval: alpha=0.05 threshold for primary endpoint significance
- Multiple testing at Airbnb and Spotify: Bonferroni and BH corrections
- GWAS genomics: genome-wide significance threshold p < 5e-8 (not 0.05)
- ML evaluation: permutation tests instead of parametric assumptions
- p-hacking prevention: pre-registration and sequential testing (alpha spending)
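As a preview of the multiple-testing corrections named in the list above, here is a minimal sketch of the Bonferroni and Benjamini-Hochberg (BH) procedures in plain Python. The function names and the five p-values are hypothetical, chosen only for illustration. They are not taken from Airbnb's or Spotify's tooling.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for five metrics measured in one experiment
p_values = [0.001, 0.012, 0.021, 0.045, 0.20]

print("Bonferroni:", bonferroni(p_values))
print("BH:        ", benjamini_hochberg(p_values))
```

With these made-up inputs, Bonferroni compares every p-value against 0.05 / 5 = 0.01 and rejects only the first hypothesis. BH uses the rising thresholds 0.01, 0.02, 0.03, 0.04, 0.05 and rejects the first three. That is the usual trade-off: BH tolerates a controlled share of false discoveries in exchange for more power.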
Prerequisites
- (no prerequisites)
Semmelweis, 1847: When Data Are Not Enough
**2015. 270 scientists join forces in the Open Science Collaboration and do something unprecedented.** They take 100 published psychology studies - all peer-reviewed, all showing p < 0.05 - and attempt to replicate them. The result: **only 36% replicated**. 64 out of 100 "proven" findings vanished on repetition. This is called the "replication crisis". The shock spread through medicine, economics, and neuroscience. The culprit was not fraud or negligence - it was a fundamental misunderstanding of what p < 0.05 actually means. The story begins in 1847 in Vienna.