Statistics
Bootstrap and Resampling
'How do you build a confidence interval for the median?' For the mean there is a standard formula. For the median there is no equally simple one. In 1979, Bradley Efron proposed the bootstrap: simulate thousands of samples from your own data. This single idea transformed applied statistics: almost any statistic can now be given a confidence interval.
- A/B testing: confidence intervals for conversion rates without normality assumptions
- Machine learning: bootstrap aggregating (bagging) - the foundation of Random Forest
- Financial risk: Value at Risk (VaR) computed from bootstrapped historical returns
- Clinical trials: exact p-values for small samples without distributional assumptions
- Genomics: error estimation with a few hundred SNP markers
Prerequisites
The Resampling Idea: Simulation Instead of Formulas
Bootstrap powers Random Forest (scikit-learn): 100+ trees, each trained on a bootstrap sample of the data (n draws with replacement), which contains on average about 63.2% of the distinct observations (1 − 1/e for large n). The **bootstrap** is a method for estimating the sampling distribution of a statistic by repeatedly resampling from the observed data. The intuition: if the sample represents the population well, resampling from the sample mimics resampling from the population. Algorithm: 1. draw n observations with replacement from the n-point sample; 2. compute the statistic on the resample; 3. repeat B = 1,000 to 10,000 times; 4. the distribution of the statistic across the B repetitions is the bootstrap distribution.
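The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function name `bootstrap_distribution`, the seed values, and the exponentially distributed sample are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on n_resamples bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    values = []
    for _ in range(n_resamples):
        # Step 1: draw n observations with replacement from the n-point sample.
        resample = rng.choices(sample, k=n)
        # Step 2: compute the statistic on the resample.
        values.append(statistic(resample))
    # Steps 3-4: the collected values form the bootstrap distribution.
    return values


# Hypothetical skewed sample of 25 observations (illustrative data only).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1 / 10) for _ in range(25)]

boot_medians = bootstrap_distribution(sample, statistics.median)

# The standard deviation of the bootstrap distribution estimates the
# standard error of the statistic.
bootstrap_se = statistics.stdev(boot_medians)

print(f"sample median: {statistics.median(sample):.2f}")
print(f"bootstrap SE of the median: {bootstrap_se:.2f}")
```

Passing a different function as `statistic` (for example a trimmed mean) reuses the same loop unchanged, which is the practical appeal of the method.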
**When to use bootstrap:** 1. for the median, mode, IQR, and other statistics without closed-form SE formulas; 2. for complex composite statistics (ratio of medians, trimmed mean); 3. for small samples where the CLT approximation is doubtful; 4. when the distributional shape is unknown. Not suitable for: very small samples (n < 10), where the sample cannot represent the population, and heavy-tailed distributions, where resamples may miss the rare extremes.
You need a 95% confidence interval for the median of a sample of 25 observations with an unknown distribution. Which method is best?
Bootstrap Confidence Intervals: The Percentile Method
Three main bootstrap CI methods: 1. **Percentile**: [Q(α/2), Q(1-α/2)] of the bootstrap distribution - simple, but can be inaccurate when the bootstrap distribution is biased or skewed. 2. **BCa** (bias-corrected and accelerated) - adjusts the percentile cut-offs for bias and skewness; recommended for publication. 3. **Basic bootstrap**: [2θ̂ − Q(1-α/2), 2θ̂ − Q(α/2)], where θ̂ is the estimate from the original sample - reflects the percentile interval around θ̂. In practice: BCa for accuracy, percentile for speed.
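As a rough sketch of how the percentile and basic intervals relate, the snippet below computes both from one bootstrap distribution of the median. Here Q denotes a quantile of the bootstrap distribution and the basic interval is taken as [2θ̂ − Q(1−α/2), 2θ̂ − Q(α/2)], with θ̂ the estimate from the original sample. The helper names `quantile` and `percentile_and_basic_ci`, the linear-interpolation quantile rule, and the sample data are assumptions for illustration. BCa is omitted because it needs an extra jackknife step.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def percentile_and_basic_ci(sample, statistic, alpha=0.05,
                            n_resamples=5000, seed=0):
    """Return (estimate, percentile CI, basic CI) for the given statistic."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    # Sorted bootstrap distribution of the statistic.
    boot = sorted(statistic(rng.choices(sample, k=n))
                  for _ in range(n_resamples))
    q_lo = quantile(boot, alpha / 2)
    q_hi = quantile(boot, 1 - alpha / 2)
    percentile_ci = (q_lo, q_hi)
    # Basic interval: the percentile interval reflected around theta_hat.
    basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
    return theta_hat, percentile_ci, basic_ci


# Hypothetical skewed sample of 25 observations (illustrative data only).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1 / 10) for _ in range(25)]

theta_hat, percentile_ci, basic_ci = percentile_and_basic_ci(
    sample, statistics.median
)
print(f"median: {theta_hat:.2f}")
print(f"percentile 95% CI: [{percentile_ci[0]:.2f}, {percentile_ci[1]:.2f}]")
print(f"basic 95% CI:      [{basic_ci[0]:.2f}, {basic_ci[1]:.2f}]")
```

The two intervals always have the same width. They differ only in where they sit relative to θ̂, which matters when the bootstrap distribution is skewed.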
**How many bootstrap replications?** As rules of thumb: B=1,000 is usually sufficient for the standard error, B=5,000 for a percentile CI, and B=10,000+ for a BCa CI far in the tails (α=0.01). More replications reduce the Monte Carlo error of the result but take longer. For a quick check: B=1,000; for publication: B=5,000 to 10,000.
Bootstrap: the 95% percentile CI for the difference in medians is [2.3, 18.7]. What does this mean?
Permutation Tests
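A **permutation test** (randomisation test) is a non-parametric test that assumes no particular distribution, only that the observations are exchangeable under H₀. The idea: if H₀ is true (no difference between groups), group labels are arbitrary - they can be shuffled. Algorithm: 1. compute the observed test statistic; 2. randomly shuffle the labels B times, recomputing the statistic each time; 3. p-value = fraction of shuffles producing a statistic at least as extreme as the observed one (≥ the observed value for a one-sided test, absolute value ≥ for a two-sided test). The test is exact when every possible relabelling is enumerated; with B random shuffles the p-value is a Monte Carlo approximation.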
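A minimal sketch of this procedure for a difference in means, in standard-library Python. Two choices here are assumptions beyond the algorithm as stated: the test is two-sided (it compares absolute differences), and the p-value adds 1 to both numerator and denominator so that the observed labelling counts as one of the permutations and the estimate is never exactly zero. The function name `permutation_test` and both data sets are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    # Step 1: observed test statistic.
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Step 2: shuffling the pooled data is equivalent to shuffling labels.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Step 3: share of labellings at least as extreme as the observed one,
    # with the +1 correction described in the lead-in.
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical measurements for two groups (illustrative data only).
treatment = [10, 11, 12, 13, 14, 15, 16, 17]
control = [1, 2, 3, 4, 5, 6, 7, 8]

observed, p_value = permutation_test(treatment, control, n_shuffles=2000)
print(f"observed difference in means: {observed:.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

Replacing `statistics.fmean` with another statistic (a median, for instance) changes the test without changing the shuffling logic.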
**Bootstrap vs Permutation test:** bootstrap - for confidence intervals and standard errors (resampling with replacement). Permutation test - for p-values (resampling without replacement, shuffling group labels). Both work without distributional assumptions and handle arbitrary statistics.
Permutation test: you computed the observed difference of means. Then you shuffled the group labels 10,000 times and recomputed the difference each time. p = 0.03. What does this mean?
Key Ideas
- Bootstrap: B-fold resampling with replacement → sampling distribution of any statistic
- Works without analytical formulas: median, trimmed mean, ratio of medians
- Percentile CI: [Q(2.5%), Q(97.5%)] of B bootstrap values
- BCa CI - more accurate under bias and skewness; use scipy.stats.bootstrap
- B=1,000 for SE; B=5,000 for CI; B=10,000 for BCa
- Permutation test: p-value without assumptions (shuffle group labels)
- Limitations: very small samples (n<10), heavy-tailed distributions
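For the `scipy.stats.bootstrap` function named in the list above, a call for a BCa interval on the median might look roughly like this. It assumes NumPy and a SciPy version that provides `scipy.stats.bootstrap` with the `random_state` keyword. The sample is made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical skewed sample of 25 observations (illustrative data only).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=10.0, size=25)

# The data argument is a sequence of samples, hence the one-element tuple.
result = stats.bootstrap(
    (sample,),
    np.median,
    n_resamples=10_000,
    confidence_level=0.95,
    method="BCa",
    random_state=rng,
)

ci = result.confidence_interval
print(f"sample median: {np.median(sample):.2f}")
print(f"BCa 95% CI: [{ci.low:.2f}, {ci.high:.2f}]")
print(f"bootstrap SE: {result.standard_error:.2f}")
```

Switching `method` to `"percentile"` or `"basic"` gives the other two interval types from the same call.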
Connections to Other Methods
Bootstrap is related to cross-validation (resampling for model evaluation), bagging/Random Forest (bootstrap ensembles), and the jackknife (the predecessor of bootstrap).
- Confidence Intervals — Bootstrap is an alternative CI method without distributional assumptions
- Non-Parametric Tests — Permutation tests are exact non-parametric tests
Questions for Reflection
- Why does bootstrap work? What justifies treating resampling from the sample as equivalent to resampling from the population?
- In what situation will bootstrap give poor results? (Hint: think about heavy-tailed distributions)
- How is bootstrap related to Random Forest? Why does bagging reduce model variance?