Statistics

Bootstrap and Resampling

'How do you build a confidence interval for the median?' For the mean there is a textbook formula. For the median there is no simple one. In 1979, Bradley Efron proposed the bootstrap: simulate thousands of samples from your own data. This single idea transformed applied statistics - now almost any statistic can be given a confidence interval.

A/B testing: confidence intervals for conversion rates without normality assumptions
Machine learning: bootstrap aggregating (bagging) - the foundation of Random Forest
Financial risk: Value at Risk (VaR) computed from bootstrapped historical returns
Clinical trials: exact p-values for small samples without distributional assumptions
Genomics: error estimation with a few hundred SNP markers

Prerequisites

The Resampling Idea: Simulation Instead of Formulas

Bootstrap powers Random Forest (scikit-learn): 100+ trees, each trained on a bootstrap sample drawn with replacement, which contains on average about 63.2% of the distinct observations. The **bootstrap** is a method for estimating the sampling distribution of a statistic by repeatedly resampling from the observed data. The intuition: if the sample represents the population well, resampling from the sample mimics sampling from the population. Algorithm: 1. draw n observations with replacement from the n-point sample; 2. compute the statistic; 3. repeat B = 1,000 to 10,000 times; 4. the values of the statistic across the B repetitions form the bootstrap distribution.
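The four steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library (NumPy would be more typical in practice); the function name `bootstrap_distribution`, the fixed seed, and the 25-point sample are assumptions made up for the example, not part of the lesson.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on n_resamples bootstrap samples."""
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_resamples):
        # Step 1: draw n observations with replacement from the n-point sample.
        resample = rng.choices(data, k=n)
        # Step 2: compute the statistic on the resample.
        values.append(statistic(resample))
    # Steps 3-4: the collected values form the bootstrap distribution.
    return values

# Hypothetical sample of 25 observations (invented for illustration).
sample = [12, 15, 9, 22, 31, 18, 14, 27, 11, 40, 16, 19, 13,
          25, 8, 17, 21, 35, 10, 23, 14, 29, 12, 20, 16]

boot_medians = bootstrap_distribution(sample, statistics.median)
# The standard deviation of the bootstrap values estimates the standard error.
boot_se = statistics.stdev(boot_medians)
print(f"sample median = {statistics.median(sample)}, bootstrap SE = {boot_se:.2f}")
```

Any function that maps a list to a number can be passed as `statistic`, which is what makes the approach usable for statistics that have no closed-form standard error.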

**When to use bootstrap:** 1. for the median, mode, IQR, and other statistics without closed-form SE formulas; 2. for complex composite statistics (ratio of medians, trimmed mean); 3. for small to moderate samples where the normal approximation from the CLT is doubtful; 4. when the distributional shape is unknown. Not suitable for: very small samples (n < 10) and heavy-tailed distributions (the bootstrap may miss extremes, because a resample can never contain values more extreme than those already observed).

You need a 95% confidence interval for the median of a sample of 25 observations with an unknown distribution. Which method is best?

Bootstrap Confidence Intervals: The Percentile Method

Three main bootstrap CI methods: 1. **Percentile**: [Q(α/2), Q(1-α/2)] of the bootstrap distribution - simple but may be biased. 2. **BCa** (bias-corrected and accelerated) - corrects for bias and skewness; recommended for publication. 3. **Basic bootstrap**: [2θ − Q(1-α/2), 2θ − Q(α/2)], where θ is the estimate from the original sample - reflects the bootstrap quantiles around θ rather than using them directly. In practice: BCa for accuracy, percentile for speed.
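The percentile and basic intervals can be computed from the same bootstrap values, as the sketch below shows. It uses only the standard library; the helper names `quantile` and `bootstrap_cis`, the linear-interpolation quantile rule, and the sample are assumptions for illustration. BCa is omitted because it needs extra bias and acceleration terms.

```python
import random
import statistics

def quantile(sorted_values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def bootstrap_cis(data, statistic, alpha=0.05, n_resamples=5000, seed=0):
    """Return (estimate, percentile CI, basic CI) from one bootstrap run."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistic(data)  # estimate from the original sample
    boot = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    q_lo = quantile(boot, alpha / 2)
    q_hi = quantile(boot, 1 - alpha / 2)
    percentile_ci = (q_lo, q_hi)
    # Basic interval: reflect the bootstrap quantiles around the estimate.
    basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
    return theta_hat, percentile_ci, basic_ci

# Hypothetical sample of 25 observations (invented for illustration).
sample = [12, 15, 9, 22, 31, 18, 14, 27, 11, 40, 16, 19, 13,
          25, 8, 17, 21, 35, 10, 23, 14, 29, 12, 20, 16]

theta_hat, percentile_ci, basic_ci = bootstrap_cis(sample, statistics.median)
print("median:", theta_hat)
print("percentile 95% CI:", percentile_ci)
print("basic 95% CI:", basic_ci)
```

Both intervals have the same width; they differ only when the bootstrap distribution is asymmetric around the estimate. For BCa, `scipy.stats.bootstrap` accepts a `method` argument with the values `'percentile'`, `'basic'`, and `'BCa'`.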

**How many bootstrap replications?** B=1,000 is sufficient for the standard error. B=5,000 for a percentile CI. B=10,000+ for BCa CI in the tails (α=0.01). More replications = more precise, but slower. For a quick check: B=1,000; for publication: B=5,000 - 10,000.
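One way to see why the tails need more replications is to repeat the whole bootstrap with different random seeds and watch how much a CI endpoint moves between runs. The sketch below does this for the upper 95% percentile endpoint of the mean. The function names, the choice of 20 runs, the values B = 100 and B = 2,500, and the sample are all assumptions chosen to keep the example fast.

```python
import random
import statistics

# Hypothetical sample of 25 observations (invented for illustration).
sample = [12, 15, 9, 22, 31, 18, 14, 27, 11, 40, 16, 19, 13,
          25, 8, 17, 21, 35, 10, 23, 14, 29, 12, 20, 16]

def upper_endpoint(data, n_resamples, seed, alpha=0.05):
    """Upper percentile-CI endpoint for the mean from one bootstrap run."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(statistics.fmean(rng.choices(data, k=n))
                  for _ in range(n_resamples))
    # Simple order-statistic quantile (no interpolation), adequate for a sketch.
    index = min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)
    return boot[index]

def endpoint_spread(n_resamples, n_runs=20):
    """Standard deviation of the endpoint across independent bootstrap runs."""
    endpoints = [upper_endpoint(sample, n_resamples, seed)
                 for seed in range(n_runs)]
    return statistics.pstdev(endpoints)

spread_small = endpoint_spread(100)
spread_large = endpoint_spread(2500)
print(f"run-to-run spread of the endpoint, B=100:  {spread_small:.3f}")
print(f"run-to-run spread of the endpoint, B=2500: {spread_large:.3f}")
```

The spread reflects Monte Carlo noise only. It shrinks as B grows, but no number of replications compensates for a small or unrepresentative original sample.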

A bootstrap percentile 95% CI for the difference in medians is [2.3, 18.7]. What does this mean?

Permutation Tests

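A **permutation test** (randomisation test) is a non-parametric test that assumes no particular distribution, only that the observations are exchangeable under H₀. It is exact when every possible relabelling is enumerated; with B random shuffles it gives a Monte Carlo approximation of the exact p-value. The idea: if H₀ is true (no difference between groups), group labels are arbitrary - they can be shuffled. Algorithm: 1. compute the observed test statistic; 2. randomly shuffle the labels B times and recompute the statistic each time; 3. p-value = fraction of shuffles producing a statistic at least as extreme as the observed one (≥ the observed value for a one-sided test).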
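The algorithm above can be sketched for a difference in means between two groups. This version uses only the standard library and makes two choices that the text does not prescribe: it is two-sided (it compares absolute differences), and it adds 1 to both numerator and denominator so that the estimated p-value is never exactly zero. The function name `permutation_test` and both data sets are invented for illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    # Step 1: observed test statistic.
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Step 2: shuffling the pooled data is equivalent to shuffling labels.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Step 3: share of shuffles at least as extreme (with add-one correction).
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical measurements for two groups (invented for illustration).
group_a = [23, 27, 31, 25, 29, 34, 28, 30]
group_b = [19, 22, 24, 20, 26, 21, 23, 18]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {observed}, p-value = {p_value:.4f}")
```

For a one-sided test matching the "≥ the observed one" rule, the comparison would be `diff >= observed` without the absolute values.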

**Bootstrap vs Permutation test:** bootstrap - for confidence intervals and standard errors (resampling with replacement). Permutation test - for p-values (resampling without replacement, shuffling group labels). Both work without distributional assumptions and handle arbitrary statistics.

Permutation test: you computed the observed difference of means. Then you shuffled the group labels 10,000 times and recomputed the difference each time. p = 0.03. What does this mean?

Key Ideas

Bootstrap: B-fold resampling with replacement → sampling distribution of any statistic
Works without analytical formulas: median, trimmed mean, ratio of medians
Percentile CI: [Q(2.5%), Q(97.5%)] of B bootstrap values
BCa CI - more accurate under bias and skewness; use scipy.stats.bootstrap
B=1,000 for SE; B=5,000 for CI; B=10,000 for BCa
Permutation test: p-value without assumptions (shuffle group labels)
Limitations: very small samples (n<10), heavy-tailed distributions

Connections to Other Methods

Bootstrap is related to cross-validation (resampling for model evaluation), bagging/Random Forest (bootstrap ensembles), and the jackknife (the predecessor of bootstrap).
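The link to bagging can be made concrete with the 63.2% figure quoted for Random Forest. When n observations are drawn with replacement from n, each observation is missed by all draws with probability (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. So a bootstrap sample contains about 63.2% of the distinct observations. The sketch below checks this both by formula and by simulation; the function names are assumptions for illustration.

```python
import random

def expected_unique_fraction(n):
    """Expected share of distinct observations in a bootstrap sample of size n."""
    # Each observation is missed by all n draws with probability (1 - 1/n)**n.
    return 1 - (1 - 1 / n) ** n

def simulated_unique_fraction(n, seed=0):
    """Share of distinct observations in one simulated bootstrap sample."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}  # indices drawn with replacement
    return len(drawn) / n

for n in (10, 100, 1000):
    print(n, round(expected_unique_fraction(n), 4))
print("simulated, n=10000:", round(simulated_unique_fraction(10_000), 4))
```

The observations left out of a given bootstrap sample, roughly 36.8% of the data, are known as out-of-bag observations.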

Confidence Intervals — Bootstrap is an alternative CI method without distributional assumptions
Non-Parametric Tests — Permutation tests are exact non-parametric tests

Questions for Reflection

Why does the bootstrap work? What justifies treating resampling from the sample as a stand-in for sampling from the population?
In what situation will bootstrap give poor results? (Hint: think about heavy-tailed distributions)
How is bootstrap related to Random Forest? Why does bagging reduce model variance?