Statistics
Nonparametric Tests
How do you test the effect of a drug when measurements are heavy-tailed and the sample is only 12 patients?
- **Clinical trials (FDA):** Wilcoxon is standard for paired before/after biomarkers when n < 30 and normality is not guaranteed
- **A/B tests on engagement:** Mann-Whitney for comparing time-on-page distributions with strongly skewed tails (Netflix, Booking)
- **Quality control:** Kruskal-Wallis compares defect rates across factories without assuming normality
- **Educational research:** comparing class score distributions across teaching methods when scores are bounded and non-normal
Prerequisites
- Hypothesis testing and p-values
- Concept of the median and order statistics
- Basic combinatorics for the binomial distribution
Nonparametric tests deliver exact or asymptotic p-values under minimal distributional assumptions by working with signs and ranks rather than raw values. The cost is a small loss of power relative to parametric tests on data that truly meet their assumptions. The reward is robustness to outliers, freedom from the normality assumption, and applicability to ordinal data.
Decision tree: small n + no normality → sign test or Wilcoxon. Ordinal scale → ranks only (Mann-Whitney, Kruskal-Wallis). Heavy tails or outliers → rank tests instead of t/F. Mixed continuous + discrete → permutation test. Parametric tests are preferable only when distributional assumptions hold.
Sign test
In 1710 John Arbuthnot ran what is often cited as the first statistical test: he counted the years in which male births in London exceeded female births (82 of 82, from 1629 to 1710) and argued that, if either outcome were equally likely in any given year, the probability of such a run would be (1/2)^82, which is implausibly small. This is the sign test in its original form: a binomial test of the median, requiring almost no assumptions about the distribution.
The sign test is the most robust classical test: it works even when the mean is not defined (Cauchy) and for arbitrarily heavy-tailed distributions. The cost of this robustness is lower power on light-tailed symmetric distributions: its asymptotic relative efficiency against the t-test is about 0.64 in the normal case.
What distribution does the sign-test statistic S follow under H_0: med(X) = m_0?
Under the null, each X_i falls above m_0 with probability 1/2, independently of the others (by definition of the median, assuming F is continuous so that ties with m_0 have probability zero). The count of '+' signs is therefore a sum of n independent Bernoulli(1/2) variables, i.e. S ~ Bin(n, 1/2). The result is exact and distribution-free: every continuous F with median m_0 gives the same null distribution.
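The binomial null distribution makes the exact p-value a short sum. The sketch below is a minimal illustration using only the standard library; the function name `sign_test` and the twelve differences are hypothetical and not taken from any real study.

```python
from math import comb

def sign_test(xs, m0):
    """Exact two-sided sign test of H0: median = m0 (illustrative sketch)."""
    # Observations equal to m0 carry no sign information and are dropped.
    signs = [x > m0 for x in xs if x != m0]
    n = len(signs)
    s = sum(signs)  # number of '+' signs
    # Under H0, S ~ Bin(n, 1/2). The distribution is symmetric,
    # so the two-sided p-value doubles the smaller tail.
    k = min(s, n - s)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return s, n, min(1.0, 2 * tail)

# Hypothetical before/after differences for 12 patients.
diffs = [1.2, 0.4, -0.3, 2.1, 0.8, 1.5, -0.1, 0.9, 3.7, 0.6, 1.1, 0.2]
s, n, p = sign_test(diffs, 0.0)
print(s, n, p)  # 10 positive signs out of 12

# Arbuthnot's data: 82 '+' signs out of 82 years.
_, _, p_arbuthnot = sign_test([1] * 82, 0)
print(p_arbuthnot)  # two-sided, i.e. twice the one-sided (1/2)**82
```

For the hypothetical data the p-value is 2 · (1 + 12 + 66) / 4096 = 158/4096 ≈ 0.039. Only the signs enter the calculation, so replacing 3.7 by 370 would leave the result unchanged.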
Wilcoxon signed-rank test
In 1945 Frank Wilcoxon noted that the sign test discards information about the magnitude of deviations: a tiny X_i - m_0 = 0.01 and a large 10.0 count the same. Wilcoxon proposed ranking the absolute deviations: assign weight 1 to the smallest absolute deviation, weight 2 to the next, and so on, then sum the weights of the positive deviations. The result is more powerful for symmetric distributions while remaining distribution-free within the class of continuous distributions symmetric about m_0.
Wilcoxon is a common choice for paired clinical-trial data, e.g. a biomarker measured before and after treatment in 30 patients. It is robust to outliers and does not require normality, only that the paired differences are symmetrically distributed.
Why is the Wilcoxon signed-rank test more powerful than the sign test?
The sign test counts only how many d_i are positive, so small and large deviations contribute equally. Wilcoxon weights each positive deviation by the rank of |d_i|: the largest deviation has weight n, the smallest has weight 1. On normal data the asymptotic relative efficiency (ARE) of Wilcoxon versus the t-test is 3/π ≈ 0.955, while that of the sign test versus the t-test is 2/π ≈ 0.637. Wilcoxon therefore nearly matches the t-test on Gaussian data while staying robust.
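The rank weighting can be made concrete with a small exact computation. The following is a sketch under stated assumptions: no zero differences and no tied |d_i|, so the ranks are exactly 1..n. The function name `signed_rank_test` and the data are hypothetical. The null distribution is obtained by brute-force enumeration of all 2^n sign patterns, which is only practical for small n.

```python
from itertools import product

def signed_rank_test(diffs):
    """Exact two-sided Wilcoxon signed-rank test (illustrative sketch).

    Assumes no zero differences and no ties among |d_i|.
    """
    n = len(diffs)
    # Rank the absolute differences: smallest gets 1, largest gets n.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    rank = {i: r for r, i in enumerate(order, start=1)}
    w_plus = sum(rank[i] for i in range(n) if diffs[i] > 0)

    # Under H0 each rank carries a '+' sign independently with probability 1/2,
    # so all 2^n sign patterns are equally likely.
    mean = n * (n + 1) / 4
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if abs(w - mean) >= abs(w_plus - mean):
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical before/after differences for 12 patients.
diffs = [1.2, 0.4, -0.3, 2.1, 0.8, 1.5, -0.1, 0.9, 3.7, 0.6, 1.1, 0.2]
w_plus, p = signed_rank_test(diffs)
print(w_plus, p)
```

Here the two negative differences are also the ones with ranks 1 and 3, so W+ = 78 - 4 = 74 and the exact p-value is 14/4096 ≈ 0.0034. A sign test on the same data sees only "10 of 12 positive" and cannot use the fact that the negative differences are among the smallest in magnitude.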
Kruskal-Wallis test
In 1952 William Kruskal and W. Allen Wallis generalized the Wilcoxon rank-sum (Mann-Whitney) test to k groups, giving a nonparametric analogue of one-way ANOVA. The idea: pool all observations, rank them globally, then check whether the ranks are distributed similarly across groups. There is no normality assumption, only continuity and independence.
Important: Kruskal-Wallis is sensitive not only to differences in location but also to differences in distributional shape. Even with equal medians but different variances, the test can reject. Under the stricter assumption that all groups share the same shape and differ only in location, it becomes a test of location, which restores the analogy with ANOVA.
How does the Kruskal-Wallis test differ from classical one-way ANOVA?
The two tests address closely related null hypotheses. ANOVA tests H_0: all group means are equal; it assumes normality and equal variances and uses the F-statistic on the raw data. KW tests H_0: all groups have the same distribution; it uses the ranks of the pooled observations, and its statistic H is asymptotically χ²_{k-1} under H_0. The ARE of KW versus ANOVA is 3/π ≈ 0.955 under normality, only a small power loss in exchange for substantially weaker assumptions. With heavy-tailed data or outliers, KW typically outperforms ANOVA.
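To illustrate the pooled-rank construction, here is a minimal sketch of the statistic H = 12 / (N(N+1)) · Σ R_j² / n_j − 3(N+1), where R_j is the rank sum of group j and N the total sample size. It assumes no tied observations (so no tie correction is applied), and the three groups are hypothetical data. For k = 3 groups the χ² reference distribution has 2 degrees of freedom, whose upper tail is exp(−H/2); that closed form is used here and does not apply for other k.

```python
from math import exp

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (illustrative sketch, assumes no ties)."""
    # Pool all observations and rank them globally, smallest = 1.
    pooled = sorted(x for g in groups for x in g)
    rank = {x: r for r, x in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

# Hypothetical measurements from three groups of four.
groups = [
    [1.1, 2.3, 3.0, 5.5],
    [4.2, 6.1, 7.4, 9.3],
    [8.0, 10.2, 11.8, 12.5],
]
h = kruskal_wallis_h(groups)
# Asymptotic p-value for k = 3 only: chi-square with 2 df has tail exp(-h/2).
p_approx = exp(-h / 2)
print(h, p_approx)
```

The rank sums are 11, 26 and 41, giving H ≈ 8.65 and an approximate p-value of about 0.013. With only four observations per group the χ² approximation is rough; an exact or permutation-based p-value would be more appropriate for samples this small.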
Summary
- **Sign test:** tests the median; statistic S ~ Bin(n, 1/2) under H_0; robust but loses information about magnitudes
- **Wilcoxon signed-rank:** weights deviations by the rank of |d_i|; ARE ≈ 0.955 vs t-test on normal data
- **Mann-Whitney U:** the rank test for two independent samples; equivalent to the Wilcoxon rank-sum test
- **Kruskal-Wallis:** nonparametric analogue of ANOVA on k groups; H ~ χ²_{k-1}
- **Permutation tests:** exact p-values obtained by enumerating all relabelings of the data, which are equally likely under the null hypothesis
- **ARE:** asymptotic relative efficiency measures power loss vs parametric tests when their assumptions hold
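The permutation idea from the summary can be sketched in a few lines. This is a minimal illustration, assuming two small independent samples and the absolute difference in means as the test statistic (other statistics work the same way). The function name `permutation_test` and the data are hypothetical. Full enumeration is feasible only for small samples, since the number of splits grows combinatorially.

```python
from itertools import combinations

def permutation_test(x, y):
    """Exact two-sided permutation test for a difference in means (sketch)."""
    pooled = x + y
    n, m = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n - sum(y) / m)
    count = hits = 0
    # Under H0 the group labels are exchangeable, so every way of choosing
    # which n of the pooled values are labeled 'x' is equally likely.
    for idx in combinations(range(n + m), n):
        sx = sum(pooled[i] for i in idx)
        stat = abs(sx / n - (total - sx) / m)
        count += 1
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            hits += 1
    return hits / count

# Hypothetical measurements from two independent groups.
x = [12, 15, 11, 14]
y = [18, 21, 17, 20]
p_perm = permutation_test(x, y)
print(p_perm)
```

With these completely separated samples, only the observed split and its mirror image are as extreme as the data, so the p-value is 2/70 ≈ 0.029, the smallest two-sided value attainable with two groups of four.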
Nonparametric tests and adjacent topics
Nonparametric methods are a bridge between classical hypothesis testing and modern computational statistics.
- Bootstrap and resampling — Computational extension
- Robust statistics — Outlier resistance
- Ordinal regression — Rank-based modelling