Statistics
ANOVA: Comparing Multiple Groups
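A pharmaceutical company tests three drug dosages. Running three pairwise t-tests at α = 0.05 gives roughly a 14% chance of at least one false positive (1 - 0.95³ ≈ 0.14, treating the tests as independent). A single ANOVA keeps that rate at a controlled 5%. This is why multi-group comparisons in clinical trials are typically analyzed with ANOVA or with a multiplicity correction rather than with uncorrected t-tests.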
- ANOVA is used in A/B/C product testing, in agriculture to compare fertilizers, in psychology to compare therapy methods, and in machine learning to compare algorithms across multiple datasets.
Prerequisites
The Idea Behind ANOVA: Decomposing Variance
**ANOVA (Analysis of Variance)** is a method for testing whether the means of several populations are equal. Instead of running pairwise t-tests (which inflates the Type I error rate), ANOVA performs a single test by comparing variability *between groups* to variability *within groups*.
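The decomposition can be checked numerically. The sketch below is illustrative only: the function name `decompose_variance` and the data are made up for this example, and it uses only the Python standard library. It splits the total sum of squares into a between-group part and a within-group part and confirms that the two add up to the total.

```python
from statistics import mean


def decompose_variance(groups):
    """Return (SS_between, SS_within, SS_total) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between groups: distance of each group mean from the grand mean,
    # weighted by the group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within groups: distance of each observation from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Total: distance of each observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    return float(ss_between), float(ss_within), float(ss_total)


# Hypothetical measurements for three groups (not taken from a real study).
groups = [
    [4, 5, 6, 5, 5],
    [6, 7, 8, 7, 7],
    [8, 9, 10, 9, 9],
]

ss_between, ss_within, ss_total = decompose_variance(groups)
print(f"SS_between = {ss_between}, SS_within = {ss_within}, SS_total = {ss_total}")

# The identity SS_total = SS_between + SS_within holds up to rounding.
assert abs(ss_total - (ss_between + ss_within)) < 1e-9
```

In this made-up data set almost all of the variability sits between the groups, which is the pattern ANOVA treats as evidence against equal means.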
**Why not just do pairwise t-tests?** With 3 groups, 3 comparisons are needed; with 10 groups, 45. At α = 0.05, and treating the tests as independent, the probability of at least one false discovery across 45 tests is 1 - 0.95⁴⁵ ≈ 90%. ANOVA replaces them with a single omnibus test, so the Type I error rate for the overall null hypothesis (all means equal) stays at α.
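The arithmetic behind these figures is short enough to reproduce. The following sketch assumes the pairwise tests are independent, which is a simplification because tests that share a group are correlated. The helper name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(k_groups, alpha=0.05):
    """Return (number of pairwise comparisons, P(at least one false positive)).

    Assumes independent tests, each run at level alpha.
    """
    m = comb(k_groups, 2)          # k choose 2 pairwise comparisons
    fwer = 1 - (1 - alpha) ** m    # complement of "no false positive in m tests"
    return m, fwer


for k in (3, 5, 10):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups: {m} comparisons, family-wise error rate ≈ {fwer:.1%}")
```

The number of comparisons grows quadratically with the number of groups, which is why the uncorrected error rate climbs so quickly.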
In a one-way ANOVA with 4 groups of 6 observations each, what are the degrees of freedom for SS_between and SS_within?
The ANOVA Table and F-Test
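ANOVA results are typically presented in an **ANOVA table** that lists, for each source of variation (between groups, within groups), the sum of squares (SS), the degrees of freedom (df), and the mean square (MS = SS / df). The F-statistic, F = MS_between / MS_within, is compared against a critical value from the F-distribution with (df_between, df_within) degrees of freedom. If F > F_critical (equivalently, p < α), we reject H₀ that all group means are equal.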
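As a rough illustration of how the table is assembled, the sketch below builds it from two sums of squares. The function name `anova_table` and the input numbers are hypothetical. The critical value in the comment is the usual tabulated F value for α = 0.05 with (2, 12) degrees of freedom, quoted approximately rather than computed.

```python
def anova_table(ss_between, ss_within, k, n_total):
    """Build a one-way ANOVA table from sums of squares.

    k is the number of groups, n_total the total number of observations.
    """
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "between": {"SS": ss_between, "df": df_between, "MS": ms_between},
        "within": {"SS": ss_within, "df": df_within, "MS": ms_within},
        "F": ms_between / ms_within,
    }


# Hypothetical inputs: 3 groups, 15 observations in total.
table = anova_table(ss_between=40.0, ss_within=6.0, k=3, n_total=15)

print(f"{'Source':<10}{'SS':>8}{'df':>6}{'MS':>8}")
for source in ("between", "within"):
    row = table[source]
    print(f"{source:<10}{row['SS']:>8.2f}{row['df']:>6d}{row['MS']:>8.2f}")
print(f"F = {table['F']:.2f}")

# Tabulated critical value for alpha = 0.05, df = (2, 12): approximately 3.89.
f_critical = 3.89
print("reject H0" if table["F"] > f_critical else "fail to reject H0")
```

In practice the p-value is usually read from statistical software rather than from a printed F table; the manual version is shown only to make the structure of the table explicit.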
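**ANOVA assumptions:** 1) normality within each group (check with Shapiro-Wilk); 2) homoscedasticity - equal variances across groups (Levene's test); 3) independence of observations. If homoscedasticity is violated, use Welch's ANOVA. If normality is violated but each group has more than about 30 observations, the F-test is usually still acceptable: by the CLT the group means are approximately normal, although heavy skew or outliers can still distort the result.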
ANOVA returned p = 0.03. What can be concluded?
Post-Hoc Tests: Which Groups Differ?
ANOVA is an 'omnibus test': it indicates whether differences exist, but not where. **Post-hoc tests** perform pairwise comparisons with a multiplicity correction. The most widely used are Tukey HSD (designed for balanced groups), Bonferroni (a strict correction), and Scheffé (flexible, covers arbitrary contrasts).
**Which post-hoc to use?** Tukey HSD - optimal for balanced groups, controls FWER. Games-Howell - when variances are unequal (violated homoscedasticity). Bonferroni - most conservative, best for a small number of comparisons. For exploratory analysis, FDR correction (Benjamini-Hochberg) is appropriate.
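The two p-value-based corrections mentioned above are simple enough to sketch by hand. The code below is a minimal illustration, not a replacement for a statistics library: the function names and the six p-values (as would arise from pairwise comparisons of 4 groups) are invented for the example.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank r such that p_(r) <= (r / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values from 6 pairwise comparisons (4 groups).
p_values = [0.001, 0.012, 0.020, 0.030, 0.200, 0.600]

print("Bonferroni:        ", bonferroni_reject(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(p_values))
```

On these made-up values Bonferroni flags fewer comparisons than Benjamini-Hochberg, which reflects the trade-off described above: strict control of any false positive versus control of the expected share of false positives among the discoveries.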
Consider 5 patient groups compared after a significant ANOVA (p < 0.05). How many pairwise comparisons must the post-hoc analysis perform?
Key Ideas
- ANOVA compares means of ≥3 groups in one test, controlling the Type I error rate
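- F = MS_between / MS_within: under H₀, F is close to 1; if the group means differ, F tends to be much larger than 1
- df_between = k-1, df_within = N-k, which equals k(n-1) for a balanced design with n observations per group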
- Assumptions: normality, homoscedasticity, independence
- Significant ANOVA → post-hoc test (Tukey, Bonferroni) for pairwise comparisons
- ANOVA only says 'there is a difference', not 'where' - that's the job of post-hoc analysis
What's Next
ANOVA is a parametric method requiring normality. When data are clearly non-normal or ordinal, use non-parametric alternatives (Kruskal-Wallis instead of ANOVA).
- Non-Parametric Tests: Kruskal-Wallis, the non-parametric alternative to one-way ANOVA
- Bayesian Statistics: Bayesian ANOVA gives probabilities of hypotheses rather than a binary reject / fail-to-reject decision
Questions for Reflection
- Why does ANOVA use a ratio of variances rather than a difference of means? When could a large difference in means still yield a non-significant F?
- An A/B/C test on a website yields ANOVA with p = 0.04. How should the result be explained to a manager with no statistics background?
- Two-way ANOVA adds a second factor and their interaction. Think of an example where the interaction between factors is more important than either main effect alone.