Statistics

ANOVA: Comparing Multiple Groups

A pharmaceutical company tests three drug dosages. Running three pairwise t-tests gives a ~14% chance of a false positive. With ANOVA - a controlled 5%. That's exactly why clinical trials require ANOVA rather than multiple t-tests.

ANOVA is used in A/B/C product testing, in agriculture to compare fertilizers, in psychology to compare therapy methods, and in machine learning to compare algorithms across multiple datasets.

Предварительные знания

Student's t-test: The Statistic Born in a Brewery

The Idea Behind ANOVA: Decomposing Variance

**ANOVA (Analysis of Variance)** is a method for testing whether the means of several populations are equal. Instead of running pairwise t-tests (which inflates the Type I error rate), ANOVA performs a single test by comparing variability *between groups* to variability *within groups*.

**Why not just do pairwise t-tests?** With 3 groups, 3 comparisons are needed. With 10 groups - 45. At α = 0.05, the probability of at least one false discovery is 1 - 0.95⁴⁵ ≈ 90%. ANOVA controls the family-wise Type I error rate at α for the entire set of comparisons.

In a one-way ANOVA with 4 groups of 6 observations each, what are the degrees of freedom for SS_between and SS_within?

The ANOVA Table and F-Test

ANOVA results are typically presented in an **ANOVA table**. The F-statistic is compared against a critical value from the F-distribution. If F > F_critical, we reject H₀.

**ANOVA assumptions:** 1) normality within each group (check with Shapiro-Wilk); 2) homoscedasticity - equal variances (Levene's test); 3) independence of observations. If homoscedasticity is violated, use Welch's ANOVA. If normality is violated with n > 30, it's acceptable by the CLT.

ANOVA returned p = 0.03. What can be concluded?

Post-Hoc Tests: Which Groups Differ?

ANOVA is an 'omnibus test': it indicates whether differences exist, but not where. **Post-hoc tests** perform pairwise comparisons with a multiplicity correction. The most popular: Tukey HSD (balanced groups), Bonferroni (strict correction), Scheffé (flexible).

**Which post-hoc to use?** Tukey HSD - optimal for balanced groups, controls FWER. Games-Howell - when variances are unequal (violated homoscedasticity). Bonferroni - most conservative, best for a small number of comparisons. For exploratory analysis, FDR correction (Benjamini-Hochberg) is appropriate.

Consider 5 patient groups compared after a significant ANOVA (p < 0.05). How many pairwise comparisons must the post-hoc analysis perform?

Key Ideas

ANOVA compares means of ≥3 groups in one test, controlling the Type I error rate
F = MS_between / MS_within: if groups differ, F >> 1
df_between = k-1, df_within = k(n-1) for a balanced design
Assumptions: normality, homoscedasticity, independence
Significant ANOVA → post-hoc test (Tukey, Bonferroni) for pairwise comparisons
ANOVA only says 'there is a difference', not 'where' - that's the job of post-hoc analysis

What's Next

ANOVA is a parametric method requiring normality. When data are directly non-normal or ordinal, use non-parametric alternatives (Kruskal-Wallis instead of ANOVA).

Non-Parametric Tests — Kruskal-Wallis - the non-parametric alternative to one-way ANOVA
Bayesian Statistics — Bayesian ANOVA gives probabilities of hypotheses rather than a binary reject/fail decision

Вопросы для размышления

Why does ANOVA use a ratio of variances rather than a difference of means? When could a large difference in means still yield a non-significant F?
An A/B/C test on a website yields ANOVA with p = 0.04. How should the result be explained to a manager with no statistics background?
Two-way ANOVA adds a second factor and their interaction. Think of an example where the interaction between factors is more important than either main effect alone.

Связанные уроки

prob-11-normal