Statistics

Multivariate Analysis: MANOVA and Discriminant Analysis

How do you compare two customer cohorts on 12 quality metrics at once, when correlations between metrics turn the naive approach into a false-discovery race?

**Sephora product analytics:** MANOVA compares 40 SKUs on 8 buyer-behaviour metrics (CTR, AOV, return rate, ...) simultaneously; eight separate ANOVAs at α = 0.05 would push FWER above 30% (1 - 0.95⁸ ≈ 0.34 if the tests were independent)
**Biomedical diagnostics:** LDA on 30 breast-cancer biomarkers classifies benign vs malignant tumours at AUC 0.97 (Wisconsin Breast Cancer dataset)
**fMRI neuroimaging:** canonical correlation aligns 50,000 brain voxels with 200 behavioural tests, linking prefrontal cortex activity to working memory
**Financial stability:** regulators use QDA on 25 bank-balance indicators for early warning of liquidity crises

Prerequisites

Matrices, eigenvalues, SVD
ANOVA and the F-distribution
Multivariate normal distribution and covariance

Multivariate analysis collects methods that work with vector observations as a whole rather than each coordinate in isolation. This shifts intuition: equality-of-means becomes the question of distinct ellipsoid centers; classification becomes building hypersurfaces in ℝ^p; measuring dependence becomes searching for canonical axes between two point clouds. The shared ingredient is spectral decomposition of symmetric or generalized-symmetric matrices.

Geometric view: ANOVA compares positions of points on a line → MANOVA compares positions of ellipsoids in ℝ^p; t-test and LDA are special cases of the same discriminant direction Σ⁻¹(μ₁ - μ₂); regression and CCA are special cases of searching for axes of maximum correlation.

Modern extensions: sparse LDA (with L1 regularization) for p >> n, kernel CCA and Deep CCA for nonlinear links, RDA for regularizing QDA at small n_k, nonlinear discriminants via neural networks with softmax outputs, all interpretable as generalizations of the basic multivariate-analysis formulas.

Multivariate normal distribution

The multivariate normal N_p(μ, Σ) generalizes the univariate normal to a p-dimensional vector. Parameters: mean vector μ ∈ ℝ^p and positive-definite covariance matrix Σ ∈ ℝ^{p×p}. The density depends on an observation only through the Mahalanobis distance (x-μ)ᵀΣ⁻¹(x-μ), which properly accounts for correlations between variables.

Mardia's test of multivariate normality uses generalized skewness b₁,p and kurtosis b₂,p. When normality fails, apply Box-Cox transforms or use rank-based multivariate methods.

Why use Mahalanobis distance rather than Euclidean distance to detect multivariate outliers?

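D²_M = (x-μ)ᵀΣ⁻¹(x-μ) turns the concentration ellipsoid into a sphere: the data are rotated onto the eigenvectors of Σ (decorrelated), and each axis is then divided by its standard deviation, the square root of the corresponding eigenvalue. Euclidean distance ignores correlations, so it understates how unusual a point is along low-variance directions and overstates it along high-variance ones.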
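To make the contrast with Euclidean distance concrete, here is a minimal sketch in Python with NumPy and SciPy. The mean, covariance, and the two test points are illustrative assumptions, not values from this lesson, and the 97.5% χ² cutoff is one common convention rather than a fixed rule.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(X, mu, Sigma):
    """Squared Mahalanobis distance of each row of X from mu."""
    diff = X - mu
    # Solve Sigma z = diff^T rather than forming the explicit inverse.
    z = np.linalg.solve(Sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

# Assumed illustrative parameters: two strongly correlated variables.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])

# Both points are at the same Euclidean distance from mu.
along = np.array([1.5, 1.5])     # follows the correlation
against = np.array([1.5, -1.5])  # contradicts the correlation

d2 = mahalanobis_sq(np.vstack([along, against]), mu, Sigma)

# Under multivariate normality D^2 follows chi-square with p = 2 degrees of freedom.
cutoff = chi2.ppf(0.975, df=2)
is_outlier = d2 > cutoff
```

The two points are indistinguishable by Euclidean distance, yet only the one that contradicts the correlation structure exceeds the χ² cutoff.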

MANOVA: comparing mean vectors

MANOVA (Multivariate ANOVA) generalizes ANOVA to p simultaneous responses. The null is H₀: μ₁ = μ₂ = ... = μ_g, where μ_k is the mean vector of group k. Advantage over separate ANOVAs: control of type-I error under correlated responses and detection of differences visible only in linear combinations of features.

Why is MANOVA fundamentally better than running p separate ANOVAs to test equality of group means on p responses?

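Separate ANOVAs at level α give FWER ≈ 1 - (1-α)^p when the tests are independent, and they ignore correlations between responses. MANOVA finds the direction of maximum between-group signal as the leading eigenvector of W⁻¹B: two groups can overlap almost completely on every single coordinate yet separate cleanly along a linear combination. The canonical example is a pair of elongated clusters rotated 45° to the axes.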
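The rotated-cluster case can be sketched numerically. The snippet below is an illustration under assumed settings (two simulated groups of 200 points, correlation 0.95, a small mean shift across the correlation direction); `manova_wilks` is a hypothetical helper name, not a library function. It builds the within-group matrix W and the between-group matrix B, then computes Wilks' Λ = |W|/|B+W| and the eigenvalues of W⁻¹B.

```python
import numpy as np

def manova_wilks(groups):
    """Return W, B, Wilks' lambda and the eigenvalues of W^{-1} B."""
    all_x = np.vstack(groups)
    grand_mean = all_x.mean(axis=0)
    p = all_x.shape[1]
    W = np.zeros((p, p))  # within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in groups:
        m = g.mean(axis=0)
        centered = g - m
        W += centered.T @ centered
        d = (m - grand_mean)[:, None]
        B += g.shape[0] * (d @ d.T)
    wilks = np.linalg.det(W) / np.linalg.det(B + W)
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
    return W, B, wilks, np.sort(eigvals)[::-1]

# Assumed simulation settings: elongated clusters along the 45-degree line,
# with a small mean shift perpendicular to that line.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.95],
                  [0.95, 1.0]])
shift = np.array([0.4, -0.4])
g1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=200)
g2 = rng.multivariate_normal(shift, Sigma, size=200)

W, B, wilks, eigvals = manova_wilks([g1, g2])
```

Each coordinate shifts by only 0.4 standard deviations, but along the low-variance direction (1, -1) the same shift is large relative to the spread, which is what drives Λ well below 1. The identity Λ = ∏ 1/(1 + λ_i) over the eigenvalues λ_i of W⁻¹B ties the determinant form to the eigenvalue form.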

Linear and quadratic discriminant analysis

LDA (Linear Discriminant Analysis) builds a classifier under the assumption that each class is distributed as N_p(μ_k, Σ) with a covariance matrix Σ common to all classes. Decision boundaries between classes are then linear. QDA (Quadratic Discriminant Analysis) allows a separate Σ_k per class, producing quadratic decision surfaces.
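As a rough sketch of how the shared-covariance assumption turns into a classifier, the code below estimates class means, priors, and a pooled covariance, then scores each point with the discriminant δ_k(x) = xᵀΣ⁻¹μ_k - ½μ_kᵀΣ⁻¹μ_k + log π_k. The function names and the simulated two-class data are assumptions made for illustration; in practice a library implementation would normally be used.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means, priors and the pooled covariance."""
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    pooled = np.zeros((p, p))
    for k, m in zip(classes, means):
        c = X[y == k] - m
        pooled += c.T @ c
    pooled /= n - len(classes)  # unbiased pooled estimate
    return classes, means, priors, pooled

def lda_scores(X, means, priors, pooled):
    """Discriminant scores delta_k(x), one column per class."""
    A = np.linalg.solve(pooled, means.T)  # column k is Sigma^{-1} mu_k
    const = -0.5 * np.einsum("kp,pk->k", means, A) + np.log(priors)
    return X @ A + const

def predict_lda(X, model):
    classes, means, priors, pooled = model
    return classes[np.argmax(lda_scores(X, means, priors, pooled), axis=1)]

# Assumed toy data: two Gaussian classes sharing one covariance.
rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], Sigma, size=150),
    rng.multivariate_normal([3.0, 0.0], Sigma, size=150),
])
y = np.repeat([0, 1], 150)

model = fit_lda(X, y)
accuracy = (predict_lda(X, model) == y).mean()
```

Because the quadratic term xᵀΣ⁻¹x is identical for every class, it cancels when two scores are compared, which is why the boundary δ_k(x) = δ_j(x) is a hyperplane.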

Regularized discriminant analysis (RDA) interpolates between LDA and QDA via Σ̂_k(α) = α·Σ̂_k + (1-α)·Σ̂_pooled, selecting α by cross-validation.
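The interpolation formula can be sketched directly. The example below uses assumed simulated data with a deliberately tiny class (n_k = 4 in p = 5 dimensions) to show why blending helps; `rda_covariance` is a hypothetical helper name, and the value α = 0.5 is a placeholder for what cross-validation would select.

```python
import numpy as np

def rda_covariance(S_k, S_pooled, alpha):
    """Blend a class covariance with the pooled one: alpha=1 is QDA, alpha=0 is LDA."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * S_k + (1.0 - alpha) * S_pooled

# Assumed toy setting: class 1 has fewer observations than dimensions.
rng = np.random.default_rng(0)
p = 5
X1 = rng.normal(size=(4, p))   # n_1 = 4 < p, so its covariance is singular
X2 = rng.normal(size=(60, p))

S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
n1, n2 = X1.shape[0], X2.shape[0]
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# alpha = 0.5 is a placeholder; in practice it is chosen by cross-validation.
S1_blend = rda_covariance(S1, S_pooled, alpha=0.5)

rank_raw = np.linalg.matrix_rank(S1)
rank_blend = np.linalg.matrix_rank(S1_blend)
```

The raw class covariance cannot be inverted, so QDA is not even defined for this class, while the blended estimate has full rank and can be plugged into the quadratic discriminant.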

When should LDA be preferred over QDA in practice?

QDA fits g·p(p+1)/2 covariance parameters vs p(p+1)/2 for LDA. With small n_k, the Σ̂_k estimates are unstable, and the extra variance of QDA outweighs its lower bias, so it loses on test error. Rule of thumb: LDA when n_k < 5·p or when covariances are similar; QDA when n_k >> p and class scatter shapes differ noticeably.

Canonical correlation

Canonical Correlation Analysis (CCA) finds linear combinations of two variable sets X ∈ ℝ^p and Y ∈ ℝ^q whose correlation is maximal. The vectors a and b are the canonical weight (coefficient) vectors, aᵀX and bᵀY are the canonical variates, and ρ = corr(aᵀX, bᵀY) is the canonical correlation. CCA generalizes Pearson correlation to vector random variables.

CCA underpins multi-view learning (text+image), DCCA (deep CCA via neural networks), and connects to PLS regression. In neuroscience it aligns brain signals with behaviour.

What are the canonical correlations ρ_i equal to algebraically?

After 'whitening' both variable sets (multiplying by Σ_{XX}^{-1/2} and Σ_{YY}^{-1/2}), the cross-covariance becomes the matrix Σ_{XX}^{-1/2}Σ_{XY}Σ_{YY}^{-1/2}, and its singular values are the ρ_i. This is equivalent to solving the generalized eigenvalue problem Σ_{XY}Σ_{YY}^{-1}Σ_{YX} a = ρ² Σ_{XX} a, that is, Σ_{XX}^{-1}Σ_{XY}Σ_{YY}^{-1}Σ_{YX} a = ρ² a.
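The whitening-then-SVD recipe can be sketched in a few lines of NumPy. The helper names `inv_sqrt` and `cca` and the simulated data (one shared latent signal plus noise) are assumptions for illustration; sample covariances stand in for the population matrices Σ_{XX}, Σ_{YY}, and Σ_{XY}.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(X, Y):
    """Canonical correlations and weight vectors via whitening and SVD."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the rho_i.
    U, rho, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    A = Wx @ U      # columns are the weight vectors a_i
    B = Wy @ Vt.T   # columns are the weight vectors b_i
    return rho, A, B

# Assumed toy data: one latent signal z shared by the first column of X and Y.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n)])

rho, A, B = cca(X, Y)

# First pair of canonical variates; their sample correlation equals rho[0].
u1 = (X - X.mean(axis=0)) @ A[:, 0]
v1 = (Y - Y.mean(axis=0)) @ B[:, 0]
```

There are min(p, q) canonical correlations, here two. Only the first should be substantial, since the two sets share a single latent signal.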

Multivariate analysis and adjacent methods

Multivariate methods bridge classical statistics and machine learning.

Logistic regression — Alternative to LDA with different assumptions
ML classification — QDA and LDA as Bayes classifiers
PCA and SVD — Shared mathematical machinery

Summary

Multivariate normal: density set by μ and Σ; contours are ellipsoids aligned with eigenvectors of Σ
Mahalanobis distance: under multivariate normality D²_M ~ χ²_p, which makes it the standard tool for flagging multivariate outliers
MANOVA: Λ = |W|/|B+W|; Pillai/Hotelling/Roy traces are alternatives to Wilks under different alternatives
LDA: common Σ → linear discriminant function δ_k(x) = xᵀΣ⁻¹μ_k - ½μ_kᵀΣ⁻¹μ_k + log π_k; the boundary between classes k and j is the hyperplane δ_k(x) = δ_j(x)
QDA: per-class Σ_k → quadratic boundary; RDA interpolates between LDA and QDA via α
CCA: ρ_i = singular values of Σ_{XX}^{-1/2}Σ_{XY}Σ_{YY}^{-1/2}; Bartlett's test on residual canonical correlations

Related lessons

la-13-eigenvectors
