Statistics

Independent Component Analysis (ICA)

How do you separate the voices of three people speaking at once, using only three microphone recordings of the room?

**Digital hearing aids:** real-time ICA isolates the conversation partner's voice from restaurant noise, improving speech intelligibility by 8-12 dB
**EEG in brain-computer interfaces (BCI):** ICA removes blink and heartbeat artefacts from 64-channel recordings, cleaning the signal for intent classification
**Satellite communications:** separating signals of different transmitters reaching one receiving antenna (Multi-User Detection in CDMA)
**Financial analytics:** ICA finds 5-10 'independent risk factors' across 500 S&P stocks for portfolios with minimally correlated shocks

Prerequisites

PCA and SVD
Linear algebra: orthogonal matrices, covariance
Concept of statistical independence

The goal of ICA is to recover N unknown independent sources s_j from M ≥ N observed mixtures x_i = Σ_j a_{ij} s_j. The method relies on a key mathematical fact: a mixture of independent non-Gaussian random variables is 'more Gaussian' than its components, by the central limit theorem. Hence searching for directions of maximum non-Gaussianity in the observation space is equivalent to recovering the original sources.
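The "more Gaussian" claim can be checked numerically. The sketch below is illustrative only: it assumes two uniform sources and uses excess kurtosis (0 for a Gaussian) as the measure of non-Gaussianity; the helper name `excess_kurtosis` is not from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(y):
    """Sample excess kurtosis: 0 for a Gaussian, negative for sub-Gaussian data."""
    y = (y - y.mean()) / y.std()
    return float(np.mean(y ** 4) - 3.0)

# Two independent uniform sources with zero mean and unit variance.
n = 200_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

# An equal-weight mixture of the two sources, rescaled to unit variance.
mix = (s[0] + s[1]) / np.sqrt(2)

k_source = excess_kurtosis(s[0])  # theoretical value for a uniform variable: -1.2
k_mix = excess_kurtosis(mix)      # theoretical value for this mixture: -0.6

print(f"source: {k_source:.3f}, mixture: {k_mix:.3f}")
```

The mixture's kurtosis sits closer to the Gaussian value 0 than the source's, which is why maximising non-Gaussianity pulls a projection back towards a single source.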

Standard ICA pipeline: (1) centre x ← x - E[x]; (2) whiten x_w = Σ^{-1/2}x (PCA stage); (3) find orthogonal W via FastICA / Infomax / JADE; (4) reconstruct s = W·x_w; (5) interpret components. Steps 1-2 are common to all ICA variants; differences appear only in step 3.
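Steps 1-2 of the pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, assuming data laid out as (channels, samples) and symmetric whitening through an eigendecomposition of the sample covariance; the function name `whiten`, the Laplace sources, and the mixing matrix are made up for the example.

```python
import numpy as np

def whiten(x):
    """Centre and whiten observations x of shape (channels, samples)."""
    x_centred = x - x.mean(axis=1, keepdims=True)        # step 1: centre
    cov = np.cov(x_centred)                              # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance matrix.
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return whitening @ x_centred, whitening              # step 2: whiten

rng = np.random.default_rng(1)
s = rng.laplace(size=(3, 50_000))                        # hypothetical sources
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.6, 0.1, 1.0]])                          # hypothetical mixing matrix
x = A @ s + np.array([[2.0], [-1.0], [0.5]])             # mixtures with an offset

x_w, V = whiten(x)
print(np.round(np.cov(x_w), 6))                          # identity up to rounding
```

After this stage only a rotation remains unknown, which is what step 3 estimates.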

Extensions of basic ICA: nonlinear ICA for nonlinear mixing (kernel ICA, despite its name, keeps the linear model and uses a kernel-based independence measure) and frequency-domain ICA for convolutive mixtures with delays (typical in acoustics). Nonnegative matrix factorization (NMF) is a related method for nonnegative sources (images, spectra).

Independent components vs PCA

ICA (Independent Component Analysis) seeks a decomposition of the observed vector x = As, where s are statistically independent sources and A is the mixing matrix. Unlike PCA, which searches for decorrelated directions of maximum variance, ICA demands true independence, strictly stronger than decorrelation for non-Gaussian distributions.

Why Gaussian sources cannot be separated: a linear mixture of independent Gaussians is again Gaussian, and after whitening its joint distribution is rotationally symmetric, so no direction stands out from any other. For non-Gaussian sources, the central limit theorem implies that a mixture As is 'more Gaussian' than the individual s_j. That motivates the ICA criterion: find directions where the projection is maximally non-Gaussian.

How does ICA differ from PCA in signal decomposition?

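PCA diagonalises the covariance matrix: components are decorrelated (Cov(y_i, y_j) = 0) yet may still depend on each other. ICA demands p(s) = Π_j p_j(s_j), full statistical independence. For non-Gaussian distributions this is strictly stronger: e.g., the two coordinates of points spread uniformly on a circle are decorrelated but not independent, because knowing one fixes the magnitude of the other. When independent non-Gaussian sources do exist, ICA recovers their axes; PCA returns orthogonal directions of maximum variance, which generally do not align with the sources.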
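The circle example is easy to verify numerically. The sketch below assumes points drawn uniformly on the unit circle; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Points spread uniformly on the unit circle.
theta = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
y1, y2 = np.cos(theta), np.sin(theta)

# Second-order statistics see no relation between the coordinates.
corr_linear = np.corrcoef(y1, y2)[0, 1]

# A higher-order statistic exposes the dependence: y1**2 + y2**2 == 1,
# so the squared coordinates are perfectly anti-correlated.
corr_squares = np.corrcoef(y1 ** 2, y2 ** 2)[0, 1]

print(f"corr(y1, y2) = {corr_linear:.3f}, corr(y1^2, y2^2) = {corr_squares:.3f}")
```

Zero correlation with strongly correlated squares is exactly the situation PCA cannot detect.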

FastICA algorithm and contrast functions

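FastICA (Hyvärinen, 1999) is one of the most widely used ICA algorithms. Idea: after whitening, find a unit vector w that maximises the non-Gaussianity of the projection w^T x. Non-Gaussianity can be measured by kurtosis or by negentropy, which in practice is approximated through a nonlinear contrast function G with derivative g. The algorithm is a fixed-point iteration, w ← E[x g(w^T x)] - E[g'(w^T x)] w followed by normalisation of w, with cubic (or at least quadratic) convergence.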
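The fixed-point idea can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: it assumes the symmetric variant (all rows of W updated together, then re-orthogonalised), the nonlinearity g = tanh, and one Laplace plus one uniform source; the names `whiten`, `sym_decorrelate` and `fastica` are invented for the example.

```python
import numpy as np

def whiten(x):
    """Centre x (channels, samples); return whitened data and whitening matrix."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ x, V

def sym_decorrelate(W):
    """Map W to (W W^T)^(-1/2) W, which has orthonormal rows."""
    u, _, vt = np.linalg.svd(W)
    return u @ vt

def fastica(x_w, max_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh on whitened data x_w (components, samples)."""
    p, n = x_w.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((p, p)))
    for _ in range(max_iter):
        g = np.tanh(W @ x_w)
        g_prime = 1.0 - g ** 2
        # Row-wise fixed-point update: E[x g(w^T x)] - E[g'(w^T x)] w
        W_new = (g @ x_w.T) / n - g_prime.mean(axis=1, keepdims=True) * W
        W_new = sym_decorrelate(W_new)
        # Converged when every new row is parallel to the old one (sign may flip).
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W

rng = np.random.default_rng(0)
n = 50_000
s = np.vstack([
    rng.laplace(scale=1 / np.sqrt(2), size=n),      # super-Gaussian, unit variance
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=n),   # sub-Gaussian, unit variance
])
A = np.array([[1.0, 0.6], [0.4, 1.0]])              # hypothetical mixing matrix
x_w, V = whiten(A @ s)
W = fastica(x_w)
s_hat = W @ x_w                                     # estimated sources
gain = W @ V @ A                                    # maps true sources to estimates
print(np.round(gain, 2))
```

If separation succeeds, `gain` is close to a signed permutation matrix: each estimated component picks up one source, in arbitrary order and with arbitrary sign.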

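Contrast function choice (G is the contrast, g = G' the nonlinearity used in the iteration): G(y) = log cosh y, with g(y) = tanh y, is a good general-purpose choice; G(y) = -exp(-y²/2), with g(y) = y·exp(-y²/2), is robust and suits strongly super-Gaussian sources (sounds, biosignals); G(y) = y⁴/4, with g(y) = y³ (kurtosis), suits sub-Gaussian sources (e.g. uniform) but is sensitive to outliers.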
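A fixed-point implementation needs each contrast G together with its first derivative g and second derivative g'. The table below is a sketch under that reading; the dictionary name `CONTRASTS`, its keys, and the helper `max_derivative_error` are illustrative, and the finite-difference check only confirms that each g is the derivative of its G.

```python
import numpy as np

# name -> (G, g = G', g' = G'')
CONTRASTS = {
    "logcosh": (
        lambda y: np.log(np.cosh(y)),
        lambda y: np.tanh(y),
        lambda y: 1.0 - np.tanh(y) ** 2,
    ),
    "exp": (
        lambda y: -np.exp(-y ** 2 / 2),
        lambda y: y * np.exp(-y ** 2 / 2),
        lambda y: (1.0 - y ** 2) * np.exp(-y ** 2 / 2),
    ),
    "cube": (
        lambda y: y ** 4 / 4,
        lambda y: y ** 3,
        lambda y: 3.0 * y ** 2,
    ),
}

def max_derivative_error(f, df, y, h=1e-5):
    """Largest gap between df and a central finite difference of f on grid y."""
    numeric = (f(y + h) - f(y - h)) / (2.0 * h)
    return float(np.max(np.abs(numeric - df(y))))

grid = np.linspace(-3.0, 3.0, 61)
errors = {
    name: (max_derivative_error(G, g, grid), max_derivative_error(g, dg, grid))
    for name, (G, g, dg) in CONTRASTS.items()
}
print(errors)
```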

Why does FastICA, after whitening, search for an orthogonal matrix W rather than an arbitrary one?

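Whitening makes Cov(x_white) = I. Write x_white = Ãs with Ã = Σ^{-1/2}A. If the true sources are independent with unit variance, Cov(s) = I, so I = Cov(x_white) = Ã Cov(s) Ã^T = ÃÃ^T: the whitened mixing matrix Ã is orthogonal, and so is the unmixing matrix W = Ã^{-1} = Ã^T (up to sign and permutation). Searching for W in the orthogonal group O(p) instead of GL(p) reduces the free parameters from p² to p(p-1)/2 and simplifies optimisation.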
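The orthogonality argument can be checked with exact population quantities. The sketch assumes unit-variance independent sources, so that Cov(x) = A A^T, and an arbitrary 2×2 mixing matrix chosen for illustration; `A_tilde` denotes the mixing matrix after whitening, Σ^{-1/2}A.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])            # hypothetical mixing matrix

# With Cov(s) = I, the population covariance of x = A s is A A^T.
cov_x = A @ A.T
eigvals, eigvecs = np.linalg.eigh(cov_x)
V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening matrix

A_tilde = V @ A                       # mixing matrix seen after whitening
gram = A_tilde @ A_tilde.T            # identity if A_tilde is orthogonal

# Free parameters: general p x p matrix versus orthogonal matrix.
p = A.shape[0]
free_general = p * p
free_orthogonal = p * (p - 1) // 2

print(np.round(gram, 10), free_general, free_orthogonal)
```

For p = 2 the search collapses from four numbers to a single rotation angle.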

ICA applications: cocktail-party, EEG, finance

Cocktail-party problem: N microphones record N speakers, each microphone hears a mixture. ICA recovers the individual voices without knowing speaker positions. This is Blind Source Separation (BSS). It applies when: (a) sources are independent, (b) mixing is linear and instantaneous, (c) at most one source is Gaussian.

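ICA identification ambiguities: (1) component order is arbitrary, with no 'first principal component' as in PCA; (2) the sign and scale of each component are undefined (s_j and -s_j/2 are indistinguishable, since the factor can be absorbed into the corresponding column of A); (3) equivalently, the mixing matrices A, AP and AD explain the data equally well for any permutation matrix P and invertible diagonal D. These ambiguities are acceptable when the final task is component clustering or visualisation.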
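The ambiguities follow from a one-line identity: A s = (A D^{-1} P^T)(P D s). The sketch below verifies it with an arbitrary permutation and scaling chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.laplace(size=(3, 1000))          # hypothetical sources
A = rng.standard_normal((3, 3))          # hypothetical mixing matrix

P = np.eye(3)[[2, 0, 1]]                 # permutation matrix (reorders components)
D = np.diag([-2.0, 0.5, 3.0])            # sign flip and rescaling per component

s_alt = P @ D @ s                        # reordered, rescaled sources
A_alt = A @ np.linalg.inv(D) @ P.T       # compensating mixing matrix

x = A @ s
x_alt = A_alt @ s_alt                    # identical observations

print(np.max(np.abs(x - x_alt)))
```

Both (A, s) and (A_alt, s_alt) have independent components and produce the same observations, so no algorithm working from x alone can prefer one over the other.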

In finance, ICA extracts hidden risk factors from asset returns. Each component is interpreted as an 'independent economic shock' (oil shock, rate shock) influencing the portfolio through individual loadings. This offers an alternative to classical factor analysis that does not assume Gaussian factors; on the contrary, ICA requires them to be non-Gaussian.

Under what condition does ICA fail to separate sources s, even with an invertible mixing matrix A?

The sum of independent Gaussians is again Gaussian and rotation-invariant. If multiple s_j ~ N(0, σ²) are independent, any rotation Us yields independent Gaussians with the same covariance. ICA cannot tell these rotations apart and loses identifiability. At most one Gaussian component is allowed; the rest must be non-Gaussian.
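The loss of identifiability shows up as a flat contrast. The sketch below assumes unit-variance sources and scans projections cos(a)·s_1 + sin(a)·s_2 over a quarter turn: for Gaussian sources the excess kurtosis stays near 0 at every angle, while for uniform sources it changes with the angle and is most extreme along the source axes. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def excess_kurtosis(y):
    """Sample excess kurtosis (0 for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return float(np.mean(y ** 4) - 3.0)

def kurtosis_profile(s, angles):
    """Excess kurtosis of the projection of s onto each direction angle."""
    return np.array([
        excess_kurtosis(np.cos(a) * s[0] + np.sin(a) * s[1]) for a in angles
    ])

angles = np.linspace(0.0, np.pi / 2, 7)             # includes 0, 45 and 90 degrees
gauss = rng.standard_normal((2, n))
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

profile_gauss = kurtosis_profile(gauss, angles)
profile_unif = kurtosis_profile(unif, angles)

spread_gauss = profile_gauss.max() - profile_gauss.min()   # nothing to optimise
spread_unif = profile_unif.max() - profile_unif.min()      # clear extrema at the axes

print(f"Gaussian spread: {spread_gauss:.3f}, uniform spread: {spread_unif:.3f}")
```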

ICA among other methods

ICA bridges classical statistics, signal processing, and machine learning.

PCA and factor analysis — Relaxation of the independence assumption
Digital signal processing — Blind source separation
Neural autoencoders — Nonlinear generalisations

Summary

ICA finds a linear transform W such that the components s = Wx are statistically independent
Unlike PCA, ICA demands not only zero correlation but full statistical independence, which also constrains higher-order statistics
Identification is possible when at most one source is Gaussian (Comon's theorem)
FastICA: fixed-point iteration with contrast function log cosh (g = tanh), -exp(-y²/2), or y⁴/4 (g = y³); cubic convergence
Standard pipeline: centre → whiten → find orthogonal W → reconstruct s
Applications: cocktail-party, EEG/MEG artefact removal, spectral unmixing, financial risk factors