Statistics

Factor Analysis: Latent Variables

'What is intelligence, really?' In 1904 Charles Spearman applied factor analysis to test scores and proposed the g-factor. Since then FA has been used to describe the structure of personality (Big Five), consumer preferences, and portfolio risk. FA is an X-ray of hidden reality.

Psychometrics: the Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) - the result of FA on thousands of adjectives
Marketing: hidden buyer motivations behind questionnaire responses
Genomics: haplogroups as latent factors of SNP markers
Finance: factor models of returns (Fama-French 5-factor model)
Neuroscience: independent components of fMRI signals (ICA)

Prerequisites

Principal Component Analysis (PCA)

Latent Variables: What Lies Behind the Data

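Word2Vec, BERT embeddings, and PCA share one idea with factor analysis: observed data are described by a small number of latent dimensions. OpenAI's ada-002, for example, represents each text with 1536 latent dimensions. The FA model is X = LF + ε, where L is the loading matrix, F are the latent factors (standardised, N(0, 1), and uncorrelated with ε), and ε is noise unique to each variable. PCA compresses variance; FA models the latent structure that generates the correlations between variables.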
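A minimal simulation sketch of the model X = LF + ε may make it more concrete. The loading matrix, sample size, and seed below are hypothetical values chosen for illustration. If the factors are standardised and uncorrelated with the noise, the model implies a covariance of LLᵀ + Ψ, where Ψ is the diagonal matrix of unique variances; the code checks this on simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix L: p = 6 observed variables, k = 2 latent factors.
L = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.9, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.9],
])

# Unique variances chosen so that every observed variable has unit variance.
psi = 1.0 - (L ** 2).sum(axis=1)

n = 50_000
F = rng.standard_normal((n, L.shape[1]))                    # latent factors ~ N(0, 1)
eps = rng.standard_normal((n, L.shape[0])) * np.sqrt(psi)   # unique noise per variable
X = F @ L.T + eps                                           # one observation per row

# Covariance implied by the model versus covariance of the simulated sample.
implied_cov = L @ L.T + np.diag(psi)
sample_cov = np.cov(X, rowvar=False)
max_abs_diff = np.abs(sample_cov - implied_cov).max()

print("largest |sample - implied| covariance entry:", max_abs_diff)
```

Rows are observations here, so the model is written as `F @ L.T` rather than `L @ F`; the two forms describe the same relationship.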

Aspect	PCA	Factor Analysis
Goal	Compress variance	Model latent structure
Model	X = PC (deterministic)	X = LF + ε (probabilistic)
Uniqueness	None (all explained by PCs)	Present (ε - specific noise)
Interpretability	Components = mathematical constructs	Factors = meaningful concepts
Rotation	Optional	Key tool for interpretation
Application	Dimensionality reduction	Psychometrics, surveys, genomics

**History:** Factor analysis was developed by psychologist Charles Spearman in 1904 to study intelligence. The g-factor (general intelligence) is the first and most famous application of FA. Today FA is used in psychometrics, marketing (hidden buyer motivations), neuroscience, and genomics.

A researcher wants to understand which latent personality traits underlie responses to 50 questionnaire items. Which method is more appropriate?

Factor Loadings and Uniqueness

A **factor loading** is the correlation between an observed variable and a latent factor (for standardised variables and uncorrelated factors). A high loading (|l| > 0.5) means the variable is a strong indicator of that factor. **Communality** h² is the fraction of the variable's variance explained by all factors together: h² = Σl², the sum of its squared loadings. **Uniqueness** ψ = 1 − h² is the part of the variance specific to that variable.
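As a small worked sketch of these definitions, the snippet below computes communality and uniqueness from a loading matrix. The three variables and their loadings are hypothetical, and orthogonal factors are assumed so that h² is simply the row sum of squared loadings.

```python
import numpy as np

# Hypothetical loadings: rows are observed variables, columns are factors F1 and F2.
loadings = np.array([
    [0.7, 0.2],
    [0.1, 0.6],
    [0.5, 0.5],
])

communality = (loadings ** 2).sum(axis=1)   # h^2 = sum of squared loadings per variable
uniqueness = 1.0 - communality              # psi = 1 - h^2
strong = np.abs(loadings) > 0.5             # the |l| > 0.5 rule of thumb

for i, (h2, u) in enumerate(zip(communality, uniqueness), start=1):
    print(f"variable {i}: h2 = {h2:.2f}, uniqueness = {u:.2f}")
```

Note that the third variable loads 0.5 on both factors, so it is not a strong indicator of either one under the strict |l| > 0.5 rule, even though half of its variance is explained.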

**FA prerequisites:**
1. KMO > 0.6 (sampling adequacy)
2. Bartlett's test significant (p < 0.05): non-trivial correlations exist
3. n ≥ 5 × p (at least 5 observations per variable, ideally 10×)
4. Metric data (or polytomous ordinal). For binary data, use ordinal FA or IRT.
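The first three checks listed above can be sketched with NumPy alone. The function names `bartlett_sphericity` and `kmo` and the simulated data are illustrative assumptions, not part of any particular library. The sketch returns Bartlett's chi-squared statistic and its degrees of freedom; turning that into a p-value needs a chi-squared survival function (for example `scipy.stats.chi2.sf`), which is left out to keep the example dependency-free.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: returns (chi-squared statistic, degrees of freedom)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)               # log|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return stat, dof

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)              # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)          # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

# Hypothetical data: 500 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
L = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
              [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
psi = 1.0 - (L ** 2).sum(axis=1)
X = rng.standard_normal((500, 2)) @ L.T + rng.standard_normal((500, 6)) * np.sqrt(psi)

stat, dof = bartlett_sphericity(X)
kmo_value = kmo(X)
enough_rows = X.shape[0] >= 5 * X.shape[1]

print(f"Bartlett chi2 = {stat:.1f} on {dof} df")
print(f"KMO = {kmo_value:.2f}, n >= 5p: {enough_rows}")
```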

The factor loading of 'Anxiety' on F1 = 0.82, on F2 = 0.12. Communality = 0.69. What does this mean?

Rotations: Varimax and Promax

The initial FA solution is not unique: any rotation of the factor space fits the data equally well. **Rotation** turns the axes to maximise interpretability. **Varimax** (orthogonal) keeps the factors uncorrelated; loadings are polarised (pushed toward 0 or ±1). **Promax** (oblique) allows the factors to correlate, which is more realistic for psychological constructs.

**Which rotation to choose?** Varimax (orthogonal): when you assume the factors are independent (e.g., speed and accuracy are distinct abilities). Promax (oblique): when the factors may correlate (anxiety and depression are related). In practice: start with Varimax; if the model is hard to interpret, switch to Promax and inspect the factor correlation matrix.
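To illustrate the orthogonal case, here is a minimal NumPy sketch of a Varimax rotation using a common SVD-based iteration. The starting loadings are hypothetical: a clean two-cluster structure deliberately turned by 30 degrees to mimic an unrotated solution. The helper names `varimax` and `varimax_criterion` are assumptions made for this example. Promax is not shown.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Return (rotated_loadings, rotation_matrix) for an orthogonal Varimax rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction in which the Varimax criterion increases.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # project back onto orthogonal matrices
        new_objective = s.sum()
        if objective != 0.0 and new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of squared loadings."""
    return (loadings ** 2).var(axis=0).sum()

# Hypothetical unrotated solution: simple structure turned by 30 degrees.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ turn

rotated, rotation = varimax(unrotated)

print("criterion before:", varimax_criterion(unrotated))
print("criterion after: ", varimax_criterion(rotated))
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, `rotated @ rotated.T` equals `unrotated @ unrotated.T`, so the communalities and the implied covariance are untouched; only the way the loadings are spread across the axes changes. Column order and signs of the rotated factors are arbitrary.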

After Varimax rotation you see: 'Vocabulary', 'Reading', 'Grammar' load highly on F1; 'Matrices', 'Figure Rotation', 'Spatial Relations' load highly on F2. How would you name the factors?

Key Ideas

FA model: X = LF + ε (observed = loadings × factors + uniqueness)
PCA compresses variance; FA finds latent variables generating correlations
Loading = correlation between a variable and a factor
Communality h² = explained variance fraction; uniqueness = 1 − h²
Varimax (orthogonal rotation): polarises loadings for interpretability
Promax (oblique): for correlated factors (more realistic in social sciences)
Kaiser criterion (λ > 1) and scree plot - for choosing the number of factors
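The Kaiser criterion from the list above can be sketched as follows: compute the eigenvalues of the correlation matrix and count those greater than 1. The data are simulated from a hypothetical two-factor model, so the expected answer is known in advance. The sorted eigenvalues are also what a scree plot would show.

```python
import numpy as np

# Hypothetical data: 1000 observations of 6 variables generated by 2 factors.
rng = np.random.default_rng(1)
L = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])
psi = 1.0 - (L ** 2).sum(axis=1)
X = rng.standard_normal((1000, 2)) @ L.T + rng.standard_normal((1000, 6)) * np.sqrt(psi)

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]          # sorted from largest to smallest

# Kaiser criterion: keep factors whose eigenvalue exceeds 1.
n_factors_kaiser = int((eigenvalues > 1.0).sum())

# Values for a scree plot: look for the "elbow" where the curve flattens.
for i, ev in enumerate(eigenvalues, start=1):
    print(f"component {i}: eigenvalue = {ev:.2f}")
print("factors retained by the Kaiser criterion:", n_factors_kaiser)
```

Both rules are heuristics; on real data they can disagree, and the final choice usually also depends on whether the resulting factors are interpretable.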

FA and Related Methods

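FA is related to PCA (both reduce dimensionality), ICA (independent component analysis, which replaces uncorrelated Gaussian factors with statistically independent non-Gaussian ones), SEM (structural equation modelling), and LDA (latent Dirichlet allocation, a latent-topic model for text).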

PCA — FA is a probabilistic extension of PCA with a latent structure model
Bayesian Statistics — Bayesian FA allows priors on loadings and the number of factors

Questions for Reflection

Why does rotation not change model fit (log-likelihood) but improve interpretability?
Take a public psychological dataset (IPIP personality, Big Five). Apply FA with 5 factors and Varimax. Are the five personality traits reproduced?
Which is better for analysing a questionnaire: FA or PCA? When will the results agree, and when will they diverge?

Related Lessons

la-15-svd
