Linear Algebra
Eigenvectors: the axes a matrix never rotates
Google PageRank, PCA for dimensionality reduction, stability analysis of a quadrotor, energy levels of a hydrogen atom - all are problems of finding eigenvectors of a matrix. An eigenvector is a direction the matrix never rotates, only stretches. This is the most important concept in applied linear algebra.
- Google PageRank: page importance = dominant eigenvector of the transition matrix
- PCA: principal components = eigenvectors of the covariance matrix
- Quantum mechanics: atomic states = eigenvectors of the Hamiltonian operator
- Structural engineering: natural frequencies of a bridge = eigenvalues of the stiffness matrix
- Recommender systems: matrix factorization via EVD/SVD yields latent factors
Eigenvectors: the axes a matrix never rotates
**Google processes 8.5 billion searches a day.** At the heart of PageRank sits a transition matrix between web pages, and its ~dominant eigenvector~{The eigenvector corresponding to the largest |λ|} is literally the ranking score of every page. PCA - the go-to tool for dimensionality reduction across all of ML - works the same way: it finds eigenvectors of the data covariance matrix. Eigenfaces for face recognition, spectral graph clustering, quantum measurement outcomes - all one idea: **there exist special directions along which a matrix only stretches, never rotates**. Those directions are eigenvectors.
Every square matrix has a **skeleton** - a handful of directions it leaves alone, only stretching them. Find those directions and the matrix opens up: what it amplifies, what it shrinks, how it behaves after a million applications. This is not abstract. PCA, PageRank, spectral clustering, quantum state evolution, the stability of every recurrent network - all of them live on this one observation.
What is the key idea of the concept 'Eigenvectors: the axes a matrix never rotates'?
Check that the concept material has been understood.
The Definition
The Definition
Multiplying by a matrix typically rotates vectors. But every well-behaved matrix has special directions it leaves alone - it only scales them. A vector v is called an ~eigenvector~{From German "eigen" = own, characteristic} of matrix A if:
EIGENVECTOR v (v != 0): A · v = lambda · v where lambda is the EIGENVALUE What happens under multiplication: lambda > 1 -> stretching along v 0 < lambda < 1 -> compression along v lambda < 0 -> flip 180 degrees + scaling lambda = 0 -> vector is annihilated (lands in the null space) lambda = 1 -> vector is completely unchanged EXAMPLE - projection onto the X-axis: A = [[1, 0], [0, 0]] e1 = (1, 0): A·e1 = (1, 0) = 1·e1 -> lambda = 1 e2 = (0, 1): A·e2 = (0, 0) = 0·e2 -> lambda = 0
Eigenvectors are the resting axes of a transformation. The rest of the world spins - they just grow or shrink in place.
What is the key idea of the concept 'The Definition'?
Check that the concept material has been understood.
Finding Eigenvalues: the Characteristic Equation
Finding Eigenvalues: the Characteristic Equation
The equation A·v = λ·v looks nonlinear because λ is unknown. The trick: rewrite it as a homogeneous linear system and demand a nontrivial solution.
A·v = lambda·v A·v - lambda·v = 0 (A - lambda·I)·v = 0 The system (A - lambda·I)·v = 0 has a nontrivial solution v != 0 <=> the matrix (A - lambda·I) is singular <=> det(A - lambda·I) = 0 CHARACTERISTIC EQUATION: det(A - lambda·I) = 0 For an n×n matrix this is a degree-n polynomial in lambda. For 2×2: lambda^2 - trace(A)·lambda + det(A) = 0 where trace(A) = a11 + a22 (sum of diagonal entries) VIETA'S RELATIONS: lambda1 + lambda2 + ... + lambdaN = trace(A) lambda1 * lambda2 * ... * lambdaN = det(A)
STEP 1. Characteristic equation: det(A - lambda·I) = det|4-lambda 2 | = 0 |1 3-lambda| (4-lambda)(3-lambda) - 2·1 = 0 lambda^2 - 7·lambda + 10 = 0 (lambda - 5)(lambda - 2) = 0 lambda1 = 5, lambda2 = 2 Check: 5 + 2 = 7 = trace(A) = 4 + 3 = 7 OK 5 * 2 = 10 = det(A) = 4*3 - 2*1 = 10 OK STEP 2. Find eigenvectors. FOR lambda1 = 5: (A - 5I)·v = 0 |-1 2| · |x| = |0| | 1 -2| |y| |0| -x + 2y = 0 -> x = 2y v1 = (2, 1) (or any scalar multiple) FOR lambda2 = 2: (A - 2I)·v = 0 | 2 2| · |x| = |0| | 1 1| |y| |0| 2x + 2y = 0 -> x = -y v2 = (1, -1) (or any scalar multiple)
What is the key idea of the concept 'Finding Eigenvalues: the Characteristic Equation'?
Check that the concept material has been understood.
PCA: eigenvectors of the covariance matrix
PCA: eigenvectors of the covariance matrix
**PCA (Principal Component Analysis)** is the first thing applied to any high-dimensional dataset. Reduce 512 features to 2 for visualization, remove multicollinearity before regression, compress data without discarding structure - all PCA. Its mathematical core is eigendecomposition.
**Why it works**: the covariance matrix is symmetric, so its eigenvectors are guaranteed to be orthogonal and real (spectral theorem). The eigenvector with the largest eigenvalue points in the direction of maximum variance in the data.
What is the key idea of the concept 'PCA: eigenvectors of the covariance matrix'?
Check that the concept material has been understood.
PageRank: the dominant eigenvector
PageRank: the dominant eigenvector
The web is a graph. PageRank turns it into a transition matrix P, where P[i][j] is the probability of jumping from page j to page i. **Page rankings are the stationary distribution of this Markov chain** - a vector pi satisfying P·pi = pi = 1·pi. That is exactly an eigenvector with eigenvalue 1.
What is the key idea of the concept 'PageRank: the dominant eigenvector'?
Check that the concept material has been understood.
Spectral clustering and eigenfaces
Spectral clustering and eigenfaces
**Spectral clustering** splits a graph into communities using eigenvectors of the graph Laplacian L = D - W (where W is the weight matrix, D is the degree matrix). The second-smallest eigenvector of L (the Fiedler vector) cuts the graph in two - this is the optimal balanced cut. Used in social network community detection, image segmentation, and protein interaction networks.
**Eigenfaces** (Turk & Pentland, 1991) was the dominant face recognition method before deep learning. The covariance matrix of 10,000 face images (each 100x100 = 10,000 pixels) is 10,000x10,000. Its top 100-200 eigenvectors - «eigenfaces» - capture over 95% of variation across all faces. A face is then just a vector of weights on these eigenfaces. **This approach powered airport surveillance and passport verification systems throughout the 1990s and 2000s.**
What is the key idea of the concept 'Spectral clustering and eigenfaces'?
Check that the concept material has been understood.
Special cases and traps
Special cases and traps
| Matrix | Eigenvalues | Key fact |
|---|---|---|
| Identity I | All = 1 | Every nonzero vector is an eigenvector |
| Diagonal diag(d1, d2, ...) | d1, d2, ... | Standard basis vectors are eigenvectors |
| Symmetric A = A^T | All real | Eigenvectors are orthogonal (spectral theorem) |
| Rotation by angle != 0, 180 | Complex | No real eigenvectors exist |
| Projection (P^2 = P) | Only 0 and 1 | lambda=1 for the projection plane, lambda=0 for the null space |
| Singular (det = 0) | One lambda = 0 | det = 0 iff at least one eigenvalue is zero |
A rotation matrix by a non-trivial angle has no real eigenvectors - no direction is left invariant. The characteristic equation yields complex roots. This is why rotations in 3D are better described by quaternions or Rodrigues' formula rather than eigendecomposition.
What is the key idea of the concept 'Special cases and traps'?
Check that the concept material has been understood.
Practice: PCA on real data
Practice: PCA on real data
Interview questions
The sum of eigenvalues equals trace(A) and their product equals det(A). Where does this come from?
- det(A - lambda·I) expands to: lambda^2 - trace(A)·lambda + det(A) = 0 - Vieta's formulas give: lambda1 + lambda2 = trace(A), lambda1*lambda2 = det(A) - This generalizes: sum of all eigenvalues = trace, product = det - Corollary: det = 0 iff some eigenvalue is zero iff the matrix is singular
Why does PCA use eigenvectors of the covariance matrix specifically?
- Covariance is symmetric (Cov = Cov^T), so eigenvectors are guaranteed orthogonal - The eigenvector with the largest eigenvalue is the direction of maximum variance - Each successive principal component is perpendicular to all previous ones - lambda_i / sum(lambda) = fraction of total variance explained by component i
If A has eigenvalues lambda1 and lambda2, what are the eigenvalues of A^2?
- A^2·v = A·(A·v) = A·(lambda·v) = lambda·(A·v) = lambda^2·v - Eigenvalues of A^2 are the squares of eigenvalues of A - The eigenvectors are unchanged - Generalization: A^n has eigenvalues lambda^n - this is why matrix powers are easy via diagonalization
What is the key idea of the concept 'Practice: PCA on real data'?
Check that the concept material has been understood.
Key Ideas
- **A·v = lambda·v**: eigenvectors keep their direction, only their length changes by factor lambda
- **det(A - lambda·I) = 0**: the characteristic equation - a degree-n polynomial in lambda
- **trace(A) = sum of lambda, det(A) = product of lambda**: quick sanity check for calculations
- **PCA**: eigenvectors of the covariance matrix = principal components (directions of maximum variance)
- **PageRank**: page scores = dominant eigenvector of the stochastic transition matrix
- **Symmetric matrices**: eigenvalues always real, eigenvectors always orthogonal (spectral theorem)
Related Topics
Eigenvectors are the key to understanding what any matrix really does
- Diagonalization — A = P·D·P^-1, where P holds eigenvectors and D = diag(lambda1,...,lambdaN)
- SVD — Generalization of eigendecomposition to rectangular matrices; the math behind LoRA fine-tuning
- Vector Spaces — The eigenspace for lambda is a subspace - the null space of (A - lambda·I)