Linear Algebra

Eigenvectors: the axes a matrix never rotates

Google PageRank, PCA for dimensionality reduction, stability analysis of a quadrotor, energy levels of a hydrogen atom - all are problems of finding eigenvectors of a matrix. An eigenvector is a direction the matrix never rotates, only stretches. This is the most important concept in applied linear algebra.

Google PageRank: page importance = dominant eigenvector of the transition matrix
PCA: principal components = eigenvectors of the covariance matrix
Quantum mechanics: atomic states = eigenvectors of the Hamiltonian operator
Structural engineering: natural frequencies of a bridge = eigenvalues of the stiffness matrix
Recommender systems: matrix factorization via EVD/SVD yields latent factors

Eigenvectors: the axes a matrix never rotates

**Google processes 8.5 billion searches a day.** At the heart of PageRank sits a transition matrix between web pages, and its ~dominant eigenvector~{The eigenvector corresponding to the largest |λ|} is literally the ranking score of every page. PCA - the go-to tool for dimensionality reduction across all of ML - works the same way: it finds eigenvectors of the data covariance matrix. Eigenfaces for face recognition, spectral graph clustering, quantum measurement outcomes - all one idea: **there exist special directions along which a matrix only stretches, never rotates**. Those directions are eigenvectors.

Every square matrix has a **skeleton** - a handful of directions it leaves alone, only stretching them. Find those directions and the matrix opens up: what it amplifies, what it shrinks, how it behaves after a million applications. This is not abstract. PCA, PageRank, spectral clustering, quantum state evolution, the stability of every recurrent network - all of them live on this one observation.

What is the key idea of the concept 'Eigenvectors: the axes a matrix never rotates'?

Check that the concept material has been understood.

The Definition

Multiplying by a matrix typically rotates vectors. But every well-behaved matrix has special directions it leaves alone - it only scales them. A vector v is called an ~eigenvector~{From German "eigen" = own, characteristic} of matrix A if:

EIGENVECTOR v (v != 0): A · v = lambda · v where lambda is the EIGENVALUE What happens under multiplication: lambda > 1 -> stretching along v 0 < lambda < 1 -> compression along v lambda < 0 -> flip 180 degrees + scaling lambda = 0 -> vector is annihilated (lands in the null space) lambda = 1 -> vector is completely unchanged EXAMPLE - projection onto the X-axis: A = [[1, 0], [0, 0]] e1 = (1, 0): A·e1 = (1, 0) = 1·e1 -> lambda = 1 e2 = (0, 1): A·e2 = (0, 0) = 0·e2 -> lambda = 0

Eigenvectors are the resting axes of a transformation. The rest of the world spins - they just grow or shrink in place.

What is the key idea of the concept 'The Definition'?

Check that the concept material has been understood.

Finding Eigenvalues: the Characteristic Equation

The equation A·v = λ·v looks nonlinear because λ is unknown. The trick: rewrite it as a homogeneous linear system and demand a nontrivial solution.

A·v = lambda·v A·v - lambda·v = 0 (A - lambda·I)·v = 0 The system (A - lambda·I)·v = 0 has a nontrivial solution v != 0 <=> the matrix (A - lambda·I) is singular <=> det(A - lambda·I) = 0 CHARACTERISTIC EQUATION: det(A - lambda·I) = 0 For an n×n matrix this is a degree-n polynomial in lambda. For 2×2: lambda^2 - trace(A)·lambda + det(A) = 0 where trace(A) = a11 + a22 (sum of diagonal entries) VIETA'S RELATIONS: lambda1 + lambda2 + ... + lambdaN = trace(A) lambda1 * lambda2 * ... * lambdaN = det(A)

STEP 1. Characteristic equation: det(A - lambda·I) = det|4-lambda 2 | = 0 |1 3-lambda| (4-lambda)(3-lambda) - 2·1 = 0 lambda^2 - 7·lambda + 10 = 0 (lambda - 5)(lambda - 2) = 0 lambda1 = 5, lambda2 = 2 Check: 5 + 2 = 7 = trace(A) = 4 + 3 = 7 OK 5 * 2 = 10 = det(A) = 4*3 - 2*1 = 10 OK STEP 2. Find eigenvectors. FOR lambda1 = 5: (A - 5I)·v = 0 |-1 2| · |x| = |0| | 1 -2| |y| |0| -x + 2y = 0 -> x = 2y v1 = (2, 1) (or any scalar multiple) FOR lambda2 = 2: (A - 2I)·v = 0 | 2 2| · |x| = |0| | 1 1| |y| |0| 2x + 2y = 0 -> x = -y v2 = (1, -1) (or any scalar multiple)

What is the key idea of the concept 'Finding Eigenvalues: the Characteristic Equation'?

Check that the concept material has been understood.

PCA: eigenvectors of the covariance matrix

**PCA (Principal Component Analysis)** is the first thing applied to any high-dimensional dataset. Reduce 512 features to 2 for visualization, remove multicollinearity before regression, compress data without discarding structure - all PCA. Its mathematical core is eigendecomposition.

**Why it works**: the covariance matrix is symmetric, so its eigenvectors are guaranteed to be orthogonal and real (spectral theorem). The eigenvector with the largest eigenvalue points in the direction of maximum variance in the data.

What is the key idea of the concept 'PCA: eigenvectors of the covariance matrix'?

Check that the concept material has been understood.

PageRank: the dominant eigenvector

The web is a graph. PageRank turns it into a transition matrix P, where P[i][j] is the probability of jumping from page j to page i. **Page rankings are the stationary distribution of this Markov chain** - a vector pi satisfying P·pi = pi = 1·pi. That is exactly an eigenvector with eigenvalue 1.

What is the key idea of the concept 'PageRank: the dominant eigenvector'?

Check that the concept material has been understood.

Spectral clustering and eigenfaces

**Spectral clustering** splits a graph into communities using eigenvectors of the graph Laplacian L = D - W (where W is the weight matrix, D is the degree matrix). The second-smallest eigenvector of L (the Fiedler vector) cuts the graph in two - this is the optimal balanced cut. Used in social network community detection, image segmentation, and protein interaction networks.

**Eigenfaces** (Turk & Pentland, 1991) was the dominant face recognition method before deep learning. The covariance matrix of 10,000 face images (each 100x100 = 10,000 pixels) is 10,000x10,000. Its top 100-200 eigenvectors - «eigenfaces» - capture over 95% of variation across all faces. A face is then just a vector of weights on these eigenfaces. **This approach powered airport surveillance and passport verification systems throughout the 1990s and 2000s.**

What is the key idea of the concept 'Spectral clustering and eigenfaces'?

Check that the concept material has been understood.

Special cases and traps

Matrix	Eigenvalues	Key fact
Identity I	All = 1	Every nonzero vector is an eigenvector
Diagonal diag(d1, d2, ...)	d1, d2, ...	Standard basis vectors are eigenvectors
Symmetric A = A^T	All real	Eigenvectors are orthogonal (spectral theorem)
Rotation by angle != 0, 180	Complex	No real eigenvectors exist
Projection (P^2 = P)	Only 0 and 1	lambda=1 for the projection plane, lambda=0 for the null space
Singular (det = 0)	One lambda = 0	det = 0 iff at least one eigenvalue is zero

A rotation matrix by a non-trivial angle has no real eigenvectors - no direction is left invariant. The characteristic equation yields complex roots. This is why rotations in 3D are better described by quaternions or Rodrigues' formula rather than eigendecomposition.

What is the key idea of the concept 'Special cases and traps'?

Check that the concept material has been understood.

Practice: PCA on real data

Interview questions

The sum of eigenvalues equals trace(A) and their product equals det(A). Where does this come from?

- det(A - lambda·I) expands to: lambda^2 - trace(A)·lambda + det(A) = 0 - Vieta's formulas give: lambda1 + lambda2 = trace(A), lambda1*lambda2 = det(A) - This generalizes: sum of all eigenvalues = trace, product = det - Corollary: det = 0 iff some eigenvalue is zero iff the matrix is singular

Why does PCA use eigenvectors of the covariance matrix specifically?

- Covariance is symmetric (Cov = Cov^T), so eigenvectors are guaranteed orthogonal - The eigenvector with the largest eigenvalue is the direction of maximum variance - Each successive principal component is perpendicular to all previous ones - lambda_i / sum(lambda) = fraction of total variance explained by component i

If A has eigenvalues lambda1 and lambda2, what are the eigenvalues of A^2?

- A^2·v = A·(A·v) = A·(lambda·v) = lambda·(A·v) = lambda^2·v - Eigenvalues of A^2 are the squares of eigenvalues of A - The eigenvectors are unchanged - Generalization: A^n has eigenvalues lambda^n - this is why matrix powers are easy via diagonalization

What is the key idea of the concept 'Practice: PCA on real data'?

Check that the concept material has been understood.

Key Ideas

**A·v = lambda·v**: eigenvectors keep their direction, only their length changes by factor lambda
**det(A - lambda·I) = 0**: the characteristic equation - a degree-n polynomial in lambda
**trace(A) = sum of lambda, det(A) = product of lambda**: quick sanity check for calculations
**PCA**: eigenvectors of the covariance matrix = principal components (directions of maximum variance)
**PageRank**: page scores = dominant eigenvector of the stochastic transition matrix
**Symmetric matrices**: eigenvalues always real, eigenvectors always orthogonal (spectral theorem)

Eigenvectors: the axes a matrix never rotates

Eigenvectors: the axes a matrix never rotates

The Definition

The Definition

Finding Eigenvalues: the Characteristic Equation

Finding Eigenvalues: the Characteristic Equation

PCA: eigenvectors of the covariance matrix

PCA: eigenvectors of the covariance matrix

PageRank: the dominant eigenvector

PageRank: the dominant eigenvector

Spectral clustering and eigenfaces

Spectral clustering and eigenfaces

Special cases and traps

Special cases and traps

Practice: PCA on real data

Practice: PCA on real data

Interview questions

Key Ideas

Related Topics

Связанные уроки