Algebra
Eigenvalues and Eigenvectors
PCA, PageRank, spectral clustering, stability analysis of control systems - all are eigenvalue problems. In neural networks, the spectral radius of the weight matrix determines whether gradients explode or vanish. Eigendecomposition is the X-ray of a linear operator.
- **PCA:** eigenvectors of the covariance matrix are principal components; eigenvalues are the variance explained along each axis
- **Google PageRank:** dominant eigenvector of the transition matrix = page ranks; power iteration is the actual algorithm
- **Spectral clustering:** graph Laplacian L = D − W; the k smallest nonzero eigenvalues reveal k clusters in the graph
Предварительные знания
The Eigenvalue Equation
A nonzero vector v is an eigenvector of matrix A if Av = λv for some scalar λ (the eigenvalue). Geometrically: an eigenvector is a direction that A only stretches or compresses (or flips), never rotates. λ is the stretch factor - positive means same direction, negative means flip.
**Geometric meaning:** eigenvectors are the 'axes' of the transformation. Any vector can be expanded in eigenvectors, and each component simply scales by its eigenvalue. This is why eigendecomposition is so powerful for understanding matrix behavior.
If Av = 2v, what is A²v?
Characteristic Polynomial
Av = λv is equivalent to (A−λI)v = 0. A nonzero solution exists if and only if det(A−λI) = 0. This is the characteristic equation. For an n×n matrix, det(A−λI) is a degree-n polynomial in λ - its roots are the eigenvalues.
**Cayley-Hamilton theorem:** every matrix satisfies its own characteristic polynomial p(A) = 0. Consequence: Aⁿ can be expressed as a linear combination of A^{n-1}, …, A, I. This enables efficient computation of matrix powers without direct exponentiation.
The characteristic polynomial of a 3×3 matrix has degree:
Eigendecomposition and Symmetric Matrices
If A has n linearly independent eigenvectors v₁,…,vₙ with eigenvalues λ₁,…,λₙ, then A = VΛV⁻¹, where V has eigenvectors as columns and Λ = diag(λ₁,…,λₙ). For symmetric matrices (A = Aᵀ): all eigenvalues are real and eigenvectors are orthogonal → A = VΛVᵀ (spectral theorem).
Use np.linalg.eigh for symmetric matrices - it's twice as fast and returns real eigenvalues in ascending order. Use np.linalg.eig for general (possibly non-symmetric) matrices.
For a symmetric matrix A = Aᵀ, eigenvalues are guaranteed to be:
Power Iteration and Applications
Power iteration: start with a random v₀, then v_{k+1} = Av_k / ||Av_k||. When |λ₁| > |λ₂|, this converges to the eigenvector of the largest-magnitude eigenvalue λ₁. Convergence rate: |λ₂/λ₁|^k - faster when eigenvalues are well separated.
| Application | What we seek | Matrix |
|---|---|---|
| PCA | Directions of maximum variance | Covariance matrix |
| PageRank | Stationary distribution | Transition matrix |
| Markov chains | Equilibrium distribution | Stochastic matrix |
| Spectral clustering | Cluster structure in graph | Graph Laplacian D − W |
Power iteration converges to the eigenvector corresponding to:
Key Ideas
- **Av = λv:** eigenvector is an invariant direction; λ is the stretch factor; eigenvalues are roots of det(A−λI)=0
- **Characteristic polynomial** has degree n for an n×n matrix; gives n eigenvalues in ℂ (counting multiplicity)
- **A = VΛV⁻¹:** eigendecomposition; for symmetric A = VΛVᵀ with orthogonal V (spectral theorem)
- **Power iteration** converges to the dominant eigenvector in O(n) per step; foundation of PageRank and RNN gradient analysis
Related Topics
Eigenvalues connect all of linear algebra:
- SVD — Singular values = sqrt of eigenvalues of AᵀA; SVD generalizes eigendecomposition to rectangular matrices
- Quadratic Forms — The signs of eigenvalues determine positive/negative definiteness of the Hessian
- Linear Algebra in ML — Spectral analysis of weight matrices, gradient flow in RNNs
Вопросы для размышления
- In RNNs, if the spectral radius ρ(W) > 1, gradients explode; if < 1, they vanish. How does this follow directly from eigendecomposition and matrix powers?
- PCA via eigendecomposition and via SVD give the same result. Why is SVD numerically preferred for large datasets?
- The graph Laplacian has n eigenvalues with the smallest always 0. What does the multiplicity of λ=0 tell you about the graph's connectivity?