Algebra

Eigenvalues and Eigenvectors

PCA, PageRank, spectral clustering, stability analysis of control systems - all are eigenvalue problems. In neural networks, the spectral radius of the weight matrix determines whether gradients explode or vanish. Eigendecomposition is the X-ray of a linear operator.

  • **PCA:** eigenvectors of the covariance matrix are principal components; eigenvalues are the variance explained along each axis
  • **Google PageRank:** dominant eigenvector of the transition matrix = page ranks; power iteration is the actual algorithm
  • **Spectral clustering:** graph Laplacian L = D − W; the k smallest nonzero eigenvalues reveal k clusters in the graph

Предварительные знания

  • Linear Maps

The Eigenvalue Equation

A nonzero vector v is an eigenvector of matrix A if Av = λv for some scalar λ (the eigenvalue). Geometrically: an eigenvector is a direction that A only stretches or compresses (or flips), never rotates. λ is the stretch factor - positive means same direction, negative means flip.

**Geometric meaning:** eigenvectors are the 'axes' of the transformation. Any vector can be expanded in eigenvectors, and each component simply scales by its eigenvalue. This is why eigendecomposition is so powerful for understanding matrix behavior.

If Av = 2v, what is A²v?

Characteristic Polynomial

Av = λv is equivalent to (A−λI)v = 0. A nonzero solution exists if and only if det(A−λI) = 0. This is the characteristic equation. For an n×n matrix, det(A−λI) is a degree-n polynomial in λ - its roots are the eigenvalues.

**Cayley-Hamilton theorem:** every matrix satisfies its own characteristic polynomial p(A) = 0. Consequence: Aⁿ can be expressed as a linear combination of A^{n-1}, …, A, I. This enables efficient computation of matrix powers without direct exponentiation.

The characteristic polynomial of a 3×3 matrix has degree:

Eigendecomposition and Symmetric Matrices

If A has n linearly independent eigenvectors v₁,…,vₙ with eigenvalues λ₁,…,λₙ, then A = VΛV⁻¹, where V has eigenvectors as columns and Λ = diag(λ₁,…,λₙ). For symmetric matrices (A = Aᵀ): all eigenvalues are real and eigenvectors are orthogonal → A = VΛVᵀ (spectral theorem).

Use np.linalg.eigh for symmetric matrices - it's twice as fast and returns real eigenvalues in ascending order. Use np.linalg.eig for general (possibly non-symmetric) matrices.

For a symmetric matrix A = Aᵀ, eigenvalues are guaranteed to be:

Power Iteration and Applications

Power iteration: start with a random v₀, then v_{k+1} = Av_k / ||Av_k||. When |λ₁| > |λ₂|, this converges to the eigenvector of the largest-magnitude eigenvalue λ₁. Convergence rate: |λ₂/λ₁|^k - faster when eigenvalues are well separated.

ApplicationWhat we seekMatrix
PCADirections of maximum varianceCovariance matrix
PageRankStationary distributionTransition matrix
Markov chainsEquilibrium distributionStochastic matrix
Spectral clusteringCluster structure in graphGraph Laplacian D − W

Power iteration converges to the eigenvector corresponding to:

Key Ideas

  • **Av = λv:** eigenvector is an invariant direction; λ is the stretch factor; eigenvalues are roots of det(A−λI)=0
  • **Characteristic polynomial** has degree n for an n×n matrix; gives n eigenvalues in ℂ (counting multiplicity)
  • **A = VΛV⁻¹:** eigendecomposition; for symmetric A = VΛVᵀ with orthogonal V (spectral theorem)
  • **Power iteration** converges to the dominant eigenvector in O(n) per step; foundation of PageRank and RNN gradient analysis

Related Topics

Eigenvalues connect all of linear algebra:

  • SVD — Singular values = sqrt of eigenvalues of AᵀA; SVD generalizes eigendecomposition to rectangular matrices
  • Quadratic Forms — The signs of eigenvalues determine positive/negative definiteness of the Hessian
  • Linear Algebra in ML — Spectral analysis of weight matrices, gradient flow in RNNs

Вопросы для размышления

  • In RNNs, if the spectral radius ρ(W) > 1, gradients explode; if < 1, they vanish. How does this follow directly from eigendecomposition and matrix powers?
  • PCA via eigendecomposition and via SVD give the same result. Why is SVD numerically preferred for large datasets?
  • The graph Laplacian has n eigenvalues with the smallest always 0. What does the multiplicity of λ=0 tell you about the graph's connectivity?

Связанные уроки

  • la-01-vectors-intro
Eigenvalues and Eigenvectors

0

1

Sign In