Numerical Methods

Direct Methods: LU, Cholesky, QR

numpy.linalg.solve, sklearn.LinearRegression, scipy.optimize-all use LU, Cholesky, or QR decompositions internally. Choosing the right factorization determines accuracy, speed, and applicability for a given matrix structure.

  • **numpy.linalg.solve:** uses LAPACK's dgesv-LU with partial pivoting; O(n³/3) for dense matrices
  • **sklearn LinearRegression:** solved via QR (LAPACK's dgels)-more stable than normal equations
  • **Kernel ML methods (SVM, Gaussian Processes):** kernel matrix K is SPD → Cholesky for solving Kα = y

Предварительные знания

  • Systems of Nonlinear Equations

LU Decomposition

LU decomposition factors a matrix A into the product of a lower-triangular L and an upper-triangular U: PA = LU, where P is a permutation matrix (partial pivoting). The main motivation: decompose once, then solve for many right-hand sides b at O(n²) cost each instead of O(n³).

**PA = LU (partial pivoting):** 1. Decomposition: O(n³/3)-done once 2. Forward substitution Ly = Pb: O(n²) 3. Back substitution Ux = y: O(n²) 4. New right-hand side b: O(n²)-no re-factoring! **Partial pivoting:** permutation P places the largest element in each column on the diagonal → stability, |lᵢⱼ| ≤ 1. In NumPy: `scipy.linalg.lu_factor` + `lu_solve`, or `np.linalg.solve` (LU internally).

LU suits **dense matrices** of general form. For sparse matrices, LU can create fill-in-zero entries become nonzero. Use `scipy.sparse.linalg.spsolve` with fill-reducing orderings (AMD, nested dissection) for sparse systems.

What is the purpose of the permutation matrix P in the decomposition PA = LU?

Cholesky Decomposition

For a **symmetric positive-definite** (SPD) matrix A = Aᵀ with positive eigenvalues, the Cholesky decomposition gives A = LLᵀ where L is lower-triangular with positive diagonal. It is twice as fast as LU and uses half the memory.

**Cholesky: A = LLᵀ** - Applicable ONLY to SPD matrices (A = Aᵀ, all eigenvalues > 0) - Cost: O(n³/6)-twice as fast as LU (O(n³/3)) - Memory: n(n+1)/2 elements (lower triangle only) - No pivoting needed: Cholesky is always stable for SPD - SPD check: if Cholesky fails-the matrix is NOT SPD `numpy.linalg.cholesky(A)` or `scipy.linalg.cholesky(A)`.

PropertyLUCholesky
ApplicabilityAny non-singular matrixSPD only
CostO(n³/3)O(n³/6)
Memoryn² elementsn(n+1)/2
PivotingRequired for stabilityNot needed
ML use caseGeneral linear systemsNormal equations, kernel methods

Matrix X ∈ ℝ¹⁰⁰ˣ²⁰ holds feature data. Is the Gram matrix G = XᵀX SPD?

QR Decomposition

QR decomposition writes A = QR where Q is orthogonal (QᵀQ = I) and R is upper-triangular. Two main algorithms: **Gram-Schmidt** (conceptually simple, numerically unstable) and **Householder reflections** (stable, standard). QR is the key to the least-squares problem.

**Least squares via QR:** Minimize ||Ax − b||² for A ∈ ℝᵐˣⁿ, m > n: 1. A = QR (thin QR: Q ∈ ℝᵐˣⁿ, R ∈ ℝⁿˣⁿ) 2. x* = R⁻¹ Qᵀ b (or solve Rx = Qᵀb) **Advantage over normal equations** AᵀAx = Aᵀb: - Condition number: κ(AᵀA) = κ(A)²-QR is more stable! - QR: O(mn²), normal equations: O(mn² + n³)-comparable cost

QR decomposition is also the foundation of the **Francis QR algorithm** for eigenvalue computation (nm-09). The idea: repeatedly apply QR factorization to A, and A converges to upper-triangular (Schur) form whose diagonal entries are the eigenvalues.

Why is QR decomposition preferred over normal equations AᵀAx = Aᵀb for linear regression?

Key Ideas

  • **LU (PA = LU):** any non-singular matrix; O(n³/3); partial pivoting for stability; reuse for multiple right-hand sides
  • **Cholesky (A = LLᵀ):** SPD only; twice as fast as LU; no pivoting; failure signals non-SPD
  • **QR (A = QR):** least-squares problems; more stable than normal equations because κ(AᵀA) = κ(A)²; Householder > Gram-Schmidt
  • **Choice rule:** LU for general, Cholesky for SPD (kernel ML), QR for overdetermined least squares

Related Topics

Direct methods are the foundation for iterative solvers and spectral problems:

  • Iterative Methods for Linear Systems — Iterative methods (CG, GMRES) are used when O(n³) direct methods are too expensive for large sparse matrices
  • Numerical Eigenvalue Computation — The QR algorithm for eigenvalues repeatedly applies QR factorization

Вопросы для размышления

  • Why store LU factorization rather than the inverse A⁻¹ when solving Ax = b repeatedly for different b?
  • How can it be proved that XᵀX + λI (ridge regression regularizer) is always SPD for any λ > 0?
  • Why do GLMs (generalized linear models) typically use QR rather than Cholesky for the weighted least-squares subproblem?

Связанные уроки

  • la-21-lu
Direct Methods: LU, Cholesky, QR

0

1

Sign In