Numerical Methods
Direct Methods: LU, Cholesky, QR
numpy.linalg.solve, sklearn.LinearRegression, scipy.optimize-all use LU, Cholesky, or QR decompositions internally. Choosing the right factorization determines accuracy, speed, and applicability for a given matrix structure.
- **numpy.linalg.solve:** uses LAPACK's dgesv-LU with partial pivoting; O(n³/3) for dense matrices
- **sklearn LinearRegression:** solved via QR (LAPACK's dgels)-more stable than normal equations
- **Kernel ML methods (SVM, Gaussian Processes):** kernel matrix K is SPD → Cholesky for solving Kα = y
Предварительные знания
LU Decomposition
LU decomposition factors a matrix A into the product of a lower-triangular L and an upper-triangular U: PA = LU, where P is a permutation matrix (partial pivoting). The main motivation: decompose once, then solve for many right-hand sides b at O(n²) cost each instead of O(n³).
**PA = LU (partial pivoting):** 1. Decomposition: O(n³/3)-done once 2. Forward substitution Ly = Pb: O(n²) 3. Back substitution Ux = y: O(n²) 4. New right-hand side b: O(n²)-no re-factoring! **Partial pivoting:** permutation P places the largest element in each column on the diagonal → stability, |lᵢⱼ| ≤ 1. In NumPy: `scipy.linalg.lu_factor` + `lu_solve`, or `np.linalg.solve` (LU internally).
LU suits **dense matrices** of general form. For sparse matrices, LU can create fill-in-zero entries become nonzero. Use `scipy.sparse.linalg.spsolve` with fill-reducing orderings (AMD, nested dissection) for sparse systems.
What is the purpose of the permutation matrix P in the decomposition PA = LU?
Cholesky Decomposition
For a **symmetric positive-definite** (SPD) matrix A = Aᵀ with positive eigenvalues, the Cholesky decomposition gives A = LLᵀ where L is lower-triangular with positive diagonal. It is twice as fast as LU and uses half the memory.
**Cholesky: A = LLᵀ** - Applicable ONLY to SPD matrices (A = Aᵀ, all eigenvalues > 0) - Cost: O(n³/6)-twice as fast as LU (O(n³/3)) - Memory: n(n+1)/2 elements (lower triangle only) - No pivoting needed: Cholesky is always stable for SPD - SPD check: if Cholesky fails-the matrix is NOT SPD `numpy.linalg.cholesky(A)` or `scipy.linalg.cholesky(A)`.
| Property | LU | Cholesky |
|---|---|---|
| Applicability | Any non-singular matrix | SPD only |
| Cost | O(n³/3) | O(n³/6) |
| Memory | n² elements | n(n+1)/2 |
| Pivoting | Required for stability | Not needed |
| ML use case | General linear systems | Normal equations, kernel methods |
Matrix X ∈ ℝ¹⁰⁰ˣ²⁰ holds feature data. Is the Gram matrix G = XᵀX SPD?
QR Decomposition
QR decomposition writes A = QR where Q is orthogonal (QᵀQ = I) and R is upper-triangular. Two main algorithms: **Gram-Schmidt** (conceptually simple, numerically unstable) and **Householder reflections** (stable, standard). QR is the key to the least-squares problem.
**Least squares via QR:** Minimize ||Ax − b||² for A ∈ ℝᵐˣⁿ, m > n: 1. A = QR (thin QR: Q ∈ ℝᵐˣⁿ, R ∈ ℝⁿˣⁿ) 2. x* = R⁻¹ Qᵀ b (or solve Rx = Qᵀb) **Advantage over normal equations** AᵀAx = Aᵀb: - Condition number: κ(AᵀA) = κ(A)²-QR is more stable! - QR: O(mn²), normal equations: O(mn² + n³)-comparable cost
QR decomposition is also the foundation of the **Francis QR algorithm** for eigenvalue computation (nm-09). The idea: repeatedly apply QR factorization to A, and A converges to upper-triangular (Schur) form whose diagonal entries are the eigenvalues.
Why is QR decomposition preferred over normal equations AᵀAx = Aᵀb for linear regression?
Key Ideas
- **LU (PA = LU):** any non-singular matrix; O(n³/3); partial pivoting for stability; reuse for multiple right-hand sides
- **Cholesky (A = LLᵀ):** SPD only; twice as fast as LU; no pivoting; failure signals non-SPD
- **QR (A = QR):** least-squares problems; more stable than normal equations because κ(AᵀA) = κ(A)²; Householder > Gram-Schmidt
- **Choice rule:** LU for general, Cholesky for SPD (kernel ML), QR for overdetermined least squares
Related Topics
Direct methods are the foundation for iterative solvers and spectral problems:
- Iterative Methods for Linear Systems — Iterative methods (CG, GMRES) are used when O(n³) direct methods are too expensive for large sparse matrices
- Numerical Eigenvalue Computation — The QR algorithm for eigenvalues repeatedly applies QR factorization
Вопросы для размышления
- Why store LU factorization rather than the inverse A⁻¹ when solving Ax = b repeatedly for different b?
- How can it be proved that XᵀX + λI (ridge regression regularizer) is always SPD for any λ > 0?
- Why do GLMs (generalized linear models) typically use QR rather than Cholesky for the weighted least-squares subproblem?