Linear Algebra
Matrix Decompositions: QR, LU, and Cholesky
numpy, scipy, and LAPACK never store inverse matrices - they store decompositions. LU for solving systems, QR for least squares, Cholesky for covariance matrices. Choosing the right decomposition for the task is the difference between numerically stable code and code that silently loses precision.
- scipy.linalg.solve: internally uses LU with partial pivoting via LAPACK dgesv
- Linear regression: QR instead of (XᵀX)⁻¹ halves the condition number - numerically safer
- Kalman filter: Cholesky decomposition of the covariance matrix ensures SPD is preserved
- Bayesian optimization: Cholesky for Gaussian process covariance in GPyTorch
- Randomized SVD: sketching + QR + EVD - processes matrices that do not fit in RAM
Comparing Decompositions
The choice of decomposition depends on the matrix structure and the task. General matrix → LU. Rectangular matrix (least squares) → QR. SPD → Cholesky (2× faster than LU). Sparse → specialized factorizations (CHOLMOD, SuperLU). Low-rank approximation → SVD or randomized SVD.
Computational Considerations
In practice, not only mathematical properties matter but also computational efficiency. BLAS (Basic Linear Algebra Subprograms) provides a standard interface for matrix operations: GEMM (matrix multiplication) achieves 90%+ of theoretical peak on GPU. LAPACK is built on top of BLAS and implements all standard decompositions.
When a matrix is sparse (most entries are zero), dense factorizations are wasteful. Sparse variants are used: CHOLMOD for SPD, UMFPACK (SuperLU) for general matrices. scipy.sparse.linalg provides an interface to these libraries.
QR Decomposition
**LAPACK (used in NumPy, MATLAB, Julia) implements QR decomposition in O(mn²) , for a 10,000×1,000 matrix that is 10¹⁰ operations, done in seconds on GPU.** Matrix decompositions are not just mathematical curiosities: they determine the computational complexity and numerical stability of algorithms. QR solves least-squares problems, LU solves linear systems, and Cholesky accelerates work with symmetric positive-definite matrices in optimization and Kalman filtering.
QR Decomposition
QR decomposition represents an m×n matrix A as the product of an orthogonal matrix Q (m×m) and an upper-triangular matrix R (m×n). The columns of Q form an orthonormal basis. This decomposition is the foundation of the QR algorithm for eigenvalues and the method of least squares.
Why is QR preferred over normal equations for least squares?
Normal equations AᵀAx = Aᵀb use a matrix with condition number cond(AᵀA) = cond(A)². QR operates directly on A, preserving cond(A). For ill-conditioned problems this is the difference between an accurate solution and numerical noise.
LU Decomposition
LU Decomposition
LU decomposition represents a matrix as the product of a lower-triangular (L) and upper-triangular (U) matrix. With partial pivoting: PA = LU, where P is a permutation matrix. This is the most efficient method for solving systems when the right-hand side changes: O(n³) for factorization, O(n²) for each new right-hand side.
Why is the permutation matrix P needed in PA = LU?
P swaps rows so the largest-magnitude entry in each column ends up on the diagonal. Without it, dividing by a tiny pivot loses accuracy catastrophically. np.linalg.lu_factor returns exactly PLU.
Cholesky Decomposition
Cholesky Decomposition
For symmetric positive-definite (SPD) matrices, the Cholesky decomposition exists: A = LLᵀ. It is twice as fast as LU (works only with the lower triangle) and numerically more stable. SPD matrices appear everywhere: covariance matrices in statistics, the Hessian in convex optimization, Kalman filter update equations, and kernel matrices in Gaussian processes.
Applications: **QR** , least squares, QR iterations for eigenvalues, feature orthogonalization; **LU** , direct system solving, determinant computation (det = ∏uᵢᵢ), matrix inversion; **Cholesky** , Kalman filter (covariance update), Gaussian processes (GP regression), sampling from multivariate normal, second-order convex optimization.
Why does Cholesky require symmetric positive-definite matrices?
The formula l_jj = sqrt(a_jj - sum(l_jk² for k<j)) needs a positive argument. That is equivalent to positive-definiteness. On a non-SPD matrix the algorithm yields complex numbers or NaN.