Numerical Methods

Direct Methods: LU, Cholesky, QR

numpy.linalg.solve, sklearn.LinearRegression, scipy.optimize-all use LU, Cholesky, or QR decompositions internally. Choosing the right factorization determines accuracy, speed, and applicability for a given matrix structure.

**numpy.linalg.solve:** uses LAPACK's dgesv-LU with partial pivoting; O(n³/3) for dense matrices
**sklearn LinearRegression:** solved via QR (LAPACK's dgels)-more stable than normal equations
**Kernel ML methods (SVM, Gaussian Processes):** kernel matrix K is SPD → Cholesky for solving Kα = y

Предварительные знания

Systems of Nonlinear Equations

LU Decomposition

LU decomposition factors a matrix A into the product of a lower-triangular L and an upper-triangular U: PA = LU, where P is a permutation matrix (partial pivoting). The main motivation: decompose once, then solve for many right-hand sides b at O(n²) cost each instead of O(n³).

**PA = LU (partial pivoting):** 1. Decomposition: O(n³/3)-done once 2. Forward substitution Ly = Pb: O(n²) 3. Back substitution Ux = y: O(n²) 4. New right-hand side b: O(n²)-no re-factoring! **Partial pivoting:** permutation P places the largest element in each column on the diagonal → stability, |lᵢⱼ| ≤ 1. In NumPy: `scipy.linalg.lu_factor` + `lu_solve`, or `np.linalg.solve` (LU internally).

LU suits **dense matrices** of general form. For sparse matrices, LU can create fill-in-zero entries become nonzero. Use `scipy.sparse.linalg.spsolve` with fill-reducing orderings (AMD, nested dissection) for sparse systems.

What is the purpose of the permutation matrix P in the decomposition PA = LU?

Cholesky Decomposition

For a **symmetric positive-definite** (SPD) matrix A = Aᵀ with positive eigenvalues, the Cholesky decomposition gives A = LLᵀ where L is lower-triangular with positive diagonal. It is twice as fast as LU and uses half the memory.

**Cholesky: A = LLᵀ** - Applicable ONLY to SPD matrices (A = Aᵀ, all eigenvalues > 0) - Cost: O(n³/6)-twice as fast as LU (O(n³/3)) - Memory: n(n+1)/2 elements (lower triangle only) - No pivoting needed: Cholesky is always stable for SPD - SPD check: if Cholesky fails-the matrix is NOT SPD `numpy.linalg.cholesky(A)` or `scipy.linalg.cholesky(A)`.

Property	LU	Cholesky
Applicability	Any non-singular matrix	SPD only
Cost	O(n³/3)	O(n³/6)
Memory	n² elements	n(n+1)/2
Pivoting	Required for stability	Not needed
ML use case	General linear systems	Normal equations, kernel methods

Matrix X ∈ ℝ¹⁰⁰ˣ²⁰ holds feature data. Is the Gram matrix G = XᵀX SPD?

QR Decomposition

QR decomposition writes A = QR where Q is orthogonal (QᵀQ = I) and R is upper-triangular. Two main algorithms: **Gram-Schmidt** (conceptually simple, numerically unstable) and **Householder reflections** (stable, standard). QR is the key to the least-squares problem.

**Least squares via QR:** Minimize ||Ax − b||² for A ∈ ℝᵐˣⁿ, m > n: 1. A = QR (thin QR: Q ∈ ℝᵐˣⁿ, R ∈ ℝⁿˣⁿ) 2. x* = R⁻¹ Qᵀ b (or solve Rx = Qᵀb) **Advantage over normal equations** AᵀAx = Aᵀb: - Condition number: κ(AᵀA) = κ(A)²-QR is more stable! - QR: O(mn²), normal equations: O(mn² + n³)-comparable cost

QR decomposition is also the foundation of the **Francis QR algorithm** for eigenvalue computation (nm-09). The idea: repeatedly apply QR factorization to A, and A converges to upper-triangular (Schur) form whose diagonal entries are the eigenvalues.

Why is QR decomposition preferred over normal equations AᵀAx = Aᵀb for linear regression?

Key Ideas

**LU (PA = LU):** any non-singular matrix; O(n³/3); partial pivoting for stability; reuse for multiple right-hand sides
**Cholesky (A = LLᵀ):** SPD only; twice as fast as LU; no pivoting; failure signals non-SPD
**QR (A = QR):** least-squares problems; more stable than normal equations because κ(AᵀA) = κ(A)²; Householder > Gram-Schmidt
**Choice rule:** LU for general, Cholesky for SPD (kernel ML), QR for overdetermined least squares

Вопросы для размышления

Why store LU factorization rather than the inverse A⁻¹ when solving Ax = b repeatedly for different b?
How can it be proved that XᵀX + λI (ridge regression regularizer) is always SPD for any λ > 0?
Why do GLMs (generalized linear models) typically use QR rather than Cholesky for the weighted least-squares subproblem?

Связанные уроки

la-21-lu

LU Decomposition

What is the purpose of the permutation matrix P in the decomposition PA = LU?

Cholesky Decomposition

Property

Cholesky

Applicability

Any non-singular matrix

SPD only

Cost

O(n³/3)

O(n³/6)

Memory

n² elements

n(n+1)/2

Pivoting

Required for stability

Not needed

ML use case

General linear systems

Normal equations, kernel methods

Matrix X ∈ ℝ¹⁰⁰ˣ²⁰ holds feature data. Is the Gram matrix G = XᵀX SPD?

QR Decomposition

Why is QR decomposition preferred over normal equations AᵀAx = Aᵀb for linear regression?

Key Ideas

**LU (PA = LU):** any non-singular matrix; O(n³/3); partial pivoting for stability; reuse for multiple right-hand sides

**Cholesky (A = LLᵀ):** SPD only; twice as fast as LU; no pivoting; failure signals non-SPD

**QR (A = QR):** least-squares problems; more stable than normal equations because κ(AᵀA) = κ(A)²; Householder > Gram-Schmidt

**Choice rule:** LU for general, Cholesky for SPD (kernel ML), QR for overdetermined least squares

Direct Methods: LU, Cholesky, QR

Предварительные знания

LU Decomposition

Cholesky Decomposition

QR Decomposition

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Direct Methods: LU, Cholesky, QR

Предварительные знания

LU Decomposition

Cholesky Decomposition

QR Decomposition

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки