Linear Algebra

LU Decomposition: Gaussian Elimination, Unpacked

np.linalg.solve is called millions of times per day from Python. Behind every call is an LU factorization from LAPACK, tuned for CPU cache and vector instructions. Understanding LU means understanding what happens inside scipy, numpy, and every numerical solver.

scipy.linalg.solve: LAPACK dgesv implements PA=LU with partial pivoting
Kalman filter: every update step solves a linear system via LU
FEM: the stiffness matrix is factorized once, then solved repeatedly for different loads
Computer graphics: computing the inverse of a 4x4 matrix via LU in fixed time
Optimization: Newton direction (H⁻¹g) via solving Hd = g, never inverting H explicitly

**scipy.linalg.lu_factor solves a problem that makes raw Gaussian elimination crumble.** Consider: 1000 different right-hand sides for a single 500x500 matrix - say, different boundary conditions in a finite element simulation. Gaussian elimination: O(n^3) * 1000 operations. LU: O(n^3) once + O(n^2) * 1000 times. The difference is thousands-fold. LU decomposition is Gaussian elimination written in a reusable, unpacked form.

**What the lesson is really about**: A = LU is not mysterious matrix magic - it is just careful bookkeeping of elimination steps. L stores the multipliers (what got subtracted), U stores the result (what survived). That bookkeeping unlocks the main advantage: factor A once, then substitute any number of right-hand sides for O(n^2) each.

The Structure: What L and U Are

A = LU, where L is lower triangular (Lower) with ones on the diagonal, and U is upper triangular (Upper). Both are n x n.

Where L and U Come From: Step by Step

Gaussian elimination zeros out entries below the diagonal by subtracting multiples of pivot rows. LU simply **remembers those multiples** and stores them in L.

Solving a System: Two Triangular Steps

Once A = LU, the system Ax = b becomes two simple steps. Triangular systems are solved by substitution in O(n^2) - no further elimination needed.

**numpy.linalg.solve** does the same thing internally: it calls the LAPACK routine dgesv, which performs PLU factorization followed by two substitution steps. When solving many systems with the same matrix, lu_factor/lu_solve is explicitly more efficient.

PLU: Row Pivoting for Numerical Stability

Plain LU breaks down when the diagonal hits zero (division by zero). Even a small diagonal element causes numerical instability through rounding amplification. The fix: **partial pivoting** - swap rows so the largest element in the column becomes the pivot.

Cholesky: LU for Symmetric Positive Definite Matrices

When a matrix is symmetric and positive definite (SPD), a more efficient factorization exists: the Cholesky decomposition. This appears constantly in Gaussian Processes, Kalman Filters, and MCMC with covariance matrices.

**Sparse LU in finite element analysis**: physical simulations (FEM/CFD) produce sparse matrices - most entries are zero. Specialized sparse LU solvers (UMFPACK, SuperLU, PARDISO) exploit sparsity to store and process only nonzero entries. Matrices of size 10^6 x 10^6 become tractable on ordinary servers.

Complexity and Method Comparison

LU in Production Systems

From numpy to physical simulations Components: numpy.linalg.solve / scipy.linalg.lu_factor, Gaussian Process Regression (sklearn, GPyTorch), FEM / CFD simulations, Kalman Filter, Normal equation in regression

Practice: LU Decomposition

- Why is lu_factor/lu_solve better than calling np.linalg.solve 1000 times when the matrix A is fixed? - When should Cholesky be used instead of LU? What conditions are required? - What is partial pivoting and why is it needed? What happens without it?

LU Factorization and Forward Substitution

scipy.linalg.lu_factor solves the problem that naive Gaussian elimination misses: one factorization A = PLU computed once, applied thousands of times. LAPACK uses this in every scientific simulation -- from Boeing CFD codes to IBM quantum chemistry.

What is the main advantage of LU factorization when solving systems with the same matrix and multiple right-hand sides?

LU shines when solving many systems with the same A: one O(n^3) factorization followed by O(n^2) solves per right-hand side.

LU: Determinants and Matrix Inversion

NumPy computes the determinant of a 1000x1000 matrix in milliseconds -- using LU factorization internally. det(A) = det(P)*det(L)*det(U) = +/-1 * 1 * u_11*u_22*...*u_nn -- the product of U's diagonal elements with a sign from the permutation.

How is det(A) computed after the LU factorization PA = LU?

det(PA) = det(L)det(U) gives det(A) = det(P^{-1}) * 1 * prod(u_ii) = (-1)^s * prod(u_ii).

L = | 1 0 0 | U = | u11 u12 u13 | | l21 1 0 | | 0 u22 u23 | | l31 l32 1 | | 0 0 u33 | L: ones on diagonal, elimination multipliers below U: upper triangular - the result of Gaussian elimination Properties: det(L) = 1 (diagonal product = 1*1*...*1) det(A) = det(L) * det(U) = 1 * u11*u22*...*unn = product of diagonal elements of U

A = | 2 1 1 | | 4 3 3 | | 8 7 9 | Step 1: eliminate the first column l21 = 4/2 = 2: R2 = R2 - 2*R1 l31 = 8/2 = 4: R3 = R3 - 4*R1 -> | 2 1 1 | | 0 1 1 | | 0 3 5 | Step 2: eliminate the second column l32 = 3/1 = 3: R3 = R3 - 3*R2 -> U = | 2 1 1 | | 0 1 1 | | 0 0 2 | Multipliers go into L: L = | 1 0 0 | | 2 1 0 | | 4 3 1 | Verification: L * U = | 1*2 1*1 1*1 | | 2 1 1 | | 2*2+1*0 2*1+1*1 2*1+1*1 | = | 4 3 3 | = A ✓ | 4*2+3*0 4*1+3*1 4*1+3*1+2 | | 8 7 9 |

Ax = b LUx = b Let y = Ux, then Ly = b STEP 1 - forward substitution: solve Ly = b L is lower triangular -> solve top to bottom in O(n^2) y1 = b1 y2 = b2 - l21*y1 y3 = b3 - l31*y1 - l32*y2 STEP 2 - back substitution: solve Ux = y U is upper triangular -> solve bottom to top in O(n^2) x3 = y3 / u33 x2 = (y2 - u23*x3) / u22 x1 = (y1 - u12*x2 - u13*x3) / u11 Total: LU once at O(n^3) + each new b costs O(n^2)

PROBLEM without pivoting: A = | 0 1 | Step 1: divide by a11 = 0 | 2 3 | Division by zero - algorithm crashes. With partial pivoting: Find the row with max |element| in column 1: row 2 (|2| > |0|) Swap rows: P*A = | 2 3 | | 0 1 | Now continue normally. P*A = L*U, where P is a permutation matrix Partial pivoting guarantees: |l_ij| <= 1 for all i, j This means bounded error growth - numerical stability. det(A) = (-1)^s * u11 * u22 * ... * unn where s = number of row swaps

For SPD matrix A: A = L * L^T L is lower triangular with positive diagonal elements Advantages: Operations: n^3/3 (about half of standard LU) Memory: store only L (half of L + U) Stability: always stable without pivoting (if A is truly SPD) Where it appears in ML: - Gaussian Process: K = L*L^T, then L^{-1}*y for prediction - Multivariate Normal: sampling via L*z, z ~ N(0, I) - Kalman Filter: covariance matrices are always SPD - Bayesian optimization: covariance inversion at each step

Operation	Complexity	When to use
LU factorization	O(n^3)	Once, before solving multiple systems
Ly = b (forward substitution)	O(n^2)	For each new right-hand side
Ux = y (back substitution)	O(n^2)	For each new right-hand side
Cholesky	n^3/3	SPD matrix (covariances, Gram matrices)
det(A) after LU	O(n)	Product of U's diagonal
numpy.linalg.solve	O(n^3)	Single system, no repeated solves

Key Takeaways

**A = LU**: L stores elimination multipliers, U is the result - Gaussian elimination, unpacked
**Two steps**: Ly = b in O(n^2), Ux = y in O(n^2) - cheaper than repeating full elimination
**The main idea**: factor A once at O(n^3), then solve 1000 right-hand sides at O(n^2) each
**PA = LU**: partial pivoting ensures numerical stability; scipy and numpy always use it
**Cholesky A = LL^T**: twice as fast as LU, for SPD matrices (covariances, X^T X); no pivoting needed
**numpy.linalg.solve** calls LAPACK dgesv = PLU internally; for repeated solves, lu_factor/lu_solve explicitly

What Comes Next

LU is the foundation of numerical linear algebra

Matrix Rank — Rank is determined via the echelon form of Gaussian elimination - the same LU process
Inverse Matrix — The inverse is computed as n solves of Ax = e_i, each a pair of LU substitutions
SVD — A more powerful factorization: A = U*S*V^T, generalizes LU to all matrices including non-square
Conjugate Gradient Method — Iterative alternative to LU for large sparse systems where O(n^3) is out of reach

Связанные уроки

nm-02