Scientific Computing

Systems of Linear Equations

Цели урока

  • Understand LU decomposition and its advantage for multiple right-hand sides
  • Know when Cholesky is faster than LU and why it is stable without pivoting
  • Explain the difference between Jacobi, Gauss-Seidel, and CG
  • Understand sparse matrix formats and the memory savings they provide

Предварительные знания

  • NumPy and Linear Algebra

2003. Boeing runs a virtual crash test on a new wing in a simulated wind tunnel. The FEM model: 2 million equations. Engineers crank through 400 such systems per day, sweeping load parameters. Without LU decomposition and sparse matrices that would take years. With them, hours. The same linear algebra drives Google's PageRank, climate models, and GPU-accelerated fluid simulation in game engines.

  • **FEM in engineering** - computing structural stresses reduces to solving Ax=b with a sparse stiffness matrix involving millions of equations
  • **Computer graphics** - cloth, fluid, and smoke simulations solve sparse linear systems every frame
  • **Google PageRank** - ranking billions of web pages through iterative solution of a sparse link-matrix system

From Cholesky to sparse matrices

1910: Cholesky builds the method for geodetic calculations and is killed in WWI in 1918. 1952: Hestenes and Stiefel independently discover Conjugate Gradient. 1970s: CSR and CSC formats arrive and unlock industrial-scale FEM. One unbroken chain - a military surveyor in 1910 to aerospace simulation today.

LU Decomposition

Boeing's aerodynamics simulator stress-tests a wing. The FEM model is a 50,000 x 50,000 stiffness matrix - one Ax = b solve per flight configuration. Goal: factor A once, then knock out hundreds of load vectors in seconds. That is exactly what **LU decomposition** delivers: A = LU, with L lower-triangular and U upper-triangular.

**PA = LU** - the row-pivoted variant. P is a permutation matrix that locks in numerical stability. Skip pivoting and a tiny diagonal entry will divide rounding errors straight into catastrophe.

LU factorization costs $O(2n^3/3)$. The whole win is reuse: factor A once, then dispatch each new right-hand side b in $O(n^2)$. PyTorch's L-BFGS optimizer rides the same trick - one Hessian approximation, hundreds of right-hand sides per step.

OperationComplexityWhen needed
LU decompositionO(2n^3/3)Once per matrix A
Forward substitutionO(n^2)For each new b
Back substitutionO(n^2)For each new b
Full solveO(2n^3/3)When there is only one b

What is the main advantage of LU decomposition over calling solve directly, for multiple right-hand sides?

Cholesky Decomposition

Covariance matrices in statistics. Stiffness matrices in FEM. Kernel matrices in ML. All symmetric positive definite (SPD). For SPD matrices there is a method **twice as fast** as LU and using half the memory: **Cholesky** $A = LL^T$.

A **positive definite** matrix has $x^T A x > 0$ for every nonzero x, with strictly positive eigenvalues. Gaussian processes lean on Cholesky every day: sampling from a multivariate normal with covariance K is one `torch.linalg.cholesky` call at $O(n^3/3)$.

PropertyLUCholesky
Requirement on ASquare, nonsingularSymmetric, positive definite
FormulaPA = LUA = LL^T
ComplexityO(2n^3/3)O(n^3/3)
Numerical stabilityRequires pivotingStable without pivoting
MemoryTwo matrices L, UOne matrix L

Cholesky is twice as fast as LU and uses half the memory. No pivoting required - SPD structure provides stability for free. In PyTorch, `torch.linalg.cholesky` powers Gaussian-process posteriors and multivariate-normal sampling in one shot.

Andre-Louis Cholesky (1875-1918)

Cholesky, a French military geodesist, hammered out the method in 1910 for the linear systems that fall out of land surveys. He was killed in World War I in 1918. A colleague published the algorithm in 1924, six years after the author's death.

For which type of matrix is Cholesky decomposition guaranteed to work?

Iterative Methods

Picture a $10^6 \times 10^6$ matrix. LU needs $10^{18}$ operations - at 10 TFLOPS that is 100,000 seconds. Not happening. Real-world matrices (FEM, CFD, graphs) are **sparse**: 99.99% of entries are zero. **Iterative methods** start from a guess and refine it, touching only the nonzero entries at each pass.

**Jacobi** - updates every component in parallel using last iteration's values. Drop straight onto a GPU. **Gauss-Seidel** - updates one at a time, immediately reusing fresh values. Roughly 2x faster than Jacobi in practice, but the dependency chain kills easy parallelism.

**Conjugate Gradient (CG)** is the heavyweight champion for SPD matrices. In exact arithmetic it converges in at most n steps; with a preconditioner it nails a great approximation in $O(\sqrt{\kappa(A)})$ steps, where $\kappa$ is the condition number.

MethodMatrix typeConvergence rateParallelism
JacobiDiagonally dominantSlowHigh (GPU-friendly)
Gauss-SeidelDiagonally dominant~2x faster than JacobiLow
CGSPDO(sqrt(cond(A))) iterationsModerate
GMRESAny nonsingularDepends on spectrumModerate

Why are iterative methods used for a $10^6 \times 10^6$ matrix instead of LU?

Sparse Matrices

A $10^6 \times 10^6$ matrix in dense float64 weighs 8 TB. A tridiagonal matrix of the same size carries only 3 million nonzero entries - 3 out of $10^{12}$. Storing just the nonzero entries is the **sparse** format, and it cuts memory by hundreds of thousands of times.

**CSR** (Compressed Sparse Row) holds values, column indices, and row pointers. Fast for matrix-vector products (Ax) and row-wise reads. **CSC** (Compressed Sparse Column) is the column-major twin, fast for triangular solves. Google's PageRank rides a 10-billion-page transition matrix stored in CSR.

FormatStorageFast operationTypical use
CSRBy rowsA @ x, row slicingIterative solvers
CSCBy columnsColumn slicing, triangular solveDirect solvers (UMFPACK)
COOTriples (i, j, val)Fast constructionAssembling matrix from elements
LILLists of listsFast insertionIncremental construction

Best practice: assemble in COO or LIL (insertion is cheap), then convert to CSR before computing. Never call `.toarray()` on a large sparse matrix - SciPy will dutifully try to materialize 8 TB of zeros and crash the process.

numpy.linalg.solve is suitable for any system of linear equations

np.linalg.solve is a direct O(n^3) method for dense matrices. For systems of size 10^6 and above, iterative methods (CG, GMRES) combined with sparse formats (scipy.sparse) are the only practical option.

A 10^6 x 10^6 dense matrix occupies 8 TB; LU decomposition requires 10^18 operations. Iterative methods on sparse matrices use O(nnz) memory and converge in hundreds of iterations.

A tridiagonal $10^6 \times 10^6$ matrix has ~3 million nonzero entries. How much memory does CSR save versus dense storage?

Key Ideas

  • **LU decomposition** PA=LU - universal direct method at O(n^3); advantageous for many right-hand sides
  • **Cholesky** A=LL^T - twice as fast as LU for SPD matrices, no pivoting needed
  • **Iterative methods** (Jacobi, Gauss-Seidel, CG) - the only option for n > 10^5
  • **Sparse matrices** (CSR/CSC) store only nonzero entries, saving hundreds of thousands of times in memory

Related Topics

Systems of linear equations are at the core of numerical methods:

  • NumPy and Linear Algebra — np.linalg.solve is the direct method; here we examine what happens under the hood
  • Introduction to Scientific Computing — Linear systems arise from model discretization (FEM, finite difference schemes)
  • Linear Programming — Simplex method performs LU decomposition at each pivot step

Вопросы для размышления

  • Why does Cholesky not need pivoting while LU does? What property of SPD matrices guarantees stability?
  • Designing a solver for a FEM mesh with 10^9 nodes - what combination of methods would be the right choice?
  • Why must the spectral radius of the iteration matrix be less than 1 for convergence?

Связанные уроки

  • sci-02 — NumPy linear algebra is the foundation - no LU without it
  • sci-01 — Model discretization produces Ax=b systems
  • sci-04 — Eigenvalue computation builds on linear solvers
  • nm-01 — Numerical methods go deeper: conditioning, norms, error bounds
  • ml-13 — SVM via QP relies on LU or CG under the hood
  • opt-03 — LP solvers execute LU steps at each simplex pivot
  • dm-01 — Graph adjacency matrices are sparse structures of the same type
  • nm-03
Systems of Linear Equations

0

1

Sign In