Scientific Computing

Systems of Linear Equations

Цели урока

Understand LU decomposition and its advantage for multiple right-hand sides
Know when Cholesky is faster than LU and why it is stable without pivoting
Explain the difference between Jacobi, Gauss-Seidel, and CG
Understand sparse matrix formats and the memory savings they provide

Предварительные знания

NumPy and Linear Algebra

2003. Boeing runs a virtual crash test on a new wing in a simulated wind tunnel. The FEM model: 2 million equations. Engineers crank through 400 such systems per day, sweeping load parameters. Without LU decomposition and sparse matrices that would take years. With them, hours. The same linear algebra drives Google's PageRank, climate models, and GPU-accelerated fluid simulation in game engines.

**FEM in engineering** - computing structural stresses reduces to solving Ax=b with a sparse stiffness matrix involving millions of equations
**Computer graphics** - cloth, fluid, and smoke simulations solve sparse linear systems every frame
**Google PageRank** - ranking billions of web pages through iterative solution of a sparse link-matrix system

From Cholesky to sparse matrices

1910: Cholesky builds the method for geodetic calculations and is killed in WWI in 1918. 1952: Hestenes and Stiefel independently discover Conjugate Gradient. 1970s: CSR and CSC formats arrive and unlock industrial-scale FEM. One unbroken chain - a military surveyor in 1910 to aerospace simulation today.

LU Decomposition

Boeing's aerodynamics simulator stress-tests a wing. The FEM model is a 50,000 x 50,000 stiffness matrix - one Ax = b solve per flight configuration. Goal: factor A once, then knock out hundreds of load vectors in seconds. That is exactly what **LU decomposition** delivers: A = LU, with L lower-triangular and U upper-triangular.

**PA = LU** - the row-pivoted variant. P is a permutation matrix that locks in numerical stability. Skip pivoting and a tiny diagonal entry will divide rounding errors straight into catastrophe.

LU factorization costs $O(2n^3/3)$. The whole win is reuse: factor A once, then dispatch each new right-hand side b in $O(n^2)$. PyTorch's L-BFGS optimizer rides the same trick - one Hessian approximation, hundreds of right-hand sides per step.

Operation	Complexity	When needed
LU decomposition	O(2n^3/3)	Once per matrix A
Forward substitution	O(n^2)	For each new b
Back substitution	O(n^2)	For each new b
Full solve	O(2n^3/3)	When there is only one b

What is the main advantage of LU decomposition over calling solve directly, for multiple right-hand sides?

Cholesky Decomposition

Covariance matrices in statistics. Stiffness matrices in FEM. Kernel matrices in ML. All symmetric positive definite (SPD). For SPD matrices there is a method **twice as fast** as LU and using half the memory: **Cholesky** $A = LL^T$.

A **positive definite** matrix has $x^T A x > 0$ for every nonzero x, with strictly positive eigenvalues. Gaussian processes lean on Cholesky every day: sampling from a multivariate normal with covariance K is one `torch.linalg.cholesky` call at $O(n^3/3)$.

Property	LU	Cholesky
Requirement on A	Square, nonsingular	Symmetric, positive definite
Formula	PA = LU	A = LL^T
Complexity	O(2n^3/3)	O(n^3/3)
Numerical stability	Requires pivoting	Stable without pivoting
Memory	Two matrices L, U	One matrix L

Cholesky is twice as fast as LU and uses half the memory. No pivoting required - SPD structure provides stability for free. In PyTorch, `torch.linalg.cholesky` powers Gaussian-process posteriors and multivariate-normal sampling in one shot.

Andre-Louis Cholesky (1875-1918)

Cholesky, a French military geodesist, hammered out the method in 1910 for the linear systems that fall out of land surveys. He was killed in World War I in 1918. A colleague published the algorithm in 1924, six years after the author's death.

For which type of matrix is Cholesky decomposition guaranteed to work?

Iterative Methods

Picture a $10^6 \times 10^6$ matrix. LU needs $10^{18}$ operations - at 10 TFLOPS that is 100,000 seconds. Not happening. Real-world matrices (FEM, CFD, graphs) are **sparse**: 99.99% of entries are zero. **Iterative methods** start from a guess and refine it, touching only the nonzero entries at each pass.

**Jacobi** - updates every component in parallel using last iteration's values. Drop straight onto a GPU. **Gauss-Seidel** - updates one at a time, immediately reusing fresh values. Roughly 2x faster than Jacobi in practice, but the dependency chain kills easy parallelism.

**Conjugate Gradient (CG)** is the heavyweight champion for SPD matrices. In exact arithmetic it converges in at most n steps; with a preconditioner it nails a great approximation in $O(\sqrt{\kappa(A)})$ steps, where $\kappa$ is the condition number.

Method	Matrix type	Convergence rate	Parallelism
Jacobi	Diagonally dominant	Slow	High (GPU-friendly)
Gauss-Seidel	Diagonally dominant	~2x faster than Jacobi	Low
CG	SPD	O(sqrt(cond(A))) iterations	Moderate
GMRES	Any nonsingular	Depends on spectrum	Moderate

Why are iterative methods used for a $10^6 \times 10^6$ matrix instead of LU?

Sparse Matrices

A $10^6 \times 10^6$ matrix in dense float64 weighs 8 TB. A tridiagonal matrix of the same size carries only 3 million nonzero entries - 3 out of $10^{12}$. Storing just the nonzero entries is the **sparse** format, and it cuts memory by hundreds of thousands of times.

**CSR** (Compressed Sparse Row) holds values, column indices, and row pointers. Fast for matrix-vector products (Ax) and row-wise reads. **CSC** (Compressed Sparse Column) is the column-major twin, fast for triangular solves. Google's PageRank rides a 10-billion-page transition matrix stored in CSR.

Format	Storage	Fast operation	Typical use
CSR	By rows	A @ x, row slicing	Iterative solvers
CSC	By columns	Column slicing, triangular solve	Direct solvers (UMFPACK)
COO	Triples (i, j, val)	Fast construction	Assembling matrix from elements
LIL	Lists of lists	Fast insertion	Incremental construction

Best practice: assemble in COO or LIL (insertion is cheap), then convert to CSR before computing. Never call `.toarray()` on a large sparse matrix - SciPy will dutifully try to materialize 8 TB of zeros and crash the process.

numpy.linalg.solve is suitable for any system of linear equations

np.linalg.solve is a direct O(n^3) method for dense matrices. For systems of size 10^6 and above, iterative methods (CG, GMRES) combined with sparse formats (scipy.sparse) are the only practical option.

A 10^6 x 10^6 dense matrix occupies 8 TB; LU decomposition requires 10^18 operations. Iterative methods on sparse matrices use O(nnz) memory and converge in hundreds of iterations.

A tridiagonal $10^6 \times 10^6$ matrix has ~3 million nonzero entries. How much memory does CSR save versus dense storage?

Key Ideas

**LU decomposition** PA=LU - universal direct method at O(n^3); advantageous for many right-hand sides
**Cholesky** A=LL^T - twice as fast as LU for SPD matrices, no pivoting needed
**Iterative methods** (Jacobi, Gauss-Seidel, CG) - the only option for n > 10^5
**Sparse matrices** (CSR/CSC) store only nonzero entries, saving hundreds of thousands of times in memory

Вопросы для размышления

Why does Cholesky not need pivoting while LU does? What property of SPD matrices guarantees stability?
Designing a solver for a FEM mesh with 10^9 nodes - what combination of methods would be the right choice?
Why must the spectral radius of the iteration matrix be less than 1 for convergence?

Связанные уроки

sci-02 — NumPy linear algebra is the foundation - no LU without it
sci-01 — Model discretization produces Ax=b systems
sci-04 — Eigenvalue computation builds on linear solvers
nm-01 — Numerical methods go deeper: conditioning, norms, error bounds
ml-13 — SVM via QP relies on LU or CG under the hood
opt-03 — LP solvers execute LU steps at each simplex pivot
dm-01 — Graph adjacency matrices are sparse structures of the same type
nm-03

Scientific Computing

Systems of Linear Equations

Цели урока

Understand LU decomposition and its advantage for multiple right-hand sides
Know when Cholesky is faster than LU and why it is stable without pivoting
Explain the difference between Jacobi, Gauss-Seidel, and CG
Understand sparse matrix formats and the memory savings they provide

Предварительные знания

NumPy and Linear Algebra

**FEM in engineering** - computing structural stresses reduces to solving Ax=b with a sparse stiffness matrix involving millions of equations
**Computer graphics** - cloth, fluid, and smoke simulations solve sparse linear systems every frame
**Google PageRank** - ranking billions of web pages through iterative solution of a sparse link-matrix system

From Cholesky to sparse matrices

LU Decomposition

**PA = LU** - the row-pivoted variant. P is a permutation matrix that locks in numerical stability. Skip pivoting and a tiny diagonal entry will divide rounding errors straight into catastrophe.

Operation	Complexity	When needed
LU decomposition	O(2n^3/3)	Once per matrix A
Forward substitution	O(n^2)	For each new b
Back substitution	O(n^2)	For each new b
Full solve	O(2n^3/3)	When there is only one b

What is the main advantage of LU decomposition over calling solve directly, for multiple right-hand sides?

Cholesky Decomposition

Property	LU	Cholesky
Requirement on A	Square, nonsingular	Symmetric, positive definite
Formula	PA = LU	A = LL^T
Complexity	O(2n^3/3)	O(n^3/3)
Numerical stability	Requires pivoting	Stable without pivoting
Memory	Two matrices L, U	One matrix L

Andre-Louis Cholesky (1875-1918)

For which type of matrix is Cholesky decomposition guaranteed to work?

Iterative Methods

Method	Matrix type	Convergence rate	Parallelism
Jacobi	Diagonally dominant	Slow	High (GPU-friendly)
Gauss-Seidel	Diagonally dominant	~2x faster than Jacobi	Low
CG	SPD	O(sqrt(cond(A))) iterations	Moderate
GMRES	Any nonsingular	Depends on spectrum	Moderate

Why are iterative methods used for a $10^6 \times 10^6$ matrix instead of LU?

Sparse Matrices

Format	Storage	Fast operation	Typical use
CSR	By rows	A @ x, row slicing	Iterative solvers
CSC	By columns	Column slicing, triangular solve	Direct solvers (UMFPACK)
COO	Triples (i, j, val)	Fast construction	Assembling matrix from elements
LIL	Lists of lists	Fast insertion	Incremental construction

numpy.linalg.solve is suitable for any system of linear equations

A 10^6 x 10^6 dense matrix occupies 8 TB; LU decomposition requires 10^18 operations. Iterative methods on sparse matrices use O(nnz) memory and converge in hundreds of iterations.

A tridiagonal $10^6 \times 10^6$ matrix has ~3 million nonzero entries. How much memory does CSR save versus dense storage?

Key Ideas

**LU decomposition** PA=LU - universal direct method at O(n^3); advantageous for many right-hand sides
**Cholesky** A=LL^T - twice as fast as LU for SPD matrices, no pivoting needed
**Iterative methods** (Jacobi, Gauss-Seidel, CG) - the only option for n > 10^5
**Sparse matrices** (CSR/CSC) store only nonzero entries, saving hundreds of thousands of times in memory

Вопросы для размышления

Why does Cholesky not need pivoting while LU does? What property of SPD matrices guarantees stability?
Designing a solver for a FEM mesh with 10^9 nodes - what combination of methods would be the right choice?
Why must the spectral radius of the iteration matrix be less than 1 for convergence?

Связанные уроки

sci-02 — NumPy linear algebra is the foundation - no LU without it
sci-01 — Model discretization produces Ax=b systems
sci-04 — Eigenvalue computation builds on linear solvers
nm-01 — Numerical methods go deeper: conditioning, norms, error bounds
ml-13 — SVM via QP relies on LU or CG under the hood
opt-03 — LP solvers execute LU steps at each simplex pivot
dm-01 — Graph adjacency matrices are sparse structures of the same type
nm-03

Systems of Linear Equations

Цели урока

Предварительные знания

From Cholesky to sparse matrices

LU Decomposition

Cholesky Decomposition

Andre-Louis Cholesky (1875-1918)

Iterative Methods

Sparse Matrices

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Systems of Linear Equations

Цели урока

Предварительные знания

From Cholesky to sparse matrices

LU Decomposition

Cholesky Decomposition

Andre-Louis Cholesky (1875-1918)

Iterative Methods

Sparse Matrices

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки