Numerical Methods

Iterative Methods for Linear Systems

A graph neural network can have a million nodes. The Laplacian is a 10⁶ × 10⁶ matrix. LU decomposition would require petabytes. The conjugate gradient method solves the same system in minutes on a single GPU.

**Graph Neural Networks:** sparse Laplacian systems solved by CG; scipy.sparse.linalg.cg used in spectral clustering
**PyTorch sparse:** torch.sparse.mm for sparse matrices; CG and GMRES integrated in physics-informed neural solvers
**FEM solvers:** preconditioned CG (PCG) is the standard for elliptic PDEs; AMG preconditioner gives O(n) complexity

Предварительные знания

Direct Methods: LU, Cholesky, QR

Jacobi and Gauss-Seidel

Iterative methods are an alternative to direct solvers for large sparse systems where LU is too expensive. The idea: split A = M − N so that Mx = Nxₙ + b is easy to solve. Jacobi updates all variables simultaneously; Gauss-Seidel immediately uses updated values.

**Jacobi iteration:** xᵢ(ₙ₊₁) = (bᵢ − Σⱼ≠ᵢ aᵢⱼ xⱼ(ₙ)) / aᵢᵢ Matrix form: xₙ₊₁ = D⁻¹(b − (L + U)xₙ), D = diag(A) **Gauss-Seidel** uses updated values immediately-faster but not parallelizable. **Convergence condition:** spectral radius ρ(B) < 1. For diagonally dominant matrices convergence is guaranteed.

Jacobi and Gauss-Seidel converge slowly for ill-conditioned matrices. For production use, prefer conjugate gradient or GMRES.

Jacobi iteration converges for Ax = b when:

Conjugate Gradient Method

The conjugate gradient (CG) method is the optimal iterative solver for SPD matrices. It finds the exact solution in at most n steps in exact arithmetic, but converges much sooner in practice. Each search direction is A-conjugate to all previous ones.

**CG algorithm (Hestenes-Stiefel, 1952):** Initialize: r₀ = b − Ax₀, p₀ = r₀ For k = 0, 1, 2, ...: αₖ = rₖᵀrₖ / (pₖᵀApₖ) xₖ₊₁ = xₖ + αₖpₖ rₖ₊₁ = rₖ − αₖApₖ βₖ = rₖ₊₁ᵀrₖ₊₁ / rₖᵀrₖ pₖ₊₁ = rₖ₊₁ + βₖpₖ **Convergence:** ||eₖ||_A ≤ 2·((√κ−1)/(√κ+1))ᵏ·||e₀||_A, κ = κ(A).

CG converges in O(√κ) iterations, each costing O(nnz). For sparse n×n matrices with O(n) nonzeros: O(n√κ) versus O(n³/3) for LU. At κ ≈ 10⁴ and n = 10⁶ the gap is enormous.

The conjugate gradient method applies to Ax = b when:

GMRES and Preconditioning

GMRES (Generalized Minimal RESidual) is the Krylov method for non-symmetric systems. It builds an Arnoldi basis in Krylov space {b, Ab, A²b, ...} and finds x minimizing ||Ax − b||. Applicable to any square system.

**Preconditioning:** instead of Ax = b, solve M⁻¹Ax = M⁻¹b Goal: κ(M⁻¹A) << κ(A) Popular preconditioners: - **Jacobi (D⁻¹):** simple, not always effective - **ILU:** stores only elements in the sparsity pattern - **AMG (algebraic multigrid):** powerful for PDE, O(n) complexity - **ICC (incomplete Cholesky):** for SPD matrices

Method	Matrix type	Convergence	Use case
Jacobi	Any (ρ(B)<1)	Linear	Parallel computing
Gauss-Seidel	Diag. dominant	Linear, faster	SOR relaxation
CG	SPD	O(√κ) iterations	FEM, ML kernel systems
GMRES	Any square	O(n) iterations	CFD, non-symmetric
BiCGSTAB	Any square	Fast in practice	When GMRES uses too much memory

Why is preconditioning used in iterative methods?

Key Ideas

**Jacobi / Gauss-Seidel:** linear convergence when ρ(B) < 1; Gauss-Seidel ≈ 2× faster than Jacobi
**CG:** O(√κ) iterations for SPD; one matrix-vector product per step; optimal in the Krylov class
**GMRES:** for non-symmetric systems; builds Arnoldi basis; use GMRES(m) with restart to bound memory
**Preconditioning:** target κ(M⁻¹A) << κ(A); ILU, Jacobi, AMG are popular choices

Вопросы для размышления

Why does CG find the exact solution in exactly n steps theoretically, yet converges much faster in practice?
How is the restart parameter m chosen in GMRES(m)? What is the memory-speed trade-off?
Why are iterative methods often preferred over direct solvers for GNN and kernel SVM problems?

Связанные уроки

la-06-gauss

Jacobi and Gauss-Seidel

Jacobi and Gauss-Seidel converge slowly for ill-conditioned matrices. For production use, prefer conjugate gradient or GMRES.

Jacobi iteration converges for Ax = b when:

Conjugate Gradient Method

CG converges in O(√κ) iterations, each costing O(nnz). For sparse n×n matrices with O(n) nonzeros: O(n√κ) versus O(n³/3) for LU. At κ ≈ 10⁴ and n = 10⁶ the gap is enormous.

The conjugate gradient method applies to Ax = b when:

GMRES and Preconditioning

Method

Matrix type

Convergence

Use case

Jacobi

Any (ρ(B)<1)

Linear

Parallel computing

Gauss-Seidel

Diag. dominant

Linear, faster

SOR relaxation

SPD

O(√κ) iterations

FEM, ML kernel systems

GMRES

Any square

O(n) iterations

CFD, non-symmetric

BiCGSTAB

Any square

Fast in practice

When GMRES uses too much memory

Why is preconditioning used in iterative methods?

Key Ideas

**Jacobi / Gauss-Seidel:** linear convergence when ρ(B) < 1; Gauss-Seidel ≈ 2× faster than Jacobi

**CG:** O(√κ) iterations for SPD; one matrix-vector product per step; optimal in the Krylov class

**GMRES:** for non-symmetric systems; builds Arnoldi basis; use GMRES(m) with restart to bound memory

**Preconditioning:** target κ(M⁻¹A) << κ(A); ILU, Jacobi, AMG are popular choices

Iterative Methods for Linear Systems

Предварительные знания

Jacobi and Gauss-Seidel

Conjugate Gradient Method

GMRES and Preconditioning

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Iterative Methods for Linear Systems

Предварительные знания

Jacobi and Gauss-Seidel

Conjugate Gradient Method

GMRES and Preconditioning

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки