Numerical Methods
Iterative Methods for Linear Systems
A graph neural network can have a million nodes. The Laplacian is a 10⁶ × 10⁶ matrix. LU decomposition would require petabytes. The conjugate gradient method solves the same system in minutes on a single GPU.
- **Graph Neural Networks:** sparse Laplacian systems solved by CG; scipy.sparse.linalg.cg used in spectral clustering
- **PyTorch sparse:** torch.sparse.mm for sparse matrices; CG and GMRES integrated in physics-informed neural solvers
- **FEM solvers:** preconditioned CG (PCG) is the standard for elliptic PDEs; AMG preconditioner gives O(n) complexity
Предварительные знания
Jacobi and Gauss-Seidel
Iterative methods are an alternative to direct solvers for large sparse systems where LU is too expensive. The idea: split A = M − N so that Mx = Nxₙ + b is easy to solve. Jacobi updates all variables simultaneously; Gauss-Seidel immediately uses updated values.
**Jacobi iteration:** xᵢ(ₙ₊₁) = (bᵢ − Σⱼ≠ᵢ aᵢⱼ xⱼ(ₙ)) / aᵢᵢ Matrix form: xₙ₊₁ = D⁻¹(b − (L + U)xₙ), D = diag(A) **Gauss-Seidel** uses updated values immediately-faster but not parallelizable. **Convergence condition:** spectral radius ρ(B) < 1. For diagonally dominant matrices convergence is guaranteed.
Jacobi and Gauss-Seidel converge slowly for ill-conditioned matrices. For production use, prefer conjugate gradient or GMRES.
Jacobi iteration converges for Ax = b when:
Conjugate Gradient Method
The conjugate gradient (CG) method is the optimal iterative solver for SPD matrices. It finds the exact solution in at most n steps in exact arithmetic, but converges much sooner in practice. Each search direction is A-conjugate to all previous ones.
**CG algorithm (Hestenes-Stiefel, 1952):** Initialize: r₀ = b − Ax₀, p₀ = r₀ For k = 0, 1, 2, ...: αₖ = rₖᵀrₖ / (pₖᵀApₖ) xₖ₊₁ = xₖ + αₖpₖ rₖ₊₁ = rₖ − αₖApₖ βₖ = rₖ₊₁ᵀrₖ₊₁ / rₖᵀrₖ pₖ₊₁ = rₖ₊₁ + βₖpₖ **Convergence:** ||eₖ||_A ≤ 2·((√κ−1)/(√κ+1))ᵏ·||e₀||_A, κ = κ(A).
CG converges in O(√κ) iterations, each costing O(nnz). For sparse n×n matrices with O(n) nonzeros: O(n√κ) versus O(n³/3) for LU. At κ ≈ 10⁴ and n = 10⁶ the gap is enormous.
The conjugate gradient method applies to Ax = b when:
GMRES and Preconditioning
GMRES (Generalized Minimal RESidual) is the Krylov method for non-symmetric systems. It builds an Arnoldi basis in Krylov space {b, Ab, A²b, ...} and finds x minimizing ||Ax − b||. Applicable to any square system.
**Preconditioning:** instead of Ax = b, solve M⁻¹Ax = M⁻¹b Goal: κ(M⁻¹A) << κ(A) Popular preconditioners: - **Jacobi (D⁻¹):** simple, not always effective - **ILU:** stores only elements in the sparsity pattern - **AMG (algebraic multigrid):** powerful for PDE, O(n) complexity - **ICC (incomplete Cholesky):** for SPD matrices
| Method | Matrix type | Convergence | Use case |
|---|---|---|---|
| Jacobi | Any (ρ(B)<1) | Linear | Parallel computing |
| Gauss-Seidel | Diag. dominant | Linear, faster | SOR relaxation |
| CG | SPD | O(√κ) iterations | FEM, ML kernel systems |
| GMRES | Any square | O(n) iterations | CFD, non-symmetric |
| BiCGSTAB | Any square | Fast in practice | When GMRES uses too much memory |
Why is preconditioning used in iterative methods?
Key Ideas
- **Jacobi / Gauss-Seidel:** linear convergence when ρ(B) < 1; Gauss-Seidel ≈ 2× faster than Jacobi
- **CG:** O(√κ) iterations for SPD; one matrix-vector product per step; optimal in the Krylov class
- **GMRES:** for non-symmetric systems; builds Arnoldi basis; use GMRES(m) with restart to bound memory
- **Preconditioning:** target κ(M⁻¹A) << κ(A); ILU, Jacobi, AMG are popular choices
Related Topics
Iterative methods operate via matrix-vector products-sparse storage is essential:
- Sparse Matrices — Iterative methods work via A·v-for sparse matrices this is O(nnz) per iteration
- Direct Methods: LU, Cholesky, QR — Direct methods are the baseline; switch to iterative when n is large
Вопросы для размышления
- Why does CG find the exact solution in exactly n steps theoretically, yet converges much faster in practice?
- How is the restart parameter m chosen in GMRES(m)? What is the memory-speed trade-off?
- Why are iterative methods often preferred over direct solvers for GNN and kernel SVM problems?