Algebra

Singular Value Decomposition

LoRA lets one fine-tune GPT-4-scale models for $100 instead of $1M-using the SVD insight that weight updates are low-rank. The Netflix Prize was won with matrix factorization (SVD). SVD is the universal X-ray of any dataset.

**LoRA in LLMs:** fine-tuning via W + AB with rank 4-64 for matrices of size 4096×4096; used in llama.cpp, HuggingFace PEFT, QLoRA
**PCA via SVD:** sklearn.PCA uses truncated SVD (ARPACK/randomized SVD); numerically more stable than eigendecomposing the covariance matrix
**Recommender systems:** user×item rating matrix is factorized via SVD into latent factor spaces; Netflix, Spotify, Amazon all use variants

Предварительные знания

Quadratic Forms

SVD: Definition and Geometry

For any matrix A ∈ ℝ^{m×n}, there exists a decomposition A = UΣVᵀ, where U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal, and Σ ∈ ℝ^{m×n} is diagonal with σ₁ ≥ σ₂ ≥ … ≥ σₙ ≥ 0 (singular values). Geometrically: Vᵀ rotates, Σ scales along axes, U rotates again.

**Connection to eigenvalues:** σᵢ = √λᵢ(AᵀA), where λᵢ are eigenvalues of the PSD matrix AᵀA. Columns of V are eigenvectors of AᵀA; columns of U are eigenvectors of AAᵀ.

Use np.linalg.svd(A, full_matrices=False) for the thin SVD-it only returns the rank(A) singular triplets and is significantly faster for tall/wide matrices.

For a 5×3 full-rank matrix A, the thin SVD A = UΣVᵀ has shapes:

Eckart-Young Theorem: Best Low-Rank Approximation

The Eckart-Young theorem (1936): the best rank-k approximation of A in the Frobenius (and spectral) norm is Aₖ = U_k Σ_k V_kᵀ, the sum of the first k SVD terms. Error: ||A − Aₖ||_F = √(σ²_{k+1} + … + σ²_r).

**LoRA (Low-Rank Adaptation)** in LLMs: instead of fine-tuning the full weight matrix W ∈ ℝ^{d×d}, train only W + AB where A ∈ ℝ^{d×r}, B ∈ ℝ^{r×d} and r ≪ d. This is exactly a low-rank addition to the weights. Memory savings: 2dr vs d² parameters.

Matrix A has singular values σ₁=10, σ₂=5, σ₃=1. The Frobenius error of the best rank-2 approximation is:

Moore-Penrose Pseudoinverse

The Moore-Penrose pseudoinverse: A⁺ = VΣ⁺Uᵀ, where Σ⁺ is formed by replacing nonzero σᵢ with 1/σᵢ. A⁺b gives the minimum-norm least-squares solution. For full column rank: A⁺ = (AᵀA)⁻¹Aᵀ (left inverse for overdetermined systems).

Explicit pseudoinverse computation is numerically unstable. Use np.linalg.lstsq for least squares and np.linalg.pinv only when one need the pseudoinverse matrix itself. Both use SVD internally.

In A⁺ = VΣ⁺Uᵀ, what is Σ⁺?

PCA, LoRA, and Recommender Systems

PCA via SVD: for a centered data matrix X (rows = samples), compute X = UΣVᵀ. Principal components are the columns of V; projections are XVₖ = UₖΣₖ; explained variance ratio is σᵢ²/Σσᵢ². LoRA: W_adapted = W_pretrained + AB, rank(AB) = r ≪ min(m,n).

Application	Matrix	What we take	Why
PCA	X (data)	V: right singular vectors	Dimensionality reduction
LoRA	ΔW = AB	Low-rank addition	Efficient LLM fine-tuning
Recommender SVD	R (ratings)	U (users) and V (items)	Matrix factorization
LSA (NLP)	TF-IDF matrix	Semantic concepts	Latent Semantic Analysis

LoRA uses W + AB instead of full W. The number of trainable parameters drops from d² to:

Key Ideas

**A = UΣVᵀ:** U,V orthogonal; σᵢ = √λᵢ(AᵀA); geometrically: rotate, scale along axes, rotate back
**Eckart-Young:** best rank-k approximation = first k SVD terms; error = √(Σᵢ>ₖ σᵢ²)
**A⁺ = VΣ⁺Uᵀ:** Moore-Penrose pseudoinverse; A⁺b = minimum-norm least-squares solution
**LoRA = low-rank addition:** W + AB, rank = r ≪ d; saves 2dr vs d² parameters; PCA uses the same idea

Вопросы для размышления

LoRA uses rank 4-64 for matrices of size 4096×4096. Why does such a small rank work? What does this tell us about the structure of weight updates during fine-tuning?
PCA via SVD and via eigendecomposition of the covariance matrix give the same result. What is the numerical difference between the two approaches?
Randomized SVD (Halko-Martinsson-Tropp, 2011) computes an approximate SVD in O(mnk) instead of O(mn·min(m,n)). How do random projections help speed up the computation?

Связанные уроки

la-01-vectors-intro