Linear Algebra

Jordan Normal Form: When Diagonalization Breaks Down

Most matrices in practice are diagonalizable. But when they are not - repeated eigenvalues with insufficient eigenvectors - the Jordan Normal Form reveals the hidden structure. This situation appears in degenerate control systems, defective RNNs, and certain symmetry-breaking models.

Control theory: a double pole in the transfer function signals a Jordan block - critical damping
RNN analysis: defective weight matrices lead to polynomial (not exponential) growth modes
Numerical analysis: computing f(A) for non-diagonalizable A requires Jordan form theory
Differential equations: repeated eigenvalues produce t·e^(λt) solutions - from Jordan blocks
Symmetry breaking: degenerate eigenspaces appear in models with exact symmetry

Defective matrices and Jordan blocks

**Diagonalization fails at the most interesting moment.** A 4x4 matrix with four identical eigenvalues - and only a single eigenvector. No basis can be formed, the decomposition A = PDP⁻¹ collapses. Jordan normal form is what steps in: an almost-diagonal matrix with ones above the diagonal exactly where the space was short a dimension. This structure explains why gradients in RNNs vanish - not just "because lambda is small", but because of polynomial factors hiding in those ones.

Why do some RNNs train and others just explode? Why does the same learning rate that works for one weight matrix blow up for a nearly identical one? **Jordan form** is the answer. When two eigenvalues collide, a normally diagonalizable matrix loses its diagonal form - and iterates of it grow *polynomially*, not exponentially. That polynomial growth is the exact failure mode behind every vanilla RNN ever trained.

Defective matrices: not enough eigenvectors

A matrix is **defective** when it has fewer linearly independent eigenvectors than its dimension. This happens with **repeated eigenvalues** when the ~algebraic multiplicity~{how many times the root appears in the characteristic polynomial} exceeds the ~geometric multiplicity~{dimension of the eigenspace - how many independent eigenvectors exist for that eigenvalue}.

A = [ 3 1 ] characteristic polynomial: (lambda - 3)^2 = 0 [ 0 3 ] single eigenvalue lambda = 3 (multiplicity 2) (A - 3I) = [ 0 1 ] kernel: only e1 = (1, 0) [ 0 0 ] Algebraic multiplicity of lambda=3: 2 Geometric multiplicity of lambda=3: 1 Not enough eigenvectors for a basis - the matrix is defective.

**Quick defectivity check**: if det(A - lambda*I) = 0 and rank(A - lambda*I) > 0, the matrix is defective. In numpy: compare the number of eigenvectors from `np.linalg.eig` against `n - np.linalg.matrix_rank(np.eye(n)*lam - A)`.

Jordan block: almost diagonal

A Jordan block of size k for eigenvalue lambda has lambda on the diagonal and **ones above it**. Each one marks exactly where the space was missing a dimension:

[ lambda 1 0 0 ] [ 0 lambda 1 0 ] J_4(l)= [ 0 0 lambda 1 ] [ 0 0 0 lambda ] Larger block = more "entanglement" in the space. J_1(lambda) = [lambda] - just a plain diagonal entry. The full Jordan matrix is block-diagonal: [ J_{k1}(l1) 0 ... 0 ] J = [ 0 J_{k2}(l2) ... 0 ] [ ... ... ... ... ] [ 0 0 ... J_{km}(lm) ] For a diagonalizable matrix, all blocks have size 1 - ordinary diagonal form.

What makes a matrix defective (non-diagonalisable)?

Algebraic multiplicity = number of times λ appears in the characteristic polynomial. Geometric multiplicity = dimension of the eigenspace. When geometric is strictly smaller, there are not enough eigenvectors to form a basis, so the matrix is non-diagonalisable. Example: [[2,1],[0,2]] has λ=2 twice but only one eigenvector.

Generalised eigenvectors

Generalized eigenvectors: completing the basis

Since ordinary eigenvectors fall short, **generalized eigenvectors** are added - vectors from the kernels of powers of (A - lambda*I). Each next vector in the chain comes from applying (A - lambda*I) to the previous one:

For eigenvalue lambda and a Jordan block of size k, build a chain v1, v2, ..., vk: (A - lI) vk = vk-1 generalized eigenvector of rank k (A - lI) vk-1 = vk-2 ... (A - lI) v2 = v1 (A - lI) v1 = 0 v1 is an ordinary eigenvector Example (the 2x2 block from above): v1 = (1, 0) (A - 3I)v1 = 0 v2: such that (A - 3I)v2 = v1 [0 1; 0 0] v2 = [1; 0] v2 = (0, 1) works: [0*0+1*1; 0*0+0*1] = [1; 0] ok P = [v1 | v2] = I, J = A (the matrix was already in Jordan form).

The nilpotent part and ML

Every Jordan block splits as J_k(lambda) = lambda*I + N, where N is a **nilpotent matrix** (N^k = 0). N is responsible for the polynomial growth that hides inside the block - and in neural networks this has a direct, measurable consequence.

J_2(l)^n = [l^n n*l^(n-1)] [0 l^n ] J_3(l)^n = [l^n n*l^(n-1) C(n,2)*l^(n-2)] [0 l^n n*l^(n-1) ] [0 0 l^n ] Key point: if |lambda| < 1 then lambda^n -> 0, but n*lambda^(n-1) decays more slowly. If |lambda| = 1 (the critical case): lambda^n is bounded, but n*lambda^(n-1) grows without bound. This is polynomial growth fighting exponential decay.

**Connection to vanishing gradients in RNNs**: if the recurrent weight matrix has a Jordan block with |lambda| < 1, the gradient flowing back through t steps is proportional to t * lambda^t. At lambda = 0.9 and t = 100 this is 100 * 0.9^100 = 0.00003 - essentially zero. This is why LSTM and GRU use gating: the gates prevent "bad" Jordan blocks from forming during training.

What is a generalised eigenvector of level k?

Generalised eigenvectors complete the basis when ordinary eigenvectors are insufficient. The chain v_1, v_2, ..., v_k satisfies (A - λI)v_{i+1} = v_i with v_1 an ordinary eigenvector. The chain forms a Jordan block of size k.

Matrix exponential and applications

Matrix exponential: solving systems of ODEs

The system x'(t) = A*x(t) has solution x(t) = e^(At)*x(0). Jordan form makes the matrix exponential explicit and computable.

e^(J_k(l)*t) = e^(lt) * [ 1 t t^2/2! ... t^(k-1)/(k-1)! ] [ 0 1 t ... ] [ 0 0 1 ... ] [ 0 0 0 ... 1 ] For J_2(lambda): e^(Jt) = e^(lt) * [ 1 t ] [ 0 1 ] Resonance: repeated roots produce solutions of the form t*e^(lt), t^2*e^(lt), ... Even when Re(lambda) < 0 (a stable system), transients can last far longer than lambda alone would predict.

Jordan form in control theory and RNNs

Jordan form appears wherever the time-domain behavior of a linear system matters - in control engineering and in sequence models.

Field	How it is used	What Jordan form reveals
Control theory	Stability of x' = Ax	Stable iff Re(lambda) < 0 for all eigenvalues
RNN / LSTM	Gradient dynamics during BPTT	Block with \|lambda\| > 1 - explosion; \|lambda\| < 1 - vanishing
Signal processing	Systems with repeated poles (resonance)	Terms t*e^(lt) - resonant solutions
Numerical linear algebra	Convergence of iterative methods	Spectral radius rho(A) = max\|lambda\| sets the rate

**Jordan form is numerically unstable**: a small perturbation to a matrix can merge two distinct eigenvalues into one repeated one, or split a repeated eigenvalue into two. In practice, **Schur form** (A = Q T Q*, T upper triangular) is used instead - equally informative but stable under floating-point errors.

**The theory shows up wherever exact linear dynamics matter** - **RNN / LSTM / GRU**: Analyzing vanishing/exploding gradients. Spectral radius of the weight matrix; orthogonal/identity initialization (Saxe et al. 2013) - **Control systems (autopilot, robotics)**: Stability check: all Re(lambda) < 0. MATLAB Control Toolbox, Python control library, scipy.signal - **Signal processing**: Filters with repeated poles. IIR filters, resonance analysis, scipy.signal.lti - **Numerical methods**: Convergence of iterations (power method, PageRank). Spectral radius sets how fast the method converges

Practice: Jordan Block Powers

**Q:** A 3x3 matrix has eigenvalue lambda = 2 with algebraic multiplicity 3 and geometric multiplicity 1. What is the only possible Jordan form? - One block J_3(2) of size 3 - the only possibility when geometric multiplicity is 1 - If geometric multiplicity were 2: J_2(2) + J_1(2) - Number of blocks = dim(ker(A - lambda*I)) **Q:** Why is orthogonal initialization recommended for RNN weight matrices? - Orthogonal matrix: all |lambda| = 1 - no decay, no explosion - Jordan blocks collapse to diagonal (no nilpotent part) - Gradient preserves its norm through t backprop steps - vanishing problem disappears - This is the Saxe et al. 2013 orthogonal initialization result **Q:** Why is Schur form preferred over Jordan form for numerical computation? - Jordan form is discontinuous: an epsilon perturbation can change the block structure entirely - Schur form A = QTQ* is continuous - eigenvalues sit on the diagonal of T - For stability analysis, only the diagonal of T is needed - scipy.linalg.schur is numerically reliable; jordan_form is only for exact arithmetic

Takeaways from this lesson

**Defective matrix**: algebraic multiplicity > geometric; diagonalization impossible
**Jordan block J_k(lambda)**: lambda on the diagonal, ones above - each one marks a missing dimension
**J_k(lambda)^n** contains terms n*lambda^(n-1), C(n,2)*lambda^(n-2) - polynomial growth at |lambda| = 1
**RNN**: Jordan structure of the weight matrix governs the vanishing/exploding gradient regime
**Matrix exponential**: e^(Jt) produces terms t*e^(lt) - resonance from repeated ODE roots
**In practice**: Schur form instead of Jordan - stable under floating-point errors
**Orthogonal initialization** in RNNs is a direct application of Jordan theory

What comes next

Jordan form is the limiting case of spectral theory. The following topics generalize it to non-symmetric and rectangular settings.

Diagonalization — Jordan form is the generalization of diagonalization to defective matrices
SVD — Universal decomposition for rectangular matrices, avoids defectiveness entirely
Tensors — The next level - matrices generalized to multiple indices

Связанные уроки

de-01