Differential Equations
Systems of ODEs
2018. Neural ODE (Chen et al., NeurIPS). The authors replaced discrete ResNet layers with continuous dynamics: $dh/dt = f(h, t, \theta)$. The parameters $\theta$ define the right-hand side of the ODE; the forward pass is its solution. This is not an analogy: a residual update $h_{k+1} = h_k + f(h_k, \theta_k)$ is exactly one forward Euler step with step size 1. To understand Neural ODE, one must understand ODE systems. To understand training stability, one must understand eigenvalues.
- **Neural ODE (Chen 2018):** ResNet = discrete Euler solver for dx/dt = f(x,t). Adjoint method - backward ODE for gradients, O(1) memory instead of O(L)
- **SIR epidemic model:** a system of three ODEs, R_0 determined by the eigenvalue of the linearization. If Re(lambda) > 0 - the pandemic grows
- **Kalman filter (Tesla Autopilot, GPS):** linear stochastic ODE system, covariances via matrix exponential, stability via spectral criterion
Prerequisites
The system dx/dt = Ax and the matrix exponential
2018. NeurIPS, Montreal. Tian Qi Chen shows: a ResNet is more than an architecture. It is the forward Euler method for the ODE system $dh/dt = f(h, t, \theta)$ with step size $\Delta t = 1$. Discrete layers approximate continuous dynamics. The network parameters $\theta$ define the right-hand side of the equation. The forward pass is numerical integration.
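The correspondence can be checked numerically. The sketch below uses a hypothetical scalar right-hand side $f(h) = -0.5h$ (an assumption for illustration, not a trained network) and shows that a stack of residual updates produces the same numbers as forward Euler with $\Delta t = 1$, while a smaller step approaches the exact solution.

```python
import math

def f(h, t):
    """Hypothetical right-hand side standing in for a trained residual block."""
    return -0.5 * h

def resnet_forward(h0, num_layers):
    """Residual stack: h_{k+1} = h_k + f(h_k, k)."""
    h = h0
    for k in range(num_layers):
        h = h + f(h, k)
    return h

def euler(h0, t_end, dt):
    """Forward Euler for dh/dt = f(h, t) on the interval [0, t_end]."""
    steps = round(t_end / dt)
    h, t = h0, 0.0
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h

h0, depth = 1.0, 4
resnet_out = resnet_forward(h0, depth)
euler_out = euler(h0, t_end=depth, dt=1.0)    # same update rule as the residual stack
fine_out = euler(h0, t_end=depth, dt=0.001)   # finer discretization of the same ODE
exact = h0 * math.exp(-0.5 * depth)           # closed-form solution of h' = -0.5 h

print("residual stack:", resnet_out, "| Euler, dt=1:", euler_out)
print("Euler, dt=0.001:", fine_out, "| exact:", exact)
```

The depth of the stack plays the role of the integration time: four layers with step size 1 correspond to integrating up to $t = 4$.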
The system $\mathbf{x}'(t) = A\mathbf{x}(t)$ generalizes the scalar equation $x' = ax$ to the vector case. The vector $\mathbf{x}(t)$ describes the state: position plus velocity of a pendulum, concentrations of reactants, hidden states of a neural network. The matrix $A$ encodes the laws of interaction between components.
By analogy with the scalar case, the solution is $\mathbf{x}(t) = e^{At}\mathbf{x}_0$. The **matrix exponential** is defined by the Taylor series: $e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \ldots$ This is not a scalar - it is a matrix whose every entry depends on $t$. All information about the behavior of the system is encoded in the spectrum of $A$.
Computing $e^{At}$ directly from the series is inefficient. If $A$ is diagonalizable ($A = PDP^{-1}$), then $e^{At} = Pe^{Dt}P^{-1}$, where $e^{Dt}$ is the diagonal matrix with $e^{\lambda_i t}$ on the diagonal. Each eigenvalue generates its own mode $c_i e^{\lambda_i t} \mathbf{v}_i$ - and they evolve independently.
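A minimal NumPy sketch of the diagonalization route, assuming $A$ is diagonalizable. The matrix, the time $t$, and the helper names `expm_diag` and `expm_taylor` are illustrative choices; the truncated series serves only as a cross-check.

```python
import numpy as np

def expm_diag(A, t):
    """e^{At} = P e^{Dt} P^{-1}; assumes A is diagonalizable."""
    eigvals, P = np.linalg.eig(A)
    E = P @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(P)
    # For a real matrix A the imaginary parts are round-off only.
    return E.real

def expm_taylor(A, t, terms=30):
    """Truncated Taylor series I + At + (At)^2/2! + ... (cross-check only)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (A * t) / k
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # illustrative matrix with eigenvalues -1 and -2
t = 0.7
x0 = np.array([1.0, 0.0])

E = expm_diag(A, t)
print("matches the series:", np.allclose(E, expm_taylor(A, t)))

# Modal form: x(t) = sum_i c_i e^{lambda_i t} v_i, with x0 = sum_i c_i v_i.
eigvals, P = np.linalg.eig(A)
c = np.linalg.solve(P, x0)
x_modes = sum(c[i] * np.exp(eigvals[i] * t) * P[:, i] for i in range(2))
print("matches the sum of modes:", np.allclose(E @ x0, np.real(x_modes)))
```

Each term of `x_modes` depends on a single eigenvalue, which is the sense in which the modes evolve independently.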
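The **adjoint method** computes gradients through a Neural ODE without storing all intermediate states. Instead of backpropagating through every Euler step, it solves a co-state (adjoint) ODE $\mathbf{a}' = -(\partial f/\partial \mathbf{h})^\top \mathbf{a}$ backward in time. For the linear system $\mathbf{x}' = A\mathbf{x}$ this is the same matrix exponential, transposed and run in reverse time: $\mathbf{a}(0) = e^{A^\top T}\mathbf{a}(T)$. Memory cost: $O(1)$ in depth instead of $O(L)$ for $L$ layers.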
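For the linear case the adjoint computation fits in a few lines. The sketch below assumes a hypothetical loss $L = \mathbf{c}^\top \mathbf{x}(T)$ and illustrative values for $A$, $\mathbf{c}$, $\mathbf{x}_0$; it integrates $\mathbf{a}' = -A^\top \mathbf{a}$ backward from $\mathbf{a}(T) = \mathbf{c}$ and compares $\mathbf{a}(0)$ with a finite-difference gradient with respect to $\mathbf{x}_0$. A full Neural ODE adjoint additionally reconstructs the state backward in time and accumulates parameter gradients, which this sketch omits.

```python
import numpy as np

def rk4(rhs, y, t0, t1, steps=200):
    """Classical RK4 for an autonomous ODE y' = rhs(y); t1 < t0 integrates backward."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Illustrative problem data (assumptions, not taken from the text).
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])
c = np.array([1.0, -1.0])
x0 = np.array([0.3, 0.8])
T = 2.0

def loss(x_init):
    """L = c . x(T), where x(T) comes from integrating x' = Ax forward."""
    return c @ rk4(lambda x: A @ x, x_init, 0.0, T)

# Adjoint pass: a' = -A^T a, terminal condition a(T) = dL/dx(T) = c, solved from T to 0.
grad_adjoint = rk4(lambda a: -A.T @ a, c, T, 0.0)

# Reference: central finite differences with respect to each component of x0.
eps = 1e-6
grad_fd = np.array([(loss(x0 + eps * e) - loss(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print("adjoint gradient:          ", grad_adjoint)
print("finite-difference gradient:", grad_fd)
```

The backward pass carries only the current adjoint vector, never the list of intermediate forward states, which is where the constant memory cost comes from.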
| Method for computing $e^{At}$ | When it applies | Complexity |
|---|---|---|
| Diagonalization $A = PDP^{-1}$ | A has n linearly independent eigenvectors | $O(n^3)$ |
| Jordan normal form | Always | $O(n^3)$, numerically unstable |
| Padé approximation with scaling and squaring (scipy.linalg.expm) | Always | $O(n^3)$, stable |
| Truncated Taylor series | Small $\|A\|$ | $O(kn^3)$ for k terms |
Matrix $A = \text{diag}(2, -3)$. What is $e^{At}$?
Phase portraits: the map of a system's fate
Epidemiologists in 2020 did not solve the SIR model analytically - they looked at the phase portrait. The system $S' = -\beta SI$, $I' = \beta SI - \gamma I$, $R' = \gamma I$ is nonlinear, but linearization near equilibrium yields a matrix $A$ whose phase portrait immediately answers: does the pandemic grow or decay?
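A rough sketch of that linearization, assuming $S$, $I$, $R$ are population fractions and using made-up values of $\beta$ and $\gamma$. Near the disease-free state $(S_0, 0)$ the infected equation reduces to $I' \approx (\beta S_0 - \gamma) I$, so the relevant eigenvalue is $\lambda = \beta S_0 - \gamma$ and $R_0 = \beta S_0 / \gamma$. A short Euler simulation of the full nonlinear system checks the prediction.

```python
def sir_rhs(S, I, R, beta, gamma):
    """Right-hand side of the SIR system."""
    return -beta * S * I, beta * S * I - gamma * I, gamma * I

def outbreak_eigenvalue(beta, gamma, S0=1.0):
    """Growth rate of I in the linearization around the disease-free state (S0, 0)."""
    return beta * S0 - gamma

def simulate_infected(beta, gamma, I0=1e-4, dt=0.01, t_end=5.0):
    """Forward Euler on the nonlinear system; returns I at t_end."""
    S, I, R = 1.0 - I0, I0, 0.0
    for _ in range(round(t_end / dt)):
        dS, dI, dR = sir_rhs(S, I, R, beta, gamma)
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    return I

# Hypothetical parameter pairs: one with R_0 > 1, one with R_0 < 1.
for beta, gamma in [(0.3, 0.1), (0.05, 0.1)]:
    lam = outbreak_eigenvalue(beta, gamma)
    I_end = simulate_infected(beta, gamma)
    trend = "grows" if I_end > 1e-4 else "decays"
    print(f"beta={beta}, gamma={gamma}: lambda={lam:+.3f}, "
          f"R_0={beta / gamma:.2f}, infected fraction {trend}")
```

The sign of $\lambda$ and the condition $R_0 > 1$ are the same statement, and both describe only the early phase of an outbreak, while $S$ is still close to $S_0$.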
A **phase portrait** visualizes all trajectories of the system on the state-space plane $(x_1, x_2)$. There is no need to solve the equation - knowing the eigenvalues of $A$ is enough. They determine six qualitatively different scenarios.
| Type | Eigenvalues | Trajectory behavior |
|---|---|---|
| Stable node (sink) | $\lambda_1 < \lambda_2 < 0$ (real) | All approach 0, fastest along $v_1$ |
| Unstable node (source) | $0 < \lambda_1 < \lambda_2$ (real) | All diverge from 0 |
| Saddle | $\lambda_1 < 0 < \lambda_2$ | Attraction along $v_1$, repulsion along $v_2$ |
| Stable spiral (spiral sink) | $\text{Re}(\lambda) < 0$, $\text{Im}(\lambda) \neq 0$ | Spirals converging to 0 |
| Center | $\text{Re}(\lambda) = 0$, $\text{Im}(\lambda) \neq 0$ | Closed ellipses around 0 |
| Unstable spiral (spiral source) | $\text{Re}(\lambda) > 0$, $\text{Im}(\lambda) \neq 0$ | Spirals diverging from 0 |
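The table translates directly into a small classifier for 2x2 matrices. This is a sketch using only the standard library; the function names and the tolerance are illustrative, and borderline cases (a zero eigenvalue) are reported as degenerate rather than analyzed further. Repeated real eigenvalues are counted as nodes.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2

def classify(A, tol=1e-12):
    """Name the phase portrait of x' = Ax following the table above."""
    l1, l2 = eigenvalues_2x2(A)
    if abs(l1.imag) > tol:                      # complex-conjugate pair
        if l1.real < -tol:
            return "stable spiral"
        if l1.real > tol:
            return "unstable spiral"
        return "center"
    r1, r2 = l1.real, l2.real                   # both eigenvalues are real
    if abs(r1) <= tol or abs(r2) <= tol:
        return "degenerate (zero eigenvalue)"
    if r1 < 0 and r2 < 0:
        return "stable node"
    if r1 > 0 and r2 > 0:
        return "unstable node"
    return "saddle"

examples = {
    "diag(-1, -2)": [[-1, 0], [0, -2]],
    "diag(1, 3)": [[1, 0], [0, 3]],
    "diag(1, -1)": [[1, 0], [0, -1]],
    "rotation": [[0, 1], [-1, 0]],
    "damped oscillator": [[0, 1], [-1, -0.2]],
}
for name, A in examples.items():
    print(f"{name}: {classify(A)}")
```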
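The Lotka-Volterra predator-prey system: linearization near the coexistence equilibrium gives purely imaginary eigenvalues $\lambda = \pm i\omega$ - a center. A center is a borderline case, so linearization alone does not settle the nonlinear behavior; for Lotka-Volterra a conserved quantity confirms that the orbits are closed, giving periodic oscillations of predator and prey. Adding a damping term (for example, self-limitation of the prey) shifts $\text{Re}(\lambda) < 0$ - a spiral sink, and the oscillations die out. One parameter change alters the type - and the fate of the ecosystem.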
**Eigenvectors set the directions.** At a stable node: trajectories approach the origin tangent to the eigenvector of the slow mode (smallest $|\text{Re}(\lambda)|$). At a saddle: eigenvectors are the axes of attraction and repulsion. The phase portrait literally draws those axes.
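The slow-mode claim can be seen on a concrete stable node. The sketch below assumes the illustrative matrix $A = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}$, whose eigenpairs are $\lambda = -1$ with $\mathbf{v} = (1, 1)$ and $\lambda = -3$ with $\mathbf{v} = (1, -1)$, and tracks the direction of $\mathbf{x}(t)$ as it decays.

```python
import math

def trajectory(x0, t):
    """Closed-form solution x(t) = c1 e^{-t} (1, 1) + c2 e^{-3t} (1, -1)."""
    c1 = (x0[0] + x0[1]) / 2      # coordinate along the slow eigenvector (1, 1)
    c2 = (x0[0] - x0[1]) / 2      # coordinate along the fast eigenvector (1, -1)
    slow = c1 * math.exp(-t)
    fast = c2 * math.exp(-3 * t)
    return (slow + fast, slow - fast)

def direction_deg(x):
    """Angle of the state vector relative to the x1 axis, in degrees."""
    return math.degrees(math.atan2(x[1], x[0]))

x0 = (1.0, 0.0)   # illustrative initial state, an equal mix of both modes
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"t={t}: direction = {direction_deg(trajectory(x0, t)):.3f} degrees")
```

The fast mode dies off as $e^{-3t}$, so the direction turns from 0 degrees toward 45 degrees, the direction of the slow eigenvector $(1, 1)$.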
Matrix $A$ has eigenvalues $\lambda = -1 \pm 3i$. What is the phase portrait?
Stability: Re(lambda) < 0 is the law of survival
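The Kalman filter is the optimal estimator for a linear system driven by Gaussian noise. Inside: the linear model $\mathbf{x}' = F\mathbf{x} + \mathbf{w}$, the matrix exponential $e^{Ft}$ to propagate the state between measurements, and a covariance matrix evolved via the Riccati equation. Navigation and driver-assistance stacks (GPS receivers, Tesla Autopilot) run filters of this kind many times per second to fuse sensor measurements into a position estimate. Stability of the filter means that the estimation error decays: all $\text{Re}(\lambda_i) < 0$ for the error-dynamics matrix $F - KH$, where $H$ is the measurement matrix and $K$ the Kalman gain, not necessarily for $F$ itself.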
The **stability criterion** for the system $\mathbf{x}' = A\mathbf{x}$: the system is asymptotically stable if and only if $\text{Re}(\lambda_i) < 0$ for all eigenvalues. A single $\text{Re}(\lambda) > 0$ causes divergence for almost every initial condition, regardless of the rest.
| Criterion | Condition | Behavior |
|---|---|---|
| Asymptotically stable | All $\text{Re}(\lambda) < 0$ | $\mathbf{x}(t) \to 0$ exponentially |
| Marginally stable (Lyapunov) | All $\text{Re}(\lambda) \leq 0$, some $\text{Re} = 0$, and every eigenvalue with $\text{Re} = 0$ has only $1 \times 1$ Jordan blocks | $\|\mathbf{x}(t)\|$ bounded |
| Unstable | Some $\text{Re}(\lambda) > 0$ | $\|\mathbf{x}(t)\| \to \infty$ |
| Convergence rate | $\max \text{Re}(\lambda)$ (spectral abscissa) | Further left means faster decay |
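A small NumPy sketch of the criterion. The function names, the tolerance, and the example matrices are illustrative; when the spectral abscissa is numerically zero the sketch only flags the case, because boundedness then depends on the Jordan structure and not on the eigenvalues alone.

```python
import numpy as np

def spectral_abscissa(A):
    """Largest real part among the eigenvalues of A."""
    return float(np.max(np.linalg.eigvals(A).real))

def classify_stability(A, tol=1e-9):
    """Apply the table above to x' = Ax."""
    alpha = spectral_abscissa(A)
    if alpha < -tol:
        return "asymptotically stable"
    if alpha > tol:
        return "unstable"
    return "marginal (check Jordan structure of eigenvalues with Re = 0)"

examples = {
    "damped oscillator": np.array([[0.0, 1.0], [-1.0, -0.2]]),
    "undamped oscillator": np.array([[0.0, 1.0], [-1.0, 0.0]]),
    "one growing mode": np.array([[1.0, 0.0], [0.0, -5.0]]),
    # Double eigenvalue 0 in a single Jordan block: x(t) = (x1 + t*x2, x2) grows linearly.
    "double integrator": np.array([[0.0, 1.0], [0.0, 0.0]]),
}
for name, A in examples.items():
    print(f"{name}: alpha = {spectral_abscissa(A):+.3f} -> {classify_stability(A)}")
```

The double integrator shows why the marginal row needs care: all eigenvalues satisfy $\text{Re}(\lambda) \leq 0$, yet the solution is unbounded.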
Lyapunov, 1892
In his 1892 dissertation, Alexander Lyapunov laid two foundations. First: for a nonlinear system $\mathbf{x}' = f(\mathbf{x})$, stability near an equilibrium $\mathbf{x}^*$ is determined by the linearization with Jacobian $A = Df(\mathbf{x}^*)$, provided no eigenvalue of $A$ lies on the imaginary axis. This is the **first-approximation theorem**. Second: Lyapunov functions - a stability criterion that needs no eigenvalues and can certify stability over a whole region rather than only near the equilibrium. Both tools appear in Neural ODE: adjoint stability and Lyapunov exponent analysis for nonlinear networks.
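The first-approximation theorem can be tried on a damped pendulum, $\theta'' = -\sin\theta - c\,\theta'$, used here as an illustrative nonlinear system with an assumed damping coefficient. The sketch estimates the Jacobian by central differences at each equilibrium and reads stability off its eigenvalues.

```python
import numpy as np

def pendulum(x, damping=0.3):
    """State x = (theta, omega); damping is a hypothetical friction coefficient."""
    theta, omega = x
    return np.array([omega, -np.sin(theta) - damping * omega])

def numerical_jacobian(f, x_star, eps=1e-6):
    """Central-difference estimate of Df at the point x_star."""
    x_star = np.asarray(x_star, dtype=float)
    n = x_star.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(x_star + step) - f(x_star - step)) / (2 * eps)
    return J

equilibria = {"hanging down": (0.0, 0.0), "inverted": (np.pi, 0.0)}
for name, x_star in equilibria.items():
    lam = np.linalg.eigvals(numerical_jacobian(pendulum, x_star))
    verdict = "stable" if lam.real.max() < 0 else "unstable"
    print(f"{name}: eigenvalues = {np.round(lam, 3)} -> locally {verdict}")
```

The verdict is local: it describes trajectories that start close to the equilibrium, which is the scope of the theorem.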
For 2x2 systems there is a fast criterion without computing eigenvalues: since $\text{trace}(A) = \lambda_1 + \lambda_2$ and $\det(A) = \lambda_1 \lambda_2$, the conditions $\text{trace}(A) < 0$ and $\det(A) > 0$ hold exactly when both eigenvalues have $\text{Re}(\lambda) < 0$. This is the **trace-determinant diagram** - a map of all phase portrait types on the plane $(\text{tr}(A), \det(A))$.
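The shortcut is easy to compare against the eigenvalue definition. A standard-library sketch with illustrative example matrices and hypothetical function names:

```python
import cmath

def stable_by_trace_det(A):
    """2x2 shortcut: asymptotically stable when tr(A) < 0 and det(A) > 0."""
    (a, b), (c, d) = A
    return (a + d) < 0 and (a * d - b * c) > 0

def stable_by_eigenvalues(A):
    """Reference check: both roots of the characteristic polynomial have Re < 0."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return max(((tr + root) / 2).real, ((tr - root) / 2).real) < 0

examples = [
    [[0, 1], [-1, -0.5]],   # tr = -0.5, det = 1
    [[1, 2], [3, 4]],       # tr = 5,    det = -2
    [[-3, 1], [1, -3]],     # tr = -6,   det = 8
    [[-1, 4], [1, -1]],     # tr = -2,   det = -3 (negative trace alone is not enough)
]
for A in examples:
    print(A, stable_by_trace_det(A), stable_by_eigenvalues(A))
```

The last example is a saddle: the trace is negative, but the negative determinant means the eigenvalues have opposite signs.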
**Misconception:** the eigenvalues of an ODE system matrix are always real.
**In fact:** complex eigenvalues $\alpha \pm \beta i$ are the norm, not the exception. They describe spirals and oscillations.
The characteristic polynomial of a real matrix has real coefficients - so complex roots come in conjugate pairs $\alpha \pm \beta i$. The imaginary part $\beta$ sets the oscillation frequency; the real part $\alpha$ sets the rate of growth or decay. Many physical systems - pendulums, circuits, waves - oscillate for this reason, and the Jacobian of a Neural ODE may have complex eigenvalues too.
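The roles of $\alpha$ and $\beta$ become explicit for the matrix $A = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}$, whose eigenvalues are $\alpha \pm \beta i$: its exponential is a rotation by angle $\beta t$ scaled by $e^{\alpha t}$. The sketch below checks this closed form against a truncated Taylor series; the numerical values are arbitrary.

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Truncated Taylor series for e^M (adequate for the small matrix used here)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

alpha, beta, t = -0.3, 2.0, 1.5          # illustrative decay rate, frequency, time
A = np.array([[alpha, -beta],
              [beta, alpha]])

# e^{At} = e^{alpha t} * (rotation by angle beta t)
closed_form = np.exp(alpha * t) * np.array([
    [np.cos(beta * t), -np.sin(beta * t)],
    [np.sin(beta * t),  np.cos(beta * t)],
])

print("eigenvalues:", np.linalg.eigvals(A))
print("series matches closed form:", np.allclose(expm_taylor(A * t), closed_form))
```

With $\alpha < 0$ the scale factor shrinks while the rotation continues, which is the spiral sink of the phase-portrait table.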
A system has eigenvalues $\lambda = \{-10, -0.01, 0.5\}$. What happens as $t \to \infty$?
Key Ideas
- **x' = Ax** is solved via the matrix exponential: $x(t) = e^{At}x_0$. A ResNet is the forward Euler discretization of the nonlinear analogue $h' = f(h, t, \theta)$
- **Phase portrait** - a visual map of system behavior: spirals (complex lambda), nodes (real lambda), saddles (opposite signs)
- **Re(lambda):** < 0 - decay, > 0 - growth, = 0 - neither (pure oscillation when Im(lambda) is non-zero). Im(lambda): non-zero means rotation/spiral
- **Stability:** all Re(lambda) < 0 means the system survives. One Re > 0 is catastrophic. The spectrum decides everything
Related Topics
ODE systems bridge scalar analysis and complex system dynamics:
- Second-Order ODEs — Any nth-order ODE reduces to a system of n first-order equations
- First-Order ODEs — A scalar ODE is a system of dimension 1
- Dynamical Systems — Nonlinear systems, attractors, chaos - the next level
Questions for Reflection
- A Neural ODE is locally stable near an equilibrium when Re(lambda) < 0 for the Jacobian of f there. Lyapunov exponents extend this check from equilibria to whole trajectories of nonlinear networks. What happens to training when the spectrum drifts into the right half-plane?
- The system x' = Ax with A = [[0, 1], [-1, 0]] has $\lambda = \pm i$ (center). A small damping term is added: A -> [[0, 1], [-1, -eps]]. How does the phase portrait change, and what does this mean for stability?
- The SIR model is nonlinear. Linearization near the equilibrium (S*, 0, R*) gives R_0 through an eigenvalue. When does linearization give the correct answer about global behavior, and when only a local one?
Related Lessons
- de-02 — A 2nd-order ODE is equivalent to a system of two 1st-order ODEs
- de-01 — A scalar ODE is a system of dimension 1
- dyn-01 — Phase portraits and attractors - direct continuation
- nm-01 — Numerical solution of ODE systems - Euler, RK4, adjoint
- la-13-eigenvectors