Differential Equations

Systems of ODEs

2018. Neural ODE (Chen et al., NeurIPS). The authors replaced the discrete layers of a ResNet with continuous dynamics: $dh/dt = f(h, t, \theta)$. The parameters $\theta$ define the right-hand side of the ODE; the forward pass is its solution. This is more than an analogy: a residual layer is exactly one forward Euler step with step size 1 for a system of ODEs. To understand Neural ODEs, one must understand ODE systems. To understand training stability, one must understand eigenvalues.

**Neural ODE (Chen 2018):** ResNet = discrete Euler solver for dx/dt = f(x,t). Adjoint method - backward ODE for gradients, O(1) memory instead of O(L)
**SIR epidemic model:** a system of three ODEs, R_0 determined by the eigenvalue of the linearization. If Re(lambda) > 0 - the pandemic grows
**Kalman filter (Tesla Autopilot, GPS):** linear stochastic ODE system, covariances via matrix exponential, stability via spectral criterion

Prerequisites

Second-Order ODEs

The system dx/dt = Ax and the matrix exponential

2018. NeurIPS, Montreal. Tian Qi Chen and coauthors show that a ResNet is more than an architecture. The residual update $h_{k+1} = h_k + f(h_k, \theta_k)$ is a forward Euler step for the ODE system $dh/dt = f(h, t, \theta)$ with step size $\Delta t = 1$. Discrete layers approximate continuous dynamics. The network parameters define the right-hand side of the equation. The forward pass is numerical integration.
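The correspondence between a residual update and an Euler step can be checked numerically. The sketch below is only an illustration: the function `f` is a hypothetical stand-in for a layer (a fixed linear decay, not a trained network), and the helper names are assumptions made for this example.

```python
import math

def f(h, t):
    # Hypothetical right-hand side: dh/dt = -0.5 * h (stand-in for a learned layer).
    return [-0.5 * x for x in h]

def residual_block(h, t):
    # ResNet-style update: h_{k+1} = h_k + f(h_k)
    return [x + fx for x, fx in zip(h, f(h, t))]

def euler_step(h, t, dt):
    # Forward Euler: h(t + dt) is approximated by h(t) + dt * f(h(t), t)
    return [x + dt * fx for x, fx in zip(h, f(h, t))]

def euler_solve(h0, t_end, steps):
    # Integrate from t = 0 to t = t_end with equal steps.
    h, dt = list(h0), t_end / steps
    for k in range(steps):
        h = euler_step(h, k * dt, dt)
    return h

h0 = [1.0, -2.0]
one_layer = residual_block(h0, 0.0)          # one residual layer
coarse = euler_solve(h0, 1.0, 1)             # one Euler step with dt = 1
fine = euler_solve(h0, 1.0, 1000)            # many small Euler steps
exact = [x * math.exp(-0.5) for x in h0]     # closed-form solution at t = 1
```

Here `one_layer` and `coarse` coincide, while `fine` is much closer to `exact`: a single layer is a coarse integrator, and shrinking the step moves the discrete network toward the continuous dynamics.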

The system $\mathbf{x}'(t) = A\mathbf{x}(t)$ generalizes the scalar equation $x' = ax$ to the vector case. The vector $\mathbf{x}(t)$ describes the state: position plus velocity of a pendulum, concentrations of reactants, hidden states of a neural network. The matrix $A$ encodes the laws of interaction between components.

By analogy with the scalar case, the solution is $\mathbf{x}(t) = e^{At}\mathbf{x}_0$. The **matrix exponential** is defined by the Taylor series: $e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \ldots$ This is not a scalar - it is a matrix whose every entry depends on $t$. All information about the behavior of the system is encoded in the spectrum of $A$.
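The series definition can be evaluated directly for a small matrix. The following is a minimal sketch in plain Python (the helper names `matmul` and `expm_taylor` and the choice of 30 terms are assumptions for this example); it is meant to illustrate the definition, not to replace a library routine.

```python
import math

def matmul(X, Y):
    # Product of two square matrices stored as nested lists.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, t, terms=30):
    # Partial sum I + At + (At)^2/2! + ... with `terms` terms in total.
    n = len(A)
    At = [[a * t for a in row] for row in A]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # (At)^0 / 0! = I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Build (At)^k / k! from the previous term.
        term = [[x / k for x in row] for row in matmul(term, At)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# For this A we have A^2 = -I, so e^{At} = cos(t) I + sin(t) A (a rotation).
A = [[0.0, 1.0], [-1.0, 0.0]]
E = expm_taylor(A, 1.0)
rotation = [[math.cos(1.0), math.sin(1.0)], [-math.sin(1.0), math.cos(1.0)]]
```

For this matrix the partial sums converge quickly because $\|At\| = 1$; for large $\|At\|$ the terms grow before they shrink, which is one reason the truncated series is only recommended for small norms.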

Computing $e^{At}$ directly from the series is inefficient. If $A$ is diagonalizable ($A = PDP^{-1}$), then $e^{At} = Pe^{Dt}P^{-1}$, where $e^{Dt}$ is the diagonal matrix with $e^{\lambda_i t}$ on the diagonal. Each eigenvalue generates its own mode $c_i e^{\lambda_i t} \mathbf{v}_i$ - and they evolve independently.
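The mode decomposition can be written out for a 2x2 system. This is a sketch under stated assumptions: the matrix has two distinct eigenvalues, the helper names are invented for the example, and complex arithmetic is used throughout so that the same code also covers oscillatory cases.

```python
import cmath
import math

def eig2(A):
    # Eigenvalues of a 2x2 matrix from the characteristic polynomial.
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr - disc) / 2, (tr + disc) / 2

def eigvec2(A, lam):
    # A nonzero solution of (A - lam I) v = 0, read off from a nonzero row.
    (a, b), (c, d) = A
    if abs(b) > 1e-12:
        return (b, lam - a)
    if abs(c) > 1e-12:
        return (lam - d, c)
    return (1, 0) if abs(lam - a) < 1e-12 else (0, 1)  # diagonal matrix

def solve_linear(A, x0, t):
    # x(t) = c1 e^{l1 t} v1 + c2 e^{l2 t} v2, assuming l1 != l2.
    l1, l2 = eig2(A)
    v1, v2 = eigvec2(A, l1), eigvec2(A, l2)
    # Expand x0 in the eigenbasis with Cramer's rule.
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
    c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det
    e1, e2 = cmath.exp(l1 * t), cmath.exp(l2 * t)
    return [(c1 * e1 * v1[k] + c2 * e2 * v2[k]).real for k in range(2)]

# Eigenvalues -1 and -2; the exact solution from x0 = (1, 0) is
# x1 = 2e^{-t} - e^{-2t}, x2 = -2e^{-t} + 2e^{-2t}.
x = solve_linear([[0.0, 1.0], [-2.0, -3.0]], [1.0, 0.0], 1.0)
expected = [2 * math.exp(-1) - math.exp(-2), -2 * math.exp(-1) + 2 * math.exp(-2)]
```

Each mode is advanced by its own scalar exponential and the modes are only recombined at the end, which is what "evolve independently" means in practice.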

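The **adjoint method** computes gradients through a Neural ODE without storing all intermediate states. Instead of backpropagating through the Euler steps, a co-state ODE is solved backward in time. For a linear system $\mathbf{x}' = A\mathbf{x}$ the co-state obeys $\mathbf{a}' = -A^T\mathbf{a}$, so it is propagated by the same matrix exponential, transposed and run in reversed time. Memory cost: $O(1)$ in the number of solver steps, instead of $O(L)$ for backprop through $L$ layers.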
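A scalar toy problem shows the mechanics. The sketch below assumes the dynamics $dh/dt = \theta h$ and the loss $L = h(T)$, both chosen only because the gradient $dL/d\theta = T h_0 e^{\theta T}$ is known in closed form; the RK4 integrator and all function names are assumptions of this example, not the implementation from the paper.

```python
import math

def rk4_step(f, y, t, dt):
    # One classical Runge-Kutta step for y' = f(t, y); dt may be negative.
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * k for yi, k in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f, y, t0, t1, steps):
    # Integrate from t0 to t1; t1 < t0 runs the solver backward in time.
    dt, t = (t1 - t0) / steps, t0
    for _ in range(steps):
        y = rk4_step(f, y, t, dt)
        t += dt
    return y

def grad_via_adjoint(theta, h0, T, steps=200):
    # Forward pass: only the final state h(T) is kept.
    (hT,) = integrate(lambda t, y: [theta * y[0]], [h0], 0.0, T, steps)

    # Backward pass on the augmented state [h, a, g]:
    #   h' = theta * h      (state, recomputed backward)
    #   a' = -theta * a     (co-state, a(T) = dL/dh(T) = 1)
    #   g' = -a * h         (accumulates dL/dtheta, since df/dtheta = h)
    def augmented(t, y):
        h, a, g = y
        return [theta * h, -theta * a, -a * h]

    _, _, grad = integrate(augmented, [hT, 1.0, 0.0], T, 0.0, steps)
    return hT, grad

hT, grad = grad_via_adjoint(theta=0.5, h0=2.0, T=1.5)
exact_grad = 1.5 * 2.0 * math.exp(0.5 * 1.5)
```

The backward pass never looks at stored intermediate values of `h`; it reconstructs them by integrating the state equation in reverse alongside the co-state, which is where the constant memory cost comes from.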

Method for computing $e^{At}$	When it applies	Complexity
Diagonalization $A = PDP^{-1}$	A has n linearly independent eigenvectors	$O(n^3)$
Jordan normal form	Always	$O(n^3)$, numerically unstable
Pade approximation (scipy.linalg.expm)	Always	$O(n^3)$, stable
Truncated Taylor series	Small $\|A\|$	$O(kn^3)$ for k terms

Matrix $A = \text{diag}(2, -3)$. What is $e^{At}$?

Phase portraits: the map of a system's fate

Epidemiologists in 2020 did not solve the SIR model analytically - they looked at the phase portrait. The system $S' = -\beta SI$, $I' = \beta SI - \gamma I$, $R' = \gamma I$ is nonlinear, but linearization near equilibrium yields a matrix $A$ whose phase portrait immediately answers: does the pandemic grow or decay?
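For the SIR system above, linearizing the infected equation at a disease-free state $(S^*, 0, R^*)$ gives $I' \approx (\beta S^* - \gamma) I$, so the relevant eigenvalue is $\lambda = \beta S^* - \gamma$. The sketch below compares the sign of that eigenvalue with a short simulation. The parameter values, the forward Euler integrator, and the function names are illustrative assumptions, with $S$, $I$, $R$ taken as population fractions.

```python
def sir_rhs(state, beta, gamma):
    # Right-hand side of S' = -beta*S*I, I' = beta*S*I - gamma*I, R' = gamma*I.
    S, I, R = state
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

def growth_eigenvalue(beta, gamma, S_star):
    # Eigenvalue of the linearized infected equation at (S*, 0, R*).
    return beta * S_star - gamma

def simulate(state, beta, gamma, dt, steps):
    # Forward Euler integration of the nonlinear system.
    for _ in range(steps):
        deriv = sir_rhs(state, beta, gamma)
        state = [x + dt * dx for x, dx in zip(state, deriv)]
    return state

start = [0.999, 0.001, 0.0]  # almost everyone susceptible, few infected

# Hypothetical parameters: one case with lambda > 0, one with lambda < 0.
lam_grow = growth_eigenvalue(0.3, 0.1, start[0])
lam_decay = growth_eigenvalue(0.05, 0.1, start[0])
grow = simulate(start, beta=0.3, gamma=0.1, dt=0.01, steps=1000)
decay = simulate(start, beta=0.05, gamma=0.1, dt=0.01, steps=1000)
```

With a positive eigenvalue the infected fraction `grow[1]` rises above its starting value, and with a negative one `decay[1]` falls below it, matching the sign test on $\lambda$ (equivalently, $\beta S^*/\gamma$ above or below 1).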

A **phase portrait** visualizes all trajectories of a two-dimensional system on the state plane $(x_1, x_2)$. There is no need to solve the equation - knowing the eigenvalues of $A$ is enough. They determine six fundamentally different scenarios.

Type	Eigenvalues	Trajectory behavior
Stable node (sink)	$\lambda_1 < \lambda_2 < 0$ (real)	All approach 0, fastest along $v_1$
Unstable node (source)	$0 < \lambda_1 < \lambda_2$ (real)	All diverge from 0
Saddle	$\lambda_1 < 0 < \lambda_2$	Attraction along $v_1$, repulsion along $v_2$
Stable spiral (spiral sink)	$\text{Re}(\lambda) < 0$, $\text{Im}(\lambda) \neq 0$	Spirals converging to 0
Center	$\text{Re}(\lambda) = 0$, $\text{Im}(\lambda) \neq 0$	Closed ellipses around 0
Unstable spiral (spiral source)	$\text{Re}(\lambda) > 0$, $\text{Im}(\lambda) \neq 0$	Spirals diverging from 0
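The table translates directly into a small classifier for 2x2 matrices. This is a sketch: the function names, the tolerance `tol`, and the label "degenerate" for borderline cases (a zero eigenvalue) are assumptions of the example rather than part of the classification above.

```python
import cmath

def eigenvalues_2x2(A):
    # Roots of lambda^2 - tr(A) lambda + det(A) = 0.
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr - disc) / 2, (tr + disc) / 2

def classify(A, tol=1e-12):
    l1, l2 = eigenvalues_2x2(A)
    if abs(l1.imag) > tol:
        # Complex conjugate pair: the sign of the real part decides.
        if l1.real < -tol:
            return "stable spiral"
        if l1.real > tol:
            return "unstable spiral"
        return "center"
    r1, r2 = sorted((l1.real, l2.real))
    if r2 < -tol:
        return "stable node"
    if r1 > tol:
        return "unstable node"
    if r1 < -tol and r2 > tol:
        return "saddle"
    return "degenerate"

examples = {
    "stable node": [[-1.0, 0.0], [0.0, -2.0]],
    "unstable node": [[1.0, 0.0], [0.0, 3.0]],
    "saddle": [[1.0, 0.0], [0.0, -1.0]],
    "stable spiral": [[-0.5, 2.0], [-2.0, -0.5]],
    "center": [[0.0, 1.0], [-1.0, 0.0]],
    "unstable spiral": [[0.5, 2.0], [-2.0, 0.5]],
}
labels = {name: classify(A) for name, A in examples.items()}
```

Each example matrix was chosen so that its eigenvalues fall in exactly one row of the table; `labels` maps every expected name to the classifier's answer.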

The Lotka-Volterra predator-prey system: linearization near the coexistence equilibrium gives purely imaginary eigenvalues $\lambda = \pm i\omega$ - a center. A center is the one case where linearization alone is inconclusive, but the classical Lotka-Volterra model has a conserved quantity, so its nonlinear orbits really are closed periodic cycles. Adding a damping term (for example, self-limiting prey growth) shifts $\text{Re}(\lambda) < 0$ - a spiral sink, and the oscillations die out. One parameter change alters the type - and the fate of the ecosystem.

**Eigenvectors set the directions.** At a stable node: trajectories approach the origin tangent to the eigenvector of the slow mode (smallest $|\text{Re}(\lambda)|$). At a saddle: eigenvectors are the axes of attraction and repulsion. The phase portrait literally draws those axes.

Matrix $A$ has eigenvalues $\lambda = -1 \pm 3i$. What is the phase portrait?

Stability: Re(lambda) < 0 is the law of survival

The Kalman filter is the optimal estimator for a linear ODE system with Gaussian noise. Inside: the linear system $\mathbf{x}' = F\mathbf{x} + \mathbf{w}$, the matrix exponential $e^{Ft}$, and a covariance matrix evolved via the Riccati equation. Driver-assistance systems such as Tesla Autopilot and GPS receivers run estimators of this kind many times per second to fuse sensor measurements into a position estimate. The spectral criterion applies here too, but to the estimation-error dynamics rather than to $F$ alone: the filter is stable when the error system, $F$ corrected by the filter gain, has all $\text{Re}(\lambda_i) < 0$.

The **stability criterion** for the system $\mathbf{x}' = A\mathbf{x}$: the system is asymptotically stable if and only if $\text{Re}(\lambda_i) < 0$ for all eigenvalues. A single $\text{Re}(\lambda) > 0$ causes divergence for almost every initial condition, regardless of the rest.

Criterion	Condition	Behavior
Asymptotically stable	All $\text{Re}(\lambda) < 0$	$\mathbf{x}(t) \to 0$ exponentially
Marginally stable (Lyapunov)	All $\text{Re}(\lambda) \leq 0$, some $\text{Re} = 0$, and every eigenvalue with $\text{Re} = 0$ has a full set of eigenvectors	$\|\mathbf{x}(t)\|$ bounded
Unstable	Some $\text{Re}(\lambda) > 0$	$\|\mathbf{x}(t)\| \to \infty$
Convergence rate	$\max \text{Re}(\lambda)$ (spectral abscissa)	Further left means faster decay

Lyapunov, 1892

In his 1892 dissertation, Alexander Lyapunov laid two foundations. First: for a nonlinear system $\mathbf{x}' = f(\mathbf{x})$, stability near an equilibrium $\mathbf{x}^*$ is determined by linearization via the Jacobian $A = Df(\mathbf{x}^*)$. This is the **first-approximation theorem**. Second: Lyapunov functions - a global stability criterion that needs no eigenvalues. Both tools appear in Neural ODE: adjoint stability and Lyapunov exponent analysis for nonlinear networks.

For 2x2 systems there is a fast criterion without computing eigenvalues: since $\text{trace}(A) = \lambda_1 + \lambda_2$ and $\det(A) = \lambda_1 \lambda_2$, both eigenvalues have $\text{Re}(\lambda) < 0$ if and only if $\text{trace}(A) < 0$ and $\det(A) > 0$. This is the basis of the **trace-determinant diagram** - a map of all phase portrait types on the plane $(\text{tr}(A), \det(A))$.
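The two ideas combine naturally: linearize a nonlinear system at an equilibrium, then apply the trace-determinant test to the Jacobian. The sketch below uses a damped pendulum $\theta'' = -\sin\theta - c\,\theta'$ as a hypothetical example system; the damping value, the central-difference Jacobian, and the function names are assumptions made for illustration.

```python
import math

def pendulum(x, damping=0.2):
    # State x = (theta, omega); theta' = omega, omega' = -sin(theta) - damping*omega.
    theta, omega = x
    return [omega, -math.sin(theta) - damping * omega]

def jacobian(f, x, eps=1e-6):
    # Central finite-difference approximation of Df(x).
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def is_asymptotically_stable_2x2(J):
    # Trace-determinant test: both Re(lambda) < 0 iff tr < 0 and det > 0.
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return tr < 0 and det > 0

J_down = jacobian(pendulum, [0.0, 0.0])      # hanging straight down
J_up = jacobian(pendulum, [math.pi, 0.0])    # balanced upright
down_stable = is_asymptotically_stable_2x2(J_down)
up_stable = is_asymptotically_stable_2x2(J_up)
```

At the lower equilibrium the Jacobian is approximately `[[0, 1], [-1, -0.2]]` (negative trace, positive determinant), and at the upper one approximately `[[0, 1], [1, -0.2]]` (negative determinant, a saddle), so the test separates the two without computing any eigenvalues.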

Common misconception: the eigenvalues of an ODE system matrix are always real

Complex eigenvalues $\alpha \pm \beta i$ are the norm, not the exception. They describe spirals and oscillations.

The characteristic polynomial of a real matrix has real coefficients - so complex roots come in conjugate pairs $\alpha \pm \beta i$. The imaginary part $\beta$ sets the oscillation frequency; the real part $\alpha$ sets the rate of growth or decay. Many physical systems - pendulums, circuits, waves - are oscillatory, and the Jacobian of a Neural ODE can have complex eigenvalues as well.

A system has eigenvalues $\lambda = \{-10, -0.01, 0.5\}$. What happens as $t \to \infty$?

Key Ideas

**x' = Ax** is solved via the matrix exponential: $x(t) = e^{At}x_0$. A ResNet layer is one Euler step of the nonlinear counterpart $h' = f(h, t, \theta)$
**Phase portrait** - a visual map of system behavior: spirals (complex lambda), nodes (real lambda of the same sign), saddles (real lambda of opposite signs)
**Re(lambda):** < 0 - decay, > 0 - growth, = 0 - oscillation. Im(lambda): non-zero means rotation/spiral
**Stability:** all Re(lambda) < 0 means the system survives. One Re > 0 is catastrophic. The spectrum decides everything

Questions for Reflection

A Neural ODE is locally stable near an equilibrium when Re(lambda) < 0 for the Jacobian of f at that point. Along general trajectories, stability analysis of nonlinear networks relies on Lyapunov exponents instead. What happens to training when the spectrum drifts into the right half-plane?
The system x' = Ax with A = [[0, 1], [-1, 0]] has lambda = +-i (center). A small damping term is added: A -> [[0, 1], [-1, -eps]]. How does the phase portrait change, and what does this mean for stability?
The SIR model is nonlinear. Linearization near the equilibrium (S*, 0, R*) gives R_0 through an eigenvalue. When does linearization give the correct answer about global behavior, and when only a local one?

Related Lessons

de-02 — A 2nd-order ODE is equivalent to a system of two 1st-order ODEs
de-01 — A scalar ODE is a system of dimension 1
dyn-01 — Phase portraits and attractors - direct continuation
nm-01 — Numerical solution of ODE systems - Euler, RK4, adjoint
la-13-eigenvectors
