Scientific Computing
Ordinary Differential Equations
1687. Newton publishes Principia. $F=ma$ is not just a formula - it is an ODE: $m\ddot{x} = F(x,\dot{x},t)$. Three centuries later, the same notation is the foundation for climate simulation, neural networks, and the ISS orbit. Numerical ODE solvers are the bridge between the equations of physics and predictions about the real world.
- **Neural ODE (torchdiffeq):** ResNet as ODE discretization, backprop via adjoint method - O(1) memory instead of O(N steps)
- **SpaceX Dragon:** trajectory computed by solving a 6-DOF ODE of rocket motion every 10 ms
- **Climate models:** coupled ODEs for atmosphere + ocean + ice (>100,000 equations), BDF method for stiff interactions
- **Physics-Informed Neural Networks (PINNs):** ODE as a loss term during training - the model learns to satisfy the differential equation
Euler Method: Explicit, Implicit, Symplectic
**1687. Newton publishes Principia. F=ma is not just a formula - it is an ODE: $m\ddot{x} = F(x,\dot{x},t)$.** Three centuries later, the same notation drives climate simulations, neural networks, and the ISS orbital trajectory. Given $y' = f(t, y)$ and initial condition $y(t_0) = y_0$, the forward Euler method discretizes the derivative directly: $y_{n+1} = y_n + h f(t_n, y_n)$. Step size $h$ is the time increment. Global error $O(h)$: halving $h$ halves the error.
**Stability region of explicit Euler** is the set of $h\lambda$ in the complex plane where $|1 + h\lambda| \leq 1$. For real negative $\lambda$ this requires $h < 2/|\lambda|$. If a system has eigenvalue $\lambda = -10^6$ (a stiff system), the step must be smaller than $2 \times 10^{-6}$ - even if the solution changes slowly on that timescale.
**Symplectic Euler** is a modification for Hamiltonian systems (physics engines): update momentum first, then position using the updated momentum. Global error $O(h)$ like regular Euler, but energy is conserved on average - no long-term drift. This is the method used in game physics engines and molecular dynamics.
An ODE system has eigenvalue $\lambda = -1000$. Explicit Euler requires $h < 2/|\lambda| = 0.002$ for stability. Backward (implicit) Euler is unconditionally stable - but at what cost?
Runge-Kutta 4th Order: Accuracy vs Stiffness
**SpaceX Dragon: trajectory is computed by solving a 6-DOF ODE every 10 ms. With a failing solver - 1 meter deviation per minute. With the right scheme - 1 centimeter.** RK4 is the workhorse for non-stiff problems. The idea: instead of a single derivative estimate at the start of the step (Euler), take a weighted average of 4 derivative evaluations at different points within $[t_n, t_n+h]$.
**RK4: four derivative evaluations per step.** $k_1 = f(t_n, y_n)$, $k_2 = f(t_n + h/2, y_n + hk_1/2)$, $k_3 = f(t_n + h/2, y_n + hk_2/2)$, $k_4 = f(t_n + h, y_n + hk_3)$. Result: $y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4)$. Global error $O(h^4)$ - halving the step increases accuracy 16-fold. Cost: 4 evaluations of $f$ per step.
RK4 makes 4 evaluations of f per step and has order $O(h^4)$. Euler makes 1 evaluation and has $O(h)$. To achieve accuracy $10^{-8}$, how many f evaluations does each method require?
Stiff Systems: When RK4 Fails
**Euler method is not a bad method. For non-stiff systems with a small step it works. The problem is that 'small' depends on the problem. For stiff systems 'small' means $10^{-12}$.** Stiffness is the ratio of the largest to the smallest eigenvalue of the Jacobian $\partial f / \partial y$. Combustion reactions: fast H-OH chemistry with $\lambda \sim 10^{10}$ and slow temperature evolution with $\lambda \sim 1$. Stiffness ratio $10^{10}$. RK45 would require $10^{10}$ steps per simulated second.
**BDF (Backward Differentiation Formulas)** - a family of implicit methods for stiff systems. BDF-k uses $k$ previous values to approximate the derivative. BDF-2: $\frac{3}{2}y_{n+1} - 2y_n + \frac{1}{2}y_{n-1} = h f(t_{n+1}, y_{n+1})$. The A-stability region covers nearly the entire left half-plane - step size is limited only by accuracy, not stability. scipy uses BDF via `method='BDF'` or Radau IIA via `method='Radau'`.
**scipy.integrate.solve_ivp defaults to method='RK45', which performs poorly on stiff systems.** If solve_ivp runs for a long time or needs millions of steps, the system is stiff. Switch to `method='BDF'` or `method='Radau'`. Radau is a 5th-order method with excellent A-stability - preferred over BDF for accurate computations.
A combustion ODE has components with timescales $10^{-9}$ s and $10^3$ s. The simulation interval is 100 seconds. Why is RK45 impractical?
Adaptive Step Size and Neural ODE
**Neural ODE (Chen et al. 2018) is a ResNet where the number of layers approaches infinity. Backpropagation through a Neural ODE is the solution of an adjoint ODE backward in time.** Adaptive step control is the key tool: large steps in smooth regions, small steps where dynamics change rapidly. The Dormand-Prince method (RK45): two solutions of different orders from the same evaluations. The embedded RK4/RK5 pair gives a local error estimate with no extra $f$ evaluations.
**PI controller for step size.** Basic rule: $h_{new} = h \cdot (\varepsilon_{tol} / \varepsilon_{local})^{1/(p+1)}$. Problem: step oscillations. A PI controller stabilizes it: $h_{new} = h \cdot (\varepsilon_{tol}/e_n)^{k_I} \cdot (e_{n-1}/e_n)^{k_P}$, where $k_I \approx 0.3/(p+1)$, $k_P \approx 0.4/(p+1)$. RTOL is relative (scales with $|y|$), ATOL is absolute (guards against division by zero when $y \approx 0$).
**Choosing RTOL/ATOL.** Typical values: rtol=1e-6, atol=1e-9. Too small - millions of steps. Too large - inaccurate solution. For Neural ODE training: rtol=1e-3, atol=1e-4 (speed over precision). For physical simulations: rtol=1e-8, atol=1e-10.
A smaller step always gives a more accurate ODE solution
For stiff systems with an adaptive solver, excessively tight atol/rtol can force extremely small steps where accumulated floating-point rounding error begins to dominate over approximation error. Adaptive solvers choose the step that balances both error sources.
Rounding error scales as $N \cdot \varepsilon_{machine}$ where N is the number of steps. Halving h doubles N and doubles rounding error, while halving approximation error. The optimum is at the balance point. For RK4, the optimal step is approximately $h_{opt} \approx (\varepsilon_{machine})^{1/5}$.
Neural ODE uses the adjoint method for backpropagation. What is the main advantage over naive backprop through all solver steps?
ODE: Key Ideas
- Forward Euler $y_{n+1} = y_n + h f(t_n, y_n)$: order $O(h)$, stability constraint $h < 2/|\lambda_{max}|$
- RK4: 4 derivative evaluations, order $O(h^4)$ - halving the step multiplies accuracy by 16; standard method for non-stiff problems
- Stiff systems: stiffness ratio = max/min eigenvalue ratio; implicit BDF/Radau methods have no stability constraint on step size
- Adaptive step control (Dormand-Prince RK45): embedded error estimate, dense output, PI step controller; foundation for Neural ODE via adjoint backprop
Related Topics
ODEs connect continuous mathematics with discrete computation and span physics, ML, and numerical analysis:
- Eigenvalues and SVD — ODE system stiffness = eigenvalue ratio of the Jacobian; implicit methods require LU decomposition at each step
- DSP: Sampling and Discretization — Time-stepping an ODE and the Nyquist theorem are the same concept in different domains
- Deep Learning: Neural ODE — Neural ODE = ResNet with infinite depth; backprop via adjoint ODE
- Algorithm Complexity — Accuracy order O(h^p) is a direct application of Big-O analysis to numerical methods
Вопросы для размышления
- A climate model has 100,000 equations with timescales from seconds (convection) to decades (climate change). Which method and why - RK45 or BDF? How does the choice change when moving from weather forecasting (days) to climate simulation (years)?
- A ResNet with 1000 layers is a discrete ODE with step $h = 1/1000$. Neural ODE is the limit as $h \to 0$. What is lost and what is gained in the transition to Neural ODE? How does the adjoint method solve the memory problem in backpropagation?
- Symplectic Euler conserves energy on average despite having only first-order accuracy. RK4 is more accurate but does not conserve energy. For planetary orbit simulation over billions of years - which method is better, and why are accuracy and energy conservation different requirements?