Dynamical Systems
Discrete Dynamical Systems
Open a calculator and press cos repeatedly. Any starting number - after 50 presses: 0.7391... Always. That is an attractor. Now replace cos with $\theta \mapsto \theta - \eta \nabla L(\theta)$ yields neural network training. The same mathematical object. The same question: where does this sequence converge?
- **Neural network training:** SGD and Adam are discrete dynamical systems on weight space; Neural ODE (Chen et al. 2018) shows that infinite-depth ResNets are the limit of a continuous ODE $dx/dt = F(x)$
- **Diffusion models:** DDPM runs 1000 steps of a forward Markov noise process, then 1000 steps of a reverse Langevin process to generate one image - iteration all the way down
- **Chaos and training instability:** the logistic map at $r > 3.57$ has positive Lyapunov exponent - explaining why two training runs with different random seeds diverge, not a bug but chaotic sensitivity
Iteration: The Basic Mechanism
A bank deposit at 5% annual interest. Each year the balance updates by the same rule: new balance = old balance * 1.05. That is iteration - applying the same function again and again. **Gradient descent training GPT works by the exact same principle**: $\theta_{n+1} = \theta_n - \eta \nabla L(\theta_n)$ - one rule, applied millions of times.
**Iteration** - the process of repeatedly applying a function f to an initial value x₀: x₁ = f(x₀), x₂ = f(x₁), x₃ = f(x₂), ... In general: **x_{n+1} = f(x_n)**. This is a discrete dynamical system - time advances in steps n = 0, 1, 2, 3, ...
The key idea: **the system's state is fully determined by the current value $x_n$ and the rule f**. No need to remember the entire history - only the current step. This property is called **Markovian** - the future depends only on the present. In ResNet, each layer is an iteration step $x_{n+1} = x_n + F(x_n)$; in 2018, Chen et al. showed that infinitely deep ResNets are the limit of an ordinary differential equation (Neural ODE).
| Example system | State x_n | Rule f(x) |
|---|---|---|
| Bank deposit | Balance | x * 1.05 |
| Bacterial population | Cell count | 2x (division) |
| Logistic map | Population fraction | r*x*(1-x) |
| Fibonacci sequence | (F_n, F_{n-1}) | (F_n + F_{n-1}, F_n) |
Poincaré and Discrete Dynamics
Henri Poincaré in the 1880s was the first to systematically study iterated functions while investigating the three-body problem. He discovered that even simple deterministic rules can produce unpredictable behavior - 80 years before chaos was formally identified.
Given the iteration x_{n+1} = x_n / 2 with initial condition x₀ = 8, what is x₃?
Orbit: The System's History
When we run the iteration from a point x₀, the sequence of values x₀, x₁, x₂, ... forms the **orbit** of x₀. An orbit is the complete trajectory of the system. Different starting points can produce entirely different orbits.
The **orbit** of a point x₀ under the map f is the sequence {x₀, f(x₀), f(f(x₀)), f(f(f(x₀))), ...}. Notation: **O(x₀) = {f^n(x₀) : n = 0, 1, 2, ...}**, where f^n denotes n-fold application of f (not the nth power!).
The **cobweb diagram** is a powerful tool for visualizing orbits. Draw the graph of f(x) and the diagonal y = x. Each iteration step is a jump up to the graph (apply f), then horizontally to the diagonal (transfer the result to the x-axis). The result is the characteristic "cobweb" pattern.
| Orbit type | Behavior | Example |
|---|---|---|
| Converging | x_n → x* as n → ∞ | f(x) = x/2, x₀ = 8 → 0 |
| Diverging | x_n → ∞ as n → ∞ | f(x) = 2x, x₀ = 1 → ∞ |
| Periodic | x_{n+p} = x_n for some p | f(x) = -x → {1, -1, 1, -1, ...} |
| Chaotic | Does not converge, diverge, or repeat | Logistic map at r = 4 |
**Sensitivity to initial conditions:** in chaotic systems, orbits starting from nearby points x₀ = 0.1000 and x₀ = 0.1001 may end up in completely different locations after 30 steps. This makes long-term prediction impossible.
On a cobweb diagram, the orbit spirals toward the intersection of f(x) and y = x. What does this mean?
Fixed Point: f(x*) = x*
**If the system reaches a point x* and stays there forever** - that is a fixed point. Mathematically: f(x*) = x*. The system gets "stuck" - applying the function once changes nothing.
A **fixed point** x* of a map f is the solution to f(x*) = x*. Geometrically: the intersection of the graph of f(x) with the diagonal y = x. Fixed points can be **stable** (attractors - nearby orbits are drawn in) or **unstable** (repellers - nearby orbits move away).
**Stability criterion** - the derivative at the fixed point. If |f'(x*)| < 1, then x* is stable: small deviations decay. If |f'(x*)| > 1, then x* is unstable: small deviations grow. If |f'(x*)| = 1 - a borderline case that requires further analysis.
| |f'(x*)| | Type | Behavior of nearby orbits |
|---|---|---|
| < 1 | Stable node (attractor) | Monotonically converge to x* |
| = 0 | Super-stable point | Very fast convergence |
| > 1 | Unstable node (repeller) | Monotonically move away from x* |
| -1 < f'(x*) < 0 | Stable spiral | Converge while oscillating around x* |
The intuition is simple: if f compresses the neighborhood of x* (derivative magnitude < 1), deviations shrink with each iteration. If it stretches (derivative magnitude > 1) - deviations grow.
For the map f(x) = x², the fixed points are 0 and 1. Which one is stable?
Periodic Orbits and the Road to Chaos
Let's return to the bank deposit example from the beginning of the lesson. We saw that a simple rule can produce a converging orbit. But what if the rule is a bit more complex? The **logistic map** x_{n+1} = r*x_n*(1-x_n) shows a stunning variety of behaviors depending on the parameter r.
A **periodic orbit** of period p is a set of points {x₀, x₁, ..., x_{p-1}} where f^p(x₀) = x₀ but f^k(x₀) ≠ x₀ for 0 < k < p. A 2-cycle: the system jumps between two points. Period 3: among three points. A fixed point is a cycle of period 1.
**Period-doubling cascade**: as r increases, the following sequence occurs: fixed point → 2-cycle → 4-cycle → 8-cycle → ... → chaos. Each doubling happens at a specific value of r, and the distances between those values shrink by the universal constant **Feigenbaum's δ ≈ 4.6692**.
| Parameter r | Behavior | Period |
|---|---|---|
| 0 < r < 1 | Everything converges to 0 | Extinction |
| 1 < r < 3 | Stable fixed point | Period 1 |
| 3 < r < 3.449 | 2-cycle | Period 2 |
| 3.449 < r < 3.544 | 4-cycle | Period 4 |
| 3.544 < r < 3.564 | 8-, 16-cycles, ... | Doubling |
| r ≈ 3.5699... | Onset of chaos | Chaos threshold |
| 3.57 < r ≤ 4 | Chaos (with periodic windows) | Chaos |
Feigenbaum Universality
In 1975, Mitchell Feigenbaum discovered on an HP-65 calculator that the constant δ ≈ 4.669 is the same for any single-humped map (quadratic maximum). This was a shock: completely different systems follow the same scenario for the transition to chaos. It was later confirmed experimentally in hydrodynamics, electronics, and lasers.
**Sharkovskii's theorem (1964):** if a map of the real line has a cycle of period 3, then it has cycles of ALL periods. The existence of a 3-cycle guarantees chaos!
Simple deterministic rules always produce simple, predictable behavior
The logistic map x_{n+1} = r*x_n*(1-x_n) - a single line of code - produces chaos at r > 3.57: deterministic but unpredictable behavior with sensitivity to initial conditions.
Nonlinearity ($x$ multiplied by $(1-x)$) stretches and folds phase space. Even an infinitesimally small difference in initial conditions grows exponentially - the Lyapunov exponent is positive. Determinism does not equal predictability. In ML this explains training instability: two runs with identical hyperparameters but different random seeds can diverge - not a bug, but chaotic sensitivity to initial conditions.
In the logistic map x_{n+1} = r*x_n*(1-x_n), as r increases from 3 to 4, one observes:
Key Takeaways
- **Iteration $x_{n+1} = f(x_n)$** - the fundamental mechanism; SGD, Adam, PageRank power iteration are all special cases
- **Orbit** - the complete trajectory; the attractor determines where training lands - not necessarily the global minimum, but the nearest stable fixed point
- **Stability via derivative**: $|f'(x^*)| < 1$ is an attractor, $|f'(x^*)| > 1$ is a repeller; in ML this is the learning rate condition - at $\eta > 1$ gradient descent diverges
- **Period-doubling cascade** (Feigenbaum $\delta \approx 4.669$) - the universal route to chaos; sensitivity to initial conditions explains why two training runs with different seeds converge to different weights
- **Neural ODE** (Chen et al. 2018): ResNet $x_{n+1} = x_n + F(x_n)$ at $n \to \infty$ becomes the ODE $dx/dt = F(x)$ - discrete dynamics as discretization of a continuous flow
Related Topics
Discrete dynamical systems are the foundation for studying continuous systems and stability theory:
- Continuous Dynamical Systems — The continuous analogue: instead of x_{n+1}=f(x_n), use dx/dt=f(x)
- Lyapunov Stability Theory — Formalizes the notion of stability through Lyapunov functions
- Chaos Theory — In-depth study of chaotic orbits and strange attractors
Вопросы для размышления
- Weather forecasts become unreliable after 7-10 days. How does this relate to sensitivity to initial conditions in discrete systems?
- If the function f has no fixed points (for example, f(x) = x + 1), what can be said about the behavior of all orbits?
- Returning to the opening experiment: pressing cos on a calculator converges to 0.7391... How can it be proved that cos(0.7391) ≈ 0.7391?