Dynamical Systems

Discrete Dynamical Systems

Open a calculator and press cos repeatedly. Any starting number - after 50 presses: 0.7391... Always. That is an attractor. Now replace cos with $\theta \mapsto \theta - \eta \nabla L(\theta)$ yields neural network training. The same mathematical object. The same question: where does this sequence converge?

  • **Neural network training:** SGD and Adam are discrete dynamical systems on weight space; Neural ODE (Chen et al. 2018) shows that infinite-depth ResNets are the limit of a continuous ODE $dx/dt = F(x)$
  • **Diffusion models:** DDPM runs 1000 steps of a forward Markov noise process, then 1000 steps of a reverse Langevin process to generate one image - iteration all the way down
  • **Chaos and training instability:** the logistic map at $r > 3.57$ has positive Lyapunov exponent - explaining why two training runs with different random seeds diverge, not a bug but chaotic sensitivity

Iteration: The Basic Mechanism

A bank deposit at 5% annual interest. Each year the balance updates by the same rule: new balance = old balance * 1.05. That is iteration - applying the same function again and again. **Gradient descent training GPT works by the exact same principle**: $\theta_{n+1} = \theta_n - \eta \nabla L(\theta_n)$ - one rule, applied millions of times.

**Iteration** - the process of repeatedly applying a function f to an initial value x₀: x₁ = f(x₀), x₂ = f(x₁), x₃ = f(x₂), ... In general: **x_{n+1} = f(x_n)**. This is a discrete dynamical system - time advances in steps n = 0, 1, 2, 3, ...

The key idea: **the system's state is fully determined by the current value $x_n$ and the rule f**. No need to remember the entire history - only the current step. This property is called **Markovian** - the future depends only on the present. In ResNet, each layer is an iteration step $x_{n+1} = x_n + F(x_n)$; in 2018, Chen et al. showed that infinitely deep ResNets are the limit of an ordinary differential equation (Neural ODE).

Example systemState x_nRule f(x)
Bank depositBalancex * 1.05
Bacterial populationCell count2x (division)
Logistic mapPopulation fractionr*x*(1-x)
Fibonacci sequence(F_n, F_{n-1})(F_n + F_{n-1}, F_n)

Poincaré and Discrete Dynamics

Henri Poincaré in the 1880s was the first to systematically study iterated functions while investigating the three-body problem. He discovered that even simple deterministic rules can produce unpredictable behavior - 80 years before chaos was formally identified.

Given the iteration x_{n+1} = x_n / 2 with initial condition x₀ = 8, what is x₃?

Orbit: The System's History

When we run the iteration from a point x₀, the sequence of values x₀, x₁, x₂, ... forms the **orbit** of x₀. An orbit is the complete trajectory of the system. Different starting points can produce entirely different orbits.

The **orbit** of a point x₀ under the map f is the sequence {x₀, f(x₀), f(f(x₀)), f(f(f(x₀))), ...}. Notation: **O(x₀) = {f^n(x₀) : n = 0, 1, 2, ...}**, where f^n denotes n-fold application of f (not the nth power!).

The **cobweb diagram** is a powerful tool for visualizing orbits. Draw the graph of f(x) and the diagonal y = x. Each iteration step is a jump up to the graph (apply f), then horizontally to the diagonal (transfer the result to the x-axis). The result is the characteristic "cobweb" pattern.

Orbit typeBehaviorExample
Convergingx_n → x* as n → ∞f(x) = x/2, x₀ = 8 → 0
Divergingx_n → ∞ as n → ∞f(x) = 2x, x₀ = 1 → ∞
Periodicx_{n+p} = x_n for some pf(x) = -x → {1, -1, 1, -1, ...}
ChaoticDoes not converge, diverge, or repeatLogistic map at r = 4

**Sensitivity to initial conditions:** in chaotic systems, orbits starting from nearby points x₀ = 0.1000 and x₀ = 0.1001 may end up in completely different locations after 30 steps. This makes long-term prediction impossible.

On a cobweb diagram, the orbit spirals toward the intersection of f(x) and y = x. What does this mean?

Fixed Point: f(x*) = x*

**If the system reaches a point x* and stays there forever** - that is a fixed point. Mathematically: f(x*) = x*. The system gets "stuck" - applying the function once changes nothing.

A **fixed point** x* of a map f is the solution to f(x*) = x*. Geometrically: the intersection of the graph of f(x) with the diagonal y = x. Fixed points can be **stable** (attractors - nearby orbits are drawn in) or **unstable** (repellers - nearby orbits move away).

**Stability criterion** - the derivative at the fixed point. If |f'(x*)| < 1, then x* is stable: small deviations decay. If |f'(x*)| > 1, then x* is unstable: small deviations grow. If |f'(x*)| = 1 - a borderline case that requires further analysis.

|f'(x*)|TypeBehavior of nearby orbits
< 1Stable node (attractor)Monotonically converge to x*
= 0Super-stable pointVery fast convergence
> 1Unstable node (repeller)Monotonically move away from x*
-1 < f'(x*) < 0Stable spiralConverge while oscillating around x*

The intuition is simple: if f compresses the neighborhood of x* (derivative magnitude < 1), deviations shrink with each iteration. If it stretches (derivative magnitude > 1) - deviations grow.

For the map f(x) = x², the fixed points are 0 and 1. Which one is stable?

Periodic Orbits and the Road to Chaos

Let's return to the bank deposit example from the beginning of the lesson. We saw that a simple rule can produce a converging orbit. But what if the rule is a bit more complex? The **logistic map** x_{n+1} = r*x_n*(1-x_n) shows a stunning variety of behaviors depending on the parameter r.

A **periodic orbit** of period p is a set of points {x₀, x₁, ..., x_{p-1}} where f^p(x₀) = x₀ but f^k(x₀) ≠ x₀ for 0 < k < p. A 2-cycle: the system jumps between two points. Period 3: among three points. A fixed point is a cycle of period 1.

**Period-doubling cascade**: as r increases, the following sequence occurs: fixed point → 2-cycle → 4-cycle → 8-cycle → ... → chaos. Each doubling happens at a specific value of r, and the distances between those values shrink by the universal constant **Feigenbaum's δ ≈ 4.6692**.

Parameter rBehaviorPeriod
0 < r < 1Everything converges to 0Extinction
1 < r < 3Stable fixed pointPeriod 1
3 < r < 3.4492-cyclePeriod 2
3.449 < r < 3.5444-cyclePeriod 4
3.544 < r < 3.5648-, 16-cycles, ...Doubling
r ≈ 3.5699...Onset of chaosChaos threshold
3.57 < r ≤ 4Chaos (with periodic windows)Chaos

Feigenbaum Universality

In 1975, Mitchell Feigenbaum discovered on an HP-65 calculator that the constant δ ≈ 4.669 is the same for any single-humped map (quadratic maximum). This was a shock: completely different systems follow the same scenario for the transition to chaos. It was later confirmed experimentally in hydrodynamics, electronics, and lasers.

**Sharkovskii's theorem (1964):** if a map of the real line has a cycle of period 3, then it has cycles of ALL periods. The existence of a 3-cycle guarantees chaos!

Simple deterministic rules always produce simple, predictable behavior

The logistic map x_{n+1} = r*x_n*(1-x_n) - a single line of code - produces chaos at r > 3.57: deterministic but unpredictable behavior with sensitivity to initial conditions.

Nonlinearity ($x$ multiplied by $(1-x)$) stretches and folds phase space. Even an infinitesimally small difference in initial conditions grows exponentially - the Lyapunov exponent is positive. Determinism does not equal predictability. In ML this explains training instability: two runs with identical hyperparameters but different random seeds can diverge - not a bug, but chaotic sensitivity to initial conditions.

In the logistic map x_{n+1} = r*x_n*(1-x_n), as r increases from 3 to 4, one observes:

Key Takeaways

  • **Iteration $x_{n+1} = f(x_n)$** - the fundamental mechanism; SGD, Adam, PageRank power iteration are all special cases
  • **Orbit** - the complete trajectory; the attractor determines where training lands - not necessarily the global minimum, but the nearest stable fixed point
  • **Stability via derivative**: $|f'(x^*)| < 1$ is an attractor, $|f'(x^*)| > 1$ is a repeller; in ML this is the learning rate condition - at $\eta > 1$ gradient descent diverges
  • **Period-doubling cascade** (Feigenbaum $\delta \approx 4.669$) - the universal route to chaos; sensitivity to initial conditions explains why two training runs with different seeds converge to different weights
  • **Neural ODE** (Chen et al. 2018): ResNet $x_{n+1} = x_n + F(x_n)$ at $n \to \infty$ becomes the ODE $dx/dt = F(x)$ - discrete dynamics as discretization of a continuous flow

Related Topics

Discrete dynamical systems are the foundation for studying continuous systems and stability theory:

  • Continuous Dynamical Systems — The continuous analogue: instead of x_{n+1}=f(x_n), use dx/dt=f(x)
  • Lyapunov Stability Theory — Formalizes the notion of stability through Lyapunov functions
  • Chaos Theory — In-depth study of chaotic orbits and strange attractors

Вопросы для размышления

  • Weather forecasts become unreliable after 7-10 days. How does this relate to sensitivity to initial conditions in discrete systems?
  • If the function f has no fixed points (for example, f(x) = x + 1), what can be said about the behavior of all orbits?
  • Returning to the opening experiment: pressing cos on a calculator converges to 0.7391... How can it be proved that cos(0.7391) ≈ 0.7391?

Связанные уроки

  • de-01
  • calc-01-sequences
Discrete Dynamical Systems

0

1

Sign In