Stochastic Processes: Definitions
A DDPM-style diffusion model draws one image in up to 1000 steps (the samplers used with Stable Diffusion typically need far fewer). Each step is a Langevin-like stochastic update: $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right) + \sigma_t z$, where $z \sim \mathcal{N}(0, I)$. A thousand iterations of this update - and a face emerges from pure noise. Not magic. A reverse Markov chain.
- **Diffusion models (DDPM):** DALL-E 3, Stable Diffusion, Midjourney - all reverse stochastic processes; the forward process adds noise over 1000 Markov steps, the reverse process recovers the image
- **Reinforcement Learning:** the environment in RL is formally a Markov Decision Process (MDP); $\mathbb{P}(s_{t+1} | s_t, a_t)$ is the transition kernel of a stochastic process
- **Bayesian inference:** MCMC (Metropolis-Hastings, NUTS) constructs stochastic processes whose stationary distribution is the target posterior; ergodicity guarantees convergence
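The reverse update quoted in the introduction can be sketched in NumPy. This is a minimal illustration, not a working image generator: the linear schedule, the choice $\sigma_t^2 = \beta_t$, and the function `dummy_eps_model` (a hypothetical stand-in for the trained network $\epsilon_\theta$) are all assumptions made for the example.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice for this sketch)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def reverse_step(x_t, t, eps_pred, betas, alphas, alpha_bars, rng):
    """One reverse update x_t -> x_{t-1}; t is a 0-based step index."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    sigma_t = np.sqrt(betas[t])  # assumed choice: sigma_t^2 = beta_t
    return mean + sigma_t * rng.standard_normal(x_t.shape)

def dummy_eps_model(x_t, t):
    """Hypothetical placeholder for the trained network eps_theta."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x = rng.standard_normal((4, 4))  # start from pure noise
for t in reversed(range(len(betas))):
    x = reverse_step(x, t, dummy_eps_model(x, t), betas, alphas, alpha_bars, rng)
```

Each pass through the loop depends only on the current `x` and fresh noise, which is exactly the Markov property of the reverse chain. With a real trained predictor in place of the placeholder, the final `x` would be one generated sample.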
Stochastic Process
A stock price changes every second. Temperature fluctuates every minute. Packet counts on a network jump every millisecond. All of these are random processes. A **stochastic process** {X(t), t ∈ T} is a family of random variables indexed by a parameter t (usually time).
Formally: a stochastic process is a function of two arguments X(t, ω), where t is time and ω is an element of a probability space. Fixing ω yields a single concrete trajectory (realization). Fixing t yields a random variable.
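The two-argument view X(t, ω) maps naturally onto a 2-D array: rows index outcomes ω, columns index time t. The sketch below assumes a Gaussian random walk as the process and uses five outcomes purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_outcomes, n_steps = 5, 100  # 5 values of omega, t = 0..99

# Row i holds the trajectory for outcome omega_i (a Gaussian random walk).
steps = rng.standard_normal((n_outcomes, n_steps))
X = np.cumsum(steps, axis=1)

trajectory = X[0, :]        # fix omega: one realization, a function of t
random_variable = X[:, 50]  # fix t: samples of the random variable X(50)
```

Slicing a row gives a path that can be plotted against time; slicing a column gives a sample whose histogram approximates the distribution of X(50).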
Processes may have **discrete time** (Markov chains, random walks - t = 0, 1, 2, ...) or **continuous time** (Brownian motion, Poisson process - t ∈ R+). The state space may also be discrete or continuous.
| Process type | Time | Values | Example |
|---|---|---|---|
| Markov chain | Discrete | Discrete | Weather (sunny/rainy) |
| Random walk (continuous steps) | Discrete | Continuous | Stock price (discrete days) |
| Poisson process | Continuous | Discrete | Number of calls at a call center |
| Brownian motion | Continuous | Continuous | Motion of a particle in liquid |
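Each row of the table can be simulated in a few lines. The sketch below is illustrative only: the transition matrix, the call rate, and the grid step `dt` are assumed values, and Brownian motion is approximated on a discrete grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Markov chain: discrete time, discrete values (0 = sunny, 1 = rainy).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])  # assumed transition probabilities
weather = [0]
for _ in range(29):
    weather.append(rng.choice(2, p=P[weather[-1]]))

# Random walk: discrete time, continuous values.
walk = np.cumsum(rng.standard_normal(30))

# Poisson process: continuous time, discrete values (event counts).
rate, horizon = 2.0, 10.0  # assumed: 2 calls per unit time
gaps = rng.exponential(1.0 / rate, size=100)  # i.i.d. exponential gaps
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times <= horizon]
calls_by_horizon = len(arrival_times)

# Brownian motion: continuous time, continuous values,
# approximated on a grid with step dt.
dt = 0.01
brownian = np.cumsum(np.sqrt(dt) * rng.standard_normal(1000))
```

The Poisson process is built from exponential inter-arrival gaps, and the Brownian increments are scaled by `sqrt(dt)` so that the variance over a time interval equals its length.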
What do we get by fixing time t=t₀ in a stochastic process X(t, ω)?
Realization (Trajectory)
One concrete "unfolding" of a stochastic process is called a **realization** or **trajectory**. If the process is the entire ensemble of possible histories, a realization is one particular history that we observe.
The Bitcoin price history we actually observed is one realization. A different history that could have unfolded under the same market dynamics would be another realization of the same process. In practice we usually observe a single realization and try to infer the properties of the whole process from it.
The **ensemble** is the set of all possible realizations of the process. Statistics can be computed in two ways: **over the ensemble** (fix t, average over ω) or **over time** (fix ω, average over t). The key question: do these two types of averaging coincide?
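The two kinds of averaging can be compared numerically. The sketch below assumes a stationary AR(1) process with coefficient 0.5 and unit-variance Gaussian noise; the path and step counts are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, phi = 1000, 2000, 0.5  # assumed illustrative values

# Stationary AR(1): X(t) = phi * X(t-1) + eps(t), eps ~ N(0, 1).
X = np.zeros((n_paths, n_steps))
X[:, 0] = rng.standard_normal(n_paths) / np.sqrt(1 - phi**2)  # stationary start
for t in range(1, n_steps):
    X[:, t] = phi * X[:, t - 1] + rng.standard_normal(n_paths)

ensemble_mean = X[:, -1].mean()  # fix t, average over omega
time_mean = X[0, :].mean()       # fix omega, average over t
```

For this particular process both numbers land close to the true mean of zero. Whether that agreement holds in general is the ergodicity question.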
In practice we often have only one realization (one patient, one market, one planet). This makes ergodicity critically important: can we infer the statistical properties of the entire process from a single long realization?
We have temperature data for one year (one realization). What can we estimate without assuming ergodicity?
Stationarity
A process is **stationary** if its statistical properties do not change under a time shift. The temperature at noon on January 1st and at noon on July 1st have different distributions - the process is non-stationary. But noise in an electronic circuit may be stationary: its statistics are the same morning and evening.
**Strict stationarity:** all finite-dimensional distributions are invariant under time shifts: the joint distribution of (X(t₁),...,X(tₙ)) equals that of (X(t₁+τ),...,X(tₙ+τ)) for every n, every choice of t₁,...,tₙ, and every shift τ. **Wide-sense stationarity (WSS):** E[X(t)] = const, Var(X(t)) < ∞, and Cov(X(t), X(t+τ)) depends only on τ.
Stationarity is a critical assumption for time-series analysis. Many methods (autocorrelation estimates, spectral analysis) assume a stationary process and give misleading results without it. A non-stationary series is often converted to a stationary one by differencing or detrending.
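Differencing can be illustrated on a random walk. In the sketch below (assuming Gaussian unit-variance steps), the variance across the ensemble is measured at an early and a late time, before and after differencing.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps = 5000, 200

eps = rng.standard_normal((n_paths, n_steps))
walk = np.cumsum(eps, axis=1)   # X(t) = X(t-1) + eps(t)
diffed = np.diff(walk, axis=1)  # X(t) - X(t-1) = eps(t)

# Variance across the ensemble at an early and a late time index.
walk_var_early, walk_var_late = walk[:, 9].var(), walk[:, 199].var()
diff_var_early, diff_var_late = diffed[:, 9].var(), diffed[:, 198].var()
```

The variance of the walk grows with the time index, while the differenced series recovers the i.i.d. noise and keeps a variance near 1 at both times.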
| Process | Stationary? | Why |
|---|---|---|
| White noise (i.i.d.) | Yes (strictly) | Joint distribution is unchanged by any time shift |
| AR(1) with $\lvert\phi\rvert < 1$ | Yes (WSS) | In steady state, mean and covariance do not depend on t |
| Random walk | No | Variance grows linearly with t |
| Seasonal sales series | No | Mean changes periodically |
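The "Why" column for the random walk and AR(1) rows can be checked with a short calculation. It assumes i.i.d. noise $\varepsilon(t)$ with mean 0 and variance $\sigma^2$, a random walk started at $X(0) = 0$, and an AR(1) process $X(t) = \phi X(t-1) + \varepsilon(t)$ already in its stationary regime.

For the random walk:

$$
X(t) = \sum_{s=1}^{t} \varepsilon(s)
\;\Rightarrow\;
\operatorname{Var}(X(t)) = t\sigma^2, \qquad
\operatorname{Cov}(X(t), X(t+\tau)) = t\sigma^2 \quad (\tau \ge 0).
$$

Both quantities depend on $t$, so the process is not WSS. For AR(1) with $\lvert\phi\rvert < 1$, setting $v = \operatorname{Var}(X(t))$ and using the independence of $\varepsilon(t)$ from $X(t-1)$:

$$
v = \phi^2 v + \sigma^2
\;\Rightarrow\;
v = \frac{\sigma^2}{1-\phi^2}, \qquad
\operatorname{Cov}(X(t), X(t+\tau)) = \phi^{\lvert\tau\rvert}\,\frac{\sigma^2}{1-\phi^2}.
$$

Here the covariance depends only on the lag $\tau$, as WSS requires.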
Why is the random walk X(t) = X(t-1) + ε(t) non-stationary?
Ergodicity
Stationarity tells us that the statistics do not change over time. **Ergodicity** goes further: the time average of a single infinitely long realization equals the ensemble average over all possible realizations. This allows us to study the process from just one trajectory.
**Birkhoff's Theorem:** for a stationary ergodic process with E|X(t)| < ∞, $\lim_{T\to\infty} \frac{1}{T}\int_0^T X(t)\,dt = \mathbb{E}[X(t)]$ almost surely. The time average converges to the ensemble mean.
Not every stationary process is ergodic. Consider: we toss a coin once, and if heads - X(t) = +1 for all t, if tails - X(t) = -1 for all t. The process is stationary (the distribution does not depend on t), but the time average is +1 or -1, while the ensemble average is 0.
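The coin-toss counterexample is easy to reproduce. The sketch below simulates many realizations, each frozen at +1 or -1 by a single fair coin toss; the number of realizations and time steps are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 10_000, 500

# One coin toss per realization; the value is then frozen for all t.
coin = rng.choice([-1.0, 1.0], size=n_paths)
X = np.repeat(coin[:, None], n_steps, axis=1)

time_means = X.mean(axis=1)     # one time average per realization
ensemble_mean = X[:, 0].mean()  # average over realizations at t = 0
```

Every entry of `time_means` is +1 or -1, while `ensemble_mean` sits near 0. No amount of extra observation time along a single path moves its time average toward the ensemble mean.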
Ludwig Boltzmann and the Ergodic Hypothesis
The term "ergodic" traces back to Ludwig Boltzmann's work on gas molecules in the 1870s and 1880s. His hypothesis: a gas, given enough time, passes through all admissible states. Rigorous ergodic theorems, which make the equality of time and ensemble averages precise, were proved by Birkhoff and von Neumann in the early 1930s.
The practical significance of ergodicity is enormous. If a process is ergodic - a single long observation suffices to estimate all its statistical properties. If not - multiple independent realizations are needed, which is often impossible (we have one economy, one climate, one patient).
**Misconception:** a stationary process is constant (does not change over time).
**In fact:** a stationary process fluctuates and can take different values, but its statistical properties (mean, variance, covariance) do not depend on the moment of observation.
White noise is a textbook stationary process, yet its value changes from one instant to the next. Stationarity is a property of the statistics, not of individual realizations. In ML: the DDPM forward process is non-stationary (the marginal distribution of $x_t$ drifts from the data distribution toward pure Gaussian noise), and the learned reverse process is a time-inhomogeneous chain trained so that its final state is distributed approximately like the data. MCMC samplers, by contrast, are built so that the target density is their stationary distribution.
Consider the process: draw μ once from N(0,1), then set X(t) = μ + ε(t) with i.i.d. ε(t) ~ N(0, 0.01). Is this process stationary? Is it ergodic?
Key Ideas
- **Stochastic process** $\{X(t)\}$ - a family of random variables parameterized by time; fixing $\omega$ gives a realization, fixing $t$ gives a random variable
- **Realization** - one trajectory; 1000 DDPM denoising steps = one realization of the reverse Markov process = one generated image
- **Stationarity** - statistics do not depend on the moment of observation; white noise is stationary, random walk is not ($\text{Var}(X(t)) = t\sigma^2$ grows)
- **Ergodicity** - time mean = ensemble mean; MCMC works precisely because the constructed Markov chain is ergodic with the desired stationary distribution
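The MCMC point can be made concrete with a random-walk Metropolis sampler. This is a minimal sketch under stated assumptions: the target is a standard normal (chosen so the answer is known), and the step size and chain length are arbitrary.

```python
import numpy as np

def log_target(x):
    """Unnormalized log density of the assumed target, N(0, 1)."""
    return -0.5 * x**2

def metropolis(n_samples, step_size, rng, x0=0.0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + step_size * rng.standard_normal()
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

rng = np.random.default_rng(5)
chain = metropolis(50_000, step_size=1.0, rng=rng)
time_mean, time_var = chain.mean(), chain.var()
```

The chain is a single realization, yet its time averages approximate the mean and variance of the target. That substitution of a time average for an ensemble average is what ergodicity licenses.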
Related Topics
Stochastic processes are the foundation for Markov chains and beyond:
- Discrete-Time Markov Chains: the most important class of stochastic processes with the Markov property
- Continuous-Time Markov Chains: the extension to continuous time, with models for queues and chemical reactions
Questions for Reflection
- Is the heartbeat process stationary? Ergodic? What statistical test would distinguish the two cases?
- Why does averaging temperatures over 10 years not allow predicting tomorrow's weather?
- MCMC samples from a posterior by exploiting ergodicity. What happens to the algorithm if the Markov chain turns out to be non-ergodic?