Stochastic Processes

Point Processes

Lesson objectives

Define the conditional intensity function and the likelihood of a point process
Simulate and fit the Hawkes process via maximum likelihood
Apply the spatial Hawkes (ETAS) model to earthquake aftershock sequences
Connect temporal point processes to neural sequence models

Prerequisites

Poisson processes
Queueing theory
Maximum likelihood estimation

An earthquake strikes. In the next 24 hours the probability of a magnitude 6+ aftershock is 8%; in the 24 hours after that it falls to 4%. This decay is described by the Omori-Utsu law, which has been verified on thousands of sequences worldwide. Point process theory is the mathematical basis of earthquake forecasting.

NYSE: Hawkes process for high-frequency trade clustering
Japan JMA: ETAS model for operational earthquake forecasting
Google DeepMind: neural Hawkes for earthquake catalog modeling
Netflix: marked temporal point processes for user engagement prediction

Hawkes, Ogata, and the self-exciting era

Alan Hawkes introduced the self-exciting point process in 1971; it was later applied to earthquake aftershocks and financial contagion. Yosihiko Ogata published the thinning simulation algorithm in 1981 and the ETAS model in 1988; both remain standard tools for seismological analysis. Vere-Jones and colleagues established the theoretical foundations. The modern neural turn: Du et al. (2016) introduced RMTPP; Mei and Eisner (2017) introduced the neural Hawkes process; Zuo et al. (2020) introduced the Transformer Hawkes process.

Point Processes and Conditional Intensity

The Hawkes process drives trading algorithms on the NYSE: each large trade generates a cluster of subsequent trades at intensity lambda(t) reaching 1,500 transactions per second. The model explains market self-excitation and predicts volatility spikes with 73% accuracy over 5-minute horizons.

Neural spike trains as point processes

Modeling neuron firing in auditory cortex

A neuron in the auditory cortex fires in response to sound. Between stimuli: a baseline Poisson process at 10 Hz. After a stimulus: the firing rate jumps to 80 Hz and decays with time constant 50 ms. This is an inhomogeneous Poisson process with time-varying intensity lambda(t). The likelihood of observing a given spike train follows the point process likelihood formula, log L = sum_i log lambda(t_i) - integral_0^T lambda(s) ds, over the observation window [0, T]. Maximum likelihood estimation recovers the parameters of the intensity, such as the baseline rate, the peak rate, and the decay time constant.
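The likelihood computation for this neuron can be sketched in a few lines of Python. This is a minimal illustration rather than a fitting routine: it assumes the post-stimulus rate decays exponentially from 80 Hz back to the 10 Hz baseline, and the spike times, the 200 ms window, and the function names are invented for the example.

```python
import math

def rate(t, baseline=10.0, peak=80.0, tau=0.050):
    """Firing rate in Hz at time t (seconds) after stimulus onset (assumed form)."""
    return baseline + (peak - baseline) * math.exp(-t / tau)

def integrated_rate(T, baseline=10.0, peak=80.0, tau=0.050):
    """Closed-form integral of rate(t) over [0, T]: the expected spike count."""
    return baseline * T + (peak - baseline) * tau * (1.0 - math.exp(-T / tau))

def log_likelihood(spikes, T, **params):
    """log L = sum_i log lambda(t_i) - integral_0^T lambda(s) ds."""
    log_terms = sum(math.log(rate(t, **params)) for t in spikes)
    return log_terms - integrated_rate(T, **params)

# Hypothetical spike times (seconds), densest right after the stimulus.
spikes = [0.004, 0.011, 0.019, 0.032, 0.060, 0.140]
T = 0.2

ll_decay = log_likelihood(spikes, T)             # stimulus-driven model
ll_flat = log_likelihood(spikes, T, peak=10.0)   # constant 10 Hz model
print(ll_decay, ll_flat)
```

For these made-up spikes the decaying-rate model scores a higher log-likelihood than the flat 10 Hz model. Maximum likelihood estimation would search over baseline, peak, and tau for the values that maximize log_likelihood.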

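Ogata's thinning algorithm simulates any point process whose conditional intensity can be bounded from above, at least locally between events. It is exact (not approximate): candidate times are drawn from a Poisson process at the bounding rate, and each candidate is accepted with probability equal to the true intensity divided by the bound.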
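A thinning simulation can be sketched as follows for a Hawkes process with the exponential kernel alpha*exp(-beta*t). The sketch relies on one property of this kernel: between events the intensity only decays, so its current value is a valid upper bound until the next event. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def intensity_after(t, events, mu, alpha, beta):
    """Intensity just after time t: every event at or before t contributes."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti <= t)

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Thinning simulation of an exponential-kernel Hawkes process on [0, T)."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The intensity decays until the next event, so its value now is a bound.
        lam_bar = intensity_after(t, events, mu, alpha, beta)
        t += rng.expovariate(lam_bar)          # candidate from a rate-lam_bar Poisson process
        if t >= T:
            return events
        lam_t = intensity_after(t, events, mu, alpha, beta)
        if rng.random() * lam_bar <= lam_t:    # accept with probability lam_t / lam_bar
            events.append(t)

events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, T=200.0, seed=1)
print(len(events))
```

Each step recomputes the intensity from the full history, which keeps the sketch short at a cost of O(n^2) work. A recursive update of the excitation term would bring this down to O(n).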

Under what condition is the Hawkes process stationary?

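Stationarity of Hawkes: for the exponential kernel alpha*exp(-beta*t) the branching ratio n = alpha/beta must satisfy n < 1. The stationary mean event rate is then mu/(1 - n), which diverges as n approaches 1; at n >= 1 no stationary regime exists. The branching interpretation: each event generates on average n direct children.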
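The stationarity check and the log-likelihood used for maximum likelihood fitting can be sketched together. The sketch assumes the exponential kernel alpha*exp(-beta*t) and an observation window [0, T] with no events before time 0; the function names are illustrative. The excitation sum is updated recursively, so the likelihood costs O(n) rather than O(n^2).

```python
import math

def branching_ratio(alpha, beta):
    """Mean number of direct children per event for the kernel alpha*exp(-beta*t)."""
    return alpha / beta

def stationary_rate(mu, alpha, beta):
    """Mean event rate mu / (1 - n); defined only when n < 1."""
    n = branching_ratio(alpha, beta)
    if n >= 1.0:
        raise ValueError("non-stationary: branching ratio n >= 1")
    return mu / (1.0 - n)

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """log L = sum_i log lambda*(t_i) - integral_0^T lambda*(s) ds."""
    log_sum = 0.0
    a = 0.0        # a = sum over earlier events of exp(-beta * (t_i - t_j))
    prev = None
    for t in events:
        if prev is not None:
            a = math.exp(-beta * (t - prev)) * (1.0 + a)
        log_sum += math.log(mu + alpha * a)
        prev = t
    compensator = mu * T + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (T - t))) for t in events
    )
    return log_sum - compensator

print(stationary_rate(mu=1.0, alpha=0.5, beta=1.0))
print(hawkes_log_likelihood([0.5, 1.2, 1.3, 2.9], T=4.0, mu=0.8, alpha=0.6, beta=1.5))
```

Handing the negative of hawkes_log_likelihood to a numerical optimizer, with mu, alpha, and beta constrained to be positive, is one common way to obtain the maximum likelihood estimates.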

Spatial Point Processes and the ETAS Model

Japan's earthquake early warning system detects the P-wave and predicts S-wave arrival within 2 seconds. Forecasting the aftershocks that follow is a separate, statistical layer, and the Epidemic Type Aftershock Sequence (ETAS) model powers it: each quake triggers aftershocks that trigger their own aftershocks. It is Hawkes in space-time.
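The temporal part of the ETAS conditional intensity can be sketched as below. This is the purely temporal form, mu + sum over past events of K*exp(alpha*(m_i - m0)) / (t - t_i + c)^p; a space-time version would multiply each term by a spatial kernel, which is omitted here. The catalog and all parameter values are hypothetical.

```python
import math

def etas_intensity(t, catalog, mu, K, alpha, c, p, m0):
    """Temporal ETAS rate at time t given a catalog of (time, magnitude) pairs."""
    total = mu                                            # background seismicity
    for t_i, m_i in catalog:
        if t_i < t:
            productivity = K * math.exp(alpha * (m_i - m0))   # larger quakes trigger more
            total += productivity / (t - t_i + c) ** p        # Omori-Utsu decay in time
    return total

# Hypothetical catalog: a magnitude 6.5 mainshock at day 0, two aftershocks.
catalog = [(0.0, 6.5), (0.3, 4.8), (1.1, 5.2)]
params = dict(mu=0.2, K=0.05, alpha=1.0, c=0.01, p=1.1, m0=4.0)

for day in (0.5, 2.0, 10.0):
    print(day, etas_intensity(day, catalog, **params))
```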

Google DeepMind's 2023 paper on earthquake forecasting uses a neural network to learn the triggering kernel g from the JMA earthquake catalog. The neural Hawkes process replaces the parametric ETAS kernel with a learned function - and achieves 20% better log-likelihood on held-out earthquakes.

Ripley's K-function measures clustering in spatial statistics: lambda * K(r) = expected number of additional events within distance r of a typical event, where lambda is the mean number of events per unit area. For a homogeneous Poisson process in the plane K(r) = pi * r^2. Excess clustering (K(r) > pi * r^2) is consistent with self-excitation.
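A naive estimate of K(r) from a point pattern can be sketched as follows. It counts ordered pairs within distance r and rescales by the area of the observation window. This version applies no edge correction, so it underestimates K near the window boundary. The point set and function name are illustrative.

```python
import math

def ripley_k(points, r, area):
    """Naive K(r) estimate: area * (#ordered pairs within r) / (n * (n - 1))."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))

# Four points at the corners of the unit square (a toy pattern).
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(ripley_k(corners, r=1.0, area=1.0))
```

In practice the estimate is compared with pi * r^2 over a range of r values, and values well above that curve point to clustering.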

Why does the ETAS model use a power-law decay in time (Omori-Utsu) rather than exponential?

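The Omori-Utsu law (Omori 1894, modified by Utsu 1961) states that the aftershock rate decays as K/(t + c)^p with p around 1. This is empirically established across thousands of earthquake sequences worldwide. An exponential kernel decays too quickly to reproduce the aftershocks that are still observed months or years after a mainshock.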
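The expected number of aftershocks in a time window is the integral of the Omori-Utsu rate K/(t + c)^p, which has a closed form. The sketch below evaluates it day by day; the values of K, c, and p are hypothetical and chosen only to show the slow power-law decay.

```python
import math

def omori_rate(t, K, c, p):
    """Aftershock rate at time t after the mainshock."""
    return K / (t + c) ** p

def expected_aftershocks(t1, t2, K, c, p):
    """Integral of the Omori-Utsu rate over [t1, t2]."""
    if abs(p - 1.0) < 1e-12:
        return K * math.log((t2 + c) / (t1 + c))
    return K * ((t2 + c) ** (1.0 - p) - (t1 + c) ** (1.0 - p)) / (1.0 - p)

K, c, p = 100.0, 0.05, 1.1     # hypothetical parameters, time in days
day1 = expected_aftershocks(0.0, 1.0, K, c, p)
day2 = expected_aftershocks(1.0, 2.0, K, c, p)
day100 = expected_aftershocks(99.0, 100.0, K, c, p)
print(day1, day2, day100)
```

With p near 1 the daily count on day 100 is still a noticeable fraction of the day 2 count, whereas an exponential decay fitted to the first days would predict essentially none.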

Neural Point Processes and Event Sequence Modeling

User behavior on an e-commerce platform is a marked point process: purchases, clicks, returns - each is an event with a timestamp and a feature vector (the mark). The Transformer Hawkes Process (Zuo et al., 2020) learns the conditional intensity using attention mechanisms. This is Hawkes meets GPT.

Recurrent Marked Temporal Point Process (RMTPP, Du et al., 2016) uses a recurrent network such as an LSTM to encode the event history. Training objective: maximize the log-likelihood of observed event sequences. At inference: sample the next event time by inverting the CDF, using numerical integration when the integral of the intensity has no closed form.
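Sampling the next event time by inversion can be sketched without any neural network. The sketch assumes a log-linear intensity exp(a + w*tau) in the time tau since the last event, where a stands in for whatever scalar the recurrent network outputs from the history. For this form the integral is available in closed form, so no numerical integration is needed. The names a, w, and e are illustrative.

```python
import math
import random

def compensator(tau, a, w):
    """Integral of exp(a + w*s) for s in [0, tau]."""
    if w == 0.0:
        return math.exp(a) * tau
    return math.exp(a) * (math.exp(w * tau) - 1.0) / w

def sample_gap(a, w, e):
    """Solve compensator(tau) = e for tau, where e is a draw from Exp(1)."""
    if w == 0.0:
        return e / math.exp(a)
    arg = 1.0 + w * e / math.exp(a)
    if arg <= 0.0:
        # Decaying intensity (w < 0) whose total mass is below e: no next event.
        return math.inf
    return math.log(arg) / w

rng = random.Random(0)
gaps = [sample_gap(a=0.3, w=0.8, e=rng.expovariate(1.0)) for _ in range(5)]
print(gaps)
```

For an intensity without a closed-form integral, the same idea applies with the compensator computed by numerical quadrature and the equation compensator(tau) = e solved by a root finder such as bisection.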

Content recommendation as a point process

User engagement modeling on streaming platforms

Netflix models user watch events as a marked temporal point process: each view is a point, the mark is the content category and watch duration. The conditional intensity lambda*(t) is the instantaneous viewing rate; the probability that the user watches something in the next hour is 1 - exp(-I), where I is the integral of lambda* over that hour. A Hawkes-style excitation: watching action movies increases the probability of watching action content in the next 2 hours. The intensity lambda*(t) drops after prolonged engagement (inhibition).
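Turning an intensity into a probability over a time window can be sketched as below. The sketch assumes a purely exciting Hawkes intensity with exponential kernel (no inhibition and no marks), time measured in hours, and made-up parameter values. The survival probability over the window is exp(-I), with I the integral of the intensity assuming no new event occurs inside the window.

```python
import math

def prob_event_within(h, t, history, mu, alpha, beta):
    """P(at least one event in (t, t + h]) given the events before t."""
    # Excitation still active at time t from past events.
    excitation = sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)
    # Closed-form integral of mu + alpha * excitation * exp(-beta * s) over s in [0, h].
    integral = mu * h + (alpha / beta) * excitation * (1.0 - math.exp(-beta * h))
    return 1.0 - math.exp(-integral)

# Hypothetical user: two views shortly before t = 10 hours.
p_active = prob_event_within(1.0, 10.0, [9.2, 9.7], mu=0.2, alpha=0.5, beta=1.0)
p_idle = prob_event_within(1.0, 10.0, [], mu=0.2, alpha=0.5, beta=1.0)
print(p_active, p_idle)
```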

For checking a fitted point process against data, the key tool is the compensator Lambda*(t) = integral_0^t lambda*(s) ds. If the model is correct, the transformed event times Lambda*(t_i) form a standard (unit-rate) Poisson process. This is the time-change theorem, and it is the basis of goodness-of-fit testing.
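A goodness-of-fit check based on the compensator can be sketched as follows. The sketch assumes an exponential-kernel Hawkes model observed from time 0 with no earlier events. It maps each event time through Lambda*, takes the gaps between consecutive transformed times, and measures their Kolmogorov-Smirnov distance from the Exp(1) distribution. The function names and the example data are illustrative.

```python
import math

def compensator_at(t, events, mu, alpha, beta):
    """Lambda*(t) for an exponential-kernel Hawkes process."""
    return mu * t + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (t - ti))) for ti in events if ti < t
    )

def rescaled_gaps(events, mu, alpha, beta):
    """Gaps between successive Lambda*(t_i); roughly Exp(1) if the model fits."""
    taus = [compensator_at(t, events, mu, alpha, beta) for t in events]
    return [b - a for a, b in zip([0.0] + taus, taus)]

def ks_statistic_exp1(gaps):
    """Kolmogorov-Smirnov distance between the gaps and the Exp(1) CDF."""
    xs = sorted(gaps)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x)
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

events = [0.4, 0.9, 1.0, 1.15, 3.2, 3.3, 5.8]     # hypothetical event times
gaps = rescaled_gaps(events, mu=0.8, alpha=0.6, beta=1.5)
print(ks_statistic_exp1(gaps))
```

A small distance is consistent with the fitted model, while a large one suggests the intensity is misspecified. With only a handful of events, as in this toy input, the statistic is too noisy to support a conclusion.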

What advantage does a neural conditional intensity function have over the parametric Hawkes model?

The neural intensity function is a universal function approximator applied to the event history. It captures non-parametric triggering patterns that fixed-form kernels (exponential, power-law) cannot.

Connections to other topics

Point processes link stochastic analysis, spatial statistics, and sequence modeling in ML

Levy processes — Related topic
Neural sequence models — Related topic
Seismology — Related topic
High-frequency finance — Related topic

Summary

Hawkes process: lambda*(t) = mu + sum_{t_i < t} alpha*exp(-beta*(t - t_i)); self-exciting, with branching ratio n = alpha/beta < 1 for stationarity
Likelihood: the product of the intensities at the event times, multiplied by the survival probability exp(-integral_0^T lambda*(s) ds) between events
ETAS: space-time Hawkes with Omori-Utsu temporal kernel and magnitude scaling
Neural point processes: neural network as a universal conditional intensity function

Questions for reflection

How does the branching representation of the Hawkes process connect to Galton-Watson branching processes?
Why does the time-change theorem transform any point process to a standard Poisson process?
What is the difference between a marked and an unmarked temporal point process in terms of likelihood?

Related lessons

sp-22 — Queueing theory uses Poisson arrival processes
sp-17 — Hawkes and spatial processes generalize the Poisson process
sp-24-levy-processes — Levy processes use Poisson random measures for jumps