Stochastic Processes

Signature Methods for Time Series

Цели урока

Compute a truncated path signature and verify Chen's identity
Explain the universal approximation theorem via the signature
Apply the log-signature for compact path representation
Use the signature kernel for time series classification

Предварительные знания

Rough paths theory
Tensor algebra
Iterated integrals

Rough paths theory

Handwriting recognition at 94.5% accuracy - no neural network, no feature engineering. Just 31 signature features. A mathematical theorem about density of linear functionals becomes a production-ready ML method.

Oxford Math Finance: signature strategies for forex time series
DeepMind: sig features for irregular medical time series
Sig-Wasserstein GAN: generating financial time series
UEA Benchmark: best results on 30+ time series datasets

From rough paths to ML

Terry Lyons defined the signature as part of rough paths theory in the 1990s. First ML applications: Leerik et al. (2014) for gesture classification. Breakthrough: Chevyrev and Oberhauser (2016) proved universality. The esig Python library was created at Oxford (Reizenstein, 2017). The signature kernel via PDE: Salvi, Cass et al. (2021). Sig-GAN: Ni, Szpruch et al. (2021). Today sig methods are a standard toolbox for time series.

Path Signature and the Universal Approximation Theorem

Terry Lyons and Harald Oberhauser showed in 2016: the path signature is a universal feature for any continuous functional on paths. Google DeepMind uses it for irregular medical time series. This is not just a feature engineering trick - it is a mathematical theorem about density.

Truncated signature of order N: O(d^N) components. For d=5, N=4: (5^5-1)/(5-1) = 781 features. Log-signature is more compact: O(d^N/N) independent components via Hall basis.

Why is the path signature a universal feature for tasks on paths?

Universal approximation: linear L(S(X)) are dense in C(paths). Any continuous function on paths can be approximated by a linear combination of signature components.

Log-Signature and Computational Efficiency

Signature of order N for a d-dimensional path has (d^{N+1} - 1)/(d-1) components. At d=10, N=5: 111,111 numbers. Log-signature via Hall basis: O(d^N/N) - 5x fewer. And carries the same information.

d (dimension)	Order N	sig size	log-sig size	Compression
2	4	31	8	3.9x
3	4	121	18	6.7x
5	3	156	20	7.8x
10	3	1111	55	20x

The signature kernel K(X,Y) = inner product in the tensor algebra is computed via a PDE without explicitly computing components. Time: O(N * |X| * |Y|) instead of O(d^N * |X| * |Y|). Kernel method on paths - SVM without an explicit feature space.

Libraries: esig (Python/C++, main library), iisignature (PyTorch, faster for gradients), signatory (GPU, full backprop). For production: signatory by James Foster integrates with torch.autograd.

Why is the log-signature more compact than the signature while carrying the same information?

Chen's identity creates structural dependencies between sig components. The logarithm projects onto the free Lie algebra - the space without these dependencies. Size: O(d^N/N) vs O(d^N) for sig.

Signature Methods in ML Practice

Oxford Math Finance Group: signature features for financial time series. Signature trading strategies outperform LSTM on forex data. The reason: invariance to reparametrization - different sampling frequencies, gaps, and irregular grids do not break the features.

Sig-Wasserstein GAN (Ni et al., 2021): the discriminator uses the distance between signatures of real and generated paths. Enables training generative models for financial time series without the instability of adversarial training.

Signature features for handwriting recognition

Classifying handwritten characters

A handwritten letter is a 2D path (x(t), y(t)). Writing speed should not affect recognition - that is reparametrization invariance. The signature is invariant to it. Truncated signature of order 4 (d=2, N=4): 31 features. Linear SVM on these features achieves 94.5% accuracy on the UEA Multivariate Time Series Archive - no feature engineering, no neural networks.

Why is the signature kernel K(X,Y) advantageous for ML on paths compared to explicit feature computation?

The kernel K(X,Y) = inner product in tensor algebra, computed via PDE in O(|X|*|Y|). With explicit features of order N one needs to store O(d^N) numbers. Kernel method gives access to infinite-dimensional features in finite time.

Connections to other topics

Signature methods connect path geometry with practical ML on time series

Reservoir computing — Related topic
Kernel methods (SVM, GP) — Related topic
Rough paths theory — Related topic
Neural CDE — Related topic

Итоги

Signature: infinite tensor series of iterated integrals S(X) = (1, S^1, S^2, ...)
Chen's identity: S(X*Y) = S(X) tensor S(Y) - multiplicativity under concatenation
Universal approximation: linear L(S(X)) are dense in C(paths)
Log-signature: O(d^N/N) components vs O(d^N) for sig - N times more compact

Вопросы для размышления

Why is reparametrization invariance important for time series with irregular sampling?
How are the signature kernel and Random Fourier Features related for approximating kernel matrices?
Why does truncating the signature to order N still give good results in practice?

Связанные уроки

sp-27 — Signatures are built on the theory of iterated integrals
sp-29 — Reservoir computing is an alternative projection method for time series
sp-23 — Point processes are encoded via signature features

Path Signature and the Universal Approximation Theorem

Truncated signature of order N: O(d^N) components. For d=5, N=4: (5^5-1)/(5-1) = 781 features. Log-signature is more compact: O(d^N/N) independent components via Hall basis.

Why is the path signature a universal feature for tasks on paths?

Universal approximation: linear L(S(X)) are dense in C(paths). Any continuous function on paths can be approximated by a linear combination of signature components.

Log-Signature and Computational Efficiency

Signature of order N for a d-dimensional path has (d^{N+1} - 1)/(d-1) components. At d=10, N=5: 111,111 numbers. Log-signature via Hall basis: O(d^N/N) - 5x fewer. And carries the same information.

d (dimension)

Order N

sig size

log-sig size

Compression

3.9x

121

6.7x

156

7.8x

1111

20x