Computability

Kolmogorov Complexity and Randomness

What does 'random' really mean? Why can't zip compress an mp3? Why does regularization work in neural networks? All these questions have one answer - Kolmogorov complexity, the deepest measure of information.

Compression algorithms (zip, gzip) - practical K-approximators
Regularization in ML - the MDL principle in action
Cryptographically secure PRNGs: incompressible (but not random!)
Martin-Löf test: a formal definition of randomness

Kolmogorov Complexity: Information in a Program

The **Kolmogorov complexity** K(x) of a string x is the length of the shortest program on a fixed universal language (e.g., Python) that outputs x. It is the formal measure of the informational content of a string.

**Why is K uncomputable?** If an algorithm A could compute K(x), you could construct a program generating a string with K greater than its own length - a contradiction (Constructivity Paradox). This is the same idea as Cantor's diagonalization.

Why is Kolmogorov complexity uncomputable?

Incompressibility and Martin-Löf Randomness

**Martin-Löf randomness:** A string x is called (algorithmically) random if K(x) ≥ |x| - O(1). Random strings are incompressible - there is no pattern to exploit.

**Incompressibility ≠ randomness in practice:** PRNGs produce incompressible sequences, but they are deterministic (K = O(1)). True randomness requires physical sources (thermal noise, quantum effects).

The first 10,000 digits of π: what is their Kolmogorov complexity (approximately)?

MDL: Minimum Description Length in ML

**MDL (Minimum Description Length)** is a principle in machine learning based on K-complexity: the best model is the one that describes the data most concisely. It is a formalization of Occam's Razor.

**Solomonoff induction:** Solomonoff generalized MDL into a universal learning principle: a probability distribution weighted by K-complexity is the optimal Bayesian predictor. This is the mathematical foundation for AGI - theoretically optimal learning.

MDL and K-complexity are theoretical concepts. In ML the winning model is the one with lower test loss, no "description length" is needed.

MDL formalizes Occam's razor and is equivalent to Bayesian model selection with a uniform prior. Regularization (L1, L2, dropout), early stopping, minimal architecture size are all MDL heuristics: model complexity is penalized in favor of generalization. K-complexity is uncomputable, but its approximations (description length, MDL, BIC) already work.

Test loss looks like an exhaustive metric, so "description length" theory seems redundant. In practice architecture choice, hyperparameters and regularization are choices about shortest description at fixed loss. Without MDL intuition, overparameterized models lose on out-of-distribution data because they describe noise, not signal. Solomonoff induction is the formal limit of this logic.

How does regularization (L2/Ridge) relate to the MDL principle?

Key Takeaways

K(x) = length of the shortest program that generates x
K(x) is uncomputable - reduces to the Halting Problem
Algorithmically random string = incompressible (K(x) ≈ |x|)
Most strings are incompressible (simple counting argument)
MDL: the best model minimizes |model| + |data|model|

Вопросы для размышления

If K(x) is uncomputable, how do compression algorithms work in practice? What is their limitation?
Why can't you prove that a specific string (e.g., SHA256(0)) is algorithmically random?
How does the MDL principle explain why neural networks with dropout generalize better?

Связанные уроки

comp-09 — Undecidability of the halting problem is part of the incompressibility proof
comp-14 — Lambda calculus defines what a program is for Kolmogorov complexity
comp-16 — Descriptive complexity - the next lens on information-theoretic limits
crypto-05-information-theory — Shannon entropy and Kolmogorov complexity are two languages for measuring information
fl-01-intro