Information Theory

Kolmogorov Complexity

What makes the digits of π ordered rather than random? Why can't one build a compressor that shrinks every file? Kolmogorov complexity answers these questions by providing an individual measure of information-without probabilities.

**Data compression**: the incompressibility theorem explains why ZIP cannot shrink every file - most files are already incompressible
**Machine learning**: MDL and regularization are practical approximations of the principle "short model = good model"
**Cryptography**: randomness tests for keys rely on incompressibility - a good key must be algorithmically random

Предварительные знания

Definition of Kolmogorov Complexity

Two 24-bit strings: `010101010101010101010101` compresses to "repeat '01' 12 times" (15 chars). `1011001110100101110001` cannot be described shorter than itself. Kolmogorov complexity formalizes this: it is the length of the shortest program producing a given string. LLMs exploit low Kolmogorov complexity - language is compressible.

Key difference from Shannon entropy: entropy describes a **source** (a probability distribution), while Kolmogorov complexity describes a **specific string** without any probabilities. It is an individual measure of information, not a statistical one.

K(x) depends on the choice of universal Turing machine U, but only by a **constant**: if K_U(x) and K_V(x) are the complexities for two different machines, then |K_U(x) - K_V(x)| ≤ c_{U,V} for all x. This constant is independent of x, so for asymptotic analysis the choice of machine is irrelevant.

Three independent discoverers

Kolmogorov complexity was independently discovered by three scientists in the 1960s: Soviet mathematician Andrey Kolmogorov (1965), American Ray Solomonoff (1964), and American Gregory Chaitin (1969). Solomonoff actually preceded Kolmogorov, but Kolmogorov's name stuck. Kolmogorov himself called it "algorithmic information theory".

A string of 1,000,000 zeros has Kolmogorov complexity approximately:

Incompressibility and Randomness

Most long strings are **incompressible**. Formally: a string x is c-incompressible if K(x) ≥ |x| - c. Such strings are algorithmically random-they contain no patterns. This is the Kolmogorov definition of randomness: a string is random if it cannot be compressed.

This directly connects to the Kolmogorov definition of **randomness**: a Martin-Löf random string is an incompressible string. If a string is statistically random, it will be incompressible. Conversely, any statistical test for randomness can be used as a test for incompressibility.

Kolmogorov complexity is **uncomputable**: no algorithm can compute K(x) for arbitrary x. This follows from the halting problem: if K(x) were computable, one could solve whether a program halts. one can only approximate K(x) from above: enumerate all programs and find the shortest one outputting x, but one can never know if a shorter one exists.

Why is Kolmogorov complexity uncomputable?

Kolmogorov Complexity vs. Shannon Entropy

Shannon entropy and Kolmogorov complexity both measure "information", but from different angles. Shannon: a source is a probabilistic process, entropy describes an ensemble average. Kolmogorov: a specific string, no probabilities needed. How are they related?

Aspect	Shannon Entropy	Kolmogorov Complexity
Object	Distribution p(X)	Specific string x
Probabilities	Required	Not needed
Computability	Computable	Uncomputable
Dependence on n	Averaged over n	Individual property
Application	Coding, channel capacity	Complexity theory, randomness

The Minimum Description Length principle (MDL) is a practical approximation to K(x): choose the model that minimizes the sum of "model length + data length given the model". This is applicable in practice (unlike K) and approximates the Kolmogorov ideal. MDL is covered in lesson 19.

What is the key property that distinguishes Kolmogorov complexity from Shannon entropy?

Applications: Compression, ML, Randomness

Despite being uncomputable, Kolmogorov complexity has an enormous practical influence. It provides the theoretical foundation for data compression, machine learning, and randomness testing.

In **machine learning**, Kolmogorov complexity relates to generalization: a model that "compresses" training data (has a compact description) generalizes better. This underlies MDL (Minimum Description Length) and PAC learning. Modern neural networks seek "simple" solutions via regularization-a practical approximation to minimizing K.

Marcus Hutter (2000) used Kolmogorov complexity to define **AIXI**: a theoretically optimal intelligent agent. AIXI chooses actions maximizing reward, where the future is predicted by a weighted mixture of all computable worlds with weights 2^(-K(world)). This is uncomputable, but provides an ideal theoretical benchmark for AI research.

NCD (Normalized Compression Distance) is a practical application of Kolmogorov complexity. What does NCD(x, y) ≈ 0 indicate?

Key Ideas

**K(x)** = length of the shortest program outputting x - an individual measure of string complexity
**Incompressibility**: most strings of length n have K(x) ≈ n - randomness and incompressibility are equivalent
**Uncomputability**: K(x) cannot be computed exactly, only approximated from above via real-world compressors
**Connection to Shannon**: E[K(x)] ≈ H(X) - equal on average, but K is an individual property

Вопросы для размышления

Kolmogorov complexity is uncomputable. How does this affect practical applications - what do we lose and what do we still gain from the theory?
If randomness is defined as incompressibility, what does this say about pseudo-random numbers from a PRNG? How random are they in the Kolmogorov sense?
Occam's Razor says the simplest explanation is the best. How does Kolmogorov complexity formalize and justify this principle?

Связанные уроки

alg-01-big-o

Information Theory

Kolmogorov Complexity

**Data compression**: the incompressibility theorem explains why ZIP cannot shrink every file - most files are already incompressible
**Machine learning**: MDL and regularization are practical approximations of the principle "short model = good model"
**Cryptography**: randomness tests for keys rely on incompressibility - a good key must be algorithmically random

Предварительные знания

Definition of Kolmogorov Complexity

Three independent discoverers

A string of 1,000,000 zeros has Kolmogorov complexity approximately:

Incompressibility and Randomness

Why is Kolmogorov complexity uncomputable?

Kolmogorov Complexity vs. Shannon Entropy

Aspect	Shannon Entropy	Kolmogorov Complexity
Object	Distribution p(X)	Specific string x
Probabilities	Required	Not needed
Computability	Computable	Uncomputable
Dependence on n	Averaged over n	Individual property
Application	Coding, channel capacity	Complexity theory, randomness

What is the key property that distinguishes Kolmogorov complexity from Shannon entropy?

Applications: Compression, ML, Randomness

Despite being uncomputable, Kolmogorov complexity has an enormous practical influence. It provides the theoretical foundation for data compression, machine learning, and randomness testing.

NCD (Normalized Compression Distance) is a practical application of Kolmogorov complexity. What does NCD(x, y) ≈ 0 indicate?

Key Ideas

**K(x)** = length of the shortest program outputting x - an individual measure of string complexity
**Incompressibility**: most strings of length n have K(x) ≈ n - randomness and incompressibility are equivalent
**Uncomputability**: K(x) cannot be computed exactly, only approximated from above via real-world compressors
**Connection to Shannon**: E[K(x)] ≈ H(X) - equal on average, but K is an individual property

Вопросы для размышления

Kolmogorov complexity is uncomputable. How does this affect practical applications - what do we lose and what do we still gain from the theory?
If randomness is defined as incompressibility, what does this say about pseudo-random numbers from a PRNG? How random are they in the Kolmogorov sense?
Occam's Razor says the simplest explanation is the best. How does Kolmogorov complexity formalize and justify this principle?

Связанные уроки

alg-01-big-o

Kolmogorov Complexity

Предварительные знания

Definition of Kolmogorov Complexity

Three independent discoverers

Incompressibility and Randomness

Kolmogorov Complexity vs. Shannon Entropy

Applications: Compression, ML, Randomness

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Kolmogorov Complexity

Предварительные знания

Definition of Kolmogorov Complexity

Three independent discoverers

Incompressibility and Randomness

Kolmogorov Complexity vs. Shannon Entropy

Applications: Compression, ML, Randomness

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки