Cryptography
Information Theory and Entropy
In 1949 Claude Shannon - the father of information theory - mathematically proved that there is exactly one unbreakable cipher. Not "hard to break", but provably, absolutely, mathematically impossible to break, even with infinite computational resources. That cipher is the One-Time Pad. But Shannon also proved why we cannot use it for everything - and this tension between perfect security and practicality drives all of modern cryptography.
- **Password strength estimation** - password managers (1Password, Bitwarden) evaluate password strength through entropy: "password123" has ~20 bits of entropy (crackable in seconds), while 20 random characters yield ~130 bits (immune to brute force). NIST recommends a minimum of 80 bits of entropy for passwords.
- **The VENONA project (1943-1980)** - the NSA and CIA decrypted thousands of Soviet intelligence messages because one-time pads were reused. A perfect cipher, undone by breaking a single rule, led to the exposure of atomic spies Klaus Fuchs and Julius Rosenberg.
- **Compression before encryption** - TLS and archive formats (ZIP, 7z) compress data before encrypting not only to save space: compression reduces redundancy, increases unicity distance, and complicates cryptanalysis. Shannon showed that incompressible data contains maximum entropy.
Предварительные знания
Shannon Entropy
In 1948 Claude Shannon published "A Mathematical Theory of Communication", in which he gave the first **mathematical definition of information**. Before Shannon, information was an intuitive concept. After him - a precise quantity measured in bits. The key insight: information is a measure of **uncertainty**. The more surprising an event, the more information it carries. Being told that the sun rose this morning conveys almost no information (probability close to 1). Being told that a lottery was won (probability 1/10,000,000) conveys a great deal of information.
**Entropy H(X)** is the *average* information across all possible outcomes of a random variable X. Formula: **H(X) = -sum(p(x) * log2(p(x)))** over all x. Entropy is maximum when all outcomes are equally likely (maximum uncertainty) and equals zero when one outcome has probability 1 (no uncertainty). For cryptography this is key: the entropy of a key measures **how many bits of real uncertainty** it contains.
**Why entropy matters for cryptography:** - **Keys:** a 128-bit AES key has entropy 128 bits if every bit is random. But the password "password123" has 88 bits of length (11 characters × 8 bits) yet only ~20 bits of entropy, because the characters are predictable. - **Random number generators:** if a PRNG is seeded with 32 bits of entropy, its entire output contains at most 32 bits of entropy - no matter how many bytes it produces. - **Ciphertext:** a good cipher produces output with maximum entropy (~8 bits/byte). If the entropy of the ciphertext is lower, the cipher is leaking structure from the plaintext.
An important result of Shannon's: **entropy is a lower bound on compression**. Data cannot be compressed to less than its entropy without loss of information. English text (~1.3 bits/symbol at 8 bits/symbol in ASCII) can be compressed roughly sixfold. Random data (8 bits/byte) cannot be compressed at all. The direct implication for cryptography: **ideal ciphertext is indistinguishable from random data and incompressible**.
A password consists of 8 characters, but the user always picks a word from a dictionary of 50,000 words. What is the entropy of this password?
Perfect Secrecy
In 1949 Shannon published a second revolutionary paper - "Communication Theory of Secrecy Systems". In it he applied the mathematical tools of information theory to cryptography for the first time and formulated the concept of **perfect secrecy**. The definition is rigorous: a cipher has perfect secrecy if for every message M and every ciphertext C it holds that **P(M | C) = P(M)**. In other words, knowing the ciphertext **does not change** the probability that any particular message was encrypted.
It is important to appreciate how strong this statement is. Perfect secrecy does not mean "hard to break" or "takes a long time". It means that **even with unlimited computational resources** the adversary cannot extract a single bit of information about the message from the ciphertext. A quantum computer, all the computers on Earth, a million years of computation - nothing helps. This is **information-theoretic** security, as opposed to the **computational** security of AES or RSA, which rely on the hardness of specific mathematical problems.
**Shannon's theorem on necessary conditions for perfect secrecy:** To achieve perfect secrecy it is **necessary** that: 1. **Key space ≥ message space** - there must be at least as many keys as possible messages. In practice: key length ≥ message length. 2. **Each key is used exactly once** - reuse of a key completely destroys secrecy. 3. **The key is chosen uniformly at random** - if some keys are more probable, the adversary can exploit that. 4. **The key is independent of the message** - the choice of key must not depend on the content of the message. These conditions are not merely sufficient - they are **necessary**. They cannot be circumvented.
Shannon's result imposes a fundamental constraint: **perfect secrecy requires a key at least as long as the message**. This makes it impractical for most applications. Encrypting a gigabyte of data with perfect secrecy requires a gigabyte-long key that must be delivered securely to the recipient. But if a secure channel exists for a gigabyte-long key, why is encryption needed at all - just send the message over that channel!
A cipher has perfect secrecy. What does this mean for an adversary who has intercepted the ciphertext?
One-Time Pad (OTP)
The One-Time Pad is the only known cipher with **provable perfect secrecy**. The idea is simple to the point of elegance: each bit of the message is XOR-ed with the corresponding bit of a **random** key of equal length. The cipher was patented by Gilbert Vernam in 1917 (Vernam cipher), and its perfect secrecy was proved by Shannon in 1949. How it works: C = M XOR K; decryption: M = C XOR K. The key property of XOR: it perfectly masks the original bit whenever the key bit is random.
The last lines of code above demonstrate the essence of perfect secrecy: for any ciphertext and any hypothetical message, **there exists a key** that maps the ciphertext to that message. An adversary who has intercepted the ciphertext cannot know whether "Attack at dawn" or "Nothing here!" was encrypted - both are equally likely. This is P(M|C) = P(M) in practice.
**VENONA: the catastrophe of key reuse** During World War II, Soviet intelligence used One-Time Pads to encrypt messages. But due to a shortage of key material, some pad pages were **reused**. From 1943 to 1980 the VENONA project (NSA/CIA) broke these messages: If K is used twice: - C1 = M1 XOR K - C2 = M2 XOR K - C1 XOR C2 = M1 XOR M2 (the key cancels out!) M1 XOR M2 retains the structure of both messages. Using frequency analysis and known fragments (names, dates, standard phrases), cryptanalysts recovered hundreds of messages. Spy networks were exposed, including atomic spies.
**Why OTP is impractical for everyday use:** 1. **Key length = message length.** To encrypt 1 GB of data, a 1 GB random key is needed. 2. **Key delivery is the same problem.** If there is a secure channel for a 1 GB key, the message itself could be sent over that channel. 3. **True randomness is expensive.** os.urandom() uses a hardware generator, but its throughput is limited. 4. **Single use.** Each bit of the key is used exactly once, then destroyed. Storing huge volumes of key material is a significant challenge. 5. **No authentication.** Basic OTP provides no protection against modification: an adversary can XOR known bits to alter the message.
Despite its impracticality, OTP has been used - and is still used - in situations where cost is no object and security is critical. The "red phone" (the direct Washington-Moscow hotline) used OTP. The diplomatic channels of some countries still use OTP for their highest-classification messages. Keys are delivered by diplomatic couriers in physical containers.
Soviet intelligence used One-Time Pads, yet their messages were broken by the VENONA project. Why?
Unicity Distance
OTP with a key as long as the message is the only perfectly secret cipher. But real ciphers use short keys (128-256 bits). This raises the question: **how much ciphertext must be intercepted to theoretically identify a unique key?** This leads to the concept of **unicity distance** - the minimum length of ciphertext at which the key is determined uniquely.
Shannon derived the formula: **U = H(K) / D**, where U is the unicity distance, H(K) is the entropy of the key, and D is the **redundancy** of the language in bits per symbol. Redundancy D = R − r, where R is the maximum entropy (log2 of alphabet size) and r is the real entropy of the language. For English: R = log2(26) = 4.7 bits, r = 1.0-1.5 bits/symbol, D = 3.2-3.7 bits/symbol. Redundancy is the structure present in the language (letter frequencies, grammatical rules, limited vocabulary) that allows distinguishing meaningful text from random data.
Unicity distance is a **theoretical** measure. It says: "starting from this length, the key *in principle* may be determined uniquely". But this does not mean it can *in practice* be found. For AES-128 the unicity distance is about 19 bytes, yet no known method can actually determine an AES key even from terabytes of intercepted text. The difference lies between information-theoretic and computational security.
**Practical significance of unicity distance:** - **Small U (< typical message length):** most intercepted messages are long enough for theoretical key determination. Security of the cipher depends entirely on computational hardness of the search. - **Large U (> typical message length):** even theoretically the key is not uniquely determined. The cipher retains some information-theoretic security margin. - **U = infinity (OTP):** the key is never determined. Perfect secrecy. - **Higher language redundancy D → smaller U:** structured data (text, code) "reveals" the key faster than compressed data. This is why compressing before encrypting increases U.
Unicity distance explains why **compressing data before encryption** increases security: compressed data has entropy close to maximum (r close to R), redundancy D approaches zero, and U approaches infinity. It also explains why classical ciphers (Caesar, Vigenère, simple substitution) are so easily broken: their small key spaces combined with the high redundancy of natural language give a tiny U - 2 to 25 characters.
One-Time Pad is the ideal cipher and should be used everywhere - it is the only unbreakable one
OTP is perfect in theory but impractical: the key must be as long as the message, and delivering it securely is the same problem as delivering the message itself. Modern cryptography consciously trades perfect secrecy for computational security with short keys.
Shannon's theorem proves: perfect secrecy REQUIRES a key no shorter than the message. This is not a technical limitation that can be engineered away - it is a mathematical fact. If a key the size of the message can be securely delivered, encryption is unnecessary. AES-256 with a 256-bit key provides computational security for messages of any size, and no known attack can break it. Practical security matters more than theoretical perfection.
The unicity distance for AES-128 encrypting English text is approximately 19 bytes. Does this mean AES-128 is insecure for messages longer than 19 bytes?
Key Ideas
- **Shannon entropy H(X) = -sum(p(x) * log2(p(x)))** measures real uncertainty: a fair coin = 1 bit, a die = 2.58 bits, English text = 1.0-1.5 bits/symbol. The key cryptographic takeaway: the entropy of a key determines its true strength, not its bit-length.
- **Perfect secrecy P(M|C) = P(M)** means the ciphertext contains not a single bit of information about the message. Shannon's theorem: to achieve this, the key must be at least as long as the message, used only once, and uniformly random.
- **One-Time Pad** is the only provably unbreakable cipher: C = M XOR K with a uniformly random key of message length. Its impracticality (the key delivery problem) led to the deliberate trade-off toward computational security.
- **Unicity distance U = H(K) / D** indicates how much ciphertext is needed to theoretically determine the key. For OTP it is infinite (perfect secrecy); for AES-128 it is only 19 bytes, but computational security compensates. This trade-off between ideal security and practicality, which Shannon proved necessary, defines the architecture of all modern cryptography.
Related Topics
Shannon's information theory is the foundation on which all cryptography rests - from key generation to evaluating cipher strength:
- Probability and Randomness — Entropy is defined through probabilities, and perfect secrecy requires truly random keys. Random number generators are the practical side of Shannon's theory: their quality directly determines the entropy of the keys they produce.
- Introduction to Cryptography — Information theory formalizes what was intuitive in the introduction: why some ciphers are strong and others are not, and what 'secure cipher' means mathematically. Shannon gave the first rigorous definition of cryptographic security.
- Classical Cryptanalysis — Unicity distance explains why classical ciphers (Caesar, Vigenère, simple substitution) are so easily broken: their small key spaces combined with the high redundancy of natural language give a tiny U.
- AES — AES is the practical answer to Shannon's theorem: since perfect secrecy is impractical, AES provides computational security with a short 128/256-bit key. Its unicity distance is small, but computational hardness compensates.
Вопросы для размышления
- Shannon proved that perfect secrecy requires a key no shorter than the message. Does this mean that all real ciphers (AES, RSA) are merely 'temporarily secure' and will eventually be broken? Or can computational security be sufficient indefinitely?
- If data is compressed before encryption to maximum entropy (eliminating all redundancy), the unicity distance approaches infinity. Why does this not make compression-then-encryption equivalent to perfect secrecy?
- The VENONA project showed that even a perfect cipher is vulnerable when its usage rules are violated. What other 'human factors' can destroy a theoretically secure system?