Information Theory

Slepian-Wolf Theorem

Цели урока

State the SW problem and the three constraints of the rate region
Explain the random binning proof
Describe syndrome coding (DISCUS) as a practical SW realization
Generalize to K sources and connect to the MAC channel

Предварительные знания

Conditional and joint entropy H(X|Y), H(X,Y)
AEP and the method of typical sequences
Random coding with binning techniques
Multiterminal source-coding setup with two correlated sources

Two independent encoders do not communicate. The decoder combines their outputs. How does this allow achieving the H(X,Y) bound - the same as joint encoding?

ZigBee sensor network: SW coding saves 40-75% of node battery energy
DISCOVER video codec: encoding without reference frames via SW/WZ syndromes
LoRaWAN IoT: 500 sensors with 0.9 correlation - potential 5x savings
Federated learning: gradient transmission via SW principle, 10-50x compression

1973: a Paradoxical Theorem

David Slepian and Jack Wolf publish 'Noiseless Coding of Correlated Information Sources' in IEEE Transactions on Information Theory in 1973. The result initially seemed paradoxical: how can independent encoders achieve what a joint encoder achieves? The answer is in the joint decoder: it uses the X-Y correlation even though the encoders did not know about it. The theorem remained theoretical for 30 years. Practical codes: DISCUS (Pradhan and Ramchandran, 2003) via LDPC and turbo codes. DISCOVER (2006) - first video codec built on SW/WZ. Today SW is used in sensor networks, video coding, and federated learning.

The Slepian-Wolf Problem

2006. A ZigBee sensor network of 100 nodes collects temperature readings in an industrial space. Each node transmits independently - no coordination bus makes sense. But the data is correlated: neighboring sensors show similar values. Can encoding be done independently while achieving the same compression as joint encoding? Slepian and Wolf proved in 1973: yes.

ZigBee Sensor Network: SW in Practice

Network of N temperature sensors in an industrial space. Sensor i reads T_i = T_avg + delta_i, where delta_i ~ N(0, sigma^2). Neighboring sensors correlate: corr(delta_i, delta_j) = exp(-d_ij/L). At N=100, sigma=2 C, L=10m: H(T1,...,T100) << sum H(T_i). SW theorem: each sensor transmits H(T_i | T_rest) ~ 0.5 bits per symbol instead of H(T_i) ~ 2 bits. Battery savings: 75%.

SW theorem is part of the broader network information theory: multiple encoders, one decoder. The distinction from MAC channel: the MAC is about channel coding, SW is about source coding. Duality: MAC channel <-> SW coding. The SW rate region mirrors the MAC capacity region with I replaced by H.

What does the SW theorem guarantee for independent encoding of X and Y?

SW: independent encoders, joint decoder. Sum rate H(X,Y) suffices. Compare: independent compression to H(X)+H(Y). Savings = I(X;Y) = H(X)+H(Y)-H(X,Y). High correlation (large I(X;Y)) gives significant savings.

Random Binning: the Proof

The SW proof is elegant. Key idea: random hashing (random binning) replaces an explicit code construction. Encoder X does not know Y, but with high probability its bin contains exactly one sequence X^n that is jointly typical with the received Y^n.

Why does P(type 2 error) decrease when R_X > H(X|Y)?

Expected number of wrong jointly typical sequences in encoder X's bin: 2^{nH(X|Y)} / 2^{nR_X} = 2^{n(H(X|Y)-R_X)}. At R_X > H(X|Y) this is < 1 - no false typical element with high probability. The decoder finds the unique jointly typical (x^n, y^n).

Applications: Sensor Networks and Distributed Systems

The SW theorem is the theoretical ideal for distributed compression. Practical realizations use syndrome coding: instead of an explicit SW code, a parity-check matrix (as in LDPC) acts as the hash. This is DISCUS, turbo-SW, LDPC-SW - all the same idea: syndrome = bin hash.

DISCUS: Syndrome-Based SW Coding

DISCUS (Distributed Source Coding Using Syndromes, Pradhan and Ramchandran, 2003): the LDPC syndrome of x^n serves as the SW hash. Number of syndrome bits: nH(X|Y). Decoder: BP on a joint factor graph with side information Y^n. Wyner-Ziv video coding built on DISCUS achieved 1-2 dB below H.264 at 50x simpler encoding.

Federated learning through the SW lens: each client has a local gradient - correlated with the global gradient. If the client knew the global direction (side information at decoder = server), it would need to transmit only H(grad_local | grad_global) bits. This is precisely the SW/WZ problem for federated learning. PowerGossip (2021) uses random projections of gradients - effectively CS+SW - to save 10-50x in communication bandwidth.

Why is syndrome coding (DISCUS) a practical realization of the SW theorem?

In SW: hash = bin. In DISCUS: syndrome = bin index. Number of syndrome bits: nH(X|Y) = SW lower bound. Decoder with Y^n runs BP decoding: finds x^n in the syndrome kernel closest to the prediction from Y^n. This is a precise realization of random binning through a linear code.

K Sources and Continuous Alphabets

The two-source SW theorem generalizes to K sources. The SW region: for each subset S, the sum rate >= H(X_S | X_{S^c}). This is 2^K - 1 constraints - the same structure as the MAC capacity region. For continuous sources: the same with differential entropy h(X|Y).

LoRaWAN IoT standard for agricultural sensors: 500 soil moisture sensors with 0.9 correlation between neighbors. Without SW: 500 * H(X_i) = 500 * 4 bits = 2000 bits/second. With K-SW: H(X_1,...,X_500) ~ 100 * 4 bits = 400 bits/second (effective dimensionality ~100 out of 500). Savings 5x - decisive for battery-powered devices with 10-year lifetime.

What is the minimum achievable sum rate for K correlated sources in SW?

Sum rate >= H(X_1,...,X_K). Achievable by sequential 'onion peeling' decoding. Compare to independent coding: sum >= sum H(X_i) = H(X_1,...,X_K) + I(X_1;...;X_K). Savings = total mutual information across all K sources.

Connection to Other Topics

The SW theorem uses: AEP and typical sequences (proof), conditional entropy H(X|Y) (rate lower bound), and random coding (key technique). Duality with MAC channel: the SW region mirrors the MAC region with I(X_S;Y|X_{S^c}) replaced by H(X_S|X_{S^c}). Extensions: WZ (lossy), multiuser SW (K sources), distributed video coding. Connection to ML: SW principle in federated learning - distributed training without sharing all data.

Wyner-Ziv Theorem — Lossy generalization of Slepian-Wolf with side information available only at the decoder
Joint and Conditional Entropy — Achievable rates are exactly the conditional entropies H(X|Y) and H(Y|X) on the boundary
Network Information Theory — Multiterminal source coding extends Slepian-Wolf to K correlated sources

Итоги

SW: two independent encoders achieve sum rate H(X,Y) via joint decoding
Region: R_X >= H(X|Y), R_Y >= H(Y|X), R_X+R_Y >= H(X,Y)
Proof: random binning + typical sets; error ~ 2^{-n(R_X - H(X|Y))}
DISCUS: LDPC syndrome = bin hash; practical SW realization via BP

Вопросы для размышления

Why does joint decoding allow independent encoders to achieve maximum compression efficiency?
What is the analogy between the SW theorem for sources and the MAC channel for channels?
How does syndrome coding realize the 'random hash' from the SW proof?

Связанные уроки

it-28 — Wyner-Ziv generalizes SW to lossy coding with decoder-side information
it-23 — MAC channel and SW coding are dual multiuser problems
it-03 — conditional entropy H(X|Y) is the key SW bound