Information Theory
Slepian-Wolf Theorem
Цели урока
- State the SW problem and the three constraints of the rate region
- Explain the random binning proof
- Describe syndrome coding (DISCUS) as a practical SW realization
- Generalize to K sources and connect to the MAC channel
Предварительные знания
- Conditional and joint entropy H(X|Y), H(X,Y)
- AEP and the method of typical sequences
- Random coding with binning techniques
- Multiterminal source-coding setup with two correlated sources
Two independent encoders do not communicate. The decoder combines their outputs. How does this allow achieving the H(X,Y) bound - the same as joint encoding?
- ZigBee sensor network: SW coding saves 40-75% of node battery energy
- DISCOVER video codec: encoding without reference frames via SW/WZ syndromes
- LoRaWAN IoT: 500 sensors with 0.9 correlation - potential 5x savings
- Federated learning: gradient transmission via SW principle, 10-50x compression
1973: a Paradoxical Theorem
David Slepian and Jack Wolf publish 'Noiseless Coding of Correlated Information Sources' in IEEE Transactions on Information Theory in 1973. The result initially seemed paradoxical: how can independent encoders achieve what a joint encoder achieves? The answer is in the joint decoder: it uses the X-Y correlation even though the encoders did not know about it. The theorem remained theoretical for 30 years. Practical codes: DISCUS (Pradhan and Ramchandran, 2003) via LDPC and turbo codes. DISCOVER (2006) - first video codec built on SW/WZ. Today SW is used in sensor networks, video coding, and federated learning.
The Slepian-Wolf Problem
2006. A ZigBee sensor network of 100 nodes collects temperature readings in an industrial space. Each node transmits independently - no coordination bus makes sense. But the data is correlated: neighboring sensors show similar values. Can encoding be done independently while achieving the same compression as joint encoding? Slepian and Wolf proved in 1973: yes.
ZigBee Sensor Network: SW in Practice
Network of N temperature sensors in an industrial space. Sensor i reads T_i = T_avg + delta_i, where delta_i ~ N(0, sigma^2). Neighboring sensors correlate: corr(delta_i, delta_j) = exp(-d_ij/L). At N=100, sigma=2 C, L=10m: H(T1,...,T100) << sum H(T_i). SW theorem: each sensor transmits H(T_i | T_rest) ~ 0.5 bits per symbol instead of H(T_i) ~ 2 bits. Battery savings: 75%.
SW theorem is part of the broader network information theory: multiple encoders, one decoder. The distinction from MAC channel: the MAC is about channel coding, SW is about source coding. Duality: MAC channel <-> SW coding. The SW rate region mirrors the MAC capacity region with I replaced by H.
What does the SW theorem guarantee for independent encoding of X and Y?
SW: independent encoders, joint decoder. Sum rate H(X,Y) suffices. Compare: independent compression to H(X)+H(Y). Savings = I(X;Y) = H(X)+H(Y)-H(X,Y). High correlation (large I(X;Y)) gives significant savings.
Random Binning: the Proof
The SW proof is elegant. Key idea: random hashing (random binning) replaces an explicit code construction. Encoder X does not know Y, but with high probability its bin contains exactly one sequence X^n that is jointly typical with the received Y^n.
Why does P(type 2 error) decrease when R_X > H(X|Y)?
Expected number of wrong jointly typical sequences in encoder X's bin: 2^{nH(X|Y)} / 2^{nR_X} = 2^{n(H(X|Y)-R_X)}. At R_X > H(X|Y) this is < 1 - no false typical element with high probability. The decoder finds the unique jointly typical (x^n, y^n).
Applications: Sensor Networks and Distributed Systems
The SW theorem is the theoretical ideal for distributed compression. Practical realizations use syndrome coding: instead of an explicit SW code, a parity-check matrix (as in LDPC) acts as the hash. This is DISCUS, turbo-SW, LDPC-SW - all the same idea: syndrome = bin hash.
DISCUS: Syndrome-Based SW Coding
DISCUS (Distributed Source Coding Using Syndromes, Pradhan and Ramchandran, 2003): the LDPC syndrome of x^n serves as the SW hash. Number of syndrome bits: nH(X|Y). Decoder: BP on a joint factor graph with side information Y^n. Wyner-Ziv video coding built on DISCUS achieved 1-2 dB below H.264 at 50x simpler encoding.
Federated learning through the SW lens: each client has a local gradient - correlated with the global gradient. If the client knew the global direction (side information at decoder = server), it would need to transmit only H(grad_local | grad_global) bits. This is precisely the SW/WZ problem for federated learning. PowerGossip (2021) uses random projections of gradients - effectively CS+SW - to save 10-50x in communication bandwidth.
Why is syndrome coding (DISCUS) a practical realization of the SW theorem?
In SW: hash = bin. In DISCUS: syndrome = bin index. Number of syndrome bits: nH(X|Y) = SW lower bound. Decoder with Y^n runs BP decoding: finds x^n in the syndrome kernel closest to the prediction from Y^n. This is a precise realization of random binning through a linear code.
K Sources and Continuous Alphabets
The two-source SW theorem generalizes to K sources. The SW region: for each subset S, the sum rate >= H(X_S | X_{S^c}). This is 2^K - 1 constraints - the same structure as the MAC capacity region. For continuous sources: the same with differential entropy h(X|Y).
LoRaWAN IoT standard for agricultural sensors: 500 soil moisture sensors with 0.9 correlation between neighbors. Without SW: 500 * H(X_i) = 500 * 4 bits = 2000 bits/second. With K-SW: H(X_1,...,X_500) ~ 100 * 4 bits = 400 bits/second (effective dimensionality ~100 out of 500). Savings 5x - decisive for battery-powered devices with 10-year lifetime.
What is the minimum achievable sum rate for K correlated sources in SW?
Sum rate >= H(X_1,...,X_K). Achievable by sequential 'onion peeling' decoding. Compare to independent coding: sum >= sum H(X_i) = H(X_1,...,X_K) + I(X_1;...;X_K). Savings = total mutual information across all K sources.
Connection to Other Topics
The SW theorem uses: AEP and typical sequences (proof), conditional entropy H(X|Y) (rate lower bound), and random coding (key technique). Duality with MAC channel: the SW region mirrors the MAC region with I(X_S;Y|X_{S^c}) replaced by H(X_S|X_{S^c}). Extensions: WZ (lossy), multiuser SW (K sources), distributed video coding. Connection to ML: SW principle in federated learning - distributed training without sharing all data.
- Wyner-Ziv Theorem — Lossy generalization of Slepian-Wolf with side information available only at the decoder
- Joint and Conditional Entropy — Achievable rates are exactly the conditional entropies H(X|Y) and H(Y|X) on the boundary
- Network Information Theory — Multiterminal source coding extends Slepian-Wolf to K correlated sources
Итоги
- SW: two independent encoders achieve sum rate H(X,Y) via joint decoding
- Region: R_X >= H(X|Y), R_Y >= H(Y|X), R_X+R_Y >= H(X,Y)
- Proof: random binning + typical sets; error ~ 2^{-n(R_X - H(X|Y))}
- DISCUS: LDPC syndrome = bin hash; practical SW realization via BP
Вопросы для размышления
- Why does joint decoding allow independent encoders to achieve maximum compression efficiency?
- What is the analogy between the SW theorem for sources and the MAC channel for channels?
- How does syndrome coding realize the 'random hash' from the SW proof?