Information Theory

Wyner-Ziv Theorem

Цели урока

State the WZ theorem and explain why side information is free for the Gaussian source
Describe the nested lattice construction for WZ coding
Connect WZ to Distributed Video Coding and the DISCOVER codec
Explain the connection between the Information Bottleneck and the WZ theorem

Предварительные знания

Slepian-Wolf bounds for lossless distributed source coding
Rate-distortion function R(D) and mutual information minimization
Side information at the decoder and Markov chain X - Y - Z
Nested lattice quantization and lossy binning

How can an encoder that does not see side information Y code X as efficiently as if it did? Why are nested lattices needed?

H.264/AVC: each intra frame is a practical WZ problem with decoder-side context
DISCOVER DVC: IP cameras with WZ coding - 50x simpler encoding than H.264
VAE with beta-regularization: a direct neural implementation of WZ/IB
Federated learning: WZ principle for efficient gradient transmission

From SW to WZ: 3 Years and a Generalization

Slepian-Wolf (1973): lossless compression with side information. Wyner and Ziv extended the problem: what if distortion is allowed and side information is only at the decoder? Publication: 1976, IEEE Transactions on Information Theory. Shocking result for Gaussian sources: optimal rate equals the rate achievable with encoder-side Y as well. Practical codes were long absent. Zamir, Shamai, Erez (2002): nested lattices. Distributed Video Coding (DVC): first wide application of WZ. DISCOVER (EPFL, 2006): complete WZ video codec. Neural WZ codecs (2020+) surpassed DISCOVER. Tishby (1999): Information Bottleneck - WZ reformulated for ML.

The Wyner-Ziv Theorem: Side Information is Free

1976. Aaron Wyner and Jacob Ziv publish 'The Rate-Distortion Function for Source Coding with Side Information at the Decoder'. Main result for Gaussian sources: the optimal rate R*(D) = R_{X|Y}(D) - the same as if the encoder also saw the side information Y. Decoder-side information is equivalent to both-sides information. Paradox.

H.264/AVC intra frames: each frame is encoded with a reference frame available at the decoder. This is a WZ problem: the encoder cannot use the reference frame (it is needed for subsequent decoding on a different device), but the decoder has it. The WZ theorem says: the optimal rate with this setup equals the rate achievable if the encoder also saw the reference. H.264 approximates this bound via motion compensation.

WZ gain: G = R(D) - R*(D) >= 0. For Gaussian X with MSE: G = 1/2 * log(1 + P/sigma^2) as D -> 0. This equals I(X;Y). High correlation (P >> sigma^2): G -> 1/2 * log(P/sigma^2). Zero correlation: G=0, side information is useless.

What does the WZ result 'the encoder only pays for noise N' mean for the Gaussian source?

X = Y + N. Decoder knows Y, does not know N. Optimal encoder: encode only N via rate-distortion. R*(D) = 1/2*log(sigma^2/D) is R(D) for N~N(0,sigma^2). Side information Y requires no additional bits from the encoder, even though the encoder does not see it.

Nested Lattice Codes: WZ Construction

The WZ theorem is an existence result; nested lattice codes are a construction. Proposed by Zamir, Shamai, Erez (2002). Idea: a fine lattice for precise quantization, a coarse lattice for binning. The decoder with side information resolves the bin ambiguity.

What are the roles of the fine and coarse lattices in WZ coding?

Analogy with SW: fine lattice Lambda_f = precise quantization; coarse Lambda_c/Lambda_f = binning. Encoder transmits bin index = log|Lambda_c/Lambda_f| = nR*(D) bits. Decoder knows Y, finds the correct Lambda_f representative inside the Lambda_c bin.

WZ in Video Coding: DISCOVER

Distributed Video Coding (DVC) is a class of video codecs where encoding is simple and decoding is complex. This inverts H.264: the H.264 encoder is complex (motion estimation), the decoder simple. DVC applies where the encoder has limited computation: IP cameras, mobile video capture.

DISCOVER vs H.264

DISCOVER (2006, EPFL/IST): full DVC codec based on WZ. Format: 4:2:0, QCIF (176x144), 16x16 blocks. Comparison at 256 kbps: H.264 Intra PSNR = 32.5 dB, DISCOVER PSNR = 30.8 dB (1.7 dB worse). But: DISCOVER encoding complexity is 50x lower. Application: low-cost IP cameras, battery-powered sensors, body cameras without GPU.

Neural WZ codecs (2020-2024): replacing syndrome BP decoding with a neural decoder. Architecture: side information Y_side + syndrome -> CNN decoder -> reconstructed frame. Advantage: training on real data captures the statistics of specific video content. Google Brain (2022): neural DVC outperforms DISCOVER by 2-3 dB at the same rate.

Why is encoding simpler and decoding more complex in DVC compared to H.264?

H.264 encoder: motion estimation = expensive operation (block search). H.264 decoder: simple motion vector application. DVC: encoder = DCT + syndrome (cheap). Decoder = side information via interpolation + BP syndrome decoding (expensive). WZ theorem guarantees that with good side information the rate approaches the optimum.

WZ Extensions and the Information Bottleneck

WZ generalizes to multiple side information sources and the lossy case for the Heegard-Berger problem. Most importantly: the Information Bottleneck (Tishby 1999) is a reformulation of WZ for machine learning. IB minimizes I(X;Z) while maximizing I(Z;Y) - this is the WZ problem with a different sign.

Deep Variational Information Bottleneck (Alemi et al., 2017): a neural realization of IB. Encoder: f_theta(x) -> z. Loss: I(X;Z) - beta * I(Z;Y) estimated via variational lower bounds. This is WZ coding: find a minimal representation Z containing information about Y. VAE (beta=1) is a special case. Increasing beta empirically improves robustness to adversarial examples.

How is the Information Bottleneck (Tishby) related to the Wyner-Ziv theorem?

WZ: min I(X;U) - I(U;Y) subject to E[d(X,X-hat)] <= D. IB: min I(X;Z) - beta*I(Z;Y). At beta=1: structurally identical. This explains the success of VAE (beta=1): it solves a WZ problem in data space. The Lagrangian connection: both are tradeoffs between compression and relevance.

Connection to Other Topics

The WZ theorem synthesizes SW (lossless with side information) and R(D) theory (lossy compression). Gaussian case: R*(D) = 1/2*log(sigma_N^2/D) - only noise is coded. Nested lattices: practical construction. In ML: IB principle = WZ at beta=1; VAE = neural WZ encoder. DVC uses WZ for video with decoder-side information. Neural WZ codecs: encoder learns to minimize R*(D) without explicit lattices.

Slepian-Wolf Theorem — Lossless prototype that Wyner-Ziv extends with a distortion constraint
Rate-Distortion Theory — Wyner-Ziv adds side information at the decoder to the classical R(D) setup
Information Theory in Deep Learning — Information Bottleneck at beta=1 recovers the Wyner-Ziv encoding objective

Итоги

WZ: encoder without Y, decoder with Y. R*(D) = min I(X;U) - I(U;Y) over Markov U-X-Y
Gaussian WZ: R*(D) = 1/2*log(sigma_N^2/D) - only noise N coded, Y is free
Nested lattices Lambda_f ⊂ Lambda_c: fine = quantization, coarse = binning
WZ at D=0: equals SW. Connection to IB: WZ = IB at beta=1

Вопросы для размышления

Why is 'side information is free' in WZ a fundamentally different result from 'correlation reduces entropy'?
How do nested lattices solve the problem of the encoder not seeing Y but needing to identify the right bin?
What is the practical tradeoff in DVC: why do IP cameras choose this codec over H.264?

Связанные уроки

it-27 — WZ at D=0 reduces to SW: R*(0) = H(X|Y)
it-21 — WZ is rate-distortion with side information, generalizing both theories
it-29 — WZ principles appear in distributed MIMO systems