Data processing inequality

The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.[1]

Statement

Let three random variables form the Markov chain X → Y → Z, implying that the conditional distribution of Z depends only on Y; in other words, Z is conditionally independent of X given Y. Specifically, we have such a Markov chain if the joint probability mass function can be written as

p(x,y,z)=p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y)
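
As a concrete illustration, the factorization can be checked numerically. The following Python sketch is a minimal example using NumPy; the binary alphabets and all probability values below are illustrative assumptions, not part of the article. It builds the joint pmf from the three factors and confirms the defining Markov property p(z|x,y) = p(z|y):

    import numpy as np

    # Illustrative (assumed) distributions for a binary chain X -> Y -> Z.
    p_x = np.array([0.5, 0.5])                  # p(x)
    p_y_given_x = np.array([[0.9, 0.1],         # p(y|x), rows indexed by x
                            [0.2, 0.8]])
    p_z_given_y = np.array([[0.7, 0.3],         # p(z|y), rows indexed by y
                            [0.1, 0.9]])

    # Joint pmf p[x, y, z] built from the Markov factorization.
    p = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]
    assert np.isclose(p.sum(), 1.0)             # a valid joint distribution

    # Markov property: p(z | x, y) equals p(z | y) for every value of x.
    p_z_given_xy = p / p.sum(axis=2, keepdims=True)
    for x in range(2):
        assert np.allclose(p_z_given_xy[x], p_z_given_y)

Nothing here depends on the alphabets being binary; any row-stochastic conditional matrices would serve equally well.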

In this setting, no processing of Y, deterministic or random, can increase the information that Y contains about X. In terms of mutual information, this can be written as:

I(X;Y) ≥ I(X;Z),

with equality I(X;Y) = I(X;Z) if and only if I(X;Y|Z) = 0. That is, Z and Y contain the same information about X, and X → Z → Y also forms a Markov chain.[2]
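
The inequality can be verified numerically on the same kind of toy chain. The sketch below (again with assumed, illustrative distributions) computes I(X;Y) and I(X;Z) through the entropy identity I(A;B) = H(A) + H(B) - H(A,B) and checks that processing Y into Z has not increased the information about X:

    import numpy as np

    # Same assumed toy chain as above: binary X -> Y -> Z.
    p_x = np.array([0.5, 0.5])
    p_y_given_x = np.array([[0.9, 0.1], [0.2, 0.8]])
    p_z_given_y = np.array([[0.7, 0.3], [0.1, 0.9]])
    p = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]

    def entropy(q):
        """Shannon entropy in bits of a pmf given as an array of any shape."""
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    def mutual_information(p_ab):
        """I(A;B) = H(A) + H(B) - H(A,B) for a 2-D joint pmf."""
        return entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0)) - entropy(p_ab)

    i_xy = mutual_information(p.sum(axis=2))    # I(X;Y), marginalizing out Z
    i_xz = mutual_information(p.sum(axis=1))    # I(X;Z), marginalizing out Y
    print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
    assert i_xy >= i_xz - 1e-12                 # the data processing inequality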

Proof

One can apply the chain rule for mutual information to obtain two different decompositions of I(X;Y,Z):

I(X;Z) + I(X;Y|Z) = I(X;Y,Z) = I(X;Y) + I(X;Z|Y)

By the Markov relationship X → Y → Z, we know that X and Z are conditionally independent given Y, which means the conditional mutual information I(X;Z|Y) = 0. The data processing inequality then follows from the non-negativity of conditional mutual information, I(X;Y|Z) ≥ 0.
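
Each term in this argument can be computed explicitly. The following sketch (reusing the assumed toy chain from the sketches above) evaluates both decompositions of I(X;Y,Z) from joint entropies and confirms that I(X;Z|Y) vanishes while I(X;Y|Z) is non-negative:

    import numpy as np

    # Same assumed toy chain as above: binary X -> Y -> Z.
    p_x = np.array([0.5, 0.5])
    p_y_given_x = np.array([[0.9, 0.1], [0.2, 0.8]])
    p_z_given_y = np.array([[0.7, 0.3], [0.1, 0.9]])
    p = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]

    def entropy(q):
        """Shannon entropy in bits of a pmf given as an array of any shape."""
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    h_x = entropy(p.sum(axis=(1, 2)))
    h_y = entropy(p.sum(axis=(0, 2)))
    h_z = entropy(p.sum(axis=(0, 1)))
    h_xy = entropy(p.sum(axis=2))
    h_xz = entropy(p.sum(axis=1))
    h_yz = entropy(p.sum(axis=0))
    h_xyz = entropy(p)

    i_x_yz = h_x + h_yz - h_xyz                 # I(X;Y,Z)
    i_xy = h_x + h_y - h_xy                     # I(X;Y)
    i_xz = h_x + h_z - h_xz                     # I(X;Z)
    i_xz_given_y = h_xy + h_yz - h_y - h_xyz    # I(X;Z|Y)
    i_xy_given_z = h_xz + h_yz - h_z - h_xyz    # I(X;Y|Z)

    assert np.isclose(i_xz + i_xy_given_z, i_x_yz)  # first decomposition
    assert np.isclose(i_xy + i_xz_given_y, i_x_yz)  # second decomposition
    assert np.isclose(i_xz_given_y, 0.0)        # X and Z independent given Y
    assert i_xy_given_z >= -1e-12               # non-negativity of I(X;Y|Z)
    assert i_xy >= i_xz - 1e-12                 # hence the inequality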

References

  1. Beaudry, Normand J.; Renner, Renato (2012). "An intuitive proof of the data processing inequality". Quantum Information & Computation. 12 (5–6): 432–441. arXiv:1107.0740. doi:10.26421/QIC12.5-6-4.
  2. Cover, Thomas M.; Thomas, Joy A. (2012). Elements of Information Theory. John Wiley & Sons.
