Information projection

In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is

$$p^{*} = \underset{p \in P}{\arg\min}\; D_{KL}(p \,\|\, q),$$

where $D_{KL}$ is the Kullback–Leibler divergence from q to p. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection $p^{*}$ is the "closest" distribution to q among all the distributions in P.

The I-projection is useful in setting up information geometry, notably because of the following inequality, valid for every $p \in P$ when P is convex:^[1] $D_{KL}(p \,\|\, q) \geq D_{KL}(p \,\|\, p^{*}) + D_{KL}(p^{*} \,\|\, q)$. This inequality can be interpreted as an information-geometric version of the Pythagorean theorem (with an inequality in place of the equality), in which the KL divergence plays the role of squared Euclidean distance.

Since $D_{KL}(p \,\|\, q) \geq 0$ and is continuous in p, if P is closed and non-empty then the optimization problem above has at least one minimizer. Furthermore, if P is convex, then the minimizing distribution is unique.

The reverse I-projection, also known as the moment projection or M-projection, is

$$p^{*} = \underset{p \in P}{\arg\min}\; D_{KL}(q \,\|\, p).$$
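The I-projection and its Pythagorean inequality can be checked numerically on a small example. The sketch below is illustrative only: it assumes a four-point alphabet, a made-up distribution q, and the convex set P of all distributions with a prescribed mean. For a mean constraint of this kind, the I-projection is known to be an exponential tilting of q, so the code finds the tilting parameter by bisection. The names `kl`, `tilt` and `i_projection_mean` are hypothetical helpers, not part of any library.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for discrete distributions, taking 0 * log(0 / q) as 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def tilt(q, x, lam):
    """Exponentially tilted distribution, proportional to q_i * exp(lam * x_i)."""
    logits = lam * x
    w = q * np.exp(logits - logits.max())  # subtract the max for numerical stability
    return w / w.sum()

def i_projection_mean(q, x, m, lo=-50.0, hi=50.0, iters=200):
    """I-projection of q onto P = {p : sum_i p_i * x_i = m}.

    The mean of the tilted distribution increases with lam, so bisection
    on lam finds the member of the tilted family that lies in P.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(q, x, mid) @ x < m:
            lo = mid
        else:
            hi = mid
    return tilt(q, x, 0.5 * (lo + hi))

# Assumed toy data: alphabet {0, 1, 2, 3}, q has mean 1, P requires mean 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
q = np.array([0.4, 0.3, 0.2, 0.1])
p_star = i_projection_mean(q, x, m=2.0)

# Another member of P (its mean is also 2), used to test the inequality.
p = np.array([0.0, 0.5, 0.0, 0.5])

lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)
print("p* =", p_star)
print("D(p||q) =", lhs, " D(p||p*) + D(p*||q) =", rhs)
```

Because this particular P is defined by a linear (mean) constraint, the two printed quantities should agree up to rounding, which is the equality case of the inequality. For a general convex P only the inequality is guaranteed.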

Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection exhibit different behavior. For the I-projection, $p(x)$ will typically under-estimate the support of $q(x)$ and lock onto one of its modes. This is because $p(x)$ must be 0 wherever $q(x) = 0$ for $D_{KL}(p \,\|\, q)$ to stay finite. For the M-projection, $p(x)$ will typically over-estimate the support of $q(x)$. This is because $p(x)$ must be positive wherever $q(x) > 0$ for $D_{KL}(q \,\|\, p)$ to stay finite.

The reverse I-projection plays a fundamental role in the construction of optimal e-variables. The concept of information projection can be extended to arbitrary f-divergences and other divergences.^[2]
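The contrast between the two projections can be illustrated with a brute-force search. The following is a minimal sketch under stated assumptions: q is a made-up two-mode mixture discretized on a grid, and P is a family of discretized Gaussians indexed by a coarse grid of means and standard deviations. All names and numbers are illustrative. This P is not convex, so the I-projection need not be unique; here the two modes are symmetric and either may be selected.

```python
import numpy as np

# Grid on which all distributions are discretized (an assumption of this sketch).
x = np.linspace(-10.0, 10.0, 401)

def normal_pmf(mu, sigma):
    """Gaussian shape evaluated on the grid and normalized to sum to 1."""
    w = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return w / w.sum()

def kl(p, q):
    """D_KL(p || q) for discrete distributions, taking 0 * log(0 / q) as 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Target q: equal mixture of two well-separated modes at -4 and +4.
q = 0.5 * normal_pmf(-4.0, 1.0) + 0.5 * normal_pmf(4.0, 1.0)

# Candidate set P: single Gaussians over a grid of (mu, sigma) values.
mus = np.linspace(-6.0, 6.0, 49)
sigmas = np.linspace(0.5, 6.0, 56)
family = [(mu, s, normal_pmf(mu, s)) for mu in mus for s in sigmas]

# I-projection: minimize D_KL(p || q) over p in P.
mu_i, sigma_i, _ = min(family, key=lambda t: kl(t[2], q))

# M-projection: minimize D_KL(q || p) over p in P.
mu_m, sigma_m, _ = min(family, key=lambda t: kl(q, t[2]))

print("I-projection: mu =", mu_i, "sigma =", sigma_i)
print("M-projection: mu =", mu_m, "sigma =", sigma_m)
```

In this setup the I-projection should come out as a narrow Gaussian sitting on one of the two modes, while the M-projection should be a wide Gaussian centered between them that covers both.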

References

[1] Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2 ed.). Hoboken, New Jersey: Wiley Interscience. p. 367 (Theorem 11.6.1).
[2] Nielsen, Frank (2018). "What is... an information projection?" (PDF). Notices of the American Mathematical Society. 65 (3): 321–324. doi:10.1090/noti1647.

K. Murphy, "Machine Learning: a Probabilistic Perspective", The MIT Press, 2012.
