Bretagnolle–Huber inequality


In information theory, the Bretagnolle–Huber inequality bounds the total variation distance between two probability distributions $P$ and $Q$ by a concave and bounded function of the Kullback–Leibler divergence $D_{KL}(P \| Q)$. The bound can be viewed as an alternative to the well-known Pinsker's inequality: when $D_{KL}(P \| Q)$ is large (larger than 2, for instance[1]), Pinsker's inequality is vacuous, while the Bretagnolle–Huber bound remains bounded and hence non-vacuous. It is used in statistics and machine learning to prove information-theoretic lower bounds that rely on hypothesis testing.[2] (The Bretagnolle–Huber–Carol inequality is a related variant: a concentration inequality for multinomially distributed random variables that bounds the total variation distance.)

Formal statement

Preliminary definitions

Let $P$ and $Q$ be two probability distributions on a measurable space $(\mathcal{X}, \mathcal{F})$. Recall that the total variation distance between $P$ and $Q$ is defined by

d_{TV}(P,Q) = \sup_{A \in \mathcal{F}} \{ |P(A) - Q(A)| \}.

The Kullback–Leibler divergence is defined as follows:

D_{KL}(P \| Q) = \begin{cases} \int_{\mathcal{X}} \log\left(\frac{dP}{dQ}\right) dP & \text{if } P \ll Q, \\ +\infty & \text{otherwise.} \end{cases}

In the above, the notation $P \ll Q$ stands for absolute continuity of $P$ with respect to $Q$, and $\frac{dP}{dQ}$ stands for the Radon–Nikodym derivative of $P$ with respect to $Q$.
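For concreteness, the sketch below computes both quantities for two distributions with finite support. It is an illustration added here, not part of the cited sources, and the function names tv_distance and kl_divergence are arbitrary choices.

    import numpy as np

    def tv_distance(p, q):
        # Total variation distance between two discrete distributions:
        # sup_A |P(A) - Q(A)| = (1/2) * sum_x |p(x) - q(x)|.
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return 0.5 * np.abs(p - q).sum()

    def kl_divergence(p, q):
        # D_KL(P || Q) = sum_x p(x) log(p(x)/q(x)), with the convention 0 log 0 = 0,
        # and +infinity when P is not absolutely continuous with respect to Q.
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        if np.any((q == 0) & (p > 0)):
            return np.inf
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    p = [0.5, 0.5]   # a fair coin
    q = [0.8, 0.2]   # a biased coin
    print(tv_distance(p, q), kl_divergence(p, q))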

General statement

The Bretagnolle–Huber inequality says:

d_{TV}(P,Q) \le \sqrt{1 - \exp(-D_{KL}(P \| Q))} \le 1 - \frac{1}{2}\exp(-D_{KL}(P \| Q)).
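As a numerical illustration (not taken from the cited sources), the following sketch compares the Bretagnolle–Huber bound with Pinsker's bound $d_{TV}(P,Q) \le \sqrt{D_{KL}(P \| Q)/2}$ for a pair of Bernoulli distributions whose divergence is large; the chosen parameters are arbitrary.

    import numpy as np

    def kl_bernoulli(p1, p2):
        # D_KL(Ber(p1) || Ber(p2))
        return p1 * np.log(p1 / p2) + (1 - p1) * np.log((1 - p1) / (1 - p2))

    p1, p2 = 0.5, 0.999                            # means far apart, so the divergence is large
    tv = abs(p1 - p2)                              # exact total variation distance for Bernoullis
    kl = kl_bernoulli(p1, p2)

    bretagnolle_huber = np.sqrt(1 - np.exp(-kl))   # always below 1
    pinsker = np.sqrt(kl / 2)                      # exceeds 1 here, hence vacuous
    print(tv, bretagnolle_huber, pinsker)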

Alternative version

The following version is directly implied by the bound above, but some authors[2] prefer stating it this way. Let $A$ be any event. Then

P(A) + Q(\bar{A}) \ge \frac{1}{2}\exp(-D_{KL}(P \| Q))

where $\bar{A} = \Omega \setminus A$ is the complement of $A$. Indeed, by definition of the total variation distance, for any event $A$,

Q(A) - P(A) \le d_{TV}(P,Q) \le 1 - \frac{1}{2}\exp(-D_{KL}(P \| Q)) = Q(A) + Q(\bar{A}) - \frac{1}{2}\exp(-D_{KL}(P \| Q))

Rearranging, we obtain the claimed lower bound on $P(A) + Q(\bar{A})$.
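This event-wise form can be checked by brute force on a small finite space, as in the following sketch (an added illustration under the assumption of full support; the example distributions and tolerance are arbitrary).

    import itertools
    import numpy as np

    p = np.array([0.6, 0.3, 0.1])    # distribution P on {0, 1, 2}
    q = np.array([0.2, 0.3, 0.5])    # distribution Q on the same space

    kl = float(np.sum(p * np.log(p / q)))   # D_KL(P || Q); both supports are full
    lower = 0.5 * np.exp(-kl)

    # Enumerate every event A (every subset of the support) and verify the bound.
    for r in range(len(p) + 1):
        for A in itertools.combinations(range(len(p)), r):
            A = list(A)
            complement = [i for i in range(len(p)) if i not in A]
            assert p[A].sum() + q[complement].sum() >= lower - 1e-12
    print("P(A) + Q(complement of A) >= exp(-KL)/2 holds for every event A")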

Proof

We prove the main statement following the ideas in Tsybakov's book (Lemma 2.6, page 89),[3] which differ from the original proof[4] (see C. Canonne's note[1] for a modernized transcription of their argument). The proof is in two steps: 1. Prove, using Cauchy–Schwarz, that the total variation distance is related to the Bhattacharyya coefficient (the right-hand side of the inequality):

1 - d_{TV}(P,Q)^2 \ge \left(\int \sqrt{PQ}\right)^2

2. Prove by a clever application of Jensen’s inequality that

\left(\int \sqrt{PQ}\right)^2 \ge \exp(-D_{KL}(P \| Q))
  • Step 1:
First notice that
d_{TV}(P,Q) = 1 - \int \min(P,Q) = \int \max(P,Q) - 1
To see this, denote $A^* = \arg\max_{A \subseteq \Omega} |P(A) - Q(A)|$ and, without loss of generality, assume that $P(A^*) > Q(A^*)$ such that $d_{TV}(P,Q) = P(A^*) - Q(A^*)$. Then we can rewrite
d_{TV}(P,Q) = \int_{A^*} \max(P,Q) - \int_{A^*} \min(P,Q)
And then, adding and removing $\int_{\overline{A^*}} \max(P,Q)$ or $\int_{\overline{A^*}} \min(P,Q)$, we obtain both identities.
Then
1 - d_{TV}(P,Q)^2 = (1 - d_{TV}(P,Q))(1 + d_{TV}(P,Q)) = \int \min(P,Q) \int \max(P,Q) \ge \left(\int \sqrt{\min(P,Q)\max(P,Q)}\right)^2 = \left(\int \sqrt{PQ}\right)^2
because $PQ = \min(P,Q)\max(P,Q)$.
  • Step 2:
We write $(\cdot)^2 = \exp(2\log(\cdot))$ and apply Jensen's inequality:
\left(\int \sqrt{PQ}\right)^2 = \exp\left(2\log\left(\int \sqrt{PQ}\right)\right) = \exp\left(2\log\left(\int P\sqrt{\frac{Q}{P}}\right)\right) = \exp\left(2\log\left(\mathbb{E}_P\left[\sqrt{\left(\frac{P}{Q}\right)^{-1}}\right]\right)\right) \ge \exp\left(\mathbb{E}_P\left[\log\left(\frac{Q}{P}\right)\right]\right) = \exp(-D_{KL}(P \| Q))
Combining the results of steps 1 and 2 leads to the claimed bound on the total variation.
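Both intermediate inequalities can be observed numerically on discrete distributions. The sketch below is an added illustration (with arbitrary example distributions) that computes the Bhattacharyya coefficient $\int \sqrt{PQ}$ and checks steps 1 and 2 as well as the combined bound.

    import numpy as np

    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.3, 0.4, 0.3])

    tv = 0.5 * np.abs(p - q).sum()        # total variation distance
    bc = np.sqrt(p * q).sum()             # Bhattacharyya coefficient, i.e. the integral of sqrt(PQ)
    kl = float(np.sum(p * np.log(p / q)))

    assert 1 - tv**2 >= bc**2 - 1e-12               # step 1: 1 - d_TV^2 >= (int sqrt(PQ))^2
    assert bc**2 >= np.exp(-kl) - 1e-12             # step 2: (int sqrt(PQ))^2 >= exp(-KL)
    assert tv <= np.sqrt(1 - np.exp(-kl)) + 1e-12   # combined Bretagnolle–Huber bound
    print(tv, np.sqrt(1 - np.exp(-kl)))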

Examples of applications

Sample complexity of biased coin tosses

Source:[1] The question is: how many coin tosses do I need to distinguish a fair coin from a biased one? Assume you have two coins, a fair coin (Bernoulli distributed with mean $p_1 = 1/2$) and an $\varepsilon$-biased coin ($p_2 = 1/2 + \varepsilon$). Then, in order to identify the biased coin with probability at least $1 - \delta$ (for some $\delta > 0$), the number of coin tosses $n$ must satisfy

n \ge \frac{1}{2\varepsilon^2}\log\left(\frac{1}{2\delta}\right).

In order to obtain this lower bound, we impose that the total variation distance between the two sequences of $n$ samples is at least $1 - 2\delta$. This is because the total variation distance upper bounds the probability of under- or over-estimating the coins' means. Denote by $P_1^n$ and $P_2^n$ the respective joint distributions of the $n$ coin tosses for each coin. Then we have

(1 - 2\delta)^2 \le d_{TV}(P_1^n, P_2^n)^2 \le 1 - e^{-D_{KL}(P_1^n \| P_2^n)} = 1 - e^{-n D_{KL}(P_1 \| P_2)} = 1 - e^{-\frac{n}{2}\log\left(\frac{1}{1 - 4\varepsilon^2}\right)}

The result is obtained by rearranging the terms.
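The rearrangement can be traced in code. The sketch below is an added illustration: it returns the lower bound on $n$ implied directly by the displayed chain of inequalities (before any simplification of the logarithms), for arbitrary example values of $\varepsilon$ and $\delta$.

    import numpy as np

    def sample_lower_bound(eps, delta):
        # From (1 - 2*delta)^2 <= 1 - exp(-n * KL), with
        # KL = D_KL(Ber(1/2) || Ber(1/2 + eps)) = (1/2) * log(1 / (1 - 4 * eps**2)),
        # rearranging gives n >= log(1 / (1 - (1 - 2*delta)**2)) / KL.
        kl = 0.5 * np.log(1.0 / (1.0 - 4.0 * eps**2))
        return np.log(1.0 / (1.0 - (1.0 - 2.0 * delta)**2)) / kl

    print(sample_lower_bound(eps=0.01, delta=0.05))   # roughly 8.3e3 tosses for this choice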

Information-theoretic lower bound for k-armed bandit games

In multi-armed bandit problems, a lower bound on the minimax regret of any bandit algorithm can be proved using Bretagnolle–Huber and its consequence for hypothesis testing (see Chapter 15 of Bandit Algorithms[2]).

History

The result was first proved in 1979 by Jean Bretagnolle and Catherine Huber, and published in the proceedings of the Strasbourg Probability Seminar.[4] Alexandre Tsybakov's book[3] features an early re-publication of the inequality and its attribution to Bretagnolle and Huber, which is presented as an early and less general version of Assouad's lemma (see notes 2.8). A constant improvement on Bretagnolle–Huber was proved in 2014 as a consequence of an extension of Fano's Inequality.[5]

References

  1. Canonne, Clément (2022). "A short note on an inequality between KL and TV". arXiv:2202.07198 [math.PR].
  2. Lattimore, Tor; Szepesvari, Csaba (2020). Bandit Algorithms (PDF). Cambridge University Press. Retrieved 18 August 2022.
  3. Tsybakov, Alexandre B. (2010). Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer. doi:10.1007/b13794. ISBN 978-1-4419-2709-5. OCLC 757859245. S2CID 42933599.
  4. Bretagnolle, J.; Huber, C. (1978). "Estimation des densités : Risque minimax". Séminaire de Probabilités XII. Lecture Notes in Mathematics. Vol. 649. Berlin, Heidelberg: Springer. pp. 342–363. doi:10.1007/bfb0064610. ISBN 978-3-540-08761-8. S2CID 122597694. Retrieved 2022-08-20.
  5. Gerchinovitz, Sébastien; Ménard, Pierre; Stoltz, Gilles (2020-05-01). "Fano's Inequality for Random Variables". Statistical Science. 35 (2). arXiv:1702.05985. doi:10.1214/19-sts716. ISSN 0883-4237. S2CID 15808752.