McDiarmid's inequality

In probability theory and theoretical computer science, McDiarmid's inequality (named after Colin McDiarmid[1]) is a concentration inequality that bounds the deviation between the sampled value and the expected value of certain functions when they are evaluated on independent random variables. McDiarmid's inequality applies to functions that satisfy a bounded differences property, meaning that replacing a single argument to the function while leaving all other arguments unchanged cannot cause too large a change in the value of the function.

Statement

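A function $f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ satisfies the bounded differences property if substituting the value of the $i$th coordinate $x_i$ changes the value of $f$ by at most $c_i$. More formally, if there are constants $c_1, c_2, \dots, c_n$ such that for all $i \in [n]$, and all $x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2, \ldots, x_n \in \mathcal{X}_n$,

$$\sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.$$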
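When every coordinate domain is finite and small, the bounded differences constants can be found by exhaustive search. The following is a minimal sketch under that assumption; the helper name `bounded_difference_constants` and the example function `mean` are illustrative and do not come from the cited sources.

```python
from itertools import product

def bounded_difference_constants(f, domains):
    """Smallest valid c_i for f, found by trying every point and every
    single-coordinate substitution (finite domains only)."""
    n = len(domains)
    c = [0.0] * n
    for x in product(*domains):
        fx = f(x)
        for i in range(n):
            for alt in domains[i]:
                # y agrees with x except possibly in coordinate i
                y = x[:i] + (alt,) + x[i + 1:]
                c[i] = max(c[i], abs(fx - f(y)))
    return c

def mean(x):
    return sum(x) / len(x)

# The empirical mean of n = 4 bits changes by at most 1/n per coordinate.
constants = bounded_difference_constants(mean, [(0, 1)] * 4)
```

The search visits every point of the product domain, so its cost grows exponentially in the number of arguments; it is meant only for checking small examples.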

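McDiarmid's Inequality[2]. Let $f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}_i$ for all $i$. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right),$$

$$\text{P}\left(f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)] \leq -\varepsilon\right) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right),$$

and as an immediate consequence,

$$\text{P}\left(\left|f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)]\right| \geq \varepsilon\right) \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right).$$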
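As an illustration, the bound can be evaluated numerically and compared with a simulated tail probability. The sketch below assumes the function is the empirical mean of $n$ fair bits, so that each $c_i = 1/n$; the function names, the choice $n = 100$, $\varepsilon = 0.1$, and the fixed seed are all illustrative assumptions.

```python
import math
import random

def mcdiarmid_bound(eps, c, two_sided=False):
    """exp(-2 eps^2 / sum c_i^2), doubled for the two-sided event and capped at 1."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    bound = math.exp(-2.0 * eps ** 2 / sum(ci ** 2 for ci in c))
    return min(1.0, 2.0 * bound if two_sided else bound)

def empirical_tail(n, eps, trials, seed=0):
    """Fraction of trials in which the mean of n fair bits is at least 1/2 + eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        # small tolerance so that floating-point rounding does not drop exact ties
        hits += (mean - 0.5 >= eps - 1e-12)
    return hits / trials

n, eps = 100, 0.1
bound = mcdiarmid_bound(eps, [1.0 / n] * n)     # equals exp(-2 n eps^2) = exp(-2)
observed = empirical_tail(n, eps, trials=5000)  # Monte Carlo estimate of the tail
```

For the empirical mean the bound coincides with Hoeffding's inequality, and the simulated frequency should fall well below it, since the inequality uses only the constants $c_i$ and no further information about the distribution.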

Extensions

Unbalanced distributions

A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value.

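McDiarmid's Inequality (unbalanced)[3][4]. Let $f: \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \ldots, X_n \in \mathcal{X}$ drawn from a distribution where there is a particular value $\chi_0 \in \mathcal{X}$ which occurs with probability $1-p$. Then, for any $\varepsilon > 0$,

$$\text{P}\left(\left|f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)]\right| \geq \varepsilon\right) \leq 2\exp\left(\frac{-\varepsilon^2}{2p(2-p)\sum_{i=1}^{n} c_i^2 + \frac{2}{3}\varepsilon \max_i c_i}\right).$$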

This may be used to characterize, for example, the value of a function on graphs when evaluated on sparse random graphs and hypergraphs, since in a sparse random graph, it is much more likely for any particular edge to be missing than to be present.
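To see how much the unbalanced form can gain, the two bounds can be evaluated side by side. The sketch below uses hypothetical numbers: 10,000 potential edges, each changing the function by at most 1 and each present with probability $p = 0.01$. The function names are illustrative.

```python
import math

def mcdiarmid_two_sided(eps, c):
    """Standard two-sided bound 2 exp(-2 eps^2 / sum c_i^2), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * eps ** 2 / sum(ci ** 2 for ci in c)))

def mcdiarmid_unbalanced(eps, c, p):
    """Unbalanced two-sided bound; p is the probability of NOT taking the common value."""
    denom = 2.0 * p * (2.0 - p) * sum(ci ** 2 for ci in c) + (2.0 / 3.0) * eps * max(c)
    return min(1.0, 2.0 * math.exp(-eps ** 2 / denom))

c = [1.0] * 10_000          # hypothetical: one coordinate per potential edge
eps, p = 100.0, 0.01        # hypothetical deviation and edge probability

standard = mcdiarmid_two_sided(eps, c)        # 2 exp(-2), roughly 0.27
unbalanced = mcdiarmid_unbalanced(eps, c, p)  # many orders of magnitude smaller
```

With these values the standard bound is nearly vacuous, while the unbalanced bound is far smaller because the factor $2p(2-p)$ shrinks the variance-like term in the denominator.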

Differences bounded with high probability

McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare.

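McDiarmid's Inequality (Differences bounded with high probability)[5]. Let $f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function and $\mathcal{Y} \subseteq \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n$ be a subset of its domain and let $c_1, c_2, \dots, c_n \geq 0$ be constants such that for all pairs $(x_1, \ldots, x_n) \in \mathcal{Y}$ and $(x'_1, \ldots, x'_n) \in \mathcal{Y}$,

$$\left|f(x_1, \ldots, x_n) - f(x'_1, \ldots, x'_n)\right| \leq \sum_{i : x_i \neq x'_i} c_i.$$

Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $p = 1 - \mathrm{P}((X_1, \ldots, X_n) \in \mathcal{Y})$ and let $m = \mathbb{E}[f(X_1, \ldots, X_n) \mid (X_1, \ldots, X_n) \in \mathcal{Y}]$. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, \ldots, X_n) - m \geq \varepsilon\right) \leq p + \exp\left(-\frac{2\max\left(0, \varepsilon - p\sum_{i=1}^{n} c_i\right)^2}{\sum_{i=1}^{n} c_i^2}\right),$$

and as an immediate consequence,

$$\text{P}\left(\left|f(X_1, \ldots, X_n) - m\right| \geq \varepsilon\right) \leq 2p + 2\exp\left(-\frac{2\max\left(0, \varepsilon - p\sum_{i=1}^{n} c_i\right)^2}{\sum_{i=1}^{n} c_i^2}\right).$$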
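The effect of the exceptional probability $p$ on this bound can be checked numerically. The following is a minimal sketch; the function name and the example values ($n = 100$, $c_i = 0.01$, $p = 10^{-3}$, $\varepsilon = 0.2$) are assumptions chosen for illustration.

```python
import math

def mcdiarmid_high_prob(eps, c, p, two_sided=False):
    """p + exp(-2 max(0, eps - p sum c_i)^2 / sum c_i^2), doubled if two-sided."""
    slack = max(0.0, eps - p * sum(c))   # deviation left after paying for the bad set
    tail = p + math.exp(-2.0 * slack ** 2 / sum(ci ** 2 for ci in c))
    return min(1.0, 2.0 * tail if two_sided else tail)

c = [0.01] * 100
with_bad_set = mcdiarmid_high_prob(0.2, c, p=1e-3)
no_bad_set = mcdiarmid_high_prob(0.2, c, p=0.0)   # reduces to exp(-2 eps^2 / sum c_i^2)
```

Setting $p = 0$ recovers the one-sided form of the original inequality, and the bound can never fall below $p$ itself, so the extension is useful only when the exceptional set is much less likely than the deviation being bounded.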

There exist stronger refinements to this analysis in some distribution-dependent scenarios,[6] such as those that arise in learning theory.

Sub-Gaussian and sub-exponential norms

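Let the $k$th centered conditional version of a function $f$ be

$$f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \mathbb{E}_{X'_k} f(x_1, \ldots, x_{k-1}, X'_k, x_{k+1}, \ldots, x_n),$$

so that $f_k(X)$ is a random variable depending on random values of $x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n$.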

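McDiarmid's Inequality (Sub-Gaussian norm)[7][8]. Let $f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function. Consider independent random variables $X = (X_1, X_2, \dots, X_n)$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $f_k(X)$ refer to the $k$th centered conditional version of $f$. Let $\|\cdot\|_{\psi_2}$ denote the sub-Gaussian norm of a random variable. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, \ldots, X_n) - m \geq \varepsilon\right) \leq \exp\left(\frac{-\varepsilon^2}{32e \left\| \sum_{k \in [n]} \|f_k(X)\|_{\psi_2}^2 \right\|_{\infty}}\right).$$

McDiarmid's Inequality (Sub-exponential norm)[8]. Let $f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function. Consider independent random variables $X = (X_1, X_2, \dots, X_n)$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $f_k(X)$ refer to the $k$th centered conditional version of $f$. Let $\|\cdot\|_{\psi_1}$ denote the sub-exponential norm of a random variable. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, \ldots, X_n) - m \geq \varepsilon\right) \leq \exp\left(\frac{-\varepsilon^2}{4e^2 \left\| \sum_{k \in [n]} \|f_k(X)\|_{\psi_1}^2 \right\|_{\infty} + 2\varepsilon e \max_{k \in [n]} \left\| \|f_k(X)\|_{\psi_1} \right\|_{\infty}}\right).$$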

Bennett and Bernstein forms

Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let

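$$\begin{aligned}
B &:= \max_{k \in [n]} \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \left| f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right|, \\
V_k &:= \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \mathbb{E}_{X_k} \left( f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right)^2, \\
\tilde{\sigma}^2 &:= \sum_{k=1}^{n} V_k.
\end{aligned}$$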

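McDiarmid's Inequality (Bennett form)[4]. Let $f: \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $B$ and $\tilde{\sigma}^2$ be defined as at the beginning of this section. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{\varepsilon}{2B} \log\left(1 + \frac{B\varepsilon}{\tilde{\sigma}^2}\right)\right).$$

McDiarmid's Inequality (Bernstein form)[4]. Let $f: \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Let $B$ and $\tilde{\sigma}^2$ be defined as at the beginning of this section. Then, for any $\varepsilon > 0$,

$$\text{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{\varepsilon^2}{2\left(\tilde{\sigma}^2 + \frac{B\varepsilon}{3}\right)}\right).$$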
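The variance-aware forms are most useful when each argument has low variance. As a hedged illustration, take the empirical mean of $n$ independent Bernoulli($q$) variables. Under that assumption each centered term is $(X_k - q)/n$, so one may take $B = \max(q, 1-q)/n$ (reading $B$ as a worst-case bound over the values of $X_k$) and $\tilde{\sigma}^2 = q(1-q)/n$. The function names and numerical values below are illustrative.

```python
import math

def bennett_form(eps, B, sigma2):
    """exp(-(eps / (2B)) * log(1 + B eps / sigma2))."""
    return math.exp(-(eps / (2.0 * B)) * math.log1p(B * eps / sigma2))

def bernstein_form(eps, B, sigma2):
    """exp(-eps^2 / (2 (sigma2 + B eps / 3)))."""
    return math.exp(-eps ** 2 / (2.0 * (sigma2 + B * eps / 3.0)))

# Hypothetical setting: mean of n Bernoulli(q) variables with small q.
n, q, eps = 1000, 0.01, 0.02
B = max(q, 1.0 - q) / n          # worst-case size of one centered term
sigma2 = q * (1.0 - q) / n       # sum of the n per-coordinate variances

basic = math.exp(-2.0 * eps ** 2 * n)   # original inequality with c_i = 1/n
bennett = bennett_form(eps, B, sigma2)
bernstein = bernstein_form(eps, B, sigma2)
```

In this example both refined bounds are several orders of magnitude below the basic bound of $\exp(-0.8)$, because the basic form depends only on the range $1/n$ of each term and ignores the small variance $q(1-q)$.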

Proof

The following proof of McDiarmid's inequality[2] constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternate argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.[4] For better readability, we will introduce a notational shorthand: $z_{i \rightharpoondown j}$ will denote $z_i, \dots, z_j$ for any $z \in \mathcal{X}^n$ and integers $1 \leq i \leq j \leq n$, so that, for example,

$$f(X_{1 \rightharpoondown (i-1)}, y, x_{(i+1) \rightharpoondown n}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).$$

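Pick any $x_1', x_2', \ldots, x_n'$. Then, for any $x_1, x_2, \ldots, x_n$, by the triangle inequality,

$$\begin{aligned}
\left|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown n})\right|
&\leq \left|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown (n-1)}, x_n)\right| + c_n \\
&\leq \left|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown (n-2)}, x_{(n-1) \rightharpoondown n})\right| + c_{n-1} + c_n \\
&\leq \ldots \\
&\leq \sum_{i=1}^{n} c_i,
\end{aligned}$$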

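and thus $f$ is bounded. Since $f$ is bounded, define the Doob martingale $\{Z_i\}$ (each $Z_i$ being a random variable depending on the random values of $X_1, \ldots, X_i$) as

$$Z_i := \mathbb{E}[f(X_{1 \rightharpoondown n}) \mid X_{1 \rightharpoondown i}]$$

for all $i \geq 1$ and $Z_0 := \mathbb{E}[f(X_{1 \rightharpoondown n})]$, so that $Z_n = f(X_{1 \rightharpoondown n})$. Now define the random variables for each $i$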

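$$\begin{aligned}
U_i &:= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}], \\
L_i &:= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}].
\end{aligned}$$

Since $X_i, \ldots, X_n$ are independent of each other, conditioning on $X_i = x$ does not affect the probabilities of the other variables, so these are equal to the expressions

$$\begin{aligned}
U_i &= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}], \\
L_i &= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}].
\end{aligned}$$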

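Note that $L_i \leq Z_i - Z_{i-1} \leq U_i$. In addition,

$$\begin{aligned}
U_i - L_i
&= \sup_{u \in \mathcal{X}_i, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, u, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, \ell, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] \\
&= \sup_{u \in \mathcal{X}_i, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, u, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, \ell, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] \\
&\leq \sup_{u \in \mathcal{X}_i, \ell \in \mathcal{X}_i} \mathbb{E}[c_i \mid X_{1 \rightharpoondown (i-1)}] \\
&\leq c_i.
\end{aligned}$$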

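Then, applying the general form of Azuma's inequality to $\{Z_i\}$, we have

$$\text{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) = \text{P}(Z_n - Z_0 \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right).$$

The one-sided bound in the other direction is obtained by applying Azuma's inequality to $\{-Z_i\}$, and the two-sided bound follows from a union bound.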
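The martingale used in the proof can be computed exactly for a small example. The sketch below assumes $n = 6$ independent fair bits and takes $f$ to be the number of runs in the bit string, a hypothetical test function for which changing one bit alters the value by at most 2 (at most 1 at either end). It enumerates every outcome and records the largest increment $|Z_i - Z_{i-1}|$ for each $i$.

```python
from itertools import product

def doob_martingale(f, n, prefix):
    """Z_i = E[f(X) | X_1..X_i = prefix] for n independent fair bits, by enumeration."""
    rest = n - len(prefix)
    total = sum(f(tuple(prefix) + tail) for tail in product((0, 1), repeat=rest))
    return total / 2 ** rest

def runs(x):
    """Number of maximal blocks of equal consecutive bits."""
    return 1 + sum(a != b for a, b in zip(x, x[1:]))

n = 6
c = [1] + [2] * (n - 2) + [1]        # bounded differences constants for runs()
max_increment = [0.0] * n
for x in product((0, 1), repeat=n):
    # z[i] is the martingale value after revealing the first i bits of x
    z = [doob_martingale(runs, n, x[:i]) for i in range(n + 1)]
    for i in range(1, n + 1):
        max_increment[i - 1] = max(max_increment[i - 1], abs(z[i] - z[i - 1]))
```

Here `z[0]` is the unconditional expectation and `z[n]` equals `runs(x)`, matching $Z_0$ and $Z_n$ in the proof. Every recorded increment stays within the corresponding $c_i$, which is the property that Azuma's inequality requires.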

References

  1. McDiarmid, Colin (1989). "On the method of bounded differences". Surveys in Combinatorics, 1989: Invited Papers at the Twelfth British Combinatorial Conference: 148–188. doi:10.1017/CBO9781107359949.008. ISBN 978-0-521-37823-9.
  2. Doob, J. L. (1940). "Regularity properties of certain families of chance variables" (PDF). Transactions of the American Mathematical Society. 47 (3): 455–486. doi:10.2307/1989964. JSTOR 1989964.
  3. Chou, Chi-Ning; Love, Peter J.; Sandhu, Juspreet Singh; Shi, Jonathan (2022). "Limitations of Local Quantum Algorithms on Random Max-k-XOR and Beyond". 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). 229: 41:13. arXiv:2108.06049. doi:10.4230/LIPIcs.ICALP.2022.41. Retrieved 8 July 2022.
  4. Ying, Yiming (2004). "McDiarmid's inequalities of Bernstein and Bennett forms" (PDF). City University of Hong Kong. Retrieved 10 July 2022.
  5. Combes, Richard (2015). "An extension of McDiarmid's inequality". arXiv:1511.05240 [cs.LG].
  6. Wu, Xinxing; Zhang, Junping (April 2018). "Distribution-dependent concentration inequalities for tighter generalization bounds". Science China Information Sciences. 61 (4): 048105:1–048105:3. arXiv:1607.05506. doi:10.1007/s11432-017-9225-2. S2CID 255199895. Retrieved 10 July 2022.
  7. Kontorovich, Aryeh (22 June 2014). "Concentration in unbounded metric spaces and algorithmic stability". Proceedings of the 31st International Conference on Machine Learning. 32 (2): 28–36. arXiv:1309.1007. Retrieved 10 July 2022.
  8. Maurer, Andreas; Pontil, Massimiliano (2021). "Concentration inequalities under sub-Gaussian and sub-exponential conditions" (PDF). Advances in Neural Information Processing Systems. 34: 7588–7597. Retrieved 10 July 2022.