Usual entropic inequalities

Fundamentally, shared information is visible in the whole $X$ but not in the individual parts $X_1, \dots, X_n$, so that

  • entropy is subadditive: $H[X] \le \sum_i H[X_i]$ (shared information is overcounted on the right);
  • entropy deficit is superadditive: $D[X] \ge \sum_i D[X_i]$ (shared information is only counted on the left).
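
As a quick sanity check of both bullets, here is a minimal Python sketch (the example distribution and the helper names `H` and `marginal` are my own, not from this note) on two perfectly correlated fair bits, where the whole deficit $D[X] = 1$ is invisible in the marginals:

```python
# Minimal sanity check: for two perfectly correlated fair bits, the joint
# entropy is at most the sum of marginal entropies, and the joint deficit
# D[X] = n - H[X] is at least the sum of the marginal deficits.
from math import log2

def H(p):  # Shannon entropy (in bits) of a distribution {outcome: probability}
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, i):  # marginal distribution of coordinate i
    m = {}
    for x, q in p.items():
        m[x[i]] = m.get(x[i], 0.0) + q
    return m

p = {(0, 0): 0.5, (1, 1): 0.5}   # X1 = X2 = a fair coin
n = 2

H_joint = H(p)
H_marg = [H(marginal(p, i)) for i in range(n)]
D_joint = n - H_joint              # deficit of the joint distribution
D_marg = [1 - h for h in H_marg]   # deficits of the individual coordinates

assert H_joint <= sum(H_marg) + 1e-9   # subadditivity
assert D_joint >= sum(D_marg) - 1e-9   # superadditivity of the deficit
print(H_joint, sum(H_marg), D_joint, sum(D_marg))   # 1.0 2.0 1.0 0.0
```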

More generally, Shearer's lemma makes the same observation not just for individual coordinates, but for any groupings $S_1, \dots, S_m \subseteq [n]$ such that every coordinate is covered at least $k$ times: $k \cdot H[X] \le \sum_j H[X_{S_j}]$ (and, for groupings covering every coordinate exactly $k$ times, $k \cdot D[X] \ge \sum_j D[X_{S_j}]$).
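
Here is a hedged numerical sketch of Shearer's lemma (my own helpers and random example, not from the note), using a random distribution over $\{0,1\}^3$ and the groups $\{1,2\}, \{2,3\}, \{1,3\}$, which cover every coordinate exactly $k = 2$ times:

```python
# Check Shearer's lemma k*H[X] <= sum_j H[X_{S_j}] on a random distribution,
# with a family of groups covering every coordinate exactly k = 2 times.
import itertools, random
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def project(p, S):  # distribution of the sub-tuple X_S
    m = {}
    for x, q in p.items():
        key = tuple(x[i] for i in S)
        m[key] = m.get(key, 0.0) + q
    return m

n, k = 3, 2
groups = [(0, 1), (1, 2), (0, 2)]   # every coordinate appears in exactly 2 groups

random.seed(0)
weights = [random.random() for _ in range(2 ** n)]
p = {x: w / sum(weights) for x, w in zip(itertools.product([0, 1], repeat=n), weights)}

assert k * H(p) <= sum(H(project(p, S)) for S in groups) + 1e-9   # Shearer

# Equivalent deficit form for an exactly-k-covering family:
# k * D[X] >= sum_j D[X_{S_j}], where D[X_S] = |S| - H[X_S].
D = lambda q, size: size - H(q)
assert k * D(p, n) >= sum(D(project(p, S), len(S)) for S in groups) - 1e-9
```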

Using independence to reverse them

When the $X_i$ are independent, we can reverse these inequalities:

$$H[X] \ge \sum_i H[X_i], \qquad D[X] \le \sum_i D[X_i].$$
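
For independent bits this is immediate (the reversed inequalities in fact hold with equality); a minimal check, assuming deficits are measured in bits as $D[X] = n - H[X]$ and $D[X_i] = 1 - H[X_i]$, with arbitrarily chosen biases:

```python
# For independent Bernoulli bits, H[X] = sum_i H[X_i], so the reversed
# inequalities H[X] >= sum_i H[X_i] and D[X] <= sum_i D[X_i] hold (with equality).
import itertools
from math import log2

def h(p):  # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

biases = [0.5, 0.1, 0.3]   # arbitrary example biases
n = len(biases)

joint = {}                 # joint distribution of the independent bits
for x in itertools.product([0, 1], repeat=n):
    q = 1.0
    for xi, b in zip(x, biases):
        q *= b if xi == 1 else 1 - b
    joint[x] = q

H_joint = -sum(q * log2(q) for q in joint.values() if q > 0)
assert abs(H_joint - sum(h(b) for b in biases)) < 1e-9   # H[X] = sum_i H[X_i]
print(n - H_joint, sum(1 - h(b) for b in biases))        # D[X] = sum_i D[X_i]
```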

And we even get inequalities for Information divergence and mutual information that can otherwise go either way:

$$D[X \,\|\, X'] \ge \sum_i D[X_i \,\|\, X'_i], \qquad I[X;Y] \ge \sum_i I[X_i;Y]$$

(for the divergence, what matters is that the reference $X'$ has independent coordinates).
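
A small toy illustration of the mutual-information inequality (my own example): take $X_1, X_2$ independent fair bits and $Y = X_1 \oplus X_2$, so that each coordinate alone reveals nothing about $Y$ while the pair determines it completely:

```python
# With X1, X2 independent fair bits and Y = X1 xor X2:
# I[X;Y] = 1 while I[X1;Y] = I[X2;Y] = 0, so I[X;Y] >= sum_i I[Xi;Y] (strictly).
import itertools
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idxs):
    m = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idxs)
        m[key] = m.get(key, 0.0) + q
    return m

def I(p, A, B):  # mutual information between coordinate groups A and B
    return H(marginal(p, A)) + H(marginal(p, B)) - H(marginal(p, A + B))

# Coordinates of the joint distribution: (X1, X2, Y) with Y = X1 xor X2.
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in itertools.product([0, 1], repeat=2)}

assert I(p, (0, 1), (2,)) >= I(p, (0,), (2,)) + I(p, (1,), (2,)) - 1e-9
print(I(p, (0, 1), (2,)), I(p, (0,), (2,)), I(p, (1,), (2,)))   # 1.0 0.0 0.0
```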

Reversing them without independence

When the variables are not independent, we cannot in general hope for $H[X]$ to be as big as the sum of the individual entropies $H[X_i]$, or equivalently, for the individual deficits $D[X_i]$ to be large if the overall deficit $D[X]$ is large.

Conditional deficits

However, we can use the chain rule to show that the conditional deficits need to be large:

$$D[X] = \sum_i D[X_i \mid X_{<i}] \le \sum_i D[X_i \mid X_{[n]\setminus\{i\}}],$$

which is exactly the edge-isoperimetric inequality when $X$ is flat.¹ In particular, at least one of the conditional deficits must be at least $D[X]/n$.
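
A hedged sketch checking the chain rule, the comparison with the fully conditioned deficits, and the pigeonhole consequence $\max_i D[X_i \mid X_{[n]\setminus\{i\}}] \ge D[X]/n$, on $X$ uniform over a small subset of $\{0,1\}^3$ (the example set and the helper names are my own):

```python
# Conditional deficits D[X_i | X_S] = 1 - (H[X_{S u {i}}] - H[X_S]), checked on
# X uniform over a 4-element subset of {0,1}^3 (so D[X] = 3 - 2 = 1).
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idxs):
    m = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idxs)
        m[key] = m.get(key, 0.0) + q
    return m

def cond_deficit(p, i, S):  # D[X_i | X_S]
    S = tuple(S)
    return 1 - (H(marginal(p, S + (i,))) - H(marginal(p, S)))

n = 3
support = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
p = {x: 1 / len(support) for x in support}

D_joint = n - H(p)
chain = [cond_deficit(p, i, range(i)) for i in range(n)]                        # D[X_i | X_{<i}]
full = [cond_deficit(p, i, [j for j in range(n) if j != i]) for i in range(n)]  # D[X_i | X_{-i}]

assert abs(D_joint - sum(chain)) < 1e-9   # chain rule: D[X] = sum_i D[X_i | X_{<i}]
assert sum(full) >= sum(chain) - 1e-9     # conditioning on more only increases deficits
assert max(full) >= D_joint / n - 1e-9    # pigeonhole: some conditional deficit >= D[X]/n
print(D_joint, chain, full)
```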

If $X$ is uniform over a subset of $\{0,1\}^n$, then we can identify it with the boolean function $f \colon \{0,1\}^n \to \{0,1\}$ such that $X$ is uniform over $f^{-1}(1)$. When $D[X] \ge 1$ (i.e. $\mathbb{E}[f] \le 1/2$), this means that

$$D[X_i \mid X_{[n]\setminus\{i\}}] = \Pr[f(x \oplus e_i) = 0 \mid f(x) = 1] = \frac{\mathrm{Inf}_i[f]}{2\,\mathbb{E}[f]} = \Theta\!\left(\frac{\mathrm{Inf}_i[f]}{\mathrm{Var}[f]}\right),$$

so the Kahn–Kalai–Linial theorem shows that at least one of the conditional deficits is significantly bigger than $D[X]/n$ (at least when $D[X] \lesssim \log n$):

$$\max_i D[X_i \mid X_{[n]\setminus\{i\}}] \ge \Omega\!\left(\frac{\log n}{n}\right).$$
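
The identity $D[X_i \mid X_{[n]\setminus\{i\}}] = \mathrm{Inf}_i[f] / (2\,\mathbb{E}[f])$ is easy to verify by brute force; here is a sketch on the 3-bit majority function (my choice of example), with $X$ uniform over $f^{-1}(1)$:

```python
# Check D[X_i | X_{-i}] = Inf_i[f] / (2 E[f]) for f = majority of 3 bits,
# where X is uniform over f^{-1}(1) and Inf_i[f] = Pr[f(x) != f(x + e_i)].
import itertools
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idxs):
    m = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idxs)
        m[key] = m.get(key, 0.0) + q
    return m

n = 3
f = lambda x: int(sum(x) >= 2)                     # 3-bit majority
cube = list(itertools.product([0, 1], repeat=n))
ones = [x for x in cube if f(x) == 1]
p = {x: 1 / len(ones) for x in ones}               # X uniform over f^{-1}(1)
Ef = len(ones) / len(cube)

for i in range(n):
    flip = lambda x: x[:i] + (1 - x[i],) + x[i + 1:]
    inf_i = sum(f(x) != f(flip(x)) for x in cube) / len(cube)   # Inf_i[f]
    rest = [j for j in range(n) if j != i]
    cond_def = 1 - (H(p) - H(marginal(p, rest)))                # D[X_i | X_{-i}]
    assert abs(cond_def - inf_i / (2 * Ef)) < 1e-9
    print(i, cond_def, inf_i / (2 * Ef))                        # 0.5 for every i
```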

Combining the chain rule and Kahn–Kalai–Linial, we get the bound

$$\max_i D[X_i \mid X_{[n]\setminus\{i\}}] \ge \Omega\!\left(\frac{\max(\log n, D[X])}{n}\right),$$

which (perhaps surprisingly) is tight for every value of $D[X]$, as exemplified by the Tribes DNF.
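
A rough numerical illustration on a small Tribes instance ($s = 4$ tribes of width $w = 4$, so $n = 16$; these parameters are my choice for a brute-forceable example rather than the usual tuning that makes $\mathbb{E}[f]$ close to $1/2$): the largest conditional deficit comes out at $\Theta(\log n / n)$, i.e. on the order of the bound above and well above the bare chain-rule bound $D[X]/n$:

```python
# Brute-force the conditional deficits of the Tribes DNF with 4 tribes of width 4
# (n = 16), using D[X_i | X_{-i}] = Inf_i[f] / (2 E[f]) from the identity above.
import itertools
from math import log2

w, s = 4, 4
n = w * s

def tribes(x):  # OR of s disjoint ANDs of width w
    return int(any(all(x[t * w:(t + 1) * w]) for t in range(s)))

cube = list(itertools.product([0, 1], repeat=n))
Ef = sum(tribes(x) for x in cube) / len(cube)
D_X = log2(1 / Ef)          # deficit of X uniform over tribes^{-1}(1)

def influence(i):
    return sum(tribes(x) != tribes(x[:i] + (1 - x[i],) + x[i + 1:]) for x in cube) / len(cube)

cond_deficits = [influence(i) / (2 * Ef) for i in range(n)]   # D[X_i | X_{-i}]

print(f"D[X]                    = {D_X:.3f}")               # ~2.14
print(f"chain-rule bound D[X]/n = {D_X / n:.4f}")            # ~0.13
print(f"log(n)/n                = {log2(n) / n:.4f}")        # 0.25
print(f"max_i D[X_i | X_-i]     = {max(cond_deficits):.4f}") # ~0.23: Theta(log(n)/n)
```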

When $D[X] \le 1$, we instead have $D[X_i \mid X_{[n]\setminus\{i\}}] = \Theta(\mathrm{Inf}_i[f])$ and $\mathrm{Var}[f] = \Theta(D[X])$, and therefore

$$\max_i D[X_i \mid X_{[n]\setminus\{i\}}] \ge D[X] \cdot \Omega\!\left(\frac{\max\!\left(\log n, \log\frac{1}{D[X]}\right)}{n}\right).$$
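
A hedged sketch of this small-deficit regime, taking $f$ to be the OR of $n$ bits (my choice of example) so that $\mathbb{E}[f] = 1 - 2^{-n}$ is close to $1$; the printed quantities agree up to the constants hidden in the $\Theta$ and $\Omega$:

```python
# Small-deficit regime with f = OR of n bits: E[f] = 1 - 2^{-n}, so D[X] << 1.
# Check that D[X_i | X_{-i}] is within a constant factor of Inf_i[f], that Var[f]
# is within a constant factor of D[X], and compare the maximal conditional
# deficit with D[X] * max(log n, log(1/D[X])) / n.
import itertools
from math import log2

n = 8
f = lambda x: int(any(x))
cube = list(itertools.product([0, 1], repeat=n))
Ef = sum(f(x) for x in cube) / len(cube)
var = Ef * (1 - Ef)
D_X = log2(1 / Ef)

def influence(i):
    return sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in cube) / len(cube)

cond_deficits = [influence(i) / (2 * Ef) for i in range(n)]   # D[X_i | X_{-i}]

print(f"D[X]     = {D_X:.6f}")   # ~0.0056
print(f"Var[f]   = {var:.6f}")   # ~0.0039, same order as D[X]
print(f"Inf_1[f] = {influence(0):.6f}, D[X_1 | X_-1] = {cond_deficits[0]:.6f}")
print(f"D[X] * max(log n, log 1/D[X]) / n = {D_X * max(log2(n), log2(1 / D_X)) / n:.6f}")
print(f"max_i D[X_i | X_-i] = {max(cond_deficits):.6f}")   # same order, up to the Omega constant
```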

TODO: turn this into an actual #figure

Log-concave distributions

TODO: For distributions, “log-concavity” is a soft notion of independence which implies entropic inequalities, see Log-concave polynomials and distributions (not yet covered in that note).

  1. in particular, uniform over the 1-inputs $f^{-1}(1)$ of a function $f$