Shearer's lemma
Adapted from the tutorial Information theory in combinatorics I by Shachar Lovett at Simons.
Entropy is subadditive: the information you have in a pair of random variables is at most the sum of the information you have in each one separately,
$$H(X, Y) \le H(X) + H(Y).$$
In fact, the difference between the two is precisely the mutual information:
$$H(X) + H(Y) - H(X, Y) = I(X; Y) \ge 0.$$
Shearer’s lemma generalizes this principle to groups of variables, not just individual variables: the point is that if both sides have the same variables, but the right side takes the entropy of smaller groups, then the right side overcounts some of the mutual information, so it’s bigger. For example:
$$2\,H(X, Y, Z) \le H(X, Y) + H(Y, Z) + H(X, Z).$$
And in general, if you have variables $X_1, \dots, X_n$ and groups $S_1, \dots, S_m \subseteq [n]$ such that each variable is in exactly[^1] $k$ of the groups, then
$$k \cdot H(X_1, \dots, X_n) \le \sum_{j=1}^m H(X_{S_j}),$$
where $X_S$ denotes the tuple of variables $(X_i)_{i \in S}$.
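As a quick numerical sanity check (not a proof), here is a short Python sketch that verifies the three-variable instance $2\,H(X, Y, Z) \le H(X, Y) + H(Y, Z) + H(X, Z)$ on randomly generated joint distributions of three bits; the helpers `entropy` and `marginal` are just names made up for this illustration.

```python
import itertools
import math
import random

def entropy(dist):
    """Shannon entropy (in bits) of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    """Marginal of a joint distribution {tuple: prob} on the given coordinates."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(0)
for _ in range(1000):
    # random joint distribution of three bits (X, Y, Z)
    weights = [random.random() for _ in range(8)]
    joint = {xyz: w / sum(weights)
             for xyz, w in zip(itertools.product([0, 1], repeat=3), weights)}
    lhs = 2 * entropy(joint)  # each variable is in k = 2 of the three groups
    rhs = sum(entropy(marginal(joint, S)) for S in [(0, 1), (1, 2), (0, 2)])
    assert lhs <= rhs + 1e-9
print("Shearer's inequality held on all random examples.")
```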
Proof
You can “prove” the simple subadditivity principle using the chain rule:
$$H(X, Y) = H(X) + H(Y \mid X) \le H(X) + H(Y),$$
since conditioning can only decrease entropy.
Similarly, we can prove Shearer’s lemma by applying the chain rule on both sides, and observing that the left side has stronger conditioning. Specifically, we rewrite the LHS as
$$H(X_1, \dots, X_n) = \sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}),$$
and for each set $S_j$, the chain rule gives
$$H(X_{S_j}) = \sum_{i \in S_j} H\bigl(X_i \bigm| X_{i'} : i' \in S_j,\ i' < i\bigr) \ge \sum_{i \in S_j} H(X_i \mid X_1, \dots, X_{i-1}),$$
since the conditioning on the right-hand side is stronger. Because each variable $X_i$ is in exactly $k$ of the groups, summing this inequality over all $j$ gives
$$\sum_{j=1}^m H(X_{S_j}) \ge k \sum_{i=1}^n H(X_i \mid X_1, \dots, X_{i-1}) = k \cdot H(X_1, \dots, X_n).$$
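The argument only uses two facts: the chain rule $H(X, Y) = H(X) + H(Y \mid X)$, and the fact that conditioning can only decrease entropy. Both are easy to check numerically; the sketch below does so on random joint distributions of two bits (again only a sanity check, with ad-hoc helper names).

```python
import math
import random

def entropy(probs):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

random.seed(1)
for _ in range(1000):
    w = [random.random() for _ in range(4)]
    p = [wi / sum(w) for wi in w]            # joint probabilities [p00, p01, p10, p11]
    h_xy = entropy(p)
    px = [p[0] + p[1], p[2] + p[3]]          # marginal distribution of X
    h_x = entropy(px)
    h_y = entropy([p[0] + p[2], p[1] + p[3]])
    # H(Y | X), computed directly from the conditional distributions of Y given X
    h_y_given_x = sum(px[x] * entropy([p[2 * x] / px[x], p[2 * x + 1] / px[x]])
                      for x in range(2) if px[x] > 0)
    assert abs(h_xy - (h_x + h_y_given_x)) < 1e-9   # chain rule
    assert h_y_given_x <= h_y + 1e-9                # conditioning decreases entropy
print("Chain rule and conditioning checks passed.")
```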
Shadows
Shearer’s lemma is particularly useful in combinatorics if you let $(X_1, \dots, X_n)$ be a uniformly random element of a set you want to count: then $H(X_1, \dots, X_n)$ is exactly the log of the size of that set, while each $H(X_S)$ is at most the log of the number of values the coordinates in $S$ can take, so the lemma turns into a counting inequality.
As a toy example, suppose I have a set $S$ of $n$ points in the plane, and I know that its shadows on the $x$-axis and the $y$-axis have sizes $n_x$ and $n_y$. Then clearly $n \le n_x n_y$.
Although it’s very overkill, you could have used the subadditivity of entropy to prove this. Let $(X, Y)$ be a uniformly random point of $S$; then
$$\log n = H(X, Y) \le H(X) + H(Y) \le \log n_x + \log n_y,$$
since $X$ and $Y$ take at most $n_x$ and $n_y$ values respectively.
If you move to three dimensions, though, it’s less obvious. This time, I have a set $S$ of $n$ points in space, and I know the sizes $n_{xy}$, $n_{yz}$, $n_{xz}$ of its shadows on the three coordinate planes,[^2] and I want to bound $n$. But the obvious example is the cube: a grid of $\ell^3$ points has three shadows of $\ell^2$ points each, so the best bound we can hope for is $n \le \sqrt{n_{xy}\, n_{yz}\, n_{xz}}$. And this is exactly what Shearer’s lemma gives: taking $(X, Y, Z)$ to be a uniformly random point of $S$ and using the three groups $\{X, Y\}$, $\{Y, Z\}$, $\{X, Z\}$ (each variable is in exactly $2$ of them),
$$2 \log n = 2\,H(X, Y, Z) \le H(X, Y) + H(Y, Z) + H(X, Z) \le \log n_{xy} + \log n_{yz} + \log n_{xz},$$
so $n \le \sqrt{n_{xy}\, n_{yz}\, n_{xz}}$.
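Both shadow bounds are easy to test directly on random point sets; the sketch below checks the 2D bound $n \le n_x n_y$ and the 3D bound $n \le \sqrt{n_{xy}\, n_{yz}\, n_{xz}}$ by brute force (a numerical check, not a proof).

```python
import random

random.seed(2)
for _ in range(200):
    # a random set S of points in a small 3D grid
    S = {(random.randrange(6), random.randrange(6), random.randrange(6))
         for _ in range(random.randrange(1, 80))}
    s_xy = {(x, y) for x, y, z in S}
    s_yz = {(y, z) for x, y, z in S}
    s_xz = {(x, z) for x, y, z in S}
    assert len(S) ** 2 <= len(s_xy) * len(s_yz) * len(s_xz)   # 3D shadow bound

    # 2D version, applied to the shadow of S on the xy-plane
    s_x = {x for x, y in s_xy}
    s_y = {y for x, y in s_xy}
    assert len(s_xy) <= len(s_x) * len(s_y)
print("Shadow bounds held on all random examples.")
```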
Entropy deficit version
Because entropy deficit is basically “negative entropy”, we get an inequality in the other direction:
$$k \cdot D(X_1, \dots, X_n) \ge \sum_{j=1}^m D(X_{S_j})$$
(where each variable is in exactly[^3] $k$ of the groups, and $D$ denotes the entropy deficit).
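The same kind of numerical check as before works for this dual form: the sketch below verifies $2\,D(X, Y, Z) \ge D(X, Y) + D(Y, Z) + D(X, Z)$ on random distributions of three bits, taking the deficit of a group of $b$ bits to be $b$ minus its entropy.

```python
import itertools
import math
import random

def deficit(joint, coords):
    """Entropy deficit of the marginal of a {bit-tuple: prob} distribution on the given coordinates."""
    marg = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + p
    h = -sum(p * math.log2(p) for p in marg.values() if p > 0)
    return len(coords) - h   # deficit = (number of bits) - entropy, in bits

random.seed(3)
for _ in range(1000):
    w = [random.random() for _ in range(8)]
    joint = {xyz: wi / sum(w)
             for xyz, wi in zip(itertools.product([0, 1], repeat=3), w)}
    lhs = 2 * deficit(joint, (0, 1, 2))                                 # k = 2
    rhs = sum(deficit(joint, S) for S in [(0, 1), (1, 2), (0, 2)])
    assert lhs >= rhs - 1e-9
print("Deficit version held on all random examples.")
```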
For example, suppose I have some events $A_1, \dots, A_m$, each happening with probability $p$, and I want to upper bound the probability that all of them happen.

- If the events are independent, you get $\Pr[A_1 \cap \dots \cap A_m] = p^m$.
- But absent any additional information, the best you can get is $\Pr[A_1 \cap \dots \cap A_m] \le p$: indeed, they could all be the same event.
Now, assume the events share randomness in a limited way: there are independent random variables $X_1, \dots, X_n$, each uniform over some finite set, such that each $A_j$ is determined by the variables $(X_i)_{i \in S_j}$, and each variable is in exactly $k$ of the groups $S_j$. Then we get something in between.
Let $Y = (Y_1, \dots, Y_n)$ have the distribution of $(X_1, \dots, X_n)$ conditioned on all of the events happening; $Y_{S_j}$ can only take values in the set of assignments to the variables of $S_j$ on which $A_j$ happens, which is a $p$-fraction of all possible assignments, so $D(Y_{S_j}) \ge \log \frac{1}{p}$.
#to-write make the logic above clearer
Therefore, since $Y$ is uniform over the set of assignments on which all the events happen, $D(Y) = \log \frac{1}{\Pr[A_1 \cap \dots \cap A_m]}$, and the deficit version of Shearer’s lemma gives
$$\log \frac{1}{\Pr[A_1 \cap \dots \cap A_m]} = D(Y) \ge \frac{1}{k} \sum_{j=1}^m D(Y_{S_j}) \ge \frac{m}{k} \log \frac{1}{p}.$$
In other words, we get
$$\Pr[A_1 \cap \dots \cap A_m] \le p^{m/k},$$
which interpolates between the two extremes above: $p^m$ when $k = 1$ and $p$ when $k = m$.
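On a tiny example, this bound can be checked exactly. In the sketch below (an illustration with made-up events, not taken from the tutorial), four independent uniform bits are shared among four events, each event requiring its two bits to equal $1$; each bit is read by exactly $k = 2$ events, and the bound $p^{m/k} = (1/4)^2 = 1/16$ is attained, since the intersection forces all four bits to be $1$.

```python
from itertools import product

groups = [(0, 1), (1, 2), (2, 3), (3, 0)]   # each bit index appears in exactly 2 groups
k, m, p = 2, len(groups), 1 / 4             # each event A_j has probability p = 1/4

# exact probability that all events happen, by enumerating the 16 assignments
hits = sum(all(x[i] == 1 and x[j] == 1 for i, j in groups)
           for x in product([0, 1], repeat=4))
prob = hits / 2 ** 4

print(prob, p ** (m / k))   # 0.0625  0.0625
assert prob <= p ** (m / k) + 1e-12
```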
Related
Other applications
- Shearer’s lemma gives tight bounds on the number of independent sets in bipartite $d$-regular graphs: a $d$-regular bipartite graph on $n$ vertices has at most $(2^{d+1} - 1)^{n/(2d)}$ independent sets, which is achieved by disjoint copies of $K_{d,d}$. The variable to look at is the indicator (in $\{0, 1\}^n$) of a uniformly random independent set. A small brute-force check is sketched below.
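Here is a small brute-force check of that bound on tiny $2$-regular bipartite graphs ($K_{2,2}$ and the $6$-cycle). The function names are ad hoc, and the enumeration is exponential, so this only makes sense for very small graphs.

```python
from itertools import combinations

def count_independent_sets(vertices, edges):
    """Count independent sets by enumerating all vertex subsets (exponential!)."""
    count = 0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if all(not (u in s and v in s) for u, v in edges):
                count += 1
    return count

def kahn_bound(n, d):
    """Upper bound on the number of independent sets in an n-vertex d-regular bipartite graph."""
    return (2 ** (d + 1) - 1) ** (n / (2 * d))

# K_{2,2}: 4 vertices, 2-regular; the bound (2^3 - 1)^(4/4) = 7 is attained.
k22_edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
print(count_independent_sets(range(4), k22_edges), kahn_bound(4, 2))   # 7  7.0

# The 6-cycle is also 2-regular and bipartite: 18 independent sets vs. 7^1.5 ≈ 18.52.
c6_edges = [(i, (i + 1) % 6) for i in range(6)]
print(count_independent_sets(range(6), c6_edges), kahn_bound(6, 2))    # 18  ≈18.52
```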
Similar results
- Gavinsky, Lovett, Saks, and Srinivasan[^4] proved a read-$k$ variant of the Chernoff bound (which similarly worsens the bound from $e^{-2\varepsilon^2 m}$ to $e^{-2\varepsilon^2 m/k}$).
- The Local lemma is another result about a relaxed notion of independence. However,
    - the assumption is stronger: the events are not allowed to share variables with too many other events, whereas Shearer’s lemma only assumes that each variable is queried few times;
    - it’s a lower bound on $\Pr[\overline{A_1} \cap \dots \cap \overline{A_m}]$ (the probability that none of the events happens), not an upper bound on $\Pr[A_1 \cap \dots \cap A_m]$.
[^1]: We could also write “at least $k$”, since adding more variables to a group can only increase the RHS.
[^2]: That is, if we write $S_{xy}$ for the set of pairs $(x, y)$ such that $(x, y, z) \in S$ for some $z$, then $n_{xy} = |S_{xy}|$, and similarly for $n_{yz}$ and $n_{xz}$.
[^3]: #to-write
[^4]: Dmytro Gavinsky, Shachar Lovett, Michael Saks, Srikanth Srinivasan, “A Tail Bound for Read-k Families of Functions”.