Suppose we have two (unnormalized) density functions $p$ and $q$ that can be easily evaluated at any point, and suppose that we’re able to get samples from $q$.
Then, assuming that $p$ is not too much more concentrated than $q$ at any point, we can get samples from $p$ by sampling from $q$ then “tweaking the probabilities” appropriately.
Algorithm
Repeatedly:
- sample $x$ from $q$;
- with probability $c \cdot \frac{p(x)}{q(x)}$, for some fixed constant $c > 0$, return $x$;
- otherwise, “reject” and continue.
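In code, this loop might look like the following minimal sketch (the interface, i.e. the names `sample_q`, `p`, `q` and the acceptance constant `c`, is just one choice of notation, matching the analysis below):

```python
import random

def rejection_sample(sample_q, p, q, c):
    """Return one sample distributed proportionally to p.

    sample_q: draws one sample from (the normalized version of) q
    p, q:     unnormalized densities that we can evaluate pointwise
    c:        constant chosen so that c * p(x) / q(x) <= 1 for every x
    """
    while True:
        x = sample_q()                          # sample x from q
        if random.random() < c * p(x) / q(x):   # accept with probability c * p(x)/q(x)
            return x
        # otherwise "reject" and loop again

# Toy usage: sample from p(x) ∝ x on {1, 2, 3} using a uniform proposal q.
print(rejection_sample(lambda: random.randint(1, 3), p=lambda x: x, q=lambda x: 1, c=1 / 3))
```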
Analysis
As long as the “probability” in step 2 is never greater than $1$, we’ll get value $x$ with probability
$$\Pr[\text{output } x] \;\propto\; q(x) \cdot c\,\frac{p(x)}{q(x)} \;=\; c\,p(x) \;\propto\; p(x),$$
so this has the right distribution. And it’s enough to set $c = \min_x \frac{q(x)}{p(x)} = \left(\max_x \frac{p(x)}{q(x)}\right)^{-1}$.
But the issue is that the smaller we set $c$, the longer the algorithm might take until it accepts for the first time. Indeed, assuming that $p$ and $q$ are actually probabilities (they sum to $1$), the probability of accepting at any iteration of the loop is
$$\sum_x q(x) \cdot c\,\frac{p(x)}{q(x)} = c \sum_x p(x) = c,$$
so it would take $\frac{1}{c} = \max_x \frac{p(x)}{q(x)}$ iterations on average until the algorithm succeeds. This quantity is the max-divergence between $p$ and $q$ (see Power entropies and divergences).
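As a quick sanity check of this count, the snippet below estimates the average number of iterations on an arbitrary toy pair of distributions; it should come out to about $\max_x \frac{p(x)}{q(x)} = 2.5$ here:

```python
import random

# Toy check of the analysis above: with probabilities p and q on {0, 1, 2}
# and c = 1 / max_x p(x)/q(x), the loop accepts with probability c, so the
# average number of iterations should be about 1/c = max_x p(x)/q(x) = 2.5.
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 2: 0.5}
c = 1 / max(p[x] / q[x] for x in p)  # = 1 / 2.5

def iterations_until_accept():
    count = 0
    while True:
        count += 1
        x = random.choices(list(q), weights=list(q.values()))[0]  # x ~ q
        if random.random() < c * p[x] / q[x]:
            return count

runs = 100_000
print(sum(iterations_until_accept() for _ in range(runs)) / runs)  # ≈ 2.5
```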
Example: sampling from DNFs
Say we want to sample from the set $S$ of satisfying assignments of a DNF formula
$$\phi = T_1 \vee T_2 \vee \cdots \vee T_m$$
over $n$ boolean variables, where each term $T_i$ is a conjunction of literals.
It’s easy to sample uniformly over the set $S_i$ of assignments that satisfy a particular term $T_i$ of the formula, since they form a hypercube: the literals of $T_i$ fix some of the variables and leave the rest free. But the overlaps between the hypercubes $S_i$ for different $i$ make it nontrivial to sample uniformly over the entirety of $S$.
That said, we can still easily sample according to the following distribution, which is somewhat close to the desired one:
- pick a term $T_i$ at random according to how large $S_i$ is (i.e. with probability proportional to $|S_i|$),
- then pick a uniformly random assignment $x$ within $S_i$.
This would be perfect if the hypercubes were disjoint, but in general, assignments that satisfy many terms have a higher chance of being drawn: we get density
$$q(x) \;\propto\; \#\{i : x \in S_i\},$$
whereas the distribution we wish for is just the uniform one,
$$p(x) \;\propto\; \mathbf{1}[x \in S].$$
The ratio $\frac{p(x)}{q(x)} = \frac{1}{\#\{i : x \in S_i\}}$ is always at most $1$, so we can just set $c = 1$, and each iteration has at least a $\frac{1}{m}$ chance of accepting $x$ (no assignment satisfies more than $m$ terms), so it takes at most $m$ iterations on average.
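Putting it all together, here is a sketch of the DNF sampler in Python, under the assumption that each term is given as a partial assignment (a dict from variable index to required truth value); the function name and input format are mine, not anything standard:

```python
import random

def sample_satisfying_assignment(terms, n):
    """Uniform sample over the satisfying assignments of a DNF T_1 v ... v T_m.

    Each term is a dict {variable index: required truth value}; the formula
    has n variables, and every term is assumed to be satisfiable.
    """
    # |S_i| = 2^(n - #literals of T_i): weight each term by the size of its subcube.
    weights = [2 ** (n - len(term)) for term in terms]
    while True:
        # Proposal: pick a term with probability proportional to |S_i| ...
        term = random.choices(terms, weights=weights)[0]
        # ... then a uniform assignment inside its subcube S_i.
        x = [random.random() < 0.5 for _ in range(n)]
        for var, val in term.items():
            x[var] = val
        # x was drawn with probability proportional to the number of terms it
        # satisfies, so accept with probability 1 / #{i : x in S_i}.
        satisfied = sum(all(x[var] == val for var, val in t.items()) for t in terms)
        if random.random() < 1 / satisfied:
            return x

# Example: (x0 AND x1) OR ((NOT x0) AND x2) over n = 3 variables.
print(sample_satisfying_assignment([{0: True, 1: True}, {0: False, 2: True}], 3))
```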