Juggling with randomness

Adapted from lectures 10 and 11 of Ryan O’Donnell’s Fall 2017 graduate complexity theory class at CMU.

In Buying randomness with quantifiers, we saw that in some cases, we can trade randomness for $\forall$ or $\exists$ quantifiers, proving that $BPP$ is in $P / poly$ and $Σ_{2} P$ . In this note, we extract the essence of those two proofs and apply the tricks more generally, with a focus on Arthur-Merlin complexity classes.

An interlude on quantifier notation

Since both of the two tricks we’ve seen allowed us to “play with the quantifiers” in the definitions of complexity classes in specific ways, it would be useful to make this more systematic by encapsulating the definitions of many complexity classes using quantifiers only.

For example, $Σ_{2} P$ corresponds to the languages $L$ for which there is a verifier $V$ such that¹

if $x \in L$ , then $\exists y, \forall z : V (x, y, z) = 1$ ;
if $x \notin L$ , then $\forall y, \exists z : V (x, y, z) = 0$ .

Because the quantifiers are $\exists \forall$ in one case and $\forall \exists$ in the other, we will simply call this class “ $\exists \forall / \forall \exists$ ”. In general, by “ $Q_{1}, \dots, Q_{k} / Q_{1}^{'}, \dots, Q_{k}^{'}$ ” (where the $Q_{i}, Q_{i}^{'}$ are quantifiers in $\{\exists, \forall, Я\}$ , with $Я$ read as “for most”), we mean the class of all languages $L$ for which there is a verifier $V$ such that

if $x \in L$ , then $Q_{1} y_{1}, \dots, Q_{k} y_{k} : V (x, y_{1}, \dots, y_{k}) = 1$ ;
if $x \notin L$ , then $Q_{1}^{'} y_{1}, \dots, Q_{k}^{'} y_{k} : V (x, y_{1}, \dots, y_{k}) = 0$ .

As such, the reader is encouraged to verify that $P = /$ , $NP = \exists / \forall$ , $co NP = \forall / \exists$ , $Π_{2} P = \forall \exists / \exists \forall$ , $BPP = Я / Я$ , $RP = Я / \forall$ , $co RP = \forall / Я$ , $MA = \exists Я / \forall Я$ , and $AM = Я \exists / Я \forall$ . In general, to take the “ $co$ ” of a class, you simply swap both sides of the slash.
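To make the notation concrete, here is a minimal brute-force sketch that evaluates a quantifier prefix over all short bit strings. The function name `holds`, the letters `E`/`A`/`R`, and the 2/3 threshold for “most” are illustrative assumptions, not part of the original definitions (where strings have polynomial length and the evaluation is of course not done by enumeration).

```python
from itertools import product

def holds(quantifiers, predicate, length, threshold=2/3):
    """Evaluate a quantifier prefix over all `length`-bit strings by brute force.

    quantifiers: string over 'E' (exists), 'A' (for all), 'R' (for most).
    predicate:   function taking one bit-tuple per quantifier, returning a bool.
    threshold:   assumed meaning of "most" (fraction of strings), here 2/3.
    """
    strings = list(product((0, 1), repeat=length))

    def go(i, chosen):
        if i == len(quantifiers):
            return predicate(*chosen)
        results = [go(i + 1, chosen + (y,)) for y in strings]
        if quantifiers[i] == "E":
            return any(results)
        if quantifiers[i] == "A":
            return all(results)
        return sum(results) >= threshold * len(results)  # 'R': for most

    return go(0, ())

# Toy predicates standing in for V(x, ...) with x fixed.
nonzero = lambda y: any(y)      # accepts every string except 00...0
equal = lambda y, z: y == z     # accepts when the two strings match
```

With these toy predicates, `holds("R", nonzero, 3)` is true (7 of the 8 strings are accepted) while `holds("A", nonzero, 3)` is false, and `holds("AE", equal, 2)` is true while `holds("EA", equal, 2)` is false, which illustrates why the order of quantifiers matters.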

The tricks in our hands

Let’s recall the tools at our disposal: three easy ones, and the two we just proved.

Weaken: $\forall \Rightarrow Я \Rightarrow \exists$

This one is purely logical: we can always turn a $\forall$ into a $Я$ , and a $Я$ into $\exists$ , because this only weakens the guarantee. In words: “if all, then most, and if most, then some”. As an example, this shows that $RP = Я / \forall \subseteq \exists / \forall = NP$ .

Specialize: $\exists \forall \Rightarrow \forall \exists$

Another purely logical one: we can swap $\exists \forall$ into $\forall \exists$ , because all this does is to allow the $\exists$ quantifier to specify more. In words: “if a song is liked by everyone, then everyone has a song that they like”.

In fact, this principle also extends well to $Я$ ,² if you think of $Я$ as sort of “halfway in between $\exists$ and $\forall$ ”:

you can swap $\exists Я \to Я \exists$ , because “if a song is liked by most people, then most people have a song that they like”;
you can swap $Я \forall \to \forall Я$ , because “if most people like all songs, then all songs are liked by most people”.
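The three swaps above can be sanity-checked on random “who likes which song” tables. This is only a sketch: the table sizes, the 2/3 threshold for “most”, and the names `most` and `check_swaps` are assumptions chosen for illustration.

```python
import random

def most(values, threshold=2/3):
    """'For most': at least a `threshold` fraction of the values are true."""
    values = list(values)
    return sum(values) >= threshold * len(values)

def check_swaps(trials=2000, people=4, songs=5, seed=0):
    """Check the three swap implications on random likes[person][song] tables."""
    rng = random.Random(seed)
    P, S = range(people), range(songs)
    for _ in range(trials):
        likes = [[rng.random() < 0.7 for _ in S] for _ in P]
        # exists-forall implies forall-exists
        if any(all(likes[p][s] for p in P) for s in S):
            assert all(any(likes[p][s] for s in S) for p in P)
        # exists-most implies most-exists
        if any(most(likes[p][s] for p in P) for s in S):
            assert most(any(likes[p][s] for s in S) for p in P)
        # most-forall implies forall-most
        if most(all(likes[p][s] for s in S) for p in P):
            assert all(most(likes[p][s] for p in P) for s in S)
    return True
```

The converses fail in general: with two people who each like a different one of two songs, everyone has a song they like, but no song is liked by everyone.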

Combine: $\forall \forall = \forall$

If we have the same quantifier twice, then we can reduce it to a single quantifier. For example, suppose we were to define a dumb class $\exists \exists / \forall \forall$ , i.e.

if $x \in L$ , then $\exists y, \exists z : V (x, y, z) = 1$ ;
if $x \notin L$ , then $\forall y, \forall z : V (x, y, z) = 0$ .

Then we could just combine strings $y$ and $z$ together:

if $x \in L$ , then $\exists (y, z) : V (x, y, z) = 1$ ;
if $x \notin L$ , then $\forall (y, z) : V (x, y, z) = 0$ .

And this is now functionally equivalent to $\exists / \forall = NP$ .

Union bound: $\forall Я = Я \forall$

The key to the proof of $BPP \subseteq P / poly$ was to boost the success probability so much that we can go from “for all inputs $x$ , most random strings $r$ give the correct answer” to “most random strings $r$ give the correct answer for all inputs $x$ ” by a union bound. In general, we can always³ flip $\forall Я$ into $Я \forall$ (even though this is the “wrong” direction in terms of direct implication).

In words: “if there’s more songs than people, and everyone likes all but a handful of songs, then most songs are liked by literally everyone”.
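A small numerical sketch of the union bound, under assumed toy parameters: each of `num_inputs` inputs has a set of `bad_per_input` random strings that give the wrong answer, and we measure the fraction of random strings that are good for every input simultaneously. The function name and the parameters are hypothetical.

```python
import random

def fraction_good_for_all(num_inputs, num_strings, bad_per_input, seed=0):
    """Fraction of random strings that are correct on *every* input, when
    each input has `bad_per_input` (randomly chosen) bad random strings."""
    rng = random.Random(seed)
    bad_for_some = set()
    for _ in range(num_inputs):
        bad_for_some.update(rng.sample(range(num_strings), bad_per_input))
    return 1 - len(bad_for_some) / num_strings

# Toy setting: 2^4 inputs, 2^12 random strings, error 2^-8 per input.
good = fraction_good_for_all(num_inputs=16, num_strings=4096, bad_per_input=16)
union_bound = 1 - 16 * 16 / 4096  # at most (#inputs) * (#bad per input) strings are bad
```

Whatever the bad sets look like, `good` is at least `union_bound`, because the union of the bad sets has size at most the sum of their sizes. This is why the error per input must be boosted below one over the number of inputs before the flip is useful.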

Squint: $Я / Я \subseteq Я \forall / \forall Я$

In the proof that $BPP \subseteq Σ_{2} P$ , we saw that if we have a set $S$ of random strings covering almost all of $\{0, 1\}^{s}$ , then the union of random offset copies $(S \oplus d^{(1)}) \cup \dots \cup (S \oplus d^{(s)})$ covers all of $\{0, 1\}^{s}$ with high probability, while if $S$ contains almost none of $\{0, 1\}^{s}$ , then no matter what offsets you choose, the offset copies will still cover only a tiny fraction of $\{0, 1\}^{s}$ . That is, after boosting the success probability,

$Я r : V (x, r) = 1$ implies $Я d^{(1)}, \dots, d^{(s)}, \forall r : V (x, r \oplus d^{(1)}) \lor \dots \lor V (x, r \oplus d^{(s)}) = 1$ ;
$Я r : V (x, r) = 0$ implies $\forall d^{(1)}, \dots, d^{(s)}, Я r : V (x, r \oplus d^{(1)}) \lor \dots \lor V (x, r \oplus d^{(s)}) = 0$ .

So⁴ we can transform $Я / Я$ into $Я \forall / \forall Я$ .

In words: “an almost full jar will typically look full if you squint, but an almost empty jar will always look mostly empty no matter how you squint”.
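The jar picture can be simulated on a tiny universe. The sketch below encodes strings in $\{0, 1\}^{s}$ as integers, so that $\oplus$ is Python's `^`; the densities (all but a 1/16 fraction for the “almost full” set, a 1/64 fraction for the “almost empty” one), the choice $s = 10$, and all names are assumptions made for illustration.

```python
import random

def shifted_union(S, offsets):
    """Union of the offset copies (S xor d) over all offsets d."""
    return {r ^ d for d in offsets for r in S}

def squint_demo(s=10, seed=0):
    rng = random.Random(seed)
    size = 2 ** s
    # s random offsets d^(1), ..., d^(s)
    offsets = [rng.randrange(size) for _ in range(s)]
    # "Almost full" jar: everything except a random 1/16 fraction.
    almost_full = set(range(size)) - set(rng.sample(range(size), size // 16))
    # "Almost empty" jar: a random 1/64 fraction.
    almost_empty = set(rng.sample(range(size), size // 64))
    return {
        "universe": size,
        "full_cover": len(shifted_union(almost_full, offsets)),
        "empty_cover": len(shifted_union(almost_empty, offsets)),
        "empty_bound": s * len(almost_empty),  # holds for any offsets
    }
```

For the almost full set, a fixed string is missed by all $s$ random offset copies with probability $(1/16)^{s}$ , so by a union bound over the $2^{s}$ strings the union covers everything except with negligible probability. For the almost empty set, the union can never exceed $s \cdot |S|$ strings, regardless of how the offsets are chosen.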

By taking the “ $co$ ” of both sides, we also get $Я / Я \to \forall Я / Я \forall$ . This corresponds to taking the intersection of the offset copies $(S \oplus d^{(1)}) \cap \dots \cap (S \oplus d^{(s)})$ instead of the union (and therefore the AND instead of the OR). This ensures that when $S$ covers almost all of $\{0, 1\}^{s}$ , the intersection is still big, while if $S$ contains almost none of $\{0, 1\}^{s}$ , the intersection will be completely empty with high probability.

A slurry of corollaries

We’ve done all the hard thinking work, now we get to play around with quantifiers!

An alternative characterization of $BPP$

The proof that $BPP \subseteq Σ_{2} P$ that we saw before really went through an intermediate class $Я \forall / \forall Я$ :

$$BPP = Я / Я \overset{\text{squint}}{\subseteq} Я \forall / \forall Я \overset{\text{weaken}}{\subseteq} \exists \forall / \forall \exists = Σ_{2} P .$$

As it turns out, $Я \forall / \forall Я$ is equal to $BPP$ , since

$$Я \forall / \forall Я \overset{\text{weaken}}{\subseteq} Я Я / Я Я \overset{\text{combine}}{=} Я / Я = BPP .$$

One-sided error for $MA$ and $AM$

Both $MA$ and $AM$ correspond to two-round proof protocols between Arthur, a weak but honest mortal with the power of $BPP$ , and Merlin, a powerful but untrustworthy wizard with unbounded computation power (Merlin kind of plays the role of $NP$ here).

The two classes vary only in who goes first:

In $MA := \exists Я / \forall Я$ , Merlin first sends a purported “proof” that $x \in L$ , then Arthur uses randomness to “verify” the proof.
In $AM := Я \exists / Я \forall$ , Arthur first uses randomness to create an $NP$ -type “challenge” for Merlin (e.g. a SAT instance), then Merlin tries to find a “solution” to the challenge.

The same squinting trick that gives “absolute guarantees” for $BPP$ at the cost of an extra quantifier can also give absolute guarantees for $MA$ and $AM$ , but this time for free: Merlin’s $\exists / \forall$ quantifier can absorb one of the two quantifiers that appear.

Formally,

$$MA = \exists Я / \forall Я \overset{\text{squint}}{\subseteq} \exists (Я \forall) / \forall (\forall Я) \overset{\text{weaken}}{\subseteq} \exists \exists \forall / \forall \forall Я \overset{\text{combine}}{=} \exists \forall / \forall Я .$$

In effect, we asked Merlin to attach to his proof the offsets $d^{(1)}, \dots, d^{(s)}$ that make sure that Arthur always accepts when $x \in L$ . This means that there are no “false negatives” anymore. Also, since $\exists \forall / \forall Я \subseteq \exists \forall / \forall \exists$ by weakening, this shows that $MA \subseteq Σ_{2} P$ .

Similarly, for $AM$ ,

$$AM = Я \exists / Я \forall \overset{\text{squint}}{\subseteq} (\forall Я) \exists / (Я \forall) \forall \overset{\text{weaken}}{\subseteq} \forall \exists \exists / Я \forall \forall \overset{\text{combine}}{=} \forall \exists / Я \forall .$$

This time, we used the “ $co$ ” version of squinting, and Arthur is only drawing the offsets, while Merlin tries to find a random string such that the verifier accepts on all the offsets. When $x \in L$ , Merlin is always able to succeed, so there are no “false negatives”. Also, since $\forall \exists / Я \forall \subseteq \forall \exists / \exists \forall$ by weakening, this implies $AM \subseteq Π_{2} P$ .

$MA \subseteq AM$

On a surface level, $MA$ and $AM$ seem incomparable: they’re the same class with the order of quantifiers flipped, just like $Σ_{2} P$ and $Π_{2} P$ . But it turns out that $MA \subseteq AM$ : it’s always better for Arthur to go first.

For $x \in L$ , this is true because $\exists Я$ logically implies $Я \exists$ : the only thing that changes between $MA$ and $AM$ is that Merlin now gets to see Arthur’s coin flips before sending his message, so that can only help him make Arthur accept.
For $x \notin L$ , on the other hand (and for the same reasons), $\forall Я$ does not directly imply $Я \forall$ : even if no message of Merlin can fool Arthur on average when Merlin goes first, once the order is flipped, Merlin might be able to do better by adaptively choosing which message to send based on Arthur’s message. But if the probability Arthur is fooled is small compared to the number of possible messages for Merlin, then most of the time Arthur cannot be fooled by any message.

In math, this gives

$$MA = \exists Я / \forall Я \overset{\text{specialize}}{\subseteq} Я \exists / \forall Я \overset{\text{union bound}}{=} Я \exists / Я \forall = AM .$$

Overall, we can summarize what we know about these classes in this diagram (classes that are generally believed to be equal are shaded together):

Constant-round protocols are all in $AM$

One might think that by adding more rounds of interaction between Arthur and Merlin, we can keep extending the class of languages. But this is not the case! The reason is the intuition that we got from the previous section: it’s always better for Arthur to go before Merlin, so we can transform the protocol so that Arthur sends all of his messages first, then Merlin sends all of his messages.

For example, consider the class $AMA$ where Arthur goes first and last. Informally, we know that $MA \subseteq AM$ , so it’s tempting to write $AMA \subseteq AAM = AM$ . This is not quite how math works but it’s fundamentally the right idea. More formally, we can write $AMA$ as $Я \exists Я / Я \forall Я$ ,⁵ so we can pull the same trick where we flip $\exists Я / \forall Я$ into $Я \exists / Я \forall$ , then combine quantifiers.

$$AMA = Я \exists Я / Я \forall Я \overset{\text{specialize}}{\subseteq} Я Я \exists / Я \forall Я \overset{\text{union bound}}{=} Я Я \exists / Я Я \forall \overset{\text{combine}}{=} Я \exists / Я \forall = AM .$$

What if $NP \subseteq BPP$ ?

In complexity we often think of $P$ as the definition of “efficient computation”. But what would happen if we used $BPP$ instead?

One important result shows that if $NP$ were “easy” (i.e. $P = NP$ ), then dramatic things would happen (i.e. the polynomial hierarchy would collapse). What would happen if $NP \subseteq BPP$ ?

If we had an $NP$ oracle within $BPP$ , then in particular, we could use it to solve $Σ_{2} P$ within $MA$ : use Merlin for the first quantifier, then let Arthur use the $NP$ oracle to handle the second quantifier:

$$Σ_{2} P = \exists \forall / \forall \exists \overset{NP \subseteq BPP}{\subseteq} \exists Я / \forall Я = MA .$$

But we know $MA \subseteq AM \subseteq Π_{2} P$ . So we would have $Σ_{2} P \subseteq Π_{2} P$ and the polynomial hierarchy would collapse to the second level.

Appendix: I lied a bit

This appendix is boring, please don’t read it.

I claimed earlier that we can always replace $\forall Я$ by $Я \forall$ , and $Я / Я$ by $Я \forall / \forall Я$ . Well, it’s a bit more complicated. The thing is, these tricks required some change to the computation that follows:

For the “union bound” in $\forall Я \to Я \forall$ to work, we need to boost the probability of success. This involves repeating the computation several times in parallel and taking the majority of the answers.
For the “squinting” trick in $Я / Я \to Я \forall / \forall Я$ , we need to first boost the probability of success, then run this boosted computation at several offsets $d^{(1)}, \dots, d^{(s)}$ and take the OR.
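To see how quickly repetition plus majority vote boosts the success probability, here is a short sketch that computes the exact error of a majority over $k$ independent runs. It assumes each run errs independently with the same probability and that $k$ is odd; the function name is hypothetical.

```python
from math import comb

def majority_error(p_error, k):
    """Probability that the majority of k independent runs is wrong, when
    each run errs independently with probability p_error (k assumed odd)."""
    wrong_majority = range((k + 1) // 2, k + 1)  # more than half of the runs err
    return sum(comb(k, j) * p_error**j * (1 - p_error)**(k - j)
               for j in wrong_majority)

# Error of the majority vote for a few repetition counts, starting from 1/3.
errors = {k: majority_error(1 / 3, k) for k in (1, 3, 5, 101)}
```

Starting from error $1/3$ , three runs give error $7/27$ , and the error then decays exponentially in $k$ (Hoeffding's inequality bounds it by $e^{-k/18}$ ), so polynomially many repetitions are enough to make the union bound and squinting arguments go through.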

This means that we should be able to take majorities or ORs in the underlying computation model. For $BPP$ , this was no problem: once $x$ and $r$ are chosen, the remaining computation $V (x, r)$ is deterministic polynomial time, so repeating $poly (n)$ times and computing the majority is no big deal. On the other hand, imagine we’re looking at $AM = Я \exists / Я \forall$ . Then the computation that follows the “for most” quantifier is an $NP$ -type computation. Can we compute the majority of some $NP$ computations within $NP$ itself? Given that we cannot even easily negate the result of an $NP$ computation (unless $NP = co NP$ ), this seems worrying.

However, it does turn out to be okay, and ultimately this comes down to the fact that you can push monotone operations like $\land$ and $\lor$ inside quantifiers $\exists$ and $\forall$ :⁶ for example,

$$(\exists y_1 : P(y_1)) \land (\exists y_2 : P(y_2)) \Leftrightarrow \exists y_1, y_2 : P(y_1) \land P(y_2).$$

Concretely, for boosting probabilities, we get

if $x \in L$ , then $Я r_{1}, \dots, r_{k} : {Maj}_{i} (1 [\exists y : V (x, r_{i}, y) = 1]) = 1$ ;
if $x \notin L$ , then $Я r_{1}, \dots, r_{k} : {Maj}_{i} (1 [\forall y : V (x, r_{i}, y) = 0]) = 1$ .

Which is equivalent to

if $x \in L$ , then $Я r_{1}, \dots, r_{k}, \exists y_{1}, \dots, y_{k} : {Maj}_{i} (V (x, r_{i}, y_{i})) = 1$ ;
if $x \notin L$ , then $Я r_{1}, \dots, r_{k}, \forall y_{1}, \dots, y_{k} : {Maj}_{i} (V (x, r_{i}, y_{i})) = 0$ .
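The equivalence between the two formulations can be checked by brute force on small random verifier tables. In this sketch, `V[r][y]` is a hypothetical toy table standing in for $V (x, r, y)$ with $x$ fixed, and $k$ is assumed odd so that the majority has no ties.

```python
import random
from itertools import product

def maj(bits):
    """Strict majority of a sequence of booleans."""
    bits = list(bits)
    return 2 * sum(bits) > len(bits)

def check_push_inside(k=3, num_r=4, num_y=3, trials=200, seed=0):
    """Check Maj_i [exists y: V(r_i, y)] == exists y_1..y_k: Maj_i V(r_i, y_i)."""
    rng = random.Random(seed)
    for _ in range(trials):
        V = [[rng.random() < 0.3 for _ in range(num_y)] for _ in range(num_r)]
        for rs in product(range(num_r), repeat=k):
            # Majority taken outside the existential quantifier.
            outside = maj(any(V[r]) for r in rs)
            # Existential quantifiers pulled out in front of the majority.
            inside = any(maj(V[r][y] for r, y in zip(rs, ys))
                         for ys in product(range(num_y), repeat=k))
            assert outside == inside
    return True
```

The check passes because majority is monotone: choosing each $y_{i}$ to be a witness for $r_{i}$ whenever one exists maximizes the number of accepting runs. Negating both sides gives the corresponding statement for $x \notin L$ .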

¹ All strings are understood to have length polynomial in $n := | x |$ .
² In fact, as already mentioned in a footnote in Buying randomness with quantifiers, you can always move $\exists$ to the inside (because you can just choose not to specialize) and $\forall$ to the outside (because if it always works, it cannot meaningfully specialize): whatever (monotone) quantifier $Q$ you might come up with, $\exists Q \to Q \exists$ and $Q \forall \to \forall Q$ . But we’re getting carried away!
³ Actually, this is only the case if the underlying computation model allows us to boost the probability of success; see more details in the appendix.
⁴ Actually, this is only the case if the underlying computation model allows us to boost the probability of success and compute ORs of polynomially many runs; see more details in the appendix.
⁵ Actually, even the claim that $AMA = Я \exists Я / Я \forall Я$ is not obvious. The protocol-based definition only promises an overall probability of success, not that Arthur “succeeds with high probability at every round”. You can (very carefully) prove it by parallel repetition, though.
⁶ In this note, we won’t need to boost the probability for a computation that itself includes $Я$ . Ironically, for $Я$ , it’s less easy, but you can indeed do it… as long as the probability of that $Я$ is high enough. So as far as I can tell, if you want to boost the probability of some $Я$ , you need to first boost the $Я$ ’s to the right of it.
