Switching lemma

$\require{mathtools} \newcommand{\nc}{\newcommand} % %%% GENERIC MATH %%% % % Environments \newcommand{\al}[1]{\begin{align}#1\end{align}} % need this for \tag{} to work \renewcommand{\r}{\mathrm} % BAD!! does cursed things with accents :(( \renewcommand{\t}{\textrm} \newcommand{\either}[1]{\begin{cases}#1\end{cases}} % % Delimiters % (I needed to create my own because the MathJax version of \DeclarePairedDelimiter doesn't have \mathopen{} and that messes up the spacing) % .. one-part \newcommand{\p}[1]{\mathopen{}\left( #1 \right)} \renewcommand{\P}[1]{^{\p{#1}}} \renewcommand{\b}[1]{\mathopen{}\left[ #1 \right]} \newcommand{\lopen}[1]{\mathopen{}\left( #1 \right]} \newcommand{\ropen}[1]{\mathopen{}\left[ #1 \right)} \newcommand{\set}[1]{\mathopen{}\left\{ #1 \right\}} \newcommand{\abs}[1]{\mathopen{}\left\lvert #1 \right\rvert} \newcommand{\floor}[1]{\mathopen{}\left\lfloor #1 \right\rfloor} \newcommand{\ceil}[1]{\mathopen{}\left\lceil #1 \right\rceil} \newcommand{\round}[1]{\mathopen{}\left\lfloor #1 \right\rceil} \newcommand{\inner}[1]{\mathopen{}\left\langle #1 \right\rangle} \newcommand{\norm}[1]{\mathopen{}\left\lVert #1 \strut \right\rVert} \newcommand{\frob}[1]{\norm{#1}_\mathrm{F}} \newcommand{\mix}[1]{\mathopen{}\left\lfloor #1 \right\rceil} %% .. two-part \newcommand{\inco}[2]{#1 \mathop{}\middle|\mathop{} #2} \newcommand{\co}[2]{ {\left.\inco{#1}{#2}\right.}} \newcommand{\cond}{\co} % deprecated \newcommand{\pco}[2]{\p{\inco{#1}{#2}}} \newcommand{\bco}[2]{\b{\inco{#1}{#2}}} \newcommand{\setco}[2]{\set{\inco{#1}{#2}}} \newcommand{\at}[2]{ {\left.#1\strut\right|_{#2}}} \newcommand{\pat}[2]{\p{\at{#1}{#2}}} \newcommand{\bat}[2]{\b{\at{#1}{#2}}} \newcommand{\para}[2]{#1\strut \mathop{}\middle\|\mathop{} #2} \newcommand{\ppa}[2]{\p{\para{#1}{#2}}} \newcommand{\pff}[2]{\p{\ff{#1}{#2}}} \newcommand{\bff}[2]{\b{\ff{#1}{#2}}} \newcommand{\bffco}[4]{\bff{\cond{#1}{#2}}{\cond{#3}{#4}}} \newcommand{\sm}[1]{\p{\begin{smallmatrix}#1\end{smallmatrix}}} % % Greek \newcommand{\eps}{\epsilon} \newcommand{\veps}{\varepsilon} \newcommand{\vpi}{\varpi} % the following cause issues with real LaTeX tho :/ maybe consider naming it \fhi instead? \let\fi\phi % because it looks like an f \let\phi\varphi % because it looks like a p \renewcommand{\th}{\theta} \newcommand{\Th}{\Theta} \newcommand{\om}{\omega} \newcommand{\Om}{\Omega} % % Miscellaneous \newcommand{\LHS}{\mathrm{LHS}} \newcommand{\RHS}{\mathrm{RHS}} \DeclareMathOperator{\cst}{const} % .. operators \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\quasipoly}{quasipoly} \DeclareMathOperator{\negl}{negl} \DeclareMathOperator*{\argmin}{arg\thinspace min} \DeclareMathOperator*{\argmax}{arg\thinspace max} \DeclareMathOperator{\diag}{diag} % .. functions \DeclareMathOperator{\id}{id} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\step}{step} \DeclareMathOperator{\err}{err} \DeclareMathOperator{\ReLU}{ReLU} \DeclareMathOperator{\softmax}{softmax} % .. analysis \let\d\undefined \newcommand{\d}{\operatorname{d}\mathopen{}} \newcommand{\dd}[1]{\operatorname{d}^{#1}\mathopen{}} \newcommand{\df}[2]{ {\f{\d #1}{\d #2}}} \newcommand{\ds}[2]{ {\sl{\d #1}{\d #2}}} \newcommand{\ddf}[3]{ {\f{\dd{#1} #2}{\p{\d #3}^{#1}}}} \newcommand{\dds}[3]{ {\sl{\dd{#1} #2}{\p{\d #3}^{#1}}}} \renewcommand{\part}{\partial} \newcommand{\ppart}[1]{\part^{#1}} \newcommand{\partf}[2]{\f{\part #1}{\part #2}} \newcommand{\parts}[2]{\sl{\part #1}{\part #2}} \newcommand{\ppartf}[3]{ {\f{\ppart{#1} #2}{\p{\part #3}^{#1}}}} \newcommand{\pparts}[3]{ {\sl{\ppart{#1} #2}{\p{\part #3}^{#1}}}} \newcommand{\grad}[1]{\mathop{\nabla\!_{#1}}} % .. sets \newcommand{\es}{\emptyset} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rge}{\R_{\ge 0}} \newcommand{\Rgt}{\R_{> 0}} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\zo}{\set{0,1}} \newcommand{\pmo}{\set{\pm 1}} \newcommand{\zpmo}{\set{0,\pm 1}} % .... set operations \newcommand{\sse}{\subseteq} \newcommand{\out}{\not\in} \newcommand{\minus}{\setminus} \newcommand{\inc}[1]{\union \set{#1}} % "including" \newcommand{\exc}[1]{\setminus \set{#1}} % "except" % .. over and under \renewcommand{\ss}[1]{_{\substack{#1}}} \newcommand{\OB}{\overbrace} \newcommand{\ob}[2]{\OB{#1}^\t{#2}} \newcommand{\UB}{\underbrace} \newcommand{\ub}[2]{\UB{#1}_\t{#2}} \newcommand{\ol}{\overline} \newcommand{\tld}{\widetilde} % deprecated \renewcommand{\~}{\widetilde} \newcommand{\HAT}{\widehat} % deprecated \renewcommand{\^}{\widehat} \newcommand{\rt}[1]{ {\sqrt{#1}}} \newcommand{\for}[2]{_{#1=1}^{#2}} \newcommand{\sfor}{\sum\for} \newcommand{\pfor}{\prod\for} % .... two-part \newcommand{\f}{\frac} \renewcommand{\sl}[2]{#1 /\mathopen{}#2} \newcommand{\ff}[2]{\mathchoice{\begin{smallmatrix}\displaystyle\vphantom{\p{#1}}#1\\[-0.05em]\hline\\[-0.05em]\hline\displaystyle\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}} % .. arrows \newcommand{\from}{\leftarrow} \DeclareMathOperator*{\<}{\!\;\longleftarrow\;\!} \let\>\undefined \DeclareMathOperator*{\>}{\!\;\longrightarrow\;\!} \let\-\undefined \DeclareMathOperator*{\-}{\!\;\longleftrightarrow\;\!} \newcommand{\so}{\implies} % .. operators and relations \renewcommand{\*}{\cdot} \newcommand{\x}{\times} \newcommand{\ox}{\otimes} \newcommand{\OX}[1]{^{\ox #1}} \newcommand{\sr}{\stackrel} \newcommand{\ce}{\coloneqq} \newcommand{\ec}{\eqqcolon} \newcommand{\ap}{\approx} \newcommand{\ls}{\lesssim} \newcommand{\gs}{\gtrsim} % .. punctuation and spacing \renewcommand{\.}[1]{#1\dots#1} \newcommand{\ts}{\thinspace} \newcommand{\q}{\quad} \newcommand{\qq}{\qquad} % % %%% SPECIALIZED MATH %%% % % Logic and bit operations \newcommand{\fa}{\forall} \newcommand{\ex}{\exists} \renewcommand{\and}{\wedge} \newcommand{\AND}{\bigwedge} \renewcommand{\or}{\vee} \newcommand{\OR}{\bigvee} \newcommand{\xor}{\oplus} \newcommand{\XOR}{\bigoplus} \newcommand{\union}{\cup} \newcommand{\dunion}{\sqcup} \newcommand{\inter}{\cap} \newcommand{\UNION}{\bigcup} \newcommand{\DUNION}{\bigsqcup} \newcommand{\INTER}{\bigcap} \newcommand{\comp}{\overline} \newcommand{\true}{\r{true}} \newcommand{\false}{\r{false}} \newcommand{\tf}{\set{\true,\false}} \DeclareMathOperator{\One}{\mathbb{1}} \DeclareMathOperator{\1}{\mathbb{1}} % use \mathbbm instead if using real LaTeX \DeclareMathOperator{\LSB}{LSB} % % Linear algebra \newcommand{\spn}{\mathrm{span}} % do NOT use \span because it causes misery with amsmath \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\dom}{dom} \DeclareMathOperator{\Img}{Im} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\perm}{perm} \DeclareMathOperator{\haf}{haf} \newcommand{\transp}{\mathsf{T}} \newcommand{\T}{^\transp} \newcommand{\par}{\parallel} % .. named tensors \newcommand{\namedtensorstrut}{\vphantom{fg}} % milder than \mathstrut \newcommand{\name}[1]{\mathsf{\namedtensorstrut #1}} \newcommand{\nbin}[2]{\mathbin{\underset{\substack{#1}}{\namedtensorstrut #2}}} \newcommand{\ndot}[1]{\nbin{#1}{\odot}} \newcommand{\ncat}[1]{\nbin{#1}{\oplus}} \newcommand{\nsum}[1]{\sum\limits_{\substack{#1}}} \newcommand{\nfun}[2]{\mathop{\underset{\substack{#1}}{\namedtensorstrut\mathrm{#2}}}} \newcommand{\ndef}[2]{\newcommand{#1}{\name{#2}}} \newcommand{\nt}[1]{^{\transp(#1)}} % % Probability \newcommand{\tri}{\triangle} \newcommand{\Normal}{\mathcal{N}} \newcommand{\Exp}{\mathcal{Exp}} % .. operators \DeclareMathOperator{\supp}{supp} \let\Pr\undefined \DeclareMathOperator*{\Pr}{Pr} \DeclareMathOperator*{\G}{\mathbb{G}} \DeclareMathOperator*{\Odds}{Od} \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \DeclareMathOperator*{\Cov}{Cov} \DeclareMathOperator*{\K}{K} \DeclareMathOperator*{\corr}{corr} \DeclareMathOperator*{\median}{median} \DeclareMathOperator*{\maj}{maj} % ... information theory \let\H\undefined \DeclareMathOperator*{\H}{H} \DeclareMathOperator*{\I}{I} \DeclareMathOperator*{\D}{D} \DeclareMathOperator*{\KL}{KL} % .. other divergences \newcommand{\dTV}{d_{\mathrm{TV}}} \newcommand{\dHel}{d_{\mathrm{Hel}}} \newcommand{\dJS}{d_{\mathrm{JS}}} % % Polynomials \DeclareMathOperator{\He}{He} \DeclareMathOperator{\coeff}{coeff} % %%% SPECIALIZED COMPUTER SCIENCE %%% % % Complexity classes % .. keywords \newcommand{\coclass}{\mathsf{co}} \newcommand{\Prom}{\mathsf{Promise}} % .. classical \newcommand{\PTIME}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\coNP}{\coclass\NP} \newcommand{\PH}{\mathsf{PH}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \renewcommand{\L}{\mathsf{L}} \newcommand{\EXP}{\mathsf{EXP}} \newcommand{\NEXP}{\mathsf{NEXP}} % .. probabilistic \newcommand{\formost}{\mathsf{Я}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\ZPP}{\mathsf{ZPP}} \newcommand{\MA}{\mathsf{MA}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\RL}{\mathsf{RL}} % .. circuits \newcommand{\NC}{\mathsf{NC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\ACC}{\mathsf{ACC}} \newcommand{\ThrC}{\mathsf{TC}} \newcommand{\Ppoly}{\mathsf{P}/\poly} \newcommand{\Lpoly}{\mathsf{L}/\poly} % .. resources \newcommand{\TIME}{\mathsf{TIME}} \newcommand{\NTIME}{\mathsf{NTIME}} \newcommand{\SPACE}{\mathsf{SPACE}} \newcommand{\TISP}{\mathsf{TISP}} \newcommand{\SIZE}{\mathsf{SIZE}} % .. custom \newcommand{\NCP}{\mathsf{NCP}} % % Boolean analysis \newcommand{\harpoon}{\!\upharpoonright\!} \newcommand{\rr}[2]{#1\harpoon_{#2}} \newcommand{\Fou}[1]{\widehat{#1}} \DeclareMathOperator{\Ind}{\mathrm{Ind}} \DeclareMathOperator{\Inf}{\mathrm{Inf}} \newcommand{\Der}[1]{\operatorname{D}_{#1}\mathopen{}} % \newcommand{\Exp}[1]{\operatorname{E}_{#1}\mathopen{}} \DeclareMathOperator{\Stab}{\mathrm{Stab}} \DeclareMathOperator{\Tau}{T} \DeclareMathOperator{\sens}{\mathrm{s}} \DeclareMathOperator{\bsens}{\mathrm{bs}} \DeclareMathOperator{\fbsens}{\mathrm{fbs}} \DeclareMathOperator{\Cert}{\mathrm{C}} \DeclareMathOperator{\DT}{\mathrm{DT}} \DeclareMathOperator{\CDT}{\mathrm{CDT}} % canonical \DeclareMathOperator{\ECDT}{\mathrm{ECDT}} \DeclareMathOperator{\CDTv}{\mathrm{CDT_{vars}}} \DeclareMathOperator{\ECDTv}{\mathrm{ECDT_{vars}}} \DeclareMathOperator{\CDTt}{\mathrm{CDT_{terms}}} \DeclareMathOperator{\ECDTt}{\mathrm{ECDT_{terms}}} \DeclareMathOperator{\CDTw}{\mathrm{CDT_{weighted}}} \DeclareMathOperator{\ECDTw}{\mathrm{ECDT_{weighted}}} \DeclareMathOperator{\AvgDT}{\mathrm{AvgDT}} \DeclareMathOperator{\PDT}{\mathrm{PDT}} % partial decision tree \DeclareMathOperator{\DTsize}{\mathrm{DT_{size}}} \DeclareMathOperator{\W}{\mathbf{W}} % .. functions (small caps sadly doesn't work) \DeclareMathOperator{\Par}{\mathrm{Par}} \DeclareMathOperator{\Maj}{\mathrm{Maj}} \DeclareMathOperator{\HW}{\mathrm{HW}} \DeclareMathOperator{\Thr}{\mathrm{Thr}} \DeclareMathOperator{\Tribes}{\mathrm{Tribes}} \DeclareMathOperator{\RotTribes}{\mathrm{RotTribes}} \DeclareMathOperator{\CycleRun}{\mathrm{CycleRun}} \DeclareMathOperator{\SAT}{\mathrm{SAT}} \DeclareMathOperator{\UniqueSAT}{\mathrm{UniqueSAT}} % % Dynamic optimality \newcommand{\OPT}{\mathsf{OPT}} \newcommand{\Alt}{\mathsf{Alt}} \newcommand{\Funnel}{\mathsf{Funnel}} % % Alignment \DeclareMathOperator{\Amp}{\mathrm{Amp}} % %%% TYPESETTING %%% % % In "text" \newcommand{\heart}{\heartsuit} \newcommand{\nth}{^\t{th}} \newcommand{\degree}{^\circ} \newcommand{\qu}[1]{\text{``}#1\text{''}} % remove these last two if using real LaTeX \newcommand{\qed}{\blacksquare} \newcommand{\qedhere}{\tag*{$\blacksquare$}} % % Fonts % .. bold \newcommand{\BA}{\boldsymbol{A}} \newcommand{\BB}{\boldsymbol{B}} \newcommand{\BC}{\boldsymbol{C}} \newcommand{\BD}{\boldsymbol{D}} \newcommand{\BE}{\boldsymbol{E}} \newcommand{\BF}{\boldsymbol{F}} \newcommand{\BG}{\boldsymbol{G}} \newcommand{\BH}{\boldsymbol{H}} \newcommand{\BI}{\boldsymbol{I}} \newcommand{\BJ}{\boldsymbol{J}} \newcommand{\BK}{\boldsymbol{K}} \newcommand{\BL}{\boldsymbol{L}} \newcommand{\BM}{\boldsymbol{M}} \newcommand{\BN}{\boldsymbol{N}} \newcommand{\BO}{\boldsymbol{O}} \newcommand{\BP}{\boldsymbol{P}} \newcommand{\BQ}{\boldsymbol{Q}} \newcommand{\BR}{\boldsymbol{R}} \newcommand{\BS}{\boldsymbol{S}} \newcommand{\BT}{\boldsymbol{T}} \newcommand{\BU}{\boldsymbol{U}} \newcommand{\BV}{\boldsymbol{V}} \newcommand{\BW}{\boldsymbol{W}} \newcommand{\BX}{\boldsymbol{X}} \newcommand{\BY}{\boldsymbol{Y}} \newcommand{\BZ}{\boldsymbol{Z}} \newcommand{\Ba}{\boldsymbol{a}} \newcommand{\Bb}{\boldsymbol{b}} \newcommand{\Bc}{\boldsymbol{c}} \newcommand{\Bd}{\boldsymbol{d}} \newcommand{\Be}{\boldsymbol{e}} \newcommand{\Bf}{\boldsymbol{f}} \newcommand{\Bg}{\boldsymbol{g}} \newcommand{\Bh}{\boldsymbol{h}} \newcommand{\Bi}{\boldsymbol{i}} \newcommand{\Bj}{\boldsymbol{j}} \newcommand{\Bk}{\boldsymbol{k}} \newcommand{\Bl}{\boldsymbol{l}} \newcommand{\Bm}{\boldsymbol{m}} \newcommand{\Bn}{\boldsymbol{n}} \newcommand{\Bo}{\boldsymbol{o}} \newcommand{\Bp}{\boldsymbol{p}} \newcommand{\Bq}{\boldsymbol{q}} \newcommand{\Br}{\boldsymbol{r}} \newcommand{\Bs}{\boldsymbol{s}} \newcommand{\Bt}{\boldsymbol{t}} \newcommand{\Bu}{\boldsymbol{u}} \newcommand{\Bv}{\boldsymbol{v}} \newcommand{\Bw}{\boldsymbol{w}} \newcommand{\Bx}{\boldsymbol{x}} \newcommand{\By}{\boldsymbol{y}} \newcommand{\Bz}{\boldsymbol{z}} \newcommand{\Balpha}{\boldsymbol{\alpha}} \newcommand{\Bbeta}{\boldsymbol{\beta}} \newcommand{\Bgamma}{\boldsymbol{\gamma}} \newcommand{\Bdelta}{\boldsymbol{\delta}} \newcommand{\Beps}{\boldsymbol{\eps}} \newcommand{\Bveps}{\boldsymbol{\veps}} \newcommand{\Bzeta}{\boldsymbol{\zeta}} \newcommand{\Beta}{\boldsymbol{\eta}} \newcommand{\Btheta}{\boldsymbol{\theta}} \newcommand{\Bth}{\boldsymbol{\th}} \newcommand{\Biota}{\boldsymbol{\iota}} \newcommand{\Bkappa}{\boldsymbol{\kappa}} \newcommand{\Blambda}{\boldsymbol{\lambda}} \newcommand{\Bmu}{\boldsymbol{\mu}} \newcommand{\Bnu}{\boldsymbol{\nu}} \newcommand{\Bxi}{\boldsymbol{\xi}} \newcommand{\Bpi}{\boldsymbol{\pi}} \newcommand{\Bvpi}{\boldsymbol{\vpi}} \newcommand{\Brho}{\boldsymbol{\rho}} \newcommand{\Bsigma}{\boldsymbol{\sigma}} \newcommand{\Btau}{\boldsymbol{\tau}} \newcommand{\Bupsilon}{\boldsymbol{\upsilon}} \newcommand{\Bphi}{\boldsymbol{\phi}} \newcommand{\Bfi}{\boldsymbol{\fi}} \newcommand{\Bchi}{\boldsymbol{\chi}} \newcommand{\Bpsi}{\boldsymbol{\psi}} \newcommand{\Bom}{\boldsymbol{\om}} % .. calligraphic \newcommand{\CA}{\mathcal{A}} \newcommand{\CB}{\mathcal{B}} \newcommand{\CC}{\mathcal{C}} \newcommand{\CD}{\mathcal{D}} \newcommand{\CE}{\mathcal{E}} \newcommand{\CF}{\mathcal{F}} \newcommand{\CG}{\mathcal{G}} \newcommand{\CH}{\mathcal{H}} \newcommand{\CI}{\mathcal{I}} \newcommand{\CJ}{\mathcal{J}} \newcommand{\CK}{\mathcal{K}} \newcommand{\CL}{\mathcal{L}} \newcommand{\CM}{\mathcal{M}} \newcommand{\CN}{\mathcal{N}} \newcommand{\CO}{\mathcal{O}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CQ}{\mathcal{Q}} \newcommand{\CR}{\mathcal{R}} \newcommand{\CS}{\mathcal{S}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CU}{\mathcal{U}} \newcommand{\CV}{\mathcal{V}} \newcommand{\CW}{\mathcal{W}} \newcommand{\CX}{\mathcal{X}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\CZ}{\mathcal{Z}} % .. typewriter \newcommand{\TA}{\mathtt{A}} \newcommand{\TB}{\mathtt{B}} \newcommand{\TC}{\mathtt{C}} \newcommand{\TD}{\mathtt{D}} \newcommand{\TE}{\mathtt{E}} \newcommand{\TF}{\mathtt{F}} \newcommand{\TG}{\mathtt{G}} \renewcommand{\TH}{\mathtt{H}} \newcommand{\TI}{\mathtt{I}} \newcommand{\TJ}{\mathtt{J}} \newcommand{\TK}{\mathtt{K}} \newcommand{\TL}{\mathtt{L}} \newcommand{\TM}{\mathtt{M}} \newcommand{\TN}{\mathtt{N}} \newcommand{\TO}{\mathtt{O}} \newcommand{\TP}{\mathtt{P}} \newcommand{\TQ}{\mathtt{Q}} \newcommand{\TR}{\mathtt{R}} \newcommand{\TS}{\mathtt{S}} \newcommand{\TT}{\mathtt{T}} \newcommand{\TU}{\mathtt{U}} \newcommand{\TV}{\mathtt{V}} \newcommand{\TW}{\mathtt{W}} \newcommand{\TX}{\mathtt{X}} \newcommand{\TY}{\mathtt{Y}} \newcommand{\TZ}{\mathtt{Z}} % % LEVELS OF CLOSENESS (basically deprecated) \newcommand{\scirc}[1]{\sr{\circ}{#1}} \newcommand{\sdot}[1]{\sr{.}{#1}} \newcommand{\slog}[1]{\sr{\log}{#1}} \newcommand{\createClosenessLevels}[7]{ \newcommand{#2}{\mathrel{(#1)}} \newcommand{#3}{\mathrel{#1}} \newcommand{#4}{\mathrel{#1\!\!#1}} \newcommand{#5}{\mathrel{#1\!\!#1\!\!#1}} \newcommand{#6}{\mathrel{(\sdot{#1})}} \newcommand{#7}{\mathrel{(\slog{#1})}} } \let\lt\undefined \let\gt\undefined % .. vanilla versions (is it within a constant?) \newcommand{\ez}{\scirc=} \newcommand{\eq}{\simeq} \newcommand{\eqq}{\mathrel{\eq\!\!\eq}} \newcommand{\eqqq}{\mathrel{\eq\!\!\eq\!\!\eq}} \newcommand{\lez}{\scirc\le} \renewcommand{\lq}{\preceq} \newcommand{\lqq}{\mathrel{\lq\!\!\lq}} \newcommand{\lqqq}{\mathrel{\lq\!\!\lq\!\!\lq}} \newcommand{\gez}{\scirc\ge} \newcommand{\gq}{\succeq} \newcommand{\gqq}{\mathrel{\gq\!\!\gq}} \newcommand{\gqqq}{\mathrel{\gq\!\!\gq\!\!\gq}} \newcommand{\lz}{\scirc<} \newcommand{\lt}{\prec} \newcommand{\ltt}{\mathrel{\lt\!\!\lt}} \newcommand{\lttt}{\mathrel{\lt\!\!\lt\!\!\lt}} \newcommand{\gz}{\scirc>} \newcommand{\gt}{\succ} \newcommand{\gtt}{\mathrel{\gt\!\!\gt}} \newcommand{\gttt}{\mathrel{\gt\!\!\gt\!\!\gt}} % .. dotted versions (is it equal in the limit?) \newcommand{\ed}{\sdot=} \newcommand{\eqd}{\sdot\eq} \newcommand{\eqqd}{\sdot\eqq} \newcommand{\eqqqd}{\sdot\eqqq} \newcommand{\led}{\sdot\le} \newcommand{\lqd}{\sdot\lq} \newcommand{\lqqd}{\sdot\lqq} \newcommand{\lqqqd}{\sdot\lqqq} \newcommand{\ged}{\sdot\ge} \newcommand{\gqd}{\sdot\gq} \newcommand{\gqqd}{\sdot\gqq} \newcommand{\gqqqd}{\sdot\gqqq} \newcommand{\ld}{\sdot<} \newcommand{\ltd}{\sdot\lt} \newcommand{\lttd}{\sdot\ltt} \newcommand{\ltttd}{\sdot\lttt} \newcommand{\gd}{\sdot>} \newcommand{\gtd}{\sdot\gt} \newcommand{\gttd}{\sdot\gtt} \newcommand{\gtttd}{\sdot\gttt} % .. log versions (is it equal up to log?) \newcommand{\elog}{\slog=} \newcommand{\eqlog}{\slog\eq} \newcommand{\eqqlog}{\slog\eqq} \newcommand{\eqqqlog}{\slog\eqqq} \newcommand{\lelog}{\slog\le} \newcommand{\lqlog}{\slog\lq} \newcommand{\lqqlog}{\slog\lqq} \newcommand{\lqqqlog}{\slog\lqqq} \newcommand{\gelog}{\slog\ge} \newcommand{\gqlog}{\slog\gq} \newcommand{\gqqlog}{\slog\gqq} \newcommand{\gqqqlog}{\slog\gqqq} \newcommand{\llog}{\slog<} \newcommand{\ltlog}{\slog\lt} \newcommand{\lttlog}{\slog\ltt} \newcommand{\ltttlog}{\slog\lttt} \newcommand{\glog}{\slog>} \newcommand{\gtlog}{\slog\gt} \newcommand{\gttlog}{\slog\gtt} \newcommand{\gtttlog}{\slog\gttt}$

The proof is adapted from Hamed Hatami’s lecture notes for “COMP 531: Advanced Complexity Theory” at McGill. Hamed Hatami doesn’t condone my crappy metaphors.

This note presents Håstad’s original proof of his switching lemma.

Hermits and terrorists

You’re at a board game night, playing a very disturbing version of one-night werewolf. You don’t know how many players there are, but you know that tonight:

some of the villagers are hermits, who will find their vocation and leave the village to go live in isolation;
those who are not hermits have probability $p$ of being a terrorist, who will blow themselves up and kill everyone;
the rest are normal villagers, who are fast asleep.

What’s the chance that at least $t$ villagers are still alive and present in the village the next day?

First of all, we can just disregard all the hermits. We don’t know how many non-hermits there are, but if at least $t$ people survive, then that means there were at least $t$ non-hermits, and all of them failed to be terrorists. That’s a pretty big surprise: the probability that they all fail to be terrorists was at most $(1-p)^t$! So regardless of how many non-hermits there were (and also regardless of how many people there were originally), the probability that $t$ people survive is $\leq (1-p)^t$.

Here, crucially, we assumed they all independently had a $p$ chance of being a terrorist. If they were correlated (for example, if you know they’re either all terrorists or all non-terrorists), then the probability that at least $t$ people survive could be arbitrarily large, and we would have to know something about the number of people there were originally.

ANDs and ORs

The politically incorrect situation above is analogous to what happens to DNFs under random restrictions. Take some term $T_j$ of width $w$ in a DNF $f = T_1 \or \cdots \or T_s$, and apply a random restriction $\rho$ with star probability $\delta$.

If any of the variables in $T_j$ get set to $0$, then $T_j$ evaluates to $0$ and disappears from the DNF without affecting the other terms (i.e. $T_j$ is a hermit).¹
On the other hand, assuming $T_j$ doesn’t get set to $0$ (all its variables were set to $1$ or $*$), there is a chance $p \ce (\frac{1-\delta}{1+\delta})^w$ that all of the variables in $T_j$ get set to $1$, in which case $T_j$ evaluates to $1$ and the whole DNF collapses to the constant $1$ (i.e. $T_j$ is a terrorist).

As long as $\delta = O(1/w)$, this probability $p$ of being a terrorist is a constant, and assuming these are all independent, this shows that

\[\Pr[\text{$\rr{f}{\rho}$ has size $\geq t$}] \leq (1-p)^t \leq 2^{-\Omega(t)}.\]

But the terms of a DNF tend to share a lot of variables,² so the events that two terms $T_{j_1}$ and $T_{j_2}$ are terrorists are far from independent: if $T_{j_1}$ and $T_{j_2}$ share a lot of the same variables, then conditioned on the fact that $T_{j_1}$ has some stars, $T_{j_2}$ is quite likely to have some stars too.

To rephrase, what goes wrong is: The terms shared some stars, which allowed them to both not be terrorists without it being a huge surprise.

It turns out this is the only issue! So if somehow after looking at $T_{j_1}$ we could fix all its stars, then it’s as if down the line, $T_{j_2}$ has a fresh $p$ chance of being a terrorist. Indeed, if we do that, the only variables we’ll have seen are all fixed to $0$ or $1$ (either by $\rho$ or by us), so conditioning on those should only increase the chance that the next term is a terrorist!

This insight then allows us to show that $f$ is very likely to be a small-depth decision tree. Indeed, say $T_{j_1}$ has $r$ stars. If we can inductively show that for all possible fixings of those $r$ stars, the resulting CNF has small decision tree depth, then so does $f$: just start out by querying those $r$ stars in the first $r$ levels.³

What we lost: a union bound

However, we did lose something: we now have to union-bound over all the possible assignments of those $r$ stars, and when the term isn’t a terrorist, that’s at least $2$ possibilities! So to counterbalance for this, it’s no longer enough for the probability $p$ of being a terrorist to be a constant; it should be at the very least $> 1/2$! This means that we cannot say anything when $\delta$ is bigger than $1/w$…

And this union bound over the $2^r$ branches seems to be justified in practice. Indeed, consider the “choose your DNF” DNF:

\[f(x, y^{(1)}, \ldots, y^{(2^w)}) = x_1 \cdots x_w g_1(y^{(1)}) \or x_1 \cdots x_{w-1} \comp{x_w} g_2(y^{(2)}) \or \cdots \or \comp{x_1} \cdots \comp{x_w} g_{2^w}(y^{(2^w)}).\]

where each $y^{(i)}$ is an input in $\zo^m$ and $g_j$ is itself a width-$w$ DNF (so that $f$ is a width-$2w$ DNF). Suppose that $r$ of the $w$ variables $x_1, \ldots, x_w$ end up being stars in $\rho$, then $2^r$ of the $g_j$’s will still be relevant after the restriction. So even pretty rare events about the structure of the $g_j$’s after the random restriction (like “$\rr{g_j}{\rho}$ has large decision tree depth”) could end up happening: they’re all independent and there’s $2^r$ of them, so the probability that at least one of them happens is roughly $2^r$ bigger than the probability of any specific one happening!

Nevertheless, this case is not actually problematic for say $\delta = 10/w$, since the probability to have $r$ stars in the first $w$ variables itself decreases very quickly (on the order of $r^{-\Omega(r)}$, as mentioned in this earlier footnote³). It would be very interesting to still be able to say something beyond the $\delta = 1/w$ barrier.

Statement and proof

Let $f = T_1 \or \cdots \or T_s$ be a width-$w$ DNF, and let $\rho$ be a random restriction with star probability $\delta$. Håstad’s switching lemma⁴ says that for any $t>0$,

\[\Pr[\DT(\rr{f}{\rho}) \geq t] \leq (C\delta w)^t\]

(for some constant $C$ to be set later).

We want to show that by induction: looking at the first non-hermit term $T_j$, then recursing on the rest of the DNF after fixing all its stars to $0$ or $1$. But this requires us to condition on $T_1, \ldots, T_{j-1}$ being hermits, and on the other variables in $T_j$ having the values that they had. Intuitively, this is fine, because those things only make it less likely to have stars going forward. But we do need to strengthen the induction hypothesis in order to handle this conditioning.

What we’ll actually prove is that for any auxiliary function $F: \zo^n \to \zo$, then

\[\Pr[\DT(\rr{f}{\rho}) \geq t \mid \rr{F}{\rho} \equiv 0] \leq (C\delta w)^t.\]

This condition “$\rr{F}{\rho} \equiv 0$” will be enough to encode all the things we wanted to condition on. And indeed, intuitively, conditioning on some function collapsing to a constant seems to only make it less likely for stars to appear.

We’ll induce on $s$, the number of terms of $f$. The base case when $f$ has no terms is obvious: that makes $f$ constant, so $\DT(f)=0$. Otherwise, look at the first term in $f$, i.e. $T_1$.

We can split the event $\DT(\rr{f}\rho) \geq t$ into two cases: either $\rr{T_1}{\rho} \equiv 0$ (i.e. $T_1$ is a hermit), or $\rr{T_1}\rho \not\equiv 0$ (it’s not). It suffices to show that conditioned on either of those, the probability is $\leq (C\delta w)^t$.

Case 1: $T_1$ is a hermit

Nothing interesting happens here: we’re just showing that “conditioning on the first few terms being hermits” can be achieved by “conditioning on some auxiliary function $F$ collapsing to $0$”.

We want to bound $\Pr[\DT(\rr{f}\rho) \geq t \mid \rr{F}\rho \equiv 0 \and \rr{T_1}\rho \equiv 0]$. We can observe two things:

Once we’ve conditioned on $T_1$ collapsing to $0$, then $f$ is equivalent $T_2 \or \cdots \or T_s$, and so their depths are the same.
Conditioning on $F$ and $T_1$ both collapsing to $0$ is the same as conditioning on their OR $F \or T_1$ collapsing to $0$.

So we can rewrite

\[ \Pr[\DT(\rr{f}\rho) \geq t \mid \rr{F}\rho \equiv 0 \and \rr{T_1}\rho \equiv 0] = \Pr[\DT(\rr{(T_2 \or \cdots \or T_s)}\rho) \geq t \mid \rr{(F \or T_1)}\rho \equiv 0],\]

and then the induction hypothesis for $f' \coloneqq T_2 \or \cdots \or T_s$ and $F' \coloneqq F \or T_1$ tells us that the RHS is at most $(C\delta w)^t$.

Case 2: $T_1$ is not a hermit

This is the interesting case.

Without loss of generality (up to a flipping of the variables), we can assume that $T_1$ contains no negations. This means that under our assumption that $\rr{T_1}{\rho} \not\equiv 0$, all the variables in $T_1$ are set either $1$ or $*$. We will roughly split the argument into two parts

the probability to get $r$ stars in $T_1$ decreases fast as $r$ grows;
the probability that “the rest of $f$ after fixing those stars” has depth $\geq t-r$ is small.

First, split into cases based on the subset $S \sse T_1$ of the variables in $T_1$ that are set to $*$:

\[ \begin{align*} &\ \Pr[\DT(\rr{f}\rho) \ge t \mid \rr{F}\rho \equiv 0, \rr{T_1}\rho \not\equiv 0]\\ = &\ \Pr[\DT(\rr{f}\rho) \ge t \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}]\\ = &\sum_{S \sse T_1}\Pr[\DT(\rr{f}\rho) \ge t, \rho_S = *, \rho_{T_1 \setminus S} = 1 \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}]\\ = &\sum_{r=1}^w \sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = *, \rho_{T_1\setminus S} = 1 \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}] \cdot \Pr[\DT(\rr{f}{\rho}) \geq t \mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]\\ \leq &\sum_{r=1}^w \underbrace{\sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = * \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}]}_\text{(a): ``$\Pr[\text{$T_1$ has $r$ stars}]$''} \cdot \underbrace{\Pr[\DT(\rr{f}{\rho}) \geq t \mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]}_{\text{(b): ``$\Pr[\text{the rest of $f$ has depth $\geq t-r$}]$''}} \end{align*} \]

(note that we’re using shortcuts like “$\rho_{T_1} \in \set{1,*}$” to mean “$\rho$ sets all of the variables of $T_1$ to either $1$ or $*$”). A crucial step in the above was to exclude the case $S = \emptyset$. We can do this because when there are no stars, $T_1$ is a terrorist: it evaluates to $1$ and collapses the whole function to the constant $1$, which sets the decision tree depth to $0$.

Part (a): $T_1$ is unlikely to have many stars

First, we deal with the sum $\sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = * \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}]$, which represents the probability that $T_1$ has at least $r$ stars (conditioned on being all $1$ and $*$). These probabilities would be trivial to compute if only we were not conditioning on $\rr{F}{\rho} \equiv 0$: we have

\[\Pr[\rho_S = * \mid \rho_{T_1} \in \set{1,*}] = \p{\frac{\Pr[\rho_i=1]}{\Pr[\rho_i \in \set{1,*}]}}^r = \p{\frac{\delta}{\frac{1+\delta}{2}}}^r \leq (2\delta)^r.\]

We want to somehow argue that conditioning on $\rr{F}\rho \equiv 0$ can only decrease the probability that $\rho_S = *$. By Bayes’ rule, it suffices to prove the opposite: that conditioning on $\rho_S = *$ can only decrease the probability of $\rr{F}{\rho} \equiv 0$, that is

\[\Pr[\rr{F}\rho \equiv 0 \mid \rho_S = *, \rho_{T_1} \in \set{1,*}] \leq \Pr[\rr{F}\rho \equiv 0 \mid \rho_{T_1} \in \set{1,*}].\]

We’re comparing two situations. In both of them, we’re forcing $\rho$ to take values $1$ and $*$ over $T_1 \setminus S$. But in the first situation, we’re forcing $\rho_S = *$, while in the second one we’re only forcing $\rho_S \in \set{1,*}$. Clearly, if $F$ becomes fixed to $0$ in the first situation, it will still be fixed to $0$ in the second situation where some of its stars in $S$ are fixed to $1$’s.⁵

Therefore, we obtain⁶

\[ \begin{align*} \sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = * \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}] &\leq \sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = * \mid \rho_{T_1} \in \set{1,*}]\\ &\leq \sum_{\substack{S \sse T_1 \\|S|=r}} (2\delta)^r\\ &\leq \binom{w}{r} (2\delta)^r\\ &\leq (2\delta w)^r. \end{align*} \]

Part (b): the rest is unlikely to be deep

To bound the probability $\Pr[\DT(\rr{f}{\rho}) \geq t \mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]$, it suffices to union bound over all possible fixings $\sigma \in \zo^S$ of the stars: indeed, if $\rr{f}{\sigma\rho}$ has decision tree depth $<t-r$ for each $\sigma$, then $\rr{f}{\rho}$ must have decision tree depth $<t$.

If $\sigma = 1^S$, then $T_1$ gets set to $1$, so $\rr{f}{\sigma\rho} \equiv 1$, so $\Pr[\DT(\rr{f}{\sigma\rho}) \geq t-r \mid \cdot] = 0$ and we can just drop that case.
Otherwise, $T_1$ gets set to $0$, so $\rr{f}{\sigma\rho} \equiv \rr{(T_2 \or \cdots \or T_s)}{\sigma\rho}$. This will allow us to induce on $f' \coloneqq \rr{(T_2 \or \cdots \or T_s)}{\sigma}$, which has less terms than $f$.

But in order to apply the inductive hypothesis, we also need to somehow get rid of the conditioning on $\rho_S=*$ and incorporate $\rho_{T_1\setminus S} = 1$ into a the condition $\rr{F}{\rho} \equiv 0$. To do this, observe that:

conditioned on $\rho_S = *$, $\rr{F}{\rho} \equiv 0$ holds iff it holds under every possible fixing of the variables in $S$: that is, $F$ can be replaced by $\OR_{\sigma' \in \zo^S}\rr{F}{\sigma'}$;⁷
to make sure that $\rho_{T_1 \setminus S} = 1$, it suffices to add $\OR_{i \in T_1 \setminus S} \comp{x_i}$ to $F$.

All in all, “$\rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1$” is equivalent to “$\rr{F'}{\rho} \equiv 0, \rho_S = *$”, where the new function

\[F' \coloneqq \p{\OR_{\sigma' \in \zo^S}\rr{F}{\sigma'}} \or \p{\OR_{i \in T_1 \setminus S} \comp{x_i}}\]

doesn’t depend on the variables in $S$ anymore.

This gives us

\[ \begin{align*} \Pr[\DT(\rr{f}{\rho}) \geq t &\mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]\\ &\leq \sum_{\sigma \in \zo^S} \Pr[\DT(\rr{f}{\sigma\rho}) \geq t-r \mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]\\ &= \sum_{\sigma \in \zo^S} \Pr[\DT(\rr{f}{\sigma\rho}) \geq t-r \mid \rr{F'}{\rho} \equiv 0, \rho_S = *]\\ &= \sum_{\sigma \in \zo^S} \Pr[\DT(\rr{f}{\sigma\rho}) \geq t-r \mid \rr{F'}{\rho} \equiv 0]\\ &= \sum_{\substack{\sigma \in \zo^S\\\sigma \neq 1^S}} \Pr[\DT(\rr{(\underbrace{\rr{(T_2 \or \cdots \or T_s)}{\sigma}}_{f'})}{\rho}) \geq t-r \mid \rr{F'}{\rho} \equiv 0]\\ &\leq \sum_{\substack{\sigma \in \zo^S\\\sigma \neq 1^S}} (C\delta w)^{t-r}\\ &\leq 2^r(C\delta w)^{t-r},\\ \end{align*} \]

where the second equality holds because neither $\rr{f}{\sigma}$ nor $F'$ depend on the variables in $S$.

Putting (a) and (b) together

Coming back to our sum, we have

\[ \begin{align*} \Pr[&\DT(\rr{f}\rho) \ge t \mid \rr{F}\rho \equiv 0, \rr{T_1}\rho \not\equiv 0]\\ &\leq \sum_{r=1}^w \underbrace{\sum_{\substack{S \sse T_1 \\|S|=r}} \Pr[\rho_S = * \mid \rr{F}\rho \equiv 0, \rho_{T_1} \in \set{1,*}]}_\text{(a): ``$\Pr[\text{$T_1$ has $r$ stars}]$''} \cdot \underbrace{\Pr[\DT(\rr{f}{\rho}) \geq t \mid \rr{F}{\rho} \equiv 0, \rho_S = *, \rho_{T_1 \setminus S} = 1]}_{\text{(b): ``$\Pr[\text{the rest of $f$ has depth $\geq t-r$}]$''}}\\ &\leq \sum_{r=1}^w (2 \delta w)^r \cdot 2^r (C \delta w)^{t-r}\\ &\leq (C \delta w)^t \cdot\sum_{r=1}^w (4/C)^r. \end{align*} \]

It then suffices to set $C \ce 8$ in order to bound the sum $\sum_{r=1}^w (4/C)^r$ by $1$ and prove the lemma.

When $w$ is at least somewhat large, this is by far the most common outcome: the chance that no variable in $T_j$ gets set to $0$ is only $(\frac{1+\delta}{2})^{w} = 2^{-\Omega(w)}$. But Håstad’s switching lemma doesn’t use this fact at all! ↩
If they don’t (i.e. if $f$ is read-once), then the above reasoning is completely valid, though! ↩
To make this complete, we also have to argue that $r$ won’t be too big, but this is the case. In fact, the probability that $T_{j_1}$ has $r$ stars decreases as $r^{-\Omega(r)}$, because we’re counterbalancing a union bound $\binom{w}{r} \leq w^r/r!$ over the choice of the $r$ variables with a probability $\delta^r = O(1/w)^r$. ↩ ↩²
Johan Håstad, “Computational limitations of small-depth circuits” (PhD thesis). ↩
This is not quite formal, but can be made formal by first drawing $\rho_{[n] \setminus S}$, then drawing $\rho_S$ (which is okay to do because all coordinates of $\rho$ are independent). ↩
Note that we’re not using the factor $1/r!$ that we could have gotten from the binomial $\binom{w}{r}$. In this case, it would only help us improve the constant $C$ slightly. ↩
Instead of applying this ORing trick, Håstad instead makes the induction hypothesis more general by allowing $f$ and $\rho$ to be defined on a subset $I \sse [n]$ of the variables whereas $F$ remains a function over all $n$ variables ($\rr{F}{\rho}$ is understood to leave the variables outside of $I$ unfixed). In that case, we can replace $\rho$ by its projection $\rho'$ on $[n] \setminus S$, which doesn’t change anything because under the assumption $\rho_S = *$, we have $\rr{f}{\sigma\rho} \equiv \rr{f}{\sigma\rho'}$ and $\rr{F}{\rho} \equiv \rr{F}{\rho'}$. And once that’s done, the conditioning $\rho_S = *$ disappears because $\rho'$ doesn’t depend on $\rho_S$. ↩