Multi-switching lemma

$\require{mathtools} \newcommand{\nc}{\newcommand} % %%% GENERIC MATH %%% % % Environments \newcommand{\al}[1]{\begin{align}#1\end{align}} % need this for \tag{} to work \renewcommand{\r}{\mathrm} % BAD!! does cursed things with accents :(( \renewcommand{\t}{\textrm} \newcommand{\either}[1]{\begin{cases}#1\end{cases}} % % Delimiters % (I needed to create my own because the MathJax version of \DeclarePairedDelimiter doesn't have \mathopen{} and that messes up the spacing) % .. one-part \newcommand{\p}[1]{\mathopen{}\left( #1 \right)} \renewcommand{\P}[1]{^{\p{#1}}} \renewcommand{\b}[1]{\mathopen{}\left[ #1 \right]} \newcommand{\lopen}[1]{\mathopen{}\left( #1 \right]} \newcommand{\ropen}[1]{\mathopen{}\left[ #1 \right)} \newcommand{\set}[1]{\mathopen{}\left\{ #1 \right\}} \newcommand{\abs}[1]{\mathopen{}\left\lvert #1 \right\rvert} \newcommand{\floor}[1]{\mathopen{}\left\lfloor #1 \right\rfloor} \newcommand{\ceil}[1]{\mathopen{}\left\lceil #1 \right\rceil} \newcommand{\round}[1]{\mathopen{}\left\lfloor #1 \right\rceil} \newcommand{\inner}[1]{\mathopen{}\left\langle #1 \right\rangle} \newcommand{\norm}[1]{\mathopen{}\left\lVert #1 \strut \right\rVert} \newcommand{\frob}[1]{\norm{#1}_\mathrm{F}} \newcommand{\mix}[1]{\mathopen{}\left\lfloor #1 \right\rceil} %% .. two-part \newcommand{\inco}[2]{#1 \mathop{}\middle|\mathop{} #2} \newcommand{\co}[2]{ {\left.\inco{#1}{#2}\right.}} \newcommand{\cond}{\co} % deprecated \newcommand{\pco}[2]{\p{\inco{#1}{#2}}} \newcommand{\bco}[2]{\b{\inco{#1}{#2}}} \newcommand{\setco}[2]{\set{\inco{#1}{#2}}} \newcommand{\at}[2]{ {\left.#1\strut\right|_{#2}}} \newcommand{\pat}[2]{\p{\at{#1}{#2}}} \newcommand{\bat}[2]{\b{\at{#1}{#2}}} \newcommand{\para}[2]{#1\strut \mathop{}\middle\|\mathop{} #2} \newcommand{\ppa}[2]{\p{\para{#1}{#2}}} \newcommand{\pff}[2]{\p{\ff{#1}{#2}}} \newcommand{\bff}[2]{\b{\ff{#1}{#2}}} \newcommand{\bffco}[4]{\bff{\cond{#1}{#2}}{\cond{#3}{#4}}} \newcommand{\sm}[1]{\p{\begin{smallmatrix}#1\end{smallmatrix}}} % % Greek \newcommand{\eps}{\epsilon} \newcommand{\veps}{\varepsilon} \newcommand{\vpi}{\varpi} % the following cause issues with real LaTeX tho :/ maybe consider naming it \fhi instead? \let\fi\phi % because it looks like an f \let\phi\varphi % because it looks like a p \renewcommand{\th}{\theta} \newcommand{\Th}{\Theta} \newcommand{\om}{\omega} \newcommand{\Om}{\Omega} % % Miscellaneous \newcommand{\LHS}{\mathrm{LHS}} \newcommand{\RHS}{\mathrm{RHS}} \DeclareMathOperator{\cst}{const} % .. operators \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\quasipoly}{quasipoly} \DeclareMathOperator{\negl}{negl} \DeclareMathOperator*{\argmin}{arg\thinspace min} \DeclareMathOperator*{\argmax}{arg\thinspace max} \DeclareMathOperator{\diag}{diag} % .. functions \DeclareMathOperator{\id}{id} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\step}{step} \DeclareMathOperator{\err}{err} \DeclareMathOperator{\ReLU}{ReLU} \DeclareMathOperator{\softmax}{softmax} % .. analysis \let\d\undefined \newcommand{\d}{\operatorname{d}\mathopen{}} \newcommand{\dd}[1]{\operatorname{d}^{#1}\mathopen{}} \newcommand{\df}[2]{ {\f{\d #1}{\d #2}}} \newcommand{\ds}[2]{ {\sl{\d #1}{\d #2}}} \newcommand{\ddf}[3]{ {\f{\dd{#1} #2}{\p{\d #3}^{#1}}}} \newcommand{\dds}[3]{ {\sl{\dd{#1} #2}{\p{\d #3}^{#1}}}} \renewcommand{\part}{\partial} \newcommand{\ppart}[1]{\part^{#1}} \newcommand{\partf}[2]{\f{\part #1}{\part #2}} \newcommand{\parts}[2]{\sl{\part #1}{\part #2}} \newcommand{\ppartf}[3]{ {\f{\ppart{#1} #2}{\p{\part #3}^{#1}}}} \newcommand{\pparts}[3]{ {\sl{\ppart{#1} #2}{\p{\part #3}^{#1}}}} \newcommand{\grad}[1]{\mathop{\nabla\!_{#1}}} % .. sets \newcommand{\es}{\emptyset} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rge}{\R_{\ge 0}} \newcommand{\Rgt}{\R_{> 0}} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\zo}{\set{0,1}} \newcommand{\pmo}{\set{\pm 1}} \newcommand{\zpmo}{\set{0,\pm 1}} % .... set operations \newcommand{\sse}{\subseteq} \newcommand{\out}{\not\in} \newcommand{\minus}{\setminus} \newcommand{\inc}[1]{\union \set{#1}} % "including" \newcommand{\exc}[1]{\setminus \set{#1}} % "except" % .. over and under \renewcommand{\ss}[1]{_{\substack{#1}}} \newcommand{\OB}{\overbrace} \newcommand{\ob}[2]{\OB{#1}^\t{#2}} \newcommand{\UB}{\underbrace} \newcommand{\ub}[2]{\UB{#1}_\t{#2}} \newcommand{\ol}{\overline} \newcommand{\tld}{\widetilde} % deprecated \renewcommand{\~}{\widetilde} \newcommand{\HAT}{\widehat} % deprecated \renewcommand{\^}{\widehat} \newcommand{\rt}[1]{ {\sqrt{#1}}} \newcommand{\for}[2]{_{#1=1}^{#2}} \newcommand{\sfor}{\sum\for} \newcommand{\pfor}{\prod\for} % .... two-part \newcommand{\f}{\frac} \renewcommand{\sl}[2]{#1 /\mathopen{}#2} \newcommand{\ff}[2]{\mathchoice{\begin{smallmatrix}\displaystyle\vphantom{\p{#1}}#1\\[-0.05em]\hline\\[-0.05em]\hline\displaystyle\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}{\begin{smallmatrix}\vphantom{\p{#1}}#1\\[-0.1em]\hline\\[-0.1em]\hline\vphantom{\p{#2}}#2\end{smallmatrix}}} % .. arrows \newcommand{\from}{\leftarrow} \DeclareMathOperator*{\<}{\!\;\longleftarrow\;\!} \let\>\undefined \DeclareMathOperator*{\>}{\!\;\longrightarrow\;\!} \let\-\undefined \DeclareMathOperator*{\-}{\!\;\longleftrightarrow\;\!} \newcommand{\so}{\implies} % .. operators and relations \renewcommand{\*}{\cdot} \newcommand{\x}{\times} \newcommand{\ox}{\otimes} \newcommand{\OX}[1]{^{\ox #1}} \newcommand{\sr}{\stackrel} \newcommand{\ce}{\coloneqq} \newcommand{\ec}{\eqqcolon} \newcommand{\ap}{\approx} \newcommand{\ls}{\lesssim} \newcommand{\gs}{\gtrsim} % .. punctuation and spacing \renewcommand{\.}[1]{#1\dots#1} \newcommand{\ts}{\thinspace} \newcommand{\q}{\quad} \newcommand{\qq}{\qquad} % % %%% SPECIALIZED MATH %%% % % Logic and bit operations \newcommand{\fa}{\forall} \newcommand{\ex}{\exists} \renewcommand{\and}{\wedge} \newcommand{\AND}{\bigwedge} \renewcommand{\or}{\vee} \newcommand{\OR}{\bigvee} \newcommand{\xor}{\oplus} \newcommand{\XOR}{\bigoplus} \newcommand{\union}{\cup} \newcommand{\dunion}{\sqcup} \newcommand{\inter}{\cap} \newcommand{\UNION}{\bigcup} \newcommand{\DUNION}{\bigsqcup} \newcommand{\INTER}{\bigcap} \newcommand{\comp}{\overline} \newcommand{\true}{\r{true}} \newcommand{\false}{\r{false}} \newcommand{\tf}{\set{\true,\false}} \DeclareMathOperator{\One}{\mathbb{1}} \DeclareMathOperator{\1}{\mathbb{1}} % use \mathbbm instead if using real LaTeX \DeclareMathOperator{\LSB}{LSB} % % Linear algebra \newcommand{\spn}{\mathrm{span}} % do NOT use \span because it causes misery with amsmath \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\dom}{dom} \DeclareMathOperator{\Img}{Im} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\perm}{perm} \DeclareMathOperator{\haf}{haf} \newcommand{\transp}{\mathsf{T}} \newcommand{\T}{^\transp} \newcommand{\par}{\parallel} % .. named tensors \newcommand{\namedtensorstrut}{\vphantom{fg}} % milder than \mathstrut \newcommand{\name}[1]{\mathsf{\namedtensorstrut #1}} \newcommand{\nbin}[2]{\mathbin{\underset{\substack{#1}}{\namedtensorstrut #2}}} \newcommand{\ndot}[1]{\nbin{#1}{\odot}} \newcommand{\ncat}[1]{\nbin{#1}{\oplus}} \newcommand{\nsum}[1]{\sum\limits_{\substack{#1}}} \newcommand{\nfun}[2]{\mathop{\underset{\substack{#1}}{\namedtensorstrut\mathrm{#2}}}} \newcommand{\ndef}[2]{\newcommand{#1}{\name{#2}}} \newcommand{\nt}[1]{^{\transp(#1)}} % % Probability \newcommand{\tri}{\triangle} \newcommand{\Normal}{\mathcal{N}} \newcommand{\Exp}{\mathcal{Exp}} % .. operators \DeclareMathOperator{\supp}{supp} \let\Pr\undefined \DeclareMathOperator*{\Pr}{Pr} \DeclareMathOperator*{\G}{\mathbb{G}} \DeclareMathOperator*{\Odds}{Od} \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \DeclareMathOperator*{\Cov}{Cov} \DeclareMathOperator*{\K}{K} \DeclareMathOperator*{\corr}{corr} \DeclareMathOperator*{\median}{median} \DeclareMathOperator*{\maj}{maj} % ... information theory \let\H\undefined \DeclareMathOperator*{\H}{H} \DeclareMathOperator*{\I}{I} \DeclareMathOperator*{\D}{D} \DeclareMathOperator*{\KL}{KL} % .. other divergences \newcommand{\dTV}{d_{\mathrm{TV}}} \newcommand{\dHel}{d_{\mathrm{Hel}}} \newcommand{\dJS}{d_{\mathrm{JS}}} % % Polynomials \DeclareMathOperator{\He}{He} \DeclareMathOperator{\coeff}{coeff} % %%% SPECIALIZED COMPUTER SCIENCE %%% % % Complexity classes % .. keywords \newcommand{\coclass}{\mathsf{co}} \newcommand{\Prom}{\mathsf{Promise}} % .. classical \newcommand{\PTIME}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\coNP}{\coclass\NP} \newcommand{\PH}{\mathsf{PH}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \renewcommand{\L}{\mathsf{L}} \newcommand{\EXP}{\mathsf{EXP}} \newcommand{\NEXP}{\mathsf{NEXP}} % .. probabilistic \newcommand{\formost}{\mathsf{Я}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\ZPP}{\mathsf{ZPP}} \newcommand{\MA}{\mathsf{MA}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\RL}{\mathsf{RL}} % .. circuits \newcommand{\NC}{\mathsf{NC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\ACC}{\mathsf{ACC}} \newcommand{\ThrC}{\mathsf{TC}} \newcommand{\Ppoly}{\mathsf{P}/\poly} \newcommand{\Lpoly}{\mathsf{L}/\poly} % .. resources \newcommand{\TIME}{\mathsf{TIME}} \newcommand{\NTIME}{\mathsf{NTIME}} \newcommand{\SPACE}{\mathsf{SPACE}} \newcommand{\TISP}{\mathsf{TISP}} \newcommand{\SIZE}{\mathsf{SIZE}} % .. custom \newcommand{\NCP}{\mathsf{NCP}} % % Boolean analysis \newcommand{\harpoon}{\!\upharpoonright\!} \newcommand{\rr}[2]{#1\harpoon_{#2}} \newcommand{\Fou}[1]{\widehat{#1}} \DeclareMathOperator{\Ind}{\mathrm{Ind}} \DeclareMathOperator{\Inf}{\mathrm{Inf}} \newcommand{\Der}[1]{\operatorname{D}_{#1}\mathopen{}} % \newcommand{\Exp}[1]{\operatorname{E}_{#1}\mathopen{}} \DeclareMathOperator{\Stab}{\mathrm{Stab}} \DeclareMathOperator{\Tau}{T} \DeclareMathOperator{\sens}{\mathrm{s}} \DeclareMathOperator{\bsens}{\mathrm{bs}} \DeclareMathOperator{\fbsens}{\mathrm{fbs}} \DeclareMathOperator{\Cert}{\mathrm{C}} \DeclareMathOperator{\DT}{\mathrm{DT}} \DeclareMathOperator{\CDT}{\mathrm{CDT}} % canonical \DeclareMathOperator{\ECDT}{\mathrm{ECDT}} \DeclareMathOperator{\CDTv}{\mathrm{CDT_{vars}}} \DeclareMathOperator{\ECDTv}{\mathrm{ECDT_{vars}}} \DeclareMathOperator{\CDTt}{\mathrm{CDT_{terms}}} \DeclareMathOperator{\ECDTt}{\mathrm{ECDT_{terms}}} \DeclareMathOperator{\CDTw}{\mathrm{CDT_{weighted}}} \DeclareMathOperator{\ECDTw}{\mathrm{ECDT_{weighted}}} \DeclareMathOperator{\AvgDT}{\mathrm{AvgDT}} \DeclareMathOperator{\PDT}{\mathrm{PDT}} % partial decision tree \DeclareMathOperator{\DTsize}{\mathrm{DT_{size}}} \DeclareMathOperator{\W}{\mathbf{W}} % .. functions (small caps sadly doesn't work) \DeclareMathOperator{\Par}{\mathrm{Par}} \DeclareMathOperator{\Maj}{\mathrm{Maj}} \DeclareMathOperator{\HW}{\mathrm{HW}} \DeclareMathOperator{\Thr}{\mathrm{Thr}} \DeclareMathOperator{\Tribes}{\mathrm{Tribes}} \DeclareMathOperator{\RotTribes}{\mathrm{RotTribes}} \DeclareMathOperator{\CycleRun}{\mathrm{CycleRun}} \DeclareMathOperator{\SAT}{\mathrm{SAT}} \DeclareMathOperator{\UniqueSAT}{\mathrm{UniqueSAT}} % % Dynamic optimality \newcommand{\OPT}{\mathsf{OPT}} \newcommand{\Alt}{\mathsf{Alt}} \newcommand{\Funnel}{\mathsf{Funnel}} % % Alignment \DeclareMathOperator{\Amp}{\mathrm{Amp}} % %%% TYPESETTING %%% % % In "text" \newcommand{\heart}{\heartsuit} \newcommand{\nth}{^\t{th}} \newcommand{\degree}{^\circ} \newcommand{\qu}[1]{\text{``}#1\text{''}} % remove these last two if using real LaTeX \newcommand{\qed}{\blacksquare} \newcommand{\qedhere}{\tag*{$\blacksquare$}} % % Fonts % .. bold \newcommand{\BA}{\boldsymbol{A}} \newcommand{\BB}{\boldsymbol{B}} \newcommand{\BC}{\boldsymbol{C}} \newcommand{\BD}{\boldsymbol{D}} \newcommand{\BE}{\boldsymbol{E}} \newcommand{\BF}{\boldsymbol{F}} \newcommand{\BG}{\boldsymbol{G}} \newcommand{\BH}{\boldsymbol{H}} \newcommand{\BI}{\boldsymbol{I}} \newcommand{\BJ}{\boldsymbol{J}} \newcommand{\BK}{\boldsymbol{K}} \newcommand{\BL}{\boldsymbol{L}} \newcommand{\BM}{\boldsymbol{M}} \newcommand{\BN}{\boldsymbol{N}} \newcommand{\BO}{\boldsymbol{O}} \newcommand{\BP}{\boldsymbol{P}} \newcommand{\BQ}{\boldsymbol{Q}} \newcommand{\BR}{\boldsymbol{R}} \newcommand{\BS}{\boldsymbol{S}} \newcommand{\BT}{\boldsymbol{T}} \newcommand{\BU}{\boldsymbol{U}} \newcommand{\BV}{\boldsymbol{V}} \newcommand{\BW}{\boldsymbol{W}} \newcommand{\BX}{\boldsymbol{X}} \newcommand{\BY}{\boldsymbol{Y}} \newcommand{\BZ}{\boldsymbol{Z}} \newcommand{\Ba}{\boldsymbol{a}} \newcommand{\Bb}{\boldsymbol{b}} \newcommand{\Bc}{\boldsymbol{c}} \newcommand{\Bd}{\boldsymbol{d}} \newcommand{\Be}{\boldsymbol{e}} \newcommand{\Bf}{\boldsymbol{f}} \newcommand{\Bg}{\boldsymbol{g}} \newcommand{\Bh}{\boldsymbol{h}} \newcommand{\Bi}{\boldsymbol{i}} \newcommand{\Bj}{\boldsymbol{j}} \newcommand{\Bk}{\boldsymbol{k}} \newcommand{\Bl}{\boldsymbol{l}} \newcommand{\Bm}{\boldsymbol{m}} \newcommand{\Bn}{\boldsymbol{n}} \newcommand{\Bo}{\boldsymbol{o}} \newcommand{\Bp}{\boldsymbol{p}} \newcommand{\Bq}{\boldsymbol{q}} \newcommand{\Br}{\boldsymbol{r}} \newcommand{\Bs}{\boldsymbol{s}} \newcommand{\Bt}{\boldsymbol{t}} \newcommand{\Bu}{\boldsymbol{u}} \newcommand{\Bv}{\boldsymbol{v}} \newcommand{\Bw}{\boldsymbol{w}} \newcommand{\Bx}{\boldsymbol{x}} \newcommand{\By}{\boldsymbol{y}} \newcommand{\Bz}{\boldsymbol{z}} \newcommand{\Balpha}{\boldsymbol{\alpha}} \newcommand{\Bbeta}{\boldsymbol{\beta}} \newcommand{\Bgamma}{\boldsymbol{\gamma}} \newcommand{\Bdelta}{\boldsymbol{\delta}} \newcommand{\Beps}{\boldsymbol{\eps}} \newcommand{\Bveps}{\boldsymbol{\veps}} \newcommand{\Bzeta}{\boldsymbol{\zeta}} \newcommand{\Beta}{\boldsymbol{\eta}} \newcommand{\Btheta}{\boldsymbol{\theta}} \newcommand{\Bth}{\boldsymbol{\th}} \newcommand{\Biota}{\boldsymbol{\iota}} \newcommand{\Bkappa}{\boldsymbol{\kappa}} \newcommand{\Blambda}{\boldsymbol{\lambda}} \newcommand{\Bmu}{\boldsymbol{\mu}} \newcommand{\Bnu}{\boldsymbol{\nu}} \newcommand{\Bxi}{\boldsymbol{\xi}} \newcommand{\Bpi}{\boldsymbol{\pi}} \newcommand{\Bvpi}{\boldsymbol{\vpi}} \newcommand{\Brho}{\boldsymbol{\rho}} \newcommand{\Bsigma}{\boldsymbol{\sigma}} \newcommand{\Btau}{\boldsymbol{\tau}} \newcommand{\Bupsilon}{\boldsymbol{\upsilon}} \newcommand{\Bphi}{\boldsymbol{\phi}} \newcommand{\Bfi}{\boldsymbol{\fi}} \newcommand{\Bchi}{\boldsymbol{\chi}} \newcommand{\Bpsi}{\boldsymbol{\psi}} \newcommand{\Bom}{\boldsymbol{\om}} % .. calligraphic \newcommand{\CA}{\mathcal{A}} \newcommand{\CB}{\mathcal{B}} \newcommand{\CC}{\mathcal{C}} \newcommand{\CD}{\mathcal{D}} \newcommand{\CE}{\mathcal{E}} \newcommand{\CF}{\mathcal{F}} \newcommand{\CG}{\mathcal{G}} \newcommand{\CH}{\mathcal{H}} \newcommand{\CI}{\mathcal{I}} \newcommand{\CJ}{\mathcal{J}} \newcommand{\CK}{\mathcal{K}} \newcommand{\CL}{\mathcal{L}} \newcommand{\CM}{\mathcal{M}} \newcommand{\CN}{\mathcal{N}} \newcommand{\CO}{\mathcal{O}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CQ}{\mathcal{Q}} \newcommand{\CR}{\mathcal{R}} \newcommand{\CS}{\mathcal{S}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CU}{\mathcal{U}} \newcommand{\CV}{\mathcal{V}} \newcommand{\CW}{\mathcal{W}} \newcommand{\CX}{\mathcal{X}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\CZ}{\mathcal{Z}} % .. typewriter \newcommand{\TA}{\mathtt{A}} \newcommand{\TB}{\mathtt{B}} \newcommand{\TC}{\mathtt{C}} \newcommand{\TD}{\mathtt{D}} \newcommand{\TE}{\mathtt{E}} \newcommand{\TF}{\mathtt{F}} \newcommand{\TG}{\mathtt{G}} \renewcommand{\TH}{\mathtt{H}} \newcommand{\TI}{\mathtt{I}} \newcommand{\TJ}{\mathtt{J}} \newcommand{\TK}{\mathtt{K}} \newcommand{\TL}{\mathtt{L}} \newcommand{\TM}{\mathtt{M}} \newcommand{\TN}{\mathtt{N}} \newcommand{\TO}{\mathtt{O}} \newcommand{\TP}{\mathtt{P}} \newcommand{\TQ}{\mathtt{Q}} \newcommand{\TR}{\mathtt{R}} \newcommand{\TS}{\mathtt{S}} \newcommand{\TT}{\mathtt{T}} \newcommand{\TU}{\mathtt{U}} \newcommand{\TV}{\mathtt{V}} \newcommand{\TW}{\mathtt{W}} \newcommand{\TX}{\mathtt{X}} \newcommand{\TY}{\mathtt{Y}} \newcommand{\TZ}{\mathtt{Z}} % % LEVELS OF CLOSENESS (basically deprecated) \newcommand{\scirc}[1]{\sr{\circ}{#1}} \newcommand{\sdot}[1]{\sr{.}{#1}} \newcommand{\slog}[1]{\sr{\log}{#1}} \newcommand{\createClosenessLevels}[7]{ \newcommand{#2}{\mathrel{(#1)}} \newcommand{#3}{\mathrel{#1}} \newcommand{#4}{\mathrel{#1\!\!#1}} \newcommand{#5}{\mathrel{#1\!\!#1\!\!#1}} \newcommand{#6}{\mathrel{(\sdot{#1})}} \newcommand{#7}{\mathrel{(\slog{#1})}} } \let\lt\undefined \let\gt\undefined % .. vanilla versions (is it within a constant?) \newcommand{\ez}{\scirc=} \newcommand{\eq}{\simeq} \newcommand{\eqq}{\mathrel{\eq\!\!\eq}} \newcommand{\eqqq}{\mathrel{\eq\!\!\eq\!\!\eq}} \newcommand{\lez}{\scirc\le} \renewcommand{\lq}{\preceq} \newcommand{\lqq}{\mathrel{\lq\!\!\lq}} \newcommand{\lqqq}{\mathrel{\lq\!\!\lq\!\!\lq}} \newcommand{\gez}{\scirc\ge} \newcommand{\gq}{\succeq} \newcommand{\gqq}{\mathrel{\gq\!\!\gq}} \newcommand{\gqqq}{\mathrel{\gq\!\!\gq\!\!\gq}} \newcommand{\lz}{\scirc<} \newcommand{\lt}{\prec} \newcommand{\ltt}{\mathrel{\lt\!\!\lt}} \newcommand{\lttt}{\mathrel{\lt\!\!\lt\!\!\lt}} \newcommand{\gz}{\scirc>} \newcommand{\gt}{\succ} \newcommand{\gtt}{\mathrel{\gt\!\!\gt}} \newcommand{\gttt}{\mathrel{\gt\!\!\gt\!\!\gt}} % .. dotted versions (is it equal in the limit?) \newcommand{\ed}{\sdot=} \newcommand{\eqd}{\sdot\eq} \newcommand{\eqqd}{\sdot\eqq} \newcommand{\eqqqd}{\sdot\eqqq} \newcommand{\led}{\sdot\le} \newcommand{\lqd}{\sdot\lq} \newcommand{\lqqd}{\sdot\lqq} \newcommand{\lqqqd}{\sdot\lqqq} \newcommand{\ged}{\sdot\ge} \newcommand{\gqd}{\sdot\gq} \newcommand{\gqqd}{\sdot\gqq} \newcommand{\gqqqd}{\sdot\gqqq} \newcommand{\ld}{\sdot<} \newcommand{\ltd}{\sdot\lt} \newcommand{\lttd}{\sdot\ltt} \newcommand{\ltttd}{\sdot\lttt} \newcommand{\gd}{\sdot>} \newcommand{\gtd}{\sdot\gt} \newcommand{\gttd}{\sdot\gtt} \newcommand{\gtttd}{\sdot\gttt} % .. log versions (is it equal up to log?) \newcommand{\elog}{\slog=} \newcommand{\eqlog}{\slog\eq} \newcommand{\eqqlog}{\slog\eqq} \newcommand{\eqqqlog}{\slog\eqqq} \newcommand{\lelog}{\slog\le} \newcommand{\lqlog}{\slog\lq} \newcommand{\lqqlog}{\slog\lqq} \newcommand{\lqqqlog}{\slog\lqqq} \newcommand{\gelog}{\slog\ge} \newcommand{\gqlog}{\slog\gq} \newcommand{\gqqlog}{\slog\gqq} \newcommand{\gqqqlog}{\slog\gqqq} \newcommand{\llog}{\slog<} \newcommand{\ltlog}{\slog\lt} \newcommand{\lttlog}{\slog\ltt} \newcommand{\ltttlog}{\slog\lttt} \newcommand{\glog}{\slog>} \newcommand{\gtlog}{\slog\gt} \newcommand{\gttlog}{\slog\gtt} \newcommand{\gtttlog}{\slog\gttt}$

Adapted from the introduction of “Improved pseudorandom generators from pseudorandom multi-switching lemmas” by Rocco Servedio and Li-Yang Tan, and from the talk “Switching lemmas in the 80ies and today” by Johan Håstad at IRIF.

Correlation bounds from the usual switching lemma suck

As shown in Corollaries of switching lemmas, switching lemmas for a class of functions automatically give correlation bounds with parities. Suppose you have the following switching lemma:

Let $(J|z)$ be a $p$-random restriction, then $\Pr[\DT(f_{J|z}) \geq t] \leq (p \lambda)^t$.

Then you automatically get $|\inner{f, \Par_n}| \leq 2^{-\Omega(n/\lambda)}$.

The failure probability of the switching lemma directly contributes to the correlation bound. So if you want correlation $\le \eps$, and $f$ needs several random restrictions before it becomes a small depth decision tree, then at the very least, the error needs to be $\le \eps$ at each step.

This is a big problem for $\AC^0$ circuits. Say you want to prove correlation bounds for a circuit of size $S$ and depth $d$, by repeatedly applying Switching lemma. Every time you apply it, if you want the failure probability to total to only $\le \eps$,¹ and you’re applying it to up to $S$ DNFs/CNFs. The failure probability is exponentially small in the decision tree depth $t$, so you need to set $t \ce \log (S/\eps)$. But this means that at the next depth reduction step, you’ll have to deal with a bottom fan-in of $\log(S/\eps)$, so you’ll have to set $p \ce \Theta(1/\log(S/\eps))$. Crucially, $p$ now depends on $\eps$, not just on size/depth of the circuit, and this will be directly reflected in the correlation you can hope to achieve.

Concretely, you’ll use

one restriction with $p\ce 1/2$ to reduce the bottom fan-in to $\log(S/\eps)$;
$d-1$ restrictions with $p \ce \Theta(1/\log(S/\eps))$ to reduce the depth $d-2$ times then finally collapse the remaining DNF/CNF to a decision tree;

and by the end, the remaining number of unfixed variables would be roughly $n/\log(S/\eps)^{d-1}$, so the failure probability you get at the end is roughly $2^{n/\log(S/\eps)^{d-1}}$.

This means that, even in the best case² where $S$ is pretty small (in fact, as long as $S \leq 2^{n^{1/d}}$), the correlation bound $\eps$ we get satisfies

\[\eps \geq 2^{n/\log (S/\eps)^{d-1}} \geq 2^{n/\log(1/\eps)^{d-1}} \Rightarrow \eps \approx 2^{-n^{1/d}}.\]

Effectively, every restriction we did that used a star probability $p \le \log(1/\eps)$ made the exponent of $n$ in the correlation bound decrease: $1 \rightarrow 1/2 \rightarrow 1/3 \rightarrow \cdots \rightarrow 1/d$.

$\AC^0$ circuits aren’t that great at correlating with parity

If you actually try to build $\AC^0$ circuits that correlate with parity, though, you won’t be able to get nearly as good as $2^{n^{1/d}}$, as long as the size is much less than $2^{n^{1/d}}$. Below, we show that with size (roughly) $S$ and depth $d$, you can get correlation $2^{-n/(\log S)^{d-1}}$ (we will show in the next section that this is optimal).

We’ll split the $n$ variables into $n/(\log S)^{d-1}$ groups of $(\log S)^{d-1}$, compute the parity of each group exactly, and OR them together. This returns true almost all the time, and returns false only when each of the groups have parity $0$, which happens a $2^{n/(\log S)^{d-1}}$ fraction of the time. But when it does, the overall parity is indeed $0$, so it’s correct! Therefore, its correlation with parity is exactly $2^{n/(\log S)^{d-1}}$.

Let’s see how to do this with a depth-$d$ circuit of size roughly $S$.

First, for each group of $(\log S)^{d-1}$ variables, we can compute its parity in this way:

first express that parity using a depth-$(d-1)$ tree of parity gates, each of fanin $\log S$;
then replace each of those parity gates by a CNF/DNF of size $S$ in an alternating way.

The resulting circuit has depth $d-1$ after collapsing OR and AND gates together, and has total size $\leq S (\log S)^{d-1}$. Also, you can make sure that its top gate is an OR.

Now, ORing the circuits corresponding to each group together, we get total size

\[n/(\log S)^{d-1} \cdot S(\log S)^{d-1} = nS,\]

which is basically $S$ for all intents and purposes.

The solution: multi-switching lemma

Intuition

Okay, so the loss is due to the dependence on $\eps$ in the fanin: if you could somehow get the worst fanin to be under $\log S$, then you could keep the star probability at $p \ce \Theta(1/\log S)$, and by the end, the number of unfixed variables would be about $n/(\log S)^{d-1}$, which means you would get correlation $\eps \approx 2^{-n/(\log S)^{d-1}}$, perfectly matching the correlation obtained by the circuit we just saw.

And really, this dependence on $\eps$ in the fanin is due to freak accidents. If we imagine that the $S$ DNFs (or CNFs) we’re applying the switching lemma are on disjoint sets of variables and therefore their decision trees are independent random variables,³ then the typical case is the case where all of them end up with decision tree depth $t \leq \log S$ (with a handful being exactly at $t= \log S$, twice more getting $t=\log S - 1$, four times more getting $t=\log S - 2$, etc.). When the biggest decision tree is a significantly above $\log S$ (say $\log S/\eps$), this is usually because there was one single very freakish DNF that ended up in a very outlier case, not just because “there are $S$ DNFs, so shit happens”.

And because of that, it feels like we should be able to handle this single DNF separately, by adding the first few layers of its decision tree to a global “outliers” decision tree that we go through before computing the decision trees of the individual DNFs. That is, there should be a common tree $T$ of depth $\leq \log(S/\eps)$ such each of the $S$ DNFs can be represented as a decision tree consisting of $T$ at the top, followed by just $\leq \log S$ additional layers added to the bottom of $T$ (really, the freakish outlier would just need that top part, and the normal DNFs would just need those bottom parts).

And the opposite scenario, where the DNFs share a lot of variables, doesn’t seem very worrying either, because the more variables they share, the easier it is to build a common decision tree $T$ that’s useful for all of them simultaneously.

This would help us because then we can just stick that common tree $T$ on top of the whole $\AC^0$ circuit we’re trying to collapse. For example, after the first restriction,⁴ you’d end up with a common tree $T$ of depth $\log(S/\eps)$, and at every leaf of $T$ you’d get a depth-$d$ circuit with bottom fan-in $\leq \log S$. This blows up the total size a little bit, by $2^{\log (S/\eps)}$ (the number of leaves of $T$), but the big upside is that for the following restriction, you can set the star probability to $p \ce \Theta(1/\log S)$ instead of $\Theta(1/\log(S/\eps))$, so you get to keep more variables alive. By the end, the only thing you lost is that you have a big decision tree of depth $O(\log(S/\eps))^d$ on top, but that’s much less than the number of remaining variables, so it doesn’t have a significant impact on the bounds.

Statement

Formally, for some collection $\CF = \set{f_1, \ldots, f_S}$ of functions, we say that that $\CF$ has an $l$-partial decision tree of depth $t$ if there exists some common decision tree $T$ of depth $\leq t$ such that each $f_j$ can be represented as a decision tree consisting of $T$ at the top, followed by some “specialized” decision trees of depth $\leq l$ at each of $T$’s leaves. Let $\PDT_l(\CF)$ be the smallest possible depth of such a common tree $T$.

TODO: #figure

The multi-switching lemma says that if $\CF$ is a family of $k$-DNFs, then under a random restriction $\rho$ of star probability $p$,

\[ \Pr[\PDT_l(\rr\CF\rho) \geq t] \leq S^{\ceil{t/l}}\cdot O(pk)^t. \]

The $O(pk)^t$ part is the usual probability we get from Håstad’s switching lemma: and indeed, by fixing $S \ce 1$ and $l\ce 0$, we recover it exactly (with no common tree). The “overhead” $S^{\ceil{t/l}}$ intuitively comes from a union bound over the $\leq \ceil{t/l}$ DNFs in $\CF$ that we will handle using the common tree. Since we only need to use the common tree when the DT depth of some DNF exceeds $l$, assuming the common tree has depth roughly $t$, it will be induced by $\leq t/l$ DNFs.

In our case, we want each DNF to require only $O(\log S)$ additional layers. So we can set $l \ce O(\log S)$, getting just

\[ \Pr[\PDT_{O(\log S)}(\rr\CF\rho) \geq t] \leq O(pk)^t. \]

That gives us exactly what we hoped for: setting $p \ce \Theta(1/k)$ and $t = \Theta(\log(S/\eps))$ tells us that except with probability $\eps$, $\CF$ has a common $O(\log S)$-partial decision tree of depth $O(\log (S/\eps))$. So it implies that TODO: explain the bounds you actually get.

Proof of the multi-switching lemma

This proof builds on Razborov’s proof of Håstad’s original switching lemma in a pretty straightforward way. The presentation here is based on Servedio and Tan’s⁵ (and probably does a much worse job than they did). I was tragically unable to understand the conditioning in Håstad’s version of the proof.⁶

Here’s one possible way you could build an $l$-partial decision tree for $\rr\CF\rho$: Take the first DNF $f_j$ such that $\DT(\rr{f_j}\rho)>l$, let $\tau$ be some path of maximal length in the canonical decision tree for $\rr{f_j}\rho$, and query all the associated variables in the common tree; then recurse on every subtree, until all the DNFs in $\CF$ have depth $\leq l$.

The proof basically works by assuming this construction gives an $l$-partial tree of depth $\geq t$, and using that fact to uniquely encode $\rho$ into a restriction $\rho'$ with more fixed variables (modulo some auxiliary information).

Encoding

For some $t' \geq t$, the encoding procedure will produce:

a restriction $\rho'$, which is a strengthening of $\rho$ with $t'$ additional stars fixed;
a string $j^* \in [S]^{\ceil{t/l}}$, indicating the indices of the successive DNFs in which long paths were found;
a string $\tau^* \in \zo^{t'}$, containing said long paths;
a string $a^* \in (\zo \times [w])^{t'}$ containing the auxiliary information from the original switching lemma;
a string $\pi^* \in \zo^{t'}$, containing the paths we should pick (instead of $\tau^*$) in order to maximize the remaining $l$-partial depth of $\CF$.

Concretely, it will recursively do the following. Initially, $\rho' \ce \rho$, and $j^*, \tau^*, p^*, \pi^*$ are all empty. Then:

Let $f_j$ be the first DNF in $\CF$ such that $\DT(\rr{f_j}{\rho}) > l$. Let $\tau \in \zo^{l'}$ be some path of maximal length in the canonical decision tree for $\rr{f_j}{\rho}$, and let $V \sse [n]$ be the corresponding set of variables. Razborov’s proof of the switching lemma tells us that there is some assignment $\sigma \in \zo^V$ (the “angel”) such that if you know

restriction $\rho\sigma$, or some strengthening of it, and
path $\tau$ (the “devil”) and auxiliary string $a \in (\zo \times [w])^{l'}$ (the clue), both potentially with some characters added at the end,

then you can recover $V$.

Now, we know that $\PDT_l(\rr\CF\rho) \geq t$, so that means that there must be some assignment $\pi \in \zo^V$ such that $\PDT_l(\rr\CF{\rho\pi}) \geq t - l'$. So we can add $j$ to $j^*$, $\tau$ to $\tau^*$, $a$ to $a^*$ and $\pi$ to $\pi^*$, then recurse on $\rr\CF{\rho\pi}$. Do this until the $l$-partial depth falls to zero or after $\ceil{t/l}$ iterations, whichever comes sooner (this will fix $t' \geq \min(t, \ceil{t/l}(l+1)) \geq t$ stars in either case). Let restriction $\rho'$ be $\rho$ completed with all of the “angel” assignments $\sigma$, and pad $j^*$ to length $\ceil{t/l}$ with arbitrary values if necessary.

Decoding

The decoding goes through every $j$ in $j^*$ one by one, finds the variables $V$ that were set at that step using $(\rho',\tau^*,a^*)$, replaces $\rho'_V$ by the appropriate section of $\pi^*$, drops the first $l' \ce |V|$ characters of the strings $\tau^*$, $a^*$, $\pi^*$, and iterates with the new value of $\rho'$ until it’s found the $t'$ variables. Then, letting $V^*$ be the union of all sets $V$, it sets all of $\rho'_{V^*}$ to stars, recovering $\rho$.

Counting

Let’s union bound over the number $t'$ of stars fixed. Each restriction $\rho$ such that $\PDT_l(\rr\CF\rho) \geq t$ and this specific value of $t'$ got uniquely mapped to a restriction $\rho'$ whose probability is $\p{\frac{1+p}{2p}}^{t'} \geq (1/2p)^{t'}$ as big, using auxiliary information that can take

\[ S^{\ceil{t/l}}\cdot 2^{t'} \cdot (2w)^{t'} \cdot 2^{t'} = S^{\ceil{t/l}} \cdot (8w)^{t'} \]

different values, so their total probability weight can have been at most

\[S^{\ceil{t/l}} \cdot (8w)^{t'} \cdot (2p)^{t'} = S^{\ceil{t/l}}\cdot (16pw)^{t'}.\]

So overall,

\[ \Pr_\rho[\PDT_l(\rr\CF\rho) \geq t] \leq \sum_{t'=t}^\infty S^{\ceil{t/l}}\cdot (16pw)^{t'} \leq S^{\ceil{t/l}}\cdot O(pw)^t. \]

or $\eps/d$ technically, but this factor $d$ is not going to matter too much ↩
I think Håstad was saying he can get $2^{-\Omega(n^{1/(d-1)})}$ from the usual switching lemma, but I’m not sure how. Maybe he’s able to skip the first step by doing a restriction based on size rather than width? Or maybe he’s able to skip the last step and think directly about the remaining CNF/DNF? ↩
Which is by the way exactly what the union bound is doing: when individual failure probabilities are small, the overall probability given by the union bound is very close to what it would be if failures were independent. ↩
Here, the first restriction is seen as applying to a family of $1$-DNFs corresponding to the ORs at the very bottom (or $1$-CNFs if the bottom layer is made of ANDs). ↩
Rocco Servedio, Li-Yang Tan, “Improved pseudorandom generators from pseudorandom multi-switching lemmas”, Appendix A. ↩
Johan Håstad, “On the correlation of parity and small-depth circuits”. ↩