$\require{mathtools} % %%% GENERIC MATH %%% % % Environments \newcommand{\al}[1]{\begin{align}#1\end{align}} % need this for \tag{} to work \renewcommand{\r}{\mathrm} % % Greek \newcommand{\eps}{\epsilon} \newcommand{\veps}{\varepsilon} \let\fi\phi % because it looks like an f \let\phi\varphi % because it looks like a p % % Miscellaneous shortcuts % .. over and under \newcommand{\ss}[1]{_{\substack{#1}}} \newcommand{\ob}{\overbrace} \newcommand{\ub}{\underbrace} \newcommand{\ol}{\overline} \newcommand{\tld}{\widetilde} \newcommand{\HAT}{\widehat} \newcommand{\f}{\frac} \newcommand{\s}[2]{#1 /\mathopen{}#2} \newcommand{\rt}{\sqrt} % .. relations \newcommand{\sr}{\stackrel} \newcommand{\sse}{\subseteq} \newcommand{\ce}{\coloneqq} \newcommand{\ec}{\eqqcolon} \newcommand{\ap}{\approx} \newcommand{\ls}{\lesssim} \newcommand{\gs}{\gtrsim} % .. misc \newcommand{\q}{\quad} \newcommand{\qq}{\qquad} \newcommand{\heart}{\heartsuit} % % Delimiters % (I needed to create my own because the MathJax version of \DeclarePairedDelimiter doesn't have \mathopen{} and that messes up the spacing) % .. one-part \newcommand{\p}[1]{\mathopen{}\left( #1 \right)} \newcommand{\b}[1]{\mathopen{}\left[ #1 \right]} \newcommand{\set}[1]{\mathopen{}\left\{ #1 \right\}} \newcommand{\abs}[1]{\mathopen{}\left\lvert #1 \right\rvert} \newcommand{\floor}[1]{\mathopen{}\left\lfloor #1 \right\rfloor} \newcommand{\ceil}[1]{\mathopen{}\left\lceil #1 \right\rceil} \newcommand{\inner}[1]{\mathopen{}\left\langle #1 \right\rangle} % .... (use phantom to force at least the standard height of double bars) \newcommand{\norm}[1]{\mathopen{}\left\lVert #1 \vphantom{f} \right\rVert} \newcommand{\frob}[1]{\norm{#1}_\mathrm{F}} %% .. 
two-part \newcommand{\incond}[2]{#1 \mathop{}\middle|\mathop{} #2} \newcommand{\cond}[2]{ {\left.\incond{#1}{#2}\right.}} \newcommand{\pco}[2]{\p{\incond{#1}{#2}}} \newcommand{\bco}[2]{\b{\incond{#1}{#2}}} \newcommand{\setco}[2]{\set{\incond{#1}{#2}}} \newcommand{\at}[2]{\left.#1\right|_{#2}} % ..... (use phantom to force at least the standard height of double bar) \newcommand{\oldpara}[2]{#1\vphantom{f} \mathop{}\middle\|\mathop{} #2} %\newcommand{\para}[2]{#1\vphantom{f} \mathop{}\middle\|\mathop{} #2} \newcommand{\para}[2]{\mathchoice{\begin{matrix}#1\\\hdashline#2\end{matrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}} \newcommand{\ppa}[2]{\p{\para{#1}{#2}}} \newcommand{\bpa}[2]{\b{\para{#1}{#2}}} %\newcommand{\bpaco}[4]{\bpa{\incond{#1}{#2}}{\incond{#3}{#4}}} \newcommand{\bpaco}[4]{\bpa{\cond{#1}{#2}}{\cond{#3}{#4}}} % % Levels of closeness \newcommand{\scirc}[1]{\sr{\circ}{#1}} \newcommand{\sdot}[1]{\sr{.}{#1}} \newcommand{\slog}[1]{\sr{\log}{#1}} \newcommand{\createClosenessLevels}[7]{ \newcommand{#2}{\mathrel{(#1)}} \newcommand{#3}{\mathrel{#1}} \newcommand{#4}{\mathrel{#1\!\!#1}} \newcommand{#5}{\mathrel{#1\!\!#1\!\!#1}} \newcommand{#6}{\mathrel{(\sdot{#1})}} \newcommand{#7}{\mathrel{(\slog{#1})}} } \let\lt\undefined \let\gt\undefined % .. vanilla versions (is it within a constant?) 
\newcommand{\ez}{\scirc=} \newcommand{\eq}{\simeq} \newcommand{\eqq}{\mathrel{\eq\!\!\eq}} \newcommand{\eqqq}{\mathrel{\eq\!\!\eq\!\!\eq}} \newcommand{\lez}{\scirc\le} \newcommand{\lq}{\preceq} \newcommand{\lqq}{\mathrel{\lq\!\!\lq}} \newcommand{\lqqq}{\mathrel{\lq\!\!\lq\!\!\lq}} \newcommand{\gez}{\scirc\ge} \newcommand{\gq}{\succeq} \newcommand{\gqq}{\mathrel{\gq\!\!\gq}} \newcommand{\gqqq}{\mathrel{\gq\!\!\gq\!\!\gq}} \newcommand{\lz}{\scirc<} \newcommand{\lt}{\prec} \newcommand{\ltt}{\mathrel{\lt\!\!\lt}} \newcommand{\lttt}{\mathrel{\lt\!\!\lt\!\!\lt}} \newcommand{\gz}{\scirc>} \newcommand{\gt}{\succ} \newcommand{\gtt}{\mathrel{\gt\!\!\gt}} \newcommand{\gttt}{\mathrel{\gt\!\!\gt\!\!\gt}} % .. dotted versions (is it equal in the limit?) \newcommand{\ed}{\sdot=} \newcommand{\eqd}{\sdot\eq} \newcommand{\eqqd}{\sdot\eqq} \newcommand{\eqqqd}{\sdot\eqqq} \newcommand{\led}{\sdot\le} \newcommand{\lqd}{\sdot\lq} \newcommand{\lqqd}{\sdot\lqq} \newcommand{\lqqqd}{\sdot\lqqq} \newcommand{\ged}{\sdot\ge} \newcommand{\gqd}{\sdot\gq} \newcommand{\gqqd}{\sdot\gqq} \newcommand{\gqqqd}{\sdot\gqqq} \newcommand{\ld}{\sdot<} \newcommand{\ltd}{\sdot\lt} \newcommand{\lttd}{\sdot\ltt} \newcommand{\ltttd}{\sdot\lttt} \newcommand{\gd}{\sdot>} \newcommand{\gtd}{\sdot\gt} \newcommand{\gttd}{\sdot\gtt} \newcommand{\gtttd}{\sdot\gttt} % .. log versions (is it equal up to log?) 
\newcommand{\elog}{\slog=} \newcommand{\eqlog}{\slog\eq} \newcommand{\eqqlog}{\slog\eqq} \newcommand{\eqqqlog}{\slog\eqqq} \newcommand{\lelog}{\slog\le} \newcommand{\lqlog}{\slog\lq} \newcommand{\lqqlog}{\slog\lqq} \newcommand{\lqqqlog}{\slog\lqqq} \newcommand{\gelog}{\slog\ge} \newcommand{\gqlog}{\slog\gq} \newcommand{\gqqlog}{\slog\gqq} \newcommand{\gqqqlog}{\slog\gqqq} \newcommand{\llog}{\slog<} \newcommand{\ltlog}{\slog\lt} \newcommand{\lttlog}{\slog\ltt} \newcommand{\ltttlog}{\slog\lttt} \newcommand{\glog}{\slog>} \newcommand{\gtlog}{\slog\gt} \newcommand{\gttlog}{\slog\gtt} \newcommand{\gtttlog}{\slog\gttt} % % Miscellaneous \newcommand{\LHS}{\mathrm{LHS}} \newcommand{\RHS}{\mathrm{RHS}} % .. operators \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\quasipoly}{quasipoly} \DeclareMathOperator{\negl}{negl} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} % .. functions \DeclareMathOperator{\id}{id} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\err}{err} \DeclareMathOperator{\ReLU}{ReLU} % .. analysis \let\d\undefined \newcommand{\d}{\operatorname{d}\mathopen{}} \newcommand{\df}[2]{\f{\d #1}{\d #2}} \newcommand{\ds}[2]{\s{\d #1}{\d #2}} \newcommand{\part}{\partial} \newcommand{\partf}[2]{\f{\part #1}{\part #2}} \newcommand{\parts}[2]{\s{\part #1}{\part #2}} \newcommand{\grad}[1]{\mathop{\nabla\!_{#1}}} % .. 
sets \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\zo}{\set{0,1}} \newcommand{\pmo}{\set{\pm 1}} % %%% SPECIALIZED MATH %%% % % Logic \renewcommand{\and}{\wedge} \newcommand{\AND}{\bigwedge} \newcommand{\or}{\vee} \newcommand{\OR}{\bigvee} \newcommand{\xor}{\oplus} \newcommand{\XOR}{\bigoplus} \newcommand{\union}{\cup} \newcommand{\inter}{\cap} \newcommand{\UNION}{\bigcup} \newcommand{\INTER}{\bigcap} \newcommand{\comp}{\overline} \newcommand{\true}{\r{true}} \newcommand{\false}{\r{false}} \newcommand{\tf}{\set{\true,\false}} \DeclareMathOperator{\One}{\mathbb{1}} \DeclareMathOperator{\1}{\mathbb{1}} % % Linear algebra \renewcommand{\span}{\mathrm{span}} \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\dom}{dom} \DeclareMathOperator{\Img}{Im} \newcommand{\transp}{\mathsf{T}} \renewcommand{\t}{^\transp} % ... named tensors \newcommand{\namedtensorstrut}{\vphantom{fg}} % milder than \mathstrut \newcommand{\name}[1]{\mathsf{\namedtensorstrut #1}} \newcommand{\nbin}[2]{\mathbin{\underset{\substack{#1}}{\namedtensorstrut #2}}} \newcommand{\ndot}[1]{\nbin{#1}{\odot}} \newcommand{\ncat}[1]{\nbin{#1}{\oplus}} \newcommand{\nsum}[1]{\sum\limits_{\substack{#1}}} \newcommand{\nfun}[2]{\mathop{\underset{\substack{#1}}{\namedtensorstrut\mathrm{#2}}}} \newcommand{\ndef}[2]{\newcommand{#1}{\name{#2}}} \newcommand{\nt}[1]{^{\transp(#1)}} % % Probability \newcommand{\Normal}{\mathcal{N}} \let\Pr\undefined \DeclareMathOperator*{\Pr}{Pr} \DeclareMathOperator*{\G}{\mathbb{G}} \DeclareMathOperator*{\Odds}{Od} \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \DeclareMathOperator*{\Cov}{Cov} \DeclareMathOperator*{\corr}{corr} \DeclareMathOperator*{\median}{median} \newcommand{\dTV}{d_{\mathrm{TV}}} \newcommand{\dHel}{d_{\mathrm{Hel}}} \newcommand{\dJS}{d_{\mathrm{JS}}} % ... 
information theory \let\H\undefined \DeclareMathOperator*{\H}{H} \DeclareMathOperator*{\I}{I} \DeclareMathOperator*{\D}{D} % %%% SPECIALIZED COMPUTER SCIENCE %%% % % Complexity classes % .. classical \newcommand{\Poly}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\PH}{\mathsf{PH}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \renewcommand{\L}{\mathsf{L}} % .. probabilistic \newcommand{\formost}{\mathsf{Я}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\MA}{\mathsf{MA}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\RL}{\mathsf{RL}} % .. circuits \newcommand{\NC}{\mathsf{NC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\ACC}{\mathsf{ACC}} \newcommand{\TC}{\mathsf{TC}} \newcommand{\Ppoly}{\mathsf{P}/\poly} \newcommand{\Lpoly}{\mathsf{L}/\poly} % .. resources \newcommand{\TIME}{\mathsf{TIME}} \newcommand{\SPACE}{\mathsf{SPACE}} \newcommand{\TISP}{\mathsf{TISP}} \newcommand{\SIZE}{\mathsf{SIZE}} % .. keywords \newcommand{\co}{\mathsf{co}} \newcommand{\Prom}{\mathsf{Promise}} % % Boolean analysis \newcommand{\harpoon}{\!\upharpoonright\!} \newcommand{\rr}[2]{#1\harpoon_{#2}} \newcommand{\Fou}[1]{\widehat{#1}} \DeclareMathOperator{\Ind}{\mathrm{Ind}} \DeclareMathOperator{\Inf}{\mathrm{Inf}} \DeclareMathOperator{\Der}{\mathrm{D}} \DeclareMathOperator{\Stab}{\mathrm{Stab}} \DeclareMathOperator{\T}{T} \DeclareMathOperator{\sens}{\mathrm{s}} \DeclareMathOperator{\bsens}{\mathrm{bs}} \DeclareMathOperator{\fbsens}{\mathrm{fbs}} \DeclareMathOperator{\Cert}{\mathrm{C}} \DeclareMathOperator{\DT}{\mathrm{DT}} \DeclareMathOperator{\CDT}{\mathrm{CDT}} % canonical \DeclareMathOperator{\ECDT}{\mathrm{ECDT}} \DeclareMathOperator{\CDTv}{\mathrm{CDT_{vars}}} \DeclareMathOperator{\ECDTv}{\mathrm{ECDT_{vars}}} \DeclareMathOperator{\CDTt}{\mathrm{CDT_{terms}}} \DeclareMathOperator{\ECDTt}{\mathrm{ECDT_{terms}}} \DeclareMathOperator{\CDTw}{\mathrm{CDT_{weighted}}} \DeclareMathOperator{\ECDTw}{\mathrm{ECDT_{weighted}}} 
\DeclareMathOperator{\AvgDT}{\mathrm{AvgDT}} \DeclareMathOperator{\PDT}{\mathrm{PDT}} % partial decision tree \DeclareMathOperator{\DTsize}{\mathrm{DT_{size}}} \DeclareMathOperator{\W}{\mathbf{W}} % .. functions (small caps sadly doesn't work) \DeclareMathOperator{\Par}{\mathrm{Par}} \DeclareMathOperator{\Maj}{\mathrm{Maj}} \DeclareMathOperator{\HW}{\mathrm{HW}} \DeclareMathOperator{\Th}{\mathrm{Th}} \DeclareMathOperator{\Tribes}{\mathrm{Tribes}} \DeclareMathOperator{\RotTribes}{\mathrm{RotTribes}} \DeclareMathOperator{\CycleRun}{\mathrm{CycleRun}} \DeclareMathOperator{\SAT}{\mathrm{SAT}} \DeclareMathOperator{\UniqueSAT}{\mathrm{UniqueSAT}} % % Dynamic optimality \newcommand{\OPT}{\mathsf{OPT}} \newcommand{\Alt}{\mathsf{Alt}} \newcommand{\Funnel}{\mathsf{Funnel}} % % Alignment \DeclareMathOperator{\Amp}{\mathrm{Amp}} % %%% TYPESETTING %%% % % In text \renewcommand{\th}{^{\mathrm{th}}} \newcommand{\degree}{^\circ} % % Fonts % .. bold \newcommand{\BA}{\boldsymbol{A}} \newcommand{\BB}{\boldsymbol{B}} \newcommand{\BC}{\boldsymbol{C}} \newcommand{\BD}{\boldsymbol{D}} \newcommand{\BE}{\boldsymbol{E}} \newcommand{\BF}{\boldsymbol{F}} \newcommand{\BG}{\boldsymbol{G}} \newcommand{\BH}{\boldsymbol{H}} \newcommand{\BI}{\boldsymbol{I}} \newcommand{\BJ}{\boldsymbol{J}} \newcommand{\BK}{\boldsymbol{K}} \newcommand{\BL}{\boldsymbol{L}} \newcommand{\BM}{\boldsymbol{M}} \newcommand{\BN}{\boldsymbol{N}} \newcommand{\BO}{\boldsymbol{O}} \newcommand{\BP}{\boldsymbol{P}} \newcommand{\BQ}{\boldsymbol{Q}} \newcommand{\BR}{\boldsymbol{R}} \newcommand{\BS}{\boldsymbol{S}} \newcommand{\BT}{\boldsymbol{T}} \newcommand{\BU}{\boldsymbol{U}} \newcommand{\BV}{\boldsymbol{V}} \newcommand{\BW}{\boldsymbol{W}} \newcommand{\BX}{\boldsymbol{X}} \newcommand{\BY}{\boldsymbol{Y}} \newcommand{\BZ}{\boldsymbol{Z}} \newcommand{\Ba}{\boldsymbol{a}} \newcommand{\Bb}{\boldsymbol{b}} \newcommand{\Bc}{\boldsymbol{c}} \newcommand{\Bd}{\boldsymbol{d}} \newcommand{\Be}{\boldsymbol{e}} 
\newcommand{\Bf}{\boldsymbol{f}} \newcommand{\Bg}{\boldsymbol{g}} \newcommand{\Bh}{\boldsymbol{h}} \newcommand{\Bi}{\boldsymbol{i}} \newcommand{\Bj}{\boldsymbol{j}} \newcommand{\Bk}{\boldsymbol{k}} \newcommand{\Bp}{\boldsymbol{p}} \newcommand{\Bq}{\boldsymbol{q}} \newcommand{\Br}{\boldsymbol{r}} \newcommand{\Bs}{\boldsymbol{s}} \newcommand{\Bt}{\boldsymbol{t}} \newcommand{\Bu}{\boldsymbol{u}} \newcommand{\Bv}{\boldsymbol{v}} \newcommand{\Bw}{\boldsymbol{w}} \newcommand{\Bx}{\boldsymbol{x}} \newcommand{\By}{\boldsymbol{y}} \newcommand{\Bz}{\boldsymbol{z}} \newcommand{\Balpha}{\boldsymbol{\alpha}} \newcommand{\Bbeta}{\boldsymbol{\beta}} \newcommand{\Bgamma}{\boldsymbol{\gamma}} \newcommand{\Bdelta}{\boldsymbol{\delta}} \newcommand{\Beps}{\boldsymbol{\eps}} \newcommand{\Bveps}{\boldsymbol{\veps}} \newcommand{\Bzeta}{\boldsymbol{\zeta}} \newcommand{\Beta}{\boldsymbol{\eta}} \newcommand{\Btheta}{\boldsymbol{\theta}} \newcommand{\Biota}{\boldsymbol{\iota}} \newcommand{\Bkappa}{\boldsymbol{\kappa}} \newcommand{\Blambda}{\boldsymbol{\lambda}} \newcommand{\Bmu}{\boldsymbol{\mu}} \newcommand{\Bnu}{\boldsymbol{\nu}} \newcommand{\Bxi}{\boldsymbol{\xi}} \newcommand{\Bomicron}{\boldsymbol{\omicron}} \newcommand{\Bpi}{\boldsymbol{\pi}} \newcommand{\Brho}{\boldsymbol{\rho}} \newcommand{\Bsigma}{\boldsymbol{\sigma}} \newcommand{\Btau}{\boldsymbol{\tau}} \newcommand{\Bupsilon}{\boldsymbol{\upsilon}} \newcommand{\Bphi}{\boldsymbol{\phi}} \newcommand{\Bfi}{\boldsymbol{\fi}} \newcommand{\Bchi}{\boldsymbol{\chi}} \newcommand{\Bpsi}{\boldsymbol{\psi}} \newcommand{\Bomega}{\boldsymbol{\omega}} % .. 
calligraphic \newcommand{\CA}{\mathcal{A}} \newcommand{\CB}{\mathcal{B}} \newcommand{\CC}{\mathcal{C}} \newcommand{\CD}{\mathcal{D}} \newcommand{\CE}{\mathcal{E}} \newcommand{\CF}{\mathcal{F}} \newcommand{\CG}{\mathcal{G}} \newcommand{\CH}{\mathcal{H}} \newcommand{\CI}{\mathcal{I}} \newcommand{\CJ}{\mathcal{J}} \newcommand{\CK}{\mathcal{K}} \newcommand{\CL}{\mathcal{L}} \newcommand{\CM}{\mathcal{M}} \newcommand{\CN}{\mathcal{N}} \newcommand{\CO}{\mathcal{O}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CQ}{\mathcal{Q}} \newcommand{\CR}{\mathcal{R}} \newcommand{\CS}{\mathcal{S}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CU}{\mathcal{U}} \newcommand{\CV}{\mathcal{V}} \newcommand{\CW}{\mathcal{W}} \newcommand{\CX}{\mathcal{X}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\CZ}{\mathcal{Z}} % .. typewriter \newcommand{\TA}{\mathtt{A}} \newcommand{\TB}{\mathtt{B}} \newcommand{\TC}{\mathtt{C}} \newcommand{\TD}{\mathtt{D}} \newcommand{\TE}{\mathtt{E}} \newcommand{\TF}{\mathtt{F}} \newcommand{\TG}{\mathtt{G}} \newcommand{\TH}{\mathtt{H}} \newcommand{\TI}{\mathtt{I}} \newcommand{\TJ}{\mathtt{J}} \newcommand{\TK}{\mathtt{K}} \newcommand{\TL}{\mathtt{L}} \newcommand{\TM}{\mathtt{M}} \newcommand{\TN}{\mathtt{N}} \newcommand{\TO}{\mathtt{O}} \newcommand{\TP}{\mathtt{P}} \newcommand{\TQ}{\mathtt{Q}} \newcommand{\TR}{\mathtt{R}} \newcommand{\TS}{\mathtt{S}} \newcommand{\TT}{\mathtt{T}} \newcommand{\TU}{\mathtt{U}} \newcommand{\TV}{\mathtt{V}} \newcommand{\TW}{\mathtt{W}} \newcommand{\TX}{\mathtt{X}} \newcommand{\TY}{\mathtt{Y}} \newcommand{\TZ}{\mathtt{Z}}$

$p$-norms are ways to aggregate numbers based on their $p\th$ power (for $1 \leq p \leq \infty$). The higher $p$ is, the more it cares about extreme values. In fact, the $\infty$-norm is precisely the maximum value.

Depending on how you scale them, there are two kinds of $p$-norms:

  • $p$-lengths, which “sum” the numbers,
  • $p$-means, which “average” the numbers.

$p$-length

This is the usual notion of $p$-norm of a vector $x \in \R^n$:

\[ \norm{x}_p \ce \p{|x_1|^p + \cdots + |x_n|^p}^{1/p}. \]

In particular:

  • $\norm{x}_1$ is the Manhattan length,
  • $\norm{x}_2$ is the Euclidean length,
  • $\norm{x}_{\infty}$ is just the (absolute) maximum: $\norm{x}_\infty = \max_i |x_i|$.

The $p$-length decreases with $p$: when $p=1$, every number contributes to the sum “equally”, but as the power $p$ increases, the lower elements become less and less significant to the sum, until in the limit only the maximum value matters.
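As a quick numerical sanity check (the vector is made up for illustration, not from the text), we can watch the $p$-length shrink toward the maximum as $p$ grows:

```python
import numpy as np

# Illustrative example: the p-length ||x||_p = (sum |x_i|^p)^(1/p)
# is nonincreasing in p and tends to max_i |x_i| as p -> infinity.
def p_length(x, p):
    x = np.abs(np.asarray(x, dtype=float))
    return x.max() if np.isinf(p) else float((x ** p).sum() ** (1.0 / p))

x = [3.0, -4.0, 1.0, 0.5]
lengths = [p_length(x, p) for p in [1, 2, 4, 8, 16, np.inf]]

assert lengths == sorted(lengths, reverse=True)  # decreases with p
assert lengths[-1] == 4.0                        # ends at the absolute maximum
```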

It decreases a lot unless $x$ is sparse

Let $1 \le p < q < \infty$. Since the function $t \mapsto t^{q/p}$ is strictly convex and vanishes at $0$, it is superadditive over $\R_{\ge 0}$, so we have

\[ \al{ \norm{x}_q &= \p{|x_1|^q + \cdots + |x_n|^q}^{1/q}\\ &= \p{\p{|x_1|^p}^{q/p} + \cdots + \p{|x_n|^p}^{q/p}}^{1/q}\\ &\le \p{\p{|x_1|^p + \cdots + |x_n|^p}^{q/p}}^{1/q}\tag{superadditivity}\\ &= \p{|x_1|^p + \cdots + |x_n|^p}^{1/p}\\ &= \norm{x}_p, } \]

with equality iff all but one of the coordinates are zero: the vector $x$ is “perfectly sparse”.

This is actually a robust fact: if the ratio $\frac{\norm{x}_p}{\norm{x}_q}$ is small, then $x$ must be sparse (in $q$-norm). Indeed, for any threshold $h$, you can split $x = x^{\ge h} + x^{< h}$, where $x^{\ge h}$ contains the entries of absolute value at least $h$ and $x^{<h}$ contains the rest. Then, since every entry of $x^{<h}$ is smaller than $h$ in absolute value,

\[ \norm{x^{<h}}_q^q \le \norm{x^{<h}}_p^p h^{q-p} \le \norm{x}_p^p h^{q-p}, \]

so you can make sure that $x^{<h}$ contains at most a $\delta$ fraction of $\norm{x}_q^q$ by setting

\[ \norm{x}_p^p h^{q-p} = \delta \norm{x}_q^q \Leftrightarrow h \ce \p{\frac{\delta \norm{x}_q^q}{\norm{x}_p^p}}^{\frac{1}{q-p}}, \]

and then the number of nonzero entries in $x^{\ge h}$ is at most

\[ \frac{\norm{x}_p^p}{h^p} = \p{\frac{\p{\norm{x}_p/\norm{x}_q}^q}{\delta}}^{\frac{p}{q-p}}. \]
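Here is a small numerical sketch of this thresholding argument (the vector, $p$, $q$, and $\delta$ are made up for illustration): compute $h$ as above, then check both conclusions.

```python
import numpy as np

def p_length(x, p):
    x = np.abs(np.asarray(x, dtype=float))
    return (x ** p).sum() ** (1.0 / p)

# A vector with one big entry and many tiny ones (illustrative data).
x = np.array([10.0] + [0.01] * 1000)
p, q, delta = 1.0, 2.0, 0.1

# Threshold from the text: ||x||_p^p * h^(q-p) = delta * ||x||_q^q.
h = (delta * p_length(x, q) ** q / p_length(x, p) ** p) ** (1.0 / (q - p))

small = x[x < h]
# The part below h carries at most a delta fraction of ||x||_q^q...
assert (small ** q).sum() <= delta * p_length(x, q) ** q
# ...and by Markov, at most ||x||_p^p / h^p entries are >= h.
assert (x >= h).sum() <= p_length(x, p) ** p / h ** p
```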

See Lower-degree norms give sparsity for more intuition and details.

$p$-mean

Let $a_1, \ldots, a_n \in \R_{\geq 0}$. If you divide the sum of the $p\th$ powers by $n$ before taking the $p\th$ root, you get a sort of average called the “$p$-mean”:1

\[ M_p(a) \ce \p{\frac{a_1^p + \cdots + a_n^p}{n}}^{1/p}. \]

More generally, we can consider the $p\th$ moment of a random variable $\BX$ (which we’ll assume is nonnegative for simplicity):

\[ \norm{\BX}_p \ce \E[\BX^p]^{1/p}. \]

In particular:

  • $M_1(a)$ is the arithmetic mean and $\norm{\BX}_1$ is the expectation,
  • $M_2(a)$ is the quadratic mean,
  • $M_\infty(a)$ and $\norm{\BX}_\infty$ are just the maximum.2

The $p$-mean / $p\th$ moment increases with $p$: when $p=1$, every number contributes to the average “equally”, but as the power $p$ increases, the lower elements become less and less significant to the average, until in the limit only the maximum value matters.

It increases a lot unless $a$ is spread / $\BX$ is reasonable

Let $1 \le p < q < \infty$. Since the function $t \mapsto t^{q/p}$ is convex, Jensen’s inequality gives

\[ \al{ \norm{\BX}_p &= \p{\E[\BX^p]}^{1/p}\\ &= \p{\E[\BX^p]^{q/p}}^{1/q}\\ &\le \p{\E\b{\BX^q}}^{1/q}\tag{convexity}\\ &= \norm{\BX}_q, } \]

and thus also $M_p(a) \leq M_q(a)$. In particular, the inequality $M_1(a) \leq M_2(a)$ (known as “AM-QM”) is a special case of Cauchy–Schwarz, and the inequality $\norm{\BX}_1 \le \norm{\BX}_2$ is a consequence of the definition of variance.

The equality case is when

  • $\BX$ is constant: it’s “perfectly reasonable”,
  • all the $a_i$’s are equal: they are “perfectly spread”.
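Numerically (with made-up values $a$), the monotonicity and the AM–QM special case look like this:

```python
import numpy as np

# Illustrative values (assumed, not from the text).
def p_mean(a, p):
    a = np.asarray(a, dtype=float)
    return (a ** p).mean() ** (1.0 / p)

a = [1.0, 2.0, 3.0, 10.0]
means = [p_mean(a, p) for p in [1, 2, 4, 8]]

assert means == sorted(means)                    # increases with p
assert p_mean(a, 1) == 4.0                       # arithmetic mean
assert abs(p_mean(a, 2) - np.sqrt(28.5)) < 1e-9  # quadratic mean >= 4 (AM-QM)
```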

This is also a robust fact: if the ratio $\frac{\norm{\BX}_q}{\norm{\BX}_p}$ is small, then $\BX$ must be reasonable: in particular, it can’t almost always be much smaller than its $p$-norm. Indeed, for any $0 < t < 1$,

\[ \al{ &&\E[\BX^p] &\leq (t\norm{\BX}_p)^p + \E\b{\BX^p \cdot \1[\BX \geq t\norm{\BX}_p]}\\ &&&\leq (t\norm{\BX}_p)^p + \E[\BX^q]^{\frac{p}{q}}\cdot\Pr[\BX \geq t\norm{\BX}_p]^{\frac{q-p}{q}}\tag{by Hölder's}\\ &\Leftrightarrow& \Pr[\BX \geq t\norm{\BX}_p]^{\frac{q-p}{q}} &\geq \frac{(1-t^p)\norm{\BX}_p^p}{\norm{\BX}_q^p}\\ &\Leftrightarrow& \Pr[\BX \geq t\norm{\BX}_p] &\geq \p{\frac{1-t^p}{\p{\norm{\BX}_q/\norm{\BX}_p}^p}}^{\frac{q}{q-p}}, } \]

so in particular, when the ratio is constant, $\Pr[\BX \geq t\norm{\BX}_p] \geq \Omega(1)$ for any $t<1$.
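To see the final bound in action, here is a hypothetical discrete example ($\BX$ uniform over a few values; $p=1$, $q=2$, $t=1/2$ are assumed for illustration):

```python
import numpy as np

# X uniform over these values (illustrative, assumed data).
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p, q, t = 1.0, 2.0, 0.5

norm_p = (vals ** p).mean() ** (1.0 / p)   # ||X||_p = 3
norm_q = (vals ** q).mean() ** (1.0 / q)   # ||X||_q = sqrt(11)

lhs = (vals >= t * norm_p).mean()          # Pr[X >= t * ||X||_p] = 0.8
rhs = ((1 - t ** p) / (norm_q / norm_p) ** p) ** (q / (q - p))
assert lhs >= rhs                          # 0.8 >= ~0.20
```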

Note: I’m pretty sure you can also show that when $\frac{\norm{\BX}_q}{\norm{\BX}_p} \to 1$, it’s not only reasonable but actually concentrated:

\[\Pr[\BX \not\in (1\pm o(1))\norm{\BX}_p] \leq o(1).\]

For $p=1,q=2$, it is true by Chebyshev since $\norm{\BX}_2^2-\norm{\BX}_1^2$ is precisely the variance of $\BX$. I suspect you could prove it in general using a quantitative version of Jensen’s inequality and a lot of courage.

Overall picture

Bounding either side

The $p$-length inequalities and the $p$-mean inequalities complement each other. On the one hand, we can use the $p$-mean inequalities to bound how fast the $p$-length decreases:

\[ \norm{x}_q = n^{\frac{1}{q}}\cdot M_q(x) \ge n^\frac{1}{q}\cdot M_p(x) = \frac{\norm{x}_p}{n^{\frac{1}{p}-\frac{1}{q}}}. \]

And on the other hand, we can use the $p$-length inequalities to bound how fast the $p$-mean increases:

\[ M_q(a) = \frac{\norm{a}_q}{n^{\frac{1}{q}}} \leq \frac{\norm{a}_p}{n^{\frac{1}{q}}} = n^{\frac{1}{p}-\frac{1}{q}} \cdot M_p(a). \]
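Both dimension-dependent bounds can be sanity-checked numerically (random data and parameters chosen for illustration):

```python
import numpy as np

def p_length(x, p):
    return (np.abs(x) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
n, p, q = len(x), 1.0, 3.0

# ||x||_p / n^(1/p - 1/q)  <=  ||x||_q  <=  ||x||_p
lo = p_length(x, p) / n ** (1.0 / p - 1.0 / q)
hi = p_length(x, p)
assert lo <= p_length(x, q) <= hi
```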

But on the third hand, you can’t bound how fast the moments of a continuous random variable increase: in fact, it could be that the $p\th$ moment is finite but the $q\th$ moment is infinite.

So we get the following picture (for $1 \le p < q \leq \infty$):

  • The $q$-norm $\norm{x}_q$ is stuck between $\frac{\norm{x}_p}{n^{\frac{1}{p}-\frac{1}{q}}}$ (“spread” regime) and $\norm{x}_p$ (“sparse” regime).
  • The $q$-mean $M_q(a)$ is stuck between $M_p(a)$ (“spread” regime) and $n^{\frac{1}{p}-\frac{1}{q}}\cdot M_p(a)$ (“sparse” regime).
  • The $q\th$ moment $\norm{\BX}_q$ is stuck between $\norm{\BX}_p$ (“reasonable/concentrated” regime) and
    • either $n^{\frac{1}{p}-\frac{1}{q}}\norm{\BX}_p$ if $\BX$ is discrete, say uniform over $n$ values (“sparse” regime),
    • or $\infty$ if $\BX$ is continuous (“spiky” regime).

TODO: change “concentrated” to “spread” in this graph #figure TODO: also make a “table” #figure

Qualities being tracked

We can use ratios between norms to track different qualitative behaviors (the ratios below are $\ge 1$, with $1$ indicating perfection):

  • for a vector $x \in \R^n$, the length ratios $\frac{\norm{x}_p}{\norm{x}_q}$ track its sparsity;
  • for a sequence $a \in \R_{\ge 0}^n$, the mean ratios $\frac{M_q(a)}{M_p(a)}$ track its spreadness;
  • for a random variable $\BX$, the moment ratios $\frac{\norm{\BX}_q}{\norm{\BX}_p}$ track its reasonableness;
    • by extension, so do the norm ratios $\frac{\norm{f}_q}{\norm{f}_p}$ for a function $f$ (by considering its value $f(\BX)$ over a random input $\BX$);
  • for a discrete random variable $\BX$, the power entropy ratios $\f{H_\alpha[\BX]}{H_\beta[\BX]}$ (for $0 \le \alpha < \beta \le \infty$) track its flatness.

TODO: add something about Relative density (are inner products like $\inner{f, f^p}$ the right notion?)

Hölder’s inequality

The generalized Hölder’s inequality implies that if

\[ k\cdot\frac{1}{p} = \frac{1}{q_1} + \cdots + \frac{1}{q_k} \]

(i.e. $p$ is the harmonic mean of $q_1, \ldots, q_k$), then

\[ \norm{x}_p \leq \sqrt[k]{\norm{x}_{q_1} \cdots \norm{x}_{q_k}}. \]

This is true both for $p$-lengths and $p$-means/moments; indeed, one can see that the factors $n$ in the $p$-means cancel out.
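A numerical sketch of the $k=3$ case (random data; $q_1, q_2, q_3$ chosen so their harmonic mean is $p = 3$):

```python
import numpy as np

def p_length(x, p):
    return (np.abs(x) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(1)
x = rng.standard_normal(20)

# 3/p = 1/q1 + 1/q2 + 1/q3, i.e. p is the harmonic mean of the q_i.
q1, q2, q3 = 2.0, 3.0, 6.0
p = 3.0 / (1.0 / q1 + 1.0 / q2 + 1.0 / q3)   # = 3.0 here

lhs = p_length(x, p)
rhs = (p_length(x, q1) * p_length(x, q2) * p_length(x, q3)) ** (1.0 / 3.0)
assert lhs <= rhs
```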

Hölder switcheroo

This inequality allows you to “propagate sparsity up” and “propagate spreadness/reasonableness down”.

Indeed, let’s take the $k=2$ case, and pick $q_1 < p < q_2$ such that $\frac{2}{p} = \frac{1}{q_1}+\frac{1}{q_2}$. Then we can rewrite Hölder’s inequality as

\[ \norm{x}_p \leq \sqrt{\norm{x}_{q_1}\norm{x}_{q_2}} \Leftrightarrow \frac{\norm{x}_p}{\norm{x}_{q_2}} \leq \frac{\norm{x}_{q_1}}{\norm{x}_p}. \]

So if the ratio $\frac{\norm{x}_{q_1}}{\norm{x}_{p}}$ is small (which we’ve seen means $x$ is sparse in $p$-norm), then the ratio $\frac{\norm{x}_p}{\norm{x}_{q_2}}$ is also small (which we’ve seen means $x$ is sparse in $q_2$-norm).3

Similarly, for moments, we can write

\[ \norm{\BX}_p \leq \sqrt{\norm{\BX}_{q_1}\norm{\BX}_{q_2}} \Leftrightarrow \frac{\norm{\BX}_p}{\norm{\BX}_{q_1}} \leq \frac{\norm{\BX}_{q_2}}{\norm{\BX}_p}. \]

So if the ratio $\frac{\norm{\BX}_{q_2}}{\norm{\BX}_p}$ is small (which we’ve seen means $\BX$ is reasonable in $p$-norm), then the ratio $\frac{\norm{\BX}_p}{\norm{\BX}_{q_1}}$ is also small (which means $\BX$ is reasonable in $q_1$-norm).4
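The switcheroo can be checked on toy data (assumed values; $q_1 = 1$ and $q_2 = 3$, so $p = 1.5$):

```python
import numpy as np

def p_length(x, p):
    return (np.abs(x) ** p).sum() ** (1.0 / p)

q1, q2 = 1.0, 3.0
p = 2.0 / (1.0 / q1 + 1.0 / q2)   # harmonic mean: p = 1.5

x = np.array([5.0, 1.0, 0.5, 0.2])
# Sparsity propagates up: ||x||_p / ||x||_{q2} <= ||x||_{q1} / ||x||_p.
assert p_length(x, p) / p_length(x, q2) <= p_length(x, q1) / p_length(x, p)
```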

Valiant–Valiant inequality prover

A wide range of inequalities, including the $p$-length inequalities and Hölder’s inequality on several sequences of numbers, can be automatically proved or disproved by Valiant and Valiant’s inequality prover.5

  1. In fact, $p$-means can be defined for any $-\infty \leq p \leq \infty$, but they’re only norms for $p\geq 1$ (in the sense of satisfying the triangle inequality). 

  2. at least, assuming $\BX$ is either discrete or its pdf is continuous 

  3. Note that this is not useful for sparsity itself, since a vector that’s sparse in $p$-norm is obviously sparse for any norm bigger than $p$, but maybe the inequality of ratios can itself be useful.

  4. Again, this is not useful for concentration itself, since a variable that’s concentrated in $p$-norm is obviously concentrated for any norm smaller than $p$, but the inequality of ratios can itself be useful, in particular in Hypercontractivity.

  5. Gregory Valiant and Paul Valiant, “An automatic inequality prover and instance optimal identity testing”.