The squared L2 norm of a random variable is shared between the mean and the variance:

$$E[X^2] = E[X]^2 + \operatorname{Var}[X] = \mu^2 + \sigma^2.$$

Three categories

#to-write restrict to positive-valued

Depending on how this squared L2 norm is split, we can place nonnegative random variables into three scaling-invariant categories:

  1. Fat-tailed: $\sigma \gg \mu$.
    • These variables have some rare but crazy-high values that make the variance freak out.
    • e.g. a bit $X \in \{0,1\}$ that is very rarely 1
  2. Well-rounded: $\sigma = \Theta(\mu)$.
    • These variables are overall civilized, and vary roughly equally above and below their mean.
    • e.g. a uniform random bit, any exponential random variable, the absolute value or square of any normal of mean 0, most real world things
  3. Concentrated: $\sigma \ll \mu$.
    • These variables are tightly concentrated around a specific non-zero value.
    • e.g. a bit $X \in \{0,1\}$ that is almost always 1, (the absolute value of) a normal with mean 1 and tiny variance
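As a quick numeric sanity check (a minimal sketch, not part of the original note), the ratio $\sigma/\mu$ for a bit with $P(X=1) = p$ lands in each of the three regimes depending on $p$:

```python
import math

def sigma_over_mu(p):
    """sigma/mu for a bit X in {0,1} with P(X = 1) = p:
    mu = p, sigma^2 = p(1 - p), so sigma/mu = sqrt((1 - p)/p)."""
    return math.sqrt((1 - p) / p)

print(sigma_over_mu(1e-6))      # rarely-1 bit: sigma >> mu, fat-tailed
print(sigma_over_mu(0.5))       # fair bit: sigma = Theta(mu), well-rounded
print(sigma_over_mu(1 - 1e-6))  # almost-always-1 bit: sigma << mu, concentrated
```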

#figure

Relative variance

We can also define the relative variance $r = \operatorname{Var}[X]/E[X]^2 = \sigma^2/\mu^2$ (also scaling-invariant), with the correspondence

  • fat-tailed: $r \gg 1$,
  • well-rounded: $r = \Theta(1)$,
  • concentrated: $r \ll 1$.

The shape parameter of a gamma distribution is $k = 1/r = \mu^2/\sigma^2$.

The ratio $\sqrt{r} = \sigma/\mu$ is known as the “coefficient of variation”.
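To illustrate the gamma fact (a small sketch assuming the standard shape/scale parameterization, with $E[X] = k\theta$ and $\operatorname{Var}[X] = k\theta^2$), the relative variance comes out to exactly $1/k$, whatever the scale:

```python
def relative_variance(mean, var):
    """r = Var[X] / E[X]^2, scaling-invariant."""
    return var / mean**2

def gamma_r(k, theta):
    # Gamma(shape=k, scale=theta): E[X] = k*theta, Var[X] = k*theta^2.
    return relative_variance(k * theta, k * theta**2)

for k in (0.1, 1.0, 10.0):
    print(k, gamma_r(k, theta=3.0))  # r = 1/k, independent of theta
```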

Summing independent copies

As we sum independent copies, fat-tailed variables eventually become well-rounded, and well-rounded variables quickly become concentrated. Indeed, if $Y$ is the sum of $m$ independent copies of $X$, then

$$E[Y]^2 = m^2\,E[X]^2 \quad\text{but}\quad \operatorname{Var}[Y] = m\operatorname{Var}[X],$$

so the variance loses ground, and the relative variance becomes m times smaller:

$$\frac{\operatorname{Var}[Y]}{E[Y]^2} = \frac{m\operatorname{Var}[X]}{m^2\,E[X]^2} = \frac{1}{m} \times \frac{\operatorname{Var}[X]}{E[X]^2}.$$
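A small sketch of this $1/m$ decay (illustrative code, not from the note), using an exponential variable with mean 1 as the summand, so the summand itself has $r = 1$:

```python
def relative_variance(mean, var):
    return var / mean**2

# X exponential with mean 1: E[X] = 1, Var[X] = 1, so r = 1 (well-rounded).
mu, var = 1.0, 1.0
for m in (1, 10, 100, 10_000):
    # Sum of m independent copies: the mean scales by m, the variance only by m.
    r_sum = relative_variance(m * mu, m * var)
    print(m, r_sum)  # r_sum = 1/m: concentrated for large m
```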

Mixtures

#to-write Eric points out that when you take mixtures, by Jensen’s you can only increase the relative variance!
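A quick numeric check of that claim (a hypothetical sketch, assuming it concerns mixing components that share the same relative variance): mixing two exponentials, each with $r = 1$ on its own, yields $r > 1$ unless the component means coincide.

```python
def mixture_relative_variance(weights, means, rel_vars):
    """Relative variance of a mixture, from component means and relative
    variances, using E[X^2] = (1 + r_i) * mu_i^2 for each component."""
    mean = sum(w * mu for w, mu in zip(weights, means))
    m2 = sum(w * (1 + r) * mu**2 for w, r, mu in zip(weights, rel_vars, means))
    return (m2 - mean**2) / mean**2

# Two exponentials (r = 1 each) with different means: the mixture's
# relative variance exceeds 1, i.e. mixing pushed r upward.
print(mixture_relative_variance([0.5, 0.5], [1.0, 10.0], [1.0, 1.0]))

# Degenerate mixture of two identical components: r is unchanged.
print(mixture_relative_variance([0.5, 0.5], [1.0, 1.0], [1.0, 1.0]))
```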