Entropies are dimensionless notions of how spread out a random variable is (as opposed to dispersions, which care about distance in space). Concretely, for any convex function $\phi : [0, 1] \to \mathbb{R}$ such that $\phi(0) = \phi(1) = 0$, we can define the $\phi$-entropy of a distribution $p$ as

$$H_\phi(p) = -\sum_x \phi(p(x)).$$
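To make the definition concrete, here is a minimal numerical sketch in Python/NumPy (the names `phi_entropy` and `phi_shannon` and the example distribution are purely illustrative choices, not fixed by anything above): it just evaluates $-\sum_x \phi(p(x))$ for a distribution given as an array of probabilities.

```python
import numpy as np

def phi_entropy(p, phi):
    """H_phi(p) = -sum_x phi(p(x)) for a convex phi with phi(0) = phi(1) = 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(phi(p))

# phi(t) = t * log(t), with the usual convention 0 * log(0) = 0; this recovers Shannon entropy.
def phi_shannon(t):
    return np.where(t > 0, t * np.log(np.maximum(t, 1e-300)), 0.0)

p = np.array([0.5, 0.25, 0.125, 0.125])
print(phi_entropy(p, phi_shannon))  # 1.2130... = Shannon entropy of p, in nats
```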
Range
By convexity, $\phi(t) \le (1 - t)\,\phi(0) + t\,\phi(1) = 0$ for any $t \in [0, 1]$, so $H_\phi(p)$ is nonnegative, with equality when $p$ puts all its mass on a single point. On the other hand, if we let $n$ be the number of possible outcomes, then by convexity (Jensen's inequality) we have

$$\frac{1}{n} \sum_x \phi(p(x)) \ge \phi\!\left(\frac{1}{n} \sum_x p(x)\right) = \phi\!\left(\frac{1}{n}\right),$$

so the $\phi$-entropy takes a maximum of $-n\,\phi(1/n)$ when $p$ is uniform.
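As a quick sanity check on these two bounds, the following sketch (continuing the snippet above, so it reuses `phi_entropy`, `phi_shannon`, and NumPy) draws random distributions and verifies that $0 \le H_\phi(p) \le -n\,\phi(1/n)$, with the endpoints attained at a point mass and at the uniform distribution.

```python
rng = np.random.default_rng(0)
n = 6
upper = -n * phi_shannon(1.0 / n)  # -n * phi(1/n), which equals log(n) in the Shannon case

for _ in range(1000):
    p = rng.dirichlet(np.ones(n))        # a random distribution on n points
    h = phi_entropy(p, phi_shannon)
    assert -1e-9 <= h <= upper + 1e-9    # 0 <= H_phi(p) <= -n * phi(1/n)

# The endpoints: a point mass gives ~0, the uniform distribution attains the maximum.
print(phi_entropy(np.eye(n)[0], phi_shannon))               # ~0.0
print(phi_entropy(np.full(n, 1 / n), phi_shannon), upper)   # both ~log(6) = 1.7917...
```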
Links to other quantities
Informally, $-H_\phi(p) = \sum_x \phi(p(x))$ can be understood as the $\phi$-divergence $D_\phi(p \,\|\, \mathbb{1})$, where $\mathbb{1}$ is an imaginary prior “distribution” that puts probability $1$ on every point. In this sense, the $\phi$-entropy is (the negative of) the “absolute” version of the $\phi$-divergence, for a neutral prior. The closer $p$ is to uniform, the lower $D_\phi(p \,\|\, \mathbb{1})$ is, and the higher the entropy is.
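As a sketch of this correspondence, assuming the usual form $D_\phi(p \,\|\, q) = \sum_x q(x)\,\phi(p(x)/q(x))$ for the $\phi$-divergence and reusing the hypothetical helpers above, plugging in the all-ones “prior” recovers exactly $-H_\phi(p)$:

```python
def phi_divergence(p, q, phi):
    """D_phi(p || q) = sum_x q(x) * phi(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q * phi(p / q))

p = np.array([0.5, 0.25, 0.125, 0.125])
ones = np.ones_like(p)  # the "prior" that puts mass 1 on every point

print(phi_divergence(p, ones, phi_shannon))  # sum_x phi(p(x))
print(-phi_entropy(p, phi_shannon))          # the same number: -H_phi(p)
```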
More broadly, we’ll call “$\phi$-entropy” any (usually decreasing) function of $\sum_x \phi(p(x))$. For example (both are computed in the sketch after this list),
- the Shannon entropy is given by $-\sum_x p(x) \log p(x)$ for $\phi(t) = t \log t$, and ranges from $0$ (single point) to $\log n$ (uniform);
- the power entropies are given by $\left(\sum_x p(x)^\alpha\right)^{\frac{1}{1 - \alpha}}$ for $\phi(t) = t^\alpha$ (with $\alpha \neq 1$), and they all range from $1$ (single point) to $n$ (uniform).
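Here is the sketch promised above, still in the same hypothetical Python setup (the helper names `shannon_entropy` and `power_entropy` are again just for illustration): it computes both families and checks the single-point and uniform endpoints.

```python
def shannon_entropy(p):
    return phi_entropy(p, phi_shannon)  # -sum_x p(x) log p(x)

def power_entropy(p, alpha):
    """(sum_x p(x)^alpha)^(1 / (1 - alpha)), for alpha != 1."""
    p = np.asarray(p, dtype=float)
    return np.sum(p ** alpha) ** (1.0 / (1.0 - alpha))

n = 5
point, uniform = np.eye(n)[0], np.full(n, 1 / n)

print(shannon_entropy(point), shannon_entropy(uniform))  # ~0.0 and log(5) = 1.609...
for alpha in (0.5, 2.0, 3.0):
    # every power entropy goes from 1 on a point mass to n on the uniform distribution
    print(power_entropy(point, alpha), power_entropy(uniform, alpha))  # 1.0 and 5.0
```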