Inequalities between divergences
Direct relations
We can directly “read off” from the table for the nonnegative form that:
, , , , all behave similarly in the “tweak” regime; is linearly equivalent to ;¹ .
In addition, using the tilted definition of
so
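Since the table itself isn’t reproduced here, here is a small numeric sketch of the “tweak” regime. It perturbs a distribution q to p = q + εd (with the entries of d summing to zero) and shows that the quadratic divergences — information divergence, chi-squared, squared Hellinger — all scale as ε² and agree up to constant factors, while total variation scales as ε. The function names, the base distribution, and the tweak direction d are my own illustration, not from the note.

```python
import numpy as np

def kl(p, q):
    """Information (KL) divergence in nats."""
    return float(np.sum(p * np.log(p / q)))

def chi2(p, q):
    """Chi-squared divergence."""
    return float(np.sum((p - q) ** 2 / q))

def hellinger2(p, q):
    """Squared Hellinger distance."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def tv(p, q):
    """Total variation distance (half the l1 distance)."""
    return float(0.5 * np.sum(np.abs(p - q)))

q = np.array([0.5, 0.3, 0.2])
d = np.array([1.0, -1.0, 0.0])  # tweak direction; entries sum to 0
for eps in [1e-1, 1e-2, 1e-3]:
    p = q + eps * d
    # KL and Hellinger^2 approach chi2/2 and chi2/4 as eps -> 0,
    # while tv shrinks only linearly in eps.
    print(eps, kl(p, q), chi2(p, q), hellinger2(p, q), tv(p, q))
```

As ε shrinks, the printed ratios KL : χ² : H² approach 1/2 : 1 : 1/4, while TV stays a factor ~1/ε larger than all of them.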
Relations that depend on the entropy of the prior
Remark: for the missing links, one side can be arbitrarily bigger than the other, but only if the prior has tiny probability values.
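A concrete instance of this remark as I read it (the symbols in the note are lost, so the two-point setup and the function names below are my own illustration): fix the second distribution and shrink the prior’s smallest probability. Total variation stays essentially constant while the information divergence grows without bound, like log(1/q_min).

```python
import numpy as np

def kl(p, q):
    """Information (KL) divergence in nats."""
    return float(np.sum(p * np.log(p / q)))

def tv(p, q):
    """Total variation distance."""
    return float(0.5 * np.sum(np.abs(p - q)))

p = np.array([1 - 1e-2, 1e-2])  # fixed comparison distribution
for qmin in [1e-2, 1e-4, 1e-8]:
    q = np.array([1 - qmin, qmin])  # prior with a shrinking smallest mass
    # tv is capped near 0.01, but kl grows like 0.01 * log(1/qmin)
    print(qmin, tv(p, q), kl(p, q))
```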
Information divergence vs total variation distance
Let be the smallest probability in the prior and be the vector of the probability differences, so that is the total variation distance between and .
Then
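The display here didn’t survive extraction, so the following is my guess at its flavor rather than the note’s own inequality. Writing δ for the total variation distance and q_min for the smallest prior probability (matching the definitions above), a bound in this direction that chains through chi-squared is D(p‖q) ≤ log(1 + χ²(p‖q)) ≤ log(1 + 4δ²/q_min), while in the other direction Pinsker’s inequality gives D(p‖q) ≥ 2δ² (in nats). A numeric check of both bounds on random pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Information (KL) divergence in nats."""
    return float(np.sum(p * np.log(p / q)))

def tv(p, q):
    """Total variation distance."""
    return float(0.5 * np.sum(np.abs(p - q)))

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    delta, qmin = tv(p, q), q.min()
    # Pinsker: D >= 2 * delta^2
    assert kl(p, q) >= 2 * delta ** 2 - 1e-12
    # Reverse-Pinsker-style bound via chi-squared and the smallest prior mass:
    # D <= log(1 + chi2) <= log(1 + 4 * delta^2 / qmin)
    assert kl(p, q) <= np.log(1 + 4 * delta ** 2 / qmin) + 1e-12
print("both bounds hold on 1000 random pairs")
```

The chi-squared step is just q ≥ q_min termwise plus ‖x‖₂² ≤ ‖x‖₁², and D ≤ log(1 + χ²) follows from Jensen’s inequality.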
#to-write generalize it (e.g. to chi squared)
Summary
#figure using Levels of closeness notation

#to-write add the link
-
In particular, as a divergence measure,
doesn’t have that much in common with the information divergence despite being based on it. It only looks information-theoretic in the way it approaches . On the other hand, I believe it inherits the niceness of information divergence when it comes to taking gradients. ↩