Cross-entropy
The cross-entropy of a distribution $q$ relative to a distribution $p$ over the same space is
$$H[p, q] = \mathbb{E}_{x \sim p}\!\left[-\log q(x)\right] = -\sum_x p(x) \log q(x).$$
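As a minimal sanity check of this definition, here is a numpy sketch; the three-state distributions `p` and `q` below are made-up examples.

```python
import numpy as np

def cross_entropy(p, q):
    """H[p, q] = -sum_x p(x) log q(x), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

p = np.array([0.5, 0.3, 0.2])  # example "real-world" distribution
q = np.array([0.4, 0.4, 0.2])  # example model distribution

print(cross_entropy(p, p))  # H[p, p] = H[p], about 1.03 nats
print(cross_entropy(p, q))  # H[p, q] >= H[p], about 1.05 nats
```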
Interpretation
It’s called “cross-entropy” because it mixes and matches elements from the entropies of $p$ and $q$: as in $H[p] = \mathbb{E}_{x \sim p}[-\log p(x)]$, the average is taken over draws from $p$, but the surprisal being averaged, $-\log q(x)$, is the one determined by $q$.
Quantity | Relation with $H[p, q]$
---|---
$H[p]$ | As we’ll see in the next section, $H[p, q] \ge H[p]$, with equality iff $q = p$.
$H[q]$ | There is no general relation between $H[p, q]$ and $H[q]$.
Just like the entropy of $p$ is the expected cost of encoding draws from $p$ with the code that is optimal for $p$, the cross-entropy $H[p, q]$ is the expected cost of encoding draws from $p$ with the code that is optimal for $q$.
#to-write express as I[X at E; X]?
Relation to information divergence
We can decompose cross-entropy into the entropy of $p$ plus the information divergence of $q$ from $p$:
$$H[p, q] = H[p] + D(p \| q) \ge H[p],$$
with equality iff $q = p$.
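This decomposition is easy to check numerically; the sketch below reuses the same made-up distributions as above.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # example distributions
q = np.array([0.4, 0.4, 0.2])

entropy = -np.sum(p * np.log(p))        # H[p]
divergence = np.sum(p * np.log(p / q))  # D(p || q)
cross_entropy = -np.sum(p * np.log(q))  # H[p, q]

# H[p, q] = H[p] + D(p || q)
assert np.isclose(cross_entropy, entropy + divergence)

# Since q != p here, the divergence is strictly positive and H[p, q] > H[p].
assert cross_entropy > entropy
```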
#to-write chain rule (and observe that it’s not “truly” conditioned in the LHS)
Minimization
When training a generative model, we want to get a model distribution $q_\theta$ that is as close as possible to the real-world distribution $p$, i.e. we want to minimize the information divergence $D(p \| q_\theta)$.
However, we typically don’t have access to the probability values $p(x)$, only to samples $x \sim p$. Minimizing the cross-entropy $H[p, q_\theta] = H[p] + D(p \| q_\theta)$ is equivalent, since the entropy term does not depend on $\theta$, and it
only requires knowledge of $q_\theta$ at the sampled points, because $H[p, q_\theta] = \mathbb{E}_{x \sim p}[-\log q_\theta(x)]$.
When working on a finite dataset $x_1, \dots, x_N$ of samples from $p$, we replace the expectation with an empirical average and obtain the familiar cross-entropy loss
$$\hat{H}[p, q_\theta] = -\frac{1}{N} \sum_{i=1}^{N} \log q_\theta(x_i),$$
i.e. the average negative log-likelihood of the data under the model.
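As a sketch (with the same invented `p` and `q` as above), the empirical loss is just the average negative log-likelihood of the sampled points under the model, and it converges to the exact cross-entropy as the dataset grows.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])  # real-world distribution (unknown in practice)
q = np.array([0.4, 0.4, 0.2])  # model distribution

# In practice we only get samples from p, never p itself.
data = rng.choice(len(p), size=10_000, p=p)

# Empirical cross-entropy loss = average negative log-likelihood under q.
empirical_loss = -np.mean(np.log(q[data]))

exact_cross_entropy = -np.sum(p * np.log(q))
print(empirical_loss, exact_cross_entropy)  # close for a large enough sample
```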
The use of cross-entropy as a loss has a few interesting consequences:
- Since $H[p, q_\theta] = H[p] + D(p \| q_\theta) \ge H[p]$, cross-entropy loss can never drop to zero (unless $p$ is deterministic). But as the loss approaches $H[p]$, the divergence tends to $0$, which forces $q_\theta$ to mimic $p$ with ever more accuracy.
- On the other hand, we usually don’t know the inherent uncertainty $H[p]$ of the real-world distribution, so we can never know for sure how close we are to matching it, and how small the information divergence is.
- Because cross-entropy is the cost of encoding draws from $p$ using the optimal scheme for $q_\theta$, any model with low cross-entropy loss gives a cheap encoding scheme for the real-world distribution $p$, and vice versa: generative models are compression algorithms.
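For instance, a per-sample loss of $L$ nats corresponds to $L / \ln 2$ bits per sample, which (up to small overhead, e.g. with an arithmetic coder driven by $q_\theta$) is the rate at which the model can compress draws from $p$. A rough sketch with an invented loss value:

```python
import numpy as np

loss_nats = 1.05  # hypothetical per-sample cross-entropy loss, in nats
bits_per_sample = loss_nats / np.log(2)
print(bits_per_sample)  # ~1.51 bits: expected code length per draw from p
```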
Gradient over logits
Suppose that the learned distribution is parameterized by a vector of logits $z$, i.e. $q_\theta = \operatorname{softmax}(z)$ with $q_\theta(x) = e^{z_x} / \sum_{x'} e^{z_{x'}}$.
Then the direction of greatest decrease of the cross-entropy loss is the negative gradient
$$-\nabla_z H[p, q_\theta] = \mathbb{E}_{x \sim p}\!\left[\nabla_z \log q_\theta(x)\right],$$
which, using the expression for the logarithmic derivatives of softmax, $\partial_{z_j} \log q_\theta(x) = \mathbb{1}[j = x] - q_\theta(j)$, becomes
$$-\nabla_z H[p, q_\theta] = p - q_\theta.$$
When learning a real-world distribution $p$ from samples, a stochastic gradient step on the single-sample loss $-\log q_\theta(x)$ therefore amounts to the following (checked numerically in the sketch after this list):
- draw some $x \sim p$;
- push the logit $z_x$ up in $z$ by (a step size times) $1 - q_\theta(x)$;
- push every other logit $z_j$ down by (the same step size times) $q_\theta(j)$.
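The sketch below, with invented `p` and `z`, compares the analytic direction $p - q_\theta$ against a finite-difference gradient of the loss, and spells out the single-sample update.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(p, z):
    """Cross-entropy H[p, softmax(z)]."""
    return -np.sum(p * np.log(softmax(z)))

p = np.array([0.5, 0.3, 0.2])   # example target distribution
z = np.array([0.1, -0.4, 1.3])  # example current logits
q = softmax(z)

# Analytic direction of greatest decrease: -grad_z H = p - q.
analytic = p - q

# Finite-difference check of the gradient.
eps = 1e-6
numeric_grad = np.array([
    (loss(p, z + eps * np.eye(3)[j]) - loss(p, z - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
assert np.allclose(-numeric_grad, analytic, atol=1e-5)

# Single-sample version: for an observed x, the step direction onehot(x) - q
# pushes z[x] up by 1 - q[x] and every other z[j] down by q[j].
x = 0
step_direction = np.eye(3)[x] - q
```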
Backpropagation
If the logits themselves depend on some underlying parameters $\theta$, i.e. $z = z(\theta)$, the chain rule gives
$$-\nabla_\theta H[p, q_\theta] = \sum_j \big(p(j) - q_\theta(j)\big)\, \nabla_\theta z_j = \mathbb{E}_{x \sim p}\!\left[\nabla_\theta z_x\right] - \mathbb{E}_{x \sim q_\theta}\!\left[\nabla_\theta z_x\right].$$
In particular, if the logits come from some energy function $E_\theta$, i.e. $z_x = -E_\theta(x)$, then
$$-\nabla_\theta H[p, q_\theta] = \mathbb{E}_{x \sim q_\theta}\!\left[\nabla_\theta E_\theta(x)\right] - \mathbb{E}_{x \sim p}\!\left[\nabla_\theta E_\theta(x)\right].$$
Pushing $\theta$ in this direction means trying to decrease the energy over the target distribution $p$ while increasing it where the model itself puts probability mass.
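To make this concrete, here is a toy sketch with a tabular energy $E_\theta(x) = \theta_x$ over three states (all values invented), small enough that both expectations in the formula can be computed exactly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p = np.array([0.5, 0.3, 0.2])       # example target distribution
theta = np.array([0.2, 0.0, -0.5])  # tabular energies: E_theta(x) = theta[x], so z = -theta

q = softmax(-theta)                 # model distribution q_theta

# grad_theta E_theta(x) is the one-hot vector at x, so the two expectations in
# the formula reduce to q and p, and the direction of greatest decrease is q - p.
direction = q - p

# Finite-difference check against H[p, q_theta].
def loss(theta):
    return -np.sum(p * np.log(softmax(-theta)))

eps = 1e-6
numeric_grad = np.array([
    (loss(theta + eps * np.eye(3)[j]) - loss(theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
assert np.allclose(-numeric_grad, direction, atol=1e-5)
```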
#to-write the loss isn’t computable in that case but you can get a proxy loss back