Six (and a half) intuitions for KL divergence
- #information theory
- #machine learning
- #probability
- KL divergence measures how much more surprised a model expects to be when observing data drawn from the true distribution P, if it instead believes the data comes from distribution Q.
- In hypothesis testing, it quantifies the expected evidence for P over Q per observation when P is true, and it is minimized over Q when Q is the maximum-likelihood fit to data from P.
- In coding, KL divergence is the number of bits wasted per symbol by a code optimized for Q when the data actually follows P; in gambling, it is the expected log-winnings available to a bettor who knows P and exploits odds set by a bookie who believes Q.
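A minimal sketch of the "expected surprise" and "wasted bits" intuitions above, using hypothetical example distributions `p` and `q` of my own choosing: KL(P‖Q) is exactly the cross-entropy of Q under P minus the entropy of P, i.e. the extra bits a believer in Q pays on average.

```python
import math

def entropy(p):
    """Shannon entropy in bits: expected surprise when you know the true distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected surprise (bits) of a believer in q observing data drawn from p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) in bits: the extra surprise / wasted bits from believing q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: true distribution p, mistaken belief q
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

kl = kl_divergence(p, q)
print(kl)  # 0.25 bits of extra surprise per observation

# The coding identity: KL(P||Q) = H(P, Q) - H(P),
# the bits wasted by a code built for q when data follows p.
assert abs(kl - (cross_entropy(p, q) - entropy(p))) < 1e-12
```

Note the asymmetry: `kl_divergence(q, p)` would generally give a different value, which is why KL is a divergence rather than a distance.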