Hasty Briefs (beta)


Six (and a half) intuitions for KL divergence

a day ago
  • #information theory
  • #machine learning
  • #probability
  • KL divergence D(P ‖ Q) measures how much more surprised a model expects to be when observing data drawn from the true distribution P while it believes the false distribution Q.
  • It quantifies the expected evidence for P over Q in hypothesis testing when P is true; minimizing KL divergence over Q is equivalent to finding the maximum likelihood estimate of P.
  • It also counts the wasted bits when encoding data with a code optimized for Q while the data actually follows P, and the expected log-growth of winnings available to a gambler who exploits the gap between true odds P and believed odds Q.
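The surprise and wasted-bits readings above can be checked numerically: for discrete distributions, D(P ‖ Q) = Σᵢ pᵢ log₂(pᵢ / qᵢ), which also equals cross-entropy H(P, Q) minus entropy H(P). A minimal sketch in Python, using made-up example distributions (the specific values of `p` and `q` are illustrative assumptions, not from the post):

```python
import math

def kl_divergence(p, q):
    """D(P || Q) in bits: expected extra surprise from believing Q when P is true."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: true distribution P over four outcomes vs. a uniform belief Q.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

# Wasted-bits view: cross-entropy H(P, Q) minus entropy H(P).
entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

kl = kl_divergence(p, q)
print(f"KL(P || Q) = {kl:.3f} bits")
print(f"H(P, Q) - H(P) = {cross_entropy - entropy:.3f} bits")  # same quantity
```

Here a code optimized for the uniform belief Q spends 2 bits per symbol, while the true distribution P can be compressed to 1.75 bits per symbol, so the 0.25-bit gap is exactly the KL divergence.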