Written by Alexei Gilchrist, updated months ago
The relative entropy, or Kullback-Leibler divergence is a measure of the difference of two distributions
Level: 2, Subjects: Information Theory

## 1 Definition

The relative entropy of two distributions $$p(x)$$ and $$q(x)$$ is defined as \begin{equation*} D(p||q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} \end{equation*} It's also known as the Kullback-Leibler divergence or distance.

Note that it's not a true metric or distance as it's not symmetric for starters.

## 2 `Derivation'

The following is a plausible story that would lead you to the relative entropy. Say the true probabilities in some problem are \begin{equation*} \{p(x_{1}), p(x_{2}), \ldots p(x_{n})\} \end{equation*} but we believe we have \begin{equation*} \{q(x_{1}), q(x_{2}), \ldots q(x_{n})\} \end{equation*} so that with proposition $$x$$ we mistakenly associate the information \begin{equation*} I_{q}(x) = -\log q(x). \end{equation*} The erroneous average information will be \begin{equation*} \langle I_{q}(x) \rangle_{x} = -\langle \log(q(x)) \rangle_{x} = -\sum p(x)\log(q(x)) \end{equation*} since the symbols are still occurring with probabilities $$p(x)$$.

The real average information is of course the entropy of $$p(x)$$: \begin{equation*} \langle I_{p}(x) \rangle_{x} = -\sum p(x)\log(p(x)) = H(x). \end{equation*} The discrepancy between these two is the relative entropy: \begin{align*} I_{q}(x)-I_{p}(x) &= \left\langle \log\frac{1}{q(x)} \right\rangle_{x} - \left\langle \log\frac{1}{p(x)} \right\rangle_{x} \\ &= -\sum p(x)\log(q(x)) +\sum p(x)\log(p(x))\\ &= \sum_{x} p(x) \log \frac{p(x)}{q(x)} \\ &= D(p||q) \end{align*}

## 3 Properties

1. Not symmetric: $$D(p||q)\ne D(q||p)$$. Naively you might have thought that mistaking $$p$$ for $$q$$ was the same as mistaking $$q$$ for $$p$$ but this is not the case. Think about the consequences of mistaking a flat distribution with a sharply peaked one and vice versa.
2. $$D(p||q)\ge 0$$ with equality implying $$p(x)=q(x)$$ for all $$x$$.

The proof for the last item follows from Jensen's inequality.

Proof: First start by writing the relative entropy as \begin{equation*} D(p||q) = -\sum_{x} p(x) \log \frac{q(x)}{p(x)} = \sum_{x} p(x) f(u) \end{equation*} where we have inverted the fraction in the logarithm an brought out an overall minus sign. The function $$f(x)=-\log(u_{x})$$ is a strictly convex function of $$u_{x}=q(x)/p(x)$$ and the last term is the average of $$f$$ so we can apply Jensen's inequality, \begin{align*} D(p||q) &\ge f\left(\sum_{x}p(x)u_{x}\right)\\ &= -\log\left(\sum_{x}p(x)\frac{q(x)}{p(x)}\right) \\ &= -\log(1) = 0. \end{align*} Lastly, when $$D(p||q) =0$$, since $$f$$ is strictly convex we have $$u_{x}=\langle u_{x}\rangle$$, so $$q(x)/p(x)=1$$ or $$q(x)=p(x)$$.