# Relative Entropy

## Alexei Gilchrist

The relative entropy, or Kullback-Leibler divergence is a measure of the difference of two distributions

❦

### 1 Definition

The relative entropy of two distributions \(p(x)\) and \(q(x)\) is defined as

\[\begin{equation*}D(p||q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}\end{equation*}\]

It’s also known as the Kullback-Leibler divergence or distance.Note that it’s not a true metric or distance as it’s not symmetric for starters.

### 2 `Derivation’

The following is a plausible story that would lead you to the relative entropy. Say the true probabilities in some problem are

\[\begin{equation*}\{p(x_{1}), p(x_{2}), \ldots p(x_{n})\}\end{equation*}\]

but we believe we have \[\begin{equation*}\{q(x_{1}), q(x_{2}), \ldots q(x_{n})\}\end{equation*}\]

so that with proposition \(x\) we mistakenly associate the information \[\begin{equation*}I_{q}(x) = -\log q(x).\end{equation*}\]

The erroneous average information will be \[\begin{equation*}\langle I_{q}(x) \rangle_{x} = -\langle \log(q(x)) \rangle_{x} = -\sum p(x)\log(q(x))\end{equation*}\]

since the symbols are still occurring with probabilities \(p(x)\).The real average information is of course the entropy of \(p(x)\):

\[\begin{equation*}\langle I_{p}(x) \rangle_{x} = -\sum p(x)\log(p(x)) = H(x).\end{equation*}\]

The discrepancy between these two is the relative entropy: \[\begin{align*}I_{q}(x)-I_{p}(x) &= \left\langle \log\frac{1}{q(x)} \right\rangle_{x} - \left\langle \log\frac{1}{p(x)} \right\rangle_{x} \\
&= -\sum p(x)\log(q(x)) +\sum p(x)\log(p(x))\\
&= \sum_{x} p(x) \log \frac{p(x)}{q(x)} \\
&= D(p||q)\end{align*}\]

### 3 Properties

- Not symmetric: \(D(p||q)\ne D(q||p)\). Naively you might have thought that mistaking \(p\) for \(q\) was the same as mistaking \(q\) for \(p\) but this is not the case. Think about the consequences of mistaking a flat distribution with a sharply peaked one and vice versa.
- \(D(p||q)\ge 0\) with equality implying \(p(x)=q(x)\) for all \(x\).

The proof for the last item follows from Jensen’s inequality.

Proof

First start by writing the relative entropy as

\[\begin{equation*}D(p||q) = -\sum_{x} p(x) \log \frac{q(x)}{p(x)} = \sum_{x} p(x) f(u)\end{equation*}\]

where we have inverted the fraction in the logarithm an brought out an overall minus sign. The function \(f(x)=-\log(u_{x})\) is a strictly convex function of \(u_{x}=q(x)/p(x)\) and the last term is the average of \(f\) so we can apply Jensen’s inequality, \[\begin{align*}D(p||q) &\ge f\left(\sum_{x}p(x)u_{x}\right)\\
&= -\log\left(\sum_{x}p(x)\frac{q(x)}{p(x)}\right) \\
&= -\log(1) = 0.\end{align*}\]

Lastly, when \(D(p||q) =0\), since \(f\) is *strictly*convex we have \(u_{x}=\langle u_{x}\rangle\), so \(q(x)/p(x)=1\) or \(q(x)=p(x)\).☐