Written by Alexei Gilchrist, updated months ago
Level: 2, Subjects: Information Theory

## 1 Definition

The mutual information $$I(X;Y)$$ for $$x\in X$$ and $$y\in Y$$ measures the 'distance' between the full joint probability distribution $$p(x,y)$$ and the distribution $$p(x)p(y)$$ that assumes $$x$$ and $$y$$ are independent. That is, in terms of the relative entropy, \begin{equation*} D(p(x,y)||p(x)p(y)) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \equiv I(X;Y) \end{equation*}

$$I(X;Y)$$ is a measure of how much information $$X$$ contains about $$Y$$ (and vice versa).
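The definition can be evaluated directly for a small discrete joint distribution. The following is a minimal sketch, not a reference implementation: the function name `mutual_information`, the nested-list layout `joint[x][y]`, the choice of base-2 logarithms (so the result is in bits), and the two example tables are all assumptions made for illustration.

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits for a joint table joint[x][y] (assumed layout)."""
    px = [sum(row) for row in joint]           # marginal p(x)
    py = [sum(col) for col in zip(*joint)]     # marginal p(y)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                        # convention: 0 log 0 = 0
                total += pxy * log2(pxy / (px[i] * py[j]))
    return total

# Hypothetical examples: two perfectly correlated fair bits,
# and two independent fair bits.
correlated = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

print(mutual_information(correlated))    # 1.0 (one full bit shared)
print(mutual_information(independent))   # 0.0 (nothing shared)
```

In the correlated case, knowing one bit fixes the other, so the mutual information equals the full one bit of entropy. In the independent case the joint distribution already factorises and the sum vanishes.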

## 2 Properties

1. Symmetric: $$I(X;Y)=I(Y;X)$$
2. $$I(X;Y)\ge 0$$. This follows immediately because the relative entropy is non-negative.
3. $$I(X;Y)=0$$ if and only if $$x$$ and $$y$$ are independent.
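These properties can be checked numerically. The sketch below assumes base-2 logarithms and uses made-up probability tables; the helper names `mutual_information` and `transpose` are illustrative only. Swapping the roles of $$X$$ and $$Y$$ corresponds to transposing the joint table.

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits for a joint table joint[x][y] (0 log 0 taken as 0)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def transpose(joint):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

# Hypothetical correlated joint distribution.
joint = [[0.4, 0.1],
         [0.2, 0.3]]
i_xy = mutual_information(joint)
i_yx = mutual_information(transpose(joint))

# Hypothetical independent case: p(x,y) = p(x) p(y) by construction.
pa, pb = [0.3, 0.7], [0.6, 0.4]
product = [[a * b for b in pb] for a in pa]
i_product = mutual_information(product)

print(abs(i_xy - i_yx) < 1e-12)   # symmetry
print(i_xy >= 0)                  # non-negativity
print(abs(i_product) < 1e-12)     # zero (up to rounding) when independent
```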

## 3 Relationship to entropies

The chain rule for entropies is \begin{equation*} H(X,Y) = H(X|Y)+ H(Y) = H(Y|X) + H(X) \end{equation*} The mutual information can also be written in terms of these entropies: \begin{align*} I(X;Y) &= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \\ &= \sum_{x,y} p(x,y)\log\frac{p(x|y)p(y)}{p(x)p(y)} \\ &= \sum_{x,y} p(x,y)\log p(x|y)-\sum_{x,y} p(x,y)\log p(x)\\ &= -H(X|Y) + H(X) \end{align*} and because the mutual information is symmetric we have \begin{equation*} I(X;Y) = H(X)-H(X|Y) = H(Y)-H(Y|X), \end{equation*} and using the chain rule to introduce the joint entropy, \begin{equation*} I(X;Y) = H(X)+H(Y)-H(X,Y). \end{equation*}
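The identities above can be verified numerically by computing each entropy separately and comparing against the definition. This is a sketch under assumptions: base-2 logarithms, a hypothetical joint table, and illustrative variable names. The conditional entropy is computed from its own definition, $$H(X|Y)=-\sum_{x,y}p(x,y)\log p(x|y)$$, rather than from the chain rule, so the comparison is not circular.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities (0 log 0 = 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint table p(x, y): rows index x, columns index y.
joint = [[0.4, 0.1],
         [0.2, 0.3]]

px = [sum(row) for row in joint]           # marginal p(x)
py = [sum(col) for col in zip(*joint)]     # marginal p(y)

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy(p for row in joint for p in row)   # joint entropy H(X,Y)

# H(X|Y) from its definition, with p(x|y) = p(x,y) / p(y).
h_x_given_y = -sum(p * log2(p / py[j])
                   for row in joint
                   for j, p in enumerate(row) if p > 0)

# Mutual information straight from the definition.
i_def = sum(p * log2(p / (px[i] * py[j]))
            for i, row in enumerate(joint)
            for j, p in enumerate(row) if p > 0)

print(abs(i_def - (h_x - h_x_given_y)) < 1e-12)    # I = H(X) - H(X|Y)
print(abs(i_def - (h_x + h_y - h_xy)) < 1e-12)     # I = H(X) + H(Y) - H(X,Y)
print(abs(h_xy - (h_x_given_y + h_y)) < 1e-12)     # chain rule
```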

The relationship between the mutual information and the various entropies is summarised in the diagram below.

[Figure: A graphical depiction of the relationship between mutual information and the various entropies, and the way different terms add to give others.]

Note that \begin{equation*} I(X;X) = H(X)-H(X|X) = H(X) \end{equation*} since $$p(x|x)=1$$ and so $$H(X|X)=0$$.
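As a concrete check, consider a fair coin with $$p(0)=p(1)=\tfrac{1}{2}$$ and logarithms to base 2; both choices are assumptions for illustration and are not fixed by the text. The joint distribution of $$X$$ with itself is non-zero only on the diagonal, where it equals $$p(x)$$, so \begin{equation*} I(X;X) = \sum_{x} p(x)\log_2\frac{p(x)}{p(x)p(x)} = -\sum_{x} p(x)\log_2 p(x) = 2\cdot\tfrac{1}{2}\log_2 2 = 1\ \text{bit}, \end{equation*} which is exactly $$H(X)$$ for a fair coin.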