Written by Alexei Gilchrist

## 1 Definition

The mutual information \(I(X;Y)\) between random variables \(X\) and \(Y\) measures the "distance" between the full joint probability distribution \(p(x,y)\) and the distribution \(p(x)p(y)\) that would hold if \(X\) and \(Y\) were independent.
That is, in terms of the relative entropy,
\begin{equation*}
D(p(xy)||p(x)p(y)) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \equiv I(X;Y)
\end{equation*}

\(I(X;Y)\) is a measure of how much information \(X\) contains about \(Y\) (and vice versa).
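As a quick numerical check of this definition, here is a minimal Python sketch; the 2×2 joint distribution is an illustrative assumption, not something from the text:

```python
import math

# Illustrative (assumed) joint distribution p(x, y): rows index x, columns index y.
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

# Marginal distributions p(x) and p(y)
p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]

def mutual_information(p_xy, p_x, p_y):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits."""
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(p_xy)
               for j, p in enumerate(row) if p > 0)

print(mutual_information(p_xy, p_x, p_y))  # ~0.278 bits
```

Since this joint distribution concentrates weight on the diagonal, knowing \(x\) tells us a fair amount about \(y\), and the mutual information is well above zero.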

## 2 Properties

- Symmetric: \(I(X;Y)=I(Y;X)\).
- Non-negative: \(I(X;Y)\ge 0\). This follows immediately from the non-negativity of the relative entropy.
- \(I(X;Y)=0\) if and only if \(X\) and \(Y\) are independent.
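These properties can be checked numerically; the distributions below are illustrative assumptions:

```python
import math
import random

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint distribution given as a nested list."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(p_xy)
               for j, p in enumerate(row) if p > 0)

# 1. A random joint distribution: I(X;Y) >= 0.
random.seed(0)
w = [[random.random() for _ in range(3)] for _ in range(3)]
z = sum(map(sum, w))
p_xy = [[v / z for v in row] for row in w]
assert mutual_information(p_xy) >= 0

# 2. Symmetry: transposing the joint swaps X and Y but leaves I unchanged.
p_yx = [list(col) for col in zip(*p_xy)]
assert abs(mutual_information(p_xy) - mutual_information(p_yx)) < 1e-12

# 3. A product distribution p(x)p(y): I(X;Y) = 0.
p_x, p_y = [0.2, 0.8], [0.5, 0.3, 0.2]
prod = [[a * b for b in p_y] for a in p_x]
print(mutual_information(prod))  # ~0 (up to floating-point error)
```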

## 3 Relationship to entropies

The chain rule for entropies is
\begin{equation*}
H(X,Y) = H(X|Y)+ H(Y) = H(Y|X) + H(X)
\end{equation*}
and we can also write the mutual information in terms of these entropies:
\begin{align*}
I(X;Y) &= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \\
&= \sum_{x,y} p(x,y)\log\frac{p(x|y)p(y)}{p(x)p(y)} \\
&= \sum_{x,y} p(x,y)\log p(x|y)-\sum_{x,y} p(x,y)\log p(x)\\
&= -H(X|Y) + H(X)
\end{align*}
and because the mutual information is symmetric we have
\begin{equation*}
I(X;Y) = H(X)-H(X|Y) = H(Y)-H(Y|X),
\end{equation*}
and using the joint entropy,
\begin{equation*}
I(X;Y) = H(X)+H(Y)-H(X,Y).
\end{equation*}
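The identities above can be verified numerically; the joint distribution is again an illustrative assumption:

```python
import math

# Illustrative (assumed) joint distribution p(x, y)
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

def H(dist):
    """Shannon entropy in bits of a flat list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]
flat = [p for row in p_xy for p in row]  # joint as a flat list, for H(X,Y)

# Direct evaluation of I(X;Y) from the definition
I_direct = sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(p_xy)
               for j, p in enumerate(row) if p > 0)

# H(X|Y) = H(X,Y) - H(Y), from the chain rule
H_X_given_Y = H(flat) - H(p_y)

assert abs(I_direct - (H(p_x) - H_X_given_Y)) < 1e-12        # I = H(X) - H(X|Y)
assert abs(I_direct - (H(p_x) + H(p_y) - H(flat))) < 1e-12   # I = H(X) + H(Y) - H(X,Y)
```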

The relationship between the mutual information and the various entropies is summarised in the diagram below.

*A graphical depiction of the relationship between the mutual information and the various entropies, and the way different terms add to give others.*

Note that
\begin{equation*}
I(X;X) = H(X)-H(X|X) = H(X),
\end{equation*}
since \(p(x|x)=1\) and so \(H(X|X)=0\).
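A quick numerical confirmation of \(I(X;X)=H(X)\), using an assumed marginal distribution:

```python
import math

p_x = [0.5, 0.25, 0.25]  # illustrative (assumed) marginal distribution

# The joint distribution of X with itself is diagonal: p(x, x') = p(x) if x = x', else 0.
n = len(p_x)
p_xx = [[p_x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def mutual_information(p_xy):
    """I in bits from a joint distribution given as a nested list."""
    p_a = [sum(row) for row in p_xy]
    p_b = [sum(col) for col in zip(*p_xy)]
    return sum(p * math.log2(p / (p_a[i] * p_b[j]))
               for i, row in enumerate(p_xy)
               for j, p in enumerate(row) if p > 0)

H_X = -sum(p * math.log2(p) for p in p_x if p > 0)
assert abs(mutual_information(p_xx) - H_X) < 1e-12
print(mutual_information(p_xx))  # 1.5 bits, equal to H(X)
```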