Written by Alexei Gilchrist

The joint entropy and its properties

## 1 Definition

Say \(x\in X\) and \(y\in Y\). Then the average information contained in the joint probability distribution \(P(x,y)\) is
\begin{equation}
\langle \mathcal{I}(P(x,y))\rangle = - \sum_{x,y} P(x,y) \log P(x,y) \equiv H(X,Y)
\end{equation}
This is also known as the *joint entropy*, and it is a measure of the combined
uncertainty in \(x\) and \(y\). I've deliberately not used the symbol \(I\) to denote
information here, as the quantity \(I(X;Y)\), to be introduced later, will mean
something else entirely.
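As a quick numerical sketch of the definition (using NumPy and base-2 logarithms, with the usual convention that \(0\log 0 = 0\)), the joint entropy of a distribution stored as a 2D array of probabilities can be computed directly from the sum above:

```python
import numpy as np

def joint_entropy(p_xy):
    """Joint entropy H(X,Y) in bits of a joint distribution given as a 2D array."""
    p = np.asarray(p_xy, dtype=float).ravel()
    p = p[p > 0]  # drop zero-probability outcomes: 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

# Two independent fair coins: four equally likely outcomes, so H(X,Y) = 2 bits
p = np.full((2, 2), 0.25)
print(joint_entropy(p))  # 2.0
```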

## 2 Properties

- \(H(X,Y)\ge 0\)
- The joint entropy is symmetric \(H(X,Y) = H(Y, X)\).
- If \(X\) and \(Y\) are independent then \(H(X,Y)=H(X)+H(Y)\). That is, \(P(xy)=P(x)P(y)\) so
\begin{align}
H(X,Y) &= -\sum_{x,y}P(x)P(y) \log( P(x)P(y)) \\
&= -\sum_{x,y}P(x)P(y) \log P(x)-\sum_{x,y}P(x)P(y) \log P(y) \\
&= -\sum_{x}P(x)\log P(x)\sum_{y}P(y)-\sum_{y}P(y)\log P(y)\sum_{x}P(x) \\
&= -\sum_{x}P(x)\log P(x)-\sum_{y}P(y) \log P(y) \\
&= H(X)+H(Y)
\end{align}
- In general, the uncertainty in a joint distribution is at most the sum of the individual uncertainties:
\begin{equation}
H(X,Y) \le H(X) + H(Y)
\end{equation}
This relationship is called
*subadditivity* and we'll prove it later. The
relationship is intuitive: the more dependent \(x\) and \(y\) become, the less
uncertainty there is in the joint distribution.
- These properties extend in the obvious way to more than two variables.
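The additivity and subadditivity properties above can be checked numerically. A small sketch (NumPy assumed, entropies in bits): an independent joint distribution built as an outer product of marginals satisfies \(H(X,Y)=H(X)+H(Y)\), while a perfectly correlated one, where \(y\) always equals \(x\), falls strictly below that sum:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of any flat or 2D array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

px = np.array([0.5, 0.5])
py = np.array([0.9, 0.1])

# Independent case: P(x,y) = P(x)P(y), so H(X,Y) = H(X) + H(Y)
p_ind = np.outer(px, py)
print(entropy(p_ind), entropy(px) + entropy(py))  # equal

# Perfectly correlated case: y = x, both marginals are [0.5, 0.5],
# but H(X,Y) = 1 bit, strictly less than H(X) + H(Y) = 2 bits
p_corr = np.diag(px)
print(entropy(p_corr))  # 1.0
```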