Joint Entropy
Alexei Gilchrist
The joint entropy and its properties
❦
1 Definition
Say \(x\in X\) and \(y\in Y\); then the average information contained in the joint probability distribution \(P(x,y)\) is
\[\begin{equation*}\langle \mathcal{I}(P(x,y))\rangle = - \sum_{x,y} P(x,y) \log P(x,y) \equiv H(X,Y)\end{equation*}\]
This is also known as the joint entropy, and it’s a measure of the combined uncertainty in \(x\) and \(y\). I’ve deliberately not used the symbol \(I\) to denote information here, as the quantity \(I(X;Y)\), to be introduced later, will mean something else entirely.
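As a concrete illustration of the definition, here is a minimal numerical sketch, assuming the joint distribution is supplied as a 2D array of probabilities; the function name `joint_entropy` and the example distribution are illustrative choices, not part of the note.

```python
import numpy as np

def joint_entropy(P, base=2):
    """H(X,Y) = -sum_{x,y} P(x,y) log P(x,y) for a discrete joint distribution.

    P is an array of joint probabilities summing to 1; zero entries are
    skipped, following the convention 0 log 0 = 0.
    """
    P = np.asarray(P, dtype=float)
    nz = P[P > 0]                      # only non-zero probabilities contribute
    return -np.sum(nz * np.log(nz)) / np.log(base)

# Example: a uniform 2x2 joint distribution over (x, y)
P_xy = np.array([[0.25, 0.25],
                 [0.25, 0.25]])
print(joint_entropy(P_xy))   # 2.0 bits -- two independent fair coin flips
```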
2 Properties
- \(H(X,Y)\ge 0\)
- The joint entropy is symmetric \(H(X,Y) = H(Y, X)\).
- If \(X\) and \(Y\) are independent then \(H(X,Y)=H(X)+H(Y)\). That is, \(P(x,y)=P(x)P(y)\), so \[\begin{align*}H(X,Y) &= -\sum_{x,y}P(x)P(y) \log( P(x)P(y)) \\ &= -\sum_{x,y}P(x)P(y) \log P(x)-\sum_{x,y}P(x)P(y) \log P(y) \\ &= -\sum_{x}P(x)\log P(x)-\sum_{y}P(y) \log P(y) \\ &= H(X)+H(Y)\end{align*}\]where the second-to-last step uses the normalisation \(\sum_x P(x)=\sum_y P(y)=1\).
- In general, the uncertainty in a joint distribution is at most the sum of the individual uncertainties: \[\begin{equation*}H(X,Y) \le H(X) + H(Y)\end{equation*}\]This relationship is called subadditivity and we’ll prove it later. The relationship is intuitive: the more dependent \(x\) and \(y\) become, the less uncertainty there is in the joint distribution. A numerical check of these properties is sketched after this list.
- These properties extend in the obvious way to more than two variables.
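The following sketch checks the properties above numerically. It redefines the illustrative `joint_entropy` helper so the snippet is self-contained, adds an equally illustrative `entropy` helper for the marginals, and uses randomly generated test distributions; none of these names or choices come from the note itself.

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy of a 1D discrete distribution, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz)) / np.log(base)

def joint_entropy(P, base=2):
    """Joint entropy H(X,Y): treat the joint table as one flat distribution."""
    return entropy(np.asarray(P).ravel(), base)

rng = np.random.default_rng(0)

# Independence: P(x,y) = P(x)P(y) gives H(X,Y) = H(X) + H(Y)
px = rng.dirichlet(np.ones(3))
py = rng.dirichlet(np.ones(4))
P_indep = np.outer(px, py)
print(np.isclose(joint_entropy(P_indep), entropy(px) + entropy(py)))  # True

# Subadditivity: H(X,Y) <= H(X) + H(Y) for a general joint distribution,
# where H(X), H(Y) are computed from the marginals of P
P = rng.dirichlet(np.ones(12)).reshape(3, 4)
Hx = entropy(P.sum(axis=1))   # marginal P(x)
Hy = entropy(P.sum(axis=0))   # marginal P(y)
print(joint_entropy(P) <= Hx + Hy + 1e-12)  # True

# Symmetry: H(X,Y) = H(Y,X)
print(np.isclose(joint_entropy(P), joint_entropy(P.T)))  # True
```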