Written by Alexei Gilchrist
The joint entropy and its properties
Level: 2, Subjects: Information Theory

1 Definition

Say \(x\in X\) and \(y\in Y\); then the average information contained in the joint probability distribution \(P(x,y)\) is \begin{equation} \langle \mathcal{I}(P(x,y))\rangle = - \sum_{x,y} P(x,y) \log P(x,y) \equiv H(X,Y). \end{equation} This quantity is also known as the joint entropy, and it is a measure of the combined uncertainty in \(x\) and \(y\). I've deliberately not used the symbol \(I\) to denote information here, as the quantity \(I(X;Y)\), to be introduced later, will mean something else entirely.
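As a concrete sketch of the definition, the sum above can be computed directly from a table of joint probabilities. The helper name below is my own choice, and I use base-2 logarithms so the entropy comes out in bits; terms with \(P(x,y)=0\) are skipped, following the usual convention \(0\log 0 = 0\).

```python
import math

def joint_entropy(p_xy):
    """H(X,Y) = -sum_{x,y} P(x,y) log2 P(x,y), in bits.

    p_xy is a 2D table with p_xy[x][y] = P(x,y).
    Zero-probability entries are skipped (0 log 0 = 0 by convention).
    """
    return -sum(p * math.log2(p) for row in p_xy for p in row if p > 0)

# Example: the uniform joint distribution over two binary variables.
p_xy = [[0.25, 0.25],
        [0.25, 0.25]]
print(joint_entropy(p_xy))  # uniform over 4 outcomes -> 2.0 bits
```

With four equally likely joint outcomes the joint entropy is \(\log_2 4 = 2\) bits, as expected.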

2 Properties

  1. \(H(X,Y)\ge 0\)
  2. The joint entropy is symmetric \(H(X,Y) = H(Y, X)\).
  3. If \(X\) and \(Y\) are independent then \(H(X,Y)=H(X)+H(Y)\). That is, \(P(x,y)=P(x)P(y)\), so \begin{align} H(X,Y) &= -\sum_{x,y}P(x)P(y) \log( P(x)P(y)) \\ &= -\sum_{x,y}P(x)P(y) \log P(x)-\sum_{x,y}P(x)P(y) \log P(y) \\ &= -\sum_{x}P(x)\log P(x)-\sum_{y}P(y) \log P(y) \\ &= H(X)+H(Y) \end{align} where the third line uses \(\sum_y P(y)=1\) and \(\sum_x P(x)=1\).
  4. In general, the uncertainty in a joint distribution is at most the sum of the individual uncertainties: \begin{equation} H(X,Y) \le H(X) + H(Y) \end{equation} This relationship is called subadditivity and we'll prove it later. The relationship is intuitive: the more dependent \(x\) and \(y\) become, the less uncertainty there is in the joint distribution.
  5. These properties extend in the obvious way to more than two variables.
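The independence and subadditivity properties above can be checked numerically. The sketch below is illustrative only (the helper names and the example distribution are my own): for a correlated joint distribution the joint entropy falls strictly below \(H(X)+H(Y)\), while for the product of the marginals equality holds.

```python
import math

def entropy(p):
    """H(X) = -sum_x P(x) log2 P(x), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def joint_entropy(p_xy):
    """H(X,Y) = -sum_{x,y} P(x,y) log2 P(x,y), in bits."""
    return -sum(p * math.log2(p) for row in p_xy for p in row if p > 0)

# A correlated joint distribution P(x,y) over two binary variables.
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

# Marginals P(x) and P(y), obtained by summing rows and columns.
p_x = [sum(row) for row in p_xy]        # [0.5, 0.5]
p_y = [sum(col) for col in zip(*p_xy)]  # [0.5, 0.5]

# Subadditivity: H(X,Y) <= H(X) + H(Y), strict here since x and y are correlated.
print(joint_entropy(p_xy) < entropy(p_x) + entropy(p_y))  # True

# Independence: for P(x,y) = P(x)P(y), the inequality becomes an equality.
p_ind = [[px * py for py in p_y] for px in p_x]
print(abs(joint_entropy(p_ind) - (entropy(p_x) + entropy(p_y))) < 1e-12)  # True
```

The same check works for any finite joint distribution; only the construction of the marginal tables changes with the alphabet sizes.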