Written by Alexei Gilchrist
The joint entropy and its properties
Level: 2, Subjects: Information Theory

## 1 Definition

Say $$x\in X$$ and $$y\in Y$$. Then the average information contained in the joint probability distribution $$P(x,y)$$ is $$\langle \mathcal{I}(P(x,y))\rangle = - \sum_{x,y} P(x,y) \log P(x,y) \equiv H(X,Y).$$ This is also known as the joint entropy, and it is a measure of the combined uncertainty in $$x$$ and $$y$$. I've deliberately not used the symbol $$I$$ to denote information here, as the quantity $$I(X;Y)$$ to be introduced later will mean something else entirely.
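The definition above can be evaluated directly. Here is a minimal sketch in Python, using a small made-up joint distribution (the probabilities are assumptions for illustration) and base-2 logarithms so the entropy comes out in bits:

```python
import math

# A hypothetical joint distribution P(x, y) over X = {0, 1} and Y = {0, 1}.
# Entries must be non-negative and sum to 1.
P = {
    (0, 0): 0.5,
    (0, 1): 0.25,
    (1, 0): 0.125,
    (1, 1): 0.125,
}

def joint_entropy(p):
    """H(X,Y) = -sum_{x,y} P(x,y) log2 P(x,y), skipping zero entries
    (by convention 0 log 0 = 0)."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

print(joint_entropy(P))  # 1.75 bits
```

Skipping zero-probability entries implements the usual convention $$0\log 0 = 0$$, which follows from the limit $$q\log q \to 0$$ as $$q\to 0$$.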

## 2 Properties

1. $$H(X,Y)\ge 0$$
2. The joint entropy is symmetric $$H(X,Y) = H(Y, X)$$.
3. If $$X$$ and $$Y$$ are independent then $$H(X,Y)=H(X)+H(Y)$$. That is, $$P(x,y)=P(x)P(y)$$, so \begin{align} H(X,Y) &= -\sum_{x,y}P(x)P(y) \log( P(x)P(y)) \\ &= -\sum_{x,y}P(x)P(y) \log P(x)-\sum_{x,y}P(x)P(y) \log P(y) \\ &= -\sum_{x}P(x)\log P(x)-\sum_{y}P(y) \log P(y) \\ &= H(X)+H(Y) \end{align} where the third line follows from the normalisations $$\sum_y P(y)=1$$ and $$\sum_x P(x)=1$$.
4. In general, the uncertainty in a joint distribution is at most the sum of the individual uncertainties: $$H(X,Y) \le H(X) + H(Y).$$ This relationship is called subadditivity and we'll prove it later. It is intuitive: the more dependent $$x$$ and $$y$$ become, the less uncertainty there is in the joint distribution.
5. These properties extend in the obvious way to more than two variables.