Written by Alexei Gilchrist
Level: 2, Subjects: Information Theory

1 Definition

Given the definitions for entropy and joint entropy, the definition of the conditional entropy is not what you might expect. With $$x\in X$$ and $$y\in Y$$, \begin{equation*} H(X|Y) \equiv -\sum_{x,y} P(xy)\log P(x|y) \end{equation*} Note that the conditional probability appears only in the logarithm; the weighting is by the joint probability $$P(xy)$$. Because of this, the role of the variable in the “given” position is fundamentally different from its role in conditional probabilities.
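As a concrete sketch, the definition can be evaluated directly from a joint probability table. The distribution below is made up for illustration, and the function names are hypothetical:

```python
from math import log2

# A made-up joint distribution P(x, y) over two binary variables,
# stored as {(x, y): probability}. The values are illustrative only.
P = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

def marginal_y(P):
    """Marginal distribution P(y), obtained by summing P(x, y) over x."""
    Py = {}
    for (x, y), p in P.items():
        Py[y] = Py.get(y, 0.0) + p
    return Py

def conditional_entropy(P):
    """H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y), skipping zero-probability terms."""
    Py = marginal_y(P)
    # P(x|y) = P(x,y) / P(y), but the weight in front stays the joint P(x,y).
    return -sum(p * log2(p / Py[y]) for (x, y), p in P.items() if p > 0)

print(conditional_entropy(P))  # 0.5 bits for this table
```

Note how the code mirrors the formula: the conditional probability appears only inside the logarithm, while the sum is weighted by the joint probability.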

There's a different approach to defining the conditional entropy that perhaps gives more intuition. Say we know the value $$y$$, and ask: what is the entropy of the distribution of $$x$$ given this information? \begin{equation*} H(X|y) = -\sum_{x} P(x|y) \log P(x|y) \end{equation*} This entropy is clearly a function of $$y$$. Now average $$H(X|y)$$ over $$y$$: \begin{equation*} \sum_{y}P(y)H(X|y) = -\sum_{x,y}P(y)P(x|y) \log P(x|y)=-\sum_{x,y}P(xy) \log P(x|y) = H(X|Y) \end{equation*} So the conditional entropy is perhaps better denoted as \begin{equation*} H(X|Y) = \langle H(X|y) \rangle_{y\in Y} \end{equation*}
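The averaged form is easy to check numerically. Here is a minimal sketch using a made-up joint distribution; the names are hypothetical:

```python
from math import log2

# Illustrative joint distribution P(x, y) over two binary variables.
P = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}
Py = {0: 0.5, 1: 0.5}  # marginal P(y), summed by hand from the table above

def H_X_given_y(y):
    """Entropy of the conditional distribution P(x|y) for one fixed value of y."""
    probs = [p / Py[y] for (x, yy), p in P.items() if yy == y and p > 0]
    return -sum(q * log2(q) for q in probs)

# Averaging H(X|y) over y with weights P(y) gives the conditional entropy H(X|Y).
H_XY = sum(Py[y] * H_X_given_y(y) for y in Py)
print(H_X_given_y(0), H_X_given_y(1), H_XY)  # 0.0, 1.0, 0.5
```

For this table, knowing $$y=0$$ determines $$x$$ completely ($$H(X|y{=}0)=0$$), while $$y=1$$ leaves $$x$$ uniform ($$H(X|y{=}1)=1$$ bit); the weighted average is the 0.5 bits found from the direct definition.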

2 Chain Rule

The joint, conditional, and individual entropies are related in a simple way, reminiscent of taking the logarithm of the product rule $$P(xy)=P(y|x)P(x)$$: \begin{equation*} H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X) \end{equation*}

Proof: \begin{align*} H(X,Y) &= -\sum_{x,y} P(xy)\log\left[P(y|x)P(x)\right]\\ &= -\sum_{x,y} P(xy)\log P(y|x)-\sum_{x,y} P(xy)\log P(x)\\ &= H(Y|X)-\sum_{x} P(x)\log P(x)\\ &= H(Y|X)+H(X) \end{align*} In the third line, summing $$P(xy)$$ over $$y$$ marginalizes it to $$P(x)$$. For the other identity, simply swap $$X$$ and $$Y$$.
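The chain rule can be verified numerically by computing each side independently. A sketch with a made-up joint distribution (all names hypothetical):

```python
from math import log2

# Illustrative joint distribution P(x, y) over two binary variables.
P = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(P, axis):
    """Marginal distribution over one coordinate of the joint keys."""
    m = {}
    for key, p in P.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

def cond_entropy(P, given_axis):
    """Conditional entropy with the variable at given_axis in the 'given' slot."""
    m = marginal(P, given_axis)
    return -sum(p * log2(p / m[key[given_axis]]) for key, p in P.items() if p > 0)

H_joint = entropy(P)                                   # H(X,Y)
check1 = cond_entropy(P, 1) + entropy(marginal(P, 1))  # H(X|Y) + H(Y)
check2 = cond_entropy(P, 0) + entropy(marginal(P, 0))  # H(Y|X) + H(X)
print(H_joint, check1, check2)  # all three agree
```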

The chain rule also extends to multiple variables \begin{equation*} H(X_{1}X_{2}X_{3}\ldots X_{n}) = H(X_{1}|X_{2}\ldots X_{n})+H(X_{2}|X_{3}\ldots X_{n})+\ldots+H(X_{n}) \end{equation*}
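The multi-variable version can also be checked directly, computing each conditional entropy from its definition rather than from the identity being tested. A sketch for three binary variables with a made-up joint distribution:

```python
from math import log2

# A made-up joint distribution over three binary variables (illustrative only).
P = {(0, 0, 0): 0.2, (0, 0, 1): 0.1, (0, 1, 0): 0.15, (0, 1, 1): 0.05,
     (1, 0, 0): 0.1, (1, 0, 1): 0.1, (1, 1, 0): 0.05, (1, 1, 1): 0.25}

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginal(P, axes):
    """Marginal distribution keeping only the listed key positions."""
    m = {}
    for key, p in P.items():
        k = tuple(key[a] for a in axes)
        m[k] = m.get(k, 0.0) + p
    return m

def cond_entropy(P_joint, P_given, given_axes):
    """H(rest|given) = -sum P(full) log2 [P(full) / P(given part)]."""
    return -sum(p * log2(p / P_given[tuple(k[a] for a in given_axes)])
                for k, p in P_joint.items() if p > 0)

P23, P3 = marginal(P, (1, 2)), marginal(P, (2,))       # P(x2, x3) and P(x3)
H123 = entropy(P.values())
# H(X1|X2 X3) + H(X2|X3) + H(X3) should reproduce the joint entropy.
chain = cond_entropy(P, P23, (1, 2)) + cond_entropy(P23, P3, (1,)) + entropy(P3.values())
print(H123, chain)  # the two agree
```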

3 Properties

1. In general $$H(X|Y)\ne H(Y|X)$$: conditioning is not symmetric.
2. If $$H(X|Y)=0$$ then $$x$$ is completely determined once we know $$y$$.
3. $$H(X|Y)\le H(X)$$, with $$H(X|Y)=H(X)$$ if and only if $$x$$ and $$y$$ are independent: conditioning can never increase entropy. Combining this with the chain rule gives \begin{equation*} H(X_{1}X_{2}X_{3}\ldots X_{n})\le \sum_{j}H(X_{j}) \end{equation*}
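Property 3 can be illustrated with two small sketches: a correlated joint distribution, where conditioning lowers the entropy, and an independent one, where it does not. The numbers are made up and the names hypothetical:

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal_x(P):
    """P(x) from a joint {(x, y): probability}."""
    Px = {}
    for (x, y), p in P.items():
        Px[x] = Px.get(x, 0.0) + p
    return Px

def H_X_given_Y(P):
    """H(X|Y) = -sum P(x,y) log2 P(x|y)."""
    Py = {}
    for (x, y), p in P.items():
        Py[y] = Py.get(y, 0.0) + p
    return -sum(p * log2(p / Py[y]) for (x, y), p in P.items() if p > 0)

# Correlated case (made-up numbers): conditioning strictly lowers the entropy.
P_corr = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}
print(H_X_given_Y(P_corr), entropy(marginal_x(P_corr)))  # 0.5 vs ~0.811

# Independent case: P(x,y) = P(x)P(y), so H(X|Y) equals H(X) exactly.
P_ind = {(x, y): px * py for x, px in [(0, 0.75), (1, 0.25)]
         for y, py in [(0, 0.5), (1, 0.5)]}
print(H_X_given_Y(P_ind), entropy(marginal_x(P_ind)))
```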