Written by Alexei Gilchrist
Level: 2, Subjects: Information Theory

1 Definition

Given the definitions for entropy and joint entropy, the definition of the conditional entropy is not what you might expect. With \(x\in X\) and \(y\in Y\), \begin{equation*} H(X|Y) \equiv -\sum_{x,y} P(xy)\log P(x|y) \end{equation*} Note that the conditional probability appears only in the logarithm. Because of this, the variable in the “given” position plays a fundamentally different role from the one it plays in conditional probabilities.

There's a different approach to defining the conditional entropy that perhaps gives more intuition. Say we know the value \(y\), and ask for the entropy of the distribution of \(x\) given this information: \begin{equation*} H(X|y) = -\sum_{x} P(x|y) \log P(x|y) \end{equation*} This entropy is clearly a function of \(y\). Now average \(H(X|y)\) over \(y\): \begin{equation*} \sum_{y}P(y)H(X|y) = -\sum_{x,y}P(y)P(x|y) \log P(x|y)=-\sum_{x,y}P(xy) \log P(x|y) = H(X|Y) \end{equation*} So the conditional entropy is perhaps better denoted as \begin{equation*} H(X|Y) = \langle H(X|y) \rangle_{y\in Y} \end{equation*}
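The equivalence of the two forms is easy to check numerically. Here is a minimal sketch in Python (the joint distribution and function names are illustrative, not from the article) that computes \(H(X|Y)\) both directly from the definition and as the average \(\langle H(X|y)\rangle_y\):

```python
from math import log2

# Hypothetical joint distribution P(x, y) over X = {0, 1}, Y = {0, 1}
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal_y(P):
    """P(y) = sum_x P(x, y)."""
    Py = {}
    for (x, y), p in P.items():
        Py[y] = Py.get(y, 0.0) + p
    return Py

def H_cond_direct(P):
    """H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y): conditional only in the log."""
    Py = marginal_y(P)
    return -sum(p * log2(p / Py[y]) for (x, y), p in P.items() if p > 0)

def H_cond_averaged(P):
    """H(X|Y) = sum_y P(y) H(X|y): average the entropy of P(x|y) over y."""
    Py = marginal_y(P)
    total = 0.0
    for y, py in Py.items():
        # Conditional distribution P(x|y) for this value of y
        cond = [p / py for (x, yy), p in P.items() if yy == y and p > 0]
        total += py * -sum(q * log2(q) for q in cond)
    return total

print(H_cond_direct(P))    # ≈ 0.8755 bits
print(H_cond_averaged(P))  # same value
```

Both routes give the same number, as the averaging argument above guarantees.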

2 Chain Rule

The joint, conditional and individual entropies are related in a simple way reminiscent of the logarithm of the product rule \begin{equation*} H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X) \end{equation*}

Proof: \begin{align*} H(X,Y) &= -\sum_{x,y} P(xy)\log\left[P(y|x)P(x)\right]\\ &= -\sum_{x,y} P(xy)\log P(y|x)-\sum_{x,y} P(xy)\log P(x)\\ &= H(Y|X)-\sum_{x} P(x)\log P(x)\\ &= H(Y|X)+H(X) \end{align*} where the third line uses \(\sum_y P(xy)=P(x)\). For the other identity, simply swap \(X\) and \(Y\).
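The chain rule can also be verified numerically. The sketch below (with a hypothetical toy distribution) checks both forms of the identity against the joint entropy:

```python
from math import log2

# Hypothetical joint distribution P(x, y)
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(P, axis):
    """Marginal of the joint: axis=0 keeps x, giving P(x); axis=1 gives P(y)."""
    m = {}
    for xy, p in P.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

def H_cond(P, given_axis):
    """H(Y|X) for given_axis=0, H(X|Y) for given_axis=1."""
    m = marginal(P, given_axis)
    return -sum(p * log2(p / m[xy[given_axis]]) for xy, p in P.items() if p > 0)

Hxy = H(P)  # joint entropy H(X,Y)
assert abs(Hxy - (H_cond(P, 0) + H(marginal(P, 0)))) < 1e-12  # H(Y|X) + H(X)
assert abs(Hxy - (H_cond(P, 1) + H(marginal(P, 1)))) < 1e-12  # H(X|Y) + H(Y)
```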

The chain rule also extends to multiple variables, by repeated application of the two-variable rule: \begin{equation*} H(X_{1}X_{2}X_{3}\ldots X_{n}) = H(X_{1}|X_{2}\ldots X_{n})+H(X_{2}|X_{3}\ldots X_{n})+\ldots+H(X_{n}) \end{equation*}
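The extended chain rule can be illustrated for three variables. The following sketch builds a random hypothetical joint distribution \(P(x_1,x_2,x_3)\) and checks \(H(X_1X_2X_3) = H(X_1|X_2X_3)+H(X_2|X_3)+H(X_3)\), computing each conditional entropy directly from its definition:

```python
from math import log2
from itertools import product
import random

# A random 3-variable joint distribution (hypothetical, for illustration only)
random.seed(0)
weights = {k: random.random() for k in product([0, 1], repeat=3)}
total = sum(weights.values())
P = {k: v / total for k, v in weights.items()}  # P(x1, x2, x3)

def marginalize(P, keep):
    """Marginal distribution over the variable indices in `keep`."""
    m = {}
    for k, p in P.items():
        kk = tuple(k[i] for i in keep)
        m[kk] = m.get(kk, 0.0) + p
    return m

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def H_cond(P, target, given):
    """H(X_target | X_given...) = -sum_all P(all) log2 P(target | given)."""
    Ptg = marginalize(P, (target,) + given)
    Pg = marginalize(P, given)
    return -sum(p * log2(Ptg[tuple(k[i] for i in (target,) + given)]
                         / Pg[tuple(k[i] for i in given)])
                for k, p in P.items() if p > 0)

# H(X1 X2 X3) = H(X1 | X2 X3) + H(X2 | X3) + H(X3)
lhs = H(P)
rhs = H_cond(P, 0, (1, 2)) + H_cond(P, 1, (2,)) + H(marginalize(P, (2,)))
assert abs(lhs - rhs) < 1e-12
```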

3 Properties

  1. In general \(H(X|Y)\ne H(Y|X)\).
  2. If \(H(X|Y)=0\) then \(x\) is completely determined by \(y\).
  3. \(H(X|Y)\le H(X)\), with \(H(X|Y)=H(X)\) if and only if \(x\) and \(y\) are independent: conditioning can never increase the entropy. Combining this with the chain rule gives \begin{equation*} H(X_{1}X_{2}X_{3}\ldots X_{n})\le \sum_{j}H(X_{j}) \end{equation*}
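Properties 1 and 3 can be illustrated numerically. In the sketch below (toy distributions chosen purely for illustration), a correlated pair has \(H(X|Y) < H(X)\) and \(H(X|Y)\ne H(Y|X)\), while an independent product distribution achieves equality \(H(X|Y)=H(X)\):

```python
from math import log2

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(P, axis):
    """axis=0 gives P(x); axis=1 gives P(y)."""
    m = {}
    for xy, p in P.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

def H_cond(P, given_axis):
    """H(X|Y) for given_axis=1; H(Y|X) for given_axis=0."""
    m = marginal(P, given_axis)
    return -sum(p * log2(p / m[xy[given_axis]]) for xy, p in P.items() if p > 0)

# Correlated variables: conditioning strictly reduces the entropy,
# and the two conditional entropies differ (property 1)
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
assert H_cond(P, 1) < H(marginal(P, 0))          # H(X|Y) < H(X)
assert abs(H_cond(P, 1) - H_cond(P, 0)) > 1e-3   # H(X|Y) != H(Y|X)

# Independent variables: P(x, y) = P(x)P(y), so H(X|Y) = H(X)
Q = {(x, y): px * py for x, px in [(0, 0.7), (1, 0.3)]
                     for y, py in [(0, 0.5), (1, 0.5)]}
assert abs(H_cond(Q, 1) - H(marginal(Q, 0))) < 1e-12
```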