# Conditional Entropy

## Alexei Gilchrist

Conditional entropy.

### 1 Definition

Given the definitions for entropy and joint entropy, the definition for the conditional entropy is not what you might expect. With $$x\in X$$ and $$y\in Y$$,

$\begin{equation*}H(X|Y) \equiv -\sum_{x,y} P(xy)\log P(x|y)\end{equation*}$
Note the conditional probability only appears in the logarithm. Because of this the role of the variable in the “given” part is fundamentally different from that in conditional probabilities.

There’s a different approach to defining the conditional entropy that perhaps gives more intuition. Say we know the value $$y$$ and ask what is the entropy of the distribution of $$x$$ given this information

$\begin{equation*}H(X|y) = -\sum_{x} P(x|y) \log P(x|y)\end{equation*}$
This entropy is clearly a function of $$y$$. Now average $$H(X|y)$$ over $$y$$:
$\begin{equation*}\sum_{y}P(y)H(X|y) = -\sum_{x,y}P(y)P(x|y) \log P(x|y)=-\sum_{x,y}P(xy) \log P(x|y) = H(X|Y)\end{equation*}$
So the conditional entropy is perhaps better denoted as
$\begin{equation*}H(X|Y) = \langle H(X|y) \rangle_{y\in Y}\end{equation*}$

### 2 Chain Rule

The joint, conditional and individual entropies are related in a simple way reminiscent of the logarithm of the product rule

$\begin{equation*}H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X)\end{equation*}$

Proof

\begin{align*}H(X,Y) &= -\sum_{x,y} P(xy)\log\left[P(y|x)P(x)\right]\\ &= -\sum_{x,y} P(xy)\log P(y|x)-\sum_{x,y} P(xy)\log P(x)\\ &= H(Y|X)-\sum_{x,y} P(x)\log P(x)\\ &= H(Y|X)+H(X)\end{align*}

For the other identity simply swap $$X$$ and $$Y$$.

The chain rule also extends to multiple variables

$\begin{equation*}H(X_{1}X_{2}X_{3}\ldots X_{n}) = H(X_{1}|X_{2}\ldots X_{n})+H(X_{2}|X_{3}\ldots X_{n})+\ldots+H(X_{n})\end{equation*}$

### 3 Properties

1. $$H(X|Y)\ne H(Y|X)$$
2. If $$H(X|Y)=0$$ then $$x$$ is completely determined if we know $$y$$.
3. $$H(X|Y)\le H(X)$$ with $$H(X|Y)=H(X)$$ if $$x$$ and $$y$$ are independent. With this result we have
$\begin{equation*}H(X_{1}X_{2}X_{3}\ldots X_{n})\le \sum_{j}H(X_{j})\end{equation*}$