Communication through noisy channels
Alexei Gilchrist
1 Communication through channels
Consider communicating through a channel. Here a channel can be any medium: a cable, the air, protein instructions in DNA, and so on. A typical channel will introduce errors.
To keep things simple we’ll look at memoryless channels, where the effect of the channel does not depend on what went before, and static channels, whose properties do not change in time.
2 Some types of channel
Here are some examples of abstract channels. A set of symbols on the input is transmitted through the channel and output symbols are received.
2.1 Noiseless channel
The simplest channel is the noiseless channel. The channel transmits symbols without error.
In the following channel an error can occur with each symbol: with 50% probability the channel changes a 0 into a 1 or into a 2, and with 50% probability it changes a 1 into a 3 or into a 4. Even though errors occur with high probability, the output symbols for the two inputs do not overlap, so the message can be decoded without error. From an information point of view this is still a noiseless channel.
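As a minimal numerical sketch (assuming a uniform input, and reading the example as: a 0 always emerges as a 1 or a 2, and a 1 always emerges as a 3 or a 4), the information shared between input and output works out to the full 1 bit per symbol, the same as a noiseless binary channel:

```python
from math import log2

# Channel from the example above (one consistent reading): input 0 emerges as
# output 1 or 2 (50/50), input 1 emerges as output 3 or 4 (50/50).
# A uniform input distribution is assumed for illustration.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {1: 0.5, 2: 0.5}, 1: {3: 0.5, 4: 0.5}}

# Joint distribution p(x,y) and output marginal p(y).
p_xy = {(x, y): p_x[x] * p for x, outs in p_y_given_x.items() for y, p in outs.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# Mutual information I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ].
I = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)
print(I)  # 1.0 bit = H(X): nothing is lost, so the channel is effectively noiseless
```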
2.2 Typewriter channel
In the following we have what is known as a “typewriter” channel. With some probability a symbol is transmitted without error and with the complementary probability it is transmitted as the next symbol in the sequence (looped around for the last symbol). Clearly this channel cannot be decoded without error.
Notice however that if we only use a subset of the input symbols we can transform the typewriter channel into a noiseless channel. This gives us our first glimpse of how encoding the input symbols can mitigate the effect of the noise in the channel. In this case we can remove the noise altogether, at the expense of having fewer symbols in which to encode our message, so the rate of communication will drop.
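For concreteness (the alphabet size here is just an illustrative assumption): with four symbols \(\{A,B,C,D\}\), each either transmitted faithfully or as the next symbol in the cycle, using only the inputs \(A\) and \(C\) gives the non-overlapping output sets \(\{A,B\}\) and \(\{C,D\}\). Decoding is then error-free, but the rate drops from \[\begin{equation*}\log_2 4 = 2 \text{ bits per symbol} \quad\text{to}\quad \log_2 2 = 1 \text{ bit per symbol}.\end{equation*}\]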
2.3 Binary erasure channel
In the following channel there is a probability \(f\) that an error will occur (and hence probability \(1-f\) that the symbol is transmitted without error). When an error does occur it is flagged in some way, so that we can represent it by an error symbol \(e\).
In this kind of channel we know when the errors occur so that we can take suitable action, like asking for a repeat of the message.
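As a rough illustration (assuming a uniform input of 0s and 1s): a fraction \(f\) of the symbols are erased and tell us nothing, while the remaining fraction arrive intact, so on average at most \[\begin{equation*}(1-f)\times 1 = 1-f \text{ bits per symbol}\end{equation*}\] get through. This is in fact the best possible rate (the capacity) of the binary erasure channel.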
2.4 Binary symmetric channel
In the following channel a bit-flip error occurs with probability \(f\). This is an error which will swap a 0 to a 1 and vice versa.
On the output we receive zeros and ones and no indication if an error has occurred or not.
3 Mutual information
The mutual information \(I(X;Y)\) gives a way to characterise the quality of a channel.
In this context, the symbols sent into the channel are denoted by a set of propositions \(X\), with \(p(x)\) for \(x\in X\) the probability of sending symbol \(x\) into the channel. The channel then outputs symbols from a set which may differ from the input set, and these events are denoted by the set of propositions \(Y\). The relevant probabilities are
- \(p(x)\) the probability symbol \(x\) was sent into the channel.
- \(p(y)\) the probability of getting symbol \(y\) out of the channel not knowing what was sent in.
- \(p(y|x)\) the probability of getting symbol \(y\) given symbol \(x\) was sent in.
- \(p(x|y)\) the probability that symbol \(x\) was sent in given we got \(y\) out of the channel.
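For example (the numbers are assumed for illustration): for the binary symmetric channel of Section 2.4 with \(f = 0.1\) and a uniform input, Bayes’ theorem gives \[\begin{equation*}p(x{=}0\mid y{=}0) = \frac{p(y{=}0\mid x{=}0)\,p(x{=}0)}{p(y{=}0)} = \frac{0.9\times 0.5}{0.5} = 0.9.\end{equation*}\]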
Similarly the following entropies have natural interpretations
- \(H(X)\) the uncertainty, or average information (surprise), in the symbols sent into the channel
- \(H(Y)\) the uncertainty, or average information (surprise), in the symbols emerging from the channel
- \(H(X|Y)\) the average uncertainty in the input after observing the output
- \(H(Y|X)\) the average uncertainty in the output after observing the input
If \(H(X)\) is the uncertainty in the input before observing the output, and \(H(X|Y)\) is the average uncertainty in the input after observing the output, then the mutual information is the reduction in uncertainty gained by observing the output: \[\begin{equation*}I(X;Y) = H(X) - H(X|Y).\end{equation*}\]
It’s often easiest to calculate \(I(X;Y) = H(Y) - H(Y|X)\), as the channel diagram gives \(p(y|x)\) directly in the transition probabilities, and the remaining quantities follow by multiplication and marginalisation: \(p(x,y) = p(y|x)\,p(x)\) and \(p(y) = \sum_x p(y|x)\,p(x)\).
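As a minimal sketch of this calculation (the channel and input distribution are just illustrative assumptions, here a binary symmetric channel with \(f = 0.1\) and a uniform input):

```python
from math import log2

def H(dist):
    """Entropy in bits of a probability distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) = H(Y) - H(Y|X) from the input distribution and the channel transition probabilities."""
    # Output distribution p(y) = sum_x p(x) p(y|x)
    p_y = {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * pyx
    # Conditional entropy H(Y|X) = sum_x p(x) H(Y|X=x)
    H_Y_given_X = sum(px * H(p_y_given_x[x]) for x, px in p_x.items())
    return H(p_y) - H_Y_given_X

# Binary symmetric channel with flip probability f = 0.1 and uniform input.
f = 0.1
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 1 - f, 1: f}, 1: {0: f, 1: 1 - f}}
print(mutual_information(p_x, p_y_given_x))  # ≈ 0.531 bits, i.e. 1 - H(0.1)
```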
In the following video I calculate the mutual information for various channels:
4 Repetition codes
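As an illustrative sketch only (the flip probability and the simulation below are assumptions, not taken from the notes): the simplest way to fight noise is to repeat each bit. With the repetition code R3 each bit is sent three times and decoded by majority vote, which over a binary symmetric channel reduces the per-bit error from \(f\) to roughly \(3f^2\), at the cost of dropping the rate to \(1/3\).

```python
import random

def bsc(bits, f, rng):
    """Pass bits through a binary symmetric channel with flip probability f."""
    return [b ^ (rng.random() < f) for b in bits]

def encode_r3(bits):
    """Repetition code R3: send each bit three times."""
    return [b for b in bits for _ in range(3)]

def decode_r3(received):
    """Decode by majority vote over each block of three received bits."""
    return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]

rng = random.Random(0)
f = 0.1                                   # assumed flip probability, for illustration
message = [rng.randint(0, 1) for _ in range(100_000)]
decoded = decode_r3(bsc(encode_r3(message), f, rng))
errors = sum(m != d for m, d in zip(message, decoded))
print(errors / len(message))              # ≈ 3f^2 - 2f^3 ≈ 0.028, versus f = 0.1 uncoded
```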
5 Noisy channel coding theorem
How do we use a channel to communicate as much as possible? Let’s look at how much information the output of a channel conveys about the input, and then look to maximising that.
So far we have been interpreting the mutual information as the amount by which the uncertainty of the input is reduced by observing the output of the channel. So we have been thinking of the uncertainty of the input as a given quantity out of our control.
We can also expand the mutual information as \[\begin{equation*}I(X;Y) = H(Y) - H(Y|X) = \sum_{x,y} p(x)\,p(y|x)\log_2\frac{p(y|x)}{p(y)},\end{equation*}\] which makes the dependence on the input distribution explicit: the transition probabilities \(p(y|x)\) are fixed by the channel, but \(p(x)\) is ours to choose. Maximising over the input distribution defines the channel capacity \[\begin{equation*}C = \max_{p(x)} I(X;Y).\end{equation*}\]
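As a minimal numerical sketch of this maximisation (the binary symmetric channel and the flip probability \(f = 0.1\) are assumptions for illustration), scanning over input distributions confirms that the uniform input is optimal for this channel:

```python
from math import log2

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def I_bsc(q, f):
    """I(X;Y) for a binary symmetric channel with flip probability f,
    when the input is 1 with probability q:  I = H(Y) - H(Y|X)."""
    return h2(q * (1 - f) + (1 - q) * f) - h2(f)

f = 0.1  # assumed flip probability, for illustration
qs = [i / 1000 for i in range(1001)]
best_q = max(qs, key=lambda q: I_bsc(q, f))
print(best_q, I_bsc(best_q, f))  # q = 0.5 maximises, giving C = 1 - h2(0.1) ≈ 0.531
```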
Now, for every discrete memoryless channel, Shannon’s noisy channel coding theorem says:
- Rates \(R \le C\) are achievable with arbitrarily small error, at least in the limit of large block lengths. Intuitively, for large enough blocks the channel starts to look like a non-overlapping channel (which is equivalent to a noiseless channel). Note that the theorem does not say what the codes are; it only says they exist. Finding suitable codes is generally hard.
- If a certain error probability \(p_\mathrm{err}\) is acceptable, higher communication rates up to \(R(p_\mathrm{err})\) are achievable, where \[\begin{equation*}R(p_\mathrm{err}) = \frac{C}{1-H(p_\mathrm{err})},\end{equation*}\] with \(H(p_\mathrm{err})\) the binary entropy function (a numerical example follows this list).
- For a given \(p_\mathrm{err}\), rates higher than \(R(p_\mathrm{err})\) are not achievable.
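As a quick numerical illustration (the numbers are assumed, not taken from the notes): for a binary symmetric channel with flip probability \(f = 0.1\) the capacity is \(C = 1 - H(0.1) \approx 0.53\) bits per channel use. Tolerating a bit-error probability of \(p_\mathrm{err} = 0.01\) raises the achievable rate to \[\begin{equation*}R(0.01) = \frac{1 - H(0.1)}{1 - H(0.01)} \approx \frac{0.53}{0.92} \approx 0.58\end{equation*}\] bits per channel use, a modest gain for accepting a 1% error rate.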