Written by Alexei Gilchrist
Communication through memoryless static channels will be examined, in particular the use of repetition codes to counteract the errors introduced by the channel.
Level: 3, Subjects: Information Theory

1 Communication through channels

Consider communicating through a channel. Here a channel can mean any medium like a cable, air, protein instructions in DNA, whatever. A typical channel will introduce errors.

To keep it simple we'll look at memoryless channels, where the effect of the channel does not depend on what went before, and static channels, whose properties do not change in time.

Communication process

2 Some types of channel

Here are some examples of abstract channels. A set of symbols on the input is transmitted through the channel and output symbols are received.

2.1 Noiseless channel

The simplest channel is the noiseless channel. The channel transmits symbols without error.

A noiseless channel

In the following channel an error can occur with each symbol. With 50% probability the channel changes a 0 to a 1 or to a 2, and with 50% probability it changes a 1 into a 3 or into a 4. Even though the error acts with a high probability, because the output symbols do not overlap the message can be decoded without error. So from an information point of view this is still a noiseless channel.

A non-overlapping channel (also noiseless).
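The non-overlapping channel can be sketched in a few lines of Python (the transition probabilities are the ones assumed above: 0 goes to 1 or 2, and 1 goes to 3 or 4, each with 50% probability):

```python
import random

# The non-overlapping channel: errors occur with certainty, but the
# output alphabets for 0 and 1 never overlap.
def channel(symbol):
    if symbol == 0:
        return random.choice([1, 2])
    return random.choice([3, 4])

def decode(output):
    # Since {1, 2} and {3, 4} are disjoint, decoding is exact.
    return 0 if output in (1, 2) else 1

message = [random.randint(0, 1) for _ in range(1000)]
received = [decode(channel(s)) for s in message]
assert received == message  # decodes perfectly despite the "noise"
```

Running this, the decoded message always matches the input exactly, which is why the channel counts as noiseless from an information point of view.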

2.2 Typewriter channel

In the following we have what is known as a “typewriter” channel. With some probability a symbol is transmitted without error and with the complementary probability it is transmitted as the next symbol in the sequence (looped around for the last symbol). Clearly this channel cannot be decoded without error.

A typewriter channel.

Notice however, if we only use a subset of the input symbols we can transform the typewriter channel into a noiseless channel. This gives us our first glimpse of how encoding the input symbols can mitigate the effect of the noise in the channel. In this case we can remove the noise altogether, at the expense of having fewer symbols in which to encode our message, so the rate of communication will drop.

Using only a subset of a typewriter channel can make it noiseless.
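The subset trick can be demonstrated with a small simulation. The numbers here are assumptions for illustration: a typewriter channel over four symbols that transmits faithfully with probability 0.7, and otherwise shifts to the next symbol (mod 4):

```python
import random

N = 4  # alphabet size (assumed)

# Typewriter channel: correct with probability 0.7, else shifted by one.
def typewriter(symbol, p_correct=0.7):
    return symbol if random.random() < p_correct else (symbol + 1) % N

# Restrict the input to every second symbol, {0, 2}. The possible output
# sets {0, 1} and {2, 3} no longer overlap, so decoding is unambiguous.
def decode(output):
    return output if output % 2 == 0 else (output - 1) % N

message = [random.choice([0, 2]) for _ in range(1000)]
assert [decode(typewriter(s)) for s in message] == message
```

The cost is visible in the code: only two of the four symbols carry the message, so the communication rate is halved.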

2.3 Binary erasure channel

In the following channel there is a probability \(f\) that an error will occur (hence \(1-f\) that the symbol was transmitted without error). When an error does occur it is flagged in some way so that we can represent it by an error symbol \(e\).

A binary erasure channel. With probability \(1-f\) the symbol is transmitted correctly otherwise a flagged error \(e\) is transmitted.

In this kind of channel we know when the errors occur so that we can take suitable action, like asking for a repeat of the message.
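A sketch of the erasure channel, together with the simplest "suitable action" of asking for a repeat (the erasure probability \(f = 0.2\) is an assumed value for illustration):

```python
import random

# Binary erasure channel: with probability f the symbol is erased, and
# the erasure is flagged with the symbol 'e'.
def bec(bit, f=0.2):
    return 'e' if random.random() < f else bit

# Because erasures are flagged, the receiver can simply request a resend
# until the bit gets through.
def send_with_retransmission(bit):
    out = bec(bit)
    while out == 'e':
        out = bec(bit)
    return out

bits = [random.randint(0, 1) for _ in range(1000)]
assert [send_with_retransmission(b) for b in bits] == bits
```

The retransmission strategy works precisely because the channel tells us when an error occurred; the binary symmetric channel below gives no such hint.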

2.4 Binary symmetric channel

In the following channel a bit-flip error occurs with probability \(f\). This is an error which will swap a 0 to a 1 and vice versa.

A binary symmetric channel. With probability \(1-f\) the symbol is transmitted correctly, otherwise it's flipped.

On the output we receive zeros and ones and no indication if an error has occurred or not.
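A quick simulation makes the difference from the erasure channel concrete (assumed flip probability \(f = 0.1\)):

```python
import random

# Binary symmetric channel: flip the bit with probability f,
# with no flag to indicate that a flip happened.
def bsc(bit, f=0.1):
    return bit ^ 1 if random.random() < f else bit

bits = [random.randint(0, 1) for _ in range(100_000)]
received = [bsc(b) for b in bits]

# The output is again just zeros and ones: nothing marks which bits flipped.
error_rate = sum(b != r for b, r in zip(bits, received)) / len(bits)
print(f"observed error rate: {error_rate:.3f}")  # close to f = 0.1
```

The received sequence looks like any other bit string; recovering the message will require redundancy, which is where the repetition codes of section 4 come in.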

3 Mutual information

The mutual information \(I(X;Y)\) gives a way to characterise the quality of a channel.

In this context, symbols from some set are sent into the channel which we'll denote as a set of propositions \(X\) with probabilities \(p(x)\) for \(x\in X\) meaning the probability of sending symbol \(x\) into the channel. The channel then outputs symbols from a set which may differ from the input set, and these events will be denoted as the set of propositions \(Y\). The relevant probabilities are

  • \(p(x)\) the probability symbol \(x\) was sent into the channel.
  • \(p(y)\) the probability of getting symbol \(y\) out of the channel not knowing what was sent in.
  • \(p(y|x)\) the probability of getting symbol \(y\) given symbol \(x\) was sent in.
  • \(p(x|y)\) the probability that symbol \(x\) was sent in given we got \(y\) out of the channel.
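These four probabilities are linked by marginalisation and Bayes' rule, which a short calculation illustrates. The numbers are an assumed example: a binary symmetric channel with \(f = 0.1\) and a biased input with \(p(0) = 0.8\):

```python
# Assumed example: BSC with f = 0.1, biased input p(0) = 0.8.
p_x = {0: 0.8, 1: 0.2}
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1,
               (0, 1): 0.1, (1, 1): 0.9}  # keys are (y, x)

# p(y) = sum_x p(y|x) p(x)   (marginalisation)
p_y = {y: sum(p_y_given_x[(y, x)] * p_x[x] for x in p_x) for y in (0, 1)}

# p(x|y) = p(y|x) p(x) / p(y)   (Bayes' rule)
p_x_given_y = {(x, y): p_y_given_x[(y, x)] * p_x[x] / p_y[y]
               for x in p_x for y in (0, 1)}

print({y: round(p, 4) for y, p in p_y.items()})  # {0: 0.74, 1: 0.26}
print(round(p_x_given_y[(0, 0)], 4))  # 0.973: seeing 0 makes input 0 likely
```

Note that only \(p(x)\) and the transition probabilities \(p(y|x)\) need to be given; the rest follows.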

Similarly the following entropies have natural interpretations

  • \(H(X)\) the uncertainty, or average information (surprise), in the symbols sent into the channel
  • \(H(Y)\) the uncertainty, or average information (surprise), in the symbols emerging from the channel
  • \(H(X|Y)\) the average uncertainty in the input after observing the output
  • \(H(Y|X)\) the average uncertainty in the output after observing the input

If \(H(X)\) is the uncertainty in the input before observing the output and \(H(X|Y)\) is the average uncertainty in the input after observing the output, then \begin{equation*} H(X) - H(X|Y) = I(X;Y) \end{equation*} is the reduction in uncertainty on using the channel and observing the output. Note that if the output of the channel were not correlated with the input in any way then the channel would not reduce the uncertainty. If the output is in some way correlated with the input then our uncertainty about the input symbols is reduced on observing the output symbols. So the mutual information, which captures this reduction in uncertainty, is a measure of the quality of the channel.

It's often easiest to calculate \(I(X;Y) = H(Y) - H(Y|X)\) as the channel diagram gives \(p(y|x)\) in the transition probabilities and the other terms can be calculated by straight multiplication.
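This calculation can be sketched directly from a channel diagram. As an assumed example, take the binary symmetric channel with \(f = 0.1\) and uniform inputs:

```python
from math import log2

# Shannon entropy of a probability distribution, in bits.
def H(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

f = 0.1
p_x = [0.5, 0.5]                         # uniform input (assumed)
p_y_given_x = [[1 - f, f], [f, 1 - f]]   # rows: x, columns: y

# p(y) by marginalisation, then the two entropy terms.
p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(2)) for y in range(2)]
H_Y = H(p_y)
H_Y_given_X = sum(p_x[x] * H(p_y_given_x[x]) for x in range(2))

I = H_Y - H_Y_given_X
print(f"I(X;Y) = {I:.4f} bits")  # 1 - H(0.1, 0.9) ≈ 0.5310
```

The structure mirrors the formula: \(H(Y|X)\) is just the transition-row entropies averaged over \(p(x)\), and \(H(Y)\) needs only the multiplication that gives \(p(y)\).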

In the following video I calculate the mutual information for various channels:

Examples of abstract communication channels and how to calculate and interpret the mutual information.

4 Repetition codes

Repetition codes can be used to counteract channel noise. We look at how to optimally decode given the channel properties, and what the trade-off is between effective error rate and communication rate.
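As a minimal sketch of the idea, here is a 3-repetition code over a binary symmetric channel (assumed flip probability \(f = 0.1\)). Each bit is sent three times and decoded by majority vote, which is the optimal decoding for a BSC with \(f < 1/2\):

```python
import random
from collections import Counter

# Binary symmetric channel with flip probability f (assumed f = 0.1).
def bsc(bit, f=0.1):
    return bit ^ 1 if random.random() < f else bit

def encode(bit, n=3):
    return [bit] * n          # repeat each bit n times

def decode(received):
    return Counter(received).most_common(1)[0][0]  # majority vote

bits = [random.randint(0, 1) for _ in range(100_000)]
decoded = [decode([bsc(r) for r in encode(b)]) for b in bits]

errors = sum(b != d for b, d in zip(bits, decoded)) / len(bits)
print(f"effective error rate: {errors:.4f}")  # 3f^2(1-f) + f^3 = 0.028
```

The error rate drops from \(0.1\) to about \(0.028\), but the rate of communication drops to \(1/3\): that is the trade-off in miniature.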

5 Noisy channel coding theorem

How do we use a channel to communicate as much as possible? Let's look at how much information the output of a channel conveys about the input, and then look to maximising that.

So far we have been interpreting the mutual information as the amount by which the uncertainty of the input is reduced by observing the output of the channel. So we have been thinking of the uncertainty of the input as a given quantity out of our control.

We can also expand the mutual information as \begin{equation*} I(X;Y) = H(Y) - H(Y|X) \end{equation*} i.e. how much the uncertainty in the output is reduced on average by observing the input. Clearly, if we want to communicate effectively we should increase this. That is, we want as little uncertainty as possible after the use of the channel, and we can control the probability of symbols we feed into it. The maximum of the mutual information over the input distribution \(p(x)\) is called the channel capacity: \begin{equation*} C = \underset{p(x)}{\mathrm{max}}\; I(X;Y) \end{equation*}
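The maximisation can be carried out numerically. As an assumed example, a simple grid search over \(p(0)\) for the binary symmetric channel with \(f = 0.1\), where the known answer is \(C = 1 - H(f, 1-f)\):

```python
from math import log2

def H(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

# I(X;Y) = H(Y) - H(Y|X) for a two-input, two-output channel,
# as a function of the input distribution (p0, 1 - p0).
def mutual_information(p0, p_y_given_x):
    p_x = [p0, 1 - p0]
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(2))
           for y in range(2)]
    return H(p_y) - sum(p_x[x] * H(p_y_given_x[x]) for x in range(2))

f = 0.1
bsc = [[1 - f, f], [f, 1 - f]]

# Grid search over the input distribution p(0).
C = max(mutual_information(p0 / 1000, bsc) for p0 in range(1, 1000))
print(f"C = {C:.4f} bits per use")  # matches 1 - H([0.1, 0.9]) ≈ 0.5310
```

For the symmetric channel the maximum sits at the uniform input \(p(0) = 1/2\); for asymmetric channels the optimal \(p(x)\) is biased, which is exactly why the maximisation matters.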

Now, for every discrete memoryless channel, Shannon's noisy channel coding theorem says:

  1. Rates \(R \le C\) are achievable with arbitrarily small error, at least in the limit of large code sizes. Intuitively, for large enough blocks a channel will appear like a non-overlapping channel (which is equivalent to a noiseless channel). Note that the theorem does not say what the codes are, it just says they exist. Finding suitable codes is generally hard.
  2. If a certain amount of error probability \(p_\mathrm{err}\) is acceptable, higher communication rates up to \(R(p_\mathrm{err})\) are achievable, where \begin{equation*} R(p_\mathrm{err}) = \frac{C}{1-H(p_\mathrm{err})} \end{equation*} with \(H(p_\mathrm{err})\) the binary entropy of \(p_\mathrm{err}\).
  3. For a given \(p_\mathrm{err}\), rates higher than \(R(p_\mathrm{err})\) are not achievable.

The different regions referred to by Shannon's noisy channel coding theorem.
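The boundary \(R(p_\mathrm{err})\) between the achievable and non-achievable regions is easy to tabulate. The capacity value below is an assumed example (\(C \approx 0.531\), a binary symmetric channel with \(f = 0.1\)):

```python
from math import log2

# Binary entropy function H(p) = -p log2(p) - (1-p) log2(1-p).
def H2(p):
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

# Maximum achievable rate when error probability p_err is tolerated.
def achievable_rate(C, p_err):
    return C / (1 - H2(p_err))

C = 0.531  # assumed capacity, e.g. a BSC with f = 0.1
for p_err in (0.0, 0.01, 0.05, 0.1):
    print(f"p_err = {p_err:.2f}: R = {achievable_rate(C, p_err):.3f} bits/use")
```

At \(p_\mathrm{err} = 0\) the boundary is just the capacity \(C\), and it rises as more error is tolerated, tracing out the curve in the figure.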