Communication through memory-less static channels will be examined, and in particular the use of repetition codes to counteract the errors introduced by the channel.

Consider communicating through a channel. Here a channel can mean any medium: a cable, the air, protein-coding instructions in DNA, and so on. A typical channel will introduce errors.

To keep things simple we'll look at *memory-less* channels, where the effect of
the channel does not depend on what went before, and *static* channels, whose
properties do not change in time.

Here are some examples of abstract channels. A set of symbols on the input is transmitted through the channel and output symbols are received.

The simplest channel is the noiseless channel, which transmits symbols without error.

In the following channel an error can occur with each symbol. With 50% probability the channel changes a 0 to a 1 or to a 2, and with 50% probability it changes a 1 into a 3 or into a 4. Even though errors occur with high probability, the output symbols do not overlap, so the message can be decoded without error. From an information point of view this is still a noiseless channel.

In the following we have what is known as a “typewriter” channel. With some probability a symbol is transmitted without error and with the complementary probability it is transmitted as the next symbol in the sequence (looped around for the last symbol). Clearly this channel cannot be decoded without error.

Notice, however, that if we only use a subset of the input symbols we can transform
the typewriter channel into a noiseless channel. This gives us our first glimpse
of how, by encoding the input symbols, we can mitigate the effect of the noise in
the channel. In this case we can remove the noise altogether, at the expense of
having fewer symbols in which to encode our message, so the *rate* of
communication will drop.
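As a minimal sketch of this idea, here is a hypothetical four-symbol typewriter channel (the alphabet and the transition probability `g` are made up for illustration). Restricting the input to every other symbol makes the output sets disjoint, so decoding is error-free:

```python
import random

# Hypothetical 4-symbol "typewriter" channel: a symbol is transmitted
# intact with probability g, otherwise it becomes the next symbol in
# the sequence (wrapping around at the end).
SYMBOLS = ["A", "B", "C", "D"]

def typewriter(symbol, g=0.7):
    i = SYMBOLS.index(symbol)
    if random.random() < g:
        return SYMBOLS[i]
    return SYMBOLS[(i + 1) % len(SYMBOLS)]

# Restrict the input to every other symbol: {"A", "C"}.  "A" can only
# come out as "A" or "B", and "C" only as "C" or "D" -- the output
# sets do not overlap, so decoding never fails.
def decode(received):
    return "A" if received in ("A", "B") else "C"
```

Using only two of the four symbols halves the number of input symbols per use, which is exactly the drop in rate mentioned above.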

In the following channel there is a probability \(f\) that an error will occur
(hence \(1-f\) that the symbol was transmitted without error). When an error
*does* occur it is flagged in some way, so that we can represent it by an
error symbol \(e\).

In this kind of channel we know when the errors occur so that we can take suitable action, like asking for a repeat of the message.
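A quick sketch of such an erasure channel (the flip probability and the message are illustrative values). Because errors are flagged, the receiver knows exactly which positions to ask the sender to repeat:

```python
import random

# Sketch of an erasure channel: with probability f the transmitted
# symbol is replaced by the error flag "e"; otherwise it arrives intact.
def erasure_channel(bit, f=0.2):
    return "e" if random.random() < f else bit

random.seed(1)
message = [0, 1, 1, 0, 1, 0, 0, 1]
received = [erasure_channel(b) for b in message]

# Every non-flagged symbol is correct, so only the flagged positions
# need to be requested again.
repeat_requests = [i for i, r in enumerate(received) if r == "e"]
```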

In the following channel a *bit-flip* error occurs with probability \(f\). This is an error which will swap a 0 to a 1 and vice versa.

On the output we receive zeros and ones and no indication if an error has occurred or not.
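A minimal sketch of this bit-flip (binary symmetric) channel, with an illustrative flip probability `f`:

```python
import random

# Bit-flip channel: each bit is flipped with probability f, and the
# output carries no flag telling us whether a flip occurred.
def bsc(bit, f=0.1):
    return bit ^ 1 if random.random() < f else bit

# Over many uses the observed flip rate approaches f.
random.seed(0)
n = 100_000
flips = sum(bsc(0) for _ in range(n))
```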

The mutual information \(I(X;Y)\) gives a way to characterise the quality of a channel.

In this context, symbols from some set are sent into the channel which we'll
denote as a set of propositions \(X\) with probabilities \(p(x)\) for \(x\in X\)
meaning the probability of sending symbol \(x\) *into* the channel. The
channel then outputs symbols from a set which may differ from the input set, and
these events will be denoted as the set of propositions \(Y\). The relevant
probabilities are

- \(p(x)\) the probability symbol \(x\) was sent into the channel.
- \(p(y)\) the probability of getting symbol \(y\) out of the channel not knowing what was sent in.
- \(p(y|x)\) the probability of getting symbol \(y\) given symbol \(x\) was sent in.
- \(p(x|y)\) the probability that symbol \(x\) was sent in given we got \(y\) out of the channel.

Similarly the following entropies have natural interpretations

- \(H(X)\) the uncertainty, or average information (surprise), in the symbols sent into the channel
- \(H(Y)\) the uncertainty, or average information (surprise), in the symbols emerging from the channel
- \(H(X|Y)\) the average uncertainty in the input after observing the output
- \(H(Y|X)\) the average uncertainty in the output after observing the input

If \(H(X)\) is the uncertainty in the input before observing the output and \(H(X|Y)\)
is the average uncertainty in the input after observing the output, then
\begin{equation*}
H(X) - H(X|Y) = I(X;Y)
\end{equation*}
is the *reduction* in uncertainty on using the channel and observing the
output. Note that if the output of the channel were not correlated with the input
in any way then the channel would *not* reduce the uncertainty. If the
output is in some way correlated with the input, then our uncertainty about the input
symbols is reduced on observing the output symbols. So the mutual information,
which captures this reduction in uncertainty, is a measure of the *quality*
of the channel.

It's often easiest to calculate \(I(X;Y) = H(Y) - H(Y|X)\), as the channel diagram gives \(p(y|x)\) in the transition probabilities and the other terms can be calculated by straightforward multiplication.
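As a worked sketch, for the binary symmetric channel with flip probability \(f\) and input distribution \(p(x{=}1)=q\) (illustrative parameters), \(H(Y|X)\) is just the binary entropy \(H_2(f)\), since given the input the only uncertainty is flip versus no flip:

```python
from math import log2

def H2(p):
    # Binary entropy in bits; H2(0) = H2(1) = 0 by convention.
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(q, f):
    # I(X;Y) = H(Y) - H(Y|X) for a bit-flip channel.
    # p(y=1) = q(1-f) + (1-q)f, and H(Y|X) = H2(f).
    p_y1 = q * (1 - f) + (1 - q) * f
    return H2(p_y1) - H2(f)
```

For a uniform input (\(q=1/2\)) this gives \(1 - H_2(f)\) bits per channel use.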

In the following video I calculate the mutual information for various channels:

How do we use a channel to communicate as much as possible? Let's look at how much information the output of a channel conveys about the input, and then look to maximising that.

So far we have been interpreting the mutual information as the amount by which the uncertainty of the input is reduced by observing the output of the channel. So we have been thinking of the uncertainty of the input as a given quantity out of our control.

We can also expand the mutual information as
\begin{equation*}
I(X;Y) = H(Y) - H(Y|X)
\end{equation*}
i.e. how much the uncertainty in the output is reduced on average by observing
the input. Clearly, if we want to communicate effectively we should increase
this. That is, we want as little uncertainty as possible after the use of the
channel, and we can control the probabilities of the symbols we feed into it. The
maximum of the mutual information over the distribution \(p(x)\) is called
the *channel capacity*:
\begin{equation*}
C = \underset{p(x)}{\mathrm{max}}\; I(X;Y)
\end{equation*}
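A grid-search sketch of this maximisation for the bit-flip channel (parameters illustrative; for this channel the maximum is known in closed form, \(C = 1 - H_2(f)\), attained at the uniform input):

```python
from math import log2

def H2(p):
    # Binary entropy in bits.
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def I_bsc(q, f):
    # I(X;Y) = H(Y) - H(Y|X) for a bit-flip channel with flip
    # probability f and input distribution p(x=1) = q.
    return H2(q * (1 - f) + (1 - q) * f) - H2(f)

def capacity(f, steps=1001):
    # C = max over p(x) of I(X;Y), here approximated on a grid over q.
    return max(I_bsc(q / (steps - 1), f) for q in range(steps))
```

Note that at \(f = 1/2\) the output is independent of the input and the capacity is zero, matching the intuition that an uncorrelated output conveys nothing.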

Now, for every discrete memoryless channel, Shannon's *noisy channel coding theorem* says:

- Rates \(R \le C\) are achievable with arbitrarily small error, at least in the limit of large code sizes. Intuitively, for large enough blocks a channel will appear like a non-overlapping channel (which is equivalent to a noiseless channel). Note that the theorem does not say *what* the codes are, it just says they exist. Finding suitable codes is generally hard.
- If a certain amount of error probability \(p_\mathrm{err}\) is acceptable, higher communication rates up to \(R(p_\mathrm{err})\) are achievable, where
  \begin{equation*}
  R(p_\mathrm{err}) = \frac{C}{1-H(p_\mathrm{err})}
  \end{equation*}
- For a given \(p_\mathrm{err}\), rates higher than \(R(p_\mathrm{err})\) are not achievable.
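A small worked instance of the \(R(p_\mathrm{err})\) formula, with an illustrative capacity value and \(H\) taken as the binary entropy:

```python
from math import log2

def H2(p):
    # Binary entropy in bits.
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def achievable_rate(C, p_err):
    # R(p_err) = C / (1 - H2(p_err)): tolerating a bit-error
    # probability p_err buys a rate above the capacity C.
    return C / (1 - H2(p_err))
```

At \(p_\mathrm{err}=0\) this reduces to \(R = C\), and the achievable rate grows as the tolerated error probability increases.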