Given that we are interpreting probability as a measure of plausibility, just what is the relationship of probabilities to frequencies?

If the problem has built-in frequencies of events or objects then we can use this information to determine the numerical values of probabilities. Following Jaynes, think of an urn with marbles (I know, very classical). The marbles are numbered \(1,\ldots,n\); \(m\) of them are green and the rest are white. Say we draw a marble `randomly' and look at its colour: what is the probability \(P(g|I)\) that it will be green?

First, by `draw randomly' I mean any method that respects a certain symmetry: I should not be able to tell if someone sneakily renumbered all the marbles in a different order. In other words, the method of drawing a marble should be indifferent to *which* marble is drawn. This symmetry means that the probability I assign to any *particular* marble being drawn must be invariant under renumbering (otherwise I could tell that the marbles had been renumbered). The only assignment that satisfies this symmetry is equal probabilities, and since the total probability must be unity, the probability of drawing marble \(j\) is \(P(j|I)=1/n\).

Now back to the example. What is the probability of drawing a green marble? \begin{equation} P(g|I) = \sum_j P(gj|I) = \sum_j P(g|jI)P(j|I) = \frac{1}{n}\sum_j P(g|jI) = \frac{m}{n}, \end{equation} where we first used marginalisation in reverse, then the product rule, then \(P(j|I)=1/n\), and finally that \(P(g|jI)\) is 1 if the \(j^\mathrm{th}\) marble is green and 0 otherwise, so the summation just counts the green marbles.
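As a sanity check, the marginalisation sum above can be carried out explicitly. Here is a minimal Python sketch; the particular numbers \(n=10\) and \(m=3\) are illustrative assumptions, not from the text:

```python
from fractions import Fraction

# Illustrative numbers (assumptions): n = 10 marbles, m = 3 of them green.
n, m = 10, 3

# By the renumbering symmetry, P(j|I) = 1/n for every marble j.
p_j = Fraction(1, n)

# P(g|jI) is 1 if marble j is green, 0 otherwise; say marbles 1..m are green.
p_g_given_j = {j: (1 if j <= m else 0) for j in range(1, n + 1)}

# Marginalisation: P(g|I) = sum_j P(g|jI) P(j|I).
p_g = sum(p_g_given_j[j] * p_j for j in range(1, n + 1))

print(p_g)  # 3/10, i.e. m/n
```

Using exact fractions rather than floats makes the result come out as \(m/n\) exactly.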

The argument can be extended in a natural way to more complicated situations where the background information \(I\) implies that we are counting various members of sets. So if there is frequency data available in the problem we can make use of it to assign numerical values to probabilities.

It's perfectly fine to talk about the probability of a frequency. Say we envision some experiment which is to be repeated \(n\) times, and we are interested in the probability of obtaining the frequency \(m/n\) of some interesting outcome. To make things more tractable let's assume that the result of each experiment is independent of the other results, and that the probability of the outcome we're interested in is \(a\) in each experiment. With these assumptions the probability of a frequency \(m/n\) is simply the Binomial distribution:
\begin{equation*}
P(m/n|I) = \begin{pmatrix} n\\m \end{pmatrix}a^m (1-a)^{(n-m)}.
\end{equation*}
These probabilities can be read off as a simple application of the product and sum rules. The probability of any particular sequence with \(m\) interesting outcomes and \(n-m\) other outcomes is \(a^m (1-a)^{(n-m)}\): the sequence is just the propositions for the individual outcomes *AND*ed together, and since the outcomes are independent their probabilities multiply. The Binomial factor counts all the ways \(m\) outcomes can appear in \(n\) trials; these mutually exclusive sequences are logically *OR*ed together, so their probabilities add, and since each sequence has the same probability the sum becomes an overall factor.
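The AND/OR counting argument can be verified by brute force for small \(n\): enumerate all \(2^n\) outcome sequences, keep those with exactly \(m\) interesting outcomes, and compare with the Binomial formula. A sketch, with made-up values of \(n\), \(m\) and \(a\):

```python
from itertools import product
from math import comb, isclose

# Made-up values for illustration.
n, m, a = 5, 2, 0.3

# Each sequence is a tuple of 0/1 outcomes; 1 marks the interesting outcome.
matching = [seq for seq in product([0, 1], repeat=n) if sum(seq) == m]

# The number of such sequences is the Binomial factor.
print(len(matching) == comb(n, m))  # True

# OR-ing the sequences adds their (equal) probabilities, giving the Binomial term.
p = sum(a ** sum(seq) * (1 - a) ** (n - sum(seq)) for seq in matching)
print(isclose(p, comb(n, m) * a**m * (1 - a) ** (n - m)))  # True
```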

These probabilities are just the terms in a Binomial expansion so the distribution is normalised \begin{equation*} \sum_m P(m/n|I) = \sum_m \begin{pmatrix} n\\m \end{pmatrix}a^m (1-a)^{(n-m)} = (a+1-a)^n = 1. \end{equation*}
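The normalisation can also be confirmed numerically. A minimal sketch, with illustrative values of \(n\) and \(a\) (my own choices):

```python
from math import comb, isclose

def p_freq(m, n, a):
    """P(m/n | I): Binomial probability of m interesting outcomes in n trials."""
    return comb(n, m) * a**m * (1 - a) ** (n - m)

# Illustrative values (assumptions, not from the text).
n, a = 20, 0.3

# The terms of the Binomial expansion of (a + 1 - a)^n sum to 1.
print(isclose(sum(p_freq(m, n, a) for m in range(n + 1)), 1.0))  # True
```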

Note that for any given \(n\) there is not a single `true' frequency; rather, there is a range of frequencies with high probability. While we could report just the most probable frequency if there is a unique one, it is more informative to report a *credible region*, which in this case is a range of the most probable frequencies accounting for some chosen fraction of the total probability, such as 80%. This can be explored in the simulation below.
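A minimal sketch of such a credible region (this is not the interactive simulation the text refers to): sort the frequencies by probability and accumulate the most probable ones until the chosen fraction of the total probability is covered. The function name and the 80% default are my own choices:

```python
from math import comb

def credible_region(n, a, frac=0.8):
    """Smallest set of m values whose probabilities sum to at least `frac`."""
    probs = [(comb(n, m) * a**m * (1 - a) ** (n - m), m) for m in range(n + 1)]
    probs.sort(reverse=True)  # most probable frequencies first
    total, region = 0.0, []
    for p, m in probs:
        region.append(m)
        total += p
        if total >= frac:
            break
    return sorted(region), total

ms, covered = credible_region(20, 0.3)
print(ms, covered)  # a band of m values around the mode (m = 6 here)
```

Because the most probable frequencies are taken first, this is the highest-probability region covering the requested fraction.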