Marginalization

Alexei Gilchrist

Variables are introduced and then some consequences of the sum-rule are explored.

1 Mutually exclusive and exhaustive

The extended sum-rule is

$\begin{equation*}P(A+B|I) = P(A|I)+P(B|I)-P(AB|I).\end{equation*}$
Imagine there are three possibilities $$A_1$$, $$A_2$$, and $$A_3$$. Then by repeatedly using the extended sum-rule we can expand $$P(A_1+A_2+A_3|I)$$,
\begin{align*}P(A_1+A_2+A_3|I) =& P(A_1|I) + P(A_2|I) + P(A_3|I) \\&- P(A_1A_2|I) - P(A_1A_3|I) - P(A_2A_3|I) + P(A_1A_2A_3|I).\end{align*}
Now if the background information $$I$$ implies the $$A_j$$ are mutually exclusive, then all the terms on the second line vanish. If $$I$$ implies the $$A_j$$ are exhaustive then $$P(A_1+A_2+A_3|I)=1$$, so we have
$\begin{equation*}P(A_1|I)+P(A_2|I)+P(A_3|I) = 1.\end{equation*}$
It’s not difficult to extend the example to an arbitrary set of mutually exclusive and exhaustive possibilities $$\{A_j\}$$ with the result
$$$\label{eq:sum1} \sum_j P(A_j|I) = 1.$$$
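The expansion above can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the sample space; the events and the helper names `prob` and `union_by_sum_rule` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, every face equally likely.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of faces) under the uniform assumption."""
    return Fraction(len(event & outcomes), len(outcomes))

def union_by_sum_rule(a1, a2, a3):
    """P(A1+A2+A3) expanded by repeated use of the extended sum-rule."""
    return (prob(a1) + prob(a2) + prob(a3)
            - prob(a1 & a2) - prob(a1 & a3) - prob(a2 & a3)
            + prob(a1 & a2 & a3))

# Overlapping events: the joint terms on the second line are needed.
even, high, prime = {2, 4, 6}, {4, 5, 6}, {2, 3, 5}
overlapping = union_by_sum_rule(even, high, prime)
direct = prob(even | high | prime)

# Mutually exclusive and exhaustive events: the joint terms vanish
# and the single-event probabilities sum to 1.
low, mid, top = {1, 2}, {3, 4}, {5, 6}
partition_total = prob(low) + prob(mid) + prob(top)

print(overlapping, direct, partition_total)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.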

2 Variables

Up to now we have been considering single propositions that can be either true or false. In fact, we can consider the proposition $$A$$ to be a shorthand for $$(A\!=\!\text{True})$$ and $$\bar{A}$$ a shorthand for $$(A\!=\!\text{False})$$. It’s often convenient to talk about a set of related propositions, so instead of having an endless list $$(A_1\!=\!\text{True})$$, $$(A_2\!=\!\text{True})$$, $$(A_3\!=\!\text{True})$$, etc., we can group them all together in a single proposition $$A=A_1+A_2+A_3+\ldots$$ (where ‘$$+$$’ signifies a logical or). Another way of looking at this set is to think of $$A$$ as a variable that can take on values from the set $$\{A_1,A_2,A_3,\ldots\}$$, and we are asking a sequence of questions: “is $$A$$ equal to $$A_1$$?”, and so on. Some examples:

1. $$\text{Coin}\in\{H,T\}$$,
2. $$\text{Die}\in\{1,2,3,4,5,6\}$$,
3. $$\text{Weather}\in\{\text{Raining},\text{Sunny},\text{Other}\}$$,
4. $$\text{Height}\in\{(h<1.6),(1.6\le h \le 1.8),(h>1.8)\}$$.

The set of possible values of the variable is the domain of the variable, and the values are variously called the states, options, instances, or possibilities. I’ll try to stick with “possibilities”: although it is a non-standard term, it is not overloaded with prior meaning. The possibilities can also form a countably infinite or continuous set, though taking this limit needs to be done with care. It is particularly useful if the possibilities are mutually exclusive (only one can be true at a time) and exhaustive (one of them must be true), but the way we have set up variables this need not be the case.

Where the context is clear, we’ll abbreviate the notation by just giving the possibility, e.g. $$P(A_1)\equiv P(A\!=\!A_1)$$, or use the variable name to stand for a whole set of relations, one for each combination of possibilities. So for example $$P(A|B) = P(A)P(B|A)/P(B)$$ stands for

\begin{align*}P(A_1|B_1) &= P(A_1)P(B_1|A_1)/P(B_1) \\ P(A_2|B_1) &= P(A_2)P(B_1|A_2)/P(B_1) \\ P(A_1|B_2) &= P(A_1)P(B_2|A_1)/P(B_2)\end{align*}
and so on. An expression like $$P(A)$$ is really a vector of probabilities for each possibility and $$P(A|B)$$ is a table.

Example

Say $$A=\{A_1,A_2\}$$, $$B=\{B_1,B_2\}$$, and $$C=\{C_1,C_2,C_3\}$$, then we might have:

\begin{align*}P(A) &= \begin{array}{c|c} A_1 & 0.2 \\ A_2 & 0.8 \end{array} \\ P(A|B) &= \begin{array}{c|cc} & B_1 & B_2 \\\hline A_1 & 0.1 & 0.5 \\ A_2 & 0.9 & 0.5 \end{array} \\ P(A|BC) &= \begin{array}{c|cccccc} & B_1C_1 & B_1C_2 & B_1C_3 & B_2C_1 & B_2C_2 & B_2C_3 \\\hline A_1 & 0.9 & 0.2 & 0.5 & 0.7 & 0.4 & 0.6 \\ A_2 & 0.1 & 0.8 & 0.5 & 0.3 & 0.6 & 0.4 \end{array}\end{align*}
Note that the columns add up to 1. This is a consequence of mutually exclusive and exhaustive possibilities as we’ll show next.
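As a quick check on the tables in this example, here is a small sketch that stores each conditional table as a dictionary of columns and tests the normalization. The dictionary layout, the helper name `columns_normalized`, and the tolerance are assumptions made for illustration.

```python
# Each table maps a conditioning possibility (a column) to the
# probabilities of the possibilities of A in that column.
p_a_given_b = {
    "B1": {"A1": 0.1, "A2": 0.9},
    "B2": {"A1": 0.5, "A2": 0.5},
}

p_a_given_bc = {
    ("B1", "C1"): {"A1": 0.9, "A2": 0.1},
    ("B1", "C2"): {"A1": 0.2, "A2": 0.8},
    ("B1", "C3"): {"A1": 0.5, "A2": 0.5},
    ("B2", "C1"): {"A1": 0.7, "A2": 0.3},
    ("B2", "C2"): {"A1": 0.4, "A2": 0.6},
    ("B2", "C3"): {"A1": 0.6, "A2": 0.4},
}

def columns_normalized(table, tol=1e-12):
    """True if every column sums to 1 within a floating-point tolerance."""
    return all(abs(sum(col.values()) - 1.0) < tol for col in table.values())

# Rows are sums over the conditioning variable and need not equal 1.
row_a1 = sum(col["A1"] for col in p_a_given_b.values())

print(columns_normalized(p_a_given_b), columns_normalized(p_a_given_bc), row_a1)
```

Only the columns are constrained: each one is a distribution over the possibilities of $$A$$ for a fixed condition, whereas a row collects values taken under different conditions.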

3 Marginalization

There are two ways Eq. (\ref{eq:sum1}) is typically used. First, we can use it to remove nuisance variables. Say we have a probability that depends on a number of variables, e.g. $$P(ABC)$$, but we are interested in the dependence on only one of them (so the other dependencies are a nuisance). We can simply sum over the other variables:

$$$\sum_{BC}P(ABC) = \sum_{BC}P(BC|A)P(A) = \left(\sum_{BC}P(BC|A)\right)P(A) = P(A).$$$
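The sum is over all the possibilities in the two sets, and the last step uses Eq. (\ref{eq:sum1}), so it relies on the joint possibilities of $$B$$ and $$C$$ being mutually exclusive and exhaustive. A more explicit notation would be to write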
$$$P(a) = \sum_{b\in B,c\in C}P(abc),$$$
but the meaning should be clear from context in the abbreviated form. If the possibilities are continuous then the sums become integrals
$$$P(A) = \int_{BC}P(ABC).$$$

This trick to eliminate variables from a joint probability also works with conditional probabilities, for example

$\begin{equation*}\sum_B P(AB|C) = \sum_B \frac{P(ABC)}{P(C)} = \frac{P(AC)}{P(C)} = P(A|C).\end{equation*}$
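Both uses of the trick can be sketched on a small joint table. The joint distribution below is entirely hypothetical (arbitrary integer weights, normalized), and `marginalize` is an illustrative helper, not a standard library function.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

A = ("A1", "A2")
B = ("B1", "B2")
C = ("C1", "C2", "C3")

# Hypothetical joint P(ABC): weights 1..12 assigned in product order, then normalized.
weights = {combo: i + 1 for i, combo in enumerate(product(A, B, C))}
total = sum(weights.values())
joint = {combo: Fraction(w, total) for combo, w in weights.items()}

def marginalize(table, keep):
    """Sum a joint table over every variable whose position is not in `keep`."""
    out = defaultdict(Fraction)
    for combo, p in table.items():
        out[tuple(combo[i] for i in keep)] += p
    return dict(out)

p_a = marginalize(joint, keep=(0,))      # P(A)  = sum over B and C of P(ABC)
p_ac = marginalize(joint, keep=(0, 2))   # P(AC) = sum over B of P(ABC)
p_c = marginalize(joint, keep=(2,))      # P(C)  = sum over A and B of P(ABC)

# Conditional version: sum_B P(AB|C) should equal P(AC)/P(C) = P(A|C).
lhs = sum(joint[("A1", b, "C1")] / p_c[("C1",)] for b in B)
rhs = p_ac[("A1", "C1")] / p_c[("C1",)]

print(p_a, lhs, rhs)
```

Keying the table by tuples of possibilities keeps the sketch close to the notation $$\sum_{BC}P(ABC)$$; an array-based version would sum over axes instead.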

The other main way in which Eq. (\ref{eq:sum1}) is commonly used is to expand a probability out as a sum over conditional probabilities. For instance

\begin{align*}P(A|I) &= P(A|I)\sum_B P(B|AI) = \sum_B P(AB|I) \\ &= \sum_B P(A|BI)P(B|I).\end{align*}
The value in doing this expansion is that the conditional probabilities $$P(A|BI)$$ may be much easier to reason about compared to the probability $$P(A|I)$$.

Example

Say $$A\in\{A_T,A_F\}$$ represents the result of a test for a disease, and $$B\in\{B_T,B_F\}$$ is whether you have the disease or not. Then

$\begin{equation*}P(A|I) = P(A|B_TI)P(B_T|I) + P(A|B_FI)P(B_F|I).\end{equation*}$
The incidence of the disease in the population is $$P(B_T|I)$$, and of course $$P(B_F|I)=1-P(B_T|I)$$. The probability $$P(A_T|B_TI)$$ is the sensitivity of the test (given you have the disease, what is the probability that the test will say so), and $$P(A_T|B_FI)$$ is the probability of a false positive (the test reports you have the disease when you don’t). All of these may be known for a test, but the probabilities $$P(A_T|I)$$ or $$P(A_F|I)$$ may seem quite mysterious to calculate without expanding them out.

Note that since the variables have only true/false possibilities we could have written this example treating them as propositions instead of variables ($$A=$$ test indicates the disease, $$B=$$ have the disease):

$\begin{equation*}P(A|I) = P(A|BI)P(B|I) + P(A|\bar{B}I)P(\bar{B}|I).\end{equation*}$
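To make the expansion concrete, here is a minimal sketch of the test example with made-up numbers. The values for the incidence, for $$P(A_T|B_TI)$$ (called `sensitivity` in the code), and for the false-positive probability are assumptions chosen only for illustration, as is the helper name `expand`.

```python
from fractions import Fraction

# Hypothetical inputs, not taken from any real test.
prevalence = Fraction(1, 100)       # P(B_T|I)
sensitivity = Fraction(95, 100)     # P(A_T|B_T I)
false_positive = Fraction(5, 100)   # P(A_T|B_F I)

p_b = {"B_T": prevalence, "B_F": 1 - prevalence}
p_a_given_b = {
    "B_T": {"A_T": sensitivity, "A_F": 1 - sensitivity},
    "B_F": {"A_T": false_positive, "A_F": 1 - false_positive},
}

def expand(a):
    """P(A=a|I) = sum over B of P(a|B I) P(B|I)."""
    return sum(p_a_given_b[b][a] * p_b[b] for b in p_b)

p_positive = expand("A_T")
p_negative = expand("A_F")

print(p_positive, p_negative)
```

With these assumed inputs most positive results come from the much larger group without the disease, which is the kind of structure that is hard to see from $$P(A_T|I)$$ alone.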