Written by Alexei Gilchrist
When we have several variables, the joint probability distribution over all the variables has all the information we need to calculate any simpler probability distribution. The joint probability distribution can be expressed efficiently if there are independence relationships between variables.
Level: 2, Subjects: Bayesian Networks, Probability

1 Joint Probability Distributions

If we have several variables, say \(A\), \(B\), and \(C\), then the joint probability distribution over all the variables \(P(A B C)\) has all the information we need to calculate simpler distributions like the marginals \(P(A B)\) or \(P(B)\), and any conditional probability distributions like \(P(A|B)\) or \(P(B|AC)\).

Given the joint probability distribution over a set of variables, we can calculate any other probability distribution over fewer variables by marginalisation, that is, by summing over the variables we want to eliminate. For example, \(P(A B) = \sum_{C} P(A B C)\) and \(P(B) = \sum_{A,C} P(A B C)\).
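As a concrete sketch, a joint distribution over a few discrete variables can be stored as a table and marginals computed by summing out variables. The numbers below are made up for illustration, and `marginal` is a hypothetical helper, not part of any library:

```python
# A hypothetical joint distribution P(A, B, C) over three binary variables,
# stored as a dict keyed by (a, b, c). The values must sum to 1.
joint = {
    (0, 0, 0): 0.08, (0, 0, 1): 0.12,
    (0, 1, 0): 0.10, (0, 1, 1): 0.20,
    (1, 0, 0): 0.12, (1, 0, 1): 0.08,
    (1, 1, 0): 0.15, (1, 1, 1): 0.15,
}

def marginal(joint, keep):
    """Sum out every variable except those at the index positions in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_ab = marginal(joint, keep=(0, 1))   # P(A B): C summed out
p_b  = marginal(joint, keep=(1,))     # P(B): A and C summed out

# Conditionals then follow from the product rule: P(A|B) = P(A B) / P(B)
p_a_given_b = {(a, b): p_ab[(a, b)] / p_b[(b,)] for (a, b) in p_ab}
```

Note that every distribution here is derived from the one joint table, which is the point of the section: the joint carries all the information needed.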

To be completed

2 Independence

2.1 Propositions

For two propositions \(A\) and \(B\) and some background information \(I\) (also a proposition), we say \(A\) is independent of \(B\) given \(I\), written in shorthand as \(A \perp B | I\), if any of the following hold:

  1. \(P(A B|I) = P(A|I)P(B|I)\)
  2. \(P(A|BI) = P(A|I)\)
  3. \(P(B|AI) = P(B|I)\)

Note that from a Bayesian perspective there is always background information to a problem so all probabilities are conditional probabilities.

Now we'll introduce a notational simplification that may cause confusion. Since some background information \(I\) is always present, and it's often the same throughout a given problem, it gets tedious to write it every time, so we'll omit it. The previous independence conditions would then be written as \(A \perp B\) if \(P(A B) = P(A)P(B)\), \(P(A|B) = P(A)\), or \(P(B|A) = P(B)\). You may find that some authors call this `unconditional' independence, as opposed to the `conditional' independence above, but for us this distinction is moot: we are just being lazy in our notation.
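A quick numerical check illustrates that the three conditions agree. This example (one card drawn from a standard deck, with \(A\) = "the card is an ace" and \(B\) = "the card is a heart") is mine, not from the text:

```python
# One card drawn from a 52-card deck.
# A = "the card is an ace", B = "the card is a heart".
# P(A) = 4/52, P(B) = 13/52, and P(A B) = 1/52 (the ace of hearts).
p_a, p_b, p_ab = 4 / 52, 13 / 52, 1 / 52

# The three conditions from the text, checked numerically
# (they are equivalent whenever the conditioning probability is nonzero):
cond1 = abs(p_ab - p_a * p_b) < 1e-12   # P(A B) = P(A) P(B)
cond2 = abs(p_ab / p_b - p_a) < 1e-12   # P(A|B) = P(A)
cond3 = abs(p_ab / p_a - p_b) < 1e-12   # P(B|A) = P(B)
```

All three hold here, so \(A \perp B\): learning the suit tells you nothing about the rank.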

2.2 Variables

Extending the notion of independence to variables: if we have the variables \(A\), \(B\), and \(C\) (that is, sets of propositions, so that \(A\in \{A_{1}\ldots A_{a}\}\), \(B\in \{B_{1}\ldots B_{b}\}\), and \(C\in \{C_{1}\ldots C_{c}\}\)), we say that variable \(A\) is independent of variable \(B\), written as \(A \perp B\), if any of the following hold for all \(A_{i}\) and \(B_{j}\):

  1. \(P(A B) = P(A)P(B)\)
  2. \(P(A|B) = P(A)\)
  3. \(P(B|A) = P(B)\).
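The "for all \(A_{i}\) and \(B_{j}\)" is the part that distinguishes variable independence from proposition independence: the factorisation must hold for every pair of values. A sketch with two fair dice (my example, not the text's):

```python
from itertools import product

# Two independent fair dice: each of the 36 outcomes has probability 1/36.
p_joint = {(a, b): 1 / 36 for a, b in product(range(1, 7), repeat=2)}

# Marginals P(A) and P(B) for each face value.
p_a = {a: sum(p for (aa, b), p in p_joint.items() if aa == a) for a in range(1, 7)}
p_b = {b: sum(p for (a, bb), p in p_joint.items() if bb == b) for b in range(1, 7)}

# Variable independence: P(A B) = P(A) P(B) must hold for EVERY value pair.
independent = all(
    abs(p_joint[(a, b)] - p_a[a] * p_b[b]) < 1e-12
    for a, b in product(range(1, 7), repeat=2)
)
```

If even one value pair fails the check, the variables are not independent, however many pairs do satisfy it.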

Also, we define the conditional independence of \(A\) and \(B\) given variable \(C\), written as \(A\perp B|C\), if any of the following hold for all \(A_{i}\), \(B_{j}\) and \(C_{k}\):

  1. \(P(A B|C) = P(A|C)P(B|C)\)
  2. \(P(A|BC) = P(A|C)\)
  3. \(P(B|AC) = P(B|C)\).

In this situation, variable \(C\) is said to be `observed'.
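Conditional independence given \(C\) does not imply plain independence of \(A\) and \(B\). A sketch of this, using a joint distribution constructed so that \(A\perp B|C\) holds by design (all the numbers and helper names below are hypothetical):

```python
from itertools import product

# C is a fair coin; given C, both A and B are biased coins whose bias
# depends only on C. By construction P(A B|C) = P(A|C) P(B|C).
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: 0.9, 1: 0.2}   # P(A=1 | C=c)
p_b_given_c = {0: 0.8, 1: 0.3}   # P(B=1 | C=c)

def bern(p, x):
    """Probability of outcome x for a coin with P(heads) = p."""
    return p if x == 1 else 1.0 - p

joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product((0, 1), repeat=3)
}

def check_ci(joint):
    """Check P(A B|C) = P(A|C) P(B|C) for all values, i.e. A ⊥ B | C."""
    for c in (0, 1):
        pc = sum(p for (a, b, cc), p in joint.items() if cc == c)
        for a, b in product((0, 1), repeat=2):
            p_ab_c = joint[(a, b, c)] / pc
            p_a_c = sum(joint[(a, bb, c)] for bb in (0, 1)) / pc
            p_b_c = sum(joint[(aa, b, c)] for aa in (0, 1)) / pc
            if abs(p_ab_c - p_a_c * p_b_c) > 1e-12:
                return False
    return True

# Once C is summed out, A and B are dependent: observing A shifts our
# belief about C, which in turn shifts our belief about B.
p_a1  = sum(p for (a, b, c), p in joint.items() if a == 1)
p_b1  = sum(p for (a, b, c), p in joint.items() if b == 1)
p_ab1 = sum(p for (a, b, c), p in joint.items() if a == 1 and b == 1)
marginally_dependent = abs(p_ab1 - p_a1 * p_b1) > 1e-6
```

So observing \(C\) renders \(A\) and \(B\) independent here, while marginally they remain correlated through their common dependence on \(C\).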

2.3 Mixed

3 Factorisation of joint probability distribution