Probability permeates much of physics. It appears in quantifying errors in every measurement, in the dynamics of stochastic processes, in statistical mechanics as a way of coping with the vast number of variables, and even intrinsically in quantum mechanics. At first glance the use of probability may seem natural and even `obvious', but things become much more lively when you realise that it is still not settled what a probability actually is. Different interpretations of probability affect the meaning of all the areas it touches.
This material supports a second and third year advanced physics unit at Macquarie University.
In these notes I will take the view that probabilities are a measure of plausibility and probability theory is the extension of deductive logic to incomplete information. This view follows Laplace, Jeffreys, Cox, and Jaynes.
Variables are introduced and then some consequences of the sum rule are explored.
Some consequences of the product rule are explored, including the famous Bayes' rule.
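As a minimal numerical sketch of the product rule rearranged into Bayes' rule (the diagnostic-test numbers here are invented for illustration):

```python
# Bayes' rule, P(H|D) = P(D|H) P(H) / P(D), for a hypothetical
# diagnostic test (all numbers invented).
p_h = 0.01              # prior: P(disease)
p_d_given_h = 0.95      # likelihood: P(positive | disease)
p_d_given_not_h = 0.05  # false-positive rate: P(positive | no disease)

# Marginal probability of the data, via the sum rule
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior plausibility of disease given a positive result
p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 3))  # 0.161
```

Even with a fairly reliable test, the low prior keeps the posterior small, which is exactly the kind of result the rules of plausible reasoning make quantitative.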
How do you assign actual values to probabilities?
Given that we are interpreting probability as a measure of plausibility, just what is the relationship of probabilities to frequencies?
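A quick simulation can make the distinction concrete: the probability is a fixed plausibility assignment, while the frequency is an empirical ratio that fluctuates around it (the simulation setup here is an illustration, not part of the notes):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
p = 0.5         # assigned probability of heads
n = 10_000      # number of simulated tosses

heads = sum(random.random() < p for _ in range(n))
freq = heads / n

# The observed frequency is close to, but not identical with,
# the assigned probability.
print(abs(freq - p) < 0.05)  # True
```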
Model comparison is one of the principal tasks of inference. Given some data, how does the plausibility of different models change? Does the data single out a particular model as being better?
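A sketch of how this works for coin-flip data (the models and numbers are invented for illustration): model M1 says the coin is fair, model M2 puts a uniform prior on the bias, and the data's plausibility under each model is compared.

```python
from math import comb

n, k = 12, 9  # hypothetical data: 9 heads in 12 flips

# Evidence for M1 (fair coin): binomial likelihood at p = 1/2
ev_fair = comb(n, k) * 0.5**n

# Evidence for M2 (uniform prior on the bias): the binomial
# integrated over a flat prior has the closed form 1/(n+1).
ev_uniform = 1 / (n + 1)

# Ratio of evidences (the Bayes factor) favours M2 for this data
bayes_factor = ev_uniform / ev_fair
print(bayes_factor > 1)  # True
```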
Making statements about whole families of logical propositions
Another of the key tasks of inference is to determine the value of a parameter in a model on the basis of observed data.
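A grid-based sketch of parameter estimation (data invented): the posterior for a coin's bias theta given k heads in n flips, under a flat prior, evaluated on a grid and normalised.

```python
n, k = 10, 7  # hypothetical data: 7 heads in 10 flips

thetas = [i / 100 for i in range(101)]
# Unnormalised posterior: likelihood * flat prior
posterior = [t**k * (1 - t)**(n - k) for t in thetas]
norm = sum(posterior)
posterior = [p / norm for p in posterior]

# The posterior peaks at the observed fraction k/n
theta_map = thetas[posterior.index(max(posterior))]
print(theta_map)  # 0.7
```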
A closer look at prior distributions
An argument for maximising the entropy
Find the probability distribution that maximises the entropy subject to constraints fixing certain averages.
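A numerical sketch of this recipe (the constraint value is invented): over the outcomes 1..6 with the average constrained to 4.5, the maximum-entropy distribution takes the exponential (Gibbs) form p_i ∝ exp(-λ i), and the multiplier λ can be found by bisection.

```python
from math import exp

outcomes = range(1, 7)
target_mean = 4.5  # hypothetical constraint on the average

def mean_for(lam):
    """Average outcome under p_i ∝ exp(-lam * i)."""
    weights = [exp(-lam * i) for i in outcomes]
    z = sum(weights)
    return sum(i * w for i, w in zip(outcomes, weights)) / z

# mean_for is decreasing in lam, so bisect on a bracketing interval
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) > target_mean:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
```

The same pattern, one multiplier per constrained average, carries over to any number of constraints.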
When we have several variables, the joint probability distribution over all the variables has all the information we need to calculate any simpler probability distribution. The joint probability distribution can be expressed efficiently if there are independence relationships between variables.
A factorisation of the joint probability distribution can be represented in a graph known as a Bayesian Network. These networks codify independencies between variables and information flow as variables are measured.
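A minimal sketch of such a factorised joint distribution, for a three-variable chain P(a, b, c) = P(a) P(b|a) P(c|b) with invented conditional probability tables; any simpler distribution then follows from the sum rule.

```python
# Conditional probability tables (all numbers invented)
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2},
               False: {True: 0.1, False: 0.9}}   # indexed [a][b]
p_c_given_b = {True: {True: 0.9, False: 0.1},
               False: {True: 0.4, False: 0.6}}   # indexed [b][c]

def joint(a, b, c):
    """The factorised joint distribution P(a) P(b|a) P(c|b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Marginal P(c = True) by summing the joint over a and b
vals = (True, False)
p_c_true = sum(joint(a, b, True) for a in vals for b in vals)
print(round(p_c_true, 3))  # 0.555
```

The factorisation needs only 1 + 2 + 2 = 5 numbers instead of the 7 needed for a general joint distribution over three binary variables, which is the efficiency the graph structure buys.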
A definition of information is introduced which leads to yet another connection with entropy.
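The connection can be seen in a few lines: information as the surprisal -log2 p of an outcome, and entropy as its average (the biased-coin numbers are invented):

```python
from math import log2

p = [0.9, 0.1]  # a hypothetical biased coin

# Surprisal of each outcome, in bits
surprisals = [-log2(q) for q in p]

# Entropy = average surprisal
entropy = sum(q * s for q, s in zip(p, surprisals))

# A biased coin carries less than 1 bit per toss
print(entropy < 1)  # True
```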
We examine the problem of optimally encoding a set of symbols in some alphabet to reduce the average length of the code.
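A sketch of why variable-length codes help (the symbols, probabilities, and code are invented): for a skewed distribution, a hand-built binary prefix code beats the 2-bit fixed-length code and, for these dyadic probabilities, meets the entropy bound exactly.

```python
from math import log2

# Hypothetical symbol probabilities and a prefix code for them
probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
prefix_code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

# Average code length, in bits per symbol
avg_len = sum(p * len(prefix_code[s]) for s, p in probs.items())

# Entropy of the source, in bits per symbol
entropy = -sum(p * log2(p) for p in probs.values())

print(avg_len, entropy)  # both are 1.75 bits, versus 2 bits fixed-length
```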
A quick introduction to convex sets, convex functions, and Jensen's inequality
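Jensen's inequality can be checked numerically in a couple of lines; for a concave function such as log, the average of the function never exceeds the function of the average (the distribution here is arbitrary):

```python
from math import log

xs = [1.0, 2.0, 8.0]   # outcomes (invented)
ps = [0.5, 0.3, 0.2]   # their probabilities

e_log = sum(p * log(x) for p, x in zip(ps, xs))  # E[log X]
log_e = log(sum(p * x for p, x in zip(ps, xs)))  # log E[X]

# Jensen for concave log: E[log X] <= log E[X]
print(e_log <= log_e)  # True
```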
The joint entropy and its properties
The relative entropy, or Kullback-Leibler divergence, is a measure of the difference between two distributions.
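As a small sketch (distributions invented), the divergence D(p||q) = Σ p log2(p/q) is non-negative, vanishes only when the distributions coincide, and is not symmetric in its arguments:

```python
from math import log2

def kl(p, q):
    """Relative entropy D(p||q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl(p, q) >= 0)        # True: the divergence is never negative
print(kl(p, q) != kl(q, p)) # True: it is not symmetric
print(kl(p, p))             # 0.0: zero only for identical distributions
```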
Communication through memoryless, static channels is examined, in particular the use of repetition codes to counteract the errors introduced by the channel.
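A sketch of the simplest case (the flip probability is invented): on a memoryless binary channel that flips each bit independently with probability f, sending each bit three times and taking a majority vote lowers the per-bit error rate from f to 3f²(1-f) + f³.

```python
f = 0.1  # hypothetical per-bit flip probability of the channel

# Majority vote over 3 copies fails when 2 or 3 copies are flipped
p_error = 3 * f**2 * (1 - f) + f**3

print(p_error < f)  # True: 0.028 versus 0.1
```

The price is a threefold drop in rate, which is what motivates looking for better codes.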