30 March 2017

Basics of probability

Some definitions

An event is a set of outcomes of a random process

A probability is a number between 0 and 1 assigned to an event.

We interpret a probability as

  • the chance of the event occurring
  • the degree of plausibility we place on the event occurring or having occurred.

Formally, let \(\Omega\) be the set of all possible outcomes. \(\Omega\) is called the state space.

An event is \(A \subseteq \Omega\)

Let \(\mathcal A\) be the power set (set of all subsets) of \(\Omega\)

If \(\Omega\) is countable, probability can be viewed as a function \(P:\mathcal A \rightarrow [0,1]\)

This interpretation doesn't work directly if \(\Omega\) is uncountable, but it still helps our intuition.

Example: tossing a coin twice

State space: \(\Omega = \{HH, HT, TH,TT\}\).

An example of an event: we throw a tails first so \(A = \{TH, TT\}\).

There are \(2^4 = 16\) possible events: \[\mathcal A = \{ \emptyset, \{ HH\}, \{HT \}, \{ TH\}, \{ TT \}, \{ HH, HT \}, \ldots \] \[ \ldots \{ HH, TH\}, \{ HH, TT\}, \{ HT,TH\}, \{ HT,TT\}, \{ TH,TT\}, \dots \] \[ \dots \{HH, HT, TH \}, \{HH, HT,TT \}, \{HH, TH,TT \}, \{HT, TH,TT \}, \Omega \}\]
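The count of \(2^4 = 16\) events can be checked by enumerating the power set directly; a small Python sketch:

```python
from itertools import combinations

# The four outcomes of tossing a coin twice
outcomes = ["HH", "HT", "TH", "TT"]

# The set of events is the power set: every subset of the outcomes
events = [set(s) for r in range(len(outcomes) + 1)
          for s in combinations(outcomes, r)]

print(len(events))  # 16, i.e. 2**4
```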

Operations on events

Let \(A\) and \(B\) be events.

The event \(C\) that \(A\) and \(B\) occur is given by \(C = A \cap B\).

The event \(D\) that \(A\) or \(B\) occurs is given by \(D = A \cup B\).

The event that \(A\) does not occur is given by \(A^c = \bar A = \Omega - A = \Omega \backslash A = \{ \omega \in \Omega : \omega \notin A \}\).

We usually write \(P(A, B)\) for \(P(A \cap B)\).

Example: Rolling a die

\(\Omega = \{1,2,3,4,5,6\}\)

Let \(A\) be the event that the roll is even: \(A = \{2,4,6 \}\)

Let \(B\) be the event that we roll a 3 or a 6: \(B = \{ 3, 6 \}\).

The event that \(A\) and \(B\) occur is \(A \cap B = \{ 6 \}\)

The event that \(A\) or \(B\) happens is \(A \cup B = \{ 2,3,4,6 \}\).

The event that \(A\) does not occur is \(A^c = \{ 1,3,5 \}\)
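These event operations are just set operations, so they can be checked directly; a small Python sketch mirroring the die example above:

```python
omega = {1, 2, 3, 4, 5, 6}   # state space for one die roll
A = {2, 4, 6}                # the roll is even
B = {3, 6}                   # the roll is a 3 or a 6

print(sorted(A & B))      # [6]           -- A and B occur
print(sorted(A | B))      # [2, 3, 4, 6]  -- A or B occurs
print(sorted(omega - A))  # [1, 3, 5]     -- A does not occur
```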

Axioms of probability

Any probability function satisfies:

  1. \(P(\Omega) = 1\). That is, the total probability is 1.
  2. \(0 \leq P(A) \leq 1\) for any \(A \subseteq \Omega\). The probability of any event is non-negative and less than or equal to 1.
  3. If \(A_1, A_2, \ldots\) are mutually disjoint events (i.e., \(A_i \cap A_j = \emptyset\) if \(i \neq j\)) then \[ P\left( \bigcup_i A_i \right) = \sum_i P(A_i). \]

Conditional probability

If \(A\) and \(B\) are events and we know \(B\) occurred, what can we say about the probability of \(A\)?

Called the conditional probability of \(A\) given \(B\) and written \(P(A|B)\). We define \[ P(A|B) = \frac {P(A \cap B)}{P(B)}. \]

\(P(A|B)\) is only defined for \(P(B)>0\).

Rearranging the definition, we get the factorisation \(P(A,B) = P(A|B)P(B) = P(B|A)P(A)\) and \[P(A_1,\ldots, A_k) = P(A_1|A_2,\ldots, A_k)P(A_2|A_3,\ldots, A_k)\ldots P(A_{k-1}|A_k)P(A_k) . \]

Example: Roll a fair die and record the value

Let \(A = \{2\}\) be the event that 2 is rolled

\(B = \{2,4,6\}\) be the event that the roll is an even number.

Probability that the roll is a two given that we know it is even is just \(P(A|B)\).

Have \(P(B) = 1/2\) and \(P(A \cap B) = P(A) = 1/6\) since \(A \cap B = A\).

Thus \[ P(A|B) = \frac {P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac 1 3. \]
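The same calculation can be done by counting outcomes, since every outcome of a fair die is equally likely; a Python sketch using exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2}          # the roll is a 2
B = {2, 4, 6}    # the roll is even

# For a fair die every outcome is equally likely, so P(E) = |E| / |omega|
def prob(event):
    return Fraction(len(event), len(omega))

p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)  # 1/3
```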

Independence

Events \(A\) and \(B\) are independent when \(P(A \cap B) = P(A,B) = P(A)P(B)\).

\(A\) and \(B\) are independent if and only if \(P(A|B) = P(A)\) and \(P(B|A) = P(B)\) (assuming \(P(A), P(B) > 0\) so the conditional probabilities are defined).

Bayes' Theorem

Bayes' Theorem states \[ P(B|A) = \frac{P(A|B)P(B)}{P(A)}. \]

It is simple to derive from the definition of conditional probability.

It tells us how the forward probability \(P(A|B)\) is related to the backward probability \(P(B|A)\).

This relationship is crucial to statistical inference.
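For the die events used earlier, Bayes' Theorem can be verified numerically; a Python sketch with exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}    # the roll is even
B = {3, 6}       # the roll is a 3 or a 6

def prob(event):                    # P(E) = |E| / |omega| for a fair die
    return Fraction(len(event), len(omega))

def cond(X, Y):                     # conditional probability P(X | Y)
    return prob(X & Y) / prob(Y)

# Bayes' Theorem: P(B|A) = P(A|B) P(B) / P(A)
lhs = cond(B, A)
rhs = cond(A, B) * prob(B) / prob(A)
print(lhs, rhs)  # 1/3 1/3
```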

Random variables

Definition

A random variable \(X\) is a variable whose value results from a measurement of some random event

Denote random variables by capital letters: \(X, Y, Z\) etc.

Lower case letters \(x, y, z\) etc. denote particular observations or realisations of the random variable.

\(X = x\) is the event that the random variable \(X\) takes the particular value \(x\). We will look at \(P(X = x)\), the probability of this event.

Often abbreviate random variable as r.v. or rv

Discrete vs continuous random variables

A discrete variable takes a finite or countably infinite number of values.

Examples:

  • a count: (0,1,2,3,…)
  • a fixed number of outcomes (heads or tails, the DNA bases {A,C,G,T }).
  • Number of people in a room at 5 times during a day: 6, 15, 12, 23, 18


A continuous variable takes uncountably many different values.

Examples:

  • any number in interval \([0,1]\)
  • any real number \(>0\) or any number in \(\mathbb R\).
  • 10 men measured at random (height in cm): 173.466, 174.931, 179.707, 183.739, 181.596, 172.897, 172.629, 183.328, 167.799, 178.733

The value of a random variable can be any kind of object, not just a number.

2 random trees:

The shape (or topology) of the tree is discrete

The length of the branches in the tree are continuous

2 random networks:

A set of random sequences:

## CGGTCACATGATACCCGATCAGTTACTGCGAGCTAGCGCCGCAACTGTACAACAATCCTGTCCAA
## GCCGGTCTTGACCTCGTAGTGTGACTGTATGCGTTCCAAGTTGTTAGGTACTCCGAGTGTCAAAT
## TACCGCCATATATTCGATAGTAGATCATGAGACATCCAAATACAGCCCCGGGCAGCCGGTAGTGG
## AGAATTACACGGTATTCGACTTCTATGACCGTTCGATAAGTGTCTCCGTTTAGACCTAATGGCAC
## TATGATAAGGGTATGTAGGTAAGCACTTCGGTTAGGTATTTAACGGAGAGTGTAGGACGCCTCGC
## GTATAGAGCCTCCACGGTTGGGCCACACCATTTTTACTCTGTTTCGCTACAGCGGATATTTGATT
## AGTACGAAAAAAAAGAACCGCCACGTTTCGGGTAGGCACGAAAACCTTGAATAACCGACAGAGGC

Probability distributions and probability density functions

For discrete random variables

A probability distribution function or probability mass function \(f(x)\) tells us the probability of a random variable taking a particular value \(x\).

\(P(X=x) = f(x)\) is the probability that \(X = x\).

Often write \(p_x\) for \(f(x)\).

Can display as a table e.g.,

\(x\) \(P(X = x)\)
1 0.2
2 0.3
3 0.1
4 0.1
5 0.15
6 0.15

Or in functional form, e.g. \(f(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)
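A pmf like the table above can be stored as a simple lookup table; a Python sketch using exact fractions for the tabled probabilities:

```python
from fractions import Fraction

# The pmf from the table above, stored as a lookup table x -> P(X = x)
pmf = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(1, 10),
       4: Fraction(1, 10), 5: Fraction(15, 100), 6: Fraction(15, 100)}

# Every value lies between 0 and 1, and the probabilities sum to 1
assert all(0 <= p <= 1 for p in pmf.values())
print(sum(pmf.values()))  # 1
```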

Can plot pdfs

For a discrete random variable, the value of the pdf is always between 0 and 1.

Continuous random variables: probability density functions

For a continuous random variable \(X\), for any exact value \(x\), \(P(X = x) = 0\).

We look instead at the probability of \(X\) falling in an interval, \(P(a \leq X \leq b)\).

Define the probability density function, \(p_X(x)\) (also written \(f_X(x)\) or \(f(x)\)).

Get \[P(a \leq X \leq b) = \int_a^b p_X(x) dx.\]

Properties of the probability density function

  • \(p_X(x) \geq 0\) for all \(x\).
  • \(p_X(x)\) is normalised: \(\int_{-\infty}^{\infty} p_X(x) dx = 1.\) That is, the total probability is 1.
  • \(p_X(x)\) may be greater than 1 for a given value of \(x\).

Example: Density for Exponential distribution

curve(dexp(x,2),0,5,main = 'Exponentially distributed r.v.',ylab = "f(x)")

Example: Density for Beta distribution

curve(dbeta(x,2,5),main = "Beta distributed r.v.",ylab = "f(x)")

\(P(a \leq X \leq b) = \int_a^b p_X(x) dx\) is the area under \(p_X(x)\) between \(x = a\) and \(x = b\).
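This area can be approximated numerically and compared with the exact value; a Python sketch for the Exponential density with rate \(\lambda = 2\), where integrating the density gives the closed-form area \(e^{-\lambda a} - e^{-\lambda b}\) (the interval \([0.5, 1.5]\) is an arbitrary choice):

```python
import math

lam = 2.0                      # rate parameter of the Exponential density

def pdf(x):
    return lam * math.exp(-lam * x)

# Approximate P(a <= X <= b) with a midpoint-rule integral ...
a, b, n = 0.5, 1.5, 100_000
h = (b - a) / n
numeric = h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

# ... and compare with the exact area, exp(-lam*a) - exp(-lam*b)
exact = math.exp(-lam * a) - math.exp(-lam * b)

print(abs(numeric - exact) < 1e-8)  # True
```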

Cumulative distribution function

The cumulative distribution function (cdf or just "the distribution") is defined for discrete and continuous random variables by \[F(x) = P(X \leq x)\]

\(F(x)\) is monotonically increasing from 0 to 1.

For continuous random variables, the cdf is continuous,

For discrete random variables, the cdf is a step function with discontinuities.

For example, for a normally distributed random variable, the density function is on the left, the distribution is on the right

The value of the cdf at \(x\) is the integral of the pdf from \(-\infty\) to \(x\): \[F(x) = \int_{-\infty}^{x} p_X(t)\, dt.\]

This is just the area under the left-tail up to \(x\).

Multiple random variables and joint probability density functions

Extend directly to multiple random variables:

The joint probability density function of \(n\) random variables \(X_1,\ldots,X_n\) takes \(n\) arguments, \(p_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\)

\(p_{X_1,\ldots,X_n}\) is real-valued, non-negative and normalised.

The probability that the point \((X_1,\ldots,X_n)\) lies in some region is just the multiple integral over that region.

A joint probability density function can be visualised as a contour plot…

…or a 3d plot:

Marginalisation

Given a joint pdf, get the pdf for a subset of the variables by integrating over the ones not in the subset.

For example, given the joint density for \(X\) and \(Y\), what is the density for \(X\) alone?

It is \[ p_X(x) = \int_{-\infty}^{\infty} p_{XY}(x,y) dy.\]

This process is called marginalisation.

For a discrete random variable, we replace the integral with a sum:

\[ P(x) = \sum_{y \in \mathcal Y} P(x,y) \]
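A Python sketch of the discrete case (the joint pmf here is made up purely for illustration):

```python
from fractions import Fraction

# A made-up joint pmf P(x, y) for two discrete random variables
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

# Marginalise out y: P(x) = sum over y of P(x, y)
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p

print(marginal_x)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```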

Expectation and variance

Expectation (or mean)

The expectation gives us an idea of the central value of the variable.

The expected value of a random variable \(X\) is \(E[X]\)

It is commonly called the average or mean

\[ E[X] = \int_{-\infty}^{\infty} x p_X(x) dx. \]

For discrete random variables, the integral is replaced by a sum: \[E[X] = \sum_{x \in {\mathcal X}} x p_x.\]

The symbol \(\mu\) is often used for the mean.

Example: mean of an Exponentially distributed random variable

An exponentially distributed variable \(X\) with parameter \(\lambda\) has density function \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) (and \(f(x) = 0\) for \(x < 0\)).

So \(E[X] = \int_{-\infty}^{\infty} x f(x) dx = \int_{-\infty}^{\infty} x \lambda e^{-\lambda x} dx =\) (some work) \(= \lambda^{-1}\).

E.g., when \(\lambda = 2\), the mean is \(1/\lambda = 0.5\).
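This mean can be checked numerically by integrating \(x f(x)\) with a simple midpoint rule; a Python sketch (the cutoff and step size are arbitrary choices):

```python
import math

lam = 2.0                      # rate parameter

def pdf(x):
    return lam * math.exp(-lam * x)

# E[X] = integral of x f(x); the density is zero below 0, so integrate
# from 0 up to a large cutoff with the midpoint rule
a, b, n = 0.0, 50.0, 1_000_000
h = (b - a) / n
mean = h * sum((a + (i + 0.5) * h) * pdf(a + (i + 0.5) * h) for i in range(n))

print(round(mean, 6))  # 0.5, matching 1/lambda
```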

Example: PDF of a Beta distributed random variable showing the mean

Example: When \(X\) is discrete and uniform, with \(p_X(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)

\(E[X] = \sum_{x\in \mathcal X} x p_X(x) = \sum_{x = 1}^6 x 1/6 = 21/6 = 3.5\)

Variance

Variance measures the spread of a random variable about its mean.

Defined by \[ \mathrm{Var}(X) = E[ (X - E[X])^2] = E[X^2] - (E[X])^2\]

Variance is the expected value of the square of the distance of the random variable from its mean.

\(\mathrm{Var}(X) \geq 0\) always.

Example: When \(X\) is discrete and uniform with \(p_X(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)

Saw above that \(E[X] = 3.5\).

\(E[X^2] = \sum_{x = 1}^6 x^2 1/6 = 91/6 = 15.1667\)

So

\(Var(X) = E[X^2] - (E[X])^2 = 15.1667 - 3.5^2 = 2.9167\).
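The same calculation can be done in a few lines of Python with exact fractions:

```python
from fractions import Fraction

# X is uniform on {1, ..., 6}, so p_X(x) = 1/6 for each x
xs = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in xs)                  # E[X]
second_moment = sum(x**2 * p for x in xs)      # E[X^2]
variance = second_moment - mean**2             # E[X^2] - (E[X])^2

print(mean)           # 7/2
print(second_moment)  # 91/6
print(variance)       # 35/12
```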

Functions of a random variable

A function of a random variable is simply a function that takes a random variable as its input.

Let \(g(X)\) be a function of the random variable \(X\).

Then \(g(X)\) is itself a random variable and can be treated as such.

For example, the expectation of \(g(X)\) is \(E[g(X)] = \int_{-\infty}^{\infty} g(x) p_X(x) dx\).
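For the uniform die above and the choice \(g(x) = x^2\), this expectation can be computed directly; a Python sketch:

```python
from fractions import Fraction

# X uniform on {1, ..., 6}; take g(x) = x**2 as the function
p = Fraction(1, 6)

def g(x):
    return x**2

# E[g(X)] = sum over x of g(x) P(X = x)
expectation = sum(g(x) * p for x in range(1, 7))
print(expectation)  # 91/6
```

This recovers the second moment \(E[X^2]\) used in the variance calculation above.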