30 March 2017
An event is a set of outcomes of a random process
A probability is a number between 0 and 1 assigned to an event.
We interpret a probability as a measure of how likely an event is to occur.
Formally, let \(\Omega\) be the set of all possible outcomes. \(\Omega\) is called the state space.
An event is a subset \(A \subseteq \Omega\).
Let \(\mathcal A\) be the power set (set of all subsets) of \(\Omega\).
If \(\Omega\) is countable, probability can be viewed as a function \(P:\mathcal A \rightarrow [0,1]\)
This viewpoint doesn't work if \(\Omega\) is uncountable, but it still helps our intuition.
State space: \(\Omega = \{HH, HT, TH,TT\}\).
An example of an event: we throw tails first, so \(A = \{TH, TT\}\).
There are \(2^4 = 16\) possible events: \[\mathcal A = \{ \emptyset, \{ HH\}, \{HT \}, \{ TH\}, \{ TT \}, \{ HH, HT \}, \ldots \] \[ \ldots \{ HH, TH\}, \{ HH, TT\}, \{ HT,TH\}, \{ HT,TT\}, \{ TH,TT\}, \ldots \] \[ \ldots \{HH, HT, TH \}, \{HH, HT,TT \}, \{HH, TH,TT \}, \{HT, TH,TT \}, \Omega \}\]
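The event space can be enumerated directly; a minimal Python sketch (illustrative, not part of the original notes):

```python
from itertools import chain, combinations

# State space for two coin flips.
omega = ["HH", "HT", "TH", "TT"]

# Enumerate every subset of omega: the event space, with 2^4 = 16 events.
events = list(chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1)))

print(len(events))  # 16
```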
Let \(A\) and \(B\) be events.
The event \(C\) that \(A\) and \(B\) occur is given by \(C = A \cap B\).
The event \(D\) that \(A\) or \(B\) occurs is given by \(D = A \cup B\).
The event that \(A\) does not occur is given by \(A^c = \bar A = \Omega - A = \Omega \backslash A = \{ \omega \in \Omega : \omega \notin A \}\).
We usually write \(P(A, B)\) for \(P(A \cap B)\).
\(\Omega = \{1,2,3,4,5,6\}\)
Let \(A\) be the event that the roll is even: \(A = \{2,4,6 \}\)
Let \(B\) be the event that we roll a 3 or a 6: \(B = \{ 3, 6 \}\).
The event that \(A\) and \(B\) occur is \(A \cap B = \{ 6 \}\)
The event that \(A\) or \(B\) happens is \(A \cup B = \{ 2,3,4,6 \}\).
The event that \(A\) does not occur is \(A^c = \{ 1,3,5 \}\)
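These set operations can be checked directly in Python (an illustrative sketch, not part of the original notes):

```python
# Die-roll events as Python sets, matching the example above.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll is even
B = {3, 6}      # roll is 3 or 6

print(A & B)      # {6}: A and B occur
print(A | B)      # {2, 3, 4, 6}: A or B occurs
print(omega - A)  # {1, 3, 5}: A does not occur
```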
Any probability function satisfies: \(P(\Omega) = 1\); \(P(A) \geq 0\) for every event \(A\); and \(P(A \cup B) = P(A) + P(B)\) whenever \(A\) and \(B\) are disjoint.
If \(A\) and \(B\) are events and we know \(B\) occurred, what can we say about the probability of \(A\)?
Called the conditional probability of \(A\) given \(B\) and written \(P(A|B)\). We define \[ P(A|B) = \frac {P(A \cap B)}{P(B)}. \]
\(P(A|B)\) is only defined for \(P(B)>0\).
Rearranging the definition, we get the factorisation \(P(A,B) = P(A|B)P(B) = P(B|A)P(A)\) and, more generally, \[P(A_1,\ldots, A_k) = P(A_1|A_2,\ldots, A_k)P(A_2|A_3,\ldots, A_k)\cdots P(A_{k-1}|A_k)P(A_k) . \]
Let \(A = \{2\}\) be the event that 2 is rolled
\(B = \{2,4,6\}\) be the event that the roll is an even number.
The probability that the roll is a two, given that we know it is even, is just \(P(A|B)\).
Have \(P(B) = 1/2\) and \(P(A \cap B) = P(A) = 1/6\) since \(A \cap B = A\).
Thus \[ P(A|B) = \frac {P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac 1 3. \]
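The same calculation can be done in Python with exact fractions (an illustrative sketch, not part of the original notes):

```python
from fractions import Fraction

# Uniform die: each outcome has probability 1/6.
def prob(event):
    return Fraction(len(event), 6)

A = {2}        # roll a 2
B = {2, 4, 6}  # roll is even

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)  # 1/3
```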
Events \(A\) and \(B\) are independent when \(P(A \cap B) = P(A,B) = P(A)P(B)\).
When the conditional probabilities are defined, \(A\) and \(B\) are independent if and only if \(P(A|B) = P(A)\), or equivalently \(P(B|A) = P(B)\).
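A quick Python check of the definition on a pair of events for two fair dice (the specific events here are invented for illustration):

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes (first, second).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}  # first die is even
B = {w for w in omega if w[1] >= 5}      # second die is 5 or 6

# Independence: P(A and B) = P(A) P(B).
print(prob(A & B) == prob(A) * prob(B))  # True
```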
Bayes' Theorem states \[ P(B|A) = \frac{P(A|B)P(B)}{P(A)}. \]
It is simple to derive from the definition of conditional probability.
It tells us how the forward probability \(P(A|B)\) is related to the backward probability \(P(B|A)\).
This relationship is crucial to statistical inference.
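A sketch of Bayes' theorem in Python with invented numbers (a hypothetical screening-test setup, purely illustrative): let \(B\) be the event of having a condition and \(A\) the event of a positive test.

```python
# Hypothetical numbers for illustration only.
p_B = 0.01           # P(B), prior probability of the condition
p_A_given_B = 0.95   # P(A|B), forward probability
p_A_given_notB = 0.05

# Total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A).
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)  # roughly 0.16
```

Even with a reliable test, the backward probability \(P(B|A)\) is small here because the prior \(P(B)\) is small.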
A random variable \(X\) is a variable whose value results from a measurement of some random event
Denote random variables by capital letters: \(X, Y, Z\) etc.
Lower case letters \(x, y, z\) etc. denote particular observations or realisations of the random variable.
\(X = x\) is the event that the random variable \(X\) takes the particular value \(x\). Will look at \(P(X = x)\), the probability of this event.
Often abbreviate random variable as r.v. or rv
A discrete random variable takes a finite or countably infinite number of values.
Examples: the outcome of a die roll, the number of heads in a sequence of coin tosses.
A continuous random variable takes uncountably many different values.
Examples: the height of a randomly chosen person, the time until an event occurs.
2 random trees:
The shape (or topology) of the tree is discrete
The lengths of the branches in the tree are continuous.
(R output: randomly generated DNA sequences.)
A probability distribution function or probability mass function (pmf) \(f(x)\) tells us the probability of a discrete random variable taking a particular value \(x\).
\(P(X=x) = f(x)\) is the probability that \(X = x\).
Often write \(p_x\) for \(f(x)\).
| \(x\) | \(P(X = x)\) |
|---|---|
| 1 | 0.2 |
| 2 | 0.3 |
| 3 | 0.1 |
| 4 | 0.1 |
| 5 | 0.15 |
| 6 | 0.15 |
Or in functional form, e.g. \(f(x) = 1/6\) for \(x \in \{1,\ldots,6\}\).
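The tabulated pmf can be stored directly; a quick Python check (illustrative, not part of the original notes) that it is a valid pmf:

```python
# The pmf from the table above, as a dictionary.
pmf = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.1, 5: 0.15, 6: 0.15}

# A valid pmf is non-negative and sums to 1.
print(all(p >= 0 for p in pmf.values()))      # True
print(abs(sum(pmf.values()) - 1.0) < 1e-9)    # True
```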
For a discrete random variable, the value of the pmf is always between 0 and 1. (A continuous density, by contrast, can exceed 1.)
For a continuous random variable \(X\), for any exact value \(x\), \(P(X = x) = 0\).
Look instead at the probability of falling in an interval, \(P(a \leq X \leq b)\).
Define the probability density function, \(p_X(x)\) (also written \(f_X(x)\) or \(f(x)\)).
Get \[P(a \leq X \leq b) = \int_a^b p_X(x)\, dx.\]
```r
curve(dexp(x, 2), 0, 5, main = "Exponentially distributed r.v.", ylab = "f(x)")
curve(dbeta(x, 2, 5), main = "Beta distributed r.v.", ylab = "f(x)")
```
\(P(a \leq X \leq b) = \int_a^b p_X(x) dx\) is the area under \(p_X(x)\) between \(x = a\) and \(x = b\).
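As a sanity check on this formula, a small Python sketch (illustrative, not part of the notes) integrates the Exp(2) density numerically and compares it with the closed-form answer \(e^{-2a} - e^{-2b}\):

```python
import math

lam = 2.0
def f(x):
    # Density of an Exp(2) random variable, for x >= 0.
    return lam * math.exp(-lam * x)

# P(a <= X <= b) by trapezoidal integration of the density.
a, b, n = 0.5, 1.5, 100000
h = (b - a) / n
area = h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

# Closed form for comparison.
exact = math.exp(-lam * a) - math.exp(-lam * b)
print(abs(area - exact) < 1e-6)  # True
```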
The cumulative distribution function (cdf or just "the distribution") is defined for discrete and continuous random variables by \[F(x) = P(X \leq x)\]
\(F(x)\) is monotonically non-decreasing, from 0 to 1.
For continuous random variables, the cdf is continuous,
For discrete random variables, the cdf is a step function with discontinuities.
For example, for a normally distributed random variable, the density function is on the left and the distribution function is on the right.
The value of the cdf at \(x\) is the integral of the pdf from \(-\infty\) to \(x\): \[F(x) = \int_{-\infty}^{x} p_X(t)\, dt.\]
This is just the area under the left-tail up to \(x\).
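For the exponential example, the cdf has the closed form \(F(x) = 1 - e^{-\lambda x}\), and interval probabilities are differences of cdf values; a short illustrative Python sketch:

```python
import math

lam = 2.0
def F(x):
    # cdf of Exp(2): area under the density up to x.
    return 1 - math.exp(-lam * x) if x > 0 else 0.0

# P(0.5 <= X <= 1.5) as a difference of cdf values.
p = F(1.5) - F(0.5)
print(p)  # e^{-1} - e^{-3}, about 0.318
```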
Extend directly to multiple random variables:
The joint probability density function of \(n\) random variables \(X_1,\ldots,X_n\) takes \(n\) arguments, \(p_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\)
\(p_{X_1,\ldots,X_n}\) is real-valued, non-negative and normalised.
The probability that the point \((X_1,\ldots,X_n)\) lies in some region is just the multiple integral over that region.
Given a joint pdf, get the pdf for a subset of the variables by integrating over the ones not in the subset.
For example, given the joint density for \(X\) and \(Y\), what is the density for \(X\) alone?
It is \[ p_X(x) = \int_{-\infty}^{\infty} p_{XY}(x,y) dy.\]
This process is called marginalization.
For a discrete random variable, we replace the integral with a sum:
\[ p_X(x) = \sum_{y \in \mathcal Y} p_{XY}(x,y) \]
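A minimal Python sketch of marginalization over a small joint pmf (the numbers are invented for illustration):

```python
# A hypothetical joint pmf P(x, y) on x, y in {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Marginalize out y: P(x) = sum over y of P(x, y).
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p

# Gives P(X=0) = 0.3 and P(X=1) = 0.7.
print(marginal_x)
```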
The expectation gives us an idea of the central value of the variable.
The expected value of a random variable \(X\) is written \(E[X]\).
It is commonly called the average or mean
\[ E[X] = \int_{-\infty}^{\infty} x p_X(x) dx. \]
For discrete random variables, the integral is replaced by a sum: \[E[X] = \sum_{x \in {\mathcal X}} x p_x.\]
The symbol \(\mu\) is often used for the mean.
An exponentially distributed variable \(X\) with parameter \(\lambda\) has density function \(f(x) = \lambda e^{-\lambda x}\).
So \(E[X] = \int_{0}^{\infty} x f(x)\, dx = \int_{0}^{\infty} x \lambda e^{-\lambda x}\, dx = \lambda^{-1}\) (after some work).
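The omitted work is a single integration by parts (using \(x e^{-\lambda x} \to 0\) as \(x \to \infty\)):

\[
E[X] = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx
= \Big[-x e^{-\lambda x}\Big]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx
= 0 + \Big[-\tfrac{1}{\lambda} e^{-\lambda x}\Big]_0^\infty
= \frac{1}{\lambda}.
\]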
E.g., when \(\lambda = 2\), the mean is \(1/\lambda = 0.5\).
Example: \(X\) is discrete and uniform on \(\{1,\ldots,6\}\), so \(p_X(x) = 1/6\) for each \(x\).
\(E[X] = \sum_{x\in \mathcal X} x p_X(x) = \sum_{x = 1}^6 x 1/6 = 21/6 = 3.5\)
Variance measures the spread of a random variable about its mean.
Defined by \[ \mathrm{Var}(X) = E[ (X - E[X])^2] = E[X^2] - (E[X])^2\]
Variance is the expected value of the square of the distance of the random variable from its mean.
\(\mathrm{Var}(X) \geq 0\) always.
Saw above that \(E[X] = 3.5\).
\(E[X^2] = \sum_{x = 1}^6 x^2 \cdot 1/6 = 91/6 \approx 15.1667\)
So
\(\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 91/6 - 3.5^2 = 35/12 \approx 2.9167\).
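The die calculation can be reproduced exactly with Python's `fractions` module (an illustrative check, not part of the notes):

```python
from fractions import Fraction

# Fair die: pmf 1/6 on {1,...,6}.
p = Fraction(1, 6)
xs = range(1, 7)

E_X = sum(x * p for x in xs)        # 7/2  = 3.5
E_X2 = sum(x * x * p for x in xs)   # 91/6
var = E_X2 - E_X ** 2               # 35/12, about 2.9167

print(E_X, E_X2, var)  # 7/2 91/6 35/12
```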
A function of a random variable is simply a function that takes as its input a random variable.
Let \(g(X)\) be a function of the random variable \(X\).
Then \(g\) is a random variable and can be treated as such.
For example: the expectation of \(g(X)\) is \(E[g(X)] = \int_{-\infty}^{\infty} g(x) p_X(x)\, dx\).
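For a discrete \(X\), the integral is again a sum; a quick illustrative Python check using the hypothetical function \(g(x) = 2x + 1\) on the fair die:

```python
from fractions import Fraction

# E[g(X)] = sum over x of g(x) p(x), for g(x) = 2x + 1 and a fair die.
p = Fraction(1, 6)
E_gX = sum((2 * x + 1) * p for x in range(1, 7))
print(E_gX)  # 8, which equals 2 * E[X] + 1 = 2 * 3.5 + 1
```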