30 March 2017

Basics of probability

Some definitions

An event is a set of outcomes of a random process

A probability is a number between 0 and 1 assigned to an event.

We interpret a probability as

  • the chance of the event occurring
  • the degree of plausibility we place on the event occurring or having occurred.

Formally, let \(\Omega\) be the set of all possible outcomes. \(\Omega\) is called the state space.

An event is \(A \subseteq \Omega\)

Let \(\mathcal A\) be the power set (set of all subsets) of \(\Omega\)

If \(\Omega\) is countable, probability can be viewed as a function \(P:\mathcal A \rightarrow [0,1]\)

This interpretation doesn't work directly if \(\Omega\) is uncountable, but it still helps our intuition.

Example: tossing a coin twice

State space: \(\Omega = \{HH, HT, TH,TT\}\).

An example of an event: we throw a tails first so \(A = \{TH, TT\}\).

There are \(2^4 = 16\) possible events: \[\mathcal A = \{ \emptyset, \{ HH\}, \{HT \}, \{ TH\}, \{ TT \}, \{ HH, HT \}, \ldots \] \[ \ldots \{ HH, TH\}, \{ HH, TT\}, \{ HT,TH\}, \{ HT,TT\}, \{ TH,TT\}, \dots \] \[ \dots \{HH, HT, TH \}, \{HH, HT,TT \}, \{HH, TH,TT \}, \{HT, TH,TT \}, \Omega \}\]
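The count of \(2^4 = 16\) events can be checked by enumerating the power set directly; a small Python sketch:

```python
from itertools import combinations

# The four outcomes of tossing a coin twice
outcomes = ["HH", "HT", "TH", "TT"]

# The set of events is the power set: every subset of the outcomes
events = [set(s) for r in range(len(outcomes) + 1)
          for s in combinations(outcomes, r)]

print(len(events))  # 16, i.e. 2**4
```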

Operations on events

Let \(A\) and \(B\) be events.

The event \(C\) that \(A\) and \(B\) occur is given by \(C = A \cap B\).

The event \(D\) that \(A\) or \(B\) occurs is given by \(D = A \cup B\).

The event that \(A\) does not occur is given by \(A^c = \bar A = \Omega - A = \Omega \backslash A = \{ \omega \in \Omega : \omega \notin A \}\).

We usually write \(P(A, B)\) for \(P(A \cap B)\).

Example: Rolling a die

\(\Omega = \{1,2,3,4,5,6\}\)

Let \(A\) be the event that the roll is even: \(A = \{2,4,6 \}\)

Let \(B\) be the event that we roll a 3 or a 6: \(B = \{ 3, 6 \}\).

The event that \(A\) and \(B\) occur is \(A \cap B = \{ 6 \}\)

The event that \(A\) or \(B\) happens is \(A \cup B = \{ 2,3,4,6 \}\).

The event that \(A\) does not occur is \(A^c = \{ 1,3,5 \}\)
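These event operations are just set operations, so they can be checked directly; a small Python sketch mirroring the die example above:

```python
omega = {1, 2, 3, 4, 5, 6}   # state space for one die roll
A = {2, 4, 6}                # the roll is even
B = {3, 6}                   # the roll is a 3 or a 6

print(sorted(A & B))      # [6]           -- A and B occur
print(sorted(A | B))      # [2, 3, 4, 6]  -- A or B occurs
print(sorted(omega - A))  # [1, 3, 5]     -- A does not occur
```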

Axioms of probability

Any probability function satisfies:

  1. \(P(\Omega) = 1\). That is, the total probability is 1.
  2. \(0 \leq P(A) \leq 1\) for any \(A \subseteq \Omega\). The probability of any event is non-negative and less than or equal to 1.
  3. If \(A_1, A_2, \ldots\) are mutually disjoint events (i.e., \(A_i \cap A_j = \emptyset\) if \(i \neq j\)) then \[ P\left( \bigcup_i A_i \right) = \sum_i P(A_i). \]

Conditional probability

If \(A\) and \(B\) are events and we know \(B\) occurred, what can we say about the probability of \(A\)?

Called the conditional probability of \(A\) given \(B\) and written \(P(A|B)\). We define \[ P(A|B) = \frac {P(A \cap B)}{P(B)}. \]

\(P(A|B)\) is only defined for \(P(B)>0\).

Rearranging the definition, we get the factorisation \(P(A,B) = P(A|B)P(B) = P(B|A)P(A)\) and \[P(A_1,\ldots, A_k) = P(A_1|A_2,\ldots, A_k)P(A_2|A_3,\ldots, A_k)\ldots P(A_{k-1}|A_k)P(A_k) . \]

Example: Roll a fair die and record the value

Let \(A = \{2\}\) be the event that 2 is rolled

\(B = \{2,4,6\}\) be the event that the roll is an even number.

Probability that the roll is a two given that we know it is even is just \(P(A|B)\).

Have \(P(B) = 1/2\) and \(P(A \cap B) = P(A) = 1/6\) since \(A \cap B = A\).

Thus \[ P(A|B) = \frac {P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac 1 3. \]
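The same calculation can be done by counting outcomes, since every outcome of a fair die is equally likely; a Python sketch using exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2}          # the roll is a 2
B = {2, 4, 6}    # the roll is even

# For a fair die every outcome is equally likely, so P(E) = |E| / |omega|
def prob(event):
    return Fraction(len(event), len(omega))

p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)  # 1/3
```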

Independence

Events \(A\) and \(B\) are independent when \(P(A \cap B) = P(A,B) = P(A)P(B)\).

\(A\) and \(B\) are independent if and only if \(P(A|B) = P(A)\) and \(P(B|A) = P(B)\) (assuming \(P(A), P(B) > 0\) so the conditional probabilities are defined).

Bayes' Theorem

Bayes' Theorem states \[ P(B|A) = \frac{P(A|B)P(B)}{P(A)}. \]

It is simple to derive from the definition of conditional probability.

It tells us how the forward probability \(P(A|B)\) is related to the backward probability \(P(B|A)\).

This relationship is crucial to statistical inference.
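For the die events used earlier, Bayes' Theorem can be verified numerically; a Python sketch with exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}    # the roll is even
B = {3, 6}       # the roll is a 3 or a 6

def prob(event):                    # P(E) = |E| / |omega| for a fair die
    return Fraction(len(event), len(omega))

def cond(X, Y):                     # conditional probability P(X | Y)
    return prob(X & Y) / prob(Y)

# Bayes' Theorem: P(B|A) = P(A|B) P(B) / P(A)
lhs = cond(B, A)
rhs = cond(A, B) * prob(B) / prob(A)
print(lhs, rhs)  # 1/3 1/3
```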

Random variables

Definition

A random variable \(X\) is a variable whose value results from a measurement of some random event

Denote random variables by capital letters: \(X, Y, Z\) etc.

Lower case letters \(x, y, z\) etc. denote particular observations or realisations of the random variable.

\(X = x\) is the event that the random variable \(X\) takes the particular value \(x\). We will look at \(P(X = x)\), the probability of this event.

Often abbreviate random variable as r.v. or rv

Discrete vs continuous random variables

A discrete variable takes a finite or countably infinite number of values.

Examples:

  • a count: (0,1,2,3,…)
  • a fixed number of outcomes (heads or tails, the DNA bases {A,C,G,T }).
  • Number of people in a room at 5 times during a day: 6, 15, 12, 23, 18


A continuous variable takes uncountably many different values.

Examples:

  • any number in interval \([0,1]\)
  • any real number \(>0\) or any number in \(\mathbb R\).
  • 10 men measured at random (height in cm): 173.466, 174.931, 179.707, 183.739, 181.596, 172.897, 172.629, 183.328, 167.799, 178.733

The value of a random variable can be any kind of object, not just a number.

2 random trees:

The shape (or topology) of the tree is discrete

The length of the branches in the tree are continuous

2 random networks:

A set of random sequences:

## CGGTCACATGATACCCGATCAGTTACTGCGAGCTAGCGCCGCAACTGTACAACAATCCTGTCCAA
## GCCGGTCTTGACCTCGTAGTGTGACTGTATGCGTTCCAAGTTGTTAGGTACTCCGAGTGTCAAAT
## TACCGCCATATATTCGATAGTAGATCATGAGACATCCAAATACAGCCCCGGGCAGCCGGTAGTGG
## AGAATTACACGGTATTCGACTTCTATGACCGTTCGATAAGTGTCTCCGTTTAGACCTAATGGCAC
## TATGATAAGGGTATGTAGGTAAGCACTTCGGTTAGGTATTTAACGGAGAGTGTAGGACGCCTCGC
## GTATAGAGCCTCCACGGTTGGGCCACACCATTTTTACTCTGTTTCGCTACAGCGGATATTTGATT
## AGTACGAAAAAAAAGAACCGCCACGTTTCGGGTAGGCACGAAAACCTTGAATAACCGACAGAGGC

Probability distributions and probability density functions

For discrete random variables

A probability distribution function or probability mass function \(f(x)\) tells us the probability of a random variable taking a particular value \(x\).

\(P(X=x) = f(x)\) is the probability that \(X = x\).

Often write \(p_x\) for \(f(x)\).

Can display as a table e.g.,

\(x\) \(P(X = x)\)
1 0.2
2 0.3
3 0.1
4 0.1
5 0.15
6 0.15

Or in functional form, e.g. \(f(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)
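A pmf like the table above can be stored as a simple lookup table; a Python sketch using exact fractions for the tabled probabilities:

```python
from fractions import Fraction

# The pmf from the table above, stored as a lookup table x -> P(X = x)
pmf = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(1, 10),
       4: Fraction(1, 10), 5: Fraction(15, 100), 6: Fraction(15, 100)}

# Every value lies between 0 and 1, and the probabilities sum to 1
assert all(0 <= p <= 1 for p in pmf.values())
print(sum(pmf.values()))  # 1
```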

Can plot pdfs

For a discrete random variable, the value of the pdf is always between 0 and 1.

Continuous random variables: probability density functions

For a continuous random variable \(X\), for any exact value \(x\), \(P(X = x) = 0\).

We look instead at the probability of \(X\) falling in an interval, \(P(a \leq X \leq b)\).

Define the probability density function, \(p_X(x)\) (also written \(f_X(x)\) or \(f(x)\)).

Get \[P(a \leq X \leq b) = \int_a^b p_X(x) dx.\]

Properties of the probability density function

  • \(p_X(x) \geq 0\) for all \(x\).
  • \(p_X(x)\) is normalised: \(\int_{-\infty}^{\infty} p_X(x) dx = 1.\) That is, the total probability is 1.
  • \(p_X(x)\) may be greater than 1 for a given value of \(x\).

Example: Density for Exponential distribution

curve(dexp(x,2),0,5,main = 'Exponentially distributed r.v.',ylab = "f(x)")

Example: Density for Beta distribution

curve(dbeta(x,2,5),main = "Beta distributed r.v.",ylab = "f(x)")

\(P(a \leq X \leq b) = \int_a^b p_X(x) dx\) is the area under \(p_X(x)\) between \(x = a\) and \(x = b\).
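This area can be approximated numerically and compared with the exact value; a Python sketch for the Exponential density with rate \(\lambda = 2\), where integrating the density gives the closed-form area \(e^{-\lambda a} - e^{-\lambda b}\) (the interval \([0.5, 1.5]\) is an arbitrary choice):

```python
import math

lam = 2.0                      # rate parameter of the Exponential density

def pdf(x):
    return lam * math.exp(-lam * x)

# Approximate P(a <= X <= b) with a midpoint-rule integral ...
a, b, n = 0.5, 1.5, 100_000
h = (b - a) / n
numeric = h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

# ... and compare with the exact area, exp(-lam*a) - exp(-lam*b)
exact = math.exp(-lam * a) - math.exp(-lam * b)

print(abs(numeric - exact) < 1e-8)  # True
```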

Cumulative distribution function

The cumulative distribution function (cdf or just "the distribution") is defined for discrete and continuous random variables by \[F(x) = P(X \leq x)\]

\(F(x)\) is monotonically increasing from 0 to 1.

For continuous random variables, the cdf is continuous,

For discrete random variables, the cdf is a step function with discontinuities.

For example, for a normally distributed random variable, the density function is on the left, the distribution is on the right

The value of the cdf at \(x\) is the integral of the pdf from \(-\infty\) to \(x\): \[F(x) = \int_{-\infty}^{x} p_X(t)\, dt.\]

This is just the area under the left-tail up to \(x\).

Multiple random variables and joint probability density functions

Extend directly to multiple random variables:

The joint probability density function of \(n\) random variables \(X_1,\ldots,X_n\) takes \(n\) arguments, \(p_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\)

\(p_{X_1,\ldots,X_n}\) is real-valued, non-negative and normalised.

The probability that the point \((X_1,\ldots,X_n)\) lies in some region is just the multiple integral over that region.

A joint probability density function can be visualised as a contour plot…

…or a 3d plot:

Marginalisation

Given a joint pdf, get the pdf for a subset of the variables by integrating over the ones not in the subset.

For example, given the joint density for \(X\) and \(Y\), what is the density for \(X\) alone?

It is \[ p_X(x) = \int_{-\infty}^{\infty} p_{XY}(x,y) dy.\]

This process is called marginalisation.

For a discrete random variable, we replace the integral with a sum:

\[ P(x) = \sum_{y \in \mathcal Y} P(x,y) \]
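A Python sketch of the discrete case (the joint pmf here is made up purely for illustration):

```python
from fractions import Fraction

# A made-up joint pmf P(x, y) for two discrete random variables
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

# Marginalise out y: P(x) = sum over y of P(x, y)
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p

print(marginal_x)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```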

Expectation and variance

Expectation (or mean)

The expectation gives us an idea of the central value of the variable.

The expected value of a random variable \(X\) is \(E[X]\)

It is commonly called the average or mean

\[ E[X] = \int_{-\infty}^{\infty} x p_X(x) dx. \]

For discrete random variables, the integral is replaced by a sum: \[E[X] = \sum_{x \in {\mathcal X}} x p_x.\]

The symbol \(\mu\) is often used for the mean.

Example: mean of an Exponentially distributed random variable

An exponentially distributed variable \(X\) with parameter \(\lambda\) has density function \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) (and \(f(x) = 0\) for \(x < 0\)).

So \(E[X] = \int_{-\infty}^{\infty} x f(x) dx = \int_{-\infty}^{\infty} x \lambda e^{-\lambda x} dx =\) (some work) \(= \lambda^{-1}\).

E.g., when \(\lambda = 2\), the mean is \(1/\lambda = 0.5\).
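This mean can be checked numerically by integrating \(x f(x)\) with a simple midpoint rule; a Python sketch (the cutoff and step size are arbitrary choices):

```python
import math

lam = 2.0                      # rate parameter

def pdf(x):
    return lam * math.exp(-lam * x)

# E[X] = integral of x f(x); the density is zero below 0, so integrate
# from 0 up to a large cutoff with the midpoint rule
a, b, n = 0.0, 50.0, 1_000_000
h = (b - a) / n
mean = h * sum((a + (i + 0.5) * h) * pdf(a + (i + 0.5) * h) for i in range(n))

print(round(mean, 6))  # 0.5, matching 1/lambda
```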

Example: PDF of a Beta distributed random variable showing the mean

Example: When \(X\) is discrete and uniform, with \(p_X(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)

\(E[X] = \sum_{x\in \mathcal X} x p_X(x) = \sum_{x = 1}^6 x 1/6 = 21/6 = 3.5\)

Variance

Variance measures the spread of a random variable about its mean.

Defined by \[ \mathrm{Var}(X) = E[ (X - E[X])^2] = E[X^2] - (E[X])^2\]

Variance is the expected value of the square of the distance of the random variable from its mean.

\(\mathrm{Var}(X) \geq 0\) always.

Example: When \(X\) is discrete and uniform with \(p_X(x) = 1/6\) for \(x \in \{1,\ldots,6\}\)

Saw above that \(E[X] = 3.5\).

\(E[X^2] = \sum_{x = 1}^6 x^2 1/6 = 91/6 = 15.1667\)

So

\(Var(X) = E[X^2] - (E[X])^2 = 15.1667 - 3.5^2 = 2.9167\).
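The same calculation can be done in a few lines of Python with exact fractions:

```python
from fractions import Fraction

# X is uniform on {1, ..., 6}, so p_X(x) = 1/6 for each x
xs = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in xs)                  # E[X]
second_moment = sum(x**2 * p for x in xs)      # E[X^2]
variance = second_moment - mean**2             # E[X^2] - (E[X])^2

print(mean)           # 7/2
print(second_moment)  # 91/6
print(variance)       # 35/12
```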

Functions of a random variable

A function of a random variable is simply a function that takes a random variable as its input.

Let \(g(X)\) be a function of the random variable \(X\).

Then \(g(X)\) is itself a random variable and can be treated as such.

For example, the expectation of \(g(X)\) is \(E[g(X)] = \int_{-\infty}^{\infty} g(x) p_X(x) dx\).
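For the uniform die above and the choice \(g(x) = x^2\), this expectation can be computed directly; a Python sketch:

```python
from fractions import Fraction

# X uniform on {1, ..., 6}; take g(x) = x**2 as the function
p = Fraction(1, 6)

def g(x):
    return x**2

# E[g(X)] = sum over x of g(x) P(X = x)
expectation = sum(g(x) * p for x in range(1, 7))
print(expectation)  # 91/6
```

This recovers the second moment \(E[X^2]\) used in the variance calculation above.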