From assignment 2, it is assumed you have a method for generating uniform random sequences of length L and mutating sequences over a time t according to a Jukes-Cantor model with rate mu.
For that, you’ll need:
A method that returns uniform random numbers between 0 and 1, random() (from the numpy random library).
A method randexp(lambda) that returns an exponentially distributed random number with rate lambda (see tutorial 6)
randexp(lambda)
return -log(random())/lambda
A method randpoiss(lambda) to simulate Poisson distributed numbers with rate \(\lambda\) (see tutorial 6)
randpoiss(lambda)
t = randexp(lambda)
poiss = 0
while (t < 1)
poiss += 1
t += randexp(lambda)
return poiss
A method sample(x,p) to a value from x according to the probabilities p (see tutorial 6)
sample(x,p)
\\ check x and p same length
if (x.length() != p.length()) stop
\\ check p sums to 1
if (p.sum() != 1) stop
\\ get cumulative sum of probabilities
cumprob = cumsum(p)
\\ draw uniform value and find where it falls
u = random()
for (j in 1 to x.length())
if (u < cumprob[j])
samp = x[j]
break
return samp
To sample a uniform random sequence of length L, use a method like
randseq(L)
for (i in 1 to L)
seq[i] = sample(['A','C','G','T'], [0.25,0.25,0.25,0.25])
return seq
To mutate a sequence X according to the Jukes-Cantor process with per site mutation rate mu over a time period t, use the following method (which allows mutations from a base to itself so does not adjust the mutation rate)
mutate(X,t,mu)
L= X.length()
\\ the number of mutations is Poisson with total rate L*mu*t
numMutation = randpoiss(L*mu*t)
\\ for each mutation, choose a site to mutate and mutate it
for (i in 1 to numMutation)
\\ choose a site
site = ceiling(random()*L)
\\ mutate that site
X[site] = sample(['A','C','G','T'], [0.25,0.25,0.25,0.25])
return X