From assignment 2, it is assumed you have a method for generating uniform random sequences of length L and mutating sequences over a time t according to a Jukes-Cantor model with rate mu.

For that, you’ll need:

A method that returns uniform random numbers between 0 and 1, random() (from the numpy random library).

A method randexp(lambda) that returns an exponentially distributed random number with rate lambda (see tutorial 6)

randexp(lambda)
  return -log(random())/lambda

A method randpoiss(lambda) to simulate Poisson distributed numbers with rate \(\lambda\) (see tutorial 6)

randpoiss(lambda)
  t = randexp(lambda)
  poiss = 0
  while (t < 1)
    poiss += 1
    t += randexp(lambda)
  return poiss

A method sample(x,p) to a value from x according to the probabilities p (see tutorial 6)

sample(x,p)
  \\ check x and p same length
  if (x.length() != p.length()) stop
  \\ check p sums to 1
  if (p.sum() != 1) stop
    
  \\ get cumulative sum of probabilities
  cumprob = cumsum(p)
  
  \\ draw uniform value and find where it falls
  u = random()
  for (j in 1 to x.length())
    if (u < cumprob[j])
      samp = x[j]
      break
    
  return samp

To sample a uniform random sequence of length L, use a method like

randseq(L)
  for (i in 1 to L)
    seq[i] = sample(['A','C','G','T'], [0.25,0.25,0.25,0.25])
  
  return seq

To mutate a sequence X according to the Jukes-Cantor process with per site mutation rate mu over a time period t, use the following method (which allows mutations from a base to itself so does not adjust the mutation rate)

mutate(X,t,mu)

  L= X.length()
  
  \\ the number of mutations is Poisson with total rate L*mu*t
  numMutation = randpoiss(L*mu*t)
  
  \\ for each mutation, choose a site to mutate and mutate it
  for (i in 1 to numMutation)
    \\ choose a site
    site = ceiling(random()*L)
    \\ mutate that site
    X[site] =  sample(['A','C','G','T'], [0.25,0.25,0.25,0.25])
    
  return X