CS 369: Pairwise alignment

3 May 2017

Global Alignment

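The \((i,j)\)th entry \(F(i,j)\) is found by taking the maximum score based on the three closest entries to the left and above:

\[ F(i,j) = \max \begin{cases} F(i-1,j-1)+s(x_i,y_j), \\ F(i-1,j)-d, \\ F(i,j-1)-d, \end{cases} \]

where \(F(0,0) = 0\), \(F(i,0) = -id\) and \(F(0,j) = -jd\).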

Draw an arrow to show which entry the \((i,j)\)th entry was derived from.
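The fill step and its arrows can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `global_fill`, the arrow labels, and the toy scoring in the usage line are assumptions made for illustration, and `d` is taken to be a positive penalty that is subtracted.

```python
def global_fill(x, y, s, d):
    """Fill the global alignment table F and record the arrows.

    s is a scoring function s(a, b); d is a positive gap penalty.
    Row i corresponds to x[:i], column j to y[:j].
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    arrows = [[[] for _ in range(m + 1)] for _ in range(n + 1)]

    # Boundary conditions: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        F[i][0] = -i * d
        arrows[i][0] = ["up"]
    for j in range(1, m + 1):
        F[0][j] = -j * d
        arrows[0][j] = ["left"]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = {
                "diag": F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                "up": F[i - 1][j] - d,
                "left": F[i][j - 1] - d,
            }
            F[i][j] = max(candidates.values())
            # Keep every arrow that attains the maximum, so ties are visible.
            arrows[i][j] = [k for k, v in candidates.items() if v == F[i][j]]
    return F, arrows


# Toy scoring chosen only for illustration: +1 match, -1 mismatch, d = 1.
F, arrows = global_fill("AT", "AT", lambda a, b: 1 if a == b else -1, d=1)
print(F[2][2], arrows[2][2])
```

Storing a list of arrows per cell, rather than a single one, is a design choice that lets the traceback recover every optimal alignment when there are ties.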

Global alignment: example

Align x = ATA and y = AGTTA using scores:

\(s(a,b) = 2\) if \(a = b\)
\(s(a,b) = 1\) if \(a\) and \(b\) are both purines (A,G) or pyrimidines (C,T)
\(s(a,b) =-2\) if \(a\) is a purine and \(b\) is a pyrimidine or vice versa.
The gap penalty is \(d = 2\), so each gap contributes \(-2\) to the score.

So the score matrix is

\[ \begin{array}{lrrrr} & A & C & G &T \\ A& 2 & -2 & 1 & -2 \\ C& -2 & 2 & -2 & 1 \\ G & 1 & -2 & 2 & -2 \\ T & -2 & 1 & -2 & 2 \\ \end{array} \]
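The matrix can be regenerated from the three scoring rules. A small sketch, assuming the alphabet is restricted to A, C, G, T; the names `PURINES`, `PYRIMIDINES`, and `matrix` are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def s(a, b):
    """Score a pair of bases: 2 if equal, 1 if same class, -2 otherwise."""
    if a == b:
        return 2
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 1 if same_class else -2


bases = "ACGT"
matrix = [[s(a, b) for b in bases] for a in bases]
for a, row in zip(bases, matrix):
    print(a, row)
```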

With gap penalty \(d = 2\), fill out \(F\):
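As a check, applying the recurrence with these scores and a cost of 2 per gap gives the values below. The layout is an assumption: rows are indexed by \(x\) = ATA, columns by \(y\) = AGTTA, and the arrows are omitted.

\[ \begin{array}{c|rrrrrr} F & - & A & G & T & T & A \\ \hline - & 0 & -2 & -4 & -6 & -8 & -10 \\ A & -2 & 2 & 0 & -2 & -4 & -6 \\ T & -4 & 0 & 0 & 2 & 0 & -2 \\ A & -6 & -2 & 1 & 0 & 0 & 2 \end{array} \]

For instance, the entry in row 2, column 4 is \(\max\{-2 + 2,\; -4 - 2,\; 2 - 2\} = 0\), attained by both the diagonal and the left neighbour.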

So the score of the best alignment is 2. Get the best alignment by starting in the bottom-right corner and tracing the arrows back to \(F(0,0)\).

Two alignments have the best score, depending on which arrow is followed at \(F(2,4)\): \[\begin{array}{ccccccccccc} A & G & T & T & A & \mbox{ and } & A & G & T & T & A \\ A & - & - & T & A & & A & - & T & - & A \end{array}\]
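The traceback, including the tie, can be reproduced with a short recursive sketch. It assumes the scores above with a cost of 2 per gap, and it re-derives the arrows from the filled table instead of storing them; `fill` and `tracebacks` are illustrative names.

```python
def s(a, b):
    """Example scores: 2 if equal, 1 if same class (A, C, G, T only), else -2."""
    if a == b:
        return 2
    purines = {"A", "G"}
    return 1 if (a in purines) == (b in purines) else -2


def fill(x, y, d):
    """Global alignment table; row i is x[:i], column j is y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    return F


def tracebacks(x, y, d, F, i=None, j=None):
    """Yield every best alignment as (row for y, row for x)."""
    if i is None:
        i, j = len(x), len(y)      # start in the bottom-right corner
    if i == 0 and j == 0:          # stop at F(0,0)
        yield "", ""
        return
    if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
        for top, bottom in tracebacks(x, y, d, F, i - 1, j - 1):
            yield top + y[j - 1], bottom + x[i - 1]
    if i > 0 and F[i][j] == F[i - 1][j] - d:
        for top, bottom in tracebacks(x, y, d, F, i - 1, j):
            yield top + "-", bottom + x[i - 1]
    if j > 0 and F[i][j] == F[i][j - 1] - d:
        for top, bottom in tracebacks(x, y, d, F, i, j - 1):
            yield top + y[j - 1], bottom + "-"


x, y, d = "ATA", "AGTTA", 2
F = fill(x, y, d)
print("best score:", F[len(x)][len(y)])
for top, bottom in tracebacks(x, y, d, F):
    print(top)
    print(bottom)
    print()
```

Following every arrow that attains the maximum is what produces both alignments; keeping only one arrow per cell would return just one of them.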

Elements of alignment algorithm

a recurrence relation for the quantity we are trying to optimise
boundary conditions
tabular computing to efficiently calculate the recurrence
traceback, includes rules for where to start and stop traceback.

Local Alignment

Local alignment: Smith-Waterman algorithm

Want to find best alignment for subsequences of \(x\) and \(y\).

Fix \(i \leq n\) and \(j \leq m\), where \(n\) and \(m\) are the lengths of \(x\) and \(y\). Define a suffix alignment as any alignment of \[x_s x_{s+1} \ldots x_i\] and \[y_r y_{r+1} \ldots y_j\] for some \(1 \leq s \leq i\) and \(1 \leq r \leq j\).

Let \(F(i,j)\) be the score of the best suffix alignment of \(x_1 x_2 \ldots x_i\) and \(y_1 y_2 \ldots y_j\)

Since score of empty alignment is 0, \(F(i,j) \geq 0\)

So instead of letting the score of an alignment go negative, we start a new alignment.

\[ F(i,j) = \max \begin{cases} 0, \\ F(i-1,j-1)+s(x_i,y_j), \\ F(i-1,j)-d, \\ F(i,j-1)-d. \end{cases} \]

To find the best subsequence alignment, look for the best score and trace it back until we hit zero.
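The local recurrence and its traceback can be sketched as below. This is a minimal illustration under stated assumptions: it uses the sequences and scores from the global example with a cost of 2 per gap, it follows a single path when there are ties, and the name `smith_waterman` is illustrative.

```python
def s(a, b):
    """Example scores: 2 if equal, 1 if same class (A, C, G, T only), else -2."""
    if a == b:
        return 2
    purines = {"A", "G"}
    return 1 if (a in purines) == (b in purines) else -2


def smith_waterman(x, y, d):
    """Return (best score, aligned part of y, aligned part of x)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_i, best_j
    top, bottom = "", ""
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top, bottom = y[j - 1] + top, x[i - 1] + bottom
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            top, bottom = "-" + top, x[i - 1] + bottom
            i -= 1
        else:
            top, bottom = y[j - 1] + top, "-" + bottom
            j -= 1
    return best, top, bottom


print(smith_waterman("ATA", "AGTTA", 2))
```

The only differences from the global version are the extra 0 in the maximum, the zero boundary, and the start and stop rules for the traceback.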

Local alignment example

Find best local alignment of x = ATA and y = AGTTA with same gap penalty and score matrix as before.

We want to fill out \(F\):
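As a check, applying the local recurrence with the same scores and a cost of 2 per gap gives the values below. The layout is an assumption: rows are indexed by \(x\) = ATA, columns by \(y\) = AGTTA, and the arrows are omitted.

\[ \begin{array}{c|rrrrrr} F & - & A & G & T & T & A \\ \hline - & 0 & 0 & 0 & 0 & 0 & 0 \\ A & 0 & 2 & 1 & 0 & 0 & 2 \\ T & 0 & 0 & 0 & 3 & 2 & 0 \\ A & 0 & 2 & 1 & 1 & 1 & 4 \end{array} \]

The largest entry, 4, sits in the bottom-right cell, and it comes from the diagonal: \(2 + s(A,A) = 4\), preceded by \(0 + s(T,T) = 2\).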

Highest score is \(F(3,5) = 4\) (with \(i\) indexing \(x\) and \(j\) indexing \(y\)), so start the traceback from there and stop at the first zero to get \[ \begin{array}{cc} T & A \\ T & A \end{array}\]