19 May 2017
Hamming distance for an aligned pair \(x\), \(y\), \[ D_{xy} = \mbox{ number of places } x \mbox{ and } y \mbox{ differ.} \]
Normalise this by dividing by sequence length \(L\) to get \[ f_{xy} = \frac{ D_{xy}} L = \mbox{ fraction of sites at which } x \mbox{ and } y \mbox{ differ.}\]
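As a minimal sketch of the two definitions above, the following Python computes \(D_{xy}\) and \(f_{xy}\) for two aligned strings. The function names `hamming_distance` and `fraction_differing` and the example sequences are illustrative assumptions, not part of the original notes.

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of aligned positions at which x and y differ (D_xy)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y))


def fraction_differing(x: str, y: str) -> float:
    """Hamming distance divided by the sequence length L (f_xy)."""
    if len(x) == 0:
        raise ValueError("sequences must be non-empty")
    return hamming_distance(x, y) / len(x)


# Made-up example sequences of length 10 that differ at two sites.
x = "ACGTACGTAC"
y = "ACGTTCGTAA"
D = hamming_distance(x, y)
f = fraction_differing(x, y)
```

Here `D` is 2 and `f` is 0.2, since the sequences differ at 2 of their 10 sites.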
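\(f\) captures evolutionary distance well for small distances, but underestimates larger ones: repeated mutations at the same site are counted at most once, so \(f\) levels off instead of growing in proportion to the number of mutations.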
The Jukes-Cantor distance is based on a model where mutations between all four bases are equally likely.
It corrects for the fact that unrelated sequences will agree at some sites simply due to chance.
\[d_{xy} = -\frac34 \log \left (1- \frac{4f_{xy}}3\right).\]
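The formula above can be sketched in Python as follows. The function name `corrected_distance` is an illustrative assumption, and the sketch assumes \(\log\) is the natural logarithm. The formula is only defined for \(f_{xy} < 3/4\) (otherwise the argument of the logarithm is not positive), so the sketch rejects larger values rather than returning a number.

```python
import math


def corrected_distance(f: float) -> float:
    """Distance d = -(3/4) * log(1 - 4f/3) for a fraction f of differing sites."""
    if not 0.0 <= f < 0.75:
        # At f >= 3/4 the logarithm's argument is zero or negative.
        raise ValueError("f must satisfy 0 <= f < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * f / 3.0)


# Compare the raw fraction with the corrected distance at a few values.
for f in (0.01, 0.2, 0.5, 0.7):
    print(f, corrected_distance(f))
```

For small \(f\) the corrected distance is close to \(f\) itself (at \(f = 0.2\) it is roughly 0.23), while it grows without bound as \(f\) approaches \(3/4\).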
Can use other mutation models to define other distances.