11 May 2017

HMM Idea

The states of a Markov chain are not observed directly; we only see the symbols that the states emit.

All states emit the same symbols but with different probabilities.

We observe the sequence of symbols and want to infer the sequence of states.

HMM Weather example

A sequence of states (wind directions) might look like: NNNSSNS

Producing the sequence of symbols (weather): UUUURUR

HMM Notation

  • \(\pi\) is the state sequence, with \(i\)th state \(\pi_i\)
  • The transition probability from state \(k\) to state \(l\) is \(a_{kl} = P(\pi_i = l | \pi_{i-1} = k)\)
  • The symbol sequence is \(x\), with \(i\)th symbol \(x_i\)
  • We often use \(b\) to denote a generic symbol
  • The emission probability is the probability of emitting symbol \(b\) from state \(k\): \(e_{k}(b) = P(x_i = b | \pi_i = k)\)


HMM Example: Cheating casino

A casino occasionally switches between a fair die and a loaded die.

Rolling the loaded die gives a 6 50% of the time and each of 1-5 10% of the time.

The states are \(F\) and \(L\), corresponding to the fair and the loaded die.

The emissions are 1, 2, 3, 4, 5, 6, corresponding to the value rolled.

Transition probabilities: \(a_{FL} = 0.05\) and \(a_{LF} = 0.1\). \(a_{FF} = 1 - a_{FL} = 0.95\) and \(a_{LL} = 1 - a_{LF} = 0.9\).

Emission probabilities: \(e_F(b) = 1/6\) for \(b = 1,\ldots,6\), while \(e_L(b) = 1/10\) for \(b = 1,\ldots,5\) and \(e_L(6) = 0.5\).
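
These parameters are easy to write down in code. A minimal sketch in Python (the dictionary layout is mine, and the initial transitions \(a_{0F}, a_{0L}\) are not given in the notes, so they are assumed uniform here purely for illustration):

```python
# Cheating-casino HMM parameters as given above.
states = ["F", "L"]                 # Fair die, Loaded die
symbols = [1, 2, 3, 4, 5, 6]

# Transition probabilities a_kl = P(pi_i = l | pi_{i-1} = k)
a = {
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.10, "L": 0.90},
}

# Emission probabilities e_k(b) = P(x_i = b | pi_i = k)
e = {
    "F": {b: 1 / 6 for b in symbols},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Initial transitions a_{0k} from the begin state 0: not specified in the
# notes, assumed uniform here for illustration only.
a0 = {"F": 0.5, "L": 0.5}
```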

HMM Example: Cheating casino

From this process, we might observe the symbol sequence (i.e., a sequence of rolls of the die)

2 5 1 6 6 2 4 2 4 5 6 6 3 6 2 4 2 2 3 4 6 3 6 5 3 4 5 2 3 5 1 6 6 6 6 2 4 6 6 2 6 6 6 6 6 1 5 

Rolls (symbol sequence)
2 5 1 6 6 2 4 2 4 5 6 6 3 6 2 4 2 2 3 4 6 3 6 5 3 4 5 2 3 5 1 6 6 6 6 2 4 6 6 2 6 6 6 6 6 1 5
States (state sequence)
F F F F L F F F L L L L L L L L F F F F F F F F F F F F F F L L L L F F F F L L L L L L F F F 

Joint probability of state and symbol sequence

Think of HMM as a generative process:

  • The first state \(\pi_1\) is chosen according to the initial transition probabilities \(a_{0i}\) (from a begin state labelled 0).

  • From the first state, a symbol is emitted according to the emission probabilities \(e_{\pi_1}\).

  • The next state is then chosen according to the transition probabilities \(a_{\pi_1 i}\).

  • The next state emits a symbol and so on.

The joint probability of states and symbols is thus: \[ P(x,\pi) = P(x,\pi|\mathbf{e,a}) = a_{0 \pi_1} \prod_{i = 1}^L e_{\pi_i}(x_i)a_{\pi_i \pi_{i+1}}, \] where by convention \(\pi_{L+1} = 0\) denotes an end state; if the end is not modelled, the final transition term is omitted.
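
Written as code, this product is a single pass over the sequence. A minimal sketch, reusing the `a`, `e` and `a0` dictionaries from the casino sketch above and dropping the final transition term (i.e. treating the end as not modelled):

```python
def joint_probability(x, pi, a, e, a0):
    """P(x, pi) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}.

    The end state is not modelled, so the last transition term is omitted.
    """
    p = a0[pi[0]]
    for i in range(len(x)):
        p *= e[pi[i]][x[i]]           # emission at position i
        if i + 1 < len(x):
            p *= a[pi[i]][pi[i + 1]]  # transition to the next state
    return p

# Example: the first five rolls of the observed sequence with an all-fair path.
print(joint_probability([2, 5, 1, 6, 6], ["F", "F", "F", "F", "F"], a, e, a0))
```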

HMM Problems

  • Decoding — given the model and output, find the most likely state path (use the Viterbi algorithm)
  • Evaluation — given the model and output, find the probability that the model generated that output (use the forward or backward algorithm)
  • Posterior decoding — given the model and output, find the probability of being in each state at each position (use the forward-backward algorithm)
  • Learning — given the model structure and output, find the parameters of the model (use the Baum-Welch algorithm)

Viterbi algorithm

Given symbol sequence \(x\), find state path \(\pi\) that maximises the joint probability: \[ \pi^* = \arg\,\max_{\pi} P(x,\pi). \]

Can find \(\pi^*\) using a dynamic programming approach.

Viterbi algorithm

Let \(v_k(i)\) be the probability of the most probable state path ending in state \(k\) at observation \(i\).

That is, \(v_k(i)\) is the best score over all state paths for the first \(i\) observations that end in state \(k\).

Suppose \(v_k(i)\) is known for all states \(k\).

Then \(v_l(i+1)\) can be calculated for observation \(x_{i+1}\) as \[v_l(i+1) = e_l(x_{i+1}) \max_k (v_k(i)a_{kl}).\]

Use this recursion to find best state path.

Assume the boundary condition that we always start in a begin state, called 0, so \(v_0(0) = 1\).
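
As a concrete illustration with the casino model (assuming, as in the code sketch above, initial transitions \(a_{0F} = a_{0L} = 0.5\), which are not specified in the notes), the first two rolls \(x_1 = 2\), \(x_2 = 5\) give: \[ v_F(1) = e_F(2)\,a_{0F} = \tfrac{1}{6}\times 0.5 \approx 0.083, \qquad v_L(1) = e_L(2)\,a_{0L} = 0.1 \times 0.5 = 0.05, \] \[ v_F(2) = e_F(5)\max\big(v_F(1)a_{FF},\, v_L(1)a_{LF}\big) = \tfrac{1}{6}\max(0.083\times 0.95,\; 0.05\times 0.1) \approx 0.013, \] \[ v_L(2) = e_L(5)\max\big(v_F(1)a_{FL},\, v_L(1)a_{LL}\big) = 0.1\max(0.083\times 0.05,\; 0.05\times 0.9) = 0.0045. \]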

Viterbi algorithm

Initialisation \(i = 0\): \(v_0(0) = 1\), \(v_k(0) = 0\) for \(k>0\).

Recursion \(i = 1,\ldots,L\): \(v_l(i) = e_l(x_i) \max_k(v_k(i-1)a_{kl})\).

\(Pointer_i(l) = \arg \max_k(v_k(i-1)a_{kl})\)

Termination: \(P(x,\pi^*) = \max_k (v_k(L)a_{k0})\).

Traceback \(i = L,\ldots,1\): \(\pi^*_L = \arg \max_k(v_k(L)a_{k0})\).

\(\pi_{i-1}^* = Pointer_i(\pi_i^*)\).

The termination step assumes an end state is modelled, entered from state \(k\) with probability \(a_{k0}\); if the end is not modelled, this term disappears.
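
A direct transcription of these steps in Python, as a sketch (it reuses the `states`, `a`, `e` and `a0` dictionaries from the casino sketch; the end is treated as not modelled, so the \(a_{k0}\) terms are dropped):

```python
def viterbi(x, states, a, e, a0):
    """Most probable state path via the Viterbi recursion, in probability space."""
    L = len(x)
    v = [{k: 0.0 for k in states} for _ in range(L)]   # v_k(i)
    ptr = [{} for _ in range(L)]                       # traceback pointers

    for l in states:                  # first observation: v_l(1) = e_l(x_1) a_{0l}
        v[0][l] = e[l][x[0]] * a0[l]

    for i in range(1, L):             # recursion: v_l(i) = e_l(x_i) max_k v_k(i-1) a_{kl}
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] * a[k][l])
            ptr[i][l] = best_k
            v[i][l] = e[l][x[i]] * v[i - 1][best_k] * a[best_k][l]

    last = max(states, key=lambda k: v[L - 1][k])      # termination (no end state)
    path = [last]
    for i in range(L - 1, 0, -1):                      # traceback
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, v[L - 1][last]
```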

Viterbi algorithm

Even the most probable path has a very low probability (it is a product of many factors less than one), so to avoid numerical underflow it is important to work in log units.

The recursion in log units is \[ V_l(i+1) = \log (e_l(x_{i+1})) + \max_k (V_k(i) + \log(a_{kl})) = E_l(x_{i+1}) + \max_k (V_k(i) + A_{kl})\] where \[V_k(i) = \log(v_k(i)), \quad A_{kl} = \log(a_{kl}), \quad E_l(x_{i+1}) = \log(e_l(x_{i+1})).\]

Viterbi algorithm (log units)

Initialisation \(i = 0\): \(V_0(0) = 0\), \(V_k(0) = -\infty\) for \(k>0\).

Recursion \(i = 1,\ldots,L\): \(V_l(i) = E_l(x_i) + \max_k(V_k(i-1) + A_{kl})\).

\(Pointer_i(l) = \arg \max_k(V_k(i-1) +A_{kl})\)

Termination: \(\log P(x,\pi^*) = \max_k (V_k(L) + A_{k0})\).

Traceback \(i = L,\ldots,1\): \(\pi^*_L = \arg \max_k(V_k(L) + A_{k0})\).

\(\pi_{i-1}^* = Pointer_i(\pi_i^*)\).

The key part of the algorithm is calculating the recursion \[V_l(i+1) = E_l(x_{i+1}) + \max_k (V_k(i)+ A_{kl}) \]

Think of \(V\) as a matrix with row indices \(k\) (states) and column indices \(i\) (position along the sequence).
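
In code, the log-space recursion fills exactly this matrix column by column. A sketch under the same assumptions as before (the `states`, `a`, `e` and `a0` dictionaries from the casino example; no end state; \(-\infty\) standing in for \(\log 0\)):

```python
from math import log, inf

def viterbi_log(x, states, a, e, a0):
    """Viterbi in log units: V_l(i) = E_l(x_i) + max_k (V_k(i-1) + A_kl)."""
    L = len(x)
    V = {k: [-inf] * L for k in states}   # row k, column i, as described above
    ptr = [{} for _ in range(L)]

    for l in states:                      # first column
        V[l][0] = log(e[l][x[0]]) + (log(a0[l]) if a0[l] > 0 else -inf)

    for i in range(1, L):                 # recursion
        for l in states:
            best_k = max(states, key=lambda k: V[k][i - 1] + log(a[k][l]))
            ptr[i][l] = best_k
            V[l][i] = log(e[l][x[i]]) + V[best_k][i - 1] + log(a[best_k][l])

    last = max(states, key=lambda k: V[k][L - 1])      # termination (no end state)
    path = [last]
    for i in range(L - 1, 0, -1):                      # traceback
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, V[last][L - 1]

# Running it on the observed casino rolls gives a fair/loaded labelling of
# each position and the log joint probability of that best path.
rolls = [2, 5, 1, 6, 6, 2, 4, 2, 4, 5, 6, 6, 3, 6, 2, 4, 2, 2, 3, 4, 6, 3, 6,
         5, 3, 4, 5, 2, 3, 5, 1, 6, 6, 6, 6, 2, 4, 6, 6, 2, 6, 6, 6, 6, 6, 1, 5]
path, logp = viterbi_log(rolls, states, a, e, a0)
print("".join(path), logp)
```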