States of a Markov chain are not seen, we only see symbols that states emit.
All states emit the same symbols but with different probabilities.
We observe the sequence of symbols, want to know the sequence of states.
A sequence of states (wind directions) might look like: NNNSSNS
Producing the sequence of symbols (weather): UUUURUR
A casino occasionally switches from a fair die to a loaded die.
Rolling the loaded die gives a 6 50% of the time, and each of 1-5 10% of the time.
States are: \(F\) and \(L\) corresponding to the Fair and Loaded die.
Emissions are: 1, 2, 3, 4, 5, 6, corresponding to the value rolled.
Transition probabilities: \(a_{FL} = 0.05\) and \(a_{LF} = 0.1\). \(a_{FF} = 1 - a_{FL} = 0.95\) and \(a_{LL} = 1 - a_{LF} = 0.9\).
Emission probabilities: \(e_F(b) = 1/6\) for \(b = 1,\ldots,6\) while \(e_L(b) = 1/10\) for \(b = 1,\ldots,5\) and \(e_L(6) = 0.5\).
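As a concrete reference, the model above is small enough to write down directly. A minimal sketch in Python; the names `transitions` and `emissions` and the dictionary layout are illustrative choices, not part of the model.

```python
# Occasionally dishonest casino HMM: states F (fair die) and L (loaded die).

# Transition probabilities a_{kl}: probability of moving from state k to state l.
transitions = {
    ("F", "F"): 0.95, ("F", "L"): 0.05,
    ("L", "F"): 0.10, ("L", "L"): 0.90,
}

# Emission probabilities e_k(b): probability that state k emits symbol b.
emissions = {
    "F": {b: 1 / 6 for b in range(1, 7)},            # fair die: uniform over 1..6
    "L": {**{b: 0.1 for b in range(1, 6)}, 6: 0.5},  # loaded die: 6 half the time
}
```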
From this process, we might observe the symbol sequence (i.e., a sequence of rolls of the die)
2 5 1 6 6 2 4 2 4 5 6 6 3 6 2 4 2 2 3 4 6 3 6 5 3 4 5 2 3 5 1 6 6 6 6 2 4 6 6 2 6 6 6 6 6 1 5
Rolls (symbol sequence) 2 5 1 6 6 2 4 2 4 5 6 6 3 6 2 4 2 2 3 4 6 3 6 5 3 4 5 2 3 5 1 6 6 6 6 2 4 6 6 2 6 6 6 6 6 1 5
States (state sequence) F F F F L F F F L L L L L L L L F F F F F F F F F F F F F F L L L L F F F F L L L L L L F F F
Think of HMM as a generative process:
The first state \(\pi_1\) is chosen according to the probabilities \(a_{0k}\).
From first state, a symbol is emitted according to the probabilities \(e_{\pi_1}\).
The next state is then chosen according to the transition probabilities \(a_{\pi_1 l}\).
The next state emits a symbol and so on.
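A minimal sketch of this generative view, assuming the `transitions` and `emissions` dictionaries from the earlier sketch; the `start` argument plays the role of \(a_{0k}\), and starting in the fair state is purely an illustrative choice.

```python
import random

def sample_hmm(transitions, emissions, start, length):
    """Generate a (state sequence, symbol sequence) pair from the HMM."""
    states, symbols = [], []
    # Choose the first state pi_1 according to the start probabilities a_{0k}.
    state = random.choices(list(start), weights=list(start.values()))[0]
    for _ in range(length):
        states.append(state)
        # Emit a symbol according to e_{pi_i}.
        dist = emissions[state]
        symbols.append(random.choices(list(dist), weights=list(dist.values()))[0])
        # Choose the next state according to a_{pi_i, l}.
        nxt = {l: p for (k, l), p in transitions.items() if k == state}
        state = random.choices(list(nxt), weights=list(nxt.values()))[0]
    return states, symbols

# Illustrative run: start in the fair state and generate 50 rolls.
sim_states, sim_rolls = sample_hmm(transitions, emissions, {"F": 1.0, "L": 0.0}, 50)
```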
Joint probability of states and symbols is thus: \[ P(x,\pi) = P(x,\pi|\mathbf{e,a}) = a_{0 \pi_1} \prod_{i = 1}^L e_{\pi_i}(x_i)a_{\pi_i \pi_{i+1}}, \] where \(\pi_{L+1} = 0\) denotes the end state; if the end is not modelled, the final transition term is dropped.
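To make the formula concrete, here is a small sketch that evaluates \(P(x,\pi)\) for a given symbol sequence and state path (end state not modelled), again using the parameter dictionaries sketched earlier; `start` stands in for \(a_{0k}\) and the example values are illustrative.

```python
def joint_probability(x, pi, transitions, emissions, start):
    """P(x, pi) = a_{0 pi_1} * prod_i [ e_{pi_i}(x_i) * a_{pi_i pi_{i+1}} ], no end state."""
    p = start[pi[0]]
    for i in range(len(x)):
        p *= emissions[pi[i]][x[i]]               # emission term e_{pi_i}(x_i)
        if i + 1 < len(x):
            p *= transitions[(pi[i], pi[i + 1])]  # transition term a_{pi_i pi_{i+1}}
    return p

# e.g. the first five rolls of the example, with a fair/loaded guess for the path:
# joint_probability([2, 5, 1, 6, 6], ["F", "F", "F", "L", "L"],
#                   transitions, emissions, {"F": 0.5, "L": 0.5})
```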
Given symbol sequence \(x\), find state path \(\pi\) that maximises the joint probability: \[ \pi^* = \arg\,\max_{\pi} P(x,\pi). \]
Can find \(\pi^*\) using a dynamic programming approach.
Let \(v_k(i)\) be the probability of the most probable state path ending in state \(k\) at observation \(i\)
So \(v_k(i)\) is the best score of the state path for the first \(i\) observations.
Suppose \(v_k(i)\) is known for all states \(k\).
Then \(v_l(i+1)\) can be calculated for observation \(x_{i+1}\) as \[v_l(i+1) = e_l(x_{i+1}) \max_k (v_k(i)a_{kl}).\]
Use this recursion to find best state path.
Assume boundary conditions: we always start in the begin state, called 0, so \(v_0(0) = 1\).
Initialisation \(i = 0\): \(v_0(0) = 1\), \(v_k(0) = 0\) for \(k>0\).
Recursion \(i = 1,\ldots,L\): \(v_l(i) = e_l(x_i) \max_k(v_k(i-1)a_{kl})\).
\(\text{Pointer}_i(l) = \arg \max_k(v_k(i-1)a_{kl})\)
Termination: \(P(x,\pi^*) = \max_k (v_k(L)a_{k0})\).
Traceback \(i = L,\ldots,1\): \(\pi^*_L = \arg \max_k(v_k(L)a_{k0})\).
\(\pi_{i-1}^* = \text{Pointer}_i(\pi_i^*)\).
The termination step assumes the end state is modelled as \(a_{k0}\). If the end is not modelled, this term disappears.
Even the most probable path has very low probability, so it is important to work in log units.
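To see the problem, assume a per-step probability of roughly \(0.95 \times 1/6\) (one transition term and one fair-die emission term); a few hundred observations already push the path probability below what a double can represent:

```python
import math

p = 1.0
for _ in range(500):
    p *= 0.95 * (1 / 6)    # one transition term times one emission term
print(p)                   # 0.0: the product has underflowed

# The same quantity in log units is perfectly manageable.
log_p = 500 * (math.log(0.95) + math.log(1 / 6))
print(log_p)               # about -921.5
```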
The recursion in log units is \[ V_l(i+1) = \log (e_l(x_{i+1})) + \max_k (V_k(i) + \log(a_{kl})) = E_l(x_{i+1}) + \max_k (V_k(i) + A_{kl})\] where \[V_k(i) = \log(v_k(i)), A_{kl} = \log(a_{kl}),E_l(x_{i+1}) = \log(e_l(x_{i+1}))\]
Initialisation \(i = 0\): \(V_0(0) = 0\), \(V_k(0) = -\infty\) for \(k>0\).
Recursion \(i = 1,\ldots,L\): \(V_l(i) = E_l(x_i) + \max_k(V_k(i-1) + A_{kl})\).
\(\text{Pointer}_i(l) = \arg \max_k(V_k(i-1) + A_{kl})\)
Termination: \(\log P(x,\pi^*) = \max_k (V_k(L) + A_{k0})\).
Traceback \(i = L,\ldots,1\): \(\pi^*_L = \arg \max_k(V_k(L) + A_{k0})\).
\(\pi_{i-1}^* = \text{Pointer}_i(\pi_i^*)\).
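The whole log-space algorithm fits in a few lines. The sketch below assumes the casino parameters from the earlier sketch, a uniform start distribution standing in for \(a_{0k}\) (an illustrative choice), and no modelled end state, so the \(A_{k0}\) term is omitted.

```python
import math

def viterbi(x, states, log_start, log_trans, log_emit):
    """Most probable state path for the symbol sequence x (log-space Viterbi)."""
    n = len(x)
    # V[k][i] = log probability of the best path ending in state k after observation i.
    V = {k: [float("-inf")] * (n + 1) for k in states}
    ptr = {k: [None] * (n + 1) for k in states}

    # Initialisation: the begin-state transition A_{0l} is folded into column 1.
    for l in states:
        V[l][1] = log_emit[l][x[0]] + log_start[l]

    # Recursion: V_l(i) = E_l(x_i) + max_k ( V_k(i-1) + A_{kl} ).
    for i in range(2, n + 1):
        for l in states:
            best_k = max(states, key=lambda k: V[k][i - 1] + log_trans[(k, l)])
            V[l][i] = log_emit[l][x[i - 1]] + V[best_k][i - 1] + log_trans[(best_k, l)]
            ptr[l][i] = best_k           # Pointer_i(l)

    # Termination (no end state) and traceback.
    last = max(states, key=lambda k: V[k][n])
    path = [last]
    for i in range(n, 1, -1):
        path.append(ptr[path[-1]][i])    # pi*_{i-1} = Pointer_i(pi*_i)
    return list(reversed(path)), V[last][n]

# Usage on a prefix of the example rolls, reusing the earlier parameter sketch:
states = ["F", "L"]
log_start = {k: math.log(0.5) for k in states}   # uniform start: an assumption
log_trans = {kl: math.log(p) for kl, p in transitions.items()}
log_emit = {k: {b: math.log(p) for b, p in e.items()} for k, e in emissions.items()}
rolls = [2, 5, 1, 6, 6, 2, 4, 2, 4, 5, 6, 6, 3, 6, 2]
path, score = viterbi(rolls, states, log_start, log_trans, log_emit)
```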
The key part of the algorithm is calculating the recursion \[V_l(i+1) = E_l(x_{i+1}) + \max_k (V_k(i)+ A_{kl}) \]
Think of \(V\) as a matrix with row indices \(k\) (states) and column indices \(i\) (position along the sequence); the recursion fills this matrix one column at a time, from left to right.