
Python implementation of the HMM Forward-Backward and Viterbi algorithms for finding the hidden state sequence, along with the model definition.


Run Forward Backward and Viterbi Algorithms on defined HMMs

Why have such a model?

Usually the data points we encounter in datasets like MNIST or PASCAL are assumed to be independent and identically distributed (i.i.d.). This assumption lets us factorize the likelihood function across the data points when modelling the probability distribution. But there are datasets for which this assumption is clearly wrong, most notably time series. Time series data is sequential in nature, and not all sequential data is a time series: other examples include acoustic features in speech, sequences of characters, and sequences of nucleotide base pairs in a DNA strand.

Overview of Markov Models

To exploit the sequential patterns that occur in the data, we need a way to model the correlations between the observations. Markov models use the product rule to express the joint distribution of a sequence of observations,

p(x_1, \ldots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_1, \ldots, x_{t-1}).

Assuming that the current observation depends only on the previous observation (a first-order Markov chain), and using the d-separation property to reduce the above equation, we get

p(x_1, \ldots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}).

We can also construct higher-order Markov models in a similar manner, by conditioning each observation on a fixed number of preceding observations.
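As a small numeric illustration of the first-order factorization, the joint probability of a short sequence is just the product of an initial probability and one-step transition probabilities. The two-state chain and the sequence below are made-up values for the example, not data from this repository.

```python
import numpy as np

# Hypothetical two-state first-order Markov chain (illustrative numbers only).
pi = np.array([0.5, 0.5])        # p(x_1)
P = np.array([[0.9, 0.1],        # P[i, j] = p(x_t = j | x_{t-1} = i)
              [0.2, 0.8]])

sequence = [0, 0, 1, 1]          # an example observed sequence
prob = pi[sequence[0]]
for prev, cur in zip(sequence, sequence[1:]):
    prob *= P[prev, cur]         # multiply in p(x_t | x_{t-1})
print(prob)                      # 0.5 * 0.9 * 0.1 * 0.8 = 0.036
```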

Overview of Hidden Markov Model

Hidden Markov Models (HMMs) are an extension of mixture models in which a discrete multinomial latent variable selects the component responsible for generating each observation in a sequence. The choice of mixture component (hidden state) for a particular observation depends on the component chosen for the previous observation. The transition probabilities, denoted by A, describe the move from the previous hidden state to the current hidden state, while the emission probabilities, denoted by B, are the conditional distributions of the observed variable given each latent state. HMMs tolerate local warping and variability in how a sequence is generated, which makes them an excellent choice for speech recognition, handwriting recognition, and similar tasks.
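For concreteness, a discrete HMM can be written down as an initial distribution, a transition matrix A, and an emission matrix B. The states, observation symbols, and probability values below are illustrative placeholders, not the model defined in this repository.

```python
import numpy as np

# Two hidden states and three observation symbols; all numbers are
# illustrative placeholders, not the model defined in this repository.
states = ["state_0", "state_1"]
symbols = ["obs_0", "obs_1", "obs_2"]

pi = np.array([0.6, 0.4])            # initial state distribution p(z_1)
A = np.array([[0.7, 0.3],            # A[i, j] = p(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],       # B[i, k] = p(x_t = k | z_t = i)
              [0.6, 0.3, 0.1]])
```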

Forward Backward (Baum-Welch) Algorithm

This algorithm is capable of determining the probability of emitting a sequence of observations given the parameters (z, x, A, B) of an HMM, using a two-stage message-passing scheme. It is used when we know the sequence of observations but do not know the sequence of hidden states that generated it. Let us represent the sequence of observations by X and the parameters by \theta; the quantity of interest is

p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta),

where the sum runs over all possible hidden state sequences Z.

Computing this sum by brute force for a given sequence of T observations over n hidden states costs O(n^T \, T), since it enumerates every possible state sequence. The complexity can be reduced to O(n^2 T) using dynamic programming, by propagating the forward messages

\alpha_t(j) = p(x_1, \ldots, x_t, z_t = j) = \Big[ \sum_{i=1}^{n} \alpha_{t-1}(i) \, A_{ij} \Big] B_j(x_t).


After completing the forward pass, we move back through the trellis using the backward messages

\beta_t(i) = p(x_{t+1}, \ldots, x_T \mid z_t = i) = \sum_{j=1}^{n} A_{ij} \, B_j(x_{t+1}) \, \beta_{t+1}(j),

so that \alpha_t(i) \, \beta_t(i), after normalization, gives the posterior probability of being in state i at time t.
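A minimal NumPy sketch of the two passes is given below, assuming the pi, A, and B arrays from the toy model above; it is only an illustration of the recursions, not the implementation in this repository.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Forward-backward pass for a discrete HMM (illustrative sketch).

    obs : list of observation indices of length T
    pi  : (n,) initial state distribution
    A   : (n, n) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
    B   : (n, m) emission matrix,   B[i, k] = p(x_t = k | z_t = i)
    """
    n, T = len(pi), len(obs)

    # Forward pass: alpha[t, i] = p(x_1..x_t, z_t = i)
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = p(x_{t+1}..x_T | z_t = i)
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[-1].sum()            # p(X | theta)
    posterior = alpha * beta / likelihood   # p(z_t = i | X, theta)
    return likelihood, posterior
```

Called as forward_backward([0, 2, 1], pi, A, B) with the arrays above, it returns the sequence likelihood and the per-timestep posterior over hidden states.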

Viterbi Algorithm

To find the most probable sequence of hidden states, we use the max-sum algorithm, known as the Viterbi algorithm in the context of HMMs. It searches the space of paths (possible state sequences) efficiently, with a computational cost that grows only linearly with the length of the chain. We again use z to represent the hidden states, x to represent the observed sequence, n for the number of hidden states, and T for the length of the observed sequence. Our objective is to find the state sequence that maximizes the conditional probability of the states given the observations,

Z^{*} = \arg\max_{Z} \, p(Z \mid X, \theta).

This can again be solved by means of dynamic programming, since the current state in an HMM depends only on the previous state. We define a function representing the maximum joint probability of the observations up to time t and any assignment of hidden states ending in state j:

\delta_t(j) = \max_{z_1, \ldots, z_{t-1}} p(x_1, \ldots, x_t, z_1, \ldots, z_{t-1}, z_t = j) = \Big[ \max_{i} \, \delta_{t-1}(i) \, A_{ij} \Big] B_j(x_t).

We can see from the final formula that the last two probability terms are nothing but the transition probability A_{ij} and the emission probability B_j(x_t). The runtime complexity comes down to O(n^2 T) using dynamic programming, and the most probable path is recovered by backtracking through the stored argmax choices.
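The same recursion can be sketched in NumPy as follows, again assuming the pi, A, and B arrays from the toy model above; this is an illustrative implementation, not the code in this repository.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden state sequence for a discrete HMM (illustrative sketch)."""
    n, T = len(pi), len(obs)

    # delta[t, j] = max over z_1..z_{t-1} of p(x_1..x_t, z_1..z_{t-1}, z_t = j)
    delta = np.zeros((T, n))
    psi = np.zeros((T, n), dtype=int)        # back-pointers for path recovery
    delta[0] = pi * B[:, obs[0]]

    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: come from state i into state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Backtrack through the trellis from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```

Here path holds the most probable state index at each time step, and delta[-1].max() is the joint probability of that path with the observations.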

References

  1. Rabiner, L. R. (February 1989). "A tutorial on hidden Markov models and selected applications in speech recognition". Proceedings of the IEEE. 77 (2): 257–286. doi:10.1109/5.18626. (Describes the forward algorithm and the Viterbi algorithm for HMMs.)
  2. Blasiak, S.; Rangwala, H. (2011). "A Hidden Markov Model Variant for Sequence Classification". IJCAI Proceedings - International Joint Conference on Artificial Intelligence.
  3. Bishop, Christopher M. Pattern Recognition and Machine Learning, Section 13.2.
