Hidden Markov models (HMMs) are often used as black boxes to perform a wide range of classification and clustering tasks. We refer to Part 1 of this post for a detailed introduction to HMMs and their main applications. In this second part, we dive a bit deeper into the mathematical aspects behind one of the most popular applications—the Forward Algorithm—and see how to easily implement a concrete example in R.
In part 1 of this post, we have defined our HMM where there are two possible states: Hungry (referred to as H) and Thirsty (T), with three possible observations: Burger (B), Ice-cream (I) and Soda (S).
The corresponding initial probability vector, IN, is:
With a transition probability matrix, TR:
|at t=0 \ at t=1||Hungry||Thirsty|
And an emission probability matrix, EM:
Given this model definition and a sequence of observations (example: B-I-S), we are now interested in solving the following question: How likely is it to start lunch with a Burger, then having an Ice-cream and then finishing with a Soda?
Here, we want to calculate the likelihood of observing the B-I-S sequence, PHMM(B-I-S).
Even before having my first dish (i.e. getting the first observation), I have a probability of being Hungry, Pinit(H)=0.4, and a probability of being Thirsty, Pinit(T)=0.6, which is indicated by the IN vector. To estimate the total probability of observing B-I-S, let’s start by first estimating the probability of eating a Burger, B, as the first dish.
Remember that my states are hidden: we don’t know for sure that I am Hungry or Thirsty, we just have the probability of being on one of these states. So, given any of my two possible initial states (Thirsty or Hungry), I now make the decision to have a Burger. To estimate this, we use the emission probability matrix, EM: I can have a Burger while being Hungry, P(B|H)=0.6, or have a Burger while being Thirsty, P(B|T)=0.05.
Therefore, the total probability of having a Burger as the first dish is the probability sum of: having a Burger while being initially Hungry, P1(B|H) + having a Burger while being initially Thirsty, P1(B|T). These probabilities can be expressed as P1(B|H) = Pinit(H) x P(B|H) = 0.4 x 0.6 = 0.24 and P1(B|T) = Pinit(T) x P(B|T) = 0.6 x 0.05 = 0.03.
The total probability of getting my Burger as an entry is therefore 0.27. A graphical representation of this first step in described by the red arrows in Figure 1.
Fig. 1: Evolution of the hidden states (Hungry and Thirsty) after the first meal. The top level represents the evolution graphically while the bottom level represents the evolution in terms of probabilities. The red arrows represent the probabilities provided by the emission probability matrix EM, while the green arrows represent probabilities provided by the transition probability matrix TR.
Now, after my Burger, I can still be either Hungry or Thirsty. If I am Hungry now which is given by the probability P1(H), it could be because I was Hungry before and that the Burger wasn’t enough, or it could be that I was Thirsty before but having the Burger made me Hungry. The probability of being Hungry now while being Hungry before — i.e. P(H|H), or being Hungry now while being Thirsty before —i.e. P(H|T)— is provided by the transition matrix probability, TR. A visual representation of this transition is given by the green arrows of Figure 1.
With this information, we can estimate the total probability of being Hungry after the Burger:
P1(H) = P1(B|H) x P(H|H) + P1(B|T) x P(H|T)= 0.24 x 0.3 + 0.03 x 0.8 = 0.072+0.024 = 0.096
Similarly, I could be now Thirsty, which we will express as the probability P1(T). This Thirsty state could result from having a salty Burger which I ordered because I was initially Hungry (given by P1(B|H)). As a result, I went from a Hungry to a Thirsty state, for which the transition probability is given by P(T|H). It could also be because I was Thirsty initially and had a Burger (P1(B|T)) and the salt of the Burger made it worse so I am still Thirsty (expressed as P(T|T)). So the total probability of being Thirsty after the Burger is:
P1(T) = P1(B|H) x P(T|H) + P1(B|T) x P(T|T) = 0.24 x 0.7 + 0.03 x 0.2 = 0.168 + 0.006 = 0.174
Here we see that after the Burger, I am twice more likely to be Thirsty than Hungry. Now let’s have a look at the probability of having an Ice-cream I, as the second part of the meal.
Knowing the probability of being Hungry or Thirsty after the Burger, I can now make the decision to have an Ice-cream, which we can derive by using the EM matrix again. This process is illustrated by the red arrows in Figure 2. I can have an ice-cream while being Hungry, P(I|H)=0.3, or have a Ice-cream while being Thirsty, P(I|T)=0.4. Therefore, the total probability of having an Ice-cream as the second dish is the probability sum of: having an Ice-cream while being Hungry after Burger, P2(I|H) + Having an Ice-cream while being Thirsty after Burger, P2(I|T).
These probabilities can be expressed as: P2(I|H) = P1(H) x P(I|H) = 0.096 x 0.3 = 0.0288, with P2(I|T) = P1(T) x P(I|T) = 0.174 x 0.4 = 0.0696. Thus the total probability of getting an Ice-cream as a main meal is 0.0984.
Fig. 2: Evolution of the hidden states (Hungry and Thirsty) after the ice cream. Same caption than Figure 1.
Again, given my great appetite, I am still either Hungry or Thirsty after my Ice-cream, which is represented by the green arrows of Figure 2. I could be Thirsty P2(T), because I was Thirsty before and the Ice-cream did not calm my thirst—probability given by P(T|T), or because I was Hungry when I had the Ice-cream which then made me Thirsty—probability given by P(T|H). We look again at the TR matrix to estimate such probability:
P2(T) = P2(I|T) x P(T|T) + P2(I|H) x P(T|H) = 0.0696 x 0.2 + 0.0288 x 0.7 = 0.01392 + 0.02016 = 0.03408
Similarly, the probability of being Hungry after my Ice-cream is:
P2(H) = P2(I|H) x P(H|H) + P2(I|T) x P(H|T) = 0.0288 x 0.3 + 0.0696 x 0.8 = 0.00864 + 0.05568 = 0.06432
At this stage in my meal, it can be noticed that the probability of being Hungry is now twice higher than the probability of being Thirsty. Finally, let’s have a look at the probability of having a Soda to finish our lunch.
Given my probability of being Hungry or Thirsty after the Ice-cream, what is the probability of getting a Soda? Again we use the EM matrix to estimate such probability to be the sum of:
having a Soda while being Hungry after Ice-cream, P3(S|H) + having a Soda while being Thirsty after Ice-cream, P3(S|T). These probabilities can be expressed by: P3(S|H) = P2(H) x P(S|H) = 0.06432 x 0.1 = 0.006432, and + P3(S|T) = P2(T) x P(S|T) = 0.03408 x 0.55 = 0.018744.
We have our final answer! The probability of observing a B-I-S sequence is PHMM(B-I-S) = P3(S|H) + P3(S|T)= 0.025176.
How meaningful is this information? To put everything back into context, it can be useful to compare such probability with the one we would have obtained if each meal was drawn by chance: Pchance(B-I-S) = (1/3)3 = 0.035937. Here, since we have Pchance(B-I-S) > PHMM(B-I-S), clearly having a lunch made of Burger as an entry, followed by an Ice-cream with a Soda for dessert is highly unlikely given the HMM we defined.
The entire process we followed, known as the Forward Algorithm, is summarised in Figure 3.
Fig. 3: The forward algorithm applied to our lunch.
There are a few R libraries which allow quick and easy implementation of any HMM model and application of the Forward Algorithm. For example, the “HMM” library  can be installed directly within R via the command
install.packages("HMM"). Here is a snippet to implement our example using the HMM library:
library(HMM) #-------# Initialise HMMhmm = initHMM(c("Hungry", "Thirsty"), c("Burger", "Ice-cream", "Coke"), startProbs=c(0.4, 0.6), transProbs=matrix(c(.3, .8, .7, .2), 2), emissionProbs=matrix(c(0.6, 0.05, 0.3, 0.4, 0.1, 0.55), 3, 2))#-------# Sequence of observationsobservations = c("Burger", "Ice-cream", "Coke") cat("Proposed observations sequence: ", observations,"\n") #-------# Forward algorithmlogForwardProb = forward(hmm, observations) print(exp(logForwardProb)) cat("Probability of reaching state 3: ", exp(logForwardProb) + exp(logForwardProb), "\n")
Proposed observations sequence: Burger, Ice-cream, Coke index states 1 2 3 Hungry 0.24 0.0288 0.006432 Thirsty 0.03 0.0696 0.018744Probability of reaching state 3: 0.025176
In the part 3 of this series, we will have a look on how to decode the hidden states of the HMM via the Viterbi algorithm as well as how to implement our whole example in R.
Time for an Ice-cream.
Header image: courtesy of NASA/JPL-Caltech.
Thanks to Luis Torres and Alex Cummaudo for reviewing this post