I recently completed a course final project in which I attempted to implement a Hidden Markov Model (HMM) parameterized using the Markov Chain Monte Carlo (MCMC) method. In this post, I will briefly introduce HMMs as well as the MCMC method. I will then walk you through an exercise implementing a HMM in the Julia programming language and share some preliminary results from my exploration of combining both these methods. All the code for this exercise can be found in this GitHub repository.
Note: The exercise in this post will require some familiarity with Julia. For a quick introduction to this relatively recent newcomer to the programming languages scene, please refer to Peter Storm’s blog post here.
Hidden Markov Models
A Hidden Markov Model (HMM) is a statistical tool used to model many types of time series data in fields ranging from signal processing, hydrology, and financial statistics to speech recognition and econometrics (Rabiner, 1990; Rydén, 2008), where the underlying process is assumed to be Markovian. Specifically, it has been found to be effective at modeling systems with long-term persistence, where past events affect current and future events at a multi-year scale (Akintug and Rasmussen, 2005; Bracken et al., 2014). Recent studies have also found that it is possible to use HMMs to model the dynamics of streamflow, where the HMM state space represents the different hydrological conditions (e.g. dry vs. wet) and the observation space represents the streamflow data (Hadjimichael et al., 2020; 2020a). These synthetic streamflows are computationally generated with the aid of packages such as hmmlearn in Python and HMMBase in Julia. An example implementation of hmmlearn can be found in Julie Quinn's blog post here. She also wrote a prequel to that implementation post, where she dives into the background of HMMs and the use of Expectation-Maximization (EM) to parameterize the model.
Markov Chain Monte Carlo
The EM approach to parameterizing HMMs reflects a frequentist perspective, which assumes the existence of one "true" value for each HMM parameter. In this exercise, I will instead take a Bayesian approach and use the Markov Chain Monte Carlo (MCMC) method to generate and randomly sample potential parameter values from Markov chains. A Markov chain is a statistical model of a Markovian time series, in which the probability of being in state sₜ at the current time t depends only on the immediately preceding state sₜ₋₁. Using the MCMC approach, a Markov chain is constructed for each HMM parameter such that its stationary distribution (the distribution of the parameter values at equilibrium) is the parameter's posterior distribution (the likelihood of the model structure after, or a posteriori, obtaining the observed system data). Starting from the model priors (the assumed model structure prior to having any knowledge about the system), parameter values are repeatedly and randomly sampled from their respective chains — hence the term "Monte Carlo". These samples are then used to estimate the statistical properties of each parameter's distribution.
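To make the mechanics concrete, below is a minimal, illustrative Metropolis-Hastings sampler for a single parameter. This is not part of the exercise code: the prior, likelihood, step size, and starting value are made up purely to show the bookkeeping that Turing.jl automates (with more sophisticated samplers) for every HMM parameter later in this post.

# A minimal, illustrative Metropolis-Hastings sampler for one parameter.
# All numeric values here are placeholders chosen for illustration only.
using Random, Distributions

function metropolis(data; n_iter=10_000, step=0.1)
    # un-normalized log-posterior: log-prior + log-likelihood
    logpost(μ) = logpdf(Normal(15.5, 1.0), μ) + sum(logpdf.(Normal(μ, 0.25), data))

    chain = zeros(n_iter)
    μ = 15.0                                  # arbitrary starting value
    for i in 1:n_iter
        μ_prop = μ + step * randn()           # propose a nearby value (random walk)
        # accept with probability min(1, posterior ratio); otherwise keep the current value
        if log(rand()) < logpost(μ_prop) - logpost(μ)
            μ = μ_prop
        end
        chain[i] = μ                          # the stored samples form the Markov chain
    end
    return chain
end

Calling something like metropolis(logQ) would return samples whose histogram approximates the posterior of μ under those assumed distributions; the mean and standard deviation of the chain then estimate the statistical properties of that parameter.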
Overall, MCMC is a powerful method to estimate the statistical properties of high-dimensional problems with many parameters. Nonetheless, it is beholden to its own set of limitations, which include high computational expense and the inability to guarantee convergence. Figure 1 below summarizes some of the benefits and drawbacks of using MCMC parameterization:
Setting up the parameters
In this exercise, I will be using the HMM to approximate the statistical properties of the streamflow data found in the annualQ.csv file, previously used by Rohini in her post on creating HMMs to generate synthetic streamflow series and by the MSD Uncertainty eBook HMM training Jupyter Notebook. This streamflow time series originates from a USGS outlet gauge in the Upper Colorado River Basin. It records 105 years' worth of streamflow from 1909 to 2013 (inclusive). It was previously found that this record is best described using two states (wet and dry; Bracken et al., 2014). In this exercise, I also assume that the time series is best approximated by a first-order HMM, where the state at the current time t (sₜ) is only affected by the state at the time step immediately prior to it (sₜ₋₁).
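Under this first-order assumption, the joint probability of the hidden states and observations factorizes as

P(s_{1:T}, x_{1:T}) = P(s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1}) \prod_{t=1}^{T} P(x_t \mid s_t)

where T = 105 years, sₜ is the hidden (wet or dry) state in year t, and xₜ is the log-streamflow observed in that year.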
In this exercise, I will be using MCMC to parameterize the following HMM parameters:
| Parameter | Description |
| --- | --- |
| a = [P(a₁), P(a₂)] | The initial state distribution a for wet (1) and dry (2) states |
| A = [P(A₁₁) P(A₁₂); P(A₂₁) P(A₂₂)] | The transition matrix A, e.g. from a wet (1) to a dry (2) state |
| xᵢ ~ Normal(μᵢ, σᵢ) | The probability distributions for the vector of observations x for a given state i, where wet: 1 and dry: 2 |
From Table 1 above (a short illustrative sketch of these parameters in Julia follows this list):
- P(aᵢ) is the initial state probability for state i
- P(Aᵢⱼ) is the transition probability from state i to state j
- μᵢ and σᵢ are the mean and standard deviation of the vector of streamflow observations x for a given state i
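To make Table 1 concrete, the three groups of parameters can be pictured as the following Julia objects. The numbers are placeholders for illustration only; the MCMC procedure below infers the actual values.

# Illustrative placeholder values only -- the MCMC parameterization below infers the real ones.
using Distributions

a = [0.5, 0.5]                  # initial state distribution: [P(a₁) wet, P(a₂) dry]
A = [0.7 0.3;                   # transition matrix: A[i, j] = P(Aᵢⱼ), the probability of
     0.3 0.7]                   # moving from state i to state j
B = [Normal(15.7, 0.25),        # wet-state observation distribution, Normal(μ₁, σ₁)
     Normal(15.2, 0.25)]        # dry-state observation distribution, Normal(μ₂, σ₂)

Note that in the Turing model defined later, the columns (rather than the rows) of A hold the conditional transition probabilities, since each column is drawn from a Dirichlet prior.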
Now that the dataset and model parameters have been introduced, we will proceed with demonstrating the code required to implement the HMM model with MCMC parameterization.
Implementing the streamflow HMM using MCMC parameterization
Before we begin, I recommend downloading and installing Julia 1.8.5. You can find the Windows and Mac 64-bit installers here. Once Julia is set up, we can download and install all required packages. In Julia, this can be done as follows:
# make sure all required packages are installed
import Pkg
Pkg.activate(@__DIR__);
Pkg.instantiate();
Once this is done, we can load all the required packages. In Julia, this is done with the using keyword, which can be thought of as the parallel of Python's import. Note that I placed a semicolon after the using CSV line; this prevents Julia from printing the output messages it would otherwise display when loading packages.
# load all packages
using Random          # random number generation and seeding
using CSVFiles        # load CSV data
using DataFrames      # data storage and presentation
using Plots           # plotting library
using StatsPlots      # statistical plotting
using Distributions   # statistical distribution interface
using Turing          # probabilistic programming and MCMC
using Optim           # optimization library
using StatsBase       # basic statistical functions
using LaTeXStrings    # LaTeX-formatted plot labels
using HMMBase         # base interface for Hidden Markov Models
using DelimitedFiles  # read and write delimited text files
using CSV;            # CSV file reading
Now, let's load the data and identify the number of years and number of states inherent to the streamflow time series.
# load data
# source: https://github.com/IMMM-SFA/msd_uncertainty_ebook/tree/main/notebooks/data
Q = CSV.File("annualQ.csv", header=[:Inflow]) |> DataFrame
Q = Q[2:end,:]                                  # drop the first row (the file's original header, read in as data)
Q[!,:Inflow] = parse.(Float64, Q[!, :Inflow])   # convert the inflow strings to floats
Q[!, :Year] = 1909:2013;                        # add the year of record
Q[!, :logQ] = log.(Q[!, :Inflow]);              # add the log-transformed inflows
# Assume first-order HMM where only the immediate prior step influences the next step
n_states = 2
logQ = Q[!,:logQ];
n_years = size(logQ)[1];
Next, we should set a random seed to ensure that the following results are reproducible. This is because the MCMC parameterization relies on random sampling performed through the Distributions package. Without setting the seed (shown below), the outcomes of the exercise may differ each time the parameterization algorithm is run.
# Set the random seed
Random.seed!(1)
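As a quick sanity check (purely illustrative, not part of the original workflow), re-seeding the generator reproduces the same draws:

# identical seeds produce identical random draws
Random.seed!(1); x1 = rand(Normal(0, 1), 3)
Random.seed!(1); x2 = rand(Normal(0, 1), 3)
x1 == x2    # true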
Next, we define the HMM model with MCMC sampling. The sampling is performed using methods and syntax built into the Turing.jl package. In the code below, note the second loop over the time steps: there, the observations' states are enforced, so that observations with log-streamflows below the input mean are assigned to the dry state, while those above the mean are assigned to the wet state. This step is important to prevent mode collapse, where the HMM cannot distinguish between the wet and dry state distributions due to the random sampling of the MCMC parameterization.
@model function HMM_mcmc(logQ, n_states, n_years)
    # Source: https://chat.openai.com/c/d81ef577-9a2f-40b2-be9d-c19bf89f9980
    # Prompt example: Please suggest how you would parameterize Hidden Markov Model with the Markov Chain Monte Carlo method
    # Uses the Turing.jl package in Julia. The input to this model will be an inflow timeseries with two hidden states.
    mean_logQ = mean(logQ)

    # Define the prior distributions for the parameters
    μ₁ ~ Normal(15.7, 0.25)    # wet state
    μ₂ ~ Normal(15.2, 0.25)    # dry state
    σ₁ ~ truncated(Normal(0.24, 0.1), 0, Inf)
    σ₂ ~ truncated(Normal(0.24, 0.1), 0, Inf)
    a11 ~ Dirichlet(ones(n_states))
    A ~ filldist(Dirichlet(ones(n_states)), n_states)

    s = tzeros(Int, n_years)   # Define the vector of hidden states

    # Define initial state probability distribution
    a = [a11[1], 1-a11[1]]     # [wet state, dry state]

    # Define the observation distributions
    B = [Normal(μ₁, σ₁), Normal(μ₂, σ₂)]

    # Sample the initial hidden state and observation variables
    s[1] ~ Categorical(a)

    # Loop over time steps and sample the hidden+observed variables
    for t in 2:n_years
        s[t] ~ Categorical(A[:,s[t-1]])
    end
    for t in 1:n_years
        logQ[t] ~ B[s[t]]
        # if the logQ is greater than the input mean, force the state to be a wet state
        if logQ[t] > mean_logQ
            s[t] ~ Categorical(A[:,1])
        # if the logQ is smaller than the input mean, force the state to be a dry state
        elseif logQ[t] < mean_logQ
            s[t] ~ Categorical(A[:,2])
        end
    end
end
We then have to run the model for at least 3,000 iterations before the sample values for each parameter converge to stable values. The code to perform this is as follows:
# run for 3000 iterations; usually enough
# run using >3 chains to double check convergence if desired
hmm_model = HMM_mcmc(logQ, n_states, n_years)
# Gibbs sampler: HMC for the continuous parameters, particle Gibbs (PG) for the discrete states s
g = Gibbs(HMC(0.05, 10, :μ₁, :μ₂, :σ₁, :σ₂, :a11, :A), PG(10, :s))
# draw 3000 samples from 1 chain (multithreaded), dropping the warm-up samples
chains = sample(hmm_model, g, MCMCThreads(), 3000, 1, drop_warmup=true)
To check for convergence, you can run plot(chains) to inspect the trace and density plots, and corner(chains) to examine the parameters for dependencies. In this post, we will not discuss MCMC convergence in detail, but some things to look out for are r_hat values in the output of the code shown above that are close to 1, as well as chains that are similar to each other. Refer to Table 2 and Figure 2 for some quick statistics that can be used as a baseline to determine if your model has converged.
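A minimal sketch of these checks might look like the following; treat it as a sketch, since the diagnostics come from the MCMCChains summary that Turing returns:

# visual checks: trace plots should look like well-mixed, stationary noise,
# and corner plots reveal strong dependencies between parameters
plot(chains)
corner(chains)

# numerical check: the summary table includes an rhat column;
# values close to 1 suggest the chains have converged
summarystats(chains)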
Now, let’s use the model to generate 1,000 samples of streamflow. First, we define the streamflow prediction function:
function predict_inflow(chain, logQ)
    # extract the posterior samples for each parameter from the chain
    μ₁ = Array(group(chain, :μ₁))
    μ₂ = Array(group(chain, :μ₂))
    σ₁ = Array(group(chain, :σ₁))
    σ₂ = Array(group(chain, :σ₂))
    a11 = Array(group(chain, "a11[1]"))
    a22 = Array(group(chain, "a11[2]"))
    A11 = Array(group(chain, "A[1,1]"))
    A12 = Array(group(chain, "A[1,2]"))
    A21 = Array(group(chain, "A[2,1]"))
    A22 = Array(group(chain, "A[2,2]"))

    # draw 1000 parameter sets (with replacement) from the posterior samples
    n_samples = 1000
    μ1_sample = sample(μ₁, n_samples, replace=true);
    μ2_sample = sample(μ₂, n_samples, replace=true);
    σ1_sample = sample(σ₁, n_samples, replace=true);
    σ2_sample = sample(σ₂, n_samples, replace=true);
    a11_sample = sample(a11, n_samples, replace=true);
    a22_sample = sample(a22, n_samples, replace=true);
    A11_sample = sample(A11, n_samples, replace=true);
    A12_sample = sample(A12, n_samples, replace=true);
    A21_sample = sample(A21, n_samples, replace=true);
    A22_sample = sample(A22, n_samples, replace=true);

    Q_predict = zeros(length(logQ), n_samples)
    #residuals_ar = zeros(length(temp_data), n_samples)
    s_matrix = zeros(length(logQ), n_samples)
    logQ_matrix = zeros(length(logQ), n_samples)
    meanQ = mean(logQ)

    for n = 1:n_samples
        s_sample = tzeros(Int, n_years)
        Q_sample = zeros(n_years)

        μ1 = μ1_sample[n]
        μ2 = μ2_sample[n]
        σ1 = σ1_sample[n]
        σ2 = σ2_sample[n]
        a11 = a11_sample[n]
        a22 = a22_sample[n]
        A11 = A11_sample[n]
        A21 = 1 - A11_sample[n]
        A12 = 1 - A22_sample[n]
        A22 = A22_sample[n]

        #println(A)
        A = [A11 A12; A21 A22]
        A = transpose(A)
        #println(A)
        #a = [1-a22, a22]
        a = [a11, 1-a11]
        print(a)
        B = [Normal(μ1, σ1), Normal(μ2, σ2)]

        # Sample the initial hidden state and observation variables
        s_sample[1] = rand(Categorical(a))

        # Loop over time steps and sample the hidden+observed variables
        for t in 2:n_years
            s_sample[t] = rand(Categorical(A[s_sample[t-1], :]))
        end
        for t in 1:n_years
            Q_sample[t] = rand(B[s_sample[t]])
            if Q_sample[t] < meanQ
                s_sample[t] = 1
            elseif Q_sample[t] > meanQ
                s_sample[t] = 2
            end
        end

        s_matrix[:, n] = s_sample
        Q_predict[:, n] = Q_sample
    end
    return s_matrix, Q_predict
end
From this function, we obtain the state and log-inflow matrices, each with dimensions 105 × 1000.
s_matrix, logQ_predict = predict_inflow(chains, logQ)
We randomly select a timeseries out of the 1,000 timeseries output by the prediction function and plot the wet and dry states identified:
# create the dry and wet masks
q = Q[!,:Inflow]
y = Q[!,:Year]
# select one sample timeseries to plot (index chosen arbitrarily)
idx = 7
Q_wet_mcmc = q[s_matrix[:,idx] .== 1]
Q_dry_mcmc = q[s_matrix[:,idx] .== 2]
y_wet_mcmc = y[s_matrix[:,idx] .== 1]
y_dry_mcmc = y[s_matrix[:,idx] .== 2];
# plot the figure
plot(y, Q.Inflow, c=:grey, label="Inflow", xlabel="Year", ylabel="Annual inflow (cubic ft/yr)")
scatter!(y_dry_mcmc, Q_dry_mcmc, c=:salmon, label="Dry state (MCMC)", xlabel="Year", ylabel="Annual inflow (cubic ft/yr)")
scatter!(y_wet_mcmc, Q_wet_mcmc, c=:lightblue, label="Wet state (MCMC)", xlabel="Year", ylabel="Annual inflow (cubic ft/yr)")
The output figure should look similar to this:
From this figure, we can see that the HMM can identify streamflow extremes as wet and dry states (albeit with some inconsistencies), where historically high streamflows are identified as a "wet state" and low streamflows are correspondingly identified as "dry".
To make sure that the model can distinguish the observation distributions for wet and dry states, we plot their probability density functions. Before we plot anything, though, let's define a function that will help us plot the distributions themselves. For this function, we first need to load the LinearAlgebra package.
# load the LinearAlgebra package
using LinearAlgebra
# function to plot the distribution of the wet and dry states
function plot_dist(Q, μ, σ, A)
    # stationary distribution π of the (row-stochastic) transition matrix A,
    # i.e. the eigenvector of transpose(A) associated with the eigenvalue 1
    evals, evecs = eigen(transpose(A)).values, eigen(transpose(A)).vectors
    one_eval = argmin(abs.(evals.-1))
    π = evecs[:, one_eval] / sum(evecs[:, one_eval])

    # dry-state distribution, weighted by its stationary probability
    x_0 = LinRange(μ[1] - 4*σ[1], μ[1] + 4*σ[1], 10000)
    norm0 = Normal(μ[1], σ[1])
    fx_0 = π[1].*pdf.(norm0, x_0)

    # wet-state distribution, weighted by its stationary probability
    x_1 = LinRange(μ[2] - 4*σ[2], μ[2] + 4*σ[2], 10000)
    norm1 = Normal(μ[2], σ[2])
    fx_1 = π[2].*pdf.(norm1, x_1)

    # combined (mixture) distribution
    x = LinRange(μ[1] - 4*σ[1], μ[2] + 4*σ[2], 10000)
    fx = π[1].*pdf.(norm0, x) .+ π[2].*pdf.(norm1, x)

    #fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    b = range(μ[1] - 4*σ[1], μ[2] + 4*σ[2], length=100)
    histogram(log.(Q), c=:lightgrey, normalize=:pdf, label="Log-Inflow")
    plot!(x_0, fx_0, c=:red, linewidth=3, label="Dry State Distr", xlabel="x", ylabel="P(x)")
    plot!(x_1, fx_1, c=:blue, linewidth=2, label="Wet State Distr")
    plot!(x, fx, c=:black, linewidth=2, label="Combined State Distr")
end
Great! Now let’s use this function to plot the distributions themselves.
# get the model parameter value means
μ₁_mean = mean(chains, :μ₁)
μ₂_mean = mean(chains, :μ₂)
σ₁_mean = mean(chains, :σ₁)
σ₂_mean = mean(chains, :σ₂)
a11_mean = mean(chains, "a11[2]")
A11_mean = mean(chains, "A[1,1]")
A12_mean = mean(chains, "A[1,2]")
A21_mean = mean(chains, "A[2,1]")
A22_mean = mean(chains, "A[2,2]")
# obtained from model above
μ = [μ₂_mean, μ₁_mean]
σ = [σ₂_mean, σ₁_mean]
A_mcmc = [A11_mean 1-A11_mean; 1-A22_mean A22_mean]
# plot the distributions
plot_dist(Q[!,:Inflow], μ, σ, A_mcmc)
This code block will output the following figure:
From Figure 3, we can observe that the dry state streamflow distribution is distinct from that of the wet state. The HMM correctly estimates a lower mean for the dry state and a higher one for the wet state. Based on visual inspection, the two states have similar standard deviations, but this is likely an artifact of the priors we specified when setting up the HMM earlier. The step we took earlier to force the wet/dry state assignment given a sampled log-streamflow value also likely prevented mode collapse and enabled the identification of two distinct distributions.
Now, let’s see if our HMM can be used to generate synthetic streamflow. We do this using the following lines of code:
# plot the FDC of the sampled inflow around the historical
sampled_obs_mcmc = zeros(1000, n_years)
Q_predict_mcmc = transpose(logQ_predict)
for i in 1:1000
    # convert log-flows back to flows and sort in descending order
    sampled_obs_mcmc[i,:] = exp.(Q_predict_mcmc[i, :])
    sampled_obs_mcmc[i,:] = sort!(sampled_obs_mcmc[i, :], rev=true)
end

hist_obs = copy(Q[!,:Inflow])
hist_obs = sort!(hist_obs, rev=true)
probs = (1:n_years)/n_years

plot(probs, hist_obs, c=:grey, linewidth=2, label="Historical inflow",
     xlabel="Exceedance prob.", ylabel="Annual inflow (cubic ft/yr)")
for i in 1:1000
    if i == 1
        plot!(probs, sampled_obs_mcmc[i,:], c=:lightgreen, linewidth=2,
              label="Sampled inflow (MCMC)", xlabel="Exceedance prob.", ylabel="Annual inflow (cubic ft/yr)")
    end
    plot!(probs, sampled_obs_mcmc[i,:], c=:lightgreen, linewidth=2, legend=false,
          xlabel="Exceedance prob.", ylabel="Annual inflow (cubic ft/yr)")
end
plot!(probs, hist_obs, c=:grey, linewidth=2, legend=false,
      xlabel="Exceedance prob.", ylabel="Annual inflow (cubic ft/yr)")
Once implemented, the following figure should be output:
This demonstrates that the HMM is capable of generating synthetic streamflow that approximates the statistical properties of the historical time series while expanding the range of uncertainty in streamflow scenarios. This has implications for applications such as exploratory modeling of different streamflow scenarios, where such approaches can be used to generate wetter- or drier-than-historical streamflows.
Summary
While using MCMC to parameterize a HMM is a useful method to generate synthetic streamflow, one significant drawback is the long runtime and the large number of iterations that the MCMC algorithm takes to converge (Table 3).
This is likely due to the HMM sampling setup. Notice that in the HMM_mcmc function, each hidden state sₜ is itself sampled (s[t] ~ Categorical(...)), both when the state sequence is first drawn and again when the states are enforced relative to the mean log-streamflow. We are essentially fitting an additional probability distribution at each sₜ, turning the hidden states into tunable parameters themselves. Since there are 105 elements in the streamflow time series, this introduces an additional 105 parameters for the MCMC algorithm to fit. This is the likely cause of the lengthy runtime. While we will not explore alternative HMM-MCMC parameterization approaches in detail, more efficient alternatives definitely exist (e.g., Turek et al., 2016), and you are highly encouraged to check them out!
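For instance, one common alternative is to marginalize the hidden states out of the likelihood with the forward algorithm, so the sampler only has to explore the six continuous parameters. Below is a rough sketch of this idea; it is not from the original exercise, it assumes Turing's @addlogprob! macro is available in your version of Turing.jl, and for long records you would want to run the recursion in log space to avoid underflow.

# A sketch of a marginalized HMM (not part of the original exercise).
# The forward algorithm sums over all possible hidden state sequences,
# so no s[t] variables need to be sampled by MCMC.
@model function HMM_mcmc_marginalized(logQ, n_states, n_years)
    # same priors as before
    μ₁ ~ Normal(15.7, 0.25)                     # wet state
    μ₂ ~ Normal(15.2, 0.25)                     # dry state
    σ₁ ~ truncated(Normal(0.24, 0.1), 0, Inf)
    σ₂ ~ truncated(Normal(0.24, 0.1), 0, Inf)
    a11 ~ Dirichlet(ones(n_states))
    A ~ filldist(Dirichlet(ones(n_states)), n_states)
    B = [Normal(μ₁, σ₁), Normal(μ₂, σ₂)]

    # forward recursion: α[i] = P(logQ[1:t], s_t = i)
    α = [a11[i] * pdf(B[i], logQ[1]) for i in 1:n_states]
    for t in 2:n_years
        α = [pdf(B[i], logQ[t]) * sum(A[i, j] * α[j] for j in 1:n_states) for i in 1:n_states]
    end

    # add the marginal log-likelihood to the joint log-density
    Turing.@addlogprob! log(sum(α))
end

Because no discrete latent variables remain, such a model could in principle be sampled with HMC or NUTS alone rather than the Gibbs/particle-Gibbs combination used above; Turek et al. (2016) discuss efficient MCMC strategies for HMMs along these lines.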
Overall, we began this post with a quick introduction to HMM and MCMC sampling. We then compared the pros and cons of using the MCMC approach to parameterize a HMM. Once we understood its benefits and drawbacks, we formulated a HMM for a 105-year long timeseries of annual streamflow and examined the performance of the model by analyzing its converged parameter values, observation probability distributions, as well as its ability to detect wet/dry states in the historical time series. We also used the HMM to generate 1,000 synthetic streamflow scenarios. Finally, we diagnosed the cause for the long time to convergence of the MCMC sampling approach.
This brings us to the end of the post! I hope you now have a slightly better understanding of HMMs, how they can be parameterized using the MCMC approach, as well as issues you might run into when you implement the latter.
Happy coding!
References
Akintug, B., & Rasmussen, P. F. (2005). A Markov switching model for annual hydrologic time series. Water Resources Research, 41(9). https://doi.org/10.1029/2004wr003605
Bracken, C., Rajagopalan, B., & Zagona, E. (2014). A hidden Markov model combined with climate indices for multidecadal streamflow simulation. Water Resources Research, 50(10), 7836–7846. https://doi.org/10.1002/2014wr015567
Hadjimichael, A., Quinn, J., & Reed, P. (2020). Advancing diagnostic model evaluation to better understand water shortage mechanisms in institutionally complex river basins. Water Resources Research, 56(10), e2020WR028079.
Hadjimichael, A., Quinn, J., Wilson, E., Reed, P., Basdekas, L., Yates, D., & Garrison, M. (2020a). Defining robustness, vulnerabilities, and consequential scenarios for diverse stakeholder interests in institutionally complex river basins. Earth’s Future, 8(7), e2020EF001503.
Reed, P.M., Hadjimichael, A., Malek, K., Karimi, T., Vernon, C.R., Srikrishnan, V., Gupta, R.S., Gold, D.F., Lee, B., Keller, K., Thurber, T.B, & Rice, J.S. (2022). Addressing Uncertainty in Multisector Dynamics Research [Book]. Zenodo. https://doi.org/10.5281/zenodo.6110623
Rabiner, L. R. (1990). A tutorial on hidden Markov models and selected applications in speech recognition. Readings in Speech Recognition, 267–296. https://doi.org/10.1016/b978-0-08-051584-7.50027-9
Rydén, T. (2008). EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A computational perspective. Bayesian Analysis, 3(4). https://doi.org/10.1214/08-ba326
Turek, D., de Valpine, P. and Paciorek, C.J. (2016) Efficient Markov Chain Monte Carlo Sampling for Hierarchical Hidden Markov Models, arXiv.org. Available at: https://arxiv.org/abs/1601.02698.