# Fitting Hidden Markov Models Part II: Sample Python Script

This is the second part of a two-part blog series on fitting hidden Markov models (HMMs). In Part I, I explained what HMMs are, why we might want to use them to model hydro-climatological data, and the methods traditionally used to fit them. Here I will show how to apply these methods using the Python package hmmlearn using annual streamflows in the Colorado River basin at the Colorado/Utah state line (USGS gage 09163500). First, note that to use hmmlearn on a Windows machine, I had to install it on Cygwin as a Python 2.7 library.

For this example, we will assume the state each year is either wet or dry, and the distribution of annual streamflows under each state is modeled by a Gaussian distribution. More states can be considered, as well as other distributions, but we will use a two-state, Gaussian HMM here for simplicity. Since streamflow is strictly positive, it might make sense to first log-transform the annual flows at the state line so that the Gaussian models won’t generate negative streamflows, so that’s what we do here.

After installing hmmlearn, the first step is to load the Gaussian hidden Markov model class with from hmmlearn.hmm import GaussianHMM. The fit function of this class requires as inputs the number of states (n_components, here 2 for wet and dry), the number of iterations to run of the Baum-Welch algorithm described in Part I (n_iter; I chose 1000), and the time series to which the model is fit (here a column vector, Q, of the annual or log-transformed annual flows). You can also set initial parameter estimates before fitting the model and only state those which need to be initialized with the init_params argument. This is a string of characters where ‘s’ stands for startprob (the probability of being in each state at the start), ‘t’ for transmat (the probability transition matrix), ‘m’ for means (mean vector) and ‘c’ for covars (covariance matrix). As discussed in Part I it is good to test several different initial parameter estimates to prevent convergence to a local optimum. For simplicity, here I simply use default estimates, but this tutorial shows how to pass your own. I call the model I fit on line 5 model.

Among other attributes and methods, model will have associated with it the means (means_) and covariances (covars_) of the Gaussian distributions fit to each state, the state probability transition matrix (transmat_), the log-likelihood function of the model (score) and methods for simulating from the HMM (sample) and predicting the states of observed values with the Viterbi algorithm described in Part I (predict). The score attribute could be used to compare the performance of models fit with different initial parameter estimates.

It is important to note that which state (wet or dry) is assigned a 0 and which state is assigned a 1 is arbitrary and different assignments may be made with different runs of the algorithm. To avoid confusion, I choose to reorganize the vectors of means and variances and the transition probability matrix so that state 0 is always the dry state, and state 1 is always the wet state. This is done on lines 22-26 if the mean of state 0 is greater than the mean of state 1.


from hmmlearn.hmm import GaussianHMM

def fitHMM(Q, nSamples):
# fit Gaussian HMM to Q
model = GaussianHMM(n_components=2, n_iter=1000).fit(np.reshape(Q,[len(Q),1]))

# classify each observation as state 0 or 1
hidden_states = model.predict(np.reshape(Q,[len(Q),1]))

# find parameters of Gaussian HMM
mus = np.array(model.means_)
sigmas = np.array(np.sqrt(np.array([np.diag(model.covars_),np.diag(model.covars_)])))
P = np.array(model.transmat_)

# find log-likelihood of Gaussian HMM
logProb = model.score(np.reshape(Q,[len(Q),1]))

# generate nSamples from Gaussian HMM
samples = model.sample(nSamples)

# re-organize mus, sigmas and P so that first row is lower mean (if not already)
if mus > mus:
mus = np.flipud(mus)
sigmas = np.flipud(sigmas)
P = np.fliplr(np.flipud(P))
hidden_states = 1 - hidden_states

return hidden_states, mus, sigmas, P, logProb, samples

# log transform the data and fit the HMM
logQ = np.log(AnnualQ)
hidden_states, mus, sigmas, P, logProb, samples = fitHMM(logQ, 100)



Okay great, we’ve fit an HMM! What does the model look like? Let’s plot the time series of hidden states. Since we made the lower mean always represented by state 0, we know that hidden_states == 0 corresponds to the dry state and hidden_states == 1 to the wet state.


from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

def plotTimeSeries(Q, hidden_states, ylabel, filename):

sns.set()
fig = plt.figure()

xs = np.arange(len(Q))+1909
ax.plot(xs, Q, c='k')

ax.set_xlabel('Year')
ax.set_ylabel(ylabel)
handles, labels = plt.gca().get_legend_handles_labels()
fig.legend(handles, labels, loc='lower center', ncol=2, frameon=True)
fig.savefig(filename)
fig.clf()

return None

plt.switch_backend('agg') # turn off display when running with Cygwin
plotTimeSeries(logQ, hidden_states, 'log(Flow at State Line)', 'StateTseries_Log.png') Wow, looks like there’s some persistence! What are the transition probabilities?


print(model.transmat_)



Running that we get the following:

[[ 0.6794469   0.3205531 ]
[ 0.34904974  0.65095026]]

When in a dry state, there is a 68% chance of transitioning to a dry state again in the next year, while in a wet state there is a 65% chance of transitioning to a wet state again in the next year.

What does the distribution of flows look like in the wet and dry states, and how do these compare with the overall distribution? Since the probability distribution of the wet and dry states are Gaussian in log-space, and each state has some probability of being observed, the overall probability distribution is a mixed, or weighted, Gaussian distribution in which the weight of each of the two Gaussian models is the unconditional probability of being in their respective state. These probabilities make up the stationary distribution, π, which is the vector solving the equation π = πP, where P is the probability transition matrix. As briefly mentioned in Part I, this can be found using the method described here: π = (1/ Σi[ei])e in which e is the eigenvector of PT corresponding to an eigenvalue of 1, and ei is the ith element of e. The overall distribution for our observations is then Y ~ π0N(μ0,σ02) + π1*N(μ1,σ12). We plot this distribution and the component distributions on top of a histogram of the log-space annual flows below.


from scipy import stats as ss

def plotDistribution(Q, mus, sigmas, P, filename):

# calculate stationary distribution
eigenvals, eigenvecs = np.linalg.eig(np.transpose(P))
one_eigval = np.argmin(np.abs(eigenvals-1))
pi = eigenvecs[:,one_eigval] / np.sum(eigenvecs[:,one_eigval])

x_0 = np.linspace(mus-4*sigmas, mus+4*sigmas, 10000)
fx_0 = pi*ss.norm.pdf(x_0,mus,sigmas)

x_1 = np.linspace(mus-4*sigmas, mus+4*sigmas, 10000)
fx_1 = pi*ss.norm.pdf(x_1,mus,sigmas)

x = np.linspace(mus-4*sigmas, mus+4*sigmas, 10000)
fx = pi*ss.norm.pdf(x,mus,sigmas) + \
pi*ss.norm.pdf(x,mus,sigmas)

sns.set()
fig = plt.figure()
ax.hist(Q, color='k', alpha=0.5, density=True)
l1, = ax.plot(x_0, fx_0, c='r', linewidth=2, label='Dry State Distn')
l2, = ax.plot(x_1, fx_1, c='b', linewidth=2, label='Wet State Distn')
l3, = ax.plot(x, fx, c='k', linewidth=2, label='Combined State Distn')

handles, labels = plt.gca().get_legend_handles_labels()
fig.legend(handles, labels, loc='lower center', ncol=3, frameon=True)
fig.savefig(filename)
fig.clf()

return None

plotDistribution(logQ, mus, sigmas, P, 'MixedGaussianFit_Log.png') Looks like a pretty good fit – seems like a Gaussian HMM is a decent model of log-transformed annual flows in the Colorado River at the Colorado/Utah state line. Hopefully you can find relevant applications for your work too. If so, I’d recommend reading through this hmmlearn tutorial, from which I learned how to do everything I’ve shown here.

# Fitting Hidden Markov Models Part I: Background and Methods

Hydro-climatological variables often exhibit long-term persistence caused by regime-shifting behavior in the climate, such as the El Niño-Southern Oscillations (ENSO). One popular way of modeling this long-term persistence is with hidden Markov models (HMMs) [Thyer and Kuczera, 2000; Akintug and Rasmussen, 2005; Bracken et al., 2014]. What is an HMM? Recall from my five blog posts on weather generators, that the occurrence of precipitation is often modeled by a (first order) Markov model in which the probability of rain on a given day depends only on whether or not it rained on the previous day. A (first order) hidden Markov model is similar in that the climate “state” (e.g., wet or dry) at a particular time step depends only on the state from the previous time step, but the state in this case is “hidden,” i.e. not observable. Instead, we only observe a random variable (discrete or continuous) that was generated under a particular state, but we don’t know what that state was.

For example, imagine you are a doctor trying to diagnose when an individual has the flu. On any given day, this person is in one of two states: sick or healthy. These states are likely to exhibit great persistence; when the person gets the flu, he/she will likely have it for several days or weeks, and when he/she is heathy, he/she will likely stay healthy for months. However, suppose you don’t have the ability to test the individual for the flu virus and can only observe his/her temperature. Different (overlapping) distributions of body temperatures may be observed depending on whether this person is sick or healthy, but the state itself is not observed. In this case, the person’s temperature can be modeled by an HMM.

So why are HMMs useful for describing hydro-climatological variables? Let’s go back to the example of ENSO. Maybe El Niño years in a particular basin tend to be wetter than La Niña years. Normally we can observe whether or not it is an El Niño year based on SST anomalies in the tropical Pacific, but suppose we only have paleodata of tree ring widths. We can infer from the tree ring data (with some error) what the total precipitation might have been in each year of the tree’s life, but we may not know what the SST anomalies were those years. Or even if we do know the SST anomalies, maybe there is another more predictive regime-shifting teleconnection we haven’t yet discovered. In either case, we can model the total annual precipitation with an HMM.

What is the benefit of modeling precipitation in these cases with an HMM as opposed to say, an autoregressive model? Well often the year to year correlation of annual precipitation may not actually be that high, but several consecutive wet or consecutive dry years are observed [Bracken et al., 2014]. Furthermore, paleodata suggests that greater persistence (e.g. megadroughts) in precipitation is often observed than would be predicted by autoregressive models [Ault et al., 2013; Ault et al., 2014]. This is where HMMs may come in handy.

Here I will explain how to fit HMMs generally, and in Part II I will show how to apply these methods using the Python package hmmlearn. To understand how to fit HMMs, we first need to define some notation. Let Yt be the observed variable at time t (e.g., annual streamflow). The distribution of Yt depends on the state at time t, Xt (e.g., wet or dry). Let’s assume for simplicity that our observations can be modeled by Gaussian distributions. Then f(Yt | Xt = i) ~ N(μi,σi 2) and f(Yt | Xt = j) ~ N(μj,σj 2) for a two-state HMM. The state at time t, Xt, depends on the state at the previous time step, Xt-1. Let P be the state transition matrix, where each element pi,j represents the probability of transitioning from state i at time t to state j at time t+1, i.e. pij = P(Xt+1 = j | Xt = i). P is a n x n matrix where n is the number of states (e.g. 2 for wet and dry). In all Markov models (hidden or not), the unconditional probability of being in each state, π can be modeled by the equation π = πP, where π is a 1 x n vector in which each element πi represents the unconditional probability of being in state i, i.e. πi = P(Xt = i). π is also called the stationary distribution and can be calculated from P as described here. Since we have no prior information on which to condition the first set of observations, we assume the initial probability of being in each state is the stationary distribution.

In fitting a two-state Gaussian HMM, we therefore need to estimate the following vector of parameters: θ = [μ0, σ0, μ1, σ1, p00, p11]. Note p01 = 1 – p00 and p10 = 1 – p11. The most common approach to estimating these parameters is through the Baum-Welch algorithm, an application of Expectation-Maximization built off of the forward-backward algorithm. The first step of this process is to set initial estimates for each of the parameters. These estimates can be random or based on an informed prior. We then begin with the forward step, which computes the joint probability of observing the first t observations and ending up in state i at time t, given the initial parameter estimates: P(Xt = i, Y1 = y1, Y2 = y2, …, Yt = yt | θ). This is computed for all t ϵ {1, …, T}. Then in the backward step, the conditional probability of observing the remaining observations after time t given the state observed at time t is computed: P(Yt+1 = yt+1, …, YT = yT | Xt=i, θ). Using Bayes’ theorem, it can shown that the product of the forward and backward probabilities is proportional to the probability of ending up in state i at time t given all of the observations, i.e. P(Xt = i | Y1 = y1,…, YT = yT, θ). This is derived below:

1) $P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(Y_1=y_1, ..., Y_T=y_T \vert X_t=i, \theta) P(X_t=i \vert \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}$

2) $P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(Y_1=y_1, ..., Y_t=y_t \vert X_t=i, \theta) P(Y_{t+1} = y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta) P(X_t=i \vert \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}$

3) $P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(X_t=i, Y_1=y_1, ..., Y_t=y_t \vert \theta) P(Y_{t+1} = y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}$

4) $P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) \propto P(X_t=i, Y_1=y_1, ..., Y_t=y_t \vert \theta) P(Y_{t+1}=y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta)$

The first equation is Bayes’ Theorem. The second equation is derived by the conditional independence of the observations up to time t (Y1, Y2, …, Yt) and the observations after time t (Yt+1, Yt+2, …, YT), given the state at time t (Xt). The third equation is derived from the definition of conditional probability, and the fourth recognizes the denominator as a normalizing constant.

Why do we care about the probability of ending up in state i at time t given all of the observations (the left hand side of the above equations)? In fitting a HMM, our goal is to find a set of parameters, θ, that maximize this probability, i.e. the likelihood function of the state trajectories given our observations. This is therefore equivalent to maximizing the product of the forward and backward probabilities. We can maximize this product using Expectation-Maximization. Expectation-Maximization is a two-step process for maximum likelihood estimation when the likelihood function cannot be computed directly, for example, because its observations are hidden as in an HMM. The first step is to calculate the expected value of the log likelihood function with respect to the conditional distribution of X given Y and θ (the left hand side of the above equations, or proportionally, the right hand side of equation 4). The second step is to find the parameters that maximize this function. These parameter estimates are then used to re-implement the forward-backward algorithm and the process repeats iteratively until convergence or some specified number of iterations. It is important to note that the maximization step is a local optimization around the current best estimate of θ. Hence, the Baum-Welch algorithm should be run multiple times with different initial parameter estimates to increase the chances of finding the global optimum.

Another interesting question beyond fitting HMMs to observations is diagnosing which states the observations were likely to have come from given the estimated parameters. This is often performed using the Viterbi algorithm, which employs dynamic programming (DP) to find the most likely state trajectory. In this case, the “decision variables” of the DP problem are the states at each time step, Xt, and the “future value function” being optimized is the probability of observing the true trajectory, (Y1, …,YT), given those alternative possible state trajectories. For example, let the probability that the first state was k be V1,k. Then V1,k = P(X1 = k) = P(Y1 = y1 | X1 = k)πk. For future time steps, Vt,k = P(Yt = yt | Xt = k)pik*Vt-1,i where i is the state in the previous time step. Thus, the Viterbi algorithm finds the state trajectory (X1, …, XT) maximizing VT,k.

Now that you know how HMMs are fit using the Baum-Welch algorithm and decoded using the Viterbi algorithm, read Part II to see how to perform these steps in practice in Python!

# Synthetic Weather Generation: Part III

Multi-Site Weather Generators

This is the third part of a blog series on synthetic weather generators. In Parts I and II, I described common parametric and non-parametric methods of weather generation, respectively. Both of these blogs only discussed how to use these methods to generate daily weather simulations at one site. However, we are often interested in generating weather at multiple sites simultaneously, for instance, if we are using the weather as input to a distributed watershed model. If there is a high degree of spatial heterogeneity within the system being modeled, applying average weather conditions everywhere will not provide representative simulations of the hydrology, nor will independently generating weather at multiple sites within the system. Fortunately, several techniques have been developed to correlate synthetic weather sequences generated at multiple nearby sites that are consistent with observed data.

This blog will focus on two methods of generating correlated weather data: 1) driving multiple single-site parametric weather generators with spatially correlated random numbers (Wilks, 1998; Wilks 1999; Wilks, 2008; Wilks, 2009; Srikanthan and Pegram, 2009; Baigorria and Jones, 2010) and 2) resampling concurrent weather data from multiple sites with a non-parametric weather generator (Buishand and Brandsma, 2001; Beersma and Buishand, 2003; Apipattanavis et al., 2007; Steinschneider and Brown, 2013). Other methods exist, such as jointly sampling weather at multiple sites from a copula (Bardossy and Pegram, 2009; Serinaldi, 2009, Aghakouchak et al., 2010a; Aghakouchak et al., 2010b), modeling rainfall occurrence and amounts processes with generalized linear models (GLMs) (Kigobe et al., 2011), and disaggregating spatially averaged synthetic rainfall with a nonhomogeneous random cascade process (Jothityangkoon et al., 2000), but will not be discussed here.

Method 1

The first method, driving multiple single-site weather generators with spatially correlated random numbers, is described by Wilks (1998) for use with parametric generators of the Richardson type described in Part I. In the single-site formulation of this model, the first step is to generate precipitation occurrences at each site. Normally, this is done by generating a uniform random number, u ~ U[0,1], and comparing it to the critical probably, pc, of transitioning from the current state (wet or dry) to a wet state. If upc, the state transitions to a wet state; otherwise it transitions to a dry state.

An alternative to generating uniform random numbers for the transition computation is to generate standard normal random numbers, w ~ N(0,1). In this case, state transitions from the current state to a wet state occur if w ≤ Φ-1(pc), where Φ-1 (·) is the inverse of the standard normal CDF. The benefit to using this alternative is that it is easy to generate correlated normally distributed random variables (Wilks, 2011, pp.499-500), and we will need to correlate the random number streams generated at each of K sites to ensure realistic weather simulations.

Let Xt(k) and Xt(l) be the precipitation states on day t at sites k and l, respectively, where X = 0 corresponds to a dry day and X = 1 corresponds to a wet day. In order to simulate realistic sequences of precipitation states at these two sites, one must find the correlation between the normally distributed random variables driving the transitions, ωo(k,l) = ρ(wt(k), wt(l)), that reproduces the historical correlation between the precipitation states, ξo(k,l) = ρ(Xt(k), Xt(l)). Unfortunately, this relationship must be calculated empirically by simulating precipitation states at two sites for different values of ω and calculating the corresponding values of ξ. This must be done for every one of the K(K-1)/2 station pairs. Figure 1 shows one such relationship for the July transition parameters at Ithaca and Freeville from Wilks (1998). Once the relationship between ξ and ω has been mapped for each station pair, one can determine the correlation ωo(k,l) that should be used for the weather generator. In the case of Ithaca and Freeville, the historical correlation in precipitation states between the two sites, ξo, is 0.800. The dotted line in Figure 1 shows that ξo = 0.8 corresponds to a value of ωo = 0.957. Once all of the K(K-1)/2 pairs of correlations ωo(k,l) have been determined, one can generate a daily vector of standard normal random variables, wt, from a multi-variate normal distribution with mean vector 0 and correlation matrix [Ω], whose elements are the correlations ωo(k,l). One drawback to this method is that, while it does a good job replicating the lag 0 correlations between sites, it tends to under-represent lag 1 correlations between them (Wilks, 1998).

As in the single site weather generator, after simulating whether or not rain occurs at a particular site each day, the next step is to determine the depth of rain on wet days. Not surprisingly, the conditional distribution of rainfall amounts at a particular site depends on whether or not it is raining at nearby sites. A convenient way to accommodate this is to generate the precipitation amounts at wet sites from the mixed exponential distribution (Wilks, 1998).

Recall that the mixed exponential distribution, f(x) = αλ1exp(-λ1x) + (1–α) λ2exp(-λ2x), has three parameters: two rate parameters, λ1 and λ2 where λ1 < λ2, and a weight parameter, α. Thus, with probability α the rainfall amounts are generated from an exponential distribution with rate λ1, and with probability 1–α they are generated from an exponential distribution with rate λ2. Since the mean of the exponential distribution is 1/λ and λ1 < λ2, rainfall amounts generated from an exponential distribution with rate λ1 will have a greater mean than those generated from an exponential distribution with rate λ2. Therefore, rainfall amounts at sites neighboring dry sites are more likely to have come from an exponential distribution with rate λ2, while rainfall amounts at wet sites neighboring other wet sites are more likely to have come from an exponential distribution with rate λ1.

Keeping this in mind, one can conditionally generate rainfall amounts at a given site, k, from one of these two exponential distributions. As stated earlier, if w ≤ Φ-1(pc), or equivalently if Φ(w(k)) < pc(k), then rainfall is generated at site k. Because the transition probabilities at nearby sites are correlated, if Φ(w(k)) << pc(k), then it is very likely that nearby sites also transitioned to a wet state, so rainfall amounts are likely high. Therefore, if Φ(w(k))/pc(k) ≤ α(k), rainfall at the site should be generated from the exponential distribution with the greater mean, that with rate λ1. On the contrary, if Φ(w(k)) is less than, but close to pc(k), it is very likely that nearby sites did not transition to a wet state, so rainfall amounts are likely lower. Therefore, if Φ(w(k))/pc(k) > α(k), rainfall at the site should be generated from the exponential distribution with the smaller mean, that with rate λ2 (Wilks, 1998).

This is a coarse method that can be used to decide on the rate parameter for rainfall generation; Wilks (1998) also describes a method to continuously vary the smaller rate parameter to improve the simulation of precipitation fields. Regardless of which method is chosen, once one has decided on the rate parameter, the rainfall amount itself can be generated from the exponential distribution with this parameter. Just as the precipitation state can be determined by generation of a standard normal random variable, so too can the precipitation amount. Letting the rainfall amount at a particular site k equal r(k), the depth of precipitation is simulated as r(k) = rmin­ – (1/λ)*ln(Φ(v(k))), where rmin is the minimum threshold below which a day is recorded as dry and λ is the rate parameter determined earlier.

The final step in precipitation simulation then is to generate a daily vector of correlated standard normal random variables, vt, that reproduces the historical spatial correlation structure in precipitation amounts across sites. This procedure is analogous to that used to reproduce the historical spatial correlation structure in precipitation occurrences across sites. However, Wilks (1998) found that this procedure did not always produce positive definite correlation matrices [Z] for generation of the multivariate normal random variables, vt. To overcome this problem, the correlations ζo(k,l) making up [Z] can be determined as smooth functions of the distance between station pairs. See Wilks (1998) for further details.

After generating spatially correlated precipitation, one must then generate spatially correlated minimum temperature, maximum temperature and solar radiation. Recall from Part I that in single site weather generators of the Richardson type, these three variables are generated from a first order vector auto-regressive (VAR(1)) model. As Wilks (1999) points out, this method can easily be extended by fitting a VAR(1) model to the simultaneous minimum temperature, maximum temperature and solar radiation across all sites, rather than just at one site. This increases the dimensionality from 3 for the 3 variables at one site, to 3K for the 3 variables at K sites. However, Wilks cautions that this may be difficult due to limited solar radiation data at many sites and inconsistent correlations if the record lengths are different at the K sites. See Wilks (1999) for details on how one can adjust the observed historical correlations such that they are mutually consistent.

The method just described will generate weather at multiple sites with long enough historical records to fit the weather generator parameters. One can also interpolate these parameters using locally weighted regressions (Wilks, 2008) to generate weather at arbitrary locations between stations with historical weather records. Wilks (2009) shows how to do just that to generate gridded weather data, which is particularly useful for driving distributed hydrologic models.

Method 2

The second method, resampling concurrent weather data from multiple sites with a non-parametric weather generator, is a simpler method and a natural extension of the non-parametric single site weather generator. As discussed in Part II, most non-parametric weather generators probabilistically re-sample weather data from the historical record based on how “near” the most recently generated daily weather characteristics are to previous days in the historical record. “Nearness” is typically measured using Euclidean distance.

Buishand and Brandsma (2001) modify the single-site approach to generate weather at multiple sites by describing nearness in terms of Euclidean distance of spatially averaged temperature and precipitation standardized anomalies. The simulation begins with a randomly selected day from the historical record. The temperature and precipitation on that day are standardized by subtracting the historical daily mean and dividing by the historical daily standard deviation. The standardized temperature and precipitation anomalies at each site are then averaged and compared to the spatially averaged standardized anomalies on all other historical days within some window of the current day using Euclidean distance. One of the k nearest neighbors (k-nn) is then probabilistically selected using the Kernel density estimator derived by Lall and Sharma (1996), and the weather at all sites on the following day in the historical record is chosen to be the weather on the next simulated day.

While this non-parametric method of multi-site weather generation is much simpler than the parametric method of Wilks (1998), Mehrotra (2006) found that the Wilks multi-site weather generator was better able to accommodate varying orders of serial dependence at each site and still maintain the historical correlation structure across sites. Similarly, Young (1994) found that the k-nn weather generation approach tended to under-simulate the lengths of wet and dry spells. For this reason, Apitpattanavis et al. (2007) adopt a semi-parametric multi-site weather generation approach that combines elements of each method.

The Apipattanavis generator, also employed by Steinschneider and Brown (2013), first generates spatially averaged precipitation states with a Markov chain model. To begin the simulation, the historical precipitation record is first converted to a time series of spatially averaged precipitation. If the spatially averaged rainfall on the first simulated day is below 0.3 mm, it is classified as a dry day. If it is greater than the 80th percentile of spatially averaged daily precipitation in the simulated month, it is classified as extremely wet. Otherwise it is simply wet. A three-state, first-order Markov chain is then used to simulate the next day’s spatially averaged precipitation state. One of the k-nearest neighbors from the historical record with the same transition (e.g. wet to extremely wet) is selected as in Buishand and Brandsma (2001). The weather at each site on the next simulated day is then chosen to be the weather at each site on the day following the selected neighbor. The main difference between these two methods then, is that Buishand and Brandsma (2001) consider all historical neighbors within some window of the simulated day, while Apipattanavis et al. (2007) only consider those with the same transition in precipitation states as is simulated.

All of the above described methods can reasonably reproduce the spatial and temporal dependence structure of weather at multiple sites, while maintaining the moments at each site. Additionally, they can be modified conditional on short-term, seasonal or long-term forecasts of climate change. That will be the subject of the next blog in this five-part series.

Works Cited

Aghakouchak, A., Bárdossy, A. & Habib, E. (2010a). Conditional simulation of remotely sensed rainfall data using a non-Gaussian v-transformed copula. Advances in Water Resources, 33, 624-634.

—. (2010b). Copula-based uncertainty modelling: application to multisensory precipitation estimates. Hydrological Processes, 24, 2111-2124.

Apipattanavis, S., Podesta, G., Rajagopalan, B., & Katz, R. W. (2007). A semiparametric multivariate and multisite weather generator. Water Resources Research, 43(11).

Baigorria, G. A., & Jones, J. W. (2010). GiST: A stochastic model for generating spatially and temporally correlated daily rainfall data. Journal of Climate, 23(22), 5990-6008.

Bárdossy, A., & Pegram, G. G. S. (2009). Copula based multisite model for daily precipitation simulation. Hydrology and Earth System Sciences, 13, 2299-2314.

Beersma, J. J., & Buishand, T. A. (2003). Multi-site simulation of daily precipitation and temperature conditional on the atmospheric circulation. Climate Research, 25, 121-133.

Buishand, T. A., & Brandsma, T. (2001). Multsite simulation of daily precipitation and temperature in the Rhine basin by nearest-neighbor resampling. Water Resources Research, 37(11), 2761-2776.

Jothityangkoon, C., Sivalapan, M., & Viney, N. R. (2000). Tests of a space-time model of daily rainfall in southwestern Australia based on nonhomogeneous random cascades. Water Resources Research, 36(1), 267-284.

Kigobe, M., McIntyre, N., Wheater, H., & Chandler, R. (2011). Multi-site stochastic modelling of daily rainfall in Uganda. Hydrological Sciences Journal, 56(1), 17-33.

Lall, U., & Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research, 32(3), 679-693.

Mehrotra, R., Srikanthan, R., & Sharma, A. (2006). A comparison of three stochastic multi-site precipitation occurrence generators. Journal of Hydrology, 331(1-2), 280-292.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Serinaldi, F. (2009). A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. Journal of Geophysical Research: Atmospheres, 114(D10).

Srikanthan, R., & Pegram, G. G. S. (2009). A nested multisite daily rainfall stochastic generation model. Journal of Hydrology, 371, 142-153.

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49, 7205-7220.

Wilks, D. S. (1998). Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology, 210, 178-191.

Wilks, D. S. (1999). Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agricultural and Forest Meteorology, 96, 85-101.

Wilks, D. S. (2008). High-resolution spatial interpolation of weather generator parameters using local weighted regressions. Agricultural and Forest Meteorology, 148, 111-120.

Wilks, D. S. (2009). A gridded multisite weather generator and synchronization to observed weather data. Water Resources Research, 45(10).

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences (Vol. 100). Academic press.

Young, K. C. (1994). A multivariate chain model for simulating climatic parameters from daily data. Journal of Applied Meteorology33(6), 661-671.