Ensemble forecasting – theory

‘Does the flap of a butterfly’s wing in Brazil stir up a tornado in Texas?’ These words reflect the so-called ‘butterfly effect’ typically ascribed to Edward Lorenz (MIT), who was a pioneer in the field of numerical weather prediction (NWP). (Interestingly, Lorenz himself first referred to the ‘flap of a seagull’s wings’; the butterfly swap-out was the work of a creative conference session convener. Somewhat coincidentally, the shape of the ‘Lorenz attractor’ is also reminiscent of a butterfly, see Figure 1). Motivated by the salient post-WWII challenge of improving weather prediction capabilities for strategic purposes, he developed a theoretical framework for the predictability of non-linear dynamical systems in a seminal paper, ‘Deterministic nonperiodic flow’ (Lorenz, 1963), that would come to be known as ‘chaos theory’ or ‘chaotic dynamics’. This theory was foundational to the development of ensemble forecasting (and a whole host of other things), and still underpins our current theoretical understanding of the inherent predictability of complex dynamical systems like the atmosphere (Lorenz, 2006).


Figure 1. Book cover (left panel) and the characteristic ‘butterfly wing’ shape of the Lorenz attractor (right panel) from his seminal 1963 work. (Palmer, 1993)

In actuality, the whole ‘flap of a butterfly’s wing creating a tornado in Texas’ relates specifically to only one aspect of the theoretical framework developed by Lorenz. In this post, I will attempt to give a slightly fuller treatment of this theoretical framework and its development. My hope is that this post will be a theoretical primer for two follow-on practically oriented posts: 1) a discussion of the state-of-the-science of hydrometeorological ensemble forecasting and its application in emerging water resources management practices like forecast informed reservoir operations, and 2) a deeper dive into ensemble verification techniques. While the techniques for (2) are grounded in the theoretical statistical properties of stochastic-dynamic predictive ensembles, they have broad applications to any scenario where one needs to diagnose the performance of an ensemble.

The Lorenz model and ‘chaos’ theory

When most people hear the word ‘chaos’, they tend to think of the dictionary definition: ‘complete disorder and confusion’ (Oxford), which in a scientific context might aptly describe some sort of white noise process. As you will see, this is somewhat of a far cry from the ‘chaos’ described in Lorenz’s theory. The ‘chaos’ terminology was actually coined by a later paper building on Lorenz’s work (Li & Yorke, 1975) and as noted by Wilks (2019): ‘…this label is somewhat unfortunate in that it is not really descriptive of the sensitive-dependence phenomenon’. The sensitive-dependence phenomenon is one of a set of 4 key properties (to be discussed later) that characterize the behaviors of the sort of non-linear, non-periodic deterministic systems that Lorenz argued were most representative of the atmosphere. In contrast to ‘disorder’ and ‘confusion’, these properties, in fact, lend some sense of order to the evolution of these dynamical systems, although the nature of that order is highly dependent on the system state.

A deterministic, non-periodic systems model

First, let’s dive into a few details of Lorenz’s 1963 work, using some figures and descriptions from a later paper (Palmer, 1993) that are easier to follow. While the 1963 paper is quite dense, much of the mathematical logic is dedicated to justifying the use of a particular system of equations that forms the basis of the study. This system of 3 equations in 3 variables (X, Y, Z) describes a non-linear, dissipative model of convection in a fluid of uniform depth, where there is a temperature difference between the upper and lower surfaces. Lorenz derived this set of 3 equations, shown in the upper left panel of Figure 2, from earlier work by Rayleigh (1916) on this particular problem. In short, X relates to the intensity of convective motion, Y relates to the temperature difference between ascending and descending currents, and Z relates to the distortion of the vertical temperature profile from linearity; the details of these variables are not actually all that important to the study. What is important is that this system of equations has no general solutions (aside from the steady state solution) and must be numerically integrated in the time dimension to determine the convective flow dynamics. The ‘trajectories’ of these integrations shown in the phase space diagrams in Figure 2 exhibit the sorts of unstable, non-periodic behaviors that Lorenz thought were the most useful analog to atmospheric dynamics, albeit in a much simpler system. (‘Much’ is an understatement here; modern NWP models have a phase space on the order of 10^9 dimensions, in contrast to the 3-dimensional phase space of this problem; Wilks, 2019.) Of note, the use of phase-space diagrams (i.e. plots where each axis corresponds to a dependent variable in the system of equations) preceded Lorenz, but his use of them is perhaps one of the best-known early instances of this kind of visualization. Other uses of phase space relationships can be found in Rohini’s post or Dave’s post.
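
To make the system concrete, here is a minimal Python sketch that integrates the Lorenz equations using the standard parameter values from the 1963 study (sigma = 10, r = 28, b = 8/3); plotting X against Y reproduces the two-lobed ‘butterfly’ of Figure 2a. This is purely illustrative and not tied to any particular NWP toolkit.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, r=28.0, b=8.0/3.0):
    x, y, z = state
    return [sigma * (y - x),      # X: intensity of convective motion
            x * (r - z) - y,      # Y: temperature difference between currents
            x * y - b * z]        # Z: distortion of the vertical temperature profile

# integrate forward in time from an arbitrary initial state
sol = solve_ivp(lorenz, t_span=(0, 50), y0=[1.0, 1.0, 1.0], max_step=0.01)
X, Y, Z = sol.y
# plotting X vs Y (e.g., with matplotlib) traces out the two-lobed attractor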

Figure 2. a) Lorenz equations and the XY phase space diagram of their integrations. b-c) X variable timeseries of two separate integrations of the Lorenz equations from very similar initial conditions. (Palmer, 1993)

Regime structure and multiple distinct timescales

What ‘behaviors’, then, are we talking about? Referring again to Figure 2a, we see the projection of a long series of model integration trajectories onto the XY-plane of the phase space diagram, where each dot is a timestep in the integration. What is immediately apparent in the form of these integrations is the two-lobed ‘butterfly’ shape of the diagram. Each lobe has a central attractor where the trajectories often reside for multiple revolutions, but can then transition to the other attractor when passing near the center of the diagram. These so-called ‘Lorenz attractors’ comprise one of the key properties of chaotic dynamics, namely regime structure, which is the tendency of the trajectories to reside around phase space attractors for some period of time. This residence time in a regime is generally quite a bit longer than the timescales of the trajectories’ individual revolutions around an attractor. This attribute is referred to as multiple distinct timescales and is evidenced in Figure 2b-c, where smaller amplitude sections of the timeseries show residence in one or the other regime and large amplitude sections describe transitions between regimes. Often, but not always, the residence in these regimes occurs for multiple revolutions, suggesting that there are shorter timescale evolutions of the system that take place within these regimes, while infrequent, random shifts to the other regime occur at longer timescales.

Sensitive-dependence and state-dependent variation in predictability

Figure 3. a-c) Different trajectories through the XY phase space dependent on the initial condition state-space (black circle). (Palmer, 1993)

Returning now to the ‘butterfly effect’: what, then, is this whole sensitive-dependence thing mentioned earlier? Figure 3a-c provides a nice graphical representation of this phenomenon. In each panel, a set of nearby initial states is chosen at a different location in the phase space diagram and then followed through their set of trajectories. In 3a, these trajectories neatly transition from one regime to the other, remaining relatively close to each other at the trajectories’ end. In 3b, a set of initial states not so far from 3a is chosen, and instead of neatly transitioning to the other regime, the trajectories diverge strongly near the center of the diagram, with some remaining in the original regime and some transitioning. However, for about half of the timespan, the trajectories remain very close to one another. Finally, an initial state chosen near the center of the diagram (3c) diverges quickly into both regimes, with trajectories ending up at nearly opposite ends of the phase space (black tails at upper right and left of diagram). Figures 3b-c, in particular, showcase the sensitive-dependence on initial conditions attribute of the system. In other words, from a set of very close-by initial states, the final trajectories in phase space can yield strongly divergent results. Importantly, this divergence in trajectories can occur right away (3c), at some intermediate time (3b), or not at all (3a).

This is the basic idea behind the last core property of these systems, state-dependent variation in predictability. Clearly, a forecast initialized from the starting point in 3a could be a little bit uncertain about the exact starting state and still end up in about the right spot for an accurate future prediction at the end of the forecast horizon. At medium ranges, this is also the case for 3b, but the longer range forecast is highly uncertain. For 3c, all forecast ranges are highly uncertain; in other words, the flap of a butterfly’s wing can mean the difference between one or the other trajectory! Importantly, one can imagine in this representation that an average value of 3c’s trajectories (i.e. the ensemble mean) would fall somewhere in the middle of the phase space and be representative of neither of the two physically plausible trajectories into the right or left regimes. This is an important point that we’ll return to at the end of this post.
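
To see the state-dependence and the ensemble-mean caveat play out numerically, here is a small, purely illustrative Python sketch (assuming the same Lorenz system and standard parameter values as in the earlier sketch, not any operational forecasting code): a handful of trajectories launched from nearly identical initial states eventually split between the two regimes, and their average can land in a region of phase space that no single trajectory occupies.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, r=28.0, b=8.0/3.0):
    x, y, z = s
    return [sigma * (y - x), x * (r - z) - y, x * y - b * z]

rng = np.random.default_rng(0)
base_state = np.array([1.0, 1.0, 1.05])

# a tiny 'ensemble': perturb the initial condition by amounts far smaller
# than any realistic observation error, then integrate each member forward
finals = np.array([
    solve_ivp(lorenz, (0, 25), base_state + 1e-6 * rng.standard_normal(3),
              max_step=0.01).y[:, -1]
    for _ in range(20)
])

print(finals[:, 0])          # final X of each member: mixed signs -> different regimes
print(finals.mean(axis=0))   # the ensemble mean can sit between the two lobes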

From the Lorenz model to ensemble forecasting

The previous sections have highlighted this idealized dynamical system (the Lorenz model) that theoretically shares properties with the actual atmosphere. Those 4 properties (the big 4!) were: 1) sensitive-dependence on initial conditions, 2) regime structure, 3) multiple distinct timescales, and 4) state-dependent variation in predictability. In this last section, I will tie these concepts into a theoretical discussion of ensemble forecasting. Notably, in the previous sections, I discussed ‘timescales’ without any reference to the real atmosphere. The dynamics of the Lorenz system prove to be quite illuminating about the actual atmosphere when timescales roughly on the order of the evolution of synoptic scale weather systems are attached to the system. If one assumes that a lap around the Lorenz attractor equates to a synoptic timescale of ~5 days, then the Lorenz model can be conceptualized in terms of relatively homogeneous synoptic weather progressions occurring within two distinct weather regimes that typically persist on the order of weeks to months (see Figure 3b-c). This theoretical foundation jibes nicely with the empirically derived weather-regime based approach (see Rohini’s post or Nasser’s post), where, incidentally, 4 or more weather regimes are commonly found (the real atmosphere is quite a bit more complex than the Lorenz model, after all). This discussion of the Lorenz model has hopefully given you some intuition that conceptualizing real world weather progressions outside the context of an appropriate regime structure could lead to some very unphysical representations of the climate system.

Ensemble forecasting as a discrete approximation of forecast uncertainty

In terms of weather prediction, though, the big 4 make things really tough. While there are certainly conditions in the atmosphere that can lead to excellent long range predictability (i.e. ‘forecasts of opportunity’, see Figure 3a), the ‘typical’ dynamics of the atmosphere yield the potential for multiple regimes and associated transitions within the short-to-medium range timeframe (1-15 days) where synoptic-scale hydro-meteorological forecasting is theoretically possible. (Note, by ‘synoptic-scale’ here, I am referring to the capability to predict actual weather events, not prevailing conditions. Current science puts the theoretical dynamical limit to predictability at ~2 weeks, with current NWP technology achieving ‘usable’ skill out to ~7 days, give or take a few days depending on the region.)

Early efforts to bring the Lorenz model into weather prediction sought to develop analytical methods to propagate initial condition uncertainty into a useful probabilistic approximation of the forecast uncertainty at various lead times. This can work for simplified representations of the atmosphere like the Lorenz model, but quickly becomes intractable as the complexity and scale of the governing equations and boundary conditions increase.

Figure 4. Conceptual depiction of ensemble forecasting including sampling of initial condition uncertainty, forecasts at different lead times, regime structure, and ensemble mean comparison to individual ensemble members. Solid arrow represents a ‘control’ member of the ensemble. Histograms represent an approximate empirical distribution of the ensemble forecast. (Wilks, 2019)

Thankfully, rapid advances in computing power led to an alternate approach, ensemble forecasting! Ensemble forecasting is a stochastic-dynamic approach that takes a discrete, probabilistic sample of the initial condition uncertainty (Figure 4 ‘Initial time’) and propagates each of those initial condition state vectors through a dynamical NWP model. Each of these NWP integrations is a unique trajectory through the state space of the dynamical model that yields a discrete approximation of the forecast uncertainty (Figure 4 ‘Intermediate/Final forecast lead time’). This discrete forecast uncertainty distribution theoretically encompasses the full space of potential hydro-meteorologic trajectories and allows a probabilistic representation of forecast outcomes through analysis of the empirical forecast ensemble distribution. These forecast trajectories highlight many of the big 4 properties discussed in previous sections, including regime structure and state-dependent predictability (Figure 4 trajectories are analogous to the Figure 3b trajectories for the Lorenz model). The ensemble mean prediction is an accurate and dynamically consistent prediction at the intermediate lead time, but at the final lead time where distinct regime partitioning has occurred, it is no longer dynamically consistent and delineates a region of low probability in the full ensemble distribution. I will explore properties of ensemble averaging, both good and bad, in a future post.

Lastly, I will note that the ensemble forecasting approach is a type of Monte Carlo procedure. Like other Monte Carlo approaches for complex systems, the methodology for sampling the initial condition uncertainty has a profound effect on the uncertainty quantification contained within the final NWP ensemble output, especially given the high degree of spatiotemporal correlation among the observed climatic variables that form the initial state vector. This is a key area of continued research and improvement in ensemble forecasting models.

Final thoughts

I hope that you find this discussion of the theoretical underpinnings of chaotic dynamics and ensemble forecasting to be useful. I have always found these foundational concepts to be particularly fascinating. Moreover, the basic theory has strong connections outside of ensemble forecasting, including the ties to weather regime theory mentioned in this post. I also think these foundational concepts are necessary to understand how much actual ensemble forecasting techniques can diverge from the theoretical stochastic-dynamic framework. This will be the subject of some future posts that will delve into more practical and applied aspects of ensemble forecasting with a focus on water resources management.

References

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130–141.

Lorenz, E. N. (2006). Reflections on the Conception, Birth, and Childhood of Numerical Weather Prediction. Annual Review of Earth and Planetary Sciences, 34, 37–45. https://doi.org/10.1146/annurev.earth.34.083105.102317

Palmer, T. N. (1993). Extended-range atmospheric prediction and the Lorenz model. Bulletin – American Meteorological Society, 74(1), 49–65. https://doi.org/10.1175/1520-0477(1993)074<0049:ERAPAT>2.0.CO;2

Rayleigh, L. (1916). On convective currents in a horizontal layer when the higher temperature is on the underside. Phil. Mag., 32, 529-546.

Wilks, D. S., (2019). Statistical Methods in the Atmospheric Sciences, 4th ed. Cambridge, MA: Elsevier.

Introduction to Bayesian Regression using PyMC

Motivation

Fans of this blog will know that uncertainty is often a focus for our group. When approaching uncertainty, Bayesian methods might be of interest since they explicitly provide uncertainty estimates during the modeling process.

PyMC is the best tool I have come across for Bayesian modeling in Python; this post gives a super brief introduction to this toolkit.

Introduction to PyMC

PyMC, described in their own words:
“… is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.”

In my opinion, the best part of PyMC is the flexibility and breadth of its model design features. The space of different model configurations is massive. It allows you to build models ranging from simple linear regressions (shown here) to more complex hierarchical models, copulas, Gaussian processes, and more.

Regardless of your model formulation, PyMC lets you generate posterior estimates of model parameter distributions. These parameter distributions reflect the uncertainty in the model and can be used to propagate uncertainty into your final predictions.

The posterior estimates of model parameters are generated using Markov chain Monte Carlo (MCMC) methods. A detailed overview of MCMC is outside the scope of this post (maybe in a later post…).

In the simplest terms, MCMC is a method for estimating posterior parameter distributions for a Bayesian model. It generates a sequence of samples from the parameter space (which can be huge and complex), where the probability of each sample is proportional to its posterior density given the observed data (the likelihood combined with the prior). By collecting enough samples, MCMC generates an approximation of the posterior distribution, providing insights into the probable values of the model parameters along with their uncertainties. This is key when the models are very complex and the posterior cannot be derived analytically.
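
For intuition only, here is a bare-bones random-walk Metropolis sampler, one of the simplest MCMC variants, written in plain Python. PyMC itself defaults to far more sophisticated gradient-based samplers (e.g., NUTS), so treat this as a sketch of the idea rather than what PyMC does under the hood.

import numpy as np

def metropolis(log_post, x0, n_samples=5000, step=0.5, seed=None):
    """Random-walk Metropolis: propose a nearby value and accept it with
    probability min(1, posterior ratio), so samples pile up where the
    (unnormalized) posterior density is high."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, lp = x0, log_post(x0)
    for i in range(n_samples):
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# toy example: log-posterior proportional to a standard normal density
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0)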

The PyMC example gallery has lots of cool stuff to get you inspired, with examples that go far and beyond the simple linear regression case.


Demonstration:

When drafting this post, I wanted to include a demonstration that is (a) simple enough to cover in a brief post, and (b) relatively easy for others to replicate. I settled on the simple linear regression model described below, since it can be implemented using readily retrievable CAMELS data.

The example attempts to predict mean streamflow as a linear function of basin catchment area (both in log space). As you’ll see, it’s not the worst model, but it’s far from a good model; there is a lot of uncertainty!

CAMELS Data

For a description of the CAMELS dataset, see Addor, Newman, Mizukami and Clark (2017).

I pulled all of the national CAMELS data using the pygeohydro package from HyRiver, which I have previously recommended on this blog. Getting all of the data takes a single convenient line of code:

import pygeohydro as gh

### Load camels data
camels_basins, camels_qobs = gh.get_camels()

The camels_basins variable is a dataframe with the different catchment attributes, and camels_qobs is an xarray.Dataset. In this case we will only be using the camels_basins data.

The CAMELS data spans the continental US, but I want to focus on a specific region (since hydrologic patterns will be regional). Before going further, I filter the data to keep only sites in the Northeastern US:

# filter by mean long lat of geometry: NE US
camels_basins['mean_long'] = camels_basins.geometry.centroid.x
camels_basins['mean_lat'] = camels_basins.geometry.centroid.y
camels_basins = camels_basins[(camels_basins['mean_long'] > -80) & (camels_basins['mean_long'] < -70)]
camels_basins = camels_basins[(camels_basins['mean_lat'] > 35) & (camels_basins['mean_lat'] < 45)]

I also convert the mean flow (q_mean) from mm/day to cubic meters per second:

# convert q_mean from mm/day to m3/s
camels_basins['q_mean_cms'] = camels_basins['q_mean'] * (1e-3) *(camels_basins['area_gages2']*1000**2) * (1/(60*60*24)) 

And this is all the data we need for this crude model!

Bayesian linear model

The simple linear regression model (hello my old friend) takes the familiar form y = alpha + beta * x + epsilon, where here y is log mean streamflow, x is log catchment area, and epsilon is a normally distributed error term.

Normally you might assume that there is a single, best value corresponding to each of the model parameters (alpha and beta). This is considered a Frequentist perspective and is a common approach. In these cases, the best parameters can be estimated by minimizing the errors corresponding to a particular set of parameters (see least squares, for example).

However, we could take a different approach and assume that the parameters (intercept and slope) are random variables themselves, and have some corresponding distribution. This would constitute a Bayesian perspective.

Keeping with simplicity in this example, I will assume that the intercept and slope each come from a normal distribution with some mean and variance: alpha ~ Normal(mu_alpha, sigma_alpha^2) and beta ~ Normal(mu_beta, sigma_beta^2).

When it comes time to make inferences or predictions using our model, we can create a large number of predictions by sampling different parameter values from these distributions. Consequently, we will end up with a distribution of uncertain predictions.

PyMC implementation

I recommend you see the PyMC installation guide to help you get set up.

NOTE: The MCMC sampler used by PyMC is compiled to C and will be SIGNIFICANTLY faster if you have access to a GCC compiler and specify its directory using the following:

import pymc as pm

import os
os.environ["THEANO_FLAGS"] = "gcc__cxxflags=-C:\mingw-w64\mingw64\bin"

You will get a warning if you don’t have this properly set up.

Now, onto the demo!

I start by retrieving our X and Y data from the CAMELS dataset we created above:

# Pull out X and Y of interest
import numpy as np

x_ftr = 'area_gages2'
y_ftr = 'q_mean_cms'
xs = camels_basins[x_ftr]
ys = camels_basins[y_ftr]

# Take log-transform
xs = np.log(xs)
ys = np.log(ys)

At a glance, we see there is a reasonable linear relationship when working in the log space:

Two of the key features when building a model are:

  • The random variable distribution constructions
  • The deterministic model formulation

There are lots of different distributions available, and each one simply takes a name and set of parameter values as inputs. For example, the normal distribution defining our intercept parameter is:

alpha = pm.Normal('alpha', mu=intercept_prior, sigma=10)

The value of the parameter priors that you specify when constructing the model may have a big impact depending on the complexity of your model. For simple models you may get away with uninformative priors (e.g., setting mu=0); however, if you have some initial guesses, they can help you get reliable convergence.

In this case, I used a simple least squares estimate of the linear regression to set the prior means:

slope_prior, intercept_prior = np.polyfit(xs.values.flatten(), ys.values.flatten(), 1)

Once we have our random variables defined, we need to formulate the deterministic element of our model prediction. This is the functional relationship between the input, parameters, and output. For our linear regression model, this is simply:

y_mu = alpha + beta * xs

In the case of our Bayesian regression, this can be thought of as the mean of the regression outputs. The final estimates are going to be distributed around y_mu, with the uncertainty resulting from the combination of our different random variables.

Putting it all together now:

### PyMC linear model
with pm.Model() as model:
    
    # Priors
    alpha = pm.Normal('alpha', mu=intercept_prior, sigma=10)
    beta = pm.Normal('beta', mu=slope_prior, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # mean/expected value of the model
    mu = alpha + beta * xs

    # likelihood
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=ys)

    # sample from the posterior
    trace = pm.sample(2000, cores=6)
 

With our model constructed, we can use the pm.sample() function to begin the MCMC sampling process and estimate the posterior distribution of model parameters. Note that this process can be very computationally intensive for complex models! (Definitely make sure you have GCC set up correctly if you plan on sampling complex models.)

With the sampled parameter values in hand, we can create posterior estimates of the predictions (log mean flow) from the posterior parameter distributions:

## Generate posterior predictive samples
ppc = pm.sample_posterior_predictive(trace, model=model)

Let’s go ahead and plot the range of the posterior distribution, to visualize the uncertainty in the model estimates:

### Plot the posterior predictive interval
import arviz as az
import matplotlib.pyplot as plt

fig, ax = plt.subplots(ncols=2, figsize=(8,4))

# log space
az.plot_hdi(xs, ppc['posterior_predictive']['y'], 
            color='cornflowerblue', ax=ax[0], hdi_prob=0.9)
ax[0].scatter(xs, ys, alpha=0.6, s=20, color='k')
ax[0].set_xlabel('Log ' + x_ftr)
ax[0].set_ylabel('Log Mean Flow (m3/s)')

# original dim space
az.plot_hdi(np.exp(xs), np.exp(ppc['posterior_predictive']['y']), 
            color='cornflowerblue', ax=ax[1], hdi_prob=0.9)
ax[1].scatter(np.exp(xs), np.exp(ys), alpha=0.6, s=20, color='k')
ax[1].set_xlabel(x_ftr)
ax[1].set_ylabel('Mean Flow (m3/s)')
plt.suptitle('90% Posterior Prediction Interval', fontsize=14)
plt.show()

And there we have it! The figure on the left shows the data and posterior prediction range in log-space, while the figure on the right is in non-log space.

As mentioned earlier, it’s not the best model (wayyy too much uncertainty in the large-basin mean flow estimates), but at least we have the benefit of knowing the uncertainty distribution since we took the Bayesian approach!

That’s all for now; this post was really meant to bring PyMC to your attention. Maybe you have a use case or will be more likely to consider Bayesian approaches in the future.

If you have other Bayesian/probabilistic programming tools that you like, please do comment below. PyMC is one (good) option, but I’m sure other people have their own favorites for different reasons.



References

Addor, N., Newman, A. J., Mizukami, N., & Clark, M. P. (2017). The CAMELS data set: catchment attributes and meteorology for large-sample studies. Hydrology and Earth System Sciences, 21, 5293–5313. https://doi.org/10.5194/hess-21-5293-2017

Nonstationary stochastic watershed modeling

In this post, I will describe the motivation for and implementation of a nonstationary stochastic watershed modeling (SWM) approach that we developed in the Steinschneider group during the course of my PhD. This work is in final revision and should be published in the next month or so. This post will attempt to distill key components of the model and their motivation, saving the full methodological development for those who’d like to read the forthcoming paper.

SWMs vs SSM/SSG

Before diving into the construction of the model, some preliminaries are necessary. First, what are SWMs, what do they do, and why use them? SWMs are a framework that combines deterministic, process-based watershed models (think HYMOD, SAC-SMA, etc.; we’ll refer to these as DWMs from here forward) with a stochastic model that captures their uncertainty. The stochastic part of this framework can be used to generate ensembles of SWM simulations that both represent the hydrologic uncertainty and are less biased estimators of the streamflow observations (Vogel, 2017).

Figure 1: SWM conceptual diagram

SWMs were developed to address challenges to earlier stochastic streamflow modeling/generation techniques (SSM/SSG; see for instance Trevor’s post on the Thomas-Fiering SSG; Julie’s post and Lillian’s post on other SSG techniques), the most important of which (arguably) is the question of how to formulate them under non-stationarity. Since SSMs are statistical models fitted directly to historical data, any attempt to implement them in a non-stationary setting requires strong assumptions about what the streamflow response might look like under an alternate forcing scenario. This is not to say that such an approach is not useful or valid for exploratory analyses (for instance Rohini’s post on synthetic streamflow generation to explore extreme droughts). SWMs attempt to address this issue of non-stationarity by using DWMs in their stochastic formulation, which lend some ‘physics-based’ cred to their response under alternate meteorological forcings.

Construction of an SWM

Over the years, there have been many SWM or SWM-esque approaches devised, ranging from simple autoregressive models to complex Bayesian approaches. In this work, we focus on a relatively straightforward SWM approach that models the hydrologic predictive uncertainty directly and simply adds random samples of it to the DWM simulations. The assumption here is that predictive uncertainty is an integrator of all traditional component modeling uncertainties (input, parameter, model/structural), so adding it back in can inject all these uncertainties into the SWM simulations at once (Shabestanipour et al., 2023).

Figure 2: Uncertainty components

By this straightforward approach, the fitting and parameter estimation of the DWM is accomplished first (and separately) via ‘standard’ fitting procedures; for instance, parameter optimization to maximize Nash-Sutcliffe Efficiency (NSE). Subsequently, we develop the stochastic part of the model on the predictive uncertainty that remains, which in this case is defined simply by subtracting the target observations from the DWM predictions. This distribution of differenced errors is the ‘predictive uncertainty distribution’ or ‘predictive errors’ that forms the target of our stochastic model.

Challenges in modeling predictive uncertainty

Easy, right? Not so fast. There is a rather dense and somewhat unpalatable literature (except for the masochists out there) on the subject of hydrologic uncertainty that details the challenges in modeling these sorts of errors. Suffice it to say that they aren’t well behaved. Any model we devise for these errors must be able to manage these bad behaviors.

So, what if we decide that we want to try to use this SWM thing for planning under future climates? Certainly the DWM part can hack it. We all know that lumped, conceptual DWMs are top-notch predictors of natural streamflow… At the least, they can produce physically plausible simulations under alternate forcings (we think). What of the hydrologic predictive uncertainty then? Is it fair or sensible to presume that some model we constructed to emulate historical uncertainty is appropriate for future hydrologic scenarios with drastically different forcings? My line of rhetorical questioning should clue you in on my feelings on the subject. YES!, of course. ‘Stationarity is immortal!’ (Montanari & Koutsoyiannis, 2014).

Towards a hybrid, state-variable dependent SWM

No, actually, there are a number of good reasons why this statement might not hold for hydrologic predictive uncertainty under non-stationarity. You can read the paper for the laundry list. In short, hydrologic predictive uncertainty of a DWM is largely a reflection of its structural misrepresentation of the true process. Thus, the historical predictive uncertainty that we fit our model to is a reflection of that structural uncertainty propagated through historical model states under historical, ‘stationary’ forcings. If we fundamentally alter those forcings, we should expect to see model states that do not exist under historical conditions. The predictive errors that result from these fundamentally new model states are thus likely to not fall neatly into the box carved out by the historical scenarios.

Figure 3: Structural uncertainty

To bring this back to the proposition for a nonstationary SWM approach: the intrinsic link between model structure and its predictive uncertainty raises an interesting prospect. Could there be a way to leverage a DWM’s structure to understand its predictive uncertainty? Well, I hope so, because that’s the premise of this work! What I’ll describe in the ensuing sections is the implementation of a hybrid, state-variable dependent SWM approach. ‘Hybrid’ because it couples both machine learning (ML) and traditional statistical techniques. ‘State-variable dependent’ because it uses the timeseries of model states (described later) as the means to infer the hydrologic predictive uncertainty. I’ll refer to this as the ‘hybrid SWM’ for brevity.

Implementation of the hybrid SWM

So, with backstory in hand, let’s talk details. The remainder of this post will describe the implementation of this hybrid SWM. This high-level discussion of the approach supports a practical training exercise I put together for the Steinschneider group at the following public GitHub repo: https://github.com/zpb4/hybrid-SWM_training. This training also introduces a standard implementation of a GRRIEN repository (see Rohini’s post). Details of implementing the code are contained in the ‘README.md’ and ‘training_exercise.md’ files in the repository. My intent in this post is to describe the model implementation at a conceptual level.

Model-as-truth experimental design

First, in order to address the problem of non-stationary hydrologic predictive uncertainty, we need an experimental design that can produce it. There is a very real challenge here of not having observational data from significantly altered climates to compare our hydrologic model against. We address this problem by using a ‘model-as-truth’ experimental design, where we fit one hydrologic model (‘truth’ model) to observations, and a second hydrologic model (‘process’ model) to the first truth model. The truth model becomes a proxy for the true, target flow of the SWM modeling procedure, while the process model serves as our proposed model, or hypothesis, about that true process. Under this design, we can force both models with any plausible forcing scenario to try to understand how the predictive uncertainty between ‘truth’ and ‘process’ models might change.

Figure 4: Conceptual diagram of ‘model-as-truth’ experimental design

For the actual work, we consider a very simple non-stationary scenario where we implement a 4°C temperature shift to the temperature forcing data, which we refer to as the ‘Test+4C’ scenario. We choose this simple approach to confine non-stationarity to a high-confidence result of anthropogenic climate change, namely, thermodynamic warming. We compare this Test+4C scenario to a ‘Test’ scenario, which is the same out-of-sample temporal period (WY2005-2018) of meteorological inputs under historical values. SAC-SMA and HYMOD are the truth model and process model for this experiment, respectively. Other models could have been chosen. We chose these because they are conceptually similar and commonly used.

Figure 5: Errors between truth and process models in 5 wettest years of Test/Test+4C scenarios.

Hybrid SWM construction

The core feature of the hybrid SWM is a model for the predictive errors (truth model – process model) that uses the hydrologic model state-variables as predictors. We implement this model in two steps that have differing objectives, but use the same state-variable predictor information. An implicit assumption in using state-variable dependencies in both steps is that these dependencies can exist in both stages. In other words, we do not expect the error-correction step to produce independent and identically distributed residuals. We call the first step an ‘error-correction model’ and the second step a ‘dynamic residual model’. Since we use HYMOD as our process model, we use its state-variables (Table 1) as the predictors for these two steps.

Table 1: HYMOD state variables

Short Name | Long Name | Description
sim | Simulation | HYMOD predicted streamflow in mm
runoff | Runoff | Upper reservoir flow of HYMOD in mm
baseflow | Baseflow | Lower reservoir flow of HYMOD in mm
precip | Precipitation | Basin averaged precipitation in mm
tavg | Average temperature | Basin averaged temperature in °C
et | Evapotranspiration | Modeled evapotranspiration (Hamon approach) in mm
upr_sm | Upper soil moisture | Basin averaged soil moisture content (mm) in upper reservoir
lwr_sm | Lower soil moisture | Basin averaged soil moisture (mm) in lower reservoir
swe | Snow water equivalent | Basin averaged snow water equivalent simulated by degree day snow module (mm)

Hybrid SWM: Error correction

The error-correction model is simply a predictive model between the hydrologic model (HYMOD) state-variables and the raw predictive errors. The error-correction model also uses lag-1 to 3 errors as covariates to account for autocorrelation. The objective of this step is to infer state-dependent biases in the errors, which are the result of the predictive errors subsuming the structural deficiencies of the hydrologic model. This ‘deterministic’ behavior in the predictive errors can also be conceived as the ‘predictive errors doing what the model should be doing’ (Vogel, 2017). Once this error-correction model is fit to its training data, it can be implemented against any new timeseries of state-variables to predict and debias the errors. We use a Random Forest (RF) algorithm for this step because it is robust to overfitting, even with limited training data. This is certainly the case here, as we consider only individual basins and a ~15 year training period (WY1989-2004). Moreover, we partition the training period into a calibration and validation subset and fit the RF error-correction model only to the calibration data (WY1989-1998), reducing available RF algorithm training data to 9 years.
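
As a rough illustration of this step (not the code from the training repository; the function and variable names here are hypothetical and the RF settings arbitrary), the error-correction model could be sketched in Python with scikit-learn, assuming a DataFrame of HYMOD state variables (Table 1) and a series of raw predictive errors over the calibration subset:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def fit_error_correction(states: pd.DataFrame, errors: pd.Series, n_lags: int = 3):
    """Fit a Random Forest mapping state variables plus lag-1 to lag-3 errors
    to the raw predictive errors (truth model - process model)."""
    X = states.copy()
    for lag in range(1, n_lags + 1):
        X[f'err_lag{lag}'] = errors.shift(lag)        # lagged errors as covariates
    X, y = X.iloc[n_lags:], errors.iloc[n_lags:]      # drop rows lost to lagging
    rf = RandomForestRegressor(n_estimators=500, random_state=0)  # settings arbitrary
    rf.fit(X, y)
    return rf

# in 'predict' mode, rf.predict(...) gives the state-dependent bias, which is
# subtracted from the raw errors to leave the residuals for the next step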

Hybrid SWM: Dynamic residual model

The dynamic residual model (DRM) is fit to the residuals of the error correction result in the validation subset. We predict the hydrologic model errors for the validation subset from the fitted RF model and subtract them from the empirical errors to yield the residual timeseries. By fitting the DRM to this separate validation subset (which the RF error-correction model has not seen), we ensure that the residuals adequately represent the out-of-sample uncertainty of the error-correction model.

A full mathematical treatment of the DRM is outside the scope of this post. In high-level terms, the DRM is built around a flexible distributional form particularly suited to hydrologic errors, called the skew exponential power (SEP) distribution. This distribution has 4 parameters (mean: mu, stdev: sigma, kurtosis: beta, skew: xi) and we assume a mean of zero (due to error-correction debiasing), while setting the other 3 parameters as time-varying predictands of the DRM model (i.e. sigma_t, beta_t, xi_t). We also include a lag-1 autocorrelation term (phi_t) to account for any leftover autocorrelation from the error-correction procedure. We formulate a linear model for each of these parameters with the state-variables as predictors. These linear models are embedded in a log-likelihood function that is maximized (i.e. MLE) against the residuals to yield the optimal set of coefficients for each of the linear models.

With a fitted model, the generation of a new residual at each timestep t is therefore a random draw from the SEP with parameters (mu=0, sigma_t, beta_t, xi_t), modified by the residual at t-1 (epsilon_{t-1}) via the lag-1 coefficient (phi_t).
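
In code, the residual generation step might look roughly like the sketch below (again with hypothetical names, not the repository’s implementation). The helper sample_sep() stands in for a draw from the zero-mean SEP distribution, which is not available in standard Python libraries, and the lag-1 update is one plausible reading of the formulation; the forthcoming paper gives the exact equations.

import numpy as np

def simulate_drm_residuals(sigma_t, beta_t, xi_t, phi_t, sample_sep, seed=None):
    """Generate a residual series from time-varying SEP parameters predicted
    by the state-variable linear models, with a lag-1 autocorrelation term.
    sample_sep(sigma, beta, xi, rng) is a hypothetical helper returning one
    draw from a zero-mean skew exponential power distribution."""
    rng = np.random.default_rng(seed)
    n = len(sigma_t)
    eps = np.zeros(n)
    for t in range(1, n):
        innovation = sample_sep(sigma_t[t], beta_t[t], xi_t[t], rng)
        eps[t] = phi_t[t] * eps[t - 1] + innovation   # lag-1 coefficient phi_t
    return eps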

Figure 6: Conceptual diagram of hybrid SWM construction.

Hybrid SWM: Simulation

The DRM is the core uncertainty modeling component of the hybrid SWM. Given a timeseries of state-variables from the hydrologic model for any scenario, the DRM simulation is implemented first, as described in the previous section. Subsequently, the error-correction model is implemented in ‘predict’ mode with the timeseries of random residuals from the DRM step. Because the error-correction model includes lag-1:3 terms, it must be implemented sequentially using errors generated at the previous 3 timesteps. The conclusion of these two simulation steps yields a timeseries of randomly generated, state-variable dependent errors that can be added to the hydrologic model simulation to produce a single SWM simulation. Repeating this procedure many times will produce an ensemble of SWM simulations.

Final thoughts

Hopefully this discussion of the hybrid SWM approach has given you some appreciation for the nuanced differences between SWMs and SSM/SSGs, the challenges in constructing an adequate uncertainty model for an SWM, and the novel approach developed here in utilizing state-variable information to infer properties of the predictive uncertainty. The hybrid SWM approach shows a lot of potential for extracting key attributes of the predictive errors, even under unprecedented forcing scenarios. It decouples the task of inferring predictive uncertainty from features of the data like temporal seasonality (e.g. day of year) that may be poor predictors under climate change. When linked with stochastic weather generation (see Rohini’s post and Nasser’s post), SWMs can be part of a powerful bottom-up framework to understand the implications of climate change on water resources systems. Keep an eye out for the forthcoming paper and check out the training noted above on implementation of the model.

References:

Brodeur, Z., Wi, S., Shabestanipour, G., Lamontagne, J. R., & Steinschneider, S. A hybrid, non-stationary Stochastic Watershed Model (SWM) for uncertain hydrologic simulations under climate change. Water Resources Research, in review.

Montanari, A., & Koutsoyiannis, D. (2014). Modeling and mitigating natural hazards: Stationarity is immortal! Water Resources Research, 50, 9748–9756. https://doi.org/10.1002/2014WR016092

Shabestanipour, G., Brodeur, Z., Farmer, W. H., Steinschneider, S., Vogel, R. M., & Lamontagne, J. R. (2023). Stochastic Watershed Model Ensembles for Long-Range Planning : Verification and Validation. Water Resources Research, 59. https://doi.org/10.1029/2022WR032201

Vogel, R. M. (2017). Stochastic watershed models for hydrologic risk management. Water Security, 1, 28–35. https://doi.org/10.1016/j.wasec.2017.06.001

Weather Regime-Based Stochastic Weather Generation (Part 2/2)

In this post on the Water Programming Blog, we continue to explore the application of the stochastic weather generator (available on GitHub) for climate-change scenario developments. This is the second installment of a two-part series of blog posts, and readers are strongly encouraged to familiarize themselves with different components of the stochastic weather generator, as explained in Part 1 by Rohini Gupta (The Reed Research Group).

Here, we will begin by offering a concise overview of developing climate change scenarios and how these scenarios are integrated into the weather generation model. Following this, we will interpret the impact of these climate change conditions on key statistics concerning the occurrence of floods and droughts. Through these examples, the implications of these climatic shifts for water resources management and flood risk analysis will become evident.

Climate Change Perturbations

In this stochastic weather generator, we specifically focus on two aspects of climate change scenario development: 1) thermodynamic and 2) dynamic perturbations.

1) Thermodynamic Change

Thermodynamic climate change, often referred to as temperature-driven change, is primarily driven by changes in the Earth’s energy balance and temperature distribution. This warming affects various aspects of the climate system, such as intensifying precipitation extremes, melting snowpacks and ice sheets, rising sea levels, altered weather patterns, and shifts in ecosystems. The primary driver of temperature-driven climate change is the increase in regional-to-global average temperatures due to the enhanced greenhouse effect. As temperatures rise due to natural and anthropogenic forcings, they trigger a cascade of interconnected impacts throughout the climate system.

In the stochastic weather generator, scenarios of temperature change are treated simply by adding trends to simulated temperature data for each location across the spatial domain. Scenarios of thermodynamic precipitation intensification, however, are modeled using a quantile mapping technique that scales the distribution of daily precipitation in a way that replicates the effect of warming temperatures on precipitation as the moisture holding capacity of the atmosphere increases. In the context of California, previous studies have demonstrated that as temperatures rise, the most severe precipitation events (often associated with landfalling Atmospheric Rivers) are projected to increase in frequency, while the intensity of smaller precipitation events is expected to decrease (Gershunov et al., 2019). This alteration effectively stretches the distribution of daily precipitation, causing extreme events to become more pronounced while reducing the occurrence and strength of lighter precipitation events. We replicate this phenomenon by making adjustments to the statistical characteristics and distribution percentiles of precipitation (e.g., Pendergrass and Hartmann, 2014). To further elaborate, we select a scaling factor for the 99th percentile of nonzero precipitation and then modify the gamma-GPD mixed distribution to enforce this chosen scaling factor. For instance, in a scenario with a 3°C temperature warming and a 7% increase in extreme precipitation per °C (matching the theoretical Clausius-Clapeyron rate of increase in atmospheric water holding capacity due to warming; Najibi and Steinschneider, 2023), the most extreme precipitation events are projected to increase by approximately 22.5% (1.07^3 ≈ 1.225). We adjust the gamma-GPD models fitted to all locations to ensure this percentage increase in the upper tail of the distribution. Assuming that mean precipitation remains constant at baseline levels, this adjustment will cause smaller precipitation values in the gamma-GPD model to decrease, compensating for the increase in extreme events by stretching the distribution of nonzero precipitation.

Lines 33-40 from ‘config.simulations.R’ show the user-defined inputs that implement the thermodynamic scenarios: temperature change in Celsius (tc: e.g., 1°C), percent change in the extreme precipitation quantile (pccc: e.g., 7% per °C), and percent change in average precipitation (pmuc: e.g., a 12.5% decrease). Needless to say, the stochastic weather generator runs in baseline mode if tc, pccc, and pmuc are all set to 0.

    ##-------------Define perturbations-------------##
    ##climate changes and jitter to apply:
    change.list <- data.frame("tc"=  c(0), # {e.g., 0, 1, 2, ...} (changes in temperature)
                              "jitter"= c(TRUE),
                              "pccc"= c( 0.00), # {e.g., 0, 0.07, 0.14, ...} (changes for precipitation extreme quantile -- CC)
                              "pmuc"= c( 0.00)# {e.g., 0, -.125, .125, ...} (changes in precipitation mean)
    )
    ##----------------------------------------------##

2) Dynamic Change

Dynamic climate change, also known as circulation-driven change, is driven by shifts in atmospheric and oceanic circulation patterns. These circulation patterns are influenced by a variety of factors, including temperature gradients, differences in air pressure, and Earth’s rotation. Changes in these patterns can lead to alterations in weather patterns, precipitation distribution, and regional climate characteristics. One well-known example of dynamic climate change is the phenomenon of El Niño and La Niña, which involve changes in ocean temperatures and atmospheric pressure in the Pacific Ocean. These events can significantly impact local-to-global weather patterns, causing droughts, heavy rainfall, and other extreme weather events (Pfahl et al., 2017).

Dynamic changes impact the evolution of weather patterns and can modify the occurrence frequency of these patterns. This influence can occur through direct adjustments to the transition probabilities between different weather regimes, or indirectly by modifying the covariates that govern the progression of these weather regimes. In Steinschneider et al. (2019), a Niño 3.4 index is used to force weather regime evolution and is systematically adjusted to create more frequent El Niño and La Niña events. In Gupta et al. (in review), a 600-year long sequence of tree-ring reconstructed principal components of weather regime occurrence are used as an alternative covariate to better capture natural variability inherent in the weather regimes.

In the most recent version of the stochastic weather generator, we developed a novel non-parametric approach to the simulation of weather regimes, allowing for future dynamic change scenarios with altered (customized) weather regime probabilities. Assume that the historical time series of weather regimes is divided into distinct, consecutive segments without overlaps, where each segment has a duration of D years and there is a total of N_D segments (here, D=4 and N_D=18). In the non-parametric method, each segment (indexed as n=1 to N_D) is assigned a sampling probability denoted as p_n. To generate a new sequence of daily weather regimes spanning any desired number of years, the procedure involves resampling (with replacement) the n-th D-year segment of daily weather regimes with the corresponding probability p_n. This process is repeated until the required number of years of simulated weather regimes has been attained. If needed, the last segment can be trimmed to ensure the precise desired duration of simulated weather regimes.
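
The generator itself is written in R (see the config.simulations.R excerpts below), but the resampling logic is simple enough to sketch in a few lines of Python with hypothetical variable names:

import numpy as np

def resample_weather_regimes(segments, p_n, n_segments_needed, seed=None):
    """Non-parametric simulation of daily weather regimes: resample D-year
    segments with replacement, drawing segment n with probability p_n.
    'segments' is a list of N_D arrays, each holding D years of daily WR labels."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(segments), size=n_segments_needed, p=p_n)
    sim = np.concatenate([segments[i] for i in picks])
    return sim   # trim the tail if an exact number of simulated years is required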

In the baseline scenario for the weather generator with no dynamic climate change (only thermodynamic change), each segment is considered equally likely (i.e., no changes to large-scale circulation patterns).

However, the probabilities p_n can be adjusted to alter the frequencies of each of the identified weather regimes in the final simulation, enabling the generation of dynamic climate change scenarios (i.e., scenarios in which the frequencies of different atmospheric flow patterns change compared to their historical frequencies). This is achieved using a linear program. The goal of the linear program (not shown) is to identify new sampling probabilities p_n that, when used in the non-parametric simulation approach above, create a sequence of weather regimes with long-term average frequencies that approach some vector of target probabilities for the identified weather regimes.
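
A stripped-down version of that linear program can be sketched with scipy (hypothetical names; the actual generator's optimization, including the tolerance lp.threshold and the piecewise penalty controlled by piecewise_limit seen in the configuration below, lives in the R repository): choose p_n to minimize the L1 distance between the implied long-run regime frequencies and the target frequencies.

import numpy as np
from scipy.optimize import linprog

def solve_segment_probs(F, target):
    """Choose segment sampling probabilities p (length N_D) so that the implied
    long-run weather regime frequencies F.T @ p are as close as possible, in an
    L1 sense, to a target frequency vector (length K).
    F[n, k] is the frequency of regime k within historical segment n."""
    ND, K = F.shape
    # decision vector: [p_1..p_ND, t_1..t_K], where t_k bounds |(F.T p)_k - target_k|
    c = np.concatenate([np.zeros(ND), np.ones(K)])
    A_ub = np.vstack([np.hstack([F.T, -np.eye(K)]),
                      np.hstack([-F.T, -np.eye(K)])])
    b_ub = np.concatenate([target, -target])
    A_eq = np.concatenate([np.ones(ND), np.zeros(K)]).reshape(1, -1)  # sum(p) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (ND + K))
    return res.x[:ND]   # the new sampling probabilities p_n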

Lines 91-126 from ‘config.simulations.R’ show the user-defined changes to implement a non-parametric scenario with equal probabilities (no change to the historical frequencies of the ten weather regimes), i.e., dynamic scenario #0, and a scenario with a 30% increase in weather regime number three (a dry weather condition in California), i.e., dynamic scenario #1.

##Choose below whether through parametric or non-parametric way to create the simulated WRs ##
    use.non_param.WRs <- TRUE #{TRUE, FALSE}: TRUE for non-parametric, FALSE for parametric simulated WRs

    dynamic.scenario  <- 0 # {0, 1, 2}: 0: no dynamic change; 1: dynamic scenario #1 (30% increase in WR3); or 2: dynamic scenario #2 (linear trend)

    if (use.non_param.WRs){      #----------- 1+2 dynamic scenarios ----------#
      if (dynamic.scenario==0){
        ##===> Attempt #0 (thermodynamic only; no change to freq of WRs) ===##
        # #specify target change (as a percent) for WR probabilities
        WR_prob_change <- c(0,0,0,0,0,0,0,0,0,0) # between 0 and 1
        # #how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.00001
        # #how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
        
      }else if(dynamic.scenario==1){
        ##===> Attempt #1 (dynamic scenario #1) ===##
        # #specify target change (as a percent) for WR probabilities (if, increasing WR3 in future)
        WR_prob_change <- c(0,0,.3,0,0,0,0,0,0,0) # between 0 and 1
        # #how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.007
        # #how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
        
      }else if(dynamic.scenario==2){
        ##===> Attempt #2 (dynamic scenario #2) ===##
        # specify target change (as a percent) for WR probabilities (if, continuing their current trends in future)
        WR_prob_change <- c(-0.09969436,  0.27467048,  0.33848792,
                            -0.28431861, -0.23549986,  0.03889970,
                            -0.05628958, 0.38059153, -0.16636739, -0.17995965) # between 0 and 1
        # how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.008
        # how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
      }
    }

Stochastic Weather Generation for Climate Change Scenarios

The stochastic weather generator is used to generate the two climate change scenarios defined above, with the specific configuration for each scenario as follows:

  • Thermodynamic Scenario: 3°C increase in temperature, 7% per °C increase in precipitation extremes, no change in average precipitation.
  • Dynamic Scenario: 30% increase in occurrence frequency of one weather regime only, labeled as ‘WR3’, which exhibits a ridge directly over the northwest US, i.e., blocking moisture flow over California, and resulting in dry conditions during the cold season there.

Thus, we generated 1008 years of simulated precipitation and temperature for each of the 12 sites in the Tuolumne River Basin in California (similar to Part 1) under these two scenarios. Below is a set of figures to better understand the impact of each scenario on the statistical distributions of precipitation and temperature and on important climate extremes at the basin scale.

The two Figures below present the cumulative distribution function (CDF) of the generated scenario for precipitation (left) and temperature (right) based on the thermodynamic and dynamic change, respectively. The observed time-series of precipitation and temperature across these 12 sites is also illustrated.

As seen above, although the 3°C warming is clearly manifested in the alteration of the simulated temperature’s CDF, it is hard to notice any drastic shift in the overall CDF of the precipitation time series, as the bulk of the distribution has not been altered (remember that average precipitation remained constant while its extreme quantile scaled up by ~22.5%).

For the dynamic scenario, while the CDF of precipitation demonstrates a slight shift towards drier conditions, we notice a large shift in the tail of the temperature distribution.

Now, we examine a set of important indices for climate risk assessment of water systems. The figure below presents the 1-day precipitation maxima derived from the generated scenarios based on the thermodynamic (left) and dynamic (right) change.

In the plot above for thermodynamic climate change, the median 1-day precipitation maximum at the basin scale across the entire synthetic record shows a 25.5% increase in magnitude relative to the historical period, which is consistent with the 22.5% event-based precipitation scaling at each site. However, this metric remains almost unchanged in the dynamic climate change scenario.

Finally, the figure below shows the 5-year drought magnitude derived from water-year precipitation totals in the generated scenarios, based on the thermodynamic (left) and dynamic (right) change.

The boxplots above for the thermodynamic scenario reveal a consistent median 5-year drought magnitude, as anticipated (no shift in the bulk of the average precipitation distribution). In contrast, the dynamic climate change scenario exhibits a substantial exacerbation, with the 5-year drought magnitude worsening by around 9% compared to the historical records.

There is plenty more to explore! The stochastic weather generator is well suited to quickly simulating long sequences of weather variables that reflect any climate change of interest. Keep an eye out for upcoming additions to the repository in the coming months, and do not hesitate to contact us or create a GitHub issue if you need clarification.


Rodionov’s Regime Shift Detection Algorithm and the “Epic Pluvial” in the Delaware Basin

In their 2013 publication, Pederson et al. try to answer the question posed by the title: Is an Epic Pluvial Masking the Water Insecurity of the Greater New York City Region?

To help answer this question, they first reconstruct 472 years of the Palmer Drought Severity Index (PDSI) for the region using tree-ring chronologies, and use it to study hydroclimate patterns over the historic record. The reconstructed PDSI, along with measured meteorological data dating back to 1895, is used to assess trends in the region’s hydroclimate.

Their analysis shows that the upper Delaware River Basin (the source of NYC drinking water) has experienced a prolonged wet period since the early 1970s, and that the water availability experienced between 1970 and the time of writing may not be representative of what is “normal” in the context of the past 400 years.

This work supports the findings of Seager et al. (2012), who also identified the post-1960s “Epic Pluvial” and compared the observational data with temperature-forced global climate models to show that the wet regime is plausibly explained by internal atmospheric variability in the region. Lehner and Deser (2023) define internal variability as “fluctuations that arise intrinsically in a non-linear dynamical system, even when that system is closed.”

The first figure in Pederson et al. (2013; below) visualizes this shift toward a wetter regime starting in 1970, along with a second shift toward an even wetter regime in 2003. The orange line in the figure shows the average precipitation of the wet regime beginning in 1970, which was identified using the sequential algorithm for testing climate regime shifts proposed by Rodionov (2004).

When I saw this figure, several of my own questions came to mind:

  • Has the wet-regime continued after 2011 (end of study period)?
  • This regime shift was detected in precipitation; will it also be detected in streamflow data?

Post Overview

In this post, I present Rodionov’s regime shift detection algorithm, and use it to assess streamflow regime shifts in the upper Delaware River Basin (DRB) from 1904 through the present.

A Python implementation of the algorithm, the streamflow data studied here, and a Jupyter Notebook used to run the following analysis is all available here: Rodionov_regime_shifts

Rodionov’s Algorithm

The mathematical details of the algorithm are best described by Rodionov (2004), and it would not be advantageous for me to replicate them here, but I encourage you to read the formal description in combination with the summary below.

Below, I provide a verbal explanation of the process along with the Python implementation: rodionov.py

Rodionov’s algorithm in words

Step 1: Set parameters (l & p)

In Rodionov’s sequential regime algorithm, the first step is to specify the expected minimum length of a regime (denoted as l) and a required statistical significance level (p) used to test regime differences.

Additionally, the average variance of all periods of length l in the data record is calculated, and each regime is assumed to have the average variance.

Step 2: Determine statistically significant deviation values

The algorithm then defines a threshold for identifying potential new regimes: how far a value must deviate from the mean of the current regime before it is considered a potential regime shift.

This difference depends on a Student’s t-test value, the minimum regime length, average variance, and the user-specified significance level (p).
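Concretely, in the Python implementation shown later in this post, this threshold is computed as

\mathrm{diff} = t_{p,\,2l-2}\sqrt{\frac{2\,\bar{\sigma}_l^2}{l}}

where the t term is the Student’s t value at probability p with 2l - 2 degrees of freedom, and the variance term is the average variance over all l-year windows in the record (avg_var in the code below).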

Step 3: Calculate initial regime statistics

Assume that the initial regime consists of the first l values of the series, with l specified in Step 1.

The mean of this initial regime is calculated, and upper and lower significance bounds are established using the difference value obtained in Step 2. Upper and lower bounds are equal to the mean plus/minus the significant difference value.

Step 4: Identifying potential regime shift points

One by one, each subsequent value in the series is tested to determine whether it exceeds the upper or lower bound established for the current regime distribution.

If the value is inside the bounds of the current regime, then re-calculate the current regime mean to include the new point, and move on to the next value.

If a value is outside the bounds of the current regime, then consider that point as a potential regime shift (Step 5).

Step 5: Testing a potential regime shift

Once a potential regime shift is detected, adopt a null hypothesis that the new regime has a mean equal to the statistically significant boundary value of the previous regime distribution that was exceeded.

The regime shift hypothesis is then assessed by calculating the Regime Shift Index (RSI), which is a cumulative measure of exceedances beyond the prior regime’s significance value. The cumulative sum of exceedances over the minimum regime length (l) defines the RSI for the potential regime shift point.
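Concretely, in the implementation below, each value in the l-year window following a potential shift at index j contributes its exceedance beyond the crossed boundary (x_crit, i.e., the upper or lower bound of the prior regime) to the RSI:

\mathrm{RSI}_j = \sum_{k=j+1}^{j+l-1} \frac{x_k - x_{\mathrm{crit}}}{l\,\bar{\sigma}_l^2}

for a shift toward higher values, with the numerator reversed for a shift toward lower values. This is the normalization used in rodionov.py below; see Rodionov (2004) for the exact form used in the original paper.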

Step 6: Rejecting potential regime shifts

If RSI < 0 at any point within the l periods after the initial shift point, then the new-regime hypothesis is rejected.

When the potential shift is rejected, the mean of the prior regime is modified to incorporate the potential shift point into the distribution, and the analysis continues by returning to Step 4 to search for additional potential regime shifts.

Step 7: Accepting regime shifts

If RSI > 0 after accumulating over the minimum regime length (l), then the new regime R2 is identified as significant.

Once the new regime is adopted, the mean of the new regime is calculated for the first l periods of R2, and new significance bounds are calculated using the significant difference value from Step 2.

The regime-shift-search is continued, starting on the first day of the new regime R2 (Return to Step 4.)

Notably, by resuming the search at the start of R2, regimes can end up shorter than the specified minimum length l, and re-assessing the values in R2 lessens the impact of the assumptions made about l.

Rodionov’s algorithm in code

The Python implementation, rodionov_regimes(), follows:

import numpy as np
from scipy.stats import t as t_test  # assumed import: Student's t distribution for t_test.ppf()

def rodionov_regimes(data, l, p):
    """Implements Rodionov's (2004) regime shift detection algorithm:
    Rodionov, S. N. (2004). A sequential algorithm for testing climate regime shifts.
    Geophysical Research Letters, 31(9).

    Args:
        data (array): Timeseries array of standardized values
        l (int): The assumed minimum regime length
        p (float): The significance probability to use when assessing shifts

    Returns:
        list, list: Two lists: The regime-shift indices, the RSI values
    """

    # Step 1: Set the cut-off length l of the regimes
    # l: Number of years for each regime
    # p: Probability level for significance
    n = len(data)
    regime_shift_indices = []
    rsi = np.zeros(n)

    # Step 2: Determine the difference diff for statistically significant mean values
    t_stat = np.abs(t_test.ppf(p, (2*l-2)))
    avg_var = np.mean([np.var(data[i:(i+l)]) for i in range(n-l)])
    diff = t_stat * np.sqrt(2 * avg_var / l)

    # Step 3: Calculate initial mean for regime R1
    r1 = np.mean(data[:l])
    r1_lower = r1 - diff
    r1_upper = r1 + diff

    i = l + 1
    while i < n:
        # Step 4: Check if the value exceeds the range of R1 ± diff
        if data[i] < r1_lower or data[i] > r1_upper:
            j = i
  
            # Estimate the mean of regime 2 as the upper bound of r1 distribution
            test_r2 = r1_lower if data[i] < r1_lower else r1_upper

            # Step 5: Calculate regime shift index (RSI) across next l-window
            for k in range(j + 1, min(j + l,n)):
                if data[j] > r1:
                    rsi[j] += (data[k] - test_r2)/(avg_var*l)
                elif data[j] < r1:
                    rsi[j] += (test_r2 - data[k])/(avg_var*l)

                # Step 6: Test for a regime shift at year j
                if rsi[j] < 0:
                    rsi[j] = 0
                    break

            # Step 7: Confirm significant regime shift at year j
            if rsi[j] > 0:
                regime_shift_indices.append(j)
                r2 = np.mean(data[j:min(j+l,n)])
                r1 = r2
                r1_lower = r1 - diff
                r1_upper = r1 + diff

            i = j + 1
        else:
            # Recalculate average for R1 to include the new value
            r1 = ((l - 1) * r1 + data[i]) / l
            
        i += 1
    return regime_shift_indices, rsi

Usage in the DRB: Detecting streamflow regime shift

Now it is time to try and answer the questions posed earlier:

  • Has the wet-regime identified by Pederson et al. (2013) continued after 2011?
  • This regime shift was detected in precipitation; will it also be detected in streamflow data?

See Rodionov_DRB_regime_demo.ipynb for how the following results are generated.

Streamflow data

In choosing streamflow data for this study, I looked for a USGS gauge that (A) is in or downstream of the upper DRB region considered by Pederson et al. (2013), and (B) has a long record.

I settled on USGS-01434000, which is located on the Delaware River near the middle of the basin and has daily streamflow data dating back to 1904! I pulled all flow data from 1904 through July 2023 (the present).

Before using Rodionov’s algorithm, I standardized the flow data by removing seasonality (subtracting monthly means) and log-transforming the deviations from the seasonal means.

Lastly, I take the annual mean of standardized flow values.
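A minimal sketch of this preprocessing (the notebook linked above contains the exact workflow; here the flows are log-transformed before removing monthly means, and variable names are illustrative):

import numpy as np
import pandas as pd

# Q: daily streamflow as a pd.Series with a pd.DatetimeIndex
log_Q = np.log(Q)

# remove seasonality by subtracting monthly means of the log flows
monthly_means = log_Q.groupby(log_Q.index.month).transform('mean')
deviations = log_Q - monthly_means

# annual mean of the standardized (deseasonalized) values
Z = deviations.groupby(deviations.index.year).mean()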

Application of Rodionov’s algorithm

Running the rodionov_regimes() function is as easy as passing a 1D timeseries array, along with the l and p parameters.

shift_indices, rsi_values = rodionov_regimes(Z.values.flatten(), l= 10, p= 0.005)

The shift_indices output contains a list of the array indices where a statistically significant regime shift was detected. The rsi_values output contains the RSI for each year in the timeseries and will be 0 wherever there is no regime shift.

If the lists are empty, then no statistically significant regimes were detected; consider relaxing your statistical confidence requirement (increasing p) and trying again.

In this study, I start by searching for regimes with an assumed minimum length of 10 years (l= 10) and significance level of p= 0.005.

Streamflow regime shifts in the DRB

Running Rodionov’s algorithm on the standardized streamflow data reveals a significant regime shift toward wetter conditions in 2003:

Notably, the 1970s regime shift reported by Pederson et al. (2013) is not detected in the streamflow timeseries, under the initial parameterization. However, the authors also identified the 2003 regime shift toward wetter conditions, and the 2003 shift was the only shift they detected when analyzing summer precipitation (see lower timeseries in the Pederson figure earlier in this post).

The regime shifts found using the Rodionov algorithm are sensitive to the l and p parameterization, and it may be that Pederson et al. (2013) used a different parameterization (I didn’t see it detailed in the publication). In hydrology, it is often very difficult to decide on a definitive timescale for hydrologic processes that are highly unpredictable, like wet/dry regimes.

Rather than specifying a single assumed minimum regime length, I re-ran the Rodionov algorithm using every regime length between 5 and 35.
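A hedged sketch of this repeated search (reusing Z and rodionov_regimes() from above; the notebook may differ in its details):

from collections import Counter

shift_year_counts = Counter()
for l in range(5, 35):   # 30 searches across assumed minimum regime lengths
    shift_idx, _ = rodionov_regimes(Z.values.flatten(), l=l, p=0.005)
    shift_year_counts.update(Z.index[shift_idx])

# "Found Frequency": fraction of searches in which each year was flagged as a shift
found_frequency = {year: count / 30 for year, count in shift_year_counts.items()}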

The figure below shows all of the different regimes determined to be significant across the range of different regime lengths. I also added a measure of Found Frequency (lower row) which indicates how often a specific year contained a significant regime shift across the 30 instances of the search.

From the above analysis, we can see:

  1. Identified regimes vary in duration and magnitude depending on the specified minimum regime length.
  2. Despite this variability, the 2003 shift toward a wetter regime is identified as statistically significant in 29 of the 30 searches!

Conclusions

To answer the questions that motivated this post:

  • Has the wet-regime identified by Pederson et al. (2013) continued after 2011?

Yes! Based on the Rodionov algorithm, it seems clear that the period 2011-2023 continues to be classified as a significantly wet period.

  • This regime shift was detected in precipitation; will it also be detected in streamflow data?

Yes, kind of! Extending the work of Pederson et al. (2013) to streamflow shows that the wet regime is detected in the streamflow record, though it is not clear that the regime started in 1970. The 2003 shift, however, is both reported by Pederson et al. (2013) and clearly detected in the streamflow timeseries.

Implications for water resource systems

The persistence of a prolonged wet regime has implications for the New York City water supply, as NYC depends on the upper Delaware River Basin in the Catskills region as a major supply source, owning three major reservoirs there. Contextualizing the present hydroclimate and understanding the potential variability in trends in water availability will be important for making informed decisions that lead to water supply reliability in an uncertain and potentially less-epic climate.

If you’ve made it this far, thank you for reading!

References

Lehner, F., & Deser, C. (2023). Origin, importance, and predictive limits of internal climate variability. Environmental Research: Climate, 2(2), 023001.

Pederson, N., Bell, A. R., Cook, E. R., Lall, U., Devineni, N., Seager, R., … & Vranes, K. P. (2013). Is an epic pluvial masking the water insecurity of the greater New York City region?. Journal of Climate, 26(4), 1339-1354.

Rodionov, S. N. (2004). A sequential algorithm for testing climate regime shifts. Geophysical Research Letters, 31(9).

Seager, R., Pederson, N., Kushnir, Y., Nakamura, J., & Jurburg, S. (2012). The 1960s drought and the subsequent shift to a wetter climate in the Catskill Mountains region of the New York City watershed. Journal of Climate, 25(19), 6721-6742.

Figure Library Part 2 – Visualizations Supporting Synthetic Streamflow Diagnostics

Motivation

This is the second post in a series which was started by Dave last week and seeks to make a library of plotting functions for common visualization tasks. In this post, I share some plotting functions I have developed for synthetic streamflow diagnostic visualizations.

A thorough review of synthetic streamflow verification is outside the scope of this post, but in short we want to verify that the synthetic timeseries have statistical properties similar to those of the historic streamflow. Additionally, we want to see that the synthetic timeseries reflect the range of internal variability present in the basin by showing that a large synthetic ensemble envelops the limited historic observations.

These functions can be used to compare the following properties of historic and synthetic streamflow:

  • Flow magnitude
  • Non-exceedance probabilities (flow duration curves)
  • Autocorrelation
  • Spatial correlation across locations

Diagnostic plotting functions

  • plot_flow_ranges()
    • Range and median of flow values in historic and synthetic flow timeseries.
  • plot_fdc_ranges()
    • The range of annual flow duration curves (FDCs) for historic and synthetic streamflow timeseries.
  • plot_autocorrelation()
    • Comparison of historic and synthetic autocorrelation across a range of lag values.
  • plot_spatial_correlation()
    • Heatmap visualizations of Pearson correlation between different flow locations in the historic data and a single synthetic realization.

These functions are designed to accept historic (Qh) and synthetic (Qs) data with the following properties:

  • Qh : A pd.Series containing a single historic streamflow timeseries. The index of this data must be type pd.DatetimeIndex.
  • Qs : A pd.DataFrame containing many synthetic streamflow timeseries. The columns should each contain a single synthetic realization and the index must be type pd.DatetimeIndex. The column names do not matter.

Additionally, each function (excluding plot_fdc_ranges()) has a timestep keyword argument which changes the temporal scale of the flow being considered. Options are ‘daily’, ‘weekly’, or ‘monthly’, and the pandas.groupby() function is used to aggregate data accordingly to allow for comparison of statistics at different temporal scales. The default is ‘daily’.

Streamflow data

The historic data used in this example is taken from the Delaware River Basin. Specifically, streamflow data from the USGS gauge at Lordville, NY, on the upper portion of the Delaware River is used.

The synthetic streamflow was generated using the Kirsch-Nowak [1, 2] synthetic streamflow generator. For more detail on this generator, see Julie’s post here: Open Source Streamflow Generator Part 1.

I created 100 synthetic realizations of 30-year, daily streamflow timeseries at the Lordville gauge.

The historic and synthetic streamflow timeseries are assumed to start on the same date. Most of the plotting functions below do not explicitly consider the dates associated with the timeseries; however, comparisons of flow statistics across a range of time assume that the timeseries are aligned to the same dates.
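As a point of reference, loading data in the expected format might look like this (file names here are hypothetical):

import pandas as pd

# Qh: single historic timeseries as a pd.Series with a pd.DatetimeIndex
Qh = pd.read_csv('historic_lordville_daily.csv', index_col=0, parse_dates=True).squeeze(axis='columns')

# Qs: many synthetic realizations as a pd.DataFrame, one realization per column
Qs = pd.read_csv('synthetic_lordville_daily.csv', index_col=0, parse_dates=True)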


Flow magnitude range

This first plot simply shows the range (max, min) and median of flows for both historic and synthetic datasets.

# Imports assumed for the plotting functions in this post
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr

def plot_flow_ranges(Qh, Qs, 
                     timestep = 'daily',
                     units = 'cms', y_scale = 'log',
                     savefig = False, fig_dir = '.',
                     figsize = (7,5), colors = ['black', 'orange'],
                     title_addon = ""):
    """Plots the range of flow for historic and synthetic streamflows for a specific timestep scale.
    
    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        units (str, optional): Streamflow units, for axis label. Defaults to 'cms'.
        y_scale (str, optional): Scale of the y-axis. Defaults to 'log'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (7,5).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        title_addon (str, optional): Text to be added to the end of the title. Defaults to "".
    """

    # Assert formatting matches expected
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatetimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qs) should have pd.DatetimeIndex.'

    # Handle conditional datetime formatting
    if timestep == 'daily':
        h_grouper = Qh.index.dayofyear
        s_grouper = Qs.index.dayofyear
        x_lab = 'Day of the Year (Jan-Dec)'
    elif timestep == 'monthly':
        h_grouper = Qh.index.month
        s_grouper = Qs.index.month
        x_lab = 'Month of the Year (Jan-Dec)'
    elif timestep == 'weekly':
        h_grouper = pd.Index(Qh.index.isocalendar().week, dtype = int)
        s_grouper = pd.Index(Qs.index.isocalendar().week, dtype = int)
        x_lab = 'Week of the Year (Jan-Dec)'
    else:
        print('Invalid timestep input. Options: "daily", "monthly", "weekly".')
        return

    # Find flow ranges
    s_max = Qs.groupby(s_grouper).max().max(axis=1)
    s_min = Qs.groupby(s_grouper).min().min(axis=1)
    s_median = Qs.groupby(s_grouper).median().median(axis=1)
    h_max = Qh.groupby(h_grouper).max()
    h_min = Qh.groupby(h_grouper).min()
    h_median = Qh.groupby(h_grouper).median()
  
    ## Plotting  
    fig, ax = plt.subplots(figsize = figsize, dpi=150)
    xs = h_max.index
    ax.fill_between(xs, s_min, s_max, color = colors[1], label = 'Synthetic Range', alpha = 0.5)
    ax.plot(xs, s_median, color = colors[1], label = 'Synthetic Median')
    ax.fill_between(xs, h_min, h_max, color = colors[0], label = 'Historic Range', alpha = 0.3)
    ax.plot(xs, h_median, color = colors[0], label = 'Historic Median')
    
    ax.set_yscale(y_scale)
    ax.set_ylabel(f'{timestep.capitalize()} Flow ({units})', fontsize=12)
    ax.set_xlabel(x_lab, fontsize=12)
    ax.legend(ncols = 2, fontsize = 10, bbox_to_anchor = (0, -.5, 1.0, 0.2), loc = 'upper center')    
    ax.grid(True, linestyle='--', linewidth=0.5)
    ax.set_title(f'{timestep.capitalize()} Streamflow Ranges\nHistoric & Synthetic Timeseries at One Location\n{title_addon}')
    plt.tight_layout()
    
    if savefig:
        plt.savefig(f'{fig_dir}/flow_ranges_{timestep}.png', dpi = 150)
    return

Assuming I have loaded a pd.DataFrame named Qs which contains synthetic timeseries in each column, and a single historic timeseries Qh, I can quickly plot daily (or weekly or monthly) flow ranges using:

## Usage
plot_flow_ranges(Qh, Qs, timestep='daily')

This figure provides a lot of useful information about our synthetic ensemble:

  • The synthetic ensemble envelops the limited historic values and includes more extreme high and low flows.
  • The seasonal trends in the synthetics align with those in the historic data.
  • The median of the synthetic ensemble aligns very closely with the median of the historic data.

Flow duration curve ranges

The plot_fdc_ranges() function will calculate an FDC for each year of the historic and synthetic data and plot the ranges that these FDCs take. It will also plot the "total" FDCs calculated over the entirety of the historic and synthetic records.

def plot_fdc_ranges(Qh, Qs, 
                    units = 'cms', y_scale = 'log',
                    savefig = False, fig_dir = '.',
                    figsize = (5,5), colors = ['black', 'orange'],                   
                    title_addon = ""):
    """Plots the range and aggregate flow duration curves for historic and synthetic streamflows.
    
    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        units (str, optional): Streamflow units, for axis label. Defaults to 'cms'.
        y_scale (str, optional): Scale of the y-axis. Defaults to 'log'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (5,5).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        title_addon (str, optional): Text to be added to the end of the title. Defaults to "".
    """
    
    ## Assertions
    assert(type(Qs) == pd.DataFrame), 'Synthetic streamflow should be type pd.DataFrame.'
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatetimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qs) should have pd.DatetimeIndex.'

    
    # Calculate FDCs for total period and each realization
    nonexceedance = np.linspace(0.0001, 0.9999, 50)
    s_total_fdc = np.quantile(Qs.values.flatten(), nonexceedance)
    h_total_fdc = np.quantile(Qh.values.flatten(), nonexceedance) 
    
    s_fdc_max = np.zeros_like(nonexceedance)
    s_fdc_min = np.zeros_like(nonexceedance)
    h_fdc_max = np.zeros_like(nonexceedance)
    h_fdc_min = np.zeros_like(nonexceedance)

    annual_synthetics = Qs.groupby(Qs.index.year)
    annual_historic = Qh.groupby(Qh.index.year)

    for i, quant in enumerate(nonexceedance):
        s_fdc_max[i] = annual_synthetics.quantile(quant).max().max()
        s_fdc_min[i] = annual_synthetics.quantile(quant).min().min()
        h_fdc_max[i] = annual_historic.quantile(quant).max()
        h_fdc_min[i] = annual_historic.quantile(quant).min()
    
    ## Plotting
    fig, ax = plt.subplots(figsize=figsize, dpi=200)
    
    #for quant in syn_fdc_quants:
    ax.fill_between(nonexceedance, s_fdc_min, s_fdc_max, color = colors[1], label = 'Synthetic Annual FDC Range', alpha = 0.5)
    ax.fill_between(nonexceedance, h_fdc_min, h_fdc_max, color = colors[0], label = 'Historic Annual FDC Range', alpha = 0.3)

    ax.plot(nonexceedance, s_total_fdc, color = colors[1], label = 'Synthetic Total FDC', alpha = 1)
    ax.plot(nonexceedance, h_total_fdc, color = colors[0], label = 'Historic Total FDC', alpha = 1, linewidth = 2)

    ax.grid(True, linestyle='--', linewidth=0.5)
    ax.set_yscale(y_scale)
    ax.set_ylabel(f'Flow ({units})')
    ax.set_xlabel('Non-Exceedance Probability')
    ax.legend(fontsize= 10)
    ax.grid(True, linestyle='--', linewidth=0.5)
    
    plt.title(f'Flow Duration Curve Ranges\nHistoric & Synthetic Streamflow\n{title_addon}')
    if savefig:
        plt.savefig(f'{fig_dir}/flow_duration_curves_{title_addon}.png', dpi=200)
    plt.show()
    return

## Usage
plot_fdc_ranges(Qh, Qs)

Similar to the flow range plot above, we see that the synthetic ensemble is nicely enveloping the historic data; there are synthetic instances where the extreme flows in a particular year are more extreme than historic observations. However, the FDC of the total synthetic ensemble (if all synthetics were combined) is almost exactly equal to the total historic FDC.

Autocorrelation

Hydrologic systems are known to have memory, or correlation between streamflow at different points in time. We want to make sure that our synthetic streamflow timeseries have similar memory or autocorrelation.

The plot_autocorrelation() function will calculate autocorrelation for the historic record, and each synthetic realization for a range of different lag values. In addition to the required Qh and Qs arguments this function requires lag_range which should be a list or array of lag values to be considered.

def plot_autocorrelation(Qh, Qs, lag_range, 
                         timestep = 'daily',
                         savefig = False, fig_dir = '.',
                         figsize=(7, 5), colors = ['black', 'orange'],
                         alpha = 0.3):
    """Plot autocorrelation of historic and synthetic flow over some range of lags.

    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        lag_range (iterable): A list or range of lag values to be used for autocorrelation calculation.
        
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (7,5).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        alpha (float, optional): The opacity of synthetic data. Defaults to 0.3.
    """
    
    ## Assertions
    assert(type(Qs) == pd.DataFrame), 'Synthetic streamflow should be type pd.DataFrame.'
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatetimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qs) should have pd.DatetimeIndex.'

    if timestep == 'monthly':
        Qh = Qh.resample('MS').sum()
        Qs = Qs.resample('MS').sum()
        time_label = 'months'
    elif timestep == 'weekly':
        Qh = Qh.resample('W-SUN').sum()
        Qs = Qs.resample('W-SUN').sum()
        time_label = f'weeks'
    elif timestep == 'daily':
        time_label = f'days'
        
    # Calculate autocorrelations
    autocorr_h = np.zeros(len(lag_range))
    confidence_autocorr_h = np.zeros((2, len(lag_range)))
    for i, lag in enumerate(lag_range):
        h_corr = pearsonr(Qh.values[:-lag], Qh.values[lag:])
        autocorr_h[i] = h_corr[0]
        confidence_autocorr_h[0,i] = h_corr[0] - h_corr.confidence_interval().low
        confidence_autocorr_h[1,i] = h_corr.confidence_interval().high - h_corr[0]
    
    autocorr_s = np.zeros((Qs.shape[1], len(lag_range)))
    for i, realization in enumerate(Qs.columns):
        autocorr_s[i] = [pearsonr(Qs[realization].values[:-lag], Qs[realization].values[lag:])[0] for lag in lag_range]


    ## Plotting
    fig, ax = plt.subplots(figsize=figsize, dpi=200)

    # Plot autocorrelation for each synthetic timeseries
    for i, realization in enumerate(Qs.columns):
        if i == 0:
            ax.plot(lag_range, autocorr_s[i], alpha=1, color = colors[1], label=f'Synthetic Realization')
        else:
            ax.plot(lag_range, autocorr_s[i], color = colors[1], alpha=alpha, zorder = 1)
        ax.scatter(lag_range, autocorr_s[i], alpha=alpha, color = 'orange', zorder = 2)

    # Plot autocorrelation for the historic timeseries
    ax.plot(lag_range, autocorr_h, color=colors[0], linewidth=2, label='Historic', zorder = 3)
    ax.scatter(lag_range, autocorr_h, color=colors[0], zorder = 4)
    ax.errorbar(lag_range, autocorr_h, yerr=confidence_autocorr_h, 
                color='black', capsize=4, alpha=0.75, label ='Historic 95% CI')

    # Set labels and title
    ax.set_xlabel(f'Lag ({time_label})', fontsize=12)
    ax.set_ylabel('Autocorrelation (Pearson)', fontsize=12)
    ax.set_title('Autocorrelation Comparison\nHistoric Timeseries vs. Synthetic Ensemble', fontsize=14)

    ax.legend(fontsize=10)
    ax.grid(True, linestyle='--', linewidth=0.5)
    plt.subplots_adjust(left=0.12, right=0.95, bottom=0.12, top=0.92)

    if savefig:
        plt.savefig(f'{fig_dir}/autocorrelation_plot_{timestep}.png', dpi=200, bbox_inches='tight')
    plt.show()
    return

Let’s visualize the autocorrelation of daily streamflow for lags of up to 100 days.

## Usage
plot_autocorrelation(Qh, Qs, lag_range=range(1,100, 5), timestep='daily') 

Spatial Correlation

The plot_spatial_correlation() function generates a heatmap visualization of the Pearson correlation matrix for streamflow at different locations in both the historic and synthetic datasets.

The plot_spatial_correlation() function is different from the other three functions shown above, since it requires a set of timeseries data for multiple different locations.

Rather than providing a set of many different realizations for a single site, we need to provide a single realization of synthetic flows across all sites of interest, with a timeseries for each site in each column. This new input is Qs_i, and it must have the same columns as Qh (both must contain a timeseries for each location).

def plot_spatial_correlation(Qh, Qs_i,
                             timestep = 'daily',
                             savefig = False, fig_dir = '.',
                             figsize = (8,4), color_map = 'BuPu',
                             cbar_on = True):
    """Plots the side-by-side heatmaps of flow correlation between sites for historic and synthetic multi-site data.
    
    Args:
        Qh (pd.DataFrame): Historic daily streamflow timeseries at many different locations. The columns of Qh and Qs_i must match, and index must be pd.DatetimeIndex. 
        Qs_i (pd.DataFrame): A single synthetic streamflow realization with a timeseries for each location. The columns of Qh and Qs_i must match, and index must be pd.DatetimeIndex.
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (8,4).
        color_map (str, optional): The colormap used for the heatmaps. Defaults to 'BuPu'.
        cbar_on (bool, optional): Indicator if the colorbar should be shown or not. Defaults to True.
    """
    
    ## Assertions
    assert(type(Qh) == type(Qs_i) and (type(Qh) == pd.DataFrame)), 'Both inputs Qh and Qs_i should be pd.DataFrames.'
    assert(np.mean(Qh.columns == Qs_i.columns) == 1.0), 'Historic and synthetic data should have same columns.'
    
    if timestep == 'monthly':
        Qh = Qh.resample('MS').sum()
        Qs_i = Qs_i.resample('MS').sum()
    elif timestep == 'weekly':
        Qh = Qh.resample('W-SUN').sum()
        Qs_i = Qs_i.resample('W-SUN').sum()
    
    ## Plotting
    fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=figsize, dpi = 250)
    
    # Heatmap of historic correlation
    h_correlation = np.corrcoef(Qh.values.transpose())
    sns.heatmap(h_correlation, ax = ax1, 
                square = True, annot = False,
                xticklabels = 5, yticklabels = 5, 
                cmap = color_map, cbar = False, vmin = 0, vmax = 1)

    # Synthetic correlation
    s_correlation = np.corrcoef(Qs_i.values.transpose())
    sns.heatmap(s_correlation, ax = ax2,
                square = True, annot = False,
                xticklabels = 5, yticklabels = 5, 
                cmap = color_map, cbar = False, vmin = 0, vmax = 1)
    
    for axi in (ax1, ax2):
        axi.set_xticklabels([])
        axi.set_yticklabels([])
    ax1.set(xlabel = 'Site i', ylabel = 'Site j', title = 'Historic Streamflow')
    ax2.set(xlabel = 'Site i', ylabel = 'Site j', title = 'Synthetic Realization')
    title_string = f'Pearson correlation across different streamflow locations\n{timestep.capitalize()} timescale\n'
        
    # Adjust the layout to make room for the colorbar
    if cbar_on:
        plt.tight_layout(rect=[0, 0, 0.9, 0.85])
        cbar_ax = fig.add_axes([0.92, 0.1, 0.02, 0.6])  # [left, bottom, width, height]
        cbar = fig.colorbar(ax1.collections[0], cax=cbar_ax)
        cbar.set_label("Pearson Correlation")
    plt.suptitle(title_string, fontsize = 12)
    
    if savefig:
        plt.savefig(f'{fig_dir}/spatial_correlation_comparison_{timestep}.png', dpi = 250)
    plt.show()
    return

To use this function, I need to load a different pd.DataFrame containing synthetic timeseries at all of my study locations.

## Usage
single_synthetic_all_sites = pd.read_csv('synthetic_streamflow_single_realization_all_sites.csv', sep=',', index_col=0, parse_dates=True)

plot_spatial_correlation(Qh_all_sites, single_synthetic_all_sites, timestep='weekly')

Across the different streamflow sites, we see that the synthetic timeseries do a decent job at preserving spatial correlation, although correlation is decreased for sites which are not strongly correlated in the historic record.

Conclusions

I hope these functions save you some time in your own work and allow you to expand your synthetic diagnostic considerations.

References

[1] Kirsch, B. R., Characklis, G. W., & Zeff, H. B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements: Practical improvement for generating synthetic streamflows. Journal of Water Resources Planning and Management, 139(4), 396-406.

[2] Nowak, K., Prairie, J., Rajagopalan, B., & Lall, U. (2010). A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow. Water Resources Research, 46(8).

Exploring Internal Variability with Hidden Markov Models in Our Newly Published eBook Interactive Tutorial

We recently published two new Jupyter Notebook tutorials as technical appendices to our eBook on Addressing Uncertainty in MultiSector Dynamics Research. Dave debuted his tutorial on the blog earlier this month here. This post will cover my addition to the eBook, which outlines a hidden Markov modeling approach to creating synthetic streamflow scenarios. Similarly to Dave’s, this post will outline the notebook’s connections to topics in the main text of the eBook rather than detailing the code demonstrated in the notebook.

In this post, I’ll first give a brief overview of how a Hidden-Markov model-based streamflow generator can help facilitate exploratory modeling in the Colorado River Basin. Then I’ll go through examples of how it can be used to explore internal variability that characterizes the region’s hydroclimate.

Exploratory Modeling for Colorado River Basin Planning and Management

The Colorado River Basin (CRB) is experiencing an unprecedented water shortage crisis brought on by large hydroclimatic changes and over-allocation of the Colorado River. Parts of the basin are experiencing warming that is nearly twice the global average1. Even when above-average snowpack can provide additional water, dry soils absorb this flow rather than allowing the runoff to reach and refill downstream reservoirs. The complexity of accommodating the demands of growing cities in the basin, combined with a diminishing water supply, makes planning and management a growing challenge in the region. Effective planning and management in the CRB must consider the fact that the region is experiencing a shift to a more arid climate, and the historic streamflow record is no longer a sufficient representation of future hydroclimate. Tools are needed that allow the exploration of the vulnerabilities of the CRB under future conditions that are plausible but haven’t been experienced before.

Herein lies the value of exploratory modeling as discussed in Chapter 2.2 of the eBook. Exploratory modeling can be used to run simulation models under vast ensembles of potential futures and then identify those with consequential effects. Rather than trying to predict future conditions exactly, the focus is shifted to a more holistic approach of discovering what conditions can lead to futures that cause system vulnerability or failure. The simulation model we use is a monthly water allocation model called StateMod which the State of Colorado uses for planning and simulating future management policies. In order to use this model in an exploratory modeling context, we need to find a way of generating plausible future scenarios that can be used to force the model. The future of the CRB will be defined by many uncertainties including demand, population growth, and policy changes pertaining to the Colorado River Compact. In this tutorial, we will focus specifically on the changing hydroclimate in the region and on a tool that allows us to generate streamflow scenarios for basins in the state of Colorado.

An HMM as a tool for exploring internal variability

As discussed in Chapter 3.3.6 of the eBook, there are many different ways to generate synthetic input data. To generate traces of streamflow, one can use physically-informed GCM projections, purely statistical approaches, or a hybrid of the two. In this tutorial, we utilize a Hidden Markov Model (HMM)-based streamflow generator, which is a purely statistical approach, to generate synthetic traces of streamflow. The HMM is (1) conditioned directly on regional streamflow and therefore better able to capture local drought dynamics than downscaled CMIP5/6 projections and (2) more computationally tractable to help facilitate exploratory analyses. The HMM can be utilized to generate stationary representations of streamflow and multipliers or shifts can be applied to the parameters to create non-stationary traces.

Decadal Drought Conditions

If we plot the historical annual streamflow at the outlet gage of the Upper Colorado River Basin (UCRB) in Figure 1, we can quantify decadal drought risk in the observed period. We use a metric proposed in Ault et al. (2014), who define a decadal drought as a period in which the 11-year running mean of the available record is more than half a standard deviation below the overall mean of the record. By this metric, the region has experienced two decadal droughts.
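A minimal sketch of this metric (annual_flow here is a hypothetical pd.Series of annual streamflow indexed by year; the eBook notebook contains the full workflow):

import pandas as pd

# centered 11-year window assumed here; the notebook may use a trailing window
rolling_mean = annual_flow.rolling(window=11, center=True).mean()
threshold = annual_flow.mean() - 0.5 * annual_flow.std()

# boolean mask of years in decadal drought conditions;
# the multi-decadal variant discussed below simply uses window=35
decadal_drought_years = rolling_mean < threshold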

Figure 1: Historical annual streamflow at the outlet of the UCRB. Decadal drought periods are overlaid in tan.

However, this historical record has very few instances of extremes and thus underestimates flood/drought hazards. It is important to remember that the streamflow we have observed in the region over the last century is only one instance of the hydrology that could occur, since the atmosphere is an inherently stochastic system. Thus, we require a tool that will allow us to see multiple plausible realizations of the streamflow record to understand the internal variability that characterizes the historical period. This is where the HMM-based generator comes in. If a system follows a Markov process, it switches between a number of “hidden states” dictated by a transition matrix. The UCRB is best described by a two-state system: wet or dry. Each hidden state has its own probability distribution; we use a Gaussian distribution for each state, so each is defined by a mean and a standard deviation. One can draw samples from these distributions to create synthetic flows that fit the properties of the historical record. HMMs are an attractive choice for this region because they can simulate persistence (i.e., long-duration droughts), which is needed to create future conditions that the region will likely experience. Figure 2 shows the 2-state Gaussian HMM that is fit in the tutorial and the Viterbi sequence (i.e., the classification of years into each state).
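For concreteness, a minimal sketch of fitting a 2-state Gaussian HMM with the hmmlearn package is shown below (variable names are illustrative, and the eBook notebook may differ in its details, e.g., whether flows are log-transformed first):

import numpy as np
from hmmlearn.hmm import GaussianHMM

# annual_flow: hypothetical pd.Series of annual streamflow at the UCRB outlet
log_flows = np.log(annual_flow.values).reshape(-1, 1)

model = GaussianHMM(n_components=2, n_iter=1000).fit(log_flows)
viterbi_states = model.predict(log_flows)        # most likely (Viterbi) state sequence

# draw an alternative 105-year trace from the fitted stationary generator
synthetic_log_flows, synthetic_states = model.sample(105)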

Figure 2: a) Example wet and dry state distributions and b) resulting Viterbi sequence

If we sample 105-year traces from the HMM, we are creating alternative hydrologies that the region could have experienced, which are different from what was experienced historically. In the historical dataset, there are two decadal drought periods, which lead to 20 years classified as being in decadal drought conditions (1959-1969 and 2000-2010). However, if we sample 1000 times from the model to create 1000 more 105-year traces, we have the capacity to create realizations with many more years classified in decadal drought conditions. Figure 3 shows a histogram of the years classified in decadal drought across the 1000 samples. The historic trace is shown as a red vertical line. Note how the HMM creates many instances of >20 years of decadal drought conditions.

Figure 3: The number of years in a 105-year trace classified to be in decadal drought conditions across 1000 samples from the stationary generator

Figure 4 shows one of the 1000 traces where 53 years (or half of the whole record) are classified in decadal drought conditions.

Figure 4: Example trace from stationary generator which is characterized by more frequent drought conditions

Multi-Decadal Drought Conditions

The decadal drought metric in Ault et al. (2014) can be extended to a multi-decadal drought context by inspecting where the 35-year running mean dips below the drought threshold. By this metric, the region has experienced no multi-decadal droughts (Figure 5).

Figure 5: Historical annual streamflow at the outlet of the UCRB. The 35-year rolling mean does not cross the drought threshold.

However, the stationary synthetic generator does have the capacity to create multi-decadal drought, such as the trace shown in Figure 6, even though the historical record does not have one.


Figure 6: An example trace from the stationary generator that exhibits a multi-decadal drought

The traces shown in Figures 4 and 6 are characterized by substantially more years of drought, and since they were created with a stationary generator, they are examples of what the UCRB could have experienced historically, rather than what is shown in Figure 1. These examples demonstrate that natural variability alone can create records that contain more frequent droughts than what has been seen historically (even in the absence of climate changes).

Drought Severity

Is the HMM creating droughts that are more severe than what was experienced historically? Let’s use the decadal drought as an example. We can define severity as the maximum amount (in cf) by which the rolling mean dips below the drought threshold. Figure 7 shows the histogram of these dips, and once again we see that we are creating droughts that dip substantially below the threshold, unlike the behavior observed in Figure 1 and represented by the red line.
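Building on the earlier decadal drought sketch (with rolling_mean and threshold as defined there), this severity metric can be computed as:

# largest dip of the 11-year rolling mean below the decadal drought threshold
severity = (threshold - rolling_mean).clip(lower=0).max()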


Figure 7: The severity of the decadal drought periods in each 105-year trace across 1000 samples from the stationary generator

Figure 8 shows an example trace of severe drought conditions where the rolling mean substantially dips below the drought threshold. Thus we see that natural variability alone can create records that contain more severe droughts than what has been seen historically.

Figure 8: An example trace from the stationary generator that exhibits severe drought conditions.

Conclusion

This tutorial was meant to demonstrate the utility of using the HMM-based generator in a stationary context, with a focus on its ability to create more frequent and severe decadal and multi-decadal droughts. Sampling from the generator allows us to explore the rich internal variability that characterizes the region. We demonstrate that the internal variability captured by the generator can produce more frequent and severe droughts than what was experienced historically, and that this is in the absence of any adjustments to the HMM to create climate change proxies. Thus, it is important to acknowledge that future droughts may become even more severe and frequent as phases of internal variability interact with climate changes.

Though we won’t go over it in this blog post, the technical appendix also covers adjustments that can be made to the HMM parameterization to create non-stationary traces which you can find here.

References

1 The Nature Conservancy: https://www.nature.org/en-us/about-us/where-we-work/priority-landscapes/colorado-river/colorado-river-in-crisis/

Ault, T. R., Cole, J. E., Overpeck, J. T., Pederson, G. T., & Meko, D. M. (2014). Assessing the risk of persistent drought using climate model simulations and paleoclimate data. Journal of Climate, 27(20), 7529-7549.

Introducing the GRRIEn Analysis Framework: Defining standards for reproducible and robust supervised learning of earth surface processes at large spatial scales

I’m very excited to write this blogpost to highlight the GRRIEn framework that was developed in part by Dr. Elizabeth Carter at Syracuse University, along with collaborators Carolynne Hultquist and Tao Wen. I first saw Liz present the GRRIEn framework at the Frontiers in Hydrology conference last year and I knew it was a framework that I wanted to eventually demo on the blog. The associated manuscript (now published here) that introduces the GRRIEn (Generalizable, Reproducible, Robust, and Interpreted Environmental) framework is a cheat sheet of best management practices to extract generalizable insight on environmental systems using supervised learning of global environmental observations (EOs; satellites and coupled earth systems models), including standards for reproducible data engineering, robust model training, and domain-specialist interpreted model diagnostics (Carter et al., 2023).

In short, this is a paper that I wish existed when I first started diving into data science and machine learning in graduate school. If you are a student/have a student who is new to statistics, data science, and machine learning for environmental applications, I highly recommend this paper as the single most useful one-stop shop read. Below, I will introduce a bit about Liz and her group, the steps of the framework, and include some of the awesome ways that Liz has incorporated GRRIEn into her classes.

The CHaRTS Lab

Dr. Carter is based at Syracuse University and runs the Climate Hazards Research Team Syracuse (CHaRTS). CHaRTS uses proxy observations of the hydrologic cycle from satellites, photographs, and earth systems models to support fair and stable management of water resources and hydroclimatic risk in the twenty-first century. I talked to Liz about creating the GRRIEn framework and the results that she has seen in the classroom.

Why did you decide to create GRRIEn?

“GRRIEn was created as a teaching tool for my graduate class, Environmental Data Science (EDS). EDS was intended to equip students with the computation tools they would need to use global earth observations for thesis/dissertation research. Some of my students have an interest in spatial statistics and machine learning, but most of them are domain specialists who want to use global earth observations to learn more about very specific environmental processes. The GRRIEn method was created for these students, the domain specialists. The goal is to reduce barriers to accessing insight from global earth observations associated with computational methods and advanced statistical techniques. For example, many students struggle to find and process spatial data. In EDS, they adopt a standard reproducible computational pipeline in the form of a GitHub repository for accessing raw data and creation of an analysis-ready dataset. For students who have limited statistical skills, I teach a few simple model-agnostic tools to diagnose and control for feature and observation dependence in multivariate observational spatiotemporal datasets. Most importantly, I teach methods in model interpretation. We need domain specialists to evaluate whether data-driven models are a physically plausible representation of environmental processes if we want to use global earth observations for knowledge discovery.”

What kind of results have you seen as you incorporate GRRIEn in the classroom?

“My favorite part of EDS is seeing students going from never having written a line of code to processing massive volumes of data. Because so much of the process has been constrained by the GRRIEn method, students can really spread their wings in terms of application. For the past three years, I’ve had several students take their EDS project to conferences like AGU, or submit it for publication.”

Overview of the GRRIEn Framework

As the volume of earth observations from satellites, global models, and the environmental IoT continues to grow, so does the potential of these observations to help scientists discover trends and patterns in environmental systems at large spatial scales. However, many emerging global datasets are too large to store and process on personal computers, and scale-dependent qualities of spatial processes can reduce the robustness of traditional statistical methods. Domain specialists therefore need targeted and accessible exposure to skills in reproducible scientific computing and spatial statistics so they can use global observations from satellites, earth systems models, and the environmental IoT to generalize insights from in-situ field observations across unsampled times and locations. The GRRIEn framework (which stands for Generalizable, Reproducible, Robust, and Interpreted Environmental analysis) was developed for this purpose (Carter et al., 2023). Below are the four key components of the framework, but please refer to the manuscript for full descriptions and examples.

Generalizable: How well do your experimental results from a sample extend to the population as a whole?

Three common sources of global gridded earth observations (EOs) include active satellite remote sensing data, such as synthetic aperture radar imagery (left), passive satellite remote sensing data, including optical and passive microwave imagery (center), and gridded outputs from global coupled earth system models (right).

A primary motivation to generate global EOs is to allow scientists to translate insight from our limited field measurements to unsampled times and locations to improve the generalizability of our earth systems theories. As global EOs are (approximately) spatiotemporally continuous observations, the overall objective of GRRIEn analysis is to train a supervised learning algorithm that can predict an environmental process using globally available EOs as input data. This algorithm can then be used to generalize insights from direct observations of an environmental process, sampled experimentally or in situ, across unlabeled times or locations where global EOs are available (i.e., through interpolation, extrapolation, and diagnostic modeling).

Robust: Do your statistics show good performance on data drawn from a wide range of probability and joint probability distributions?

In order for your model to generalize well, you must quantify and account for scale-dependent multicollinearity and spatial/temporal dependence in your data. Multicollinearity occurs when one or more predictor variables are linearly related to each other. It is a problem for prediction and inference because the accuracy of predictions depends on the exact multicollinearity structure present in the training dataset also being present in the prediction dataset. Since multicollinearity is associated with model instability in most machine learning and some deep learning applications, it also impacts diagnostic modeling: the model is less likely to interpret the system in a physically plausible way, and small changes in the training data can lead to big changes in model weights and parameters.
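As one concrete, hedged example of diagnosing multicollinearity (not necessarily the approach taken in the paper), variance inflation factors can be computed for a predictor table X (a hypothetical pd.DataFrame of predictor columns):

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

X_const = add_constant(X)   # add an intercept column
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
)
print(vif)   # a VIF well above ~10 is a common rule of thumb for problematic collinearity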

Furthermore, correlation structure tends to be dynamic in space and time. The figure below shows an example of correlation structure between summertime air temperature and precipitation from 1950-2000 (left) and projected from 2050-2100 (right). One can see that the correlation between air temperature and precipitation will change for many locations in the US under climate change. This suggests that statistical models trained using temperature and precipitation under 20th century conditions will not make robust estimates in the 21st century.

Pearson’s correlation coefficient [scaled between -1 and 1, color bar (Benesty et al., 2009)] between bias-corrected, statistically downscaled Climate Model Intercomparison Project 5 ensemble mean monthly precipitation and daily maximum temperature. Historical observations for 1950-1999 (left) and a moderate emissions forecast (RCP 4.5) for 2050-2099 (right) both indicate spatiotemporally variable collinearity between summertime maximum temperature and precipitation. Covariance of meteorological variables is a signature of local climate. As local climates shift due to global warming, so will the local covariability of meteorological variables (right). This generates complexity for predicting environmental process response to meteorological variables under climate change (Taylor et al., 2012).

We can’t expect to collect a sample of future covariability structure because these conditions haven’t happened yet. So how do we address these issues? The manuscript summarizes a great checklist of steps to make sure your model development is robust to both dependent features and observations.  

Checklist for robust data engineering and model development with dependent
features (left) and observations (right) in spatiotemporal systems.

Reproducible: Can other scientists understand and replicate your analysis and yield the same results?

The best way to facilitate this is to create a clear repository structure that contains a README, a container image specifying the versions of all packages used, code to download the larger geospatial datasets that can’t be stored in the repository, and code to reproduce the analyses and figures.
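As a purely illustrative sketch (the folder and file names below are hypothetical, not prescribed by the GRRIEn paper), such a repository might look like:

project_repo/
  README.md            (project overview, data sources, and instructions for running the workflow)
  Dockerfile           (container image pinning the versions of all packages used)
  data/
    download_data.R    (pulls the large geospatial datasets that are not stored in the repository)
  code/
    01_preprocess.R
    02_fit_models.R
    03_make_figures.R
  results/             (generated analyses and figures)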

In Liz’s Environmental Data Science class at Syracuse (CEE 609), she has her students complete a semester-long project in which every piece of the project is documented in a GRRIEn repository structure (see here for a great example). By the end of the semester, her students have a written paper, a fully documented workflow, and a populated repository that has often served as the basis for a journal publication for students in her research group.

Interpretable: Do your model parameters reflect a physically plausible diagnosis of the system?

The most important step in ensuring that your model will make the right predictions in different contexts is to make sure those predictions are happening for the right reasons: have your model weights and parameters diagnosed the importance of your predictors, and their relationship to the environmental process that serves as your predictand, in a physically plausible way? The authors suggest forming a set of interpretable hypotheses before modeling that states the data you are using and the relationships you expect to find, and then using local and global interpretation methods to confirm or reject those hypotheses.

Conclusion/Resources

I think the GRRIEn framework is a really great tool for young researchers embarking on data science and machine learning projects. If you are a newer faculty member interested in incorporating some of these concepts into your class, Liz has made “Code Sprints” from her class available here. There are Jupyter Notebooks on working with Python, raster data, vector data, regressions, autocorrelation, and multicollinearity. Be sure to keep up with the work coming out of the CHaRTS lab on Twitter here!

Find the paper here: https://doi.org/10.1175/AIES-D-22-0065.1

Copula Coding Exercise

This Halloween, you may be scared by many things, but don’t let one of them be copulas. Dave Gold wrote an excellent post on the theory behind copulas, and I’ll build off of that post with a simple coding exercise. When I first heard about copulas, they sounded really relevant for capturing joint behavior across variables (which I then wanted to use to capture joint hydrologic behavior across watersheds). However, because I hadn’t learned about them formally in a class, I had a hard time translating what I was reading in dense statistics textbooks into practice and found myself blindly trying to use R packages without making much progress. It was very helpful for me to wrap my head around copulas by coding a simple trivariate Gaussian copula from scratch. If copulas are new to you, hopefully this little exercise will be helpful to you as well.

Let’s pose a scenario that illustrates why a copula is helpful. Imagine that we have historical daily full natural flow for three gauged locations in the Central Valley region of California: Don Pedro Reservoir, Millerton Lake, and New Melones Lake. Because these three locations are really important for water supply in the region, it’s helpful to identify the likelihood of joint flooding or drought tendencies across the basins. For this post, let’s focus on joint flooding.

First, let’s fit distributions for each basin individually. I add a water-year column to each dataset, compute a rolling three-day total of the daily flows, and then take the peak three-day flow for each water year in each basin. Given the different residence times of the basins, a three-day aggregate flow captures concurrent events better than a one-day flow. Let’s term these flows $[x_{t,1}, \dots, x_{t,n}]$, the peak annual 3-day flows in each of the $n$ basins in year $t$.

library(evd)
library(mvtnorm)
library(extRemes)
library(mnormt)

#Read in daily observed full natural flows
TLG_flows=read.table("C:/Users/rg727/Downloads/FNF_TLG_mm.txt")
MIL_flows=read.table("C:/Users/rg727/Downloads/FNF_MIL_mm.txt")
NML_flows=read.table("C:/Users/rg727/Downloads/FNF_NML_mm.txt")

#Calculate water year column (October-December flows belong to the next water year)
TLG_flows$water_year = ifelse(TLG_flows$V2 %in% c(10, 11, 12), TLG_flows$V1 + 1, TLG_flows$V1)
MIL_flows$water_year = ifelse(MIL_flows$V2 %in% c(10, 11, 12), MIL_flows$V1 + 1, MIL_flows$V1)
NML_flows$water_year = ifelse(NML_flows$V2 %in% c(10, 11, 12), NML_flows$V1 + 1, NML_flows$V1)



#Calculate 3-day total flow (centered rolling sum of the daily flows; the first
#and last days use only the neighboring days that exist, so no NAs are introduced)
three_day_sum = function(x){
  n = length(x)
  sapply(1:n, function(i) sum(x[max(1, i - 1):min(n, i + 1)]))
}

TLG_flows$three_day = three_day_sum(TLG_flows$V4)
MIL_flows$three_day = three_day_sum(MIL_flows$V4)
NML_flows$three_day = three_day_sum(NML_flows$V4)



#Calculate peak annual flow
TLG_flow_peak_annual=aggregate(TLG_flows,by=list(TLG_flows$water_year),FUN=max,na.rm=TRUE, na.action=NULL)
MIL_flow_peak_annual=aggregate(MIL_flows,by=list(MIL_flows$water_year),FUN=max,na.rm=TRUE, na.action=NULL)
NML_flow_peak_annual=aggregate(NML_flows,by=list(NML_flows$water_year),FUN=max,na.rm=TRUE, na.action=NULL)

#Remove extra year (1986) from TLG
TLG_flow_peak_annual=TLG_flow_peak_annual[2:33,]


Next, we need to fit individual marginal distributions for each location. Since we are interested in capturing flood risk across the basins, a common choice of marginal distribution is the Generalized Extreme Value (GEV) distribution. We fit a separate GEV distribution to each basin’s peak annual three-day flows, and then use a bit of code to determine the flow associated with a historical 10-year return period event in each basin.

#Determine level of a 10-year return period

getrlpoints <- function(fit){
  
  xp2 <- ppoints(fit$n, a = 0)
  ytmp <- datagrabber(fit)
  y <- c(ytmp[, 1])
  sdat <- sort(y)
  npy <- fit$npy
  u <- fit$threshold
  rlpoints.x <- -1/log(xp2)[sdat > u]/npy
  rlpoints.y <- sdat[sdat > u]
  rlpoints <- data.frame(rlpoints.x, rlpoints.y)
  
  return(rlpoints)
}


getcidf <- function(fit){
  
  rperiods = c(2,5,10,20,50,100,500,1000)
  bds <- ci(fit, return.period = rperiods)
  c1 <- as.numeric(bds[,1])
  c2 <- as.numeric(bds[,2])
  c3 <- as.numeric(bds[,3])
  ci_df <- data.frame(c1, c2, c3, rperiods)
  
  return(ci_df)
}


gevfit_MIL <- fevd(MIL_flow_peak_annual$three_day)
rlpoints <- getrlpoints(gevfit_MIL)
ci_df <- getcidf(gevfit_MIL)

gevfit_NML <- fevd(NML_flow_peak_annual$three_day)
rlpoints <- getrlpoints(gevfit_NML)
ci_df <- getcidf(gevfit_NML)

gevfit_TLG <- fevd(TLG_flow_peak_annual$three_day)
rlpoints <- getrlpoints(gevfit_TLG)
ci_df <- getcidf(gevfit_TLG)



#These are the historical 10-year thresholds determined from the GEV fits to the peak annual three-day flows
ten_yr_event_TLG=58.53738	
ten_yr_event_MIL=40.77821
ten_yr_event_NML=58.95369

Now that we have our marginals fit, we need to use them to fit a Gaussian copula. By definition, a copula is a multivariate cumulative distribution function for which the marginal probability distribution of each variable is uniform on the interval [0, 1]. So we need to transform our observations to be pseudo-uniform. To do so, we push the values of $[x_{t,1}, \dots, x_{t,n}]$ through the CDF of the respective fitted GEV distribution (pgev below). The transformed marginals are now approximately uniformly distributed between 0 and 1. Let’s term these values $[u_{t,1}, \dots, u_{t,n}]$.

#Now we want to fit the copula on our historical flows. First create u's

u_TLG = pgev(TLG_flow_peak_annual$three_day, loc=gevfit_TLG$results$par[1], scale=gevfit_TLG$results$par[2], shape=gevfit_TLG$results$par[3],lower.tail = TRUE)
u_MIL = pgev(MIL_flow_peak_annual$three_day, loc=gevfit_MIL$results$par[1], scale=gevfit_MIL$results$par[2], shape=gevfit_MIL$results$par[3],lower.tail=TRUE)
u_NML = pgev(NML_flow_peak_annual$three_day, loc=gevfit_NML$results$par[1], scale=gevfit_NML$results$par[2], shape=gevfit_NML$results$par[3],lower.tail=TRUE)

#Let's also convert the thresholds to u's for each basin 
u_TLG_event = pgev(ten_yr_event_TLG, loc=gevfit_TLG$results$par[1], scale=gevfit_TLG$results$par[2], shape=gevfit_TLG$results$par[3], lower.tail = TRUE)
u_MIL_event = pgev(ten_yr_event_MIL, loc=gevfit_MIL$results$par[1], scale=gevfit_MIL$results$par[2], shape=gevfit_MIL$results$par[3],lower.tail=TRUE)
u_NML_event = pgev(ten_yr_event_NML, loc=gevfit_NML$results$par[1], scale=gevfit_NML$results$par[2], shape=gevfit_NML$results$par[3],lower.tail=TRUE)

Now we address the Gaussian part of the copula. Recall the definition of the Gaussian copula from Dave’s post:

$$C_R(u_1, \dots, u_n) = \Phi_R\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_n)\right)$$

where $\Phi_R$ is the joint CDF of a multivariate normal distribution with mean vector $\mathbf{0}$ and covariance equal to the correlation matrix $R$, whose entries $\rho_{i,j}$ are the correlations between the random variables $X_i$ and $X_j$, and $\Phi^{-1}$ is the inverse standard normal CDF.

So we have our u vectors that we now need to push through an inverse standard normal distribution to get Z scores.

#Now take these u's and push them through the inverse standard normal CDF to get Z scores
 
Z_TLG=qnorm(u_TLG)
Z_MIL=qnorm(u_MIL)
Z_NML=qnorm(u_NML)

#Let's do the same for the historical events
Z_TLG_event=qnorm(u_TLG_event)
Z_MIL_event=qnorm(u_MIL_event)
Z_NML_event=qnorm(u_NML_event)

The last piece that we need for our copula is a Spearman rank correlation matrix that captures the dependence structure of the peak flows across the three basins.

cor_matrix=cor(cbind(Z_TLG,Z_MIL,Z_NML),method = "spearman")

Finally, we have all the components of our copula. What question can we ask now that we have it? How about “What is the likelihood of exceeding the historical 10-year event in all three basins simultaneously?” We code up the formula that expresses this exact likelihood. More details can be found in this very helpful book chapter:
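In terms of the Z scores, the code below implements the standard inclusion-exclusion identity for this joint exceedance probability:

$$P(Z_1 > z_1^*, Z_2 > z_2^*, Z_3 > z_3^*) = 1 - \sum_{i=1}^{3}\Phi(z_i^*) + \sum_{i<j}\Phi_2\left(z_i^*, z_j^*; \rho_{ij}\right) - \Phi_3\left(z_1^*, z_2^*, z_3^*; R\right)$$

where $\Phi$ is the univariate standard normal CDF, $\Phi_2$ and $\Phi_3$ are the bivariate and trivariate standard normal CDFs with the corresponding correlation structure, and $z_i^*$ is the Z score of the 10-year threshold in basin $i$.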

We calculate this with the code below and find an exceedance probability of 0.0564.

#Now calculate the probability of exceeding the historical threshold in all basins

exceedance = 1 - pnorm(Z_TLG_event) - pnorm(Z_MIL_event) - pnorm(Z_NML_event) +
  pmnorm(c(Z_TLG_event, Z_MIL_event), mean = rep(0, 2), varcov = cor(cbind(Z_TLG, Z_MIL))) +
  pmnorm(c(Z_TLG_event, Z_NML_event), mean = rep(0, 2), varcov = cor(cbind(Z_TLG, Z_NML))) +
  pmnorm(c(Z_MIL_event, Z_NML_event), mean = rep(0, 2), varcov = cor(cbind(Z_MIL, Z_NML))) -
  pmnorm(c(Z_TLG_event, Z_MIL_event, Z_NML_event), mean = rep(0, 3), varcov = cor(cbind(Z_TLG, Z_MIL, Z_NML)))

Now this intuitively makes sense: a 10-year event has a 10% (0.1) probability of being exceeded in any given year, and the joint likelihood of exceeding this event across multiple basins must be smaller, so our lower probability of 0.0564 tracks. Hopefully this was a simple but helpful example to demonstrate the key components of the Gaussian copula. The script and corresponding datasets can be found in this repo. Finally, note that you can also create synthetic realizations of correlated peak flows by simulating from a multivariate normal distribution with a mean of 0 and the Spearman rank correlation matrix defined above, and then back-transforming through the fitted GEV marginals, as sketched below.
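A minimal sketch of that last step (not part of the original script; it reuses the fitted objects from the code above):

#Sketch: simulate 1000 years of correlated synthetic 3-day peak flows
n_synth = 1000
Z_synth = rmnorm(n_synth, mean = rep(0, 3), varcov = cor_matrix)  #correlated Z scores
u_synth = pnorm(Z_synth)                                          #convert to pseudo-uniform values
synth_TLG = qgev(u_synth[,1], loc=gevfit_TLG$results$par[1], scale=gevfit_TLG$results$par[2], shape=gevfit_TLG$results$par[3])
synth_MIL = qgev(u_synth[,2], loc=gevfit_MIL$results$par[1], scale=gevfit_MIL$results$par[2], shape=gevfit_MIL$results$par[3])
synth_NML = qgev(u_synth[,3], loc=gevfit_NML$results$par[1], scale=gevfit_NML$results$par[2], shape=gevfit_NML$results$par[3])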

A Hidden-Markov Modeling Based Approach to Creating Synthetic Streamflow Scenarios

As Dave mentioned in his prior post, our recently published eBook, Addressing Uncertainty in Multisector Dynamics Research, provides several interactive tutorials for hands-on training in model diagnostics and uncertainty characterization. This blog post is a preview of an upcoming extension of these trainings featuring an interactive tutorial on creating synthetic streamflow scenarios using a Hidden Markov Model (HMM)-based approach, specifically for the Upper Colorado River Basin. This training heavily utilizes Julie Quinn’s prior tutorial on fitting an HMM-based streamflow generator and is inspired by her Earth’s Future paper on understanding whether exploratory modeling can truly be scenario-neutral.

In this post, we will primarily focus on decadal hydrologic drought in the region as a metric of interest to explore in the historical record and in the stationary and non-stationary synthetic ensembles. We will also explore how to place CMIP5 climate projections in the context of our synthetically created ensembles. All code for this demo is available in a Jupyter Notebook on GitHub that follows the progression of this blog post step by step (though some content may be subject to change before going into the eBook formally). The code is written in Python.

Background

In the Western United States, and particularly the Colorado River Basin, recent studies have used tree-ring reconstructions to suggest that the megadrought that has been occurring in the Southwest over the past 22 years is the region’s worst drought since about 800 AD (Williams et al., 2022). The study’s lead author, UCLA climatologist Park Williams, suggested that had the sequence of wet-dry years occurred as observed but without the human-caused drying trend, the 2000s would likely still have been dry, but not on the same level as the worst of the last millennium’s megadroughts.

Lake Powell shows persistent effects from drought (Source: U.S. Bureau of Reclamation)

The recent trend of warming and reduced soil moisture in the Southwest is becoming extremely problematic from a water systems planning and management perspective for the Colorado River Basin. It has become increasingly clear that the river is completely over-allocated and won’t be able to sustain the flow requirements dictated by the Colorado Compact. Thus, there has been an increasing focus on understanding how susceptible water systems in this region are to plausible future streamflow scenarios, particularly those that may contain persistent droughts. In this tutorial, we’ll discuss how to create these scenarios using a Hidden Markov Model (HMM)-based synthetic generator.

Observed Data

First, let’s take a look at the observed data from 1909-2013 for the outlet gauge of the Upper Colorado River, located at the CO-UT state line. Below we create a plot of the annual streamflow. Let’s also add an 11-year rolling mean, which will give us a sense of long-term averaged behavior.

The Colorado Compact, which prescribes flows between the Upper and Lower Colorado Basins, was negotiated using data prior to 1922, which we now know was one of the consistently wetter periods in the record. It’s clear today that the Southwest has been experiencing aridification since the 1980s and that this observed record alone isn’t an accurate representation of what future climate might look like in this region.

Let’s get a little more specific and formally quantify decadal drought risk in the observed period. We use a metric proposed in Ault et al. (2014). The authors define a decadal drought as occurring when the 11-year running mean of annual flow dips more than half a standard deviation below the mean of the available record.
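Written out (in my notation), the drought criterion for year $t$ is:

$$\frac{1}{11}\sum_{k=t-5}^{t+5} q_k \;<\; \mu_q - \tfrac{1}{2}\,\sigma_q$$

where $q_k$ is the annual flow in year $k$, and $\mu_q$ and $\sigma_q$ are the mean and standard deviation of the full observed record.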

By this metric, the Upper Colorado Basin region has experienced two decadal droughts in the past 100 years.

Synthetic Stationary Generator to Understand Natural Variability

It is important to remember that the historical record for the region is only one instance of streamflow, which ultimately comes from an inherently stochastic and chaotic system (the atmosphere). We require a tool that will allow us to see multiple plausible realizations of that same variability. The tool that we use to develop synthetic flows for the region is a Gaussian Hidden Markov Model. If a system follows a Markov process, it switches between a number of “hidden states” dictated by a transition matrix. Each state has its own Gaussian probability distribution (defined by a mean and standard deviation), and one can draw from this distribution to create synthetic flows that fit the properties of the historical distribution. HMMs are an attractive choice for this region because they can simulate persistent regimes, particularly long droughts, which is an important consideration for water systems vulnerability. The figure below shows an example of the 2-state Gaussian HMM that we will fit for the region.
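In symbols (my notation, consistent with the six parameters listed later in this post), the two-state wet/dry Gaussian HMM can be summarized by its transition matrix and the state-conditional distributions of the log-space annual flows:

$$P = \begin{pmatrix} p_{dd} & 1-p_{dd} \\ 1-p_{ww} & p_{ww} \end{pmatrix}, \qquad \log(q_t)\,\big|\, s_t = i \;\sim\; \mathcal{N}(\mu_i, \sigma_i^2), \quad i \in \{d, w\}$$

where $s_t$ is the hidden (dry or wet) state in year $t$, $p_{dd}$ and $p_{ww}$ are the dry-dry and wet-wet transition probabilities, and $\mu_i$ and $\sigma_i$ are the log-space mean and standard deviation of flows in state $i$.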

At this point, I will direct readers to either Julie’s blog or my own Jupyter notebook, which have all the details and code required to fit the model and investigate the parameters. Now let’s create a sample 105-year synthetic realization and see what the drought dynamics look like in the synthetic scenario that we created.

Wow, it looks like we’ve created something like a megadrought-type situation just by sampling within the bounds of the historical distribution! You can keep sampling from the model to create more 105-year traces and note how the location and number of decadal droughts changes. Maybe you will experience two, as in the historical period, or fewer or more. Re-sampling from the model demonstrates how different 105-year traces can look just within the range of natural variability, and that what we’ve observed historically is only one trace. It’s also important to remember that when a drought occurs shapes its ultimate effect (i.e. does it arrive at a time of large population growth, or at a time when humans can adapt by conserving or building more infrastructure?). A hydrologic drought need not manifest into an agricultural drought of the same magnitude if stored surface water is available, for example.

Non-Stationary Synthetic Generator to Impose Climate Changes

Now we want to create flows under non-stationary conditions to get a better understanding of what flows could look like under climate change. To do so, we can toggle the parameters of the HMM to create systematic changes that represent a changing climate. The HMM is defined by six parameters. For the historical fit, each parameter has a baseline value; in the non-stationary generator, we instead define a range to sample each parameter from, shown in the table below.

Parameter | Current Value | Lower Bound | Upper Bound
Log-Space Wet State Mean Multiplier | 1.00 | 0.98 | 1.02
Log-Space Dry State Mean Multiplier | 1.00 | 0.98 | 1.02
Log-Space Wet State Standard Deviation Multiplier | 1.00 | 0.75 | 1.25
Log-Space Dry State Standard Deviation Multiplier | 1.00 | 0.75 | 1.25
Change in Dry-Dry Transition Probability | 0.00 | -0.30 | +0.30
Change in Wet-Wet Transition Probability | 0.00 | -0.30 | +0.30

Now refer to the Jupyter notebook for the code for the following steps. We first sample from these parameter ranges 1000 times using the Latin Hypercube Sampling functionality within SALib. We then swap out the baseline parameters for the newly sampled parameters and sample from the model. Below is an example trace from the new non-stationary model, in which we are generating more instances of decadal drought.
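Concretely (again in my notation), each Latin hypercube sample perturbs the baseline parameters by applying the sampled multipliers to the log-space means and standard deviations and adding the sampled changes to the transition probabilities:

$$\mu_i' = m_{\mu_i}\,\mu_i, \qquad \sigma_i' = m_{\sigma_i}\,\sigma_i, \qquad p_{dd}' = p_{dd} + \Delta_{dd}, \qquad p_{ww}' = p_{ww} + \Delta_{ww}, \quad i \in \{d, w\}$$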

Placing Non-Stationary Flows in the Context of CMIP5 Projections

By creating a non-stationary generator, we can broaden the drought conditions that we are creating, which can be very useful for understanding how our water systems model performs under potentially extreme scenarios. However, it’s useful to place our synthetically generated flows in the context of physically-driven CMIP5 projections to get a better understanding of how the two approaches compare. This example makes use of 97 CMIP5 projections used in the Colorado River Water Availability Study (CWCB, 2012). In each of these projections, monthly precipitation factor changes and temperature delta changes were computed between mean projected 2035–2065 climate statistics and mean historical climate statistics from 1950–2013. These 97 different combinations of 12 monthly precipitation multipliers and 12 monthly temperature delta shifts were applied to historical precipitation and temperature time series from 1950–2013. The resulting climate time series were run through a Variable Infiltration Capacity (VIC) model of the UCRB, resulting in 97 time series of projected future streamflows at the Colorado-Utah state line.

We can fit an HMM to each trace of projected streamflow and retain the fitted parameters. Then we take the ratio between these parameters and the baseline HMM parameters in order to ultimately have a set of multipliers associated with each CMIP5 projection. This allows us to place the CMIP5 projections into the same space that we are sampling for our synthetic realizations.
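In symbols, for each CMIP5 projection $k$ and fitted HMM parameter $\theta_i^{(k)}$, the implied multiplier is the ratio to the baseline fit, $m_i^{(k)} = \theta_i^{(k)} / \theta_i^{\text{baseline}}$, which places projection $k$ at a point in the same multiplier space sampled by the non-stationary generator.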

In order to visualize these two ensembles in the same space, we plot a response surface that allows us to map how combinations of HMM parameters tend to lead to increased decadal drought occurrence. In order to get a continuous surface, we’ll fit a non-linear regression that ultimately maps the parameter values to decadal drought occurrence over a set of grid points. This response surface is shown by the color gradient below.

We choose two parameters, the dry-dry transition probability and the dry state mean, that are most likely to influence decadal drought occurrence and use them to create the response surface (note that a more formal sensitivity analysis could be used to identify the most influential parameters). From the response surface, you can see that more instances of decadal drought occur when we (1) increase the dry-dry transition probability, which inherently increases the persistence of the dry state, and (2) make the dry state log-space mean drier. We also place the multipliers from the CMIP5 projections into this space as white dots. Note that these CMIP5 projections tend to span the extent of the dry mean sample space, but are less representative of the dry-dry transition probability sample space, particularly the increases in dry-dry transition probability that lead to more persistent droughts. This suggests that the hydrological droughts represented in the projections tend to range only from wetter than to slightly drier than our baseline.

Both methods of producing these scenarios are valid, though if your goal is to produce a variety of ensembles spanning many different drought characteristics, you will likely find that a generator-based approach serves this purpose better.

Concluding Remarks

If you have any questions about the notebook or feedback as you go through the tutorial, please post below in the comments section or email me. We love getting as much feedback as possible before publication in the eBook.

References

Ault, T. R., Cole, J. E., Overpeck, J. T., Pederson, G. T., & Meko, D. M. (2014). Assessing the risk of persistent drought using climate model simulations and paleoclimate data. Journal of Climate, 27(20), 7529-7549.

CWCB (2012). Colorado River Water Availability Study Phase I Report. Colorado Water Conservation Board.

Williams, A. P., Cook, B. I., & Smerdon, J. E. (2022). Rapid intensification of the emerging southwestern North American megadrought in 2020–2021. Nature Climate Change, 12(3), 232-234.