A time-saving tip for simulating multiple scenarios

A time-saving tip for simulating multiple scenarios

Last week I was reading a paper by Thomlinson, Arnott, and Harou (2020), A water resource simulator in Python. In this, they describe a method of evaluating many simulation scenarios (or hydrologic realizations) simultaneously, and claim that this approach results in significant improvements in computational efficiency.

I was struck because I have been constructing similar multi-scenario simulations using a different approach, by evaluating one scenario at a time.

After having thought I perfected my code, was it possible that a slight modification to the implementation could significantly improve the efficiency? (Spoiler: Yes! By many hours, in fact.)

I decided to post this here (A) as a simple demonstration in comparing runtime metrics of two different algorithm approaches, and (B) as a reminder to remain open to the possibility of simple yet significant improvements in code performance.

A GitHub repository containing all of the relevant Python scripts used below can be accessed here.


There are many reasons why an analyst might want to perform simulation of many different scenarios. For example, when simulating a water resource policy which is affected by stochastic exogenous forces, it is preferable to evaluate the policy under many different realizations of the stochastic variable. The performance of that policy can then be measured as an aggregation of the performance under the different realizations.

How one decides to simulate the different realizations can have significant impacts on the efficiency of the program.

Here, I compare two different implementations for many-scenario evaluation, and demonstrate a comparison of the methods using the shallow lake problem (Carpenter et al., 1999).

The two implementation techniques are shown graphically in the figure below.

Figure: Visual representation of the two different multi-scenario simulation implementations discusses in this post. (This figure was inspired by a similar figure contained in a Pywr presentation made available by James Thomlinson)

In the past, I (and others who came before me) have used what I refer to as a serial implementation, where the realizations are looped-through one-by-one, evaluating the model over the entire simulation time period before moving on to the next realization. Now, I am contrasting that with what I am calling the block implementation. For the block implementation, the simulation is evaluated for all realizations before progressing to the subsequent time step.

When viewed this way, it may seem obvious that the block simulation will perform more efficiently. The model is applied to the columns rather than the individual elements of a matrix of realizations or scenarios. However, hindsight is 20-20, and if you’re a water resource modeler with an engineering background (such as myself) this may not be apparent to you when constructing the model.

Now, onto the demonstration.

The Lake Problem

The shallow lake problem, as introduced by Carpenter et al. (1999), was used as a case study for design of multi-objective pollution policies by Professor Julie Quinn and has been explored in depth on this blog. While a comprehensive understanding of the lake problem, and the corresponding model, is not necessary for understanding this post, I will provide a brief summary here.

The problem centers around a hypothetical town located near a shallow lake. The town would like to discharge nutrient pollution (e.g., waste water treatment plant effluent) into the lake, while balancing the economic benefits (i.e., cost savings associated with the discharge) and environmental objectives (i.e., preserving the ecological health of the lake).

The pollution dynamics of the lake are described by a non-linear model representing pollution fluxes into and out of the lake:

X_{t+1} = X_t + a_t + Y_t + \frac{X_t^q}{1+X_t^q} - bX_t

Where X_t is the pollution concentration in the lake, a_t is the anthropogenically pollution discharged from the town, Y_t \approx Log-Normal(\mu, \sigma) is the natural pollution inflow into the lake and follows a log-normal distribution, q is the rate at which pollution is recycled from the sediment into the water column, and b is the rate at which the nutrient pollution is removed from the lake due to natural processes.

The non-linearity of the nutrient dynamics shown above result in the presence of an unstable equilibrium or ecological tipping point beyond which the lake becomes permanently eutrophic (ecologically unhealthy).

The problem then becomes identifying pollution policies that prescribe a preferable amount of pollution, a_t during each year while not crossing the ecological tipping point.

The lake model

A model can be constructed which simulates a pollution policy and measures corresponding environmental and economic performance objectives over a 100-year period. In her 2017 publication, Prof. Quinn shows how an adaptive pollution policy, which determines pollution discharges rates conditional upon the current lake pollution level, can result in greater system performance and robustness. However, for the sake of simplicity in this demonstration I am modeling an intertemporal pollution policy which consists of a set of annual pollution discharge quantities (release_decision in the following code). I selected one pollution policy from a set of pareto-approximate policies generated previously. To learn more about how optimal pollution discharge policies can be generated for this model, see the blog post here.

Since natural nutrient pollution inflow into the lake is stochastic, it is best to evaluate a single policy for many different realizations of natural inflow. The objective performance can then be measured as the summary (e.g., mean, sum, or max) of the performance across the different realizations.

Below, I am going to show two different ways of implementing multi-realization simulations corresponding to the two approaches described in the Introduction.

I’ve separated the common and and unique features of the two different implementations, to highlight the differences. Both models begin by defining a set of parametric constants, initializing variables for later use, and by calculating the ecological tipping point (critical pollution threshold):

def lake_model(release_decision):

    # Choose the number of stochastic realizations
    n_realizations = 100

    #Initialize conditions
    n_years = 100
    n_objs = 4
    n_time = 100

    # Constants
    reliab_threshold = 0.85
    inertia_threshold = 0.02
    b = 0.42
    q = 2.0
    delta = 0.98
    X0 = 0
    alpha = 0.4
    mu = 0.5
    sigma = np.sqrt(10**-0.5)

    # Initialize variables
    years_inertia_met = np.zeros(n_realizations)
    years_reliability_met = np.zeros(n_realizations)
    expected_benefits = np.zeros(n_realizations)
    average_annual_concentration = np.zeros(n_realizations)
    objs = [0.0]*n_objs

    # Identify the critical pollution threshold
    #Function defining x-critical. X-critical exists for flux = 0.
    def flux(x):
            f = pow(x,q)/(1+pow(x,q)) - b*x
            return f

    # solve for the critical threshold with unique q and b
    Xcrit = sp.bisect(flux, 0.01, 1.0)

Serial simulation

For the serial model evaluation, lake quality over a 100-year period is evaluated for one realization at a time. A single realization of stochastic nutrient pollution inflow, Y_t is generated and the policy is evaluated under these conditions. This is repeated 100 times for 100 unique realizations of natural inflow.

def serial_lake_model(release_decision):


    # Begin evaluation; evaluate one realization at a time
    for s in range(n_realizations):

        ### Generate a single inflow_realization
        inflow_realizations = np.random.lognormal(mean = mu, sigma = sigma, size = n_time)

        # Initialize a new matrix of lake-state variables
        lake_state = np.zeros(n_years)

        # Set the initial lake pollution level
        lake_state[0] = X0

        for t in range(n_years-1):

            lake_state[t+1] = lake_state[t] + release_decision[t] + inflow_realizations[t] + (lake_state[t]**q)/(1 + lake_state[t]**q)

            expected_benefits[s] += alpha * release_decision[t] * delta**(t+1)

            # Check if policy meets inertia threshold
            if t>= 1:
                years_inertia_met[s] += (abs(release_decision[t-1] - release_decision[t]) < inertia_threshold) * 1

            # Check if policy meets reliability threshold
            years_reliability_met[s] += (lake_state[t] < Xcrit) * 1

        average_annual_concentration[s] = np.mean(lake_state)

    # Calculate objective scores
    objs[0] = -np.mean(expected_benefits)
    objs[1] = max(average_annual_concentration)
    objs[2] = -np.sum(years_inertia_met / (n_years - 1))/n_realizations
    objs[3] = -np.sum(years_reliability_met)/(n_realizations * n_years)

    return objs

This is the way I have written this program in the past, and the way it has been presented previously in this blog.

Block Simulation

I’ve now re-written the model using the block simulation method, as shown in Figure 1. Rather than taking a single timeseries of natural inflow at a time, I produce a matrix of 100-realizations and perform calculations one-timestep (or column) at a time.

I recommend you pause to compare the above and below code segments, to make sure that you have a solid understanding of the differences in implementation before looking at the results.

def matrix_lake_model(release_decision):

    ### Create a matrix contaiining all inflow realizations
    inflow_realizations = np.zeros((n_realizations, n_time))
    for s in range(n_realizations):
        inflow_realizations[s, :] = np.random.lognormal(mean = mu, sigma = sigma, size = n_time)

    # Begin evaluation; evaluate all realizations one time step at a time
    for t in range(n_years-1):

        lake_state[:,t+1] = lake_state[:,t] + release_decision[t] + inflow_realizations[:,t] + (lake_state[:,t]**q)/(1 + lake_state[:,t]**q)

        expected_benefits[:] += alpha * release_decision[t] * delta**(t+1)

        # Check if policy meets inertia threshold
        if t>= 1:
            years_inertia_met += (abs(release_decision[t-1] - release_decision[t]) < inertia_threshold) * 1

        # Check if policy meets reliability threshold
        years_reliability_met += (lake_state[:,t] < Xcrit) * 1

    # Calculate objective scores
    objs[0] = -np.mean(expected_benefits)
    objs[1] = max(np.mean(lake_state, axis = 1))
    objs[2] = -np.sum(years_inertia_met / (n_years - 1))/n_realizations
    objs[3] = -np.sum(years_reliability_met)/(n_realizations * n_years)

    return objs

Running the same pollution policy in both of the above models, I first made sure that the output objective scores were identical. Having verified that, I set up some simple timing tests.

Comparison of evaluation time efficiency

I used the built-in time module in Python to perform runtime tests to compare the two implementation methods. The time module contains the function perf_counter() which makes a precise recording of the time at which it is called.

Calling perf_counter() before and after model evaluation provides a simple way of measuring the evaluation time. An example of this time measurement is shown below:

import time
import lake_model

start = time.perf_counter()
model_output = lake_model(release_decision, n_realizations)
end = time.perf_counter()
model_runtime = end - start

When performing an optimization of the pollution release policy using the Borg multi-objective evolutionary algorithm (MOEA), it is standard for our group to simulate each policy for 100-realizations of hydrologic stochasticity. The objective scores are calculated as some aggregation of the performance across those realizations.

With this in mind, I first measured the difference in runtime between the two simulation methods for 100-realizations of stochasticity.

I found the model using the block implementation, evaluating all realizations one time-step at a time, was 0.048 seconds faster then evaluating the 100-realizations in series. That may seem marginal at first, but how does this difference in speed scale when performing many thousands of model evaluations throughout an optimization?

Previously, when optimizing the release decisions for the lake problem, I allowed the Borg MOEA to run for a maximum of 50,000 model evaluations per seed, and repeated this process for 20 random seeds… at a savings of 0.048 seconds per evaluation, I would have saved 13.3 hours of computation time if I had used the block implementation technique!

This difference grows exponentially with the number of realizations performed in each evaluation. In the figure below, I am showing runtime of the block implementation relative to the serial implementation (I.e., \frac{T_{simul}}{T_{serial}}) for an increasingly large number of realizations:

Figure: Runtime of the matrix implementation technique relative to the serial technique, for a range of realization numbers.


In a stochastic world, evaluation of water resource policies under a broad ensemble of scenarios allows for the selection of more policies that are more robust under uncertain factors. Writing code that is computationally burdensome can limit the size of the ensemble, and ultimately limit the analysis.

When working with high performance computing resources, researchers are often limited in allotted compute-hours, or the time spent processing a program on the machine. Given this context, it is highly valuable to identify time-saving measures such as the block implementation presented here.

Ultimately, this experimentation was pleasantly humbling. I hope that this serves as motivation to approach one of your existing models with a fresh perspective. At the least, I hope to keep you from making the same costly mistake as me.

I want to shout out Thomlinson, Arnott, and Harou again for writing the paper that sparked this re-assessment, check out their water resource simulator for solving resource allocation problems, Pywr, here.

Taylor Diagram

The evaluation of model performance is a widely discussed and yet extremely controversial topic in hydrology, atmospheric science, and environmental studies. Generally, it is not super straightforward to quantify the accuracy of tools that simulate complex systems. One of the reasons is that these systems usually have various sub-systems. For example, a complex river system will simulate regulated streamflow, water allocation, dam operations, etc. This can imply that there are multiple goodness-of-fit objectives that cannot be fully optimized simultaneously. In this situation, there will always be tradeoffs in the accuracy of the system representation.

The other reason is that these complex systems are often state-dependent, and their behavior non-linearly changes as the system evolves. Finally, there are variables in the system that have several properties, and it is usually impossible to improve all of their properties at the same time (Gupta et al., 1998). Take streamflow as an example. In calculating model accuracy, we can focus on the daily fluctuation of streamflow or look into the changes weekly, monthly, and annually. We can also concentrate on the seasonality of streamflow, extreme low and high cases, and the persistence of these extreme events. Therefore, our results are not fully defensible if we only focus on one performance metric and make inferences that ignore many other components of the system. In this blog post, I am going to explain a visualization technique that allows us to draw three or more metrics of the system on the same plot. The plot is called a Taylor Diagram. The Taylor Diagram is not really a solution to the problem that I explained, but it does provide a systematic and mathematically sound way of demonstrating a few popular goodness-of-fit measures. The original version of the Taylor Diagram includes three performance metrics: i) standard deviation of simulated vs. observed; ii) correlation coefficient between observed and simulated; and iii) centered root mean square error (CRMSE). However, there are other versions of the Taylor Diagram that include more than these three metrics (e.g., here). The Taylor Diagram was developed by Karl E. Taylor (original paper) and has been frequently used by meteorologists and atmospheric scientists. In this blog post, I focus on streamflow.

Underlying Mathematical Relationships

As mentioned above, the Taylor Diagram draws the following three statistical metrics on a single plot: standard deviation, CRMSE, and correlation. The equation below relates these three:

Equation – 1

In this equation:

This relationship can be easily derived using the definition of CRMSE, which is:

Equation – 2

In this equation:

We can expand this equation to the following:

Equation – 3

The first two components of the equation are standard deviations of observed and simulated time series. To understand the third component of the equation, we need to recall the mathematical definition of correlation.

Equation – 4

If we multiply both sides of the above equation by standard deviation of observed and simulated, we see that the third components of the equations 3 and equation 4 are actually the same. The Taylor Diagram uses polar coordinates to visualize all of these components. Readers can refer to the original paper to find more discussion and the trigonometric proofs behind this form of plot.

Code Example

In this blog post, I am presenting a simple R package that can be used to create Taylor Diagrams. In this example, I use the same streamflow datasets that I explained in the previous blog post. First, you need to download the time series from GitHub and use the following commands to read these two time series into R.

Observed_Arrow<-read.table("~/Taylor Diagram/Arrow_observed.txt", header = T)
Simulated_Arrow<-read.table("~/Taylor Diagram/Arrow_simulated.txt", header = T)

In this post, I use the “openair” R library, which was originally developed for atmospheric chemistry data analysis. In addition to the Taylor Diagram, the package provides many helpful visualization options that can be accessed from here. Note that I was able to find at least one more R package that can be used to create Taylor Diagrams (i.e., plotrix ). There are also Python and MATLAB libraries that can be used to create Taylor Diagrams.

You can use the following to install the “openair” package and activate the library.


The following command can be used to create a Taylor Diagram.

combined_dataset<-cbind(Observed_Arrow[ 18263:length(Observed_Arrow[,1]),], Arrow_sim=Simulated_Arrow[1:10592,4])

TaylorDiagram(combined_dataset, obs = "ARROW_obs", mod = "Arrow_sim")

Interpretation of the Taylor Diagram

The Taylor Diagram indicates the baseline observed point where correlation is 1 and CRMSE is 0 (purple color). If the simulation point is close to the observed point, it means that they are similar in terms of standard deviation, their correlation is high, and their CRMSE is close to zero. There is also a black dashed line that represents the standard deviation of the observed time series. If the red dot is above the line, it means that the simulated data set has a higher variation. The other helpful information that we gain from the Taylor Diagram is the correlation between observed and simulated values. Higher correlation shows a higher level of agreement between observed and simulated data. The correlation goes down as a point moves toward higher sectors in the graph. Centered root mean square error also shows the quality of the simulation process; however, it puts more weight on outliers. The golden contour lines on the polar plot show the value of CRMSE. Basically, on this figure, the red point has a higher standard deviation, the correlation is around 90%, and the CRMSE is close to 20,000.

The package also allows us to look at the performance of the model during different months or years, and we can compare different simulation scenarios with the observed data. Here, I use the function to draw a point for each month. The “group” argument can be used to do that.

TaylorDiagram(combined_dataset, obs = "ARROW_obs", mod = "Arrow_sim", group = "Month")

The function also provides a simple way to break down the data into different plots for other attributes. For example, I created four panels for four different annual periods:

TaylorDiagram(combined_dataset, obs = "ARROW_obs", mod = "Arrow_sim", group = "Month", type = "Year")