Exploring droughts produced by an ensemble of synthetically generated streamflow records

If you’re a reader of this blog, you are probably well aware that historical observations of streamflow and rainfall represent single traces of inherently stochastic processes. Simply relying on the historical record when evaluating decisions in water resources problems may cause us to underestimate uncertainty and overestimate the reliability of our system. Synthetic generation of streamflow and weather data can help us overcome this challenge. Synthetic generation is a family of methods that allow us to explore conditions not observed in the historical record while preserving fundamental statistical properties of the recorded observations (for a great history of synthetic generation, see Jon’s post). Before using synthetically generated records, we first need to perform a diagnostic evaluation to ensure that the synthetic records preserve properties such as the moments of the historical record as well as the spatial and temporal correlation structures (for some great tutorials on this process, see Julie and Lillian’s posts). Another important aspect of understanding the synthetically generated records is the extreme events they produce – a topic that this blog has not explored in detail. Evaluating extreme events helps us understand the internal variability present in the stochastic system and contextualize the ensemble of synthetic records with events in the historical record (such as large droughts and floods).

In this post, we’ll explore how a small ensemble of synthetically generated streamflows in the Upper Colorado River basin produces decadal drought events. We’ll start with a definition of decadal drought and then explore the frequency, magnitude, and distribution of drought events across our ensemble of synthetically generated streamflows. The streamflow records used in this post were generated using a Gaussian Hidden Markov Model (HMM) streamflow generator. These records have previously been diagnostically evaluated to confirm that they reproduce the statistical properties of the historical record. For details on HMM generation and diagnostic evaluation, see Julie’s posts on introducing HMM generators and providing example code.

Defining decadal drought

We’ll begin by defining a measure of decadal drought introduced by Ault et al., (2014). This measure defines a decadal drought as a period when the 11-year rolling mean of streamflows drops below a drought threshold. The threshold is defined as half a standard deviation below the mean of the full historical record. Note this is one of many definitions of drought. For a detailed discussion of drought metrics, see Rohini’s recent post.

\mu_{11} < \mu- 0.5\sigma

Periods of decadal drought in the historical record are highlighted with shading in the figure below.

This image has an empty alt attribute; its file name is cm_historical_ts.png

I’ve written a simple Python function to find these droughts in a streamflow record, count the number of droughts, and calculate the magnitude of drought events (more on this below).

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statistics
import matplotlib.cm as cm
from mpl_toolkits.axes_grid1 import make_axes_locatable

def drought_statistics(file_name, hist, drought_def):
    '''
    Identifies drought periods in a streamflow record. Returns drought instances, list of drought years, and severity

    :param file_name: path to the annual Q file (string)
    :param hist: historical record toggle (bool)
    :param drought_def: size of moving window (float)

    :return:
        - drought_instances (list) - all drought periods in the record
        - final_drought_range (list) ranges of years of decadal droughts in the record
        - total_severity - sum of the difference between the mean & annual flows during a drought normalized by std
    '''

    # Read historical or synthetic record
    if hist:
        AnnualQ_s = pd.read_csv(file_name, sep=',')
    else:
        AnnualQ_s = pd.read_csv(file_name, sep=' ', header=None)
        AnnualQ_s.columns = ['cm', 'gm', 'ym', 'wm', 'sj']
    AnnualQ_s['Year'] = list(range(1909, 2014))

    # find the drought threshold
    std = statistics.stdev(AnnualQ_s.iloc[:, 0])
    threshold = np.mean(AnnualQ_s.iloc[:, 0]) - (0.5 * std)

    drought_instances = [i for i, v in enumerate(AnnualQ_s.iloc[:, 0].rolling(drought_def).mean()) if v < threshold]
    drought_years = AnnualQ_s.iloc[:, 5].rolling(drought_def).mean()[drought_instances]

    # find years of drought in each record
    drought_range = [range(0, 1)]
    j = 0
    for i in range(len(drought_years)):
        already_in = 0
        for j in range(len(drought_range)):
            if drought_years.iloc[i] in drought_range[j]:
                already_in = 1
        if already_in == 0:
            start_year = drought_years.iloc[i] - 5
            end_year = drought_years.iloc[i] + 5
            drought_range.append(range(int(start_year), int(end_year)))
        elif already_in == 1:
            if drought_years.iloc[i] - 5 < start_year:
                start_year = drought_years.iloc[i] - 5
            if drought_years.iloc[i] + 5 > end_year:
                end_year = drought_years.iloc[i] + 5
            drought_range.append(range(int(start_year), int(end_year)))

    # combine overlapping periods
    final_drought_range = []
    for i in range(1, len(drought_range)):
        if final_drought_range:
            if min(drought_range[i]) < max(final_drought_range[-1]):
                final_drought_range[-1] = range(min(final_drought_range[-1]), max(drought_range[i]) + 1)
            elif min(drought_range[i]) < min(drought_range[i - 1]):
                final_drought_range.append(range(min(drought_range[i - 1]), int(max(drought_range[i]) + 1)))
            else:
                final_drought_range.append(drought_range[i])
        else:
            if min(drought_range[i]) < min(drought_range[i - 1]):
                final_drought_range.append(range(min(drought_range[i - 1]), max(drought_range[i]) + 1))
            else:
                final_drought_range.append(drought_range[i])

    # calculate the severity of droughts
    total_severity = []
    if drought_instances:
        for i in range(len(final_drought_range)):
            cur_severity = 0
            for j in range(105):
                if AnnualQ_s.iloc[j, 5] in final_drought_range[i]:
                    cur_severity += (AnnualQ_s.iloc[:, 0].mean() - AnnualQ_s.iloc[j, 0])/std
            total_severity.append(cur_severity)

    return drought_instances, final_drought_range, total_severity

We can run this function for our small ensemble of 100 records generated (which can be found here) using the code below:

drought_def = 11 # size of moving window

# synthetic realizations
num_droughts = np.zeros(100)
severity_curve = []
all_drought_years =[]
max_severity = np.zeros(100)

for r in range(0,100):
    file_name = 'data/AnnualQ_s' + str(r) + '.txt' # edit here
    drought_instances, drought_years, drought_severity = drought_statistics(file_name, False, drought_def)

    if drought_instances:
        all_drought_years.append(drought_years)
        severity_curve.append(drought_severity)
        max_severity[r] = max(drought_severity)

# Historical (edit file name)
file_name = 'data/historical.csv'
drought_instances_hist, drought_years_hist, drought_severity_hist = drought_statistics(file_name, True, drought_def)

drought_ranks = np.zeros([len(severity_curve), 5])
for i in range(len(severity_curve)):
    sorted_curve = np.sort(severity_curve[i])[::-1]
    for j in range(len(sorted_curve)):
        drought_ranks[i, j] = sorted_curve[j]


drought_ranks_hist = np.zeros(5)
sorted_ranks_hist = np.sort(drought_severity_hist)[::-1]
for i in range(len(drought_severity_hist)):
        drought_ranks_hist[i] = sorted_ranks_hist[i]
max_ranks = np.max(drought_ranks, axis=0)
min_ranks = np.min(drought_ranks, axis=0)

# pad ranks with zeros for realizations without droughts
no_drought_traces = 99-len(severity_curve)
print('num no drought rels = ' + str(no_drought_traces))
for nr in range(no_drought_traces):
    drought_ranks = np.vstack((drought_ranks, np.zeros(5)))

Examining Drought Frequency

The first diagnostic we’ll perform is to examine the frequency of drought years in our ensemble of synthetic records (shaded years in the figure above) against the number of drought years in the historical record.

def count_drought_years(droughts):
    total_drought_years = 0
    for i in range(len(droughts)):
        total_drought_years += len(droughts[i])

    return total_drought_years

drought_lengths = np.zeros(100)
for d in range(len(all_drought_years)):
    drought_lengths[d] = count_drought_years(all_drought_years[d])

hist_drought_years = count_drought_years(drought_years_hist)


fig, ax = plt.subplots()
ax.bar(np.arange(len(num_droughts)), np.sort(drought_lengths))
ax.plot(np.arange(100), np.ones(100) * hist_drought_years, color='k', linestyle='--')
ax.set_xlim([0, 100])
ax.set_ylim([0, 50])
ax.set_xlabel('Rank Across Ensemble')
ax.set_ylabel('Number of Drought Years')
plt.savefig('DroughtFrequency.png')

This code produces the figure shown below. Each bar represents the number of drought years in a single synthetic realization (ranked from lowest to highest), and the dashed line represents the number of drought years in the historical record. We observe that the number of drought years in the historical record sits right in the middle of the range of synthetic realizations. We can see that the maximum number of drought years is twice as high as observed in the historical record – indicating that some realizations generate much more challenging drought scenarios than the observed record. We also see that some realizations have zero years of drought – indicating they are much milder than the observed record.

This image has an empty alt attribute; its file name is droughtfrequency.png

Evaluating drought magnitude

Another way we can evaluate droughts generated by our synthetic records is to examine the maximum magnitude of drought events produced by each record. To do this, we need a measure of drought magnitude. Here, we’ll define drought magnitude as the difference between the historical mean flow and the flow during drought periods normalized by the standard deviation. We can express this mathematically with a simple integral:

\int_{start}^{end} \frac{\mu-Q(t)}{\sigma} \, dt

We can visualize this metric in the figure below. Drought magnitude is the sum of the red regions for each drought.

Below, we’ll plot drought magnitude using a horizontal bar chart to make it look distinct from the frequency plot. The maximum drought magnitude in the historical record will be plotted as a black dashed line.

fig, ax = plt.subplots()
ax.barh(np.arange(len(num_droughts)), np.sort(max_severity))
ax.plot(np.ones(100)*max(drought_severity_hist), np.arange(100), color='k', linestyle='--')
ax.set_ylim([0,100])
ax.set_xlabel('Drought Magnitude')
ax.set_ylabel('Rank Across Ensemble')
plt.savefig('DroughtMagnitude.png')
This image has an empty alt attribute; its file name is droughtmagnitude.png

In the plot above, we observe that the magnitude of droughts in our synthetic records can get much more severe than the worst drought observed in the historical period. This finding aligns with studies on the Upper Colorado River basin using paleo data (Ault et al., 2014; Woodhouse et al., 2014) and underscores that relying solely on the historical record may underestimate drought risk for the basin.

Pulling it all together

To end our exploration, we’ll generate a final plot that simultaneously visualizes the frequency and magnitude of drought events in the synthetic realizations and the historical record. This plot will show all droughts in a given record, ranked according to their magnitude. To visualize the full ensemble of synthetic records, we’ll plot the percentiles of magnitude across the ensemble of 100 records.

fig, ax = plt.subplots()

# Calculate the percentiles of drought magnitude across the ensemble
ps = np.arange(0, 1.01, 0.01) * 100
for j in range(1, len(ps)):
    u = np.percentile(drought_ranks, ps[j], axis=0)
    l = np.percentile(drought_ranks, ps[j - 1], axis=0)
    # plot the percentiles
    ax.fill_between(np.arange(0, 1.25, .25), l, u, color=cm.OrRd(ps[j - 1] / 100.0), alpha=0.75,
                    edgecolor='none')
dummy = ax.scatter([100,200], [100,200], c=[0,1], cmap='OrRd') # dummy for the colorbar

# plot the historical droughts for context
for i in range(4):
    if drought_ranks_hist[i] > 0:
        ax.scatter(np.arange(0, 1.25, .25)[i], drought_ranks_hist[i], color='k', s=50, zorder=3)

# make a color bar
divider = make_axes_locatable(ax)
cax = divider.append_axes('right', size='5%', pad=0.05)
fig.colorbar(dummy, cmap='OrRd', cax=cax, orientation='vertical', label='Ensemble Exceedance')

# format the plot
ax.set_xlim([0, 1])
ax.set_xticks([0,.25, .5, .75, 1])
ax.set_ylim([0, 16])
ax.set_title('Drought In Synthetic Records', fontsize=16)
ax.set_ylabel('Drought Magnitude', fontsize=14)
ax.set_xlabel('Exceedance', fontsize=14)
plt.savefig('FreqSeverity.png')
This image has an empty alt attribute; its file name is freqseverity-1.png

The color in the figure above represents the percentiles of the synthetic records, and the two black points represent the droughts observed in the historical record. The vertical axis represents the magnitude of drought events, and the horizontal axis represents the fraction of droughts in a record that exceeds a given event. We can see that the maximum drought magnitude generated by the synthetic ensemble is much greater than the most severe drought of record (as observed in our magnitude figure). We can also see that the second most severe historical drought is right in the middle of the distribution of the second-highest droughts produced by the synthetic ensemble (.25 on the horizontal axis). Finally, we observe that while most realizations produce two decadal drought events, a small fraction contain an additional drought event.

Final thoughts

This analysis complements traditional diagnostic evaluation of synthetic streamflow records by revealing the extent of extreme events generated by our synthetic generator and contextualizing synthetic events in the historical record. Historical contextualization can help us communicate our experimental results to decision-makers by making an analysis that uses synthetic records less abstract. This analysis also helps us understand the internal variability of the stochastic process, which can provide a helpful baseline if we’re investigating non-stationarity.

References

Ault, T. R., Cole, J. E., Overpeck, J. T., Pederson, G. T., & Meko, D. M. (2014). Assessing the risk of persistent drought using climate model simulations and paleoclimate data. Journal of Climate, 27(20), 7529-7549.

Woodhouse, C. A., Gray, S. T., & Meko, D. M. (2006). Updated streamflow reconstructions for the Upper Colorado River basin. Water Resources Research, 42(5).

Multiobjective design of a cantilever beam – an interactive pedagogical example

I’ve been developing some course material on multiobjective optimization, and I thought I’d share an introductory exercise I’ve written as a blog post. The purpose of the exercise is to introduce students to multi-objective optimization, starting with basic terms and moving toward the central concept of Pareto dominance (which I’ll define below). The exercise explores trade-offs across two conflicting objectives in the design of a cantilever beam. This example is drawn from the textbook “Multi-Objective Optimization using Evolutionary Algorithms” by Kalyanmoy Deb (2001), though all code is original. This exercise is intended to be a teaching tool to complement introductory lectures on multi-objective optimization and Pareto dominance.

A simple model of a cantilever beam

In this exercise, we’ll first develop a simple model to evaluate a cantilever beam’s deflection, weight, and stress supporting a constant force (as specified by Deb, 2001). We’ll then enumerate possible design alternatives and visualize feasible designs and their performance. Next, we’ll introduce Pareto Dominance and examine trade-offs between conflicting system objectives. Finally, we’ll use the NSGA-II Multiobjective Evolutionary Algorithm (MOEA) to discover a Pareto-approximate set of designs and visualize design alternatives.

Let’s start by considering the cantilever beam shown below.

Our task is to “design” the beam by specifying its length (l) and diameter (d). Assuming the beam will be subjected to a constant force, P, we are interested in minimizing two conflicting performance objectives, the weight and the end deflection. We can model our beam using the following system of equations:

Following Deb (2001), we’ll use the parameter values shown below:

10mm ≤ d ≤ 50 mm; 200 mm ≤ l ≤ 1000 mm; ρ = 7800 kg/m3; P = 1 KN; E = 207 GPa

We would also like to ensure that our beam meets two feasibility constraints:

Modeling the beam in Python

As multiobjective design problems go, this is a fairly simple problem to solve. We can enumerate all possible design options and explore trade-offs between different combinations of length and diameter. Below I’ve coded a Python function to evaluate a beam with parameters l and d.

# create a model of the beam
def cantileverBeam(l, d):
    """
    Models the weight, deflection and stress on a cantilever beam subjected to a force of P=1 kN
    :param l:           length of the beam (mm)
    :param d:           diameter of the beam (mm)
    :return:
        - weight: the weight of the beam
        - deflection: the deflection of the beam
        - stress: the stress on the beam
    """

    weight = (7800.0 * (3.14159 * d**2)/4 * l)/1000000000 # (division by 1000000000 is unit conversion)
    deflection = (64.0*P * l**3)/(3.0*207.0*3.14159*d**4)
    stress = (32.0 * P * l)/(3.14159*d**3)

    return weight, deflection, stress

With this function, we can explore alternative beam designs. We’ll first create a set of samples of l and d, then simulate them through our beam model. Using our results, we’ll visualize the search space, which includes all possible design alternatives for l and d, enclosed 10≤ d ≤ 50 and 200 ≤ l ≤ 1000. Inside the search space, we can also delineate the feasible decision space composed of all combinations of l and d that meet our beam performance constraints. Every feasible solution can be mapped to the feasible objective space, which contains each sized beam’s corresponding weight and deflection. In the code below, we’ll simulate 1681 different possible beam designs and visualize the feasible decision space and the corresponding objective space.

import numpy as np
from matplotlib import pyplot as plt

# initialize arrays to store our design parameters
l = np.arange(200, 1020, 20)
d = np.arange(10,51,1)
P = 1.0

# initialize arrays to store model output
weight = np.zeros([41,41])
deflection = np.zeros([41,41])
constraints = np.zeros([41,41])
stress = np.zeros([41,41])

# evaluate the system
for i in range(41):
    for j in range(41):
        weight[i, j], deflection[i,j], stress[i,j] = cantileverBeam(l[i], d[j])
        if stress[i, j] <= 300 and deflection[i, j] <= 5:
            constraints[i,j] =1

# create a figure with two subplots
fig, axes = plt.subplots(1, 2, figsize=(12,4))

# plot the decision space
for i in range(41):
    for j in range(41):
        axes[0].scatter(i,j, c=constraints[i,j], cmap='Blues', vmin=0, vmax=1, alpha=.6)
axes[0].set_xticks(np.arange(0,45,5))
axes[0].set_yticks(np.arange(0,45,5))
axes[0].set_yticklabels(np.arange(10,55,5))
axes[0].set_xticklabels(np.arange(200,1100,100))
axes[0].set_xlabel('Length (mm)')
axes[0].set_ylabel('Diameter (mm)')
axes[0].set_title('Decision Space')

# plot the objective space
axes[1].scatter(weight.flatten(), deflection.flatten(), edgecolor='white')
axes[1].set_ylim([0,5])
axes[1].set_xlabel('Weight (kg)')
axes[1].set_ylabel('Deflection (mm)')
axes[1].set_title('Objective Space')

Examining the figures above, we observe that only around 40% of our evaluated beam designs are feasible (they meet the constraints of maximum stress and deflection). We can see a clear relationship between length and diameter – long narrow beams are unlikely to meet our constraints. When we examine the objective space of feasible beam designs, we observe trade-offs between our two objectives, weight and deflection. Examine the objective space. Is there a clear “best” beam that you would select? Are there any alternatives that you would definitely not select?

Introducing Pareto Dominance

Without a clear articulation of our preferences between weight and deflection, we cannot choose a single “best” design. We can, however, definitively rule out many of the beam designs using the concept of Pareto dominance. We’ll start by picking and comparing the objectives of any two designs from our set of feasible alternatives. For some pairs, we observe that one design is better than the other in both weight and deflection. In this case, the design that is preferable in both objectives is said to dominate the other design. For other pairs of designs, we observe that one design is preferable in weight while the other is preferable in deflection. In this case, both designs are non-dominated, meaning that the alternative is not superior in both objectives. If we expand from pairwise comparisons to evaluating the entire set of feasible designs simultaneously, we can discover a non-dominated set of designs. We call this set of designs the Pareto-optimal set. In our beam design problem, this is the set of designs that we should focus our attention on, as all other designs are dominated by at least one member of this set. The curve made by plotting the Pareto-optimal set is called the Pareto front. The Pareto front for the problem is plotted in orange in the figure below.

Multiobjective optimization

As mentioned above, the cantilever beam problem is trivial because we can easily enumerate all possible solutions and manually find the Pareto set. However, we are not so lucky in most environmental and water resources applications. To solve more challenging problems, we need new tools. Fortunately for us, these tools exist and have been rapidly improving in recent decades (Reed et al., 2013). One family of multiobjective optimization techniques that have been shown to perform well on complex problems is MOEAs (Coello Coello, 2007). MOEAs use search operators inspired by natural evolution, such as recombination and mutation, to “evolve” a population of solutions and generate approximations of the Pareto set. In the sections below, we’ll employ a widely used MOEA, NSGA-II (Deb et al., 2002), to find Pareto approximate designs for our simple beam problem. We’ll use the Platypus Python library to implement our evolutionary algorithm and explore results. Platypus is a flexible and easy-to-use library that contains 20 different evolutionary algorithms. For more background on MOEAs and Platypus, see the examples in the library’s online documentation.

Formulating the optimization problem

Formally, we can specify our multiobjective problem as follows:

Multiobjective optimization in Python with Platypus

Below we’ll define a function to use in Platypus based on our problem formulation. To work with Platypus, this function must accept a vector of decision variables (called vars) and return a vector of objectives and constraints.

# Define a function to use with NSGA-II
def evaluate_beam(vars):
    """
    A version of the cantilever beam model to optimize with an MOEA in Platypus

    :param vars:        a vector with decision variables element [0] is length, [1] is diameter
    :return:
        - a vector of objectives [weight, deflection]
        - a vector of constraints, [stress, deflection]

    """
    l = vars[0]
    d = vars[1]

    weight = (7800.0 * (3.14159 * d ** 2) / 4 * l)/1000000000
    deflection = (64.0*1.0 * l**3)/(3.0*207.0*3.14159*d**4)
    stress = (32.0 * 1.0 * l)/(3.14159*d**3)

    return [weight, deflection], [stress - 300, deflection - 5]

Next, we’ll instantiate a problem class in Platypus. We’ll first create a class with 2 decision variables, 2 objectives, and 2 constraints, then specify the allowable range of each decision variable, the type of constraints, and s function the algorithm will evaluate. We’ll then select NSGAII as our algorithm and run the optimization for 1000 generations.

# set up the Platypus problem (2 decision variables, 2 objectives, 2 constraints)
problem = Problem(2, 2, 2)

# specify the decision variable ranges (200 mm <= length <= 1000 mm) and (10 mm <= diameter <= 50 mm)
problem.types[:] = [Real(200, 1000), Real(10, 50)]

# specify the type of constraint
problem.constraints[:] = "<=0"

# tell Platypus what function to optimize
problem.function = evaluate_beam

# set NSGA II to optimize the problem
algorithm = NSGAII(problem)

# run the optimization for 1000 generations
algorithm.run(2000)

Plotting our approximation of the Pareto front allows us to examine trade-offs between the two conflicting objectives.

# plot the Pareto approximate set
fig = plt.figure()
plt.scatter([s.objectives[0] for s in algorithm.result],
            [s.objectives[1] for s in algorithm.result])
plt.xlabel('Weight (kg)')
plt.ylabel('Deflection (mm)')
plt.xlim([0,5])
plt.ylim([0,5])

NSGA-II has found a set of non-dominated designs that show a strong trade-off between beam weight and deflection. But how does our approximation compare to the Pareto front we discovered above? A good approximation of the Pareto front should have two properties 1) convergence, meaning the approximation is close to the true Pareto front, and 2) diversity, meaning the solutions cover as much of the true Pareto front as possible. We can plot the designs discovered by NSGA-II in the space of the enumerated designs to reveal that the algorithm is able to do a great job approximating the true Pareto front. The designs it discovered are right on top of the true Pareto front and maintain good coverage of the range of trade-offs.

In real-world design problems, the true Pareto front is usually not known, so it’s important to evaluate an MOEA run using runtime diagnostics to assess convergence. Here we can visualize the algorithm’s progress over successive generations to see it converge to a good approximation of the Pareto front. Notice how the algorithm first evolves solutions that maintain good convergence, then adds to these solutions to find a diverse representation of the Pareto front. In more complex or higher-dimensional contexts, metrics of runtime performance are required to assess MOEA performance. For background on runtime diagnostics and associated performance metrics, see this post.

Final thoughts

In this exercise, we’ve explored the basic multiobjective concepts through the design of a cantilever beam. We introduced the concepts of the search space, feasible decision space, objective space, and Pareto dominance. We also solved the multiobjective optimization problem using the NSGA-II MOEA. While the cantilever beam model used here is simple, our basic design approach can be adapted to much more complex and challenging problems.

References

  • Coello, C. C. (2006). Evolutionary multi-objective optimization: a historical view of the field. IEEE computational intelligence magazine, 1(1), 28-36.
  • Deb, K. (2011). Multi-objective optimisation using evolutionary algorithms: an introduction. In Multi-objective evolutionary optimisation for product design and manufacturing (pp. 3-34). London: Springer London.
  • Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2), 182-197.
  • Reed, P. M., Hadka, D., Herman, J. D., Kasprzyk, J. R., & Kollat, J. B. (2013). Evolutionary multiobjective optimization in water resources: The past, present, and future. Advances in water resources, 51, 438-456.

Building the Reed Group Figure Library – Part 1 Visualizations Supporting MOEA Diagnostics

This summer, we are constructing a Reed Group Lab manual (release forthcoming) as a guide for new graduate students and a resource about our tools for folks outside the group. The manual will complement this blog by organizing information and providing training syllabi for our tools and methodologies. One element of the lab manual that I’m excited about is a Figure Library, a set gallery of starter code for common visualizations we use in the group. In this post, I’ll present the first set of visualizations that will be included in our Figure Libary. These visualizations support MOEA diagnostic experiments. For some background on MOEA diagnostics, see this set of posts.

The section below is a reproduction of the Jupyter Notebook that will be included in our Figure Libary.

MOEA Diagnostics

This page contains Python code to generate basic visualizations for the diagnostics evaluation of multiobjective evolutionary algorithms (MOEAs). MOEA diagnostics are a systematic process of evaluating algorithmic performance. For background on MOEA diagnostics and an introductory tutorial, see this post. For MOEA diagnostics on water resources applications, see Reed et al., (2013), Zatarain et al., (2016), Zatarain et al., (2017) and Gupta et al., (2020).

# first load the relevant libraries
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.cm import ScalarMappable
import matplotlib
from scipy.ndimage import gaussian_filter

Attainment Plots

MOEAs are stochastic tools that are initialize with random seeds (starting solutions) that may affect the quality of search results. To account for the inherent uncertainty in search we run multiple random seeds of an algorithm. Attainment plots are a helpful way of visualizing an algorithm’s performance across random seeds. Attainment plots show the distribution of a MOEA performance metric across seeds and highlight the maximum performance achieved by an algorithm.

In the code below, four hypothetical algorithms are compared after running 20 random seeds each. The color of each bar represents the percentage of random seeds that achieve a given hypervolume, and the white points represent the best hypervolume achieved by each algorithm.

def attainment_plot(performance_file, metric):
    '''
    Creates an attainment plot comparing the performance of multiple algorithms across random seeds

    :param performance_file:            the path to a file containing performance metrics from diagnostic evaluation.
                                        the first row is assumed to be titles describing each algorithm

    :param metric:                      the name of the metric being plotted
    '''

    # load data
    performance_data = np.loadtxt(performance_file, delimiter=',', skiprows=1)

    # create a figure and get the axis object
    fig, ax = plt.subplots()

    # add a row of zeros
    performance_data = np.vstack((performance_data, np.ones(len(performance_data[0,:]))))

    # sort the hypervolume of each algorithm across all seeds from largest to smallest
    for i in range(len(performance_data[0,:])):
        performance_data[:,i] = np.sort(performance_data[:, i])[::-1]

    # plot a bar with the hypervolume achieved by each seed
    # start with 1 (full red), then highest, 2nd highest, etc.
    cmap = matplotlib.colormaps['RdBu'] # a color map for the plot
    for j in range(len(performance_data)):
        bar_color = (j)/len(performance_data) # color of the bar being plotted
        pcolor = cmap(bar_color) # pull color from colormap
        ax.bar(np.arange(len(performance_data[0,:])), performance_data[j,:], color=pcolor, alpha=1) # plot the seed

    # plot a the best hypervolume achieved by each Algorithm
    ax.scatter(np.arange(len(performance_data[0,:])), performance_data[1,:], color='w')

    # create a color bar
    sm = ScalarMappable(cmap= cmap)
    sm.set_array([])
    cbar = plt.colorbar(sm)
    cbar.set_label('Probability of Attainment')

    # format the plot
    #ax.set_xticks(np.arange(4))
    title_df = pd.read_csv(performance_file)
    #print(list(title_df.columns))
    ax.set_xticks(np.arange(len(performance_data[0,:])))
    ax.set_xticklabels(title_df.columns)
    ax.set_ylabel(metric +'\nPreference $\longrightarrow$')

#Example Usage
attainment_plot('HV_20_Seeds.csv', 'Relative Hypervolume')

Control Maps

Many MOEAs have complex parameterizations that may impact their ability to solve challenging multiobjective problems. Control maps are one way we can visualize how an algorithm’s performance varies across different parameterizations. Control maps are usually two dimensional projections of sampled parameters, with each axis on the map representing a different parameter. The algorithm’s performance at different parameter combinations is shown as the color on the map. An advantage of some high quality algorithms such as the Borg MOEA is that they are relatively insensitive to their parameterization, and their performance is mainly controlled by the number of function evaluations (NFEs) performed.

In the example below, the control map shows an algorithm that is sensitive to both the NFEs and intial population size, indicating that users must be careful of their parameterization when using this algorithm.

def control_plot(control_data, parameter_1, parameter_2):
    '''
    Creates a two dimensional control plot for MOEA performance across many parameterizations

    :param control_data:                the path to a csv file with performance data aranged as an nxn where n is the number of parameter samples
    :param parameter_1                  the name of the parameter being plotted on the horizontal axis
    :param parameter_2                  the name of the parameter being plotted on the vertical axis
    '''

    metrics = np.loadtxt(control_data, delimiter=',')
    filtered_metrics = gaussian_filter(metrics, sigma=5)

    fig, ax = plt.subplots()
    cmap = plt.get_cmap('RdYlBu')


    im = ax.imshow(filtered_metrics, cmap=cmap)
    ax.set_ylim([0,len(filtered_metrics[0,:])-1])
    ax.set_yticks(np.arange(len(filtered_metrics[:,0])))
    ax.set_xticks(np.arange(len(filtered_metrics[0,:])))
    ax.set_ylabel(parameter_2)
    ax.set_xlabel(parameter_1)
    ax.set_title('Control Map')

    # create a color bar
    sm = ScalarMappable(cmap= cmap)
    sm.set_array([])
    cbar = plt.colorbar(sm)
    cbar.set_label('Relative Hypervolume')

control_plot('Control_data.csv', 'Initial Population Size ($10^1$)', 'NFE ($10^3$)')

Runtime Dynamics

Another important way can compare algorithms is by visualizing the runtime dynamics, which reveals how algorithms perform during the search. We do this by tracking performance metrics during the search and plotting the performance against the NFEs. In the chart below, I plot the performance of two algorithms on a hypothetical test problem. Here the max and min across seeds are plotted as a shaded region, and the mean performance is plotted as a solid line for each algorithm.

import seaborn as sns
sns.set()
runtime_1 = np.loadtxt('Alg1_runtime.csv', delimiter=',', skiprows=1)
runtime_1 = np.vstack((np.zeros(21), runtime_1))

maxes_1 = runtime_1[:,1:].max(axis=1)
mins_1 = runtime_1[:,1:].min(axis=1)
means_1 = runtime_1[:,1:].mean(axis=1)

runtime_2 = np.loadtxt('Alg2_runtime.csv', delimiter=',', skiprows=1)
runtime_2 = np.vstack((np.zeros(21), runtime_2))

maxes_2 = runtime_2[:,1:].max(axis=1)
mins_2 = runtime_2[:,1:].min(axis=1)
means_2 = runtime_2[:,1:].mean(axis=1)

fig, ax = plt.subplots()
ax.fill_between(runtime_1[:,0], mins_1, maxes_1, alpha=.3, color='lightcoral')
ax.plot(runtime_1[:,0], means_1, color='firebrick')
ax.fill_between(runtime_2[:,0], mins_2, maxes_2, alpha=.3, color='darkorchid')
ax.plot(runtime_2[:,0], means_2, color='darkviolet')

ax.set_xlim([0,200])
ax.set_ylim([0,1])
ax.set_xlabel('NFE (10^3)')
ax.set_ylabel('Relative Hypervolume')
ax.set_title('Runtime Dynamics')

Multiobjective trade-offs

Finally, we can visualize the implications of algorithmic performance by plotting the Pareto approximate sets across multiple objectives. An important tool for high-dimensional problems is the parallel axis plot. For an introduction to parallel axis plots, see this post. Pandas has a simple tool for making these plots that are useful for data with multiple categories (for example, different algorithms).

from pandas.plotting import parallel_coordinates

objs = pd.read_csv('objs_data.csv')
fig, ax =plt.subplots()
parallel_coordinates(objs, 'Algorithm', linewidth=3, alpha=.8, ax=ax, color = ('#556270', '#4ECDC4'))
ax.set_ylabel('Objective Value\n(Preference $\longrightarrow$)')
ax.set_title('Approximate Pareto Sets')

References


Gupta, R. S., Hamilton, A. L., Reed, P. M., & Characklis, G. W. (2020). Can modern multi-objective evolutionary algorithms discover high-dimensional financial risk portfolio tradeoffs for snow-dominated water-energy systems?. Advances in water resources, 145, 103718.

Reed, P. M., Hadka, D., Herman, J. D., Kasprzyk, J. R., & Kollat, J. B. (2013). Evolutionary multiobjective optimization in water resources: The past, present, and future. Advances in water resources, 51, 438-456.

Salazar, J. Z., Reed, P. M., Herman, J. D., Giuliani, M., & Castelletti, A. (2016). A diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir control. Advances in water resources, 92, 172-185.

Salazar, J. Z., Reed, P. M., Quinn, J. D., Giuliani, M., & Castelletti, A. (2017). Balancing exploration, uncertainty and computational demands in many objective reservoir optimization. Advances in water resources, 109, 196-210.

Exploring time-evolving vulnerability with the newly published interactive tutorial in the Addressing Uncertainty in MultiSector Dynamics Research eBook

We recently published two new Jupyter Notebook tutorials as technical appendices to our eBook on Addressing Uncertainty in MultiSector Dynamics Research developed as part of the Integrated MultiSector, Multiscale Modeling (IM3) project, supported by the Department of Energy Office of Science’s MultiSector Dynamics Program. The eBook provides an overview of diagnostic modeling, perspectives on model evaluation, and a framework for basic methods and concepts used in sensitivity analysis. The technical appendices demonstrate key concepts introduced in the text and provide example Python code to use as a starting point for future analysis. In this post, I’ll discuss the concepts introduced in one of the new Python tutorials, Time-evolving scenario discovery for infrastructure pathways. This post will focus on the notebook’s connections to topics in the main text of the eBook rather than detailing the code demonstrated in the notebook. For details on the test case and code used in the tutorial, see the tutorial I posted last year.

In this post, I’ll first give a brief overview of the example water supply planning test case used in the tutorial, then discuss the methodological steps used to explore uncertainty and vulnerability in the system. The main topics discussed in this post are the design of experiments, factor mapping, and factor prioritization.

The Bedford-Greene water supply test case

The Bedford-Greene infrastructure investment planning problem (Figure 1) is a stylized water resources test case designed to reflect the challenges faced when evaluating infrastructure systems that evolve over time and are subject to uncertainty system inputs. The Bedford-Greene system contains two water utilities developing an infrastructure investment and management strategy to confront growing water demands and climate change. The test case was chosen for an eBook tutorial because it contains complex and evolving dynamics driven by strongly coupled human-natural system interactions. To explore these dynamics, the tutorial walks through an exploratory modeling experiment that evaluates how a large ensemble of uncertainties influences system performance and how these influences evolve over time.

In the Bedford-Greene system, modelers do not know or cannot agree upon the probability distributions of key system inputs, a condition referred to as “deep uncertainty” (Kwakkel et al., 2016). In the face of deep uncertainties, we perform an exploratory experiment to understand how a large ensemble of future scenarios may generate vulnerability for the water supply system.

Figure 1: The Bedford-Greene Water Resources Test Case

Setting up the computational experiment

The first step in the exploratory experiment is to define which factors in the mathematical model of the system are considered uncertain and how to sample the uncertainty space. The specific uncertain factors and their variability can be elicited through expert opinion, historical observation, values in literature, or physical meaning. In the Bedford-Greene system, we define 13 uncertain factors drawn from literature on real-world water supply systems (Gorelick et al., 2022). The uncertain factors used in this tutorial can be found in Table 1.

Factor NamePlausible RangeDescription
Near-Term Demand Growth Rate Mult.-0.25 to 2.0A scaling factor on projected demand growth over the first 15 years of the planning period
Mid-Term Demand Growth Rate Mult.-0.25 to 2.0A scaling factor on projected demand growth over the second 15 years of the planning period
Long-Term Demand Growth Rate Mult.-0.25 to 2.0A scaling factor on projected demand growth over the third15 years of the planning period
Bond Term0.8 to 1.2A scaling factor on the number of years over which infrastructure capital costs are repaid as debt service
Bond Interest Rate0.6 to 1.2A scaling factor that adjusts the fixed interest rate on bonds for infrastructure
Discount Rate0.8 to 1.2The rate at which infrastructure investment costs are discounted over time
Restriction Efficacy0.8 to 1.2A scaling factor on how effective water use restrictions are at reducing demands
Infrastructure Permitting Period0.75 to 1.5A scaling factor on estimated permitting periods for infrastructure projects
Infrastructure Construction Time1 to 1.2A scaling factor on estimated construction times for infrastructure projects
Inflow Amplitude0.8 to 1.2A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Frequency0.2 to 0.5A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Phase-pi/2 to pi/2A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Table 1: Deep Uncertainties sampled for the Bedford-Greene System

After the relevant uncertainties and their plausible ranges have been identified, the next step is to define a sampling strategy. A sampling strategy is often referred to as a design of experiments, a term that dates back to the work of Ronald Fisher in the context of laboratory or field-based experiments (Fischer, 1936). The design of experiments is a methodological choice that should be carefully considered before starting computational experiments. The design of experiments should be chosen to balance the computational cost of the exploratory experiment with the amount of information needed to accurately characterize system vulnerability. An effective design of experiments allows us to explore complex relationships within the model and evaluate the interactions of system inputs. Five commonly used designs of experiments are overviewed in Chapter 3.3 of the main text of the eBook.

In the Bedford-Greene test case, we employ a Latin Hypercube Sampling (LHS) strategy, shown in Figure 3d. With this sampling technique for the 13 factors shown in Table 1, a 13-dimensional hypercube is generated, with each factor divided into an equal number of levels to obtain 2,000 different samples of future scenarios. 2,000 samples were chosen here based on testing from similar water supply test cases (Trindade et al., 2020) for other analyses, but the sampling size must be determined on a case-by-case basis. The LHS design guarantees sampling from every level of the uncertainty space without overlaps and will generate a diverse coverage of the entire space. When the number of samples is much greater than the number of uncertain factors, LHS effectively approximates the more computationally expensive full factorial sampling scheme shown in Figure 3a without needing to constrain samples to discrete levels for each factor, as done in fractional factorial sampling, shown in Figure 3c. For more details on each sampling scheme and general information on design of experiments, see Chapter 3.3 of the eBook.

Figure 2: Alternative designs of experiments reproduced from Figure 3.3 of the eBook main text. a) full factorial design sampling of three factors at four levels with a total of 64 samples; b) the exponential growth of a necessary number of samples when applying full factorial design at four levels; c) fractional factorial design of three factors at four levels at a total of 32 samples; d) Latin Hypercube sample of three factors with uniform distributions for a total of 32 samples.

A final step in our experimental setup is to determine which model outputs are relevant to model users. In the Bedford-Greene test case, we specify five performance criteria along with performance thresholds that the water utilities would like their infrastructure investment policy to meet under all future conditions. The performance criteria and thresholds are shown in Table 2. These values are based on water supply literature, and relevant criteria and thresholds should be determined on a case-by-case basis.

Performance criteriaThreshold
Reliability< 99%
Restriction Frequency>20%
Worst-case cost>10 % annual revenue
Peak financial cost> 80% annual revenue
Stranded assets> $5/kgal unit cost of expansion
Table 2: Performance criteria and thresholds

Discovering consequential scenarios

To explore uncertainty in the Bedford-Greene system, we run the ensemble of uncertainties developed by our design of experiments through a water resources systems model and examine the outputs of each sampled combination of uncertainty. In this system, we’re interested in understanding 1) which uncertainties have the most impact on system vulnerability, 2) which combinations of uncertainties lead to consequential outcomes for the water supply system, and 3) how vulnerability evolves over time. Chapter 3.2 of the eBook introduces diagnostic approaches that can help us answer these questions. In this tutorial, we utilize gradient-boosted trees, a machine-learning algorithm that uses an ensemble of shallow trees to generate an accurate classifier (for more on boosting, see Bernardo’s post). Gradient-boosted trees are particularly well suited to infrastructure investment problems because they are able to capture non-linear and non-differential boundaries in the uncertainty space, which often occur as a result of discrete capacity expansions. Gradient-boosted trees are also resistant to overfitting, easy to interpret, and provide a simple means of ranking the importance of uncertainties. For more background on gradient-boosted trees for scenario discovery, see this post from last year.

Gradient-boosted trees provide a helpful measure of feature importance, the percentage decrease in the impurity of the ensemble of trees associated with each factor. We can use this measure to examine how each uncertainty contributes to the ability of the region’s infrastructure investment and management policy. Infrastructure investments fundamentally alter the water utilities’ storage-to-capacity ratios and levels of debt burden, which will impact their vulnerability over time. To account for these changes, we examine feature importance over three different time periods. The results of our exploratory modeling process are shown in Figure 3. We observe that the importance of various uncertainties evolves over time for both water utilities. For example, while near-term demand growth is a key factor for both utilities in all three time periods, restriction effectiveness is a key uncertainty for Bedford in the near- and mid-term but not in the long-term, likely indicating that infrastructure investment reduces the utility’s need to rely on water use restrictions. Greene is not sensitive to restriction effectiveness in the near-term or long-term, but is very sensitive in the mid-term. This likely indicates that the utility uses restrictions as a bridge to manage high demands before infrastructure investments have been fully constructed.

Figure 3: factor importance for the two utilities. Darker colors indicate that uncertainties have higher predictive value for discovering consequential scenarios.

To learn more about how vulnerability for the two water utilities evolves, we use factor mapping (eBook Chapter 3.2) to delineate regions of the uncertainty space that lead to consequential model outputs. The factor maps in Figures 4 and 5 complement the factor ranking in Figure 3 by providing additional information about which combinations of uncertainties generate vulnerability for the two utilities. While near-term demand growth and restriction effectiveness appear to generate vulnerability for Bedford in the near-term, Figure 4 reveals that the vast majority of sampled future states of the world meet the performance criteria. When evaluated using a 22-year planning horizon, however, failures emerge as a consequence of high demand and low restriction effectiveness. When evaluated across a 45-year planning horizon, the utility appears extremely vulnerable to high demand, indicating that the infrastructure investment policy is likely insufficient to maintain water supply reliability.

Figure 4: Factor maps for Bedford

Greene’s factor maps tell a different story. In the near-term, the utility is vulnerable to high-demand scenarios. In the mid-term, the vulnerable regions have transformed, and two failure modes are apparent. First, the utility is vulnerable to a combination of high near-term demand and low restriction effectiveness, indicating the potential for water supply reliability failures. Second, the utility is vulnerable to low-demand scenarios, highlighting a potential financial failure from over-investment in infrastructure. When analyzed across the 45-year planning horizon, the utility is vulnerable to only low-demand futures, indicating a severe financial risk from over-investment. These factor maps provide important context to the factor priorities shown in Figure 3. While the factor prioritization does highlight the importance of demand growth for Greene, it does not indicate which ranges of uncertainty generate vulnerability. Evaluating the system across time reveals that though the utility is always sensitive to demand growth, the consequences of demand growth and the range that generates vulnerability completely transform over the planning period.

Figure 5: Factor maps for Greene

Concluding thoughts

The purpose of this post was to provide additional context to the eBook tutorial on time-evolving scenario discovery. The Bedford-Greene test case was chosen because it represents a tightly coupled human natural system with complex and nonlinear dynamics. The infrastructure investments made by the two water utilities fundamentally alter the system’s state dynamics over time, necessitating an approach that can capture how vulnerability evolves. Through a carefully designed computational experiment, and scenario discovery using gradient-boosted trees, we discover multiple failure modes for both water utilities, which can help regional decision-makers monitor policy performance and adapt to changing conditions. While each application will be different, the code in this tutorial can be used as a starting point for applying this methodology to other human-natural systems. As with all tutorials in the eBook, the Jupyter notebook ends with a section on how to apply this methodology to your problem.

References

Kwakkel, J. H., Walker, W. E., & Haasnoot, M. (2016). Coping with the wickedness of public policy problems: approaches for decision making under deep uncertainty. Journal of Water Resources Planning and Management, 142(3), 01816001.

Fisher, R.A. (1936). Design of experiments. Br Med J, 1(3923):554–554

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772.

Detecting Causality using Convergent Cross Mapping: A Python Demo using the Fisheries Game

This post demonstrates the use of Convergent Cross Mapping (CCM) on the Fisheries Game, a dynamic predator-prey system. To conduct CCM, we’ll use the causal_ccm Python package by Prince Joseph Erneszer Javier, which you can find here. This demo follows the basic procedure used in the tutorial for the causal_ccm package, which you can find here.

All code in this blog post can be found in a Jupyter notebook in this Github repo.

Convergent Cross Mapping

CCM is a technique for understanding causality in dynamical systems where the effects of causal variables cannot be separated or uncoupled from the variables they influence (Sugihara et al., 2012). Rohini wrote a great blog post about CCM in 2021, so I’ll just provide a quick summary here and refer readers to Rohini’s post for more details.

CM harnesses the idea that the dynamics of a system can be represented by an underlying manifold, which can be approximated using lagged information from the time series of each variable of interest. If variable X has a causal effect on variable Y, then information about X should be encoded in variable Y, and we can “recover” historical states of X from the historical time series of Y. Using this concept, CCM develops “shadow manifolds” for system variables, and examines the relationships between shadow manifolds using cross mapping, which involves sampling nearest-neighbor points in one manifold, and determining if they correspond to neighboring points in the other. For a more detailed explanation of CCM, I highly recommend reading Sugihara et al., 2012. I also found the series of videos created by the authors to be extremely helpful. You can find them here.

Simulating the fisheries predator-prey system

We’ll start by simulating the predator-prey system used in Hadjimichael et al., (2020). This system models the population dynamics of two species, a predator and a prey. The simulation uses two differential equations to model the dynamics of the predator-prey system:

\frac{dx}{dt} = bx(1-\frac{x}{K}) - \frac{\alpha x y}{y^{m }+ \alpha h x} - zx


\frac{dy}{dt} = \frac{c \alpha x y}{\alpha h x} -dy

Where x and y represent the prey and predator population densities, t represents the time in years, a is the prey availability, b is the prey growth rate, c is the rate at which prey is converted to new predators, d is the predator death rate, h is the time needed to consume prey (called the handling time), K is the carrying capacity for prey, m is the level of predator interaction, and z is the harvesting rate. In the cell below, we’ll simulate this system for a given set of environmental parameters (a, b etc.), and plot the dynamics over 120 time periods. For more details on the Fisheries system, see the training blog posts by Lillian and Trevor (part 0, part 1, part 2). There is also an interactive Jupyter notebook in our recently published eBook on Uncertainty Characterization for Multisector Dynamics Research.

import numpy as np
from matplotlib import pyplot as plt

# assign default parameters
tsteps = 120 # number of timesteps to simulate
a = 0.01 # prey availability
b = 0.25 # prey growth rate
c = 0.30 # rate that prey is converted to predator
d = 0.1 # predator death rate
h = 0.61 # handling time
K = 1900 # prey carrying capacity
m = .20 # predator interference rate

# create arrays to store predator and prey populations
prey = np.zeros(tsteps+1)
pred = np.zeros(tsteps+1)

# initial population levels
prey[0] = 28
pred[0] = 28

# harvesting, which we will keep at 0 for this example
z = np.zeros(len(prey))

# simulate the system
for t in range(tsteps):
    if prey[t] > 0 and pred[t] > 0:
        prey[t + 1] = (prey[t] + b * prey[t] * (1 - prey[t] / K) - (a * prey[t] * pred[t]) / (np.power(pred[t], m) +
                                                            a * h * prey[t]) - z[t] * prey[t])   # Prey growth equation
        pred[t + 1] = (pred[t] + c * a * prey[t] * pred[t] / (np.power(pred[t], m) + a * h *
                                                            prey[t]) - d * pred[t]) # Predator growth equation

# plot the polulation dynamics
fig = plt.figure(figsize=(6,6))
plt.plot(np.arange(tsteps+1), prey, label = 'Prey')
plt.plot(np.arange(tsteps+1), pred, label = 'Predator')
plt.legend(prop={'size': 12})
plt.xlabel('Timestep', size=12)
plt.ylabel('Population Size', size=12)
plt.title('Population Dynamics', size=15)
Figure 1: predator and prey population dynamics

With these parameters, we can visualize the trajectory and direction field of the system of equations, which is shown below (I’m not including ploting code here for brevity, but see this tutorial if you’re interested in making these plots). In CCM terms, this is a visualization of the underlying manifold of the dynamical system.

Figure 2: Trajectories and direction fields of the predator-prey system

Causal detection with CCM

From the system of equations above, we know there is a causal relationship between the predator and prey, and we have visualized the common manifold. Below, we’ll test whether CCM can detect this relationship. If the algorithm works as it should, we will see a clear indication that the two populations are causally linked.

To conduct CCM, we need to specify the number of dimensions to use for shadow manifolds, E, the lag time, tau, and the library size, L.

The CCM algorithm should converge to a stable approximation of causality as the library size increases. Below we’ll test library sizes from 10 to 100 to see if we achieve convergence for the Fisheries system. We’ll start by assuming shadow manifolds have two dimensions (E=2), and a lag time of one time step (tau=1). To test convergence, we’ll plot the correlation (rho) between the shadow manifold predictions and the historical states of the two variables.

from causal_ccm.causal_ccm import ccm
E = 2 # dimensions of the shadow manifold
tau = 1 # lag time
L_range = range(10, 100, 5) # test a range of library sizes
preyhat_Mpred, predhat_Mprey = [], [] # correlation list
for L in L_range:
    ccm_prey_pred = ccm(prey, pred, tau, E, L) # define new ccm object # Testing for X -> Y
    ccm_pred_prey = ccm(pred, prey, tau, E, L) # define new ccm object # Testing for Y -> X
    preyhat_Mpred.append(ccm_prey_pred.causality()[0])
    predhat_Mprey.append(ccm_pred_prey.causality()[0])

# Plot Cross Mapping Convergence
plt.figure(figsize=(6,6))
plt.plot(L_range, preyhat_Mpred, label='$\hat{Prey}(t)|M_{Prey}$')
plt.plot(L_range, predhat_Mprey, label='$\hat{Pred}(t)|M_{Pred}$')
plt.ylim([0,1])
plt.xlabel('Library Size', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 3: Convergence in cross mapping estimates

Figure 3 shows that CCM does seem to detect a causal relationship between the predator and prey (rho is far above 0), and the estimated strength of this relationship starts to stabilize (converge) with a library size of around 60.

Next, we’ll examine how our lag time (tau) used to construct the shadow manifolds impacts our findings.

from causal_ccm.causal_ccm import ccm
E = 2 # dimensions of the shadow manifold
L = 60 # length of library (we'll return to this later)

# first, test different lags for construction of the shadow manifolds
# we'll test lags from 0 to 30 time steps
preyhat_Mpred, predhat_Mprey = [], []  # lists to store correlation
for tau in range(1, 20):
    ccm_prey_pred = ccm(prey, pred, tau, E, L)  # define new ccm object # Testing for prey -> pred
    ccm_pred_prey = ccm(pred, prey, tau, E, L)  # define new ccm object # Testing for pred -> prey
    preyhat_Mpred.append(ccm_prey_pred.causality()[0]) # stores prey -> pred
    predhat_Mprey.append(ccm_pred_prey.causality()[0]) # stores pred -> prey

# plot the correlation for different lag times
plt.figure(figsize=(6,6))
plt.plot(np.arange(1,20), preyhat_Mpred, label='$\hat{Prey}(t)|M_{Prey}$')
plt.plot(np.arange(1,20), predhat_Mprey, label='$\hat{Pred}(t)|M_{Pred}$')
plt.ylim([0,1.01])
plt.xlim([0,20])
plt.xticks(np.arange(20))
plt.xlabel('Lag', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 4: The impact of lag time, tau, on CCM estimates

The results in Figure 4 indicate that a lag time of 1 (which we initially assumed) does not adequately capture the causal relationship between the two variables. When lag times are set between 5 and 10, CMM shows a much stronger relationship between the two variables. Using this information, we can again test the convergence across different library sizes.

E = 2 # dimensions of the shadow manifold
tau = 5
L_range = range(10, 100, 5)
preyhat_Mpred, predhat_Mprey = [], [] # correlation list
for L in L_range:
    ccm_prey_pred = ccm(prey, pred, tau, E, L) # define new ccm object # Testing for X -> Y
    ccm_pred_prey = ccm(pred, prey, tau, E, L) # define new ccm object # Testing for Y -> X
    preyhat_Mpred.append(ccm_prey_pred.causality()[0])
    predhat_Mprey.append(ccm_pred_prey.causality()[0])

# Plot Cross Mapping Convergence
plt.figure(figsize=(6,6))
plt.plot(L_range, preyhat_Mpred, label='$\hat{Prey}(t)|M_{Prey}$')
plt.plot(L_range, predhat_Mprey, label='$\hat{Pred}(t)|M_{Pred}$')
plt.ylim([0,1])
plt.xlabel('Library Size', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 5: With a lag of 5, CCM converges much faster, and detects a much stronger relationship between predator and prey

In the Figure 5, we observe that CCM with a lag size of 5 converges much faster and generates a stronger correlation than our original estimate using tau = 1. In fact, CCM can reconstruct the historical system states almost perfectly. To see why we can visualize the underlying shadow manifolds and cross mapping conducted for this analysis (this is conveniently available in the causal_ccm package with the visualize_cross_mapping function).

# set lag (tau) to 7 and examine results
tau = 5

# prey -> predator
ccm1 = ccm(prey,pred, tau, E, L) # prey -> predator
print("prey -> predator: " + str(ccm1.causality()[0]))

# visualize the cross mapping from the two shadow manifolds
ccm1.visualize_cross_mapping()
Figure 6: Shadow manifolds and cross mapping between prey (shown as X) and predator (shown as Y).

Figure 6 shows the two shadow manifolds for prey (X in these plots) and predator (Y in these plots). We observe that the shapes of the shadow manifolds preserve the general characteristics of the original manifold, as shown by the trajectories plotted in Figure 2. Figure 6 also shows the the nearest neighbor points sampled in each manifold (blue boxes) and their mapping to the other variable’s shadow manifold (red stars). We can see how similar points on one manifold correspond to similar points on the other in both directions.

We can also use the causal_ccm package to visualize the correlation between the prey and the CCM prey estimates, with very impressive results (Figure 7).

Figure 7: The relationship between observed populations and CCM estimates.

Concluding thoughts

This example demonstrates that CCM can indeed detect the causal relationship between predator and prey in this system and in fact, provides extremely accurate reconstructions of both population sets. This shouldn’t come as a surprise since we knew from the start that a strong causal relationship does exist within this system. Still, I find it almost unnerving how well a job CCM manages to do here. For more info on CCM and coding CCM, see the links below:

A great tutorial of CCM by the author of the causal_ccm Python package

Rohini’s blog post (with a demo of CCM in R)

Video links teaching CCM core concepts

References:

Hadjimichael, A., Reed, P., & Quinn, J. (2020). Navigating Deeply Uncertain Tradeoffs in Harvested Predator-Prey Systems. Complexity, 2020, 1-18. https://doi.org/10.1155/2020/4170453

Sugihara, G., May, R., Ye, H., Hsieh, C. H., Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. science338(6106), 496-500.

Time-evolving scenario discovery for infrastructure pathways

Our recently published eBook, Addressing Uncertainty in Multisector Dynamics Research, provides several interactive tutorials for hands on training in model diagnostics and uncertainty characterization. This blog post is a preview of an upcoming extension of these trainings featuring an interactive tutorial on time-evolving scenario discovery for the development of adaptive infrastructure pathways for water supply planning. This post builds off the prior tutorial on gradient-boosted trees for scenario discovery.

I’ll first introduce a styled water supply test case featuring two water utilities seeking to develop a cooperative infrastructure investment and management policy over a 45-year planning horizon. I’ll then demonstrate how the utilities can explore evolving vulnerability across the planning period. All code for this demo is available on Github. The code is written in Python, but the workflow is model agnostic and can be paired with simulation models in any language.

Background

The Bedford-Greene metropolitan area (Figure 1) is a stylized water resources test case containing two urban water utilities seeking to develop an infrastructure investment and management strategy to confront growing demands and changing climate. The utilities have agreed to jointly finance and construct a new water treatment plant on Lake Classon, a large regional resource. Both utilities have also identified a set of individual infrastructure options to construct if necessary.

Figure 1: The Bedford-Greene metropolitan area

The utilities have formulated a cooperative and adaptive regional water supply management strategy that uses a risk-of-failure (ROF) metric to trigger both short-term drought mitigation actions (water use restrictions and treated transfers between utilities) and long-term infrastructure investment decisions (Figure 2a). ROFs represent a dynamic measure of the utilities’ evolving capacity-to-demand ratios. Both utilities have specified a set of ROF thresholds to trigger drought mitigation actions and plan to actively monitor their short-term ROF on a weekly basis. When a utility’s ROF crosses a specified threhold, the utility will implement drought mitigation actions in the following week. The utilities will also monitor long-term ROF on an annual basis, and trigger infrastructure investment if long-term risk crosses a threshold dictated by the policy. The utilities have also specified a construction order for available infrastructure options.

Figure 2: a) the regional infrastructure investment and management policy. b) adaptive pathways generated across 2,000 deeply uncertain SOWs

The utilities performed a Monte Carlo simulation to evaluate how this policy responds to a wide array of future states of the world (SOWs), each representing a different sample of uncertainties including demand growth rates, changes to streamflows, and financial variables.

The ROF-based policies respond to each SOW by generating a unique infrastructure pathway – a sequence of infrastructure investment decisions over time. Infrastructure pathways across 2,000 SOWs are shown in Figure 2b. Three clusters summarizing infrastructure pathways are plotted as green lines which represent the median week that options are triggered. The frequency that each option is triggered across all SOWs is plotted as the shading behind the lines. Bedford relies on the Joint Water Treatment facility and short-term measures (water use restrictions and transfers) to maintain supply reliability. Greene constructs the Fulton Creek reservoir in addition to the Joint Treatment plant and does not strongly rely on short-term measures to combat drought.

The utilities are now interested in evaluating the robustness of their proposed policy, characterizing how uncertainties generate vulnerability and understanding how this vulnerability may evolve over time.

Time-evolving robustness

To measure the robustness of the infrastructure investment and management policy, the two utilities employ a satisficing metric, which measures the fraction of SOWs where the policy is able to meet a set of performance criteria. The utilities have specified five performance criteria that measure the policy’s ability to maintain both supply reliability and financial stability. Performance criteria are shown in Table 1.

Performance criteriaThreshold
Reliability< 99%
Restriction Frequency>20%
Worst-case cost>10 % annual revenue
Peak financial cost> 80% annual revenue
Stranded assets> $5/kgal unit cost of expansion
Table 1: Satisficing criteria

Figure 3 shows the evolution of robustness over time for the two utilities. While the cooperative policy is very robust after the first ten years of the planning horizon, it degrades sharply for both utilities over time. Bedford meets the performance criteria in nearly 100% of sampled SOWs after the first 10 years of the planning horizon, but its robustness is reduced to around 30% by the end of the 45-year planning period. Greene has a robustness of over 90% after the first 10 years and degrades to roughly 60% after 45 years. These degradations suggest that the cooperative infrastructure investment and management policy is insufficient to successfully maintain long-term performance in challenging future scenarios. But what is really going on here? The robustness metric aggregates performance across the five criteria shown in Table 1, giving us a general picture of evolving performance, but leaving questions about the nature of the utilities’ vulnerability.

Figure 3: Robustness over time

Figure 4 provides some insight into how utility vulnerability evolves over time. Figure 4 shows the fraction of failure SOWs that can be attributed to each performance criterion when performance is measured in the near term (next 10 years), mid-term (next 22 years), and long-term (next 45 years). Figure 4 reveals that the vulnerability of the two utilities evolves in very different ways over the planning period. Early in the planning period, all of Bedford’s failures can be attributed to supply reliability. As the planning horizon progresses, Bedford’s failures diversify into failures in restriction frequency and worst-case drought management cost, indicating that the utility is generally unable to manage future drought. Bedford likely needs more infrastructure investment than is specified by the policy to maintain supply reliability.

In contrast to Bedford’s performance, Greene begins with vulnerability to supply reliability, but its vulnerability shifts over time to become dominated by failures in peak financial cost and stranded assets – measures of the utility’s financial stability. This shift indicates that while the infrastructure investments specified by the cooperative policy mitigate supply failures by the end of the planning horizon, these investments drive the utility into financial failure in many future scenarios.

Figure 4: Failures over time

Factor mapping and factor ranking

To understand how and why the vulnerability evolves over time, we perform factor mapping. Figure 5 below, shows the uncertainty space projected onto the two most influential factors for Bedford, across three planning horizons. Each point represents a sampled SOW, red points represent SOWs that resulted in failure, while white points represent SOWs that resulted in success. The color in the background shows the predicted regions of success and failure from the boosted trees classification.

Figure 4 indicates that Bedford’s vulnerability is primarily driven by rapid and sustained demand growth and this vulnerability increases over time. When evaluated using a 22-year planning horizon, the utility only appears vulnerable to extreme values of near-term demand growth, combined with low values of restriction effectiveness. This indicates that the utility is relying on restrictions to mitigate supply failures, and is vulnerable when they do not work as anticipated. When evaluated over the 45-year planning horizon, Bedford’s failure is driven by rapid and sustained demand growth. If near-term demand grows faster than anticipated (scaling factor > 1.0 on the horizontal axis), the utility will likely fail to meet its performance criteria. If near-term demand is lower than anticipated, the utility may still fail to meet performance criteria if under conditions of high mid-term demand growth. These results provide further evidence that the infrastructure investment and management policy is insufficient to meet Bedford’s long-term water supply needs.

Figure 5: Time-evolving factor maps for Bedford

Greene’s vulnerability (Figure 6) evolves very differently from Bedford’s. Greene is vulnerable to high-demand scenarios in the near term, indicating that its current infrastructure is insufficient to meet rapidly growing demands. Greene can avoid this failure under scenarios where the construction permitting time multiplier is the lowest, indicating that new infrastructure investment can meet the utility’s near-term supply needs. When evaluated across a 22-year planning horizon, the utility fails when near-term demand is high and restriction effectiveness is low, a similar failure mode to Bedford. However, the 22-year planning horizon reveals a second failure mode – low demand growth. This failure mode is responsible for the stranded assets failures shown in Figure 3. This failure mode increases when evaluated across the 45-year planning horizon, and is largely driven by low-demand futures when the utility does not generate the revenue to cover debt service payments needed to fund infrastructure investment.

Figure 6: Time-evolving factor maps for Greene

The factor maps in Figures 5 and 6 only show the two most influential factors determined by gradient boosted trees, however, the utilities are vulnerable to other sampled uncertainties. Figure 7 shows the factor importance as determined by gradient boosted trees for both utilities across the three planning horizons. While near-term demand growth is important for both utilities under all three planning horizons, the importance of other factors evolves over time. For example, restriction effectiveness plays an important role for Greene under the 22-year planning horizon but disappears under the 45-year planning horizon. In contrast, the bond interest rate is important for predicting success over the 45-year planning horizon, but does not appear important over the 10- or 22-year planning horizons. These findings highlight how assumptions about the planning period can have a large impact on modeling outcomes.

Figure 7: Factor rankings over time

A step-by-step tutorial for scenario discovery with gradient boosted trees

Our recently published eBook, Addressing Uncertainty in Multisector Dynamics Research, provides several interactive tutorials for hands on training in model diagnostics and uncertainty characterization. The purpose of this post is to expand upon these trainings by providing a tutorial demonstrating gradient boosted trees for scenario discovery. I’ll first provide some brief background on scenario discovery and gradient boosted trees, then demonstrate a Python implementation on a water supply planning problem. All code here is written in Python, but the workflow is model agnostic, and can be paired with simulation models in any language. I’ve included my code within the text below, but all code and data for this post can also be found in this git repository.

Scenario discovery gradient boosted trees

In water resources planning and management, decision makers are often faced with uncertainty about how their system will change in the future. Traditionally, planners have confronted this uncertainty by developing prespecified narrative scenarios, which reduce the multitude of possible future conditions into a small subset of important future states of the world (a prominent example is the ‘scenario matrix framework’ used to evaluate climate change (O’Neill et al., 2014)). While this approach provides intuitive appeal, it may increase system vulnerability if future conditions do not evolve as decision makers expect (for a detailed critique of scenario based planning see Reed et al., 2022). This vulnerability is especially apparent for systems facing deep uncertainty, where decision makers do not know or cannot agree upon the probability density functions of key system inputs (Kwakkel et al., 2016).

Scenario discovery (Groves and Lempert, 2007) is an exploratory modeling centered approach that seeks to discover consequential future scenarios using computational experiments rather than relying on prespecified information. To perform scenario discovery, decision makers first identify a set of relevant uncertainties and their plausible ranges. Next, an ensemble of these uncertainties is developed by sampling across parameter ranges. Candidate policies are then evaluated across this ensemble and machine learning or data mining algorithms are used to examine which combinations of uncertainties generate vulnerability in the system. These combinations can then be used to develop narrative scenarios to inform implementation and monitoring efforts or new policy development.

A core element of the scenario discovery process is the algorithm used to classify future states of the world. Popular algorithms include the PRIM, CART and logistic regression. Recently, gradient boosted trees have been applied as an alternative classificiation algorithm. Gradient boosted trees have advantages over other scenario discovery algorithms because they can easily capture nonlinear and non-differentiable boundaries in the uncertainty space (which are particularly prevalent in water supply planning problems that have discrete capacity expansion options), are highly resistant to overfitting and provide a clear means of ranking the importance of uncertain factors (Trindade et al., 2020). For a comprehensive overview of gradient boosted trees, see Bernardo’s post here.

Test case: the Sedento Valley

To demonstrate gradient boosted trees for scenario discovery we’ll use the Sedento Valley water supply planning test case (Trindade et al., 2020). In the Sedento Valley, three water utilities seek to discover cooperative water supply managment and infrastructure investment portfolios to meet several conflicting objectives in a system facing deep uncertainty. In this post, we’ll investigate how these deep uncertainties (which include demand growth, the efficacy of water use restrictions, financial variables and parameters governing infrastructure permitting and construction time) impact a utility’s ability to maintain three performance criteria: keeping reliability > 98%, restriction frequency < 20% and worst case cost less than 10% of annual revenue. For simplicity, we’ll focus on one regional water utility named Watertown.

Step 1: create a sample of deeply uncertain states of the world

To start the scenario discovery process, we generate an ensemble of deep uncertainties that represent future states of the world (SOWs). Here, we’ll use Latin Hypercube Sampling with an implementation I found in the Surrogate Modeling Toolbox.

import numpy as np
from smt.sampling_methods import LHS

'''
This script will generate 1000 Latin Hypercube Samples (LHS)
of deeply uncertain system parameters for the Sedento Valley
'''


# create an array storing the ranges of deeply uncertain parameters
DU_factor_limits = np.array([
    [0.9, 1.1], # Watertown restriction efficacy 
    [0.9, 1.1], # Dryville restriction efficacy
    [0.9, 1.1], # Fallsland restriction efficacy
    [0.5, 2.0], # Demand growth rate multiplier
    [1.0, 1.2], # Bond term
    [0.6, 1.0], # Bond interest rate
    [0.6, 1.4], # Discount rate
    [0.75, 1.5], # New River Reservoir permitting time
    [1.0, 1.2], # New River Reservoir construction time
    [0.75, 1.5], # College Rock Reservoir (low) permitting time
    [1.0, 1.2], # College Rock Reservoir (low) construction time
    [0.75, 1.5], # College Rock Reservoir (high) permitting time
    [1.0, 1.2], # College Rock Reseroir (high) construction time
    [0.75, 1.5], # Water Reuse permitting time
    [1.0, 1.2], # Water Reuse construction time
    [0.8, 1.2], # Inflow amplitude
    [0.2, 0.5], # Inflow frequency
    [-1.57, 1.57]]) # Inflow phase

# Use the smt package to set up the LHS sampling
sampling = LHS(xlimits=DU_factor_limits)

# We will create 1000 samples
num = 1000

# Create the actual sample
x = sampling(num)

# save to a csv file
np.savetxt('DU_factors.csv', x, delimiter=',')

Step 2: Evaluate performance across SOWs

Next, we’ll evaluate the performance of our policy across the LHS sample of DU factors. For the Sedento Valley test case, we use WaterPaths, an open-source simulation system for integrated water supply portfolio management and infrastructure investment planning (for more see Trindade et al., 2020). This step is not included in the git repository as it requires high-performance computing for this system, but results can be found in the “Model_output.csv” file. For simulation details, see Gold et al., 2022.

Step 3: Convert model output into a boolean array for classification

To perform classification, we need to convert the results of our simulations to a binary array classifying each SOW as a “success” or “failure” based on whether the policy met the performance criteria under the SOW. First, we define a small function to determine if an SOW meets a set of criteria, then we apply this function to our results. We also load the DU factor LHS sample.

# First, define a function to check whether performance criteria are met
def check_criteria(objectives, crit_objs, crit_vals):
    """
    Determines if an objective meets a given set of criteria for a set of SOWs

    inputs:
        objectives: np array of all objectives across a set of SOWs
        crit_objs: the column index of the objective in question
        crit_vals: an array containing [min, max] of the values 
    
    returns:
        meets_criteria: an numpy array containing the SOWs that meet both min and max criteria

    """
    
    # check max and min criteria for each objective
    meet_low = objectives[:, crit_objs] >= crit_vals[0]
    meet_high = objectives[:, crit_objs] <= crit_vals[1]

    # check if max and min criteria are met at the same time
    meets_criteria = np.hstack((meet_low, meet_high)).all(axis=1)

    return meets_criteria


##### Load data and pre-process #####

# load objectives and create input array of boolean values for SD input
Reeval_objectives = np.loadtxt('Model_output.csv', skiprows=1, delimiter=',')
REL = check_criteria(Reeval_objectives, [0], [.979, 1])
RF = check_criteria(Reeval_objectives, [1], [0, 0.10])
WCC = check_criteria(Reeval_objectives, [2], [0, 0.10])
SD_input = np.vstack((REL, RF, WCC)).SD_input(axis=0)


# load DU factors
DU_factors = np.loadtxt('DU_factors.csv', skiprows=1, delimiter=',')
DU_names = ['Watertown Rest. Eff.', 'Dryville Rest. Eff.', 'FSD_inputsland Rest. Eff.',
            'Demand Growth Rate', 'Bond Term', 'Bond Interest',
            'Discount Rate', 'NRR Perm', 'NRR Const', 'CRR L Perm',
            'CRR L Const.',	'CRR H Perm.', 'CRR H Const.', 'WR1 Perm.',
             'WR1 Const.', 'Inflows A', 'Inflows m','Inflows p']

Step 4: Fit a boosted trees classifier

After we’ve formatted the data, we’re ready to perform boosted trees classification. There are several packages for boosted trees in Python, here we’ll use the implementation from scikit-learn. We’ll use an ensemble of 200 trees with depth 3 and a learning rate of 0.1. These parameters need to be tuned for the individual problem, I found this nice post that goes into detail on parameter tuning.

##### Boosted Tree Classification #####

from sklearn.ensemble import GradientBoostingClassifier

# create a gradient boosted classifier object
gbc = GradientBoostingClassifier(n_estimators=200,
                                 learning_rate=0.1,
                                 max_depth=3)

# fit the classifier
gbc.fit(DU_factors, SD_input)

Step 5: Examine which DU factors have the most impact on performance criteria

Now we’re ready to examine the results of our classification. First, we’ll examine how important each DU factor is to the classification results generated by boosted trees. To rank the imporance of each DU factor, we examine the percentage decrease in impurity of the ensemble of trees that is associated with each factor. In plain english, this is a measure of how helpful each DU factor is to correctly classifying SOWs. This infromation is generated during the fit of the classifier above and is easily accessible as an attribute of our scikit-learn classifier.

For our example, one deep uncertainty, demand growth rate, clearly stands out as the most influential, as shown in the figure below. A second, the restriction efficacy for Watertown (the utility we’re focusing on), also stands out as a higher level of importance. All other DU factors have little impact on the classification in this case.

##### Factor Ranking #####

# Extract the feature importances
feature_importances = deepcopy(gbc.feature_importances_)

# rank the feature importances and plot
importances_sorted_idx = np.argsort(feature_importances)
sorted_names = [DU_names[i] for i in importances_sorted_idx]

fig = plt.figure(figsize=(8,8))
ax = fig.gca()
ax.barh(np.arange(len(feature_importances)), feature_importances[importances_sorted_idx])
ax.set_yticks(np.arange(len(feature_importances)))
ax.set_yticklabels(sorted_names)
ax.set_xlim([0,1])
ax.set_xlabel('Feature Importance')
plt.tight_layout()

Step 6: Create factor maps

Finally, we visualize the results of our classification through factor mapping. In the plot below, we show the uncertainty space projected onto the two most influential factors, demand growth rate and restriciton efficacy. Each point represents a sampled SOW, red points represent SOWs that resulted in failure, while white points represent SOWs that resulted in success. The color in the background shows the predicted regions of success and failure from the boosted trees classification.

Here we observe that high levels of demand growth are the primary source of vulnerability for the water utility. When restriction efficacy is lower than our estimate (multiplier < 1), the utility faces failures at demand growth levels of about 1.7 times the estimated values. When restriction effectiveness is at or above estimates, the acceptable scaling of demand growth raises to about 1.8.

Taken as a whole, these results provide valueable insights for decision makers. From our original 18 deep uncertainties, we have discovered that two are critical for the success of our water supply management policy. Further, we have defined thresholds within the uncertainty space that define scenarios that lead to failure. We can use this information to inform monitoring efforts for the water supply policy, or to inform a new problem formulation that tailors actions to mitigate these vulnerabilities.

##### Factor Mapping #####

# Select the top two factors discovered above
selected_factors = DU_factors[:, [3,0]]

# Fit a classifier using only these two factors
gbc_2_factors = GradientBoostingClassifier(n_estimators=200,
                                 learning_rate=0.1,
                                 max_depth=3)
gbc_2_factors.fit(selected_factors, SD_input)

# plot prediction contours
x_data = selected_factors[:,0]
y_data = selected_factors[:,1]

x_min, x_max = (x_data.min(), x_data.max())
y_min, y_max = (y_data.min(), y_data.max())

# create a grid to makes predictions on
xx, yy = np.meshgrid(np.arange(x_min, x_max * 1.001, (x_max - x_min) / 100),
                        np.arange(y_min, y_max * 1.001, (y_max - y_min) / 100))
                        
dummy_points = list(zip(xx.ravel(), yy.ravel()))

z = gbc_2_factors.predict_proba(dummy_points)[:, 1]
z[z < 0] = 0.
z = z.reshape(xx.shape)

# plot the factor map        
fig = plt.figure(figsize=(10,8))
ax = fig.gca()
ax.contourf(xx, yy, z, [0, 0.5, 1.], cmap='RdBu',
                alpha=.6, vmin=0.0, vmax=1)
ax.scatter(selected_factors[:,0], selected_factors[:,1],\
            c=SD_input, cmap='Reds_r', edgecolor='grey', 
            alpha=.6, s= 100, linewidth=.5)
ax.set_xlim([.5, 2])
ax.set_ylim([.9,1.1])
ax.set_xlabel('Demand Growth Multiplier')
ax.set_ylabel('Restriction Eff. Multiplier')

References

Gold, D. F., Reed, P. M., Gorelick, D. E., & Characklis, G. W. (2022). Power and Pathways: Exploring Robustness, Cooperative Stability, and Power Relationships in Regional Infrastructure Investment and Water Supply Management Portfolio Pathways. Earth’s Future, 10(2), e2021EF002472.

Groves, D. G., & Lempert, R. J. (2007). A new analytic method for finding policy-relevant scenarios. Global Environmental Change, 17(1), 73-85.

Kwakkel, J. H., Walker, W. E., & Haasnoot, M. (2016). Coping with the wickedness of public policy problems: approaches for decision making under deep uncertainty. Journal of Water Resources Planning and Management, 142(3), 01816001.

O’Neill, B. C., Kriegler, E., Riahi, K., Ebi, K. L., Hallegatte, S., Carter, T. R., … & van Vuuren, D. P. (2014). A new scenario framework for climate change research: the concept of shared socioeconomic pathways. Climatic change, 122(3), 387-400.

Reed, P.M., Hadjimichael, A., Malek, K., Karimi, T., Vernon, C.R., Srikrishnan, V., Gupta, R.S., Gold, D.F., Lee, B., Keller, K., Thurber, T.B., & Rice, J.S. (2022). Addressing Uncertainty in Multisector Dynamics Research [Book]. Zenodo. https://doi.org/10.5281/zenodo.6110623

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772.

Easy batch parallelization of code in any language using mpi4py

The simplest form of parallel computing is what’s known as “embarrassingly” parallel processes. These processes involve fully independent runs of a model or script where little or no communication is needed across parallel processes. A common example is Monte Carlo evaluation, when we run a model over an ensemble of inputs. To parallelize an embarrassingly parallel application we simply need to send a set of commands to the cluster telling it to run each sample on a different core (or set of cores). For small applications, this can be done by submitting each run individually. For larger applications, SLURM Job Arrays (which are nicely detailed in Antonia’s post, here) can efficiently batch large number of function calls to independent computing cores. While this method is efficient and effective, I find it sometimes can be hard to keep track of, as you may be submitting tens or hundreds of jobs at a time. An alternative approach to submitting embarrassingly parallel tasks is to utilize MPI with Python to dispatch and organize jobs.

I like the MPI / Python combo because it consolidates all parallel applications into a single job, meaning you have one job to keep track of on a cluster at a time, and one output file generated by the batch set of model runs. I also find Python slightly easier to edit and debug than Bash scripts (which are used to create job arrays). Additionally, it’s very easy to assign each computing core a set of function evaluations to run (this can also be done with Job arrays, but again, I find Python easier to work with). Though Python is the language used to coordinate parallel tasks, we can use it to parallelize code in any language, as I’ll demonstrate below.

In this post I’ll first provide some background on MPI and its Python implementation, mpi4py. Next I’ll provide an example I’ve developed to demonstrate how to batch run a Matlab code on a cluster. The examples presented here are derived from some of Bernardo’s code in his post on Parallel programming in C/C++, which you can find here.

A very light introduction to MPI

MPI stands for “Message Passing Interface” and is the standard library for distributed memory parallelization (for background, see this post). To understand how MPI works, it’s helpful to define some of it’s basic components.

  1. Tasks: I’ll use the term task to define a processor (or group of processors) assigned to perform a specific set of instructions. These instructions may by a single evaluation of a function, or a set of function evaluations
  2. Communicators: A communicator is a group of MPI task units that are permitted to communicate with each other. In advanced MPI applications you may have multiple communicators, but for embarrassingly parallel applications we’ll only use one. The default communicator is called “MPI_COMM_WORLD” (I don’t know why, if anyone does please feel free to share in the comments), and that’s what I’ll work with here.
  3. Ranks: Each MPI task is assigned a unique identifier within the communicator called a rank. The processors running each task can access their own rank number, which will play an important role in how we use MPI for embarrassingly parallel applications.

A example schematic of the MPI_COMM_WORLD communicator with six tasks and their associated ranks is shown below.

mpi4py

MPI is implemented in Python with the mpi4py library. When we run an MPI code on a cluster, MPI creates the communicator and assigns each task a rank, then each task unit independently load the script. The processor/s associated with a task can then access their own unique rank.

The following snip of code loads this library, accesses the communicator and stores the rank of the given process:

# load the mpi4py library
from mpi4py import MPI

# access the MPI COMM WORLD communicator and assign it to a variable
comm = MPI.COMM_WORLD

# get the rank of the current process (different for each process on the cluster)
rank = comm.Get_rank()

Example of using mpi4py to batch parallel jobs

Here, I’ll parallelize the submission of a Matlab script called demoScript.m. This script reads an input file from a specific file location and prints out the contents of that file. For example purposes I’ve created 20 input files, each in their own folders. The folders are called “input_sample_0”, “input_sample_1” etc.. Each input_sample folder contains a file called “sample_data.txt”, which contains one line of text reading: “This is data for run <sample_number>”.

All code for this example can be found on Github, here: https://github.com/davidfgold/mpi4py_blog.git

Batching runs of demoScript.m process involves three components:

  1. Write demoScript.m so that it reads the sample number from the input.
  2. Write a Python script that will use mpi4py to distribute calls of demoScript.m. Here I’ll call this script “callDemoScript.py”
  3. Write a Bash script that sets up your MPI run and calls the Python function. Here I’ll call this script “submitDemoScript.sh”

1. demoScript.m

The demo Matlab script is found below. It reads in two arguments that are called from the command line. The first argument is the rank, which will vary for each task, and the second is the sample number, which will specify which input folder to read from.

%%%%%%%%%%%%%%%%%%%%
% demoScript.m
%
% reads an input file from a given sample number (specified via command line)
% prints output from the sample file associated with the sample number
% also prints the rank for demonstration purposes
%%%%%%%%%%%%%%%%%%%%

% read in command line input
arg_list = argv();
rank = arg_list{1,1}; % rank is the first argument
sample = arg_list{2, 1}; % sample number is the second argument

% Create a string that contains the location of the proper sample directory
sample_out = fileread(strcat("input_sample_", sample, "/sample_data.txt"));

% create a string to print the rank number
rank_call = strcat("This is rank_", rank, ", recieving the following input: \n");

% format the output and print
output = strcat(rank_call, sample_out);
fprintf(output)

2. callDemoScript.py

The second component is a Python script that uses mpi4py to call demoScript.m many times across different tasks. Each task will run a number of samples equal to a variable called “N_SAMPLES_PER_TASK” which will be fed to this script when it is called.

'''
callDemoScript.py

Called to batch demoScript.m across multiple MPI tasks

Reads in the total tasks and number of samples per task from command line.
'''
# load necessary libraries
from mpi4py import MPI
import numpy as np
import sys
import os
import time

# locate the COMM WORLD communicator
comm = MPI.COMM_WORLD

# get the number of the current rank
rank = comm.Get_rank()

# read in arguments from the submission script
TOTAL_TASKS = int(sys.argv[1]) # number of MPI processes
N_SAMPLES_PER_TASK = int(sys.argv[2]) # number of runs per/task

# loop through samples assigned to current rank
for i in range(N_SAMPLES_PER_TASK):
	sample= rank + TOTAL_TASKS * i

	# write the command that will be sent to the terminal (here RUN will replace the {})
	terminal_command = "octave-cli ./demoScript.m {} {} ".format(rank, sample)

	# write the terminal command to the process
	os.system(terminal_command)

	# sleep before submitting the next command
	time.sleep(1) # optional, for memory intensive submissions

comm.Barrier()

submitDemoscript.sh

The final component is a Bash script that will send this MPI job to the cluster. Here I’ll use SLURM to create 4 MPI tasks across 2 Nodes (each node will have 2 associated task). This will create a total of 4 MPI tasks, and each task will be assigned 5 samples to run.

I wrote this for a local cluster at Cornell, note that I had to load two modules to run Python and a third to run Octave (which is used to call Matlab scripts on Linux). I’ll call the Python script with mpirun, and then specify the total number of MPI tasks before making the function call. The output of the script is printed to a text file called demoOutput.txt

# Set up your parallel runs
SAMPLES_PER_TASK=5 # number of runs for each MPI task
N_NODES=2 # number of nodes
TASKS_PER_NODE=2 # number of tasks per node

TOTAL_TASKS=$(($N_NODES*$TASKS_PER_NODE)) # total number of tasks

# Submit the parallel job
#!/bin/bash
#SBATCH -n $(TOTAL_TASKS) -N $(N_NODES)
#SBATCH --time=0:01:00
#SBATCH --job-name=demoMPI4py
#SBATCH --output=output/demo.out
#SBATCH --error=output/demo.err
#SBATCH --exclusive
module load py3-mpi4py
module load py3-numpy
module load octave/6.3.0

mpirun -np $TOTAL_TASKS python3 callDemoScript.py $TOTAL_TASKS $SAMPLES_PER_TASK > demoOutput.txt

Additional resources

Putting some thought into how you design a set of parallel runs can save you a lot of time and headache. The example above has worked well for me when submitting sets of embarrassingly parallel tasks, but each application will be different, so take the time to find the procedure that works best for you. Our blog and the internet are full of resources that can help you parallelize your code, below are some suggestions:

Performing Experiments on HPC Systems

Scaling experiments: how to measure the performance of parallel code on HPC systems

Parallel processing with R on Windows

How to automate scripts on a cluster

Parallelization of C/C++ and Python on Clusters

Developing parallelised code with MPI for dummies, in C (Part 1/2)

Cornell CAC glossery on HPC terms: https://cvw.cac.cornell.edu/main/glossary

A great MPI tutorial I found online: https://mpitutorial.com/tutorials/

Make LaTeX easier with custom commands

LaTeX is a powerful tool for creating professional looking documents. Its ability to easily format mathematical equations, citations and complex figures makes LaTeX especially useful for developing peer-reviewed journal articles and scientific reports. LaTeX is highly customizable, which allows you to create documents that are not carbon copies of generic Microsoft Word templates.

Using LaTeX does have it’s drawbacks- instead of simply typing on a page, you construct the document by writing LaTeX code. Once you’ve written your code, a compiler translates it into a finished and formatted document. This can sometimes result in high overhead time for fixing bugs and managing format. But coding a document also has advantages, in addition to the vast array of existing LaTeX libraries and commands, you can create your own custom commands that speed up the writing and formatting process. Below I’ll overview the basics of creating custom LaTeX commands and provide some illustrative examples.

Commands with no arguments

If you have an equation or a complex sequence of text that you know you’ll be repeating, you can create a custom command to produce it. For example, if I’m constantly referencing the equation for an Ordinary Least Squares (OLS) estimator, I can make a new command that produces it:

\newcommand{\OLS}{$\hat{\beta}=(X^TX)^{-1}X^Ty$}

There are three parts to defining this command, as shown in the figure below:

  1. Tell LaTex you are defining a new command by specifying “\newcommand”
  2. name the command (make sure to include the backslash)
  3. Specify the output of the new command

Example LaTex code that calls the OLS command:

I can store complex terms using a predefined command: \OLS

Compiled output:

Commands with basic arguments

Single argument

You can also define commands that accept arguments. For example, if you want to make commands to assist tracking changes in a document, you can create a command that formats a section of added text so it has the color blue and is bolded:

\newcommand{\addtxt}[1]{{\color{blue} \textbf{#1}}} % Highlight text that has been added

The command defined above accepts one argument (shown as the “[1]”) and calls that argument using “#1”, as highlighted in the figure below:

Example Latex code using my command:

Demonstrating my custom commands with arguments:\addtxt{This text has be inserted into this sentence}

Example compiled output:

Multiple arguments

You can also define commands with multiple arguments, for example, you can create a template sentence that provides an update the timing of a project:

\newcommand{\projReportA}[2]{The project was planned to finish on \textbf{#1}after reviewing current progress we have determined that it will likely finish on \textbf{#2}} % insert a date for when a project was planned to be completed and when a project is likely to be completed

Here, argument #1 is the date when the project was planned on being completed, and argument #2 is the date that the project will likely be completed.

Example use of this function:

Another way you can use an argument:\\ \\
\projReportA{September $9^{th}$}{October $1^{st}$}

Example compiled output:

Commands with optional arguments

The project report command above can be modified to accept a default completion date with an option to include an updated date.

\newcommand{\progReportB}[2][September 9th]{The project was planned to finish on \textbf{#2}, after reviewing current progress, we have determined that it will likely finish on \textbf{#1}}

To create an optional argument, specify the default value of the first argument in a new set of brackets. Note that for basic Latex this only works for a single default argument, for more defaults you can use a package such as xparse.

Here’s an example using this new command with the default argument:

Here I'll will use the command without the optional argument, so it will print the default: \\
\\
\progReportB{September $9^{th}$}

This will compile to:

Here’s an example with the optional argument specified

Now I'll add the optional argument, which will be added in place of the default: \\ \\
\progReportB[October $1^{st}$]{September $9^{th}$}

This will compile to:

Concluding thoughts

These simple examples only scratch the surface of what you can do with LaTex commands. I should also note that while custom commands are useful, LaTex also contains a large suite of packages with predefined commands that can be easily imported into your document.

Helpful Latex resources:

Measuring the parallel performance of the Borg MOEA

In most applications, parallel computing is used to improve code efficiency and expand the scope of problems that can be addressed computationally (for background on parallelization, see references listed at the bottom of this post). For the Borg Many Objective Evolutionary Algorithm (MOEA) however, parallelization can also improve the quality and reliability of many objective search by enabling a multi-population search. The multi-population implementation of Borg is known as Multi-Master Borg, details can be found here. To measure the performance of Multi-Master Borg, we need to go beyond basic parallel efficiency (discussed in my last post, here), which measures the efficiency of computation but not the quality of the many objective search. In this post, I’ll discuss how we measure the performance of Multi-Master Borg using two metrics: hypervolume speedup and reliability.

Hypervolume speedup

In my last post, I discussed traditional parallel efficiency, which measures the improvement in speed and efficiency that can be achieved through parallelization. For many objective search, speed and efficiency of computation are important, but we are more interested in the speed and efficiency with which the algorithm produces high quality solutions. We often use the hypervolume metric to measure the quality of an approximation set as it captures both convergence and diversity (for a thorough explanation of hypervolume, see this post). Using hypervolume as a measure of search quality, we can then evaluate hypervolume speedup, defined as:

Hypervolume speedup = \frac{T_S^H}{T_P^H}

where T_S^H is the time it takes the serial version of the MOEA to achieve a given hypervolume threshold, and T_P^H is the time it takes the parallel implementation to reach the same threshold. Figure 1 below, adapted from Hadka and Reed, (2014), shows the hypervolume speedup across different parallel implementations of the Borg MOEA for the five objective NSGA II test problem run on 16,384 processors (in this work the parallel epsilon-NSGA II algorithm is used as a baseline rather than a serial implementation). Results from Figure 1 reveal that the Multi-Master implementations of Borg are able to reach each hypervolume threshold faster than the baseline algorithm and the master-worker implementation. For high hypervolume thresholds, the 16 and 32 Master implementations achieve the hypervolume thresholds 10 times faster than the baseline.

Figure 1: Hypervolume speedup for the five objective LRGV test problem across implementations of the Borg MOEA (epsilon NSGA-II, another algorithm) is used as the baseline here). This figure is adapted from Hadka and Reed, (2014).

Reliability

MOEAs are inherently stochastic algorithms, they are be initialized with random seeds which may speedup or slow down the efficiency of the search process. To ensure high quality Pareto approximate sets, it’s standard practice to run an MOEA across many random seeds and extract the best solutions across all seeds as the final solution set. Reliability is a measure of the probability that each seed will achieve a high quality set of solutions. Algorithms that have higher reliability allow users to run fewer random seeds which saves computational resources and speeds up the search process. Salazar et al., (2017) examined the performance of 17 configurations of Borg on the Lower Susquehanna River Basin (LSRB) for a fixed 10 hour runtime. Figure 2 shows the performance of each configuration across 50 random seeds. A configuration that is able to achieve the best hypervolume across all seeds would be represented as a blue bar that extends to the top of the plot. The algorithmic configurations are shown in the plot to the right. These results show that though configuration D, which has a high core count and low master count, achieves the best overall hypervolume, it does not do so reliably. Configuration H, which has two masters, is able to achieve nearly the same hypervolume, but has a much higher reliability. Configuration L, which has four masters, achieves a lower maximum hypervolume, but has vary little variance across random seeds.

Figure 2: Reliability of search adapted from Salazar et al., (2017). Each letter represents a different algorithmic configuration (shown right) for the many objective LSRB problem across 10 hours of runtime. The color represents the probability that each configuration was able to attain a given level of hypervolume across 50 seeds.

These results can be further examined by looking at the quality of search across its runtime. In Figure 3, Salazar et al. (2017) compare the performance of the three algorithmic configurations highlighted above (D, H and L). The hypervolume achieved by the maximum and minimum seeds are shown in the shaded areas, and the median hypervolume is shown with each line. Figure 3 clearly demonstrates how improved reliability can enhance search. Though the Multi-Master implementation is able to perform fewer function evaluations in the 10 hour runtime, it has very low variance across seeds. The Master-worker implementation on the other hand achieves better performance with it’s best seed (it gets lucky), but its median performance never achieves the hypervolume of the two or four master configurations.

Figure 3: Runtime hypervolume dynamics for the LSRB problem by Salazar et al., (2017). The reduction in variance in the Multi-Master implementations of Borg demonstrate the benefits of improved reliability.

Concluding thoughts

The two measures introduced above allow us to quantify the benefits of parallelizing the Multi-Master Borg MOEA. The improvements to search quality not only allow us to reduce the time and resources that we need to expend on many objective search, but may also allow us to discover solutions that would be missed by the serial or Master-Worker implementations of the algorithm. In many objective optimization contexts, this improvement may fundamentally alter our understanding of what is possible in a challenging environmental optimization problems.

Parallel computing resources

References

Hadka, D., & Reed, P. (2015). Large-scale parallelization of the Borg multiobjective evolutionary algorithm to enhance the management of complex environmental systems. Environmental Modelling & Software, 69, 353-369.

Salazar, J. Z., Reed, P. M., Quinn, J. D., Giuliani, M., & Castelletti, A. (2017). Balancing exploration, uncertainty and computational demands in many objective reservoir optimization. Advances in water resources, 109, 196-210.