Plotting change on maps

or how to replicate the New York Times presidential election shift map

This week’s blogpost is a visualization demo replicating a popular map from last year. The map below shows the shift in voter margin between the 2016 and 2020 Presidential Elections for the two major political parties in the United States. The direction and color of each arrow indicate the party, and the length of the arrow indicates the size of the shift. This type of figure can be useful in visualizing many types of spatially distributed changes (e.g., population change in a city, change in GDP per capita, losses and gains). This blogpost shows how to replicate it in Python using commonly used packages.

Screengrab of the original graphic from the NYT website. Original can be found here: https://www.nytimes.com/interactive/2020/11/03/us/elections/results-president.html

Even though the creators of the original provide their 2020 data, their 2016 data is not available so the data I’ll be using came from the MIT Election Data and Science Lab and can be downloaded here: https://doi.org/10.7910/DVN/VOQCHQ. All the code and data to replicate my figure can be found in this repository: https://github.com/antonia-had/election_data_shift

The main packages we’ll be using for this are cartopy and matplotlib to create the map and annotate elements on it, pandas for some simple data analysis, and haversine to convert distances on the map into coordinates (which you might not need if you’re applying the code to a small spatial scale).

The first thing we do is load our packages and data. counties.csv contains the latitude and longitude for every county we’ll be plotting. countypres_2000-2020.csv contains our downloaded election data. As you can see in the code comments, I had to clean out some of the data points due to inconsistencies or errors. I’ll also only be plotting the contiguous US to simplify the exercise, but you can definitely include code to also plot Alaska and Hawaii in the same figure.

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pandas as pd
import cartopy.io.shapereader as shpreader
from haversine import inverse_haversine, Direction

# Read in county position data
pos_data = pd.read_csv('./data/counties.csv', delimiter=',', index_col=0)

# Read in county election data
# Data from https://doi.org/10.7910/DVN/VOQCHQ
# Data points without county FIPS code removed
all_election_data = pd.read_csv('./data/countypres_2000-2020.csv')
# Filter data to only keep years 2016 and 2020
# Dataset reports issues with Alaska data so filter those out too
# Missing data for 2020 for some counties
# County with FIPS code 46113 was assigned a new FIPS code (46102) which is changed in the downloaded data
mask = (all_election_data['year'] >= 2016) & \
       (all_election_data['state'] != 'ALASKA') &\
       (all_election_data['state'] != 'HAWAII') & \
       (all_election_data['county_fips'] != 11001) & \
       (all_election_data['county_fips'] != 51515) & \
       (all_election_data['county_fips'] != 36000)
election_data = all_election_data[mask].copy() # copy so new columns can be added without a SettingWithCopyWarning

Next we calculate the percentage of votes each party gained at each election and compare the results between the two elections to calculate the shift. A simplifying assumption here is that we’re only focusing on the top two parties (but you can do more, with different color arrows for example). We’re also copying the latitude and longitude of each county so everything is in one dataframe.

# Calculate vote percentage per party
election_data['percentagevote'] = election_data['candidatevotes']/election_data['totalvotes'] * 100

# Create new dataframe to store county change results
shift = election_data[['state', 'county_name', 'county_fips']].copy()
# Drop duplicate rows (original dataframe had rows for both 2016 and 2020)
shift = shift.drop_duplicates(['county_fips'])

# Create columns to store change for every party
shift['DEMOCRAT'] = 0.0
shift['REPUBLICAN'] = 0.0

# Create columns for latitude and longitude so everything is in the same dataframe
shift['lat'] = 0.0
shift['lon'] = 0.0

# Iterate through every county and estimate difference in vote share for two major parties
for index, row in shift.iterrows():
    county = row['county_fips']
    for party in ['DEMOCRAT', 'REPUBLICAN']:
        previous_result = election_data.loc[(election_data['year'] == 2016) &
                                            (election_data['county_fips'] == county) &
                                            (election_data['party'] == party)]['percentagevote'].values[0]
        new_result = election_data.loc[(election_data['year'] == 2020) &
                                       (election_data['county_fips'] == county) &
                                       (election_data['party'] == party)]['percentagevote'].values[0]
        # If any of the two results is nan assign zero change
        if pd.isna(new_result) or pd.isna(previous_result):
            shift.at[index, party] = 0
        else:
            shift.at[index, party] = new_result - previous_result
    # Combine lat and long values also so it's all in one dataframe
    shift.at[index, 'lat'] = pos_data.at[county, 'lat']
    shift.at[index, 'lon'] = pos_data.at[county, 'lon']

To create our map, we first set up a matplotlib figure with the map extent of the contiguous United States and use cartopy geometries to add the shapes of all states.

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(), frameon=False)
ax.set_extent([-120, -74, 24, 50], ccrs.PlateCarree())
# Add states shape
shapename = 'admin_1_states_provinces_lakes'
states_shp = shpreader.natural_earth(resolution='110m',
                                     category='cultural', name=shapename)
ax.add_geometries(shpreader.Reader(states_shp).geometries(), ccrs.PlateCarree(),
                  facecolor='#e5e5e5', edgecolor='white', zorder=0)

We then need to determine how the shift should be plotted in each county. A simplifying assumption here is that we’re showing the largest positive shift (i.e., if both parties lost votes we’re only showing a small grey point). There are several ways to draw an arrow at each point, depending on what you’d like to show and the complexity you’re comfortable with. The way I am showing here exploits the matplotlib annotate function, typically used to annotate a figure with text and arrows.

The way I’m going about this is a little mischievous but works: I’m only using the arrow component of the annotation with a blank text label, and I identify the point each arrow should be pointing to by using each county’s lat and long and the estimated shift. If this were a simple matplotlib figure using cartesian coordinates, calculating the end point would be simple trigonometry. Since latitude and longitude are not on a cartesian plane, we need to convert them using the haversine formula (or its inverse). It’s fairly easy to implement yourself, but since there already exists a handy Python package for it, I’m using that instead. The transform function I am using up top is necessary for matplotlib to know how to transform the points from the annotation function (typically not necessary if using, say, ax.scatter()); some explanation of why that is can be found here. The colors and all other customization are done so the figure looks as close as possible to the original.

transform = ccrs.PlateCarree()._as_mpl_transform(ax)
for index, row in shift.iterrows():
    # Determine arrow color
    dem_shift = shift.at[index, 'DEMOCRAT']
    rep_shift = shift.at[index, 'REPUBLICAN']
    # Check if both lost votes, then set arrow to grey
    if dem_shift<0 and rep_shift<0:
        arrow_color = 'grey'
        ax.scatter(shift.at[index, 'lon'], shift.at[index, 'lat'],
                   color=arrow_color, transform=ccrs.PlateCarree(),
                   s=0.5)
    # If at least one of them gained votes
    else:
        if dem_shift >= rep_shift:
            arrow_color = '#1460a8'
            direction = Direction.NORTHWEST
            change = dem_shift
        else:
            arrow_color = '#bb1d2a'
            direction = Direction.NORTHEAST
            change = rep_shift
        end_location = inverse_haversine((shift.at[index, 'lat'], shift.at[index, 'lon']), change*25, direction)[::-1]
        ax.annotate(" ", xytext=(shift.at[index, 'lon'], shift.at[index, 'lat']), xy=end_location,
                    arrowprops=dict(facecolor=arrow_color, edgecolor=arrow_color,
                                    width=0.2, headwidth=3, headlength=5),
                    xycoords=transform, zorder=1)
plt.tight_layout()
plt.savefig('electionshiftmap.png', dpi=300)

The resulting figure looks like this, which I am calling pretty close, considering the dataset differences. Tinkering with colors, widths, lengths and transforms can get you a different look if you’re after that.

Easy batch parallelization of code in any language using mpi4py

The simplest form of parallel computing is what’s known as “embarrassingly” parallel processes. These processes involve fully independent runs of a model or script, where little or no communication is needed across parallel processes. A common example is Monte Carlo evaluation, where we run a model over an ensemble of inputs. To parallelize an embarrassingly parallel application, we simply need to send a set of commands to the cluster telling it to run each sample on a different core (or set of cores). For small applications, this can be done by submitting each run individually. For larger applications, SLURM Job Arrays (which are nicely detailed in Antonia’s post, here) can efficiently batch large numbers of function calls to independent computing cores. While this method is efficient and effective, I find it can sometimes be hard to keep track of, as you may be submitting tens or hundreds of jobs at a time. An alternative approach to submitting embarrassingly parallel tasks is to utilize MPI with Python to dispatch and organize jobs.

I like the MPI / Python combo because it consolidates all parallel applications into a single job, meaning you have one job to keep track of on a cluster at a time, and one output file generated by the batch set of model runs. I also find Python slightly easier to edit and debug than Bash scripts (which are used to create job arrays). Additionally, it’s very easy to assign each computing core a set of function evaluations to run (this can also be done with Job arrays, but again, I find Python easier to work with). Though Python is the language used to coordinate parallel tasks, we can use it to parallelize code in any language, as I’ll demonstrate below.

In this post I’ll first provide some background on MPI and its Python implementation, mpi4py. Next I’ll provide an example I’ve developed to demonstrate how to batch run a Matlab code on a cluster. The examples presented here are derived from some of Bernardo’s code in his post on Parallel programming in C/C++, which you can find here.

A very light introduction to MPI

MPI stands for “Message Passing Interface” and is the standard library for distributed memory parallelization (for background, see this post). To understand how MPI works, it’s helpful to define some of its basic components.

  1. Tasks: I’ll use the term task to define a processor (or group of processors) assigned to perform a specific set of instructions. These instructions may be a single evaluation of a function, or a set of function evaluations
  2. Communicators: A communicator is a group of MPI task units that are permitted to communicate with each other. In advanced MPI applications you may have multiple communicators, but for embarrassingly parallel applications we’ll only use one. The default communicator is called “MPI_COMM_WORLD” (I don’t know why, if anyone does please feel free to share in the comments), and that’s what I’ll work with here.
  3. Ranks: Each MPI task is assigned a unique identifier within the communicator called a rank. The processors running each task can access their own rank number, which will play an important role in how we use MPI for embarrassingly parallel applications.

An example schematic of the MPI_COMM_WORLD communicator with six tasks and their associated ranks is shown below.

mpi4py

MPI is implemented in Python with the mpi4py library. When we run an MPI code on a cluster, MPI creates the communicator and assigns each task a rank; each task then independently loads the script. The processor(s) associated with a task can then access their own unique rank.

The following snippet of code loads this library, accesses the communicator and stores the rank of the given process:

# load the mpi4py library
from mpi4py import MPI

# access the MPI COMM WORLD communicator and assign it to a variable
comm = MPI.COMM_WORLD

# get the rank of the current process (different for each process on the cluster)
rank = comm.Get_rank()
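
For instance, if you save the snippet above as print_rank.py (a hypothetical file name), add the print statement below, and launch it with mpirun -np 4 python3 print_rank.py, each of the four processes will independently report its own rank:

# each process executes the script independently and prints its own rank
print("This is rank {}".format(rank))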

Example of using mpi4py to batch parallel jobs

Here, I’ll parallelize the submission of a Matlab script called demoScript.m. This script reads an input file from a specific file location and prints out the contents of that file. For example purposes I’ve created 20 input files, each in its own folder. The folders are called “input_sample_0”, “input_sample_1”, etc. Each input_sample folder contains a file called “sample_data.txt”, which contains one line of text reading: “This is data for run <sample_number>”.
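
If you’d like to recreate these inputs yourself rather than pulling them from the repository, a short sketch like the following will generate them (the folder and file names match the description above):

import os

# create 20 input folders, each containing a one-line sample data file
for i in range(20):
    folder = "input_sample_{}".format(i)
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "sample_data.txt"), "w") as f:
        f.write("This is data for run {}".format(i))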

All code for this example can be found on Github, here: https://github.com/davidfgold/mpi4py_blog.git

Batching runs of demoScript.m involves three components:

  1. Write demoScript.m so that it reads the sample number from the input.
  2. Write a Python script that will use mpi4py to distribute calls of demoScript.m. Here I’ll call this script “callDemoScript.py”
  3. Write a Bash script that sets up your MPI run and calls the Python function. Here I’ll call this script “submitDemoScript.sh”

1. demoScript.m

The demo Matlab script is found below. It reads in two arguments that are called from the command line. The first argument is the rank, which will vary for each task, and the second is the sample number, which will specify which input folder to read from.

%%%%%%%%%%%%%%%%%%%%
% demoScript.m
%
% reads an input file from a given sample number (specified via command line)
% prints output from the sample file associated with the sample number
% also prints the rank for demonstration purposes
%%%%%%%%%%%%%%%%%%%%

% read in command line input
arg_list = argv();
rank = arg_list{1,1}; % rank is the first argument
sample = arg_list{2, 1}; % sample number is the second argument

% Create a string that contains the location of the proper sample directory
sample_out = fileread(strcat("input_sample_", sample, "/sample_data.txt"));

% create a string to print the rank number
rank_call = strcat("This is rank_", rank, ", receiving the following input: \n");

% format the output and print
output = strcat(rank_call, sample_out);
fprintf(output)

2. callDemoScript.py

The second component is a Python script that uses mpi4py to call demoScript.m many times across different tasks. Each task will run a number of samples equal to a variable called “N_SAMPLES_PER_TASK” which will be fed to this script when it is called.

'''
callDemoScript.py

Called to batch demoScript.m across multiple MPI tasks

Reads in the total tasks and number of samples per task from command line.
'''
# load necessary libraries
from mpi4py import MPI
import numpy as np
import sys
import os
import time

# locate the COMM WORLD communicator
comm = MPI.COMM_WORLD

# get the number of the current rank
rank = comm.Get_rank()

# read in arguments from the submission script
TOTAL_TASKS = int(sys.argv[1]) # number of MPI processes
N_SAMPLES_PER_TASK = int(sys.argv[2]) # number of runs per task

# loop through samples assigned to current rank
for i in range(N_SAMPLES_PER_TASK):
	# each rank runs samples rank, rank + TOTAL_TASKS, rank + 2*TOTAL_TASKS, ...
	sample = rank + TOTAL_TASKS * i

	# write the command that will be sent to the terminal (rank and sample fill in the {} placeholders)
	terminal_command = "octave-cli ./demoScript.m {} {} ".format(rank, sample)

	# write the terminal command to the process
	os.system(terminal_command)

	# sleep before submitting the next command
	time.sleep(1) # optional, for memory intensive submissions

comm.Barrier()

3. submitDemoScript.sh

The final component is a Bash script that will send this MPI job to the cluster. Here I’ll use SLURM to create 4 MPI tasks across 2 nodes (each node will have 2 associated tasks), and each task will be assigned 5 samples to run.

I wrote this for a local cluster at Cornell; note that I had to load two modules to run Python and a third to run Octave (which is used to run Matlab scripts on Linux). I call the Python script with mpirun, specifying the total number of MPI tasks before making the function call. The output of the script is printed to a text file called demoOutput.txt

#!/bin/bash
# SLURM directives must appear at the top of the script, before any other
# commands, and cannot use shell variables, so the task and node counts are
# hardcoded here (4 tasks across 2 nodes)
#SBATCH -n 4 -N 2
#SBATCH --time=0:01:00
#SBATCH --job-name=demoMPI4py
#SBATCH --output=output/demo.out
#SBATCH --error=output/demo.err
#SBATCH --exclusive

# Set up your parallel runs
SAMPLES_PER_TASK=5 # number of runs for each MPI task
N_NODES=2 # number of nodes
TASKS_PER_NODE=2 # number of tasks per node

TOTAL_TASKS=$(($N_NODES*$TASKS_PER_NODE)) # total number of tasks

# load modules for Python (with mpi4py and numpy) and Octave
module load py3-mpi4py
module load py3-numpy
module load octave/6.3.0

# Submit the parallel job
mpirun -np $TOTAL_TASKS python3 callDemoScript.py $TOTAL_TASKS $SAMPLES_PER_TASK > demoOutput.txt

Additional resources

Putting some thought into how you design a set of parallel runs can save you a lot of time and headache. The example above has worked well for me when submitting sets of embarrassingly parallel tasks, but each application will be different, so take the time to find the procedure that works best for you. Our blog and the internet are full of resources that can help you parallelize your code; below are some suggestions:

Performing Experiments on HPC Systems

Scaling experiments: how to measure the performance of parallel code on HPC systems

Parallel processing with R on Windows

How to automate scripts on a cluster

Parallelization of C/C++ and Python on Clusters

Developing parallelised code with MPI for dummies, in C (Part 1/2)

Cornell CAC glossary on HPC terms: https://cvw.cac.cornell.edu/main/glossary

A great MPI tutorial I found online: https://mpitutorial.com/tutorials/

MORDM Basics VI: Processing the output and reevaluating for robustness

In the previous post, we conducted a basic WaterPaths tutorial in which we ran a simulation-optimization of the North Carolina Research Triangle test case (Trindade et al., 2019) across 1000 possible futures, or states of the world (SOWs). To briefly recap, the Research Triangle test case consists of three water utilities in Cary (C), Durham (D) and Raleigh (R). Cary is the main supplier, having a water treatment plant of its own. Durham and Raleigh purchase water from Cary via treated transfers.

Having obtained the .set file containing the Pareto-optimal decision variables and their respective performance objective values, we will now walk through the .set file processing and visualize the decision variables and performance objective space.

Understanding the .set file

First, it is important to make sense of the .set file. The NC_output_MS_S3_N1000.set file should have between 30 and 80 rows, and a total of 35 columns. The first 20 columns contain values of the decision variables. Note that only Durham and Raleigh have water transfer triggers, as they purchase treated water from Cary.

  1. Restriction trigger, RT (C, D, R)
  2. Transfer trigger, TT (D, R)
  3. Jordan Lake allocation, JLA (C, D, R)
  4. Reserve fund contribution as a percentage of total annual revenue, RC (C, D, R)
  5. Insurance trigger, IT (C, D, R)
  6. Insurance payments as a percentage of total annual revenue, IP (C, D, R)
  7. Infrastructure trigger, INF (C, D, R)

The last 15 columns contain the objective values for the following performance objectives of all three utilities:

  1. Reliability (REL) to be maximized
  2. Restriction frequency (RF) to be minimized
  3. Infrastructure net present cost (INF_NPC) to be minimized
  4. Peak financial cost of drought mitigation actions (PFC) to be minimized
  5. Worst-case financial cost of drought mitigation actions (WCC) to be minimized

This reference set needs to be processed to output a .csv file to enable reevaluation for robustness analysis. To do so, run the post_processing.py file found in this GitHub repository in the command line:

python post_processing.py

In addition to post-processing the optimization output files, this script also conducts a regional minimax operation, where each regional performance objective is taken to be the objective value of the worst-performing utility (Gold et al., 2019).

This should output two files:

  1. NC_refset.csv: Contains no header row. This is the file that will be used to run the re-evaluation for robustness analysis in the next blog post.
  2. NC_dvs_objs.csv: Contains a header row. This file contains the labeled decision variables and objectives, including the minimax regional performance objectives, and will be used for visualizing the reference set’s decision variables and performance objectives.

Visualizing the reference set

Due to the high number of decision variables, we utilize parallel axis plots to identify how varying the decision variables can potentially affect certain performance objectives. Here, we use the regional reliability performance objective, REL, as an example. Figure 1 below demonstrates how all decision variables vary with regional reliability.

Figure 1: All decision variables for the three utilities. A darker blue indicates a higher degree of reliability.

From Figure 1, most solutions found via the optimization conducted in the previous blog post seem to have relatively high reliability across the full range of decision variable values. It is unclear how each decision variable might affect regional reliability. It is thus more helpful to identify specific sets of decision variables (or policies) that enable the achievement of reliability beyond a certain threshold.

With this in mind, we assume that all members of the Triangle require that their collective reliability be at least 98%. This results in the following figure:

Figure 2: All decision variables across the three utilities. The dark lines represent the policies that are at least 98% reliable.

Figure 2 has one clear takeaway: Pareto-optimality does not guarantee satisfactory performance. In addition, setting this threshold makes the effects of each decision variable clearer. It can be seen that regional reliability is significantly affected by both Durham and Raleigh’s infrastructure triggers (INF). Desirable levels of regional reliability can be achieved when Durham sets a high INF value. Conversely, Raleigh can set lower INF values and still benefit from satisfactory reliability. Figure 2 also shows that having Durham set a high insurance trigger (IT) may benefit the region in terms of reliability.

However, there are caveats. Higher INF and IT values for Durham imply that the financial burden of investment and insurance payments is being borne by Raleigh and Cary, as Durham is able to withstand more risk without having to trigger an infrastructure investment or insurance payment. This may affect how each member utility perceives its individual risks and benefits from forming a cooperative contract.

The code to plot these figures can be found under ‘refset_parallel.py’ in the repository.
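
If you’d like a sense of how such a figure can be put together without the repository code, here is a minimal sketch using pandas’ built-in parallel coordinates plot. The column name REL_regional and the assumption that the first 20 columns hold the decision variables are mine, not the repository’s; check the header row of NC_dvs_objs.csv for the actual layout.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# load the labeled reference set produced by post_processing.py
dvs_objs = pd.read_csv('NC_dvs_objs.csv')

# assumption: the first 20 columns hold the decision variables and a column
# named 'REL_regional' holds the regional reliability objective
dv_cols = list(dvs_objs.columns[:20])
data = dvs_objs[dv_cols].copy()

# normalize every axis to [0, 1] so all decision variables share one scale
data = (data - data.min()) / (data.max() - data.min())

# color each policy by whether it meets the 98% regional reliability threshold
data['class'] = ['REL >= 98%' if r >= 0.98 else 'REL < 98%'
                 for r in dvs_objs['REL_regional']]

fig, ax = plt.subplots(figsize=(14, 6))
parallel_coordinates(data, 'class', color=['#08306b', '#d3d3d3'],
                     ax=ax, linewidth=0.8)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()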

Robustness analysis and what’s next

So how is setting a threshold value of regional reliability significant?

Within the MORDM framework, robustness is defined using a multivariate satisficing metric (Gold et al., 2019). Depending on the requirements of the stakeholders, a set of criteria is defined that will then be used to distinguish between success (all criteria are met) and failure (at least one criterion is not met). Using these criteria, the set of Pareto-optimal policies is simulated across a number of uncertain SOWs. Each policy’s robustness is then represented by the percent of SOWs in which it meets the minimum performance criteria that have been set.

In this post, we processed the reference set and visualized its decision variable space with respect to each variable’s effect on the reliability performance objective. A similar process can be repeated across all utilities for all performance objectives.

Using the processed reference set, we will conduct multi-criterion robustness analysis using two criteria:

  1. Regional reliability should be at least 98%
  2. Regional restriction frequency should be less than or equal to 20%
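
As a sketch of how this satisficing metric might be computed for a single policy using the two criteria above (the array inputs are hypothetical; in practice the per-SOW regional reliability and restriction frequency would come from the re-evaluation output):

import numpy as np

def satisficing_robustness(regional_rel, regional_rf):
    """Fraction of SOWs in which a policy meets both criteria.
    regional_rel, regional_rf: 1D arrays with one value per SOW."""
    meets_criteria = (regional_rel >= 0.98) & (regional_rf <= 0.20)
    return meets_criteria.mean()

# hypothetical example with 1000 SOWs
rel = np.random.uniform(0.9, 1.0, 1000)
rf = np.random.uniform(0.0, 0.4, 1000)
print(satisficing_robustness(rel, rf))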

We will also conduct sensitivity analysis to identify the decision variables that most impact regional robustness. Finally, we will conduct scenario discovery to identify SOWs that may cause the policies to fail.

References

Gold, D. F., Reed, P. M., Trindade, B. C., & Characklis, G. W. (2019). Identifying actionable compromises: Navigating multi‐city robustness conflicts to discover cooperative safe operating spaces for regional water supply portfolios. Water Resources Research, 55(11), 9024–9050. https://doi.org/10.1029/2019wr025462

Trindade, B. C., Reed, P. M., & Characklis, G. W. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances in Water Resources, 134, 103442. https://doi.org/10.1016/j.advwatres.2019.103442

A non-intimidating introduction to parallel computing with Numba

This blog post is adapted from material I learned during the 2021 San Diego Supercomputer Center (SDSC) Summer Institute. This was an introductory boot camp to high-performance computing (HPC), and one of the modules taught the application of Numba for in-line parallelization and speeding up of Python code.

What is Numba?

According to its official web page, Numba is a just-in-time (JIT) compiler that translates subsets of Python and NumPy code into fast machine code, enabling it to run at speeds approaching those of C or Fortran. This is because JIT compilation enables specific lines of code to be compiled or activated only when necessary. Numba also makes use of cache memory to generate and store the compiled version of each data type entered to a specific function, which eliminates the need for recompilation every time the same data type is called when a function is run.

This blog post will demonstrate a simple example of using Numba and its most commonly used decorator, @jit, via Jupyter Notebook. The Binder file containing all the executable code can be found here.

Note: The '@' symbol is used to indicate the use of a decorator.

Installing Numba and Setting up the Jupyter Notebook

First, in your command prompt, enter:

pip install numba

Alternatively, you can also use:

conda install numba

Next, import Numba:

import numpy as np
import numba
from numba import jit
from numba import vectorize

Great! Now let’s move onto using the @jit decorator.

Using @jit for executing functions on the CPU

The @jit decorator works best on numerical functions that use NumPy. It has two modes: nopython mode and object mode. Setting nopython=True tells the compiler to run the entire decorated function without involving the Python interpreter. This setting leads to the best performance. However, in the case when:

  1. nopython=True fails
  2. nopython=False, or
  3. nopython is not set at all

the compiler defaults to object mode. Then, Numba will identify the loops that it can compile into machine code and will run the remaining code in the interpreter.

Here, @jit is demonstrated on a simple matrix multiplication function:

# a function that does multiple matrix multiplication
@jit(nopython=True)
def matrix_multiplication(A, x):
    b = np.empty(shape=(x.shape[0],1), dtype=np.float64)
    for i in range(x.shape[0]):
        b[i] = np.dot(A[i,:], x)
    return b

Remember – the use of @jit means that this function has not been compiled yet! Compilation only happens the first time you call the function:

A = np.random.rand(10, 10)
x = np.random.rand(10, 1)
matrix_multiplication(A, x)

But how much faster is Numba really? To find out, some benchmarking is in order. Jupyter notebooks have a handy magic command, %timeit, that runs a statement many times in a loop to get its average execution time. It can be used as follows:

%timeit matrix_multiplication(A,x)

# 11.4 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba has a special .py_func attribute that effectively allows the decorated function to run as the original uncompiled Python function. Using this to compare its runtime to that of the decorated version,

%timeit matrix_multiplication.py_func(A,x)

# 35.5 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

From here, you can see that the Numba version runs about 3 times faster than using only NumPy arrays. In addition to this, Numba also supports tuples, integers, floats, and Python lists. All other Python features supported by Numba can be found here.

Besides explicitly declaring @jit at the start of a function, Numba makes it simple to turn a NumPy function into a Numba function by applying jit(nopython=True) to the original function. This essentially uses the @jit decorator as a function. The function to calculate absolute percentage relative error demonstrates how this is done:

# Calculate percentage relative error
def numpy_re(x, true):
    return np.abs(((x - true)/true))*100

numba_re = jit(nopython=True)(numpy_re)

And we can see that the Numba version is faster:

%timeit numpy_re(x, 0.66)
%timeit numba_re(x, 0.66)

where the NumPy version takes approximately 2.61 microseconds to run, while the Numba version takes 687 nanoseconds.

Inline parallelization with Numba

The @jit decorator can also be used to enable inline parallelization by setting parallel=True. Parallelization in Numba is done via multi-threading, which essentially creates threads of code that are distributed over all the available CPU cores. An example of this can be seen in the code snippet below, describing a function that calculates the normal probability density of a set of data with a given mean and standard deviation:

SQRT_2PI = np.sqrt(2 * np.pi)

@jit(nopython=True, parallel=True)
def normals(x, means, sds):
    # vectorized normal probability density; parallel=True lets Numba
    # distribute the array operations across threads
    result = np.exp(-0.5*((x - means)/sds)**2)
    return (1 / (sds * SQRT_2PI)) * result
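
Note that parallel=True lets Numba auto-parallelize the array expressions above. For explicit control, you can also write the loop yourself with numba.prange, which splits iterations across threads. A roughly equivalent sketch, reusing the SQRT_2PI constant defined above (the function name normals_prange is mine):

import numpy as np
from numba import jit, prange

@jit(nopython=True, parallel=True)
def normals_prange(x, means, sds):
    n = means.shape[0]
    result = np.empty(n)
    # prange distributes these iterations over the available CPU cores
    for i in prange(n):
        result[i] = (1 / (sds[i] * SQRT_2PI)) * np.exp(-0.5 * ((x - means[i]) / sds[i]) ** 2)
    return result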

As usual, the function is compiled the first time it is called:

means = np.random.uniform(-1,1, size=10**8)
sds = np.random.uniform(0.1, 0.2, size=10**8)

normals(0.6, means, sds)

To appreciate the speed-up that Numba’s multi-threading provides, compare the runtime for this with:

  1. A decorated version of the function with a disabled parallel pass
  2. The uncompiled, original NumPy function

The first example can be timed by:

normals_deco_nothread = jit(nopython=True)(normals.py_func)
%timeit normals_deco_nothread(0.6, means, sds)

# 3.24 s ± 757 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The first line of the code snippet makes an uncompiled copy of the normals function, and then applies the @jit decorator to it. This effectively creates a version of normals that uses @jit, but is not multi-threaded. This run of the function took approximately 3.2 seconds.

For the second example, simply:

%timeit normals.py_func(0.6, means, sds)

# 7.38 s ± 759 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now, compare both these examples to the runtime of the decorated and multi-threaded normals function:

%timeit normals(0.6, means, sds)

# 933 ms ± 155 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The decorated, multi-threaded function is significantly faster (933 ms) than the decorated function without multi-threading (3.24 s), which in turn is faster than the uncompiled original NumPy function (7.38 s). However, the degree of speed-up may vary depending on the number of CPUs that the machine has available.

Summary

In general, the improvements achieved by using Numba on top of NumPy functions are marginal for simple, few-loop functions. Nevertheless, Numba is particularly useful for large datasets or high-dimensional arrays that require a large number of loops, and would benefit from the one-and-done compilation that it enables. For more information on using Numba, please refer to its official web page.

Introduction to PyBorg – basic setup and running

PyBorg is a new secondary implementation of Borg, written entirely in Python using the Platypus optimization library. PyBorg was developed by Andrew Dircks based on the original implementation in C, and it is intended primarily as a learning tool, as it is less efficient than the original C version (which you can still use with Python through the plugin “wrapper” also found in the package). PyBorg can be found in the same repository where the original Borg can be downloaded, for which you can request access here: http://borgmoea.org/#contact

This blogpost is intended to demonstrate this new implementation. To follow along, first you need to either clone or download the BitBucket repository after you gain access.

Setting up the required packages is easy. In your terminal, navigate to the Python directory in the repository and install all prerequisites using python setup.py install. This will install all requirements (i.e. the Platypus library, numpy, scipy and six) for you in your current environment.

You can test that everything works fine by running the optimization on the DTLZ2 test function, found in dtlz2.py. The script creates an instance of the problem (as it is already defined in the Platypus library), sets it up as a problem for Borg to optimize, and runs the algorithm for 10,000 function evaluations:

    # define a DTLZ2 problem instance from the Platypus library
    nobjs = 3
    problem = DTLZ2(nobjs)

    # define and run the Borg algorithm for 10000 evaluations
    algorithm = BorgMOEA(problem, epsilons=0.1)
    algorithm.run(10000)

A handy 3D scatter plot is also generated to show the optimization results.

The repository also comes with two other scripts: dtlz2_runtime.py and dtlz2_advanced.py.
The first demonstrates how to use the Platypus hypervolume indicator at a specified runtime frequency to learn about the algorithm's progress as it goes through function evaluations.

The latter provides more advanced functionality that allows you to define custom parameters for Borg. It also includes a function to generate runtime data from the run. Both scripts are useful for diagnosing how your algorithm is performing on any given problem.

The rest of this post is a demo of how you can use PyBorg with your own Python model and all of the above. I’ll be using a model I’ve used before, which can be found here, and I’ll formulate it so it only uses the first three objectives for the purposes of demonstration.

The first thing you need to do to optimize your problem is to define it. This is done very simply in the exact same way you’d do it on Project Platypus, using the Problem class:

from fishery import fish_game
from platypus import Problem, Real
from pyborg import BorgMOEA

# define a problem
nVars = 6
nObjs = 3 

problem = Problem(nVars, nObjs) # first input is no of decision variables, second input is no of objectives
problem.types[:] = Real(0, 1) #defines the type and bounds of each decision variable
problem.function = fish_game #defines the model function

This assumes that all decision variables are of the same type and range, but you can also define them individually using, e.g., problem.types[0].
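
For instance, to give the first decision variable its own (hypothetical) bounds:

problem.types[0] = Real(0, 0.5) # a tighter range for the first decision variable only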

Then you define the problem for the algorithm and set the number of function evaluations:

algorithm = BorgMOEA(problem, epsilons=0.001) #epsilons for each objective
algorithm.run(10000) # number of function evaluations

If you’d like to also produce a runtime file you can use the detailed_run function included in the demo (in the files referenced above), which wraps the algorithm and runs it in intervals so the progress can be monitored. You can combine it with runtime_hypervolume to also track your hypervolume indicator. To use it you need to define the total number of function evaluations, the frequency with which you’d like the progress to be monitored and the name of the output file. If you’d like to calculate the Hypervolume (you first need to import it from platypus) you also need to either provide a known reference set or define maximum and minimum values for your solutions.

maxevals = 10000
frequency = 100
output = "fishery.data"
hv = Hypervolume(minimum=[-6000, 0, 0], maximum=[0, 1, 100])

nfe, hyp = detailed_run(algorithm, maxevals, frequency, output, hv)

My full script can be found below. The detailed_run function is an edited version of the default that comes in the demo to also include the hypervolume calculation.

from fishery import fish_game
from platypus import Problem, Real, Hypervolume
from pyborg import BorgMOEA
from runtime_diagnostics import detailed_run

# define a problem
nVars = 6 # no. of decision variables to be optimized
nObjs = 3

problem = Problem(nVars, nObjs) # first input is no of decision variables, second input is no of objectives
problem.types[:] = Real(0, 1)
problem.function = fish_game

# define and run the Borg algorithm for 10000 evaluations
algorithm = BorgMOEA(problem, epsilons=0.001)
#algorithm.run(10000)

# define detailed_run parameters
maxevals = 10000
frequency = 100
output = "fishery.data"
hv = Hypervolume(minimum=[-6000, 0, 0], maximum=[0, 1, 100])

nfe, hyp = detailed_run(algorithm, maxevals, frequency, output, hv)

# plot the results using matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter([s.objectives[0] for s in algorithm.result],
           [s.objectives[1] for s in algorithm.result],
           [s.objectives[2] for s in algorithm.result])
ax.set_xlabel('Objective 1')
ax.set_ylabel('Objective 2')
ax.set_zlabel('Objective 3')
ax.scatter(-6000, 0, 0, marker="*", c='orange', s=50)
plt.show()

plt.plot(nfe, hyp)
plt.title('PyBorg Runtime Hypervolume Fish game')
plt.xlabel('Number of Function Evaluations')
plt.ylabel('Hypervolume')
plt.show()

It produces the following two figures:

MORDM Basics IV: Visualizing ROF-Storage Dynamics (finally)

The previous post described a simple, two-objective test case in which the city of Cary employed risk-of-failure (ROF) triggers as levers to adjust for its preferred tradeoff level between its objectives. The example given showed how ROF triggers allowed Cary to account for future uncertainty in its system inputs, thus enabling it to visualize how their risk appetite would affect their desired outcomes.

In meeting these objectives, different risk thresholds would have affected Cary’s response to extreme events such as floods and droughts, and its ability to fulfill demand. Simply analyzing the tradeoffs between objectives that result from a range of ROF trigger values only presents one side of the story. It is vital to visualize how these performance objectives and tradeoffs manifest in the system’s capacity (pun intended) to store enough water in times of scarcity, and by extension, its ability to fulfill its customers’ demand for water.

Using ROFs allows us to more concretely measure how the dynamics of both storage and demand fulfillment evolve and change over time for a given risk tolerance. In the long term, these dynamics will influence when and where new water infrastructure is built to cope with storage requirements and demand growth, but this is a topic for a future blog post. This week, we will focus on unpacking the dynamic evolution of storage and demand in response to different ROF trigger values.

As a quick refresher, our system is a water supply utility located in Cary, a city within the Research Triangle region in North Carolina (Trindade et al., 2017). Cary implements water-use restrictions, during which only 50% of demand is met, when a weekly ROF exceeds the threshold of risk that Cary is willing to tolerate (α). More frequent water use restrictions help to maintain the reservoir levels and ensure reliability, as defined in the previous blog post. However, the decision to implement restrictions (or not) will impact the storage levels of the system. With this in mind, we will first examine how storage responds to the triggering of a water use restriction. For context, we consider a scenario in which Cary’s inflow timeseries is only 20% of the original levels. Figure 1 below shows the inflow, demand and storage timeseries for this scenario.

Figure 1: The hydrologic timeseries for Cary given that no water restrictions are implemented in a scenario where inflows are 20% of the original levels.

Cary’s challenge becomes apparent in Figure 1. While inflow decreases over time (fewer peaks), demand is steadily growing and has effectively tripled by the end of the period. This results in periods during which storage levels drop to zero, which occurs once past 2040. Also note that the frequency of low storage peaks has increased in the second half of the period. The following questions can thus be posed:

  1. How does the system’s ROF change with increasing demand and decreasing supply?
  2. How does risk tolerance affect the implementation of water-use restrictions during drought?
  3. How will the system reservoir levels respond to different levels of risk tolerance?
Figure 2: The length of the pink bars denotes the nth week during which the first water use restriction was implemented for a given α-value. This is an indicator of the responsiveness of the system to a drought, or decrease in storage levels. The blue line indicates the percent of storage filled with water.

To answer the first question, it is useful to identify how different values of α affect the first instance of a water-use restriction. Figure 2, generated using ‘rof_dynamics.py‘, demonstrates that lower risk tolerances result in earlier implementations of restrictions. This is reasonable, as an actor who is more risk-averse will quickly implement water-use restrictions to maintain reliable levels of storage during a drought. However, an actor who is more willing to tolerate the chance of low reservoir levels will delay implementing water use restrictions. The blue line juxtaposed on top of the bars indicates the inflows to the reservoir. After the first period of low flows between weeks 30-40, the plot shows that the amount of inflow does not recover, and is likely insufficient to fill the reservoir to initial levels. With a lower α, an actor is more likely to implement restrictions almost immediately after observing merely a few weeks of low inflows. In contrast, an actor who opts for a higher α will only resort to restrictions after seeing an extended period of low flows, during which they can be more certain that restrictions are absolutely necessary.

Answering the second and third questions first requires that periods of drought be more definitively quantified. To do this, the standardized streamflow indicator (SSI6) was used. The SSI6 is a method that identifies time periods during which the standardized inflow is less than the 6-month rolling mean (Herman et al., 2016). It detects a drought period when the value of the SSI6 is below 0 for three consecutive months and SSI6 < -1 at least once during the three-month period. The juxtaposition of storage-restriction dynamics and the periods of drought will allow us to see where restrictions were imposed and their impacts on reservoir levels for a given demand timeseries.
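
A rough sketch of this detection logic is shown below. This is an illustrative simplification, assuming monthly inflow totals in a pandas Series and an empirical-percentile standardization; Herman et al. (2016) describe the full procedure.

import numpy as np
import pandas as pd
from scipy import stats

def ssi6(monthly_inflows):
    """Standardize the 6-month rolling inflow total to a normal score."""
    rolling = monthly_inflows.rolling(6).sum().dropna()
    # empirical percentiles mapped onto the standard normal distribution
    percentiles = rolling.rank() / (len(rolling) + 1)
    return pd.Series(stats.norm.ppf(percentiles), index=rolling.index)

def drought_flags(ssi):
    """Flag months in drought: SSI6 < 0 for three consecutive months,
    with SSI6 < -1 at least once within that window."""
    in_drought = np.zeros(len(ssi), dtype=bool)
    for i in range(len(ssi) - 2):
        window = ssi.iloc[i:i + 3]
        if (window < 0).all() and (window < -1).any():
            in_drought[i:i + 3] = True
    return pd.Series(in_drought, index=ssi.index)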

Figures 3 and 4 visualize how the system’s storage levels respond to drought (the red bars in the lower subplot) by implementing water-use restrictions (the dark red lines in the upper subplot), given α = 1% and α = 15% respectively. Predictably, restrictions coincide with periods of drought as defined by the SSI6. However, with a lower risk tolerance, periods of restrictions are longer and more frequent. As Figure 3 shows, an actor with a lower risk tolerance may implement restrictions where only a slight risk of failure exists.

Figure 3: Storage dynamics given α=1%. (Upper subplot) The blue lines indicate the reservoir storage levels in billion gallons per week. The yellow lines are the weekly ROF values, or the likelihood that the percent of water stored will drop below 20% of the reservoir levels. The grey lines indicate where water use restrictions are implemented, and the red dashed line denotes α=1%. (Lower subplot) The zones where droughts were detected using the SSI6 method (Herman et al., 2016) are highlighted in red.

Compared to α = 1%, an actor who is willing to tolerate higher ROF values (Figure 4, for example) will implement restrictions less frequently and for shorter periods of time. Although this means that demand is less likely to be disrupted, it also puts water supplies at a higher risk of dropping to critical levels (<20%), as restrictions may not be implemented even during times of drought.

Figure 4: Storage dynamics given α=15%. (Upper subplot) The blue lines indicate the reservoir storage levels in billion gallons per week. The yellow lines are the weekly ROF values, or the likelihood that the percent of water stored will drop below 20% of the reservoir levels. The grey lines indicate where water use restrictions are implemented, and the red dashed line denotes α=15%. (Lower subplot) The zones where droughts were detected using the SSI6 method (Herman et al., 2016) are highlighted in red.

There is one important thing to note when comparing Figures 3 and 4. When the periods of water use restrictions coincide for both α-values (between 2040 and 2050), the actor with the lower tolerance implements water use restrictions at the beginning of both drought periods. This decision makes the biggest difference in terms of the reservoir storage levels. By implementing water use restrictions early and for a longer period of time, Cary’s reservoir levels are consistently kept above 50% of full capacity (given a full capacity of 7.45 BG). The actor with the higher risk tolerance ends up with water levels that drop below 30% of full capacity during periods of drought.

Although this seems undesirable, recall that the system is said to have failed if the storage drops below 20% of full capacity. Herein lies the power of using an ROF metric – questions 2 and 3 can be answered by generating storage-restriction response figures as shown above, which allow an actor to examine the consequences of varying levels of risk tolerance on their ability to fulfill demand while maintaining sufficient water levels. This ability can improve judgement on how much risk a utility can actually tolerate without adversely impacting the socioeconomic aspects of the systems dependent on a water supply utility. In addition, using ROFs enables a utility to better estimate when new infrastructure really needs to be built, instead of making premature investments as a result of unwarranted risk aversion.

To briefly summarize this blog post, we have shown how different risk tolerance levels affect the decisions made by an actor, and how these decisions in turn impact the system. Not shown here is the ability of an ROF to evolve over time given climate change and the building of new water supply infrastructure. In the next blog post, we will briefly discuss the role of ROFs in mapping out adaptation pathways for a utility, how ROFs form the basis of a dynamic and adaptive pathway and their associated operation policies, and connect this to the concept of the soft path (Gleick, 2002) in water supply management.

References

Gleick, P., 2002. Water management: Soft water paths. Nature, 418(6896), pp.373-373.

Herman, J., Zeff, H., Lamontagne, J., Reed, P. and Characklis, G., 2016. Synthetic Drought Scenario Generation to Support Bottom-Up Water Supply Vulnerability Assessments. Journal of Water Resources Planning and Management, 142(11), p.04016050.

Trindade, B., Reed, P., Herman, J., Zeff, H. and Characklis, G., 2017. Reducing regional drought vulnerabilities and multi-city robustness conflicts using many-objective optimization under deep uncertainty. Advances in Water Resources, 104, pp.195-209.

MORDM Basics III: ROF Triggers and Performance Objective Tradeoffs

We recently covered an introduction to the concept of risk of failure (ROF), ROF triggers and ROF table generation. To provide a brief recap, an ROF is the probability that the system will fail to meet its performance objectives, whereas an ROF trigger is a measure of the amount of risk that a stakeholder is willing to take before initiating mitigating or preventive action. We also discussed the computational drawbacks of iteratively evaluating the ROF for each hydrologic scenario, and generated ROF tables as a way to circumvent those drawbacks.

In this post, we will explore the use of ROF metrics and triggers as levers to adjust for preferred levels of tradeoffs between two performance objectives. Once again, we will revisit Cary, a city located in the Research Triangle region of North Carolina whose stakeholders would like to develop a robust water management policy.

To clarify, we will be generating ROF metrics while evaluating the performance objectives and will not be using the ROF tables generated in the previous blog post. Hence, as stated in Bernardo’s blog post, we will begin generating ROF metrics using data from the weeks immediately prior to the current week. This is because performance metrics reflect current (instead of historical) hydrologic dynamics. To use ROF tables for performance metrics, a table update must be performed. This is a step that will possibly be explored in a future methodological blog post.

The test case

The city of Cary (shown in the image below) is supplied by Jordan Lake. It has 50 years of historical streamflow and evaporation rate data, which can be found in the first 2600 columns of the data files in the GitHub repository. In addition, these files contain 45 years of synthetically-generated projected streamflow and evaporation data obtained from Cary’s stakeholders. The stakeholders also have 45 years of projected demand, and would like to use a combination of historical and synthetic streamflow and evaporation to explore how their risk tolerance will affect their water utility’s performance over 45 years.

Cary is located in the red box shown in the figure above
(source: Trindade et. al., 2019).

Performance objectives

Two performance objectives have been identified as measures of Cary’s water utility’s performance:

Maximize reliability: Cary’s stakeholders would like to maximize the reliability of the city’s water supply. They have defined failure as at least one event in which the Jordan Lake reservoir levels drop below 20% of full capacity in a year. Reliability is calculated as the following:

Reliability = 1 – (Total number of failures over all realizations/Total number of realizations)

Minimize water use restrictions: Water use restrictions are triggered every time the ROF for the current week exceeds the ROF trigger (or threshold) that has been set by Cary’s stakeholders. Since water use restrictions have negative political and social implications, the average water use restriction frequency per realization should be minimized and is calculated as follows:

Average water use restriction frequency = Total number of restrictions per realization per year / Total number of realizations

Visualizing tradeoffs

Here, we will begin with a moderate scenario in which the Jordan Lake reservoir is 40% full. We will examine the response of average reliability and restriction frequency over 1000 realizations for varying values of the ROF trigger.

Since the risk tolerance of a stakeholder will affect how often they choose to implement water use restrictions, this will, by extension, affect the volume of storage in the reservoir. Intuitively, a more risk-averse stakeholder would choose to prioritize supply reliability (i.e., consistent reservoir storage levels), resulting in them requiring more frequent water use restrictions. The converse is also true.

The code to generate this tradeoff plot can be found under tradeoff.py in the GitHub folder. This Python script contains the following helper functions:

  1. check_failure: Checks if current storage levels are below 20% of full reservoir capacity.
  2. rof_evaluation: Evaluates the weekly ROF metrics for current demands, streamflows, and evaporation rates.
  3. restriction_check: Checks if the current weekly ROF exceeds the threshold set by the stakeholder.
  4. storage_r: Calculates the storage based on the ROF metrics. If a restriction is triggered, only 90% of total weekly demands are met for the smaller of either the next 4 weeks (one month of water use restrictions) or the remaining weeks of the realization.
  5. reliability_rf_check: Checks the reliability and the restriction frequency over all realizations for a given ROF trigger.
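
As a rough sketch of what two of these helpers might look like (hypothetical signatures and array shapes; see tradeoff.py in the repository for the actual implementations):

import numpy as np

FULL_CAPACITY = 7.45  # full reservoir capacity in billion gallons (BG)

def check_failure(storage):
    """Return True if storage is below 20% of full reservoir capacity."""
    return storage < 0.2 * FULL_CAPACITY

def reliability_rf_check(storage_all, restrictions_all, n_years=45):
    """Reliability and average restriction frequency over all realizations.
    storage_all, restrictions_all: arrays of shape (realizations, weeks)."""
    n_realizations = storage_all.shape[0]
    # a realization fails if storage ever drops below the failure threshold
    failures = np.any(storage_all < 0.2 * FULL_CAPACITY, axis=1)
    reliability = 1 - failures.sum() / n_realizations
    avg_restriction_freq = restrictions_all.sum() / (n_realizations * n_years)
    return reliability, avg_restriction_freq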

Send help – what is going on here?

Picture yourself as Cary. Knowing that you cannot take certain actions without adversely affecting the performance of your other system objectives, you would like an intuitive, straightforward way to ‘feel around’ for your risk tolerance. More traditionally, this would be done by exploring different combinations of your system’s decision variables (DVs) – desired reservoir storage levels, water use restriction frequency, etc – to search for a policy that is both optimal and robust. However, this requires multiple iterations of setting and tuning these DVs.

In contrast, the use of ROF metrics is more akin to a ‘set and forget’ method, in which your risk appetite is directly reflected in the dynamic between your performance objectives. Instead of searching for specific (ranges of) DV values that map to a desired policy, ROF metrics allow you to explore the objective tradeoffs by setting a threshold of acceptable risk. There are a couple of conveniences that this affords you.

Firstly, the number of DVs can be reduced. The examples of DVs given previously simply become system inputs, and ROF trigger values instead become your DVs, with each ROF trigger a reflection of the risk threshold that an objective should be able to tolerate. Consequently, this allows a closed-loop system to be approximated. Once an ROF trigger is activated, a particular action is taken, which affects the performance of the system in future timesteps. These ‘affected’ system states then become the input to the next timestep, which will be used to evaluate the system performance and whether an ROF trigger should be activated.

An example to clear the air

The closed-loop approximation of Cary’s water supply system.

In the Python code referenced above, there is only one DV – the ROF trigger for water use restrictions. If the ROF for the current week exceeds this threshold, Cary implements water use restrictions for the next 30 days. This in turn will impact the reservoir storage levels, maintaining a higher volume of water in Jordan Lake and improving future water supply reliability. More frequent water restrictions imply a higher reliability, and vice versa. Changing the ROF trigger value can be thought of as a dial that changes the degree of tradeoff between your performance objectives (Gold et. al., 2019). The figure above summarizes this process.

This process also allows ROF triggers to account for future uncertainty in the system inputs (storage levels, streamflow, demand growth rates and evaporation rates) using present and historical observations of the data. This is particularly useful when making decisions under deep uncertainty (Marchau et. al., 2019) where the uncertainty in the system inputs and internal variability can be difficult to characterize. Weekly ROFs dynamically change to reflect a posteriori system variations, enabling adaptivity and preventing the decision lock-in (Haasnoot et. al., 2013) characteristic of more a priori methods of decision-making.

Summary

Here we have shown how setting different ROF triggers can affect a system’s performance objective tradeoffs. In simpler terms, a stakeholder with a certain policy preference can set an ROF trigger value that results in their desired outcomes. Using ROF triggers also allows stakeholders the ease and flexibility to explore a range of risk tolerance levels through simulations, and discover vulnerabilities (and even opportunities) that they may have previously not been privy to.

Coming up next, we will cover how ROF triggers can be used to approximate a closed-loop system by examining the changing storage dynamics under a range of ROF trigger values. To do this, we will generate inflow and storage time series, and examine where water use restrictions were activated under different ROF trigger values. These figures will also be used to indicate the effect of ROF triggers on a utility’s drought response.

References

Gold, D. F., Reed, P. M., Trindade, B. C., & Characklis, G. W. (2019). Identifying actionable compromises: Navigating multi‐city robustness conflicts to discover cooperative safe operating spaces for regional water supply portfolios. Water Resources Research, 55(11), 9024-9050. doi:10.1029/2019wr025462

Haasnoot, M., Kwakkel, J. H., Walker, W. E., & Ter Maat, J. (2013). Dynamic adaptive policy pathways: A method for crafting robust decisions for a deeply uncertain world. Global Environmental Change, 23(2), 485-498. doi:10.1016/j.gloenvcha.2012.12.006

Marchau, V. A. W. J., Walker, W. E., Bloemen, P. J. T. M., & Popper, S. W. (2019). Decision making under deep uncertainty: From theory to practice. Cham, Switzerland: Springer.

Trindade, B., Reed, P., & Characklis, G. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances in Water Resources, 134, 103442. doi:10.1016/j.advwatres.2019.103442

Automate remote tasks with Paramiko

This is a short blogpost to demonstrate the Paramiko Python package. Paramiko allows you to establish SSH, SCP or SFTP connections within Python scripts, which is handy when you'd like to automate repetitive tasks on a remote server or cluster from your local machine or another cluster you're running from.

It is often used for server management tasks, but for research applications you could consider situations where we have a large dataset stored at a remote location and are executing a script that needs to transfer some of that data depending on results or new information. Instead of manually establishing SSH or SFTP connections, those processes could be wrapped and automated within your existing Python script.

To begin a connection, all you need is a couple lines:

import paramiko

ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect(hostname='remotehost', username='yourusername', password='yourpassword')

The first line creates a paramiko SSH client object. The second line tells paramiko what to do if the host is not a known host (i.e., whether this host should be trusted or not)—think of when you’re setting up an SSH connection for the first time and get the message:

The authenticity of host ‘name’ can’t be established. RSA key fingerprint is ‘gibberish’. Are you sure you want to continue connecting (yes/no)?

The third line is what makes the connection; the hostname, username and password are usually the only things you need to define.

Once a connection is established, commands can be executed with exec_command(), which creates three objects:

stdin, stdout, stderr = ssh_client.exec_command("ls")

stdin is a write-only file which can be used for commands requiring input, stdout contains the output of the command, and stderr contains any errors produced by the command; if there are no errors it will be empty.

To print out what's returned by the command, you can use stdout.readlines(). To pass input to stdin, use its write() function:

stdin, stdout, stderr = ssh_client.exec_command("sudo ls")
stdin.write('password\n')
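For example, to echo a command's output locally (a small illustrative snippet, not from the original post):

# print each line returned by the remote command
for line in stdout.readlines():
    print(line.strip())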

Importantly: don’t forget to close your connection, especially if this is an automated script that opens many of them: ssh_client.close().

To transfer files, you need to establish an SFTP or an SCP connection, in much the same manner:

ftp_client=ssh_client.open_sftp()
ftp_client.get('/remote/path/to/file/filename','/local/path/to/file/filename')
ftp_client.close()

get() will transfer a file to a local directory, put(), used in the same way, will transfer a file to a remote directory.
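Putting it all together, here is a hedged sketch of a complete session; the hostname, credentials and file paths below are placeholders you would replace with your own:

import paramiko

ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    ssh_client.connect(hostname='remotehost', username='yourusername',
                       password='yourpassword')
    sftp = ssh_client.open_sftp()
    # upload an input file, then retrieve a results file
    sftp.put('/local/path/to/inputs.txt', '/remote/path/to/inputs.txt')
    sftp.get('/remote/path/to/results.txt', '/local/path/to/results.txt')
    sftp.close()
finally:
    ssh_client.close()   # always close the connection, even on failure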

Writing sharable Python code part II: Unit Testing

When writing Python functions that may be used by others, it is important to demonstrate that your functions work as intended. In my last post I discussed how a proper function specification establishes a sort of contract between developers and users of functions that delineates errors in implementation (user error) from bugs in the code (developer error). While this contract provides an important chain of accountability, it does not actually contain any proof that the function works as advertised. This is where unit testing comes in. A unit test simply runs a function over a suite of test cases (sets of inputs that produce known results) to verify performance. As its name implies, a single unit test is meant to test one basic component of a code. Large or complex codes will have many sets of unit tests, each testing different elements (usually individual functions). Unit testing provides users with proof that your code works as intended, and serves as a powerful tool for finding and removing any errors in your code.

In this post I’ll provide guidance on the development of unit tests and demonstrate an example for a simple python function. Material in this post represents my interpretation of content taught by Professor Walker White at Cornell, in a Python Fundamentals course that I proctored in the summer of 2020.

An example script

To illustrate unit testing I'll examine a function called check_satisficing, which tests whether a set of performance objectives meets a set of satisficing criteria (for background on satisficing, see this post). The function returns a boolean and has three parameters:

  • objs: a numpy array containing the two objective values
  • criteria: a numpy array containing the two criteria
  • directions: a list of strings that specify whether each criterion is meant to be a lower or upper bound

Note that the actual code for this function takes only eight lines; the rest is the function specification and precondition checks.

import numpy as np

def check_satisficing(objs, criteria, directions):
    """
    Returns whether a set of objectives meets a set of satisficing criteria
    
    Value return has type bool
    
    Parameters:
        objs: objective values from a given solution
        criteria: satisficing criteria for each objective
        directions: a list of strings containing "ge" or "le" for each 
            criteria, where ge indicates satisfication for values greater than
            or equal to the criteria and le indicates satisfaction for values 
            less than or equal to the criteria
    
    
    Examples:
        objs = [.5, .5], criteria = [0.4, 0.4], directions = ['ge', 'ge'] 
            returns True
        objs = [.5, .5], criteria = [0.4, 0.4], directions = ['le', 'le'] 
            returns False     
        objs = [.4, .4], criteria = [0.5, 0.5], directions = ['le', 'le'] 
            returns True
        objs = [.5, .5], criteria = [0.4, 0.4], directions = ['ge', 'le'] 
            returns False
        objs = [.5, .5], criteria = [0.4, 0.4], directions = ['le', 'ge'] 
            returns False
        objs = [.5, .5], criteria = [0.5, 0.5], directions = ['ge', 'ge'] 
            returns True
        objs = [.5, .5], criteria = [0.5, 0.5], directions = ['le', 'le'] 
            returns True
    
    Preconditions:
        objs is a numpy array of floats with length 2
        criteria is a numpy array of floats with length 2
        directions is a list of strings with length 2 containing either 
            "ge" or "le"
    """
    
    # check preconditions
    assert len(objs) == 2, 'objs has length ' + repr(len(objs)) + ', should be 2'
    assert len(criteria) == 2, 'criteria has length ' + repr(len(criteria)) + \
    ', should be 2'
    assert len(directions) == 2, 'directions has length ' + \
    repr(len(directions)) + ', should be 2'
    
    # check that each element has the correct type and value
    for i in range(2):
        assert type(objs[i])== np.float64, 'objs element ' + str(i) + ': ' + \
        repr(objs[i]) + ', is not a numpy array of floats'
        assert type(criteria[i])== np.float64, 'criteria element ' + str(i) + \
        ': ' + repr(criteria[i]) + ', is not a numpy array of floats'
        assert type(directions[i])== str, 'directions element ' + str(i) + \
        ': ' + repr(directions[i]) + ', is not a string'
        assert directions[i] == 'ge' or directions[i] == 'le', 'directions ' + \
        str(i) + ' is ' + repr(directions[i]) + ', should be either "ge" or "le"' 
    
    
    # loop through objectives and check if each meets the criteria
    meets_criteria = True
    for i in range(2):
        if directions[i] == 'ge':
            meets_criteria = meets_criteria and (objs[i] >= criteria[i])
        else:
            meets_criteria = meets_criteria and (objs[i] <= criteria[i])
    
    return meets_criteria

Developing test cases

If you read my last post, you may notice that this specification includes an extra section called "Examples". These examples show the user how the function is supposed to perform. They also represent the suite of test cases used to validate the function. Creating test cases is more of an art than a science, and test cases will be unique to each function you write. However, there is a set of basic rules you can follow to guide your implementation of unit testing, which I'll outline below.

  1. Pick the simplest cases first. In this example, the simplest cases are when both objectives are above or both below the criteria
  2. Move to more complex cases. In this example, a more complex case could be when one objective is above and the other below, or vice versa
  3. Think about “weird” possibilities. One “weird” case for this code could be when one or both objectives are equal to the criteria
  4. Never test a precondition violation. Precondition violations are outside the scope of the function and should not be included in unit tests

Test cases should be exhaustive and even simple codes may have many test cases. In my example above I provide seven, can you think of any more that are applicable to this code?

Implementing a testing strategy

After you’ve developed your test cases, it’s time to implement your unit test. For demonstration purposes I’ve written my own unit test code which can be used to test the cases developed above. This function simply utilizes assert statements to check if each test input generates the correct output. A danger of writing your own testing function is that the test function itself may have errors. In practice, it’s easiest to use an established tool such as PyTest to perform unit testing (for in-depth coverage of PyTest see Bernardo’s post from 2019).

def test_check_satisficing():
    """
    Unit test for the function check_satisficing
    """
    import numpy as np
    from check_satisficing import check_satisficing
    
    print("Testing check_satisficing")
    
    
    # test case 1:
    objs1 = np.array([0.5, 0.5])
    criteria1 = np.array([0.4, 0.4])
    directions1 = ['ge','ge']
    result1 = True
    
    assert (check_satisficing(objs1, criteria1, directions1)) == result1, \
    'Test 1 failed ' + repr(objs1) + ', ' + repr(criteria1) + ', ' + \
    repr(directions1) + ' returned False, should be True'
    
    # test case 2:
    objs2 = np.array([0.5, 0.5])
    criteria2 = np.array([0.4, 0.4])
    directions2 = ['le','le']
    result2 = False
    
    assert (check_satisficing(objs2, criteria2, directions2)) == result2, \
    'Test 2 failed ' + repr(objs2) + ', ' + repr(criteria2) + ', ' + \
    repr(directions2) + ' returned True, should be False'
    
    
     # test case 3:
    objs3 = np.array([0.4, 0.4])
    criteria3 = np.array([0.5, 0.5])
    directions3 = ['le','le']
    result3= True
    
    assert (check_satisficing(objs3, criteria3, directions3)) == result3, \
    'Test 3 failed ' + repr(objs3) + ', ' + repr(criteria3) + ', ' + \
    repr(directions3) + ' returned False, should be True'
    
    
     # test case 4:    
    objs4 = np.array([0.5, 0.5])
    criteria4 = np.array([0.4, 0.4])
    directions4 = ['ge','le']
    result4 = False
    
    assert (check_satisficing(objs4, criteria4, directions4)) == result4, \
    'Test 4 failed ' + repr(objs4) + ', ' + repr(criteria4) + ', ' + \
    repr(directions4) + ' returned True, should be False'
    
    
    # test case 5    
    objs5 = np.array([0.5, 0.5])
    criteria5 = np.array([0.4, 0.4])
    directions5 = ['le','ge']
    result5 = False
    
    assert (check_satisficing(objs5, criteria5, directions5)) == result5, \
    'Test 5 failed ' + repr(objs5) + ', ' + repr(criteria5) + ', ' + \
    repr(directions5) + ' returned True, should be False'
    
    
    # test case 6: 
    objs6 = np.array([0.5, 0.5])
    criteria6 = np.array([0.5, 0.5])
    directions6 = ['ge','ge']
    result6 = True
    
    assert (check_satisficing(objs6, criteria6, directions6)) == result6, \
    'Test 6 failed ' + repr(objs6) + ', ' + repr(criteria6) + ', ' + \
    repr(directions6) + ' returned False, should be True'
    
    # test case 7: 
    objs7 = np.array([0.5, 0.5])
    criteria7 = np.array([0.5, 0.5])
    directions7 = ['le','le']
    result7 = True
    
    assert (check_satisficing(objs7, criteria7, directions7)) == result7, \
    'Test 7 failed ' + repr(objs7) + ', ' + repr(criteria7) + ', ' + \
    repr(directions7) + ' returned False, should be True'
    
    
    print("check_satisficing has passed all tests!")

    return    
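As a point of comparison, the same suite can be written much more compactly with an established framework. The sketch below is my own illustration, not part of the original example; it assumes the function lives in a module named check_satisficing.py and uses PyTest's parametrize decorator, so each docstring example becomes an independent test:

import numpy as np
import pytest
from check_satisficing import check_satisficing

@pytest.mark.parametrize("objs, criteria, directions, expected", [
    ([0.5, 0.5], [0.4, 0.4], ['ge', 'ge'], True),
    ([0.5, 0.5], [0.4, 0.4], ['le', 'le'], False),
    ([0.4, 0.4], [0.5, 0.5], ['le', 'le'], True),
    ([0.5, 0.5], [0.4, 0.4], ['ge', 'le'], False),
    ([0.5, 0.5], [0.4, 0.4], ['le', 'ge'], False),
    ([0.5, 0.5], [0.5, 0.5], ['ge', 'ge'], True),
    ([0.5, 0.5], [0.5, 0.5], ['le', 'le'], True),
])
def test_check_satisficing(objs, criteria, directions, expected):
    # each docstring example becomes one independent test case
    assert check_satisficing(np.array(objs), np.array(criteria),
                             directions) == expected

Running pytest from the command line will then report each failing case individually, rather than stopping at the first failed assert.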

Concluding thoughts

When developing a new Python function, it's good practice to code the test cases while you're writing the function and test each component during the development cycle. Coding in this manner will allow you to identify and fix errors early and can save a lot of time and headache. Creating test cases during code development may also improve the quality of your code by helping you conceptualize and avoid potential bugs.

MORDM Basics I: Synthetic Streamflow Generation

In this post, we will break down the key concepts underlying synthetic streamflow generation, and how it fits within the Many Objective Robust Decision Making (MORDM) framework (Kasprzyk et al., 2013). This post is the first in a series on MORDM, beginning here with generating and validating the data used in the framework. To provide some context as to what we are about to attempt, please refer to this post by Jon Herman.

What is synthetic streamflow generation?

Synthetic streamflow generation is a non-parametric, direct statistical approach used to generate synthetic streamflow timeseries from a reasonably long historical record. It is used when there is a need to diversify extreme event scenarios, such as floods and droughts, or when we want to generate flows that reflect a shift in the hydrologic regime due to climate change. It is favored because it relies on re-sampling of the historical record, preserves temporal correlation up to a certain degree, and results in a more realistic synthetic dataset. However, its dependence on a historical record also implies that this approach requires a relatively long record of historical inflow data. Jon Lamontagne's post goes into further detail regarding this approach.

Why synthetic streamflow generation?

An important step in the MORDM framework is scenario discovery, which requires multiple realistic scenarios to predict future states of the world (Kasprzyk et al., 2013). Depending solely on the historical dataset is insufficient; we need to generate multiple realizations of realistic synthetic scenarios to facilitate a comprehensive scenario discovery process. As an approach that uses a long historical record to generate synthetic data that has been found to preserve seasonal and annual correlation (Kirsch et al., 2013; Herman et al., 2016), this method provides us with a way to:

  1. Fully utilize a large historical dataset
  2. Stochastically generate multiple synthetic datasets while preserving temporal correlation
  3. Explore many alternative climate scenarios by changing the mean and the spread of the synthetic datasets

The basics of synthetic streamflow generation in action

To better illustrate the inner workings of synthetic streamflow generation, it is helpful to use a test case. In this post, the historical dataset is obtained from the Research Triangle Region in North Carolina. The Research Triangle region consists of four main utilities: Raleigh, Durham, Cary and the Orange County Water and Sewer Authority (OWASA). These utilities receive their water supplies from four water sources: the Little River Reservoir, Lake Wheeler, Lake Benson, and Jordan Lake (Figure 1), and historical streamflow data is obtained from ten different stream gauges located at these water sources. For the purpose of this example, we will be using 81 years' worth of weekly streamflow data available here.

Figure 1: The Research Triangle region (Trindade et al., 2019).

The statistical approach that drives synthetic streamflow generation is called the Kirsch Method (Kirsch et al., 2013). In plain language, this method does the following:

  1. Convert the historical streamflows from real space to log space, then standardize the log-space data.
  2. Bootstrap the log-space historical matrix to obtain an uncorrelated matrix of historical data.
  3. Compute the correlation matrix of the historical dataset and take its Cholesky decomposition.
  4. Impose the historical correlation upon the uncorrelated matrix obtained in (2) to generate a standardized synthetic dataset. This preserves seasonal correlation.
  5. De-standardize the synthetic data and transform it back into real space.
  6. Repeat steps (1) to (5) with a historical dataset that is shifted forward by 6 months (26 weeks). This preserves year-to-year correlation.

This post by Julie Quinn delves deeper into the Kirsch Method’s theoretical steps. The function that executes these steps can be found in the stress_dynamic.m Matlab file, which in turn is executed by the wsc_main_rate.m file by setting the input variable p = 0 as shown on Line 27. Both these files are available on GitHub here.
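To make steps (1) through (5) concrete, here is a minimal numpy sketch of the method. It is my own simplified illustration, not a translation of stress_dynamic.m; it assumes the historical record is a matrix Q_hist with one row per year and 52 weekly columns of strictly positive flows, and it omits the 26-week shift of step (6):

import numpy as np

def kirsch_synthetic(Q_hist, n_years, seed=None):
    """Generate one synthetic realization, shape (n_years, 52), real space."""
    rng = np.random.default_rng(seed)

    # Step 1: log-transform and standardize each week (column)
    Z = np.log(Q_hist)
    mu, sigma = Z.mean(axis=0), Z.std(axis=0)
    Z_std = (Z - mu) / sigma

    # Step 2: bootstrap each week independently to build an
    # uncorrelated matrix C of standardized log flows
    n_hist = Z_std.shape[0]
    C = np.empty((n_years, 52))
    for w in range(52):
        C[:, w] = Z_std[rng.integers(0, n_hist, size=n_years), w]

    # Step 3: historical week-to-week correlation and its (upper-
    # triangular) Cholesky factor; assumes P is positive definite
    P = np.corrcoef(Z_std, rowvar=False)   # (52, 52)
    U = np.linalg.cholesky(P).T

    # Step 4: impose historical (seasonal) correlation on C
    S = C @ U

    # Step 5: de-standardize and return to real space
    return np.exp(S * sigma + mu)

The key trick is in steps (3) and (4): multiplying the bootstrapped matrix by the upper-triangular Cholesky factor of the historical correlation matrix re-imposes the week-to-week correlation that the bootstrapping destroyed.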

However, this is where things get interesting. Up to this point, steps (1) to (6) would have simply generated a synthetic dataset based only on historical statistical characteristics, as validated here in Julie's second blog post on a similar topic. Of the three motivations for using synthetic streamflow generation, the third one (exploration of multiple scenarios) has yet to be satisfied. This is a nice segue into our next topic:

Generating multiple scenarios using synthetic streamflow generation

The true power of synthetic streamflow generation lies in its ability to generate multiple climate (or in this case, streamflow) scenarios. This is done in stress_dynamic.m using three variables:

Input variable | Description
p | The lowest x% of streamflow years
n | A vector where each element ni is the number of copies of the p-lowest streamflow years to be added to the bootstrapped historical dataset.
m | A vector where each element mi is the number of copies of the (1-p)-highest streamflow years to be added to the bootstrapped historical dataset.

Table 1: The input variables to the stress_dynamic function.

These three variables bootstrap (increase the length of) the historical record while allowing us to perturb the historical streamflows to reflect an increase in the frequency or severity of extreme events such as floods and droughts, using the following equation:

new_hist_years = old_hist_years + ni*(p * old_hist_years) + mi*((1 - p) * old_hist_years)

The stress_dynamic.m file contains more explanation regarding this step. As a rough worked example, with 81 historical years, p = 0.1, ni = 2 and mi = 1, the new record would contain approximately 81 + 2(8) + 1(73) = 170 years.

This begs the question: how do we choose the value of p? This brings us to the topic of the standardized streamflow indicator (SSI6).

The SSI6 is the 6-month moving average of the standardized streamflows, used to determine the occurrence and severity of drought on the basis of duration and frequency (Herman et al., 2016). Put simply, this method flags a drought if the value of the SSI6 stays below 0 continuously for at least 3 months, and falls below -1 at least once during the 6-month interval. The periods and severity (or lack thereof) of drought can then be observed, enabling the decision on the length of both the n and m vectors (which correspond to the number of perturbation periods, or climate event periods). We will not go into further detail regarding this method, but there are two important points to be made (a rough sketch of the SSI6 rule follows the list below):

  1. The SSI6 enables the determination of the frequency (likelihood) and severity of drought events in synthetic streamflow generation through the values contained in p, n and m.
  2. This approach can be used to generate flood events by exchanging the values between the n and m vectors.
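Below is a hedged sketch of the SSI6 drought rule described above. It is my own reading of the rule, with assumed conventions: weekly flows, 6 months taken as 26 weeks, 3 months as 13 weeks, and standardization against the full record.

import numpy as np

def ssi6_drought_weeks(flows_weekly):
    """Flag drought periods using the SSI6 rule sketched above."""
    z = (flows_weekly - flows_weekly.mean()) / flows_weekly.std()
    # 6-month (26-week) moving average of standardized flows
    ssi6 = np.convolve(z, np.ones(26) / 26, mode='valid')
    drought = np.zeros_like(ssi6, dtype=bool)
    for t in range(len(ssi6) - 25):
        window = ssi6[t:t + 26]
        # drought: SSI6 < 0 continuously for >= 3 months (13 weeks)
        # and SSI6 < -1 at least once in the 6-month interval
        if (window[:13] < 0).all() and (window < -1).any():
            drought[t:t + 26] = True
    return ssi6, drought

# example: flag droughts in a stand-in synthetic weekly record
flows = np.random.default_rng(0).lognormal(2.0, 0.6, size=52 * 81)
ssi6, drought = ssi6_drought_weeks(flows)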

A good example of point (2) is done in this test case, in which more-frequent and more-severe floods were simulated by ensuring that most of the values in m were larger than those in n. Please refer to Jon Herman's 2016 paper titled 'Synthetic drought scenario generation to support bottom-up water supply vulnerability assessments' for further detail.

A brief conceptual letup

Now we have shown how synthetic streamflow generation satisfies all three factors motivating its use. We should have two output folders:

  • synthetic-data-stat: contains the synthetic streamflows based on the unperturbed historical dataset
  • synthetic-data-dyn: contains the synthetic streamflows based on the perturbed historical dataset

Comparing these two datasets, we can see how increasing the likelihood and severity of floods has affected the resulting synthetic data.

Validation

To exhaustively compare the statistical characteristics of the synthetic streamflow data, we will perform two forms of validation: visual and statistical. This method of validation is based on Julie’s post here.

Visual validation

Visual validation is done by generating flow duration curves (FDCs). Figure 2 below compares the unperturbed (top) and perturbed (bottom) synthetic datasets.

Figure 2: (Above) The FDC of the unperturbed historical dataset (pink) and its resulting synthetic dataset (blue). (Below) The corresponding perturbed historical and synthetic datasets.

The bottom plots in Figure 2 show an increase in the volume of weekly flows, as well as a smaller return period, when the historical streamflows were perturbed to reflect an increasing frequency and magnitude of flood events. Together with the upper plots in Figure 2, this visually demonstrates that the synthetic streamflow generation approach (1) faithfully reconstructs historical streamflow patterns, (2) increases the range of possible streamflow scenarios and (3) can model multiple extreme climate event scenarios by perturbing the historical dataset. The code to generate this figure can be found in the plotFDCrange.py file.
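For readers who want to reproduce the idea without the repository, the core of an FDC plot is just sorted flows against exceedance probability. Here is a minimal sketch, using synthetic stand-in data rather than the actual gauge records:

import numpy as np
import matplotlib.pyplot as plt

def plot_fdc(ax, flows, **kwargs):
    """Plot a flow duration curve: sorted flows vs exceedance probability."""
    sorted_flows = np.sort(flows)[::-1]
    # Weibull plotting position for the exceedance probability
    exceedance = np.arange(1, len(sorted_flows) + 1) / (len(sorted_flows) + 1)
    ax.plot(exceedance, sorted_flows, **kwargs)
    ax.set_yscale('log')
    ax.set_xlabel('Exceedance probability')
    ax.set_ylabel('Weekly flow')

fig, ax = plt.subplots()
rng = np.random.default_rng(0)
demo_flows = rng.lognormal(2.0, 0.8, size=52 * 81)  # stand-in for gauge data
plot_fdc(ax, demo_flows, color='pink', label='historical')
ax.legend()
plt.show()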

Statistical validation

The mean and standard deviation of the perturbed and unperturbed historical datasets are compared to show if the perturbation resulted in significant changes in the synthetic datasets. Ideally, the perturbed synthetic data would have higher means and similar standard deviations compared to the unperturbed synthetic data.

Figure 3: (Above) The unperturbed synthetic (blue) and historical (pink) streamflow datasets for each of the 10 gauges. (Below) The perturbed counterpart.

The bottom plots in Figure 3 show that the mean and maximum values of the perturbed synthetic flows are significantly higher than their unperturbed counterparts. In addition, the spread of the standard deviations of the perturbed synthetic streamflows is similar to that of the unperturbed ones. This demonstrates that synthetic streamflow generation can be used to synthetically change the occurrence and magnitude of extreme events while maintaining the periodicity and spread of the data. The code to generate Figure 3 can be found in weekly-moments.py.

Synthetic streamflow generation and internal variability

The generation of multiple unperturbed realizations of synthetic streamflow is vital for characterizing the internal variability of a system, otherwise known as variability that arises from natural variations in the system (Mankin et al., 2020). As internal variability is intrinsic to the system, its effects cannot be eliminated – but they can be moderated. By evaluating multiple realizations, we can determine the number of realizations at which the internal variability (quantified here by the standard deviation as a function of the number of realizations) stabilizes. Using the synthetic streamflow data for Jordan Lake, it is shown that more than 100 realizations are required for the standard deviation of the 25% highest streamflows across all years to stabilize (Figure 4). Knowing this, we can generate sufficient synthetic realizations to render the effects of internal variability insignificant.

Figure 4: The highest 25% of synthetic streamflows for the Jordan Lake gauge.

The file internal-variability.py contains the code to generate the above figure.
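The stabilization check itself is straightforward to sketch: track the running standard deviation of the highest-quartile flows as realizations are added. The snippet below uses stand-in data with assumed dimensions, not the actual Jordan Lake inflows:

import numpy as np

rng = np.random.default_rng(42)
n_realizations, n_years = 500, 81
# stand-in for the Jordan Lake synthetic annual flows
flows = rng.lognormal(2.0, 0.5, size=(n_realizations, n_years))

# top 25% of flows within each realization
threshold_idx = int(0.75 * n_years)
top_quartile = np.sort(flows, axis=1)[:, threshold_idx:]

# running standard deviation as realizations are added; plot this
# against k and pick the ensemble size where the curve flattens out
running_std = [top_quartile[:k].std() for k in range(2, n_realizations + 1)]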

How does this all fit within the context of MORDM?

So far, we have generated synthetic streamflow datasets and validated them. But how are these datasets used in the context of MORDM?

Synthetic streamflow generation lies within the domain of the second part of the MORDM framework, as shown in Figure 5 below. Specifically, synthetic streamflow generation plays an important role in the design of experiments by preserving the effects of deeply uncertain factors that cause natural events. As MORDM requires multiple scenarios to reliably evaluate all possible futures, this approach enables the simulation of multiple scenarios, while concurrently increasing the severity or frequency of extreme events in increments set by the user. This will allow us to evaluate how coupled human-natural systems change over time given different scenarios, and the consequences for the robustness of the system being evaluated (in this case, the Research Triangle).

Figure 5: The taxonomy of robustness frameworks. The bold-outlined segments highlight where MORDM fits within this taxonomy (Herman et al., 2015).

Typically, this evaluation is performed in two main steps:

  1. Generation and evaluation of multiple realizations of unperturbed annual synthetic streamflow. The resulting synthetic data is used to generate the Pareto optimal set of policies. This step can help us understand how the system’s internal variability affects future decision-making by comparing it with the results in step (2).
  2. Generation and evaluation of multiple realizations of perturbed annual synthetic streamflow. These are the more extreme scenarios against which the previously-found Pareto-optimal policies will be evaluated. This step assesses the robustness of the base state under deeply uncertain deviations caused by the perturbations in the synthetic data and other deeply uncertain factors.

Conclusion

Overall, synthetic streamflow generation is an approach that is highly applicable in the bottom-up analysis of a system. It preserves historical characteristics of a streamflow timeseries while providing the flexibility to modify the severity and frequency of extreme events in the face of climate change. It also allows the generation of multiple realizations, aiding in the characterization and understanding of a system’s internal variability, and a more exhaustive scenario discovery process.

This summarizes the basics of data generation for MORDM. In my next blog post, I will introduce risk-of-failure (ROF) triggers, their background, key concepts, and how they are applied within the MORDM framework.

References

Herman, J. D., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2015). How should robustness be defined for water systems planning under change? Journal of Water Resources Planning and Management, 141(10), 04015012. doi:10.1061/(asce)wr.1943-5452.0000509

Herman, J. D., Zeff, H. B., Lamontagne, J. R., Reed, P. M., & Characklis, G. W. (2016). Synthetic drought scenario generation to support bottom-up water supply vulnerability assessments. Journal of Water Resources Planning and Management, 142(11), 04016050. doi:10.1061/(asce)wr.1943-5452.0000701

Kasprzyk, J. R., Nataraj, S., Reed, P. M., & Lempert, R. J. (2013). Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software, 42, 55-71. doi:10.1016/j.envsoft.2012.12.007

Kirsch, B. R., Characklis, G. W., & Zeff, H. B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements: Practical improvement for generating synthetic streamflows. Journal of Water Resources Planning and Management, 139(4), 396-406. doi:10.1061/(asce)wr.1943-5452.0000287

Mankin, J. S., Lehner, F., Coats, S., & McKinnon, K. A. (2020). The value of initial condition large ensembles to Robust Adaptation Decision‐Making. Earth’s Future, 8(10). doi:10.1029/2020ef001610

Trindade, B., Reed, P., Herman, J., Zeff, H., & Characklis, G. (2017). Reducing regional drought vulnerabilities and multi-city robustness conflicts using many-objective optimization under deep uncertainty. Advances in Water Resources, 104, 195-209. doi:10.1016/j.advwatres.2017.03.023