Performing runtime diagnostics using MOEAFramework

Performing runtime diagnostics using MOEAFramework

In this blog post, we will be reviewing how to perform runtime diagnostics using MOEAFramework. This software has been used in prior blog posts by Rohini and Jazmin to perform MOEA diagnostics across multiple MOEA parameterizations. Since then, MOEAFramework has undergone a number of updates and structure changes. This blog post will walk through the updated functionality of running MOEAFramework (version 4.0) via the command line to perform runtime diagnostics across 20 seeds using one set of parameters. We will be using the classic 3-objective DTLZ2 problem optimized using NSGAII, both of which are in-built into MOEAFramework.

Before we begin, some helpful tips and system configuration requirements:

  • Ensure that you have the latest version of Java installed (as of April 2024, this is Java Version 22). The current version of MOEAFramework was compiled using class file version 61.0, which was made available in Java Version 17 (find the complete list of Java versions and their associated class files here). This is the minimum requirement for being able to run MOEAFramework.
  • The following commands are written for a Linux interface. Download a Unix terminal emulator like Cygwin if needed.

Another helpful tip: to see all available flags and their corresponding variables, you can use the following structure:

java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.mainClass.subClass.functionName --help

Replace mainClass, subClass, and functionName with the actual class, subclass, and function names. For example,

java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.SampleGenerator --help

You can also replace --help with -h (if the last three alphabets prove too much to type for your weary digits).

Runtime diagnostics across 20 different seeds for one set of parameters

Generating MOEA parameters and running the optimization

To run NSGAII using one set of parameters, make sure to have a “parameters file” saved as a text file containing the following:

populationSize 10.0 250.999
maxEvaluations 10000 10000
sbx.rate 0.0 1.0
sbx.distributionIndex 0.0 100.0
pm.rate 0.0 1.0
pm.distributionIndex 0.0 100.0

For the a full list of parameter files for each of the in-built MOEAFramework algorithms, please see Jazmin’s post here.

In this example, I have called it NSGAII_Params.txt. Note that maxEvaluations is set to 10,000 on both its lower and upper bounds. This is because we want to fix the number of function evaluations completed by NSGAII. Next, in our command line, we run:

java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.SampleGenerator --method latin --numberOfSamples 1 --parameterFile NSGAII_Params.txt --output NSGAII_Latin

The output NSGAII_Latin file should contain a single-line that can be opened as a text file. It should have six tab-delineated values that correspond to the six parameters in the input file that you have generated. Now that you have your MOEA parameter files, let’s begin running some optimizations!

First, make a new folder in your current directory to store your output data. Here, I am simply calling it data.

mkdir data

Next, optimize the DTLZ2 3-objective problem using NSGAII:

for i in {1..20}; do java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.RuntimeEvaluator --parameterFile NSGAII_Params.txt --input NSGAII_Latin --problem DTLZ2_3 --seed $i --frequency 1000 --algorithm NSGAII --output data/NSGAII_DTLZ2_3_$i.data; done

Here’s what’s going down:

  • First, you are performing a runtime evaluation of the optimization of the 3-objective DTLZ2 problem using NSGAII
  • You are obtaining the decision variables and objective values at every 1,000 function evaluations, effectively tracking the progress of NSGAII as it attempts to solve the problem
  • Finally, you are storing the output in the data/ folder
  • You then repeat this for 20 seeds (or for as many as you so desire).

Double check your .data file. It should contain information on your decision variables and objective values at every 1,000 NFEs or so, seperated from the next thousand by a “#”.

Generate the reference set

Next, obtain the only the objective values at every 1,000 NFEs by entering the following into your command line:

for i in {1..20}; do java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.ResultFileMerger --problem DTLZ2_3 --output data/NSGAII_DTLZ2_3_$i.set --epsilon 0.01,0.01,0.01 data/NSGAII_DTLZ2_3_$i.data; done

Notice that we have a new flag here – the --epsilon flag tells MOEAFramework that you only want objective values that are at least 0.01 better than other non-dominated objective values for a given objective. This helps to trim down the size of the final reference set (coming up soon) and remove solutions that are only marginally better (and may not be decision-relevant in the real-world context).

On to generating the reference set – let’s combine all objectives across all seeds using the following command line directive:

for i in {1..20}; do java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.ReferenceSetMerger --output data/NSGAII_DTLZ2_3.ref -epsilon 0.01,0.01,0.01 data/NSGAII_DTLZ2_3_$i.set; done

Your final reference set should now be contained within the NSGAII_DTLZ2_3.ref file in the data/ folder.

Generate the runtime metrics

Finally, let’s generate the runtime metrics. To avoid any mix-ups, let’s create a folder to store these files:

mkdir data_metrics

And finally, generate our metrics!

or i in {1..20}; do java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.ResultFileEvaluator --problem DTLZ2_3 --epsilon 0.01,0.01,0.01 --input data/NSGAII_DTLZ2_3_$i.data --reference data/NSGAII_DTLZ2_3.ref --output data_metrics/NSGAII_DTLZ2_3_$i.metrics; done

If all goes well, you should see 20 files (one each for each seed) similar in structure to the one below in your data_metrics/ folder:

The header values are the names of each of the MOEA performance metrics that MOEAFramework measures. In this blog post, we will proceed with visualizing the hypervolume over time across all 20 seeds.

Visualizing runtime diagnostics

The following Python code first extracts the metric that you would like to view, and saves the plot as a PNG file in the data_metrics/ folder:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

# define constants 
num_seeds = 20
NFE = 10000 
freq = 1000
num_output = int(NFE/freq)

algorithm = 'NSGAII'
problem = 'DTLZ2_3'
folder_name = 'data_metrics/'
metric_name = 'Hypervolume'
# create matrix of hypervolume runtimes 
hvol_matrix = np.zeros((num_seeds, num_output), dtype=float)
for seed in range(num_seeds):
    runtime_df = pd.read_csv(f'{folder_name}{algorithm}_{problem}_{seed+1}.metrics', delimiter=' ', header=0)
    if metric_name == 'Hypervolume':
        hvol_matrix[seed] = runtime_df['#Hypervolume'].values
    else:
        hvol_matrix[seed] = runtime_df[metric_name].values

# plot the hypervolume over time
fig, ax = plt.subplots(figsize=(10, 6))

ax.fill_between(np.arange(freq, NFE+freq, freq), np.min(hvol_matrix, axis=0), np.max(hvol_matrix, axis=0), color='paleturquoise', alpha=0.6)
ax.plot(np.arange(freq, NFE+freq, freq), np.min(hvol_matrix, axis=0), color='darkslategrey', linewidth=2.0)
ax.plot(np.arange(freq, NFE+freq, freq), np.max(hvol_matrix, axis=0), color='darkslategrey', linewidth=2.0)
ax.plot(np.arange(freq, NFE+freq, freq), np.mean(hvol_matrix, axis=0), color='darkslategrey', linewidth=2.0 ,label='Mean', linestyle='--')

ax.set_xlabel('NFE')
ax.set_xlim([freq, NFE+freq])
ax.set_ylabel(metric_name)
ax.set_ylim([0, 1])
ax.set_title(f'{metric_name} over time')
ax.legend(loc='upper left')

plt.savefig(f'{folder_name}{algorithm}{problem}_{metric_name}.png')

If you correctly implemented the code, you should be able to be view the following figure that shows how the hypervolume attained by the NSGAII algorithm improves steadily over time.

In the figure above, the colored inner region spans the hypervolume attained across all 20 seeds, with the dotted line representing the mean hypervolume over time. The solid upper and lower bounding lines are the maximum and minimum hypervolume achieved every 1,000 NFEs respectively. Note that, in this specific instance, NSGAII only achieves about 50% of the total hypervolume of the overall objective space. This implies that a higher NFE (a longer runtime) is required for NSGAII to further increase the hypervolume achieved. Nonetheless, the rate of hypervolume increase is gradually decreasing, indicating that this particular parameterization of NSGAII is fast approaching its maximum possible hypervolume, which additional NFEs only contributing small improvements to performance. It is also worth noting the narrow range of hypervolume values, especially as the number of NFEs grows larger. This is representative of the reliability of the NGSAII algorithm, demonstrating that is can somewhat reliably reproduce results across multiple different seeds.

Summary

This just about brings us to the end of this blog post! We’ve covered how to perform MOEA runtime diagnostics and plot the results. If you are curious, here are some additional things to explore:

  • Plot different performance metrics against NFE. Please see Joe Kasprzyk’s post here to better understand the plots you generate.
  • Explore different MOEAs that are built into MOEAFramework to see how they perform across multiple seeds.
  • Generate multiple MOEA parameter samples using the in-built MOEAFramework Latin Hypercube Sampler to analyze the sensitivity of a given MOEA to its parameterization.
  • Attempt examining the runtime diagnostics of Borg MOEA using the updated version of MOEAFramework.

On that note, make sure to check back for updates as MOEAFramework is being actively reviewed and improved! You can view the documentation of Version 4.0 here and access its GitHub repository here.

Happy coding!

Weather Regime-Based Stochastic Weather Generation (Part 2/2)

In this post on the Water Programming Blog, we continue to explore the application of the stochastic weather generator (available on GitHub) for climate-change scenario developments. This is the second installment of a two-part series of blog posts, and readers are strongly encouraged to familiarize themselves with different components of the stochastic weather generator, as explained in Part 1 by Rohini Gupta (The Reed Research Group).

Here, we will begin by offering a concise overview of developing climate change scenarios and how these scenarios are integrated into the weather generation model. Following this, we will proceed to interpret the impact of these climatic change conditions on key statistical insights concerning occurrences of floods and droughts. Through these examples, the implications of these climatic shifts on water resources management and flood risk analysis will become evident.

Climate Change Perturbations

In this stochastic weather generator, we specifically focus on two aspects of climate change scenario developments, including 1) thermodynamic and 2) dynamic perturbations.

1) Thermodynamic Change

Thermodynamic climate change, often referred to as temperature-driven change, is primarily driven by changes in the Earth’s energy balance and temperature distribution. This warming affects various aspects of the climate system, such as intensifying precipitation extremes, melting snowpacks and ice sheets, rising sea levels, altered weather patterns, and shifts in ecosystems. The primary driver of temperature-driven climate change is the increase in regional-to-global average temperatures due to the enhanced greenhouse effect. As temperatures rise due to natural and anthropogenic forcings, they trigger a cascade of interconnected impacts throughout the climate system.

In the stochastic weather generator, scenarios of temperature change are treated simply by adding trends to simulated temperature data for each location across the spatial domain. However, scenarios of thermodynamic precipitation intensification are modeled using a quantile mapping technique through scaling the distribution of daily precipitation in a way that replicates the effects of warming temperatures on precipitation as the moisture holding capacity of the atmosphere increases. In the context of California, previous studies have demonstrated that as temperatures rise, the most severe precipitation events (often associated with Atmospheric Rivers landfalling) are projected to increase in frequency, while the intensity of smaller precipitation events is expected to decrease (Gershunov et al., 2019). This alteration effectively stretches the distribution of daily precipitation, causing extreme events to become more pronounced while reducing the occurrence and strength of lighter precipitation events. We replicate this phenomenon by making adjustments to the statistical characteristics and distribution percentiles of precipitation (e.g., Pendergrass and Hartmann, 2014). To further elaborate on this, we select a scaling factor for the 99th percentile of nonzero precipitation and then modify the gamma-GPD mixed distribution to enforce this chosen scaling factor. For instance, in a scenario with a 3°C temperature warming and a 7% increase in extreme precipitation per °C (matching the theoretical Clausius-Clapeyron rate of increase in atmospheric water holding capacity due to warming; Najibi and Steinschneider (2023)), the most extreme precipitation events are projected to increase by approximately 22.5% (1.073). We adjust the gamma-GPD models fitted to all locations to ensure this percentage increase in the upper tail of the distribution. Assuming that mean precipitation remains constant at baseline levels, this adjustment will cause smaller precipitation values in the gamma-GPD model to decrease, compensating for the increase in extreme events through stretching the distribution of nonzero precipitation.

Lines 33-40 from ‘config.simulations.R’ show the user-defined changes to implement the thermodynamic scenarios based on temperature in Celsius (tc: e.g. 1 °C), percent change in extreme precipitation quantile (pccc: e.g. 7% per °C), and percent change in average precipitation (pmuc: e.g. 12.5% decrease) inputs. Needless to say that the stochastic weather generator runs at baseline mode if tc=0 and pmuc=0.

    ##-------------Define perturbations-------------##
    ##climate changes and jitter to apply:
    change.list <- data.frame("tc"=  c(0), # {e.g., 0, 1, 2, ...} (changes in temperature)
                              "jitter"= c(TRUE),
                              "pccc"= c( 0.00), # {e.g., 0, 0.07, 0.14, ...} (changes for precipitation extreme quantile -- CC)
                              "pmuc"= c( 0.00)# {e.g., 0, -.125, .125, ...} (changes in precipitation mean)
    )
    ##----------------------------------------------##

2) Dynamic Change

Dynamic climate change, also known as circulation-driven change, is driven by shifts in atmospheric and oceanic circulation patterns. These circulation patterns are influenced by a variety of factors, including temperature gradients, differences in air pressure, and Earth’s rotation. Changes in these patterns can lead to alterations in weather patterns, precipitation distribution, and regional climate characteristics. One well-known example of dynamic climate change is the phenomenon of El Niño and La Niña, which involve changes in ocean temperatures and atmospheric pressure in the Pacific Ocean. These events can significantly impact local-to-global weather patterns, causing droughts, heavy rainfall, and other extreme weather events (Pfahl et al., 2017).

Dynamic changes impact the evolution of weather patterns and can modify the occurrence frequency of these patterns. This influence can occur through direct adjustments to the transition probabilities between different weather regimes, or indirectly by modifying the covariates that govern the progression of these weather regimes. In Steinschneider et al. (2019), a Niño 3.4 index is used to force weather regime evolution and is systematically adjusted to create more frequent El Niño and La Niña events. In Gupta et al. (in review), a 600-year long sequence of tree-ring reconstructed principal components of weather regime occurrence are used as an alternative covariate to better capture natural variability inherent in the weather regimes.

In the most recent version of the stochastic weather generator, we developed a novel non-parametric approach to simulation of weather regimes, allowing for future dynamic change scenarios with altered (customized) weather regime probabilities. Assuming that the historical time series of water regimes is divided into distinct, consecutive segments without overlaps, each segment has a duration of D years, and there is a total of ND segments considered there (here, D=4 and ND=18). In the non-parametric method, each segment (indexed as n=1 to ND) is assigned a sampling probability denoted as pn. To generate a new sequence of daily weather regimes spanning any desired number of years, the procedure involves resampling (with replacement) the nth D-year segment of daily weather regimes using the corresponding probability pn. This process is repeated until the required number of years of simulated weather regimes has been attained. If needed, the last segment can be trimmed to ensure the precise desired duration of simulated weather regimes.

In the baseline scenario for the weather generator with no dynamic climate change (only thermodynamic change), each segment is considered equally likely (i.e., no changes to large-scale circulation patterns).

However, the probabilities pn can be adjusted to alter the frequencies of each of the identified weather regimes in the final simulation, enabling the generation of dynamic climate change scenarios (i.e., scenarios in which the frequencies of different atmospheric flow patterns change compared to their historical frequencies). This is achieved using a linear program. The goal of the linear model (not shown) is to identify new sampling probabilities pn that, when used in the non-parametric simulation approach above, create a sequence of weather regimes with long-term average frequencies that approach some vector of target probabilities for those identified weather regimes.

Lines 91-126 from ‘config.simulations.R’ show the user-defined changes to implement a non-parametric scenario with equal probabilities (0: no change to the historical sequence of weather regimes) to ten weather regimes, i.e., dynamic scenario #0; and a 30% increase in weather regime number three (a dry weather condition) in California, i.e., dynamic scenario #1.

##Choose below whether through parametric or non-parametric way to create the simulated WRs ##
    use.non_param.WRs <- TRUE #{TRUE, FALSE}: TRUE for non-parametric, FALSE for parametric simulated WRs

    dynamic.scenario  <- 0 # {0, 1, 2}: 0: no dynamic change; 1: dynamic scenario #1 (30% increase in WR3); or 2: dynamic scenario #2 (linear trend)

    if (use.non_param.WRs){      #----------- 1+2 dynamic scenarios ----------#
      if (dynamic.scenario==0){
        ##===> Attempt #0 (thermodynamic only; no change to freq of WRs) ===##
        # #specify target change (as a percent) for WR probabilities
        WR_prob_change <- c(0,0,0,0,0,0,0,0,0,0) # between 0 and 1
        # #how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.00001
        # #how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
        
      }else if(dynamic.scenario==1){
        ##===> Attempt #1 (dynamic scenario #1) ===##
        # #specify target change (as a percent) for WR probabilities (if, increasing WR3 in future)
        WR_prob_change <- c(0,0,.3,0,0,0,0,0,0,0) # between 0 and 1
        # #how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.007
        # #how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
        
      }else if(dynamic.scenario==2){
        ##===> Attempt #2 (dynamic scenario #2) ===##
        # specify target change (as a percent) for WR probabilities (if, continuing their current trends in future)
        WR_prob_change <- c(-0.09969436,  0.27467048,  0.33848792,
                            -0.28431861, -0.23549986,  0.03889970,
                            -0.05628958, 0.38059153, -0.16636739, -0.17995965) # between 0 and 1
        # how close (in % points) do the WR frequencies (probabilities) need to be to the target
        lp.threshold <- 0.008
        # how much change do we allow in a sub-period sampling probability before incurring a larger penalty in the optimization
        piecewise_limit <- .02
      }
    }

Stochastic Weather Generation for Climate Change Scenarios

The stochastic weather generator is utilized to generate two climate change scenarios as defined above. The description of specific configurations for each scenario is listed as follows:

  • Thermodynamic Scenario: 3°C increase in temperature, 7% per °C increase in precipitation extremes, no change in average precipitation.
  • Dynamic Scenario: 30% increase in occurrence frequency of one weather regime only, labeled as ‘WR3’, which exhibits a ridge directly over the northwest US, i.e., blocking moisture flow over California, and resulting in dry conditions during the cold season there.

Thus, we generated 1008 years of simulated precipitation and temperature for each 12 sites in the Tuolumne River Basin in California (similar to Part 1) following these two scenarios. Below is a list of figures to understand better the impact of each scenario on precipitation and temperature statistical distributions and important climate extremes at basin scale.

The two Figures below present the cumulative distribution function (CDF) of the generated scenario for precipitation (left) and temperature (right) based on the thermodynamic and dynamic change, respectively. The observed time-series of precipitation and temperature across these 12 sites is also illustrated.

As seen above, although the 3°C warming is clearly manifested in the alteration of simulated temperature’s CDF, it is hard to notice any drastic shifts in the overall CDF of precipitation time series, as the bulk of distribution has not been altered (remember the average precipitation remained constant although its extreme quantile scaled up by ~ 22.5%).

Similarly, while the CDF of precipitation demonstrates a slight shift towards a drier condition, we notice a large shift in tail of temperature distribution.

Now, we go ahead and examine a set of important indexes for climate risk assessment of water systems. The Figure below presents the 1-day precipitation maxima derived from the generated scenario for precipitation based on the thermodynamic (left) and dynamic (right) change.

In the plot above depicted for thermodynamic climate change, the median 1-day precipitation extremes at the basin scale throughout the entire synthetic weather generation vs. historical period demonstrates a 25.5% increase in its magnitude, which is consistent with the event-based precipitation scaling by 22.5% at each site. However, such metric has almost remained unchanged in the dynamic climate change scenario.

Finally, the Figure below shows the 5-year drought derived from the generated scenario for water-year precipitation total, based on the thermodynamic (left) and dynamic (right) change.

The boxplots presented above related to the thermodynamic scenario, revealing a consistent median 5-year drought magnitude, as anticipated (no shift in the distribution of average precipitation bulk). In contrast, the dynamic climate change scenario exhibits a substantial exacerbation, with the 5-year drought magnitude worsening by around 9% compared to the historical records.

There is plenty more to explore! The stochastic weather generator is suitable to quickly simulate a long sequence of weather variables that reflect any climate change of interest. Keep an eye out for upcoming additions to the repository in the coming months, and do not hesitate to contact us or create a GitHub issue if you need clarification.

References

Gershunov, A., Shulgina, T., Clemesha, R.E.S. et al. (2019). Precipitation regime change in Western North America: The role of Atmospheric Rivers. Scientific Reports, 9, 9944. https://doi.org/10.1038/s41598-019-46169-w.

Gupta, R.S., Steinschneider S., Reed, P.M. Understanding Contributions of Paleo-Informed Natural Variability and Climate Changes on Hydroclimate Extremes in the Central Valley Region of California. Authorea. March 13, 2023. https://doi.org/10.22541/essoar.167870424.46495295/v1

Najibi, N., and Steinschneider, S. (2023). Extreme precipitation-temperature scaling in California: The role of Atmospheric Rivers, Geophysical Research Letters, 50(14), 1–11, e2023GL104606. https://doi.org/10.1029/2023GL104606.

Pendergrass, A.G., and Hartmann, D.L. (2014). Two modes of change of the distribution of rain, Journal of Climate, 27(22), 8357-8371. https://doi.org/10.1175/JCLI-D-14-00182.1

Pfahl, S., O’Gorman, P.A., Fischer, E.M. (2017). Understanding the regional pattern of projected future changes in extreme precipitation, Nature Climate Change, 7 (6), 423-427. http://dx.doi.org/10.1038/nclimate3287

Steinschneider, S., Ray, P., Rahat, S. H., & Kucharski, J. (2019). A weather‐regime‐based stochastic weather generator for climate vulnerability assessments of water systems in the western United States. Water Resources Research, 55(8), 6923-6945. https://doi.org/10.1029/2018WR024446

Figure Library Part 2 – Visualizations Supporting Synthetic Streamflow Diagnostics

Motivation

This is the second post in a series which was started by Dave last week and seeks to make a library of plotting functions for common visualization tasks. In this post, I share some plotting functions I have developed for synthetic streamflow diagnostic visualizations.

A thorough review of synthetic streamflow verification is outside the scope of this post, but in short we want to verify that the synthetic timeseries have statistical properties similar to the historic streamflow. Additionally, we want to see that the synthetic timeseries reflect the range of internal variability present in the basin by showing that a large synthetic ensemble envelopes the limited historic observations.

The following functions can be used to compare the following properties of historic and synthetic streamflow:

  • Flow magnitude
  • Non-exceedance probabilities (flow duration curves)
  • Autocorrelation
  • Spatial correlation across locations

Diagnostic plotting functions

  • plot_flow_ranges()
    • Range and median of flow values in historic and synthetic flow timeseries.
  • plot_fdc_ranges()
    • The range of annual flow duration curves (FDCs) for historic and synthetic streamflow timeseries.
  • plot_autocorrelation()
    • Comparison of historic and synthetic autocorrelation across a range of lag values.
  • plot_spatial_correlation()
    • Heatmap visualizations of Pearson correlation between different flow locations in the historic data and a single synthetic realization.

These functions are designed to accept historic (Qh) and synthetic (Qs) data with the following properties:

  • Qh : A pd.Series containing a single historic streamflow timeseries. The index of this data must be type pd.DatetimeIndex.
  • Qs : A pd.DataFrame containing many synthetic streamflow timeseries. The columns should each contain a single synthetic realization and the index must be type pd.DatetimeIndex. The column names do not matter.

Additionally, each function (excluding plot_fdc_ranges()) has a timestep keyword argument which changes the temporal scale of the flow being considered. Options are ‘daily’, ‘weekly’, or ‘monthly’, and the pandas.groupby() function is used to aggregate data accordingly to allow for comparison of statistics at different temporal scales. The default is ‘daily’.

Streamflow data

The historic data used in this example is taken from the Delaware River Basin. Specifically, streamflow data from the USGS gauge at Lordville, NY, on the upper portion of the Delaware River is used.

The synthetic streamflow was generated using the Kirsch-Nowak [1, 2] synthetic streamflow generator. For more detail on this generator, see Julie’s post here: Open Source Streamflow Generator Part 1.

I created 100 synthetic realizations of 30-year, daily streamflow timeseries at the Lordville gauge.

The historic and synthetic streamflow timeseries are assumed to start at the same date. Most of the plotting functions below do not explicitly consider the date associated with the timeseries, however comparisons between flow statistics across some range of time assume that the timeseries are aligned for the same dates.


Flow magnitude range

This first plot simply shows the range (max, min) and median of flows for both historic and synthetic datasets.

def plot_flow_ranges(Qh, Qs, 
                     timestep = 'daily',
                     units = 'cms', y_scale = 'log',
                     savefig = False, fig_dir = '.',
                     figsize = (7,5), colors = ['black', 'orange'],
                     title_addon = ""):
    """Plots the range of flow for historic and syntehtic streamflows for a specific timestep scale.
    
    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        units (str, optional): Streamflow units, for axis label. Defaults to 'cms'.
        y_scale (str, optional): Scale of the y-axis. Defaults to 'log'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (4,4).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        title_addon (str, optional): Text to be added to the end of the title. Defaults to "".
    """

    # Assert formatting matches expected
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatatimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qh) should have pd.DatatimeIndex.'

    # Handle conditional datetime formatting
    if timestep == 'daily':
        h_grouper = Qh.index.dayofyear
        s_grouper = Qs.index.dayofyear
        x_lab = 'Day of the Year (Jan-Dec)'
    elif timestep == 'monthly':
        h_grouper = Qh.index.month
        s_grouper = Qs.index.month
        x_lab = 'Month of the Year (Jan-Dec)'
    elif timestep == 'weekly':
        h_grouper = pd.Index(Qh.index.isocalendar().week, dtype = int)
        s_grouper = pd.Index(Qs.index.isocalendar().week, dtype = int)
        x_lab = 'Week of the Year (Jan-Dec)'
    else:
        print('Invalid timestep input. Options: "daily", "monthly", "weekly".')
        return

    # Find flow ranges
    s_max = Qs.groupby(s_grouper).max().max(axis=1)
    s_min = Qs.groupby(s_grouper).min().min(axis=1)
    s_median = Qs.groupby(s_grouper).median().median(axis=1)
    h_max = Qh.groupby(h_grouper).max()
    h_min = Qh.groupby(h_grouper).min()
    h_median = Qh.groupby(h_grouper).median()
  
    ## Plotting  
    fig, ax = plt.subplots(figsize = figsize, dpi=150)
    xs = h_max.index
    ax.fill_between(xs, s_min, s_max, color = colors[1], label = 'Synthetic Range', alpha = 0.5)
    ax.plot(xs, s_median, color = colors[1], label = 'Synthetic Median')
    ax.fill_between(xs, h_min, h_max, color = colors[0], label = 'Historic Range', alpha = 0.3)
    ax.plot(xs, h_median, color = colors[0], label = 'Historic Median')
    
    ax.set_yscale(y_scale)
    ax.set_ylabel(f'{timestep.capitalize()} Flow ({units})', fontsize=12)
    ax.set_xlabel(x_lab, fontsize=12)
    ax.legend(ncols = 2, fontsize = 10, bbox_to_anchor = (0, -.5, 1.0, 0.2), loc = 'upper center')    
    ax.grid(True, linestyle='--', linewidth=0.5)
    ax.set_title(f'{timestep.capitalize()} Streamflow Ranges\nHistoric & Synthetic Timeseries at One Location\n{title_addon}')
    plt.tight_layout()
    
    if savefig:
        plt.savefig(f'{fig_dir}/flow_ranges_{timestep}.png', dpi = 150)
    return

Assuming I have loaded a pd.DataFrame named Qs which contains synthetic timeseries in each column, and a single historic timeseries Qh, I can quickly plot daily (or weekly or monthly) flow ranges using:

# Useage
plot_flow_ranges(Qh, Qs, timestep='daily')

This figure provides a lot of useful information about our synthetic ensemble:

  • The synthetic ensembles envelope the limited historic values, and include more extreme high and low flows
  • The seasonal trends in the synthetics align with the historic
  • The median of the synthetic ensemble aligns very closely with the median of the historic data.

Flow duration curve ranges

The plot_fdc_ranges() function will calculate an FDC for each year of the historic and synthetic data, and plot the ranges that these FDCs take. It will also "total" FDCs calculated for the entirety of the historic or synthetic records.

def plot_fdc_ranges(Qh, Qs, 
                    units = 'cms', y_scale = 'log',
                    savefig = False, fig_dir = '.',
                    figsize = (5,5), colors = ['black', 'orange'],                   
                    title_addon = ""):
    """Plots the range and aggregate flow duration curves for historic and synthetic streamflows.
    
    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        units (str, optional): Streamflow units, for axis label. Defaults to 'cms'.
        y_scale (str, optional): Scale of the y-axis. Defaults to 'log'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (4,4).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        title_addon (str, optional): Text to be added to the end of the title. Defaults to "".
    """
    
    ## Assertions
    assert(type(Qs) == pd.DataFrame), 'Synthetic streamflow should be type pd.DataFrame.'
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatatimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qh) should have pd.DatatimeIndex.'

    
    # Calculate FDCs for total period and each realization
    nonexceedance = np.linspace(0.0001, 0.9999, 50)
    s_total_fdc = np.quantile(Qs.values.flatten(), nonexceedance)
    h_total_fdc = np.quantile(Qh.values.flatten(), nonexceedance) 
    
    s_fdc_max = np.zeros_like(nonexceedance)
    s_fdc_min = np.zeros_like(nonexceedance)
    h_fdc_max = np.zeros_like(nonexceedance)
    h_fdc_min = np.zeros_like(nonexceedance)

    annual_synthetics = Qs.groupby(Qs.index.year)
    annual_historic = Qh.groupby(Qh.index.year)

    for i, quant in enumerate(nonexceedance):
            s_fdc_max[i] = annual_synthetics.quantile(quant).max().max()
            s_fdc_min[i] = annual_synthetics.quantile(quant).min().min()
            h_fdc_max[i] = annual_historic.quantile(quant).max()
            h_fdc_min[i] = annual_historic.quantile(quant).min()
    
    ## Plotting
    fig, ax = plt.subplots(figsize=figsize, dpi=200)
    
    #for quant in syn_fdc_quants:
    ax.fill_between(nonexceedance, s_fdc_min, s_fdc_max, color = colors[1], label = 'Synthetic Annual FDC Range', alpha = 0.5)
    ax.fill_between(nonexceedance, h_fdc_min, h_fdc_max, color = colors[0], label = 'Historic Annual FDC Range', alpha = 0.3)

    ax.plot(nonexceedance, s_total_fdc, color = colors[1], label = 'Synthetic Total FDC', alpha = 1)
    ax.plot(nonexceedance, h_total_fdc, color = colors[0], label = 'Historic Total FDC', alpha = 1, linewidth = 2)

    ax.grid(True, linestyle='--', linewidth=0.5)
    ax.set_yscale(y_scale)
    ax.set_ylabel(f'Flow ({units})')
    ax.set_xlabel('Non-Exceedance Probability')
    ax.legend(fontsize= 10)
    ax.grid(True, linestyle='--', linewidth=0.5)
    
    plt.title(f'Flow Duration Curve Ranges\nHistoric & Synthetic Streamflow\n{title_addon}')
    if savefig:
        plt.savefig(f'{fig_dir}/flow_duration_curves_{title_addon}.png', dpi=200)
    plt.show()
    return

## Usage
plot_fdc_ranges(Qh, Qs)

Similar to the flow range plot above, we see that the synthetic ensemble is nicely enveloping the historic data; there are synthetic instances where the extreme flows in a particular year are more extreme than historic observations. However, the FDC of the total synthetic ensemble (if all synthetics were combined) is almost exactly equal to the total historic FDC.

Autocorrelation

Hydrologic systems are known to have memory, or correlation between streamflow at different points in time. We want to make sure that our synthetic streamflow timeseries have similar memory or autocorrelation.

The plot_autocorrelation() function will calculate autocorrelation for the historic record, and each synthetic realization for a range of different lag values. In addition to the required Qh and Qs arguments this function requires lag_range which should be a list or array of lag values to be considered.

def plot_autocorrelation(Qh, Qs, lag_range, 
                         timestep = 'daily',
                         savefig = False, fig_dir = '.',
                         figsize=(7, 5), colors = ['black', 'orange'],
                         alpha = 0.3):
    """Plot autocorrelation of historic and synthetic flow over some range of lags.

    Args:
        Qh (pd.Series): Historic daily streamflow timeseries. Index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. Index must be pd.DatetimeIndex.
        lag_range (iterable): A list or range of lag values to be used for autocorrelation calculation.
        
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (4,4).
        colors (list, optional): List of two colors for historic and synthetic data respectively. Defaults to ['black', 'orange'].
        alpha (float, optional): The opacity of synthetic data. Defaults to 0.3.
    """
    
    ## Assertions
    assert(type(Qs) == pd.DataFrame), 'Synthetic streamflow should be type pd.DataFrame.'
    assert(type(Qh.index) == pd.DatetimeIndex), 'Historic streamflow (Qh) should have pd.DatatimeIndex.'
    assert(type(Qs.index) == pd.DatetimeIndex), 'Synthetic streamflow (Qh) should have pd.DatatimeIndex.'

    if timestep == 'monthly':
        Qh = Qh.resample('MS').sum()
        Qs = Qs.resample('MS').sum()
        time_label = 'months'
    elif timestep == 'weekly':
        Qh = Qh.resample('W-SUN').sum()
        Qs = Qs.resample('W-SUN').sum()
        time_label = f'weeks'
    elif timestep == 'daily':
        time_label = f'days'
        
    # Calculate autocorrelations
    autocorr_h = np.zeros(len(lag_range))
    confidence_autocorr_h = np.zeros((2, len(lag_range)))
    for i, lag in enumerate(lag_range):
        h_corr = pearsonr(Qh.values[:-lag], Qh.values[lag:])
        autocorr_h[i] = h_corr[0]
        confidence_autocorr_h[0,i] = h_corr[0] - h_corr.confidence_interval().low
        confidence_autocorr_h[1,i] = h_corr.confidence_interval().high - h_corr[0]
    
    autocorr_s = np.zeros((Qs.shape[1], len(lag_range)))
    for i, realization in enumerate(Qs.columns):
        autocorr_s[i] = [pearsonr(Qs[realization].values[:-lag], Qs[realization].values[lag:])[0] for lag in lag_range]


    ## Plotting
    fig, ax = plt.subplots(figsize=figsize, dpi=200)

    # Plot autocorrelation for each synthetic timeseries
    for i, realization in enumerate(Qs.columns):
        if i == 0:
            ax.plot(lag_range, autocorr_s[i], alpha=1, color = colors[1], label=f'Synthetic Realization')
        else:
            ax.plot(lag_range, autocorr_s[i], color = colors[1], alpha=alpha, zorder = 1)
        ax.scatter(lag_range, autocorr_s[i], alpha=alpha, color = 'orange', zorder = 2)

    # Plot autocorrelation for the historic timeseries
    ax.plot(lag_range, autocorr_h, color=colors[0], linewidth=2, label='Historic', zorder = 3)
    ax.scatter(lag_range, autocorr_h, color=colors[0], zorder = 4)
    ax.errorbar(lag_range, autocorr_h, yerr=confidence_autocorr_h, 
                color='black', capsize=4, alpha=0.75, label ='Historic 95% CI')

    # Set labels and title
    ax.set_xlabel(f'Lag ({time_label})', fontsize=12)
    ax.set_ylabel('Autocorrelation (Pearson)', fontsize=12)
    ax.set_title('Autocorrelation Comparison\nHistoric Timeseries vs. Synthetic Ensemble', fontsize=14)

    ax.legend(fontsize=10)
    ax.grid(True, linestyle='--', linewidth=0.5)
    plt.subplots_adjust(left=0.12, right=0.95, bottom=0.12, top=0.92)

    if savefig:
        plt.savefig(f'{fig_dir}/autocorrelation_plot_{timestep}.png', dpi=200, bbox_inches='tight')
    plt.show()
    return

Let’s visualize the autocorrelation of daily streamflow for a maximum of 100-day lag.

## Usage
plot_autocorrelation(Qh, Qs, lag_range=range(1,100, 5), timestep='daily') 

Spatial Correlation

The plot_spatial_correlation() function generates a heatmap visualization of the Pearson correlation matrix for streamflow at different locations in both the historic and synthetic datasets.

The plot_spatial_correlation() function is different from the other three functions shown above, since it requires a set of timeseries data for multiple different locations.

Rather than providing a set of many different realizations for a single site we need to provide a single realization of synthetics across all sites of interest, with a timeseries for each site in each column. This new input is Qs_i, and must the same columns as Qh (they both must contain a timeseries for each location.

def plot_correlation(Qh, Qs_i,
                        timestep = 'daily',
                        savefig = False, fig_dir = '.',
                        figsize = (8,4), color_map = 'BuPu',
                        cbar_on = True):
    """Plots the side-by-side heatmaps of flow correlation between sites for historic and synthetic multi-site data.
    
    Args:
        Qh (pd.DataFrame): Historic daily streamflow timeseries at many different locations. The columns of Qh and Qs_i must match, and index must be pd.DatetimeIndex. 
        Qs (pd.DataFrame): Synthetic daily streamflow timeseries realizations. Each column is a unique realization. The columns of Qh and Qs_i must match, and index must be pd.DatetimeIndex.
        timestep (str, optional): The timestep which data should be aggregated over. Defaults to 'daily'. Options are 'daily', 'weekly', or 'monthly'.
        savefig (bool, optional): Allows for png to be saved to fig_dir. Defaults to False.
        fig_dir (str, optional): Location of saved figure output. Defaults to '.' (working directory).
        figsize (tuple, optional): The figure size. Defaults to (4,4).
        color_map (str, optional): The colormap used for the heatmaps. Defaults to 'BuPu.
        cbar_on (bool, optional): Indictor if the colorbar should be shown or not. Defaults to True.
    """
    
    ## Assertions
    assert(type(Qh) == type(Qs_i) and (type(Qh) == pd.DataFrame)), 'Both inputs Qh and Qs_i should be pd.DataFrames.'
    assert(np.mean(Qh.columns == Qs_i.columns) == 1.0), 'Historic and synthetic data should have same columns.'
    
    if timestep == 'monthly':
        Qh = Qh.resample('MS').sum()
        Qs_i = Qs_i.resample('MS').sum()
    elif timestep == 'weekly':
        Qh = Qh.resample('W-SUN').sum()
        Qs_i = Qs_i.resample('W-SUN').sum()
    
    ## Plotting
    fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=figsize, dpi = 250)
    
    # Heatmap of historic correlation
    h_correlation = np.corrcoef(Qh.values.transpose())
    sns.heatmap(h_correlation, ax = ax1, 
                square = True, annot = False,
                xticklabels = 5, yticklabels = 5, 
                cmap = color_map, cbar = False, vmin = 0, vmax = 1)

    # Synthetic correlation
    s_correlation = np.corrcoef(Qs_i.values.transpose())
    sns.heatmap(s_correlation, ax = ax2,
                square = True, annot = False,
                xticklabels = 5, yticklabels = 5, 
                cmap = color_map, cbar = False, vmin = 0, vmax = 1)
    
    for axi in (ax1, ax2):
        axi.set_xticklabels([])
        axi.set_yticklabels([])
    ax1.set(xlabel = 'Site i', ylabel = 'Site j', title = 'Historic Streamflow')
    ax2.set(xlabel = 'Site i', ylabel = 'Site j', title = 'Synthetic Realization')
    title_string = f'Pearson correlation across different streamflow locations\n{timestep.capitalize()} timescale\n'
        
    # Adjust the layout to make room for the colorbar
    if cbar_on:
        plt.tight_layout(rect=[0, 0, 0.9, 0.85])
        cbar_ax = fig.add_axes([0.92, 0.1, 0.02, 0.6])  # [left, bottom, width, height]
        cbar = fig.colorbar(ax1.collections[0], cax=cbar_ax)
        cbar.set_label("Pearson Correlation")
    plt.suptitle(title_string, fontsize = 12)
    
    if savefig:
        plt.savefig(f'{fig_dir}/spatial_correlation_comparison_{timestep}.png', dpi = 250)
    plt.show()
    return

To use this function, I need to load a different pd.DataFrame containing synthetic timeseries at all of my study locations.

## Usage
single_synthetic_all_sites = pd.read_csv('synthetic_streamflow_single_realization_all_sites.csv', sep=',', index_col=0, parse_dates=True)

plot_correlation(Qh_all_sites, single_synthetic_all_sites, timestep='weekly')

Across the different streamflow sites, we see that the synthetic timeseries do a decent job at preserving spatial correlation, although correlation is decreased for sites which are not strongly correlated in the historic record.

Conclusions

I hope these functions save you some time in your own work, and allow you expand your synthetic diagnostic considerations.

References

[1] Kirsch, B. R., Characklis, G. W., & Zeff, H. B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements: Practical improvement for generating synthetic streamflows. Journal of Water Resources Planning and Management139(4), 396-406.

[2] Nowak, K., Prairie, J., Rajagopalan, B., & Lall, U. (2010). A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow. Water Resources Research46(8).

A time-saving tip for simulating multiple scenarios

A time-saving tip for simulating multiple scenarios

Last week I was reading a paper by Thomlinson, Arnott, and Harou (2020), A water resource simulator in Python. In this, they describe a method of evaluating many simulation scenarios (or hydrologic realizations) simultaneously, and claim that this approach results in significant improvements in computational efficiency.

I was struck because I have been constructing similar multi-scenario simulations using a different approach, by evaluating one scenario at a time.

After having thought I perfected my code, was it possible that a slight modification to the implementation could significantly improve the efficiency? (Spoiler: Yes! By many hours, in fact.)

I decided to post this here (A) as a simple demonstration in comparing runtime metrics of two different algorithm approaches, and (B) as a reminder to remain open to the possibility of simple yet significant improvements in code performance.

A GitHub repository containing all of the relevant Python scripts used below can be accessed here.

Introduction

There are many reasons why an analyst might want to perform simulation of many different scenarios. For example, when simulating a water resource policy which is affected by stochastic exogenous forces, it is preferable to evaluate the policy under many different realizations of the stochastic variable. The performance of that policy can then be measured as an aggregation of the performance under the different realizations.

How one decides to simulate the different realizations can have significant impacts on the efficiency of the program.

Here, I compare two different implementations for many-scenario evaluation, and demonstrate a comparison of the methods using the shallow lake problem (Carpenter et al., 1999).

The two implementation techniques are shown graphically in the figure below.

Figure: Visual representation of the two different multi-scenario simulation implementations discusses in this post. (This figure was inspired by a similar figure contained in a Pywr presentation made available by James Thomlinson)

In the past, I (and others who came before me) have used what I refer to as a serial implementation, where the realizations are looped-through one-by-one, evaluating the model over the entire simulation time period before moving on to the next realization. Now, I am contrasting that with what I am calling the block implementation. For the block implementation, the simulation is evaluated for all realizations before progressing to the subsequent time step.

When viewed this way, it may seem obvious that the block simulation will perform more efficiently. The model is applied to the columns rather than the individual elements of a matrix of realizations or scenarios. However, hindsight is 20-20, and if you’re a water resource modeler with an engineering background (such as myself) this may not be apparent to you when constructing the model.

Now, onto the demonstration.

The Lake Problem

The shallow lake problem, as introduced by Carpenter et al. (1999), was used as a case study for design of multi-objective pollution policies by Professor Julie Quinn and has been explored in depth on this blog. While a comprehensive understanding of the lake problem, and the corresponding model, is not necessary for understanding this post, I will provide a brief summary here.

The problem centers around a hypothetical town located near a shallow lake. The town would like to discharge nutrient pollution (e.g., waste water treatment plant effluent) into the lake, while balancing the economic benefits (i.e., cost savings associated with the discharge) and environmental objectives (i.e., preserving the ecological health of the lake).

The pollution dynamics of the lake are described by a non-linear model representing pollution fluxes into and out of the lake:

X_{t+1} = X_t + a_t + Y_t + \frac{X_t^q}{1+X_t^q} - bX_t

Where X_t is the pollution concentration in the lake, a_t is the anthropogenically pollution discharged from the town, Y_t \approx Log-Normal(\mu, \sigma) is the natural pollution inflow into the lake and follows a log-normal distribution, q is the rate at which pollution is recycled from the sediment into the water column, and b is the rate at which the nutrient pollution is removed from the lake due to natural processes.

The non-linearity of the nutrient dynamics shown above result in the presence of an unstable equilibrium or ecological tipping point beyond which the lake becomes permanently eutrophic (ecologically unhealthy).

The problem then becomes identifying pollution policies that prescribe a preferable amount of pollution, a_t during each year while not crossing the ecological tipping point.

The lake model

A model can be constructed which simulates a pollution policy and measures corresponding environmental and economic performance objectives over a 100-year period. In her 2017 publication, Prof. Quinn shows how an adaptive pollution policy, which determines pollution discharges rates conditional upon the current lake pollution level, can result in greater system performance and robustness. However, for the sake of simplicity in this demonstration I am modeling an intertemporal pollution policy which consists of a set of annual pollution discharge quantities (release_decision in the following code). I selected one pollution policy from a set of pareto-approximate policies generated previously. To learn more about how optimal pollution discharge policies can be generated for this model, see the blog post here.

Since natural nutrient pollution inflow into the lake is stochastic, it is best to evaluate a single policy for many different realizations of natural inflow. The objective performance can then be measured as the summary (e.g., mean, sum, or max) of the performance across the different realizations.

Below, I am going to show two different ways of implementing multi-realization simulations corresponding to the two approaches described in the Introduction.

I’ve separated the common and and unique features of the two different implementations, to highlight the differences. Both models begin by defining a set of parametric constants, initializing variables for later use, and by calculating the ecological tipping point (critical pollution threshold):

def lake_model(release_decision):

    # Choose the number of stochastic realizations
    n_realizations = 100

    #Initialize conditions
    n_years = 100
    n_objs = 4
    n_time = 100

    # Constants
    reliab_threshold = 0.85
    inertia_threshold = 0.02
    b = 0.42
    q = 2.0
    delta = 0.98
    X0 = 0
    alpha = 0.4
    mu = 0.5
    sigma = np.sqrt(10**-0.5)

    # Initialize variables
    years_inertia_met = np.zeros(n_realizations)
    years_reliability_met = np.zeros(n_realizations)
    expected_benefits = np.zeros(n_realizations)
    average_annual_concentration = np.zeros(n_realizations)
    objs = [0.0]*n_objs

    # Identify the critical pollution threshold
    #Function defining x-critical. X-critical exists for flux = 0.
    def flux(x):
            f = pow(x,q)/(1+pow(x,q)) - b*x
            return f

    # solve for the critical threshold with unique q and b
    Xcrit = sp.bisect(flux, 0.01, 1.0)
    
    ...

Serial simulation

For the serial model evaluation, lake quality over a 100-year period is evaluated for one realization at a time. A single realization of stochastic nutrient pollution inflow, Y_t is generated and the policy is evaluated under these conditions. This is repeated 100 times for 100 unique realizations of natural inflow.


def serial_lake_model(release_decision):

    ...

    # Begin evaluation; evaluate one realization at a time
    for s in range(n_realizations):

        ### Generate a single inflow_realization
        inflow_realizations = np.random.lognormal(mean = mu, sigma = sigma, size = n_time)

        # Initialize a new matrix of lake-state variables
        lake_state = np.zeros(n_years)

        # Set the initial lake pollution level
        lake_state[0] = X0

        for t in range(n_years-1):

            lake_state[t+1] = lake_state[t] + release_decision[t] + inflow_realizations[t] + (lake_state[t]**q)/(1 + lake_state[t]**q)

            expected_benefits[s] += alpha * release_decision[t] * delta**(t+1)

            # Check if policy meets inertia threshold
            if t>= 1:
                years_inertia_met[s] += (abs(release_decision[t-1] - release_decision[t]) < inertia_threshold) * 1

            # Check if policy meets reliability threshold
            years_reliability_met[s] += (lake_state[t] < Xcrit) * 1

        average_annual_concentration[s] = np.mean(lake_state)

    # Calculate objective scores
    objs[0] = -np.mean(expected_benefits)
    objs[1] = max(average_annual_concentration)
    objs[2] = -np.sum(years_inertia_met / (n_years - 1))/n_realizations
    objs[3] = -np.sum(years_reliability_met)/(n_realizations * n_years)

    return objs

This is the way I have written this program in the past, and the way it has been presented previously in this blog.

Block Simulation

I’ve now re-written the model using the block simulation method, as shown in Figure 1. Rather than taking a single timeseries of natural inflow at a time, I produce a matrix of 100-realizations and perform calculations one-timestep (or column) at a time.

I recommend you pause to compare the above and below code segments, to make sure that you have a solid understanding of the differences in implementation before looking at the results.

def matrix_lake_model(release_decision):

    ...
    
    ### Create a matrix contaiining all inflow realizations
    inflow_realizations = np.zeros((n_realizations, n_time))
    for s in range(n_realizations):
        inflow_realizations[s, :] = np.random.lognormal(mean = mu, sigma = sigma, size = n_time)

    # Begin evaluation; evaluate all realizations one time step at a time
    for t in range(n_years-1):

        lake_state[:,t+1] = lake_state[:,t] + release_decision[t] + inflow_realizations[:,t] + (lake_state[:,t]**q)/(1 + lake_state[:,t]**q)

        expected_benefits[:] += alpha * release_decision[t] * delta**(t+1)

        # Check if policy meets inertia threshold
        if t>= 1:
            years_inertia_met += (abs(release_decision[t-1] - release_decision[t]) < inertia_threshold) * 1

        # Check if policy meets reliability threshold
        years_reliability_met += (lake_state[:,t] < Xcrit) * 1

    # Calculate objective scores
    objs[0] = -np.mean(expected_benefits)
    objs[1] = max(np.mean(lake_state, axis = 1))
    objs[2] = -np.sum(years_inertia_met / (n_years - 1))/n_realizations
    objs[3] = -np.sum(years_reliability_met)/(n_realizations * n_years)

    return objs

Running the same pollution policy in both of the above models, I first made sure that the output objective scores were identical. Having verified that, I set up some simple timing tests.

Comparison of evaluation time efficiency

I used the built-in time module in Python to perform runtime tests to compare the two implementation methods. The time module contains the function perf_counter() which makes a precise recording of the time at which it is called.

Calling perf_counter() before and after model evaluation provides a simple way of measuring the evaluation time. An example of this time measurement is shown below:

import time
import lake_model

start = time.perf_counter()
model_output = lake_model(release_decision, n_realizations)
end = time.perf_counter()
    
model_runtime = end - start

When performing an optimization of the pollution release policy using the Borg multi-objective evolutionary algorithm (MOEA), it is standard for our group to simulate each policy for 100-realizations of hydrologic stochasticity. The objective scores are calculated as some aggregation of the performance across those realizations.

With this in mind, I first measured the difference in runtime between the two simulation methods for 100-realizations of stochasticity.

I found the model using the block implementation, evaluating all realizations one time-step at a time, was 0.048 seconds faster then evaluating the 100-realizations in series. That may seem marginal at first, but how does this difference in speed scale when performing many thousands of model evaluations throughout an optimization?

Previously, when optimizing the release decisions for the lake problem, I allowed the Borg MOEA to run for a maximum of 50,000 model evaluations per seed, and repeated this process for 20 random seeds… at a savings of 0.048 seconds per evaluation, I would have saved 13.3 hours of computation time if I had used the block implementation technique!

This difference grows exponentially with the number of realizations performed in each evaluation. In the figure below, I am showing runtime of the block implementation relative to the serial implementation (I.e., \frac{T_{simul}}{T_{serial}}) for an increasingly large number of realizations:

Figure: Runtime of the matrix implementation technique relative to the serial technique, for a range of realization numbers.

Conclusions

In a stochastic world, evaluation of water resource policies under a broad ensemble of scenarios allows for the selection of more policies that are more robust under uncertain factors. Writing code that is computationally burdensome can limit the size of the ensemble, and ultimately limit the analysis.

When working with high performance computing resources, researchers are often limited in allotted compute-hours, or the time spent processing a program on the machine. Given this context, it is highly valuable to identify time-saving measures such as the block implementation presented here.

Ultimately, this experimentation was pleasantly humbling. I hope that this serves as motivation to approach one of your existing models with a fresh perspective. At the least, I hope to keep you from making the same costly mistake as me.

I want to shout out Thomlinson, Arnott, and Harou again for writing the paper that sparked this re-assessment, check out their water resource simulator for solving resource allocation problems, Pywr, here.

A non-intimidating introduction to parallel computing with Numba

This blog post is adapted from material I learned during the 2021 San Diego Supercomputer Center (SDSC) Summer Institute. This was an introductory boot camp to high-performance computing (HPC), and one of the modules taught the application of Numba for in-line parallelization and speeding up of Python code.

What is Numba?

According to its official web page, Numba is a just-in-time (JIT) compiler that translates subsets of Python and NumPy code into fast machine code, enabling it to run at speeds approaching that of C or Fortran. This is becuase JIT compilation enables specific lines of code to be compiled or activated only when necessary. Numba also makes use of cache memory to generate and store the compiled version of all data types entered to a specific function, which eliminates the need for recompilation every time the same data type is called when a function is run.

This blog post will demonstrate a simple examples of using Numba and its most commonly-used decorator, @jit, via Jupyter Notebook. The Binder file containing all the executable code can be found here.

Note: The ‘@‘ flag is used to indicate the use of a decorator

Installing Numba and Setting up the Jupyter Notebook

First, in your command prompt, enter:

pip install numba

Alternatively, you can also use:

conda install numba

Next, import Numba:

import numpy as np
import numba
from numba import jit
from numba import vectorize

Great! Now let’s move onto using the @jit decorator.

Using @jit for executing functions on the CPU

The @jit decorator works best on numerical functions that use NumPy. It has two modes: nopython mode and object mode. Setting nopython=True tell the compiler to overlook the involvement of the Python interpreter when running the entire decorated function. This setting leads to the best performance. However, in the case when:

  1. nopython=True fails
  2. nopython=False, or
  3. nopython is not set at all

the compiler defaults to object mode. Then, Numba will manually identify loops that it can compile into functions to be run in machine code, and will run the remaining code in the interpreter.

Here, @jit is demonstrated on a simple matrix multiplication function:

# a function that does multiple matrix multiplication
@jit(nopython=True)
def matrix_multiplication(A, x):
    b = np.empty(shape=(x.shape[0],1), dtype=np.float64)
    for i in range(x.shape[0]):
        b[i] = np.dot(A[i,:], x)
    return b

Remember – the use of @jit means that this function has not been compiled yet! Compilation only happens when you call the function:

A = np.random.rand(10, 10)
x = np.random.rand(10, 1)
a_complicated_function(A,x)

But how much faster is Numba really? To find out, some benchmarking is in order. Jupyter Notebook has a handy function called %timeit that runs simple functions many times in a loop to get their average execution time, that can be used as follows:

%timeit matrix_multiplication(A,x)

# 11.4 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba has a special .py_func attribute that effectively allows the decorated function to run as the original uncompiled Python function. Using this to compare its runtime to that of the decorated version,

%timeit matrix_multiplication.py_func(A,x)

# 35.5 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

From here, you can see that the Numba version runs about 3 times faster than using only NumPy arrays. In addition to this, Numba also supports tuples, integers, floats, and Python lists. All other Python features supported by Numba can be found here.

Besides explicitly declaring @jit at the start of a function, Numba makes it simple to turn a NumPy function into a Numba function by attaching jit(nopython=True) to the original function. This essentially uses the @jit decorator as a function. The function to calculate absolute percentage relative error demonstrates how this is done:

# Calculate percentage relative error
def numpy_re(x, true):
    return np.abs(((x - true)/true))*100

numba_re = jit(nopython=True)(numpy_re)

And we can see how the Number version is faster:

%timeit numpy_re(x, 0.66)
%timeit numba_re(x, 0.66)

where the NumPy version takes approximately 2.61 microseconds to run, while the Numba version takes 687 nanoseconds.

Inline parallelization with Numba

The @jit decorator can also be used to enable inline parallelization by setting its parallelization pass parallel=True. Parallelization in Numba is done via multi-threading, which essentially creates threads of code that are distributed over all the available CPU cores. An example of this can be seen in the code snippet below, describing a function that calculates the normal distribution of a set of data with a given mean and standard deviation:

SQRT_2PI = np.sqrt(2 * np.pi)

@jit(nopython=True, parallel=True)
def normals(x, means, sds):
    n = means.shape[0]
    result = np.exp(-0.5*((x - means)/sds)**2)
    return (1 / (sds * np.sqrt(2*np.pi))) * result

As usual, the function must be compiled:

means = np.random.uniform(-1,1, size=10**8)
sds = np.random.uniform(0.1, 0.2, size=10**8)

normals(0.6, means, sds)

To appreciate the speed-up that Numba’s multi-threading provides, compare the runtime for this with:

  1. A decorated version of the function with a disabled parallel pass
  2. The uncompiled, original NumPy function

The first example can be timed by:

normals_deco_nothread = jit(nopython=True)(normals.py_func)
%timeit normals_deco_nothread(0.6, means, sds)

# 3.24 s ± 757 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The first line of the code snippet first makes an uncompiled copy of the normals function, and then applies the @jit decorator to it. This effectively creates a version of normals that uses @jit, but is not multi-threaded. This run of the function took approximately 3.3 seconds.

For the second example, simply:

%timeit normals.py_func(0.6, means, sds)

# 7.38 s ± 759 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now, compare both these examples to the runtime of the decorated and multi-threaded normals function:

%timeit normals(0.6, means, sds)

# 933 ms ± 155 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The decorated, multi-threaded function is significantly faster (933 ms) than the decorated function without multi-threading (3.24 s), which in turn is faster than the uncompiled original NumPy function (7.38 s). However, the degree of speed-up may vary depending on the number of CPUs that the machine has available.

Summary

In general, the improvements achieved by using Numba on top of NumPy functions are marginal for simple, few-loop functions. Nevertheless, Numba is particularly useful for large datasets or high-dimensional arrays that require a large number of loops, and would benefit from the one-and-done compilation that it enables. For more information on using Numba, please refer to its official web page.

Visualizing the diversity of a Pareto front approximation using RadVis

During many-objective search we seek to find approximate Pareto fronts that are convergent, meaning they are close to the “true” Pareto front, and diverse, meaning they contain solutions that they uniformly cover the full range of the “true” front. This post will focus on examining the latter criteria for high dimensional problems using Radial Visualization (RadVis; Hoffman et al., 1997), a visualization technique that transforms a solution set into radial coordinates. I’ll discuss the theory behind RadVis, provide some example code, and discuss potential applications, limitations and extensions.

Background

When performing runtime diagnostics (a subject of two of my recent posts, which you can find here and here), we often asses MOEA performance using hypervolume because it captures both the convergence and diversity of the approximate Pareto front. Though tracking hypervolume provides us with an aggregate measure of diversity, it does not provide information on how the solutions are spread across the objective space. For low dimensional problems (two or three objectives), the diversity of the approximation set can easily be discerned through visualization (such as 2-D or 3-D scatter plots). For many-objective problems however, this becomes challenging. Tools such as parallel coordinate plots, bubble plots and glyph plots can allow us to visualize the solutions within an approximation set in their full dimensionality, but they are limited in their ability to convey information on diversity in high dimensional space. An alternative strategy for examining diversity within a solution set is to transform the objective space into a form coordinate system that highlights diversity. One common transformation is to view the set using radial coordinates.

RadVis

RadVis uses an analogy from physics to plot many dimensional data. Each of n-objectives are plotted uniformly around the circumference of a unit circle while, each solution is represented as a point that is attached to all objectives by a set of n hypothetical springs (one spring is attached to each objective-solution pair). A solution’s objective value for the ith objective defines the spring constant exerted by the ith spring. A point’s location thus represents the equilibrium point between all springs (Hoffman et al., 1997). For example, a point that has equal objective values for all n-objectives will lie directly in the center of the plot, while a point that maximizes one objective and minimizes all others will lie on the circumference of the circle at the location specified for the maximized objective.

Below, I’ve plotted RadVis plots for a 5 objective instance of DTLZ2 at runtime snapshots from 1,000 function evaluations (NFE), 5,000 NFE, 10,000 NFE and 50,000 NFE. Notice how the approximate front for 1,000 NFE is sparse and skewed toward one side, while the front for 50,000 NFE is fairly uniform and centered. DTLZ2 is a test problem with a known true Pareto front which is continuous and uniformly distributed, using this information, we can see that the algorithm has likely improved its approximation set over the course of the run.

RadVis representations of the approximate Pareto set of a 5 objective instance of the DTLZ2 test problem. Each subplot represents a different runtime snapshot.

I coded this example using the Yellowbrick machine learning visualization toolkit from sklearn. Code for this example has been added to my runtime diagnostics Github repository, which can be found here.

Caution! RadVis provides no information on convergence

It’s important to note that RadVis provides no information about solution convergence since the location of each point is dependent on equilibrium between all objectives. A set of solutions that performs poorly across all objectives will look similar to a set of solutions that performs well across all objectives. For this reason, RadVis should never be used alone to assess the quality of a Pareto front approximation. To include convergence information in RadVis, Ibrahim et al,. (2016) and Ibrahim et al., 2018 have proposed 3d-RadVis and 3D-RadVis antenna, extensions of RadVis methodology which augment Radvis by providing convergence information to decision makers. These methodologies may be subject to future posts.

References

Hoffman, P., Grinstein, G., Marx, K., Grosse, I., & Stanley, E. (1997, October). DNA visual and analytic data mining. In Proceedings. Visualization’97 (Cat. No. 97CB36155) (pp. 437-441). IEEE.

Ibrahim, A., Rahnamayan, S., Martin, M. V., & Deb, K. (2016, July). 3D-RadVis: Visualization of Pareto front in many-objective optimization. In 2016 IEEE Congress on Evolutionary Computation (CEC) (pp. 736-745). IEEE.

Ibrahim, A., Rahnamayan, S., Martin, M. V., & Deb, K. (2018). 3D-RadVis Antenna: Visualization and performance measure for many-objective optimization. Swarm and evolutionary computation, 39, 157-176.

Beyond Hypervolume: Dynamic visualization of MOEA runtime

Multi-objective evolutionary algorithms have become an essential tool for decision support in water resources systems. A core challenge we face when applying them to real world problems is that we don’t have analytic solutions to evaluate algorithmic performance, i.e. since we don’t know what solutions are possible before hand, we don’t have a point of reference to assess how well our algorithm is performing. One way we can gain insight into algorithmic performance is by examining runtime dynamics. To aid our understanding of the dynamics of the Borg MOEA, I’ve written a small Python library to read Borg runtime files and build a dynamic dashboard that visualizes algorithmic progress.

The Borg MOEA produces runtime files which track algorithmic parameters and the evolving Pareto approximate set over an optimization run. Often we use these data to calculate performance metrics, which provide information on the relative convergence of an approximation set and the diversity of solutions within it (for background on metrics see Joe’s post here). Commonly, generational distance, epsilon indicator and hypervolume are used to examine quality of the approximation set. An animation of these metrics for the 3 objective DTLZ2 test problem is shown in Figure 1 below.

Figure 1: Runtime metrics for the DTLZ2 test problem. The x-axis is number of function evaluations, the y-axis is the each individual metric

While these metrics provide a helpful picture of general algorithmic performance, they don’t provide insight into how the individual objectives are evolving or Borg’s operator dynamics.

Figure 2 shows a diagnostic dashboard of the same 3 objective DTLZ2 test problem run. I used the Celluloid python package to animate the figures. I like this package because it allows you to fully control each frame of the animation.

Figure 2: DTLZ2 runtime dynamics. The tree objectives are shown in a scatter plot and a parallel axis plot. The third figure plots hypervolume over the optimization run and the bottom figure shows Borg’s operator dynamics. (a higher resolution version of this file can be found here: https://www.dropbox.com/s/0nus5xhwqoqtghk/DTLZ2_runtime.gif?dl=0)

One thing we can learn from this dashboard is that though hypervolume starts to plateau around 3500 NFE, the algorithm is still working to to find solutions that represent an adequately diverse representation of the Pareto front. We can also observe that for this DTLZ2 run, the SPX and SBX operators were dominant. SBX is an operator tailored to problems with independent decision variables, like DTLZ2, so this results make sense.

I’m planning on building off this dashboard to include a broader suite of visualization tools, including pairwise scatter plots and radial plots. The library can be found here: https://github.com/dgoldri25/runtimeDiagnostics

If anyone has suggestions or would like to contribute, I would love to hear from you!

Determining the appropriate number of samples for a sensitivity analysis

Sensitivity analysis aims at assessing the relative contributions of the different sources of uncertainty to the variability in the output of a model. There are several mathematical techniques available in the literature, with variance-based approaches being the most popular, and variance-based indices the most widely used, especially “total effects” indices. Literature also provides several estimators/approximators for these indices (reviews can be found here [1]), which typically need N = n × (M + 2) model evaluations (and resulting outputs), where M is the number of uncertain inputs and n is some factor of proportionality that is selected ad hoc, depending on the complexity of the model (e.g. linear or not, monotonic or not, continuous or not). [Note: Literature typically refers to n as the number of samples and to N as the number of model evaluations, and this is how I’ll be using them also.]

The generation of this set of model runs of size N is by far the most computationally demanding step in the calculation of variance-based indices, and a major limitation to their applicability, as it’s typically in the order of magnitude of thousands [2] (n typically >1000). For example, consider a model with 5 uncertain inputs that takes 1 minute to run. To appropriately estimate sensitivity indices for such a model, we would potentially need about N=7000 model evaluations, which would take almost 5 days on a personal computer, excluding the time for the estimator calculation itself (which is typically very short).

The aim is therefore to pick the minimum n needed to ensure our index calculation is reliable. Unfortunately, there is no hard and fast rule on how to do that, but the aim of this post is to illuminate that question a little bit and provide some guidance. I am going off the work presented here [3] and elsewhere, and the aim is to perform the estimation of sensitivity indices repeatedly, using an increasing number of n until the index values converge.

I will be doing this using a fishery model, which is a nonlinear and nonmonotonic model with 9 parameters. Based on previous results suggesting that 3 of these parameters are largely inconsequential to the variability in the output, I’ll be fixing them to their nominal values. I’ll be using SALib to perform the analysis. My full script can be found here, but I’ll only be posting the most important snippets of code in this post.

First, I set up my SALib ‘problem’ and create arrays to store my results:

# Set up dictionary with system parameters
problem = {
  'num_vars': 6,
  'names': ['a', 'b', 'c','h',
            'K','m'],
  'bounds': [[ 0.002, 2],
             [0.005, 1],
             [0.2, 1],
             [0.001, 1],
             [100, 5000],
             [0.1, 1.5]]
}

# Array with n's to use
nsamples = np.arange(50, 4050, 50)

# Arrays to store the index estimates
S1_estimates = np.zeros([problem['num_vars'],len(nsamples)])
ST_estimates = np.zeros([problem['num_vars'],len(nsamples)])

I then loop through all possible n values and perform the sensitivity analysis:

# Loop through all n values, create sample, evaluate model and estimate S1 & ST
for i in range(len(nsamples)):
    print('n= '+ str(nsamples[i]))
    # Generate samples
    sampleset = saltelli.sample(problem, nsamples[i],calc_second_order=False)
    # Run model for all samples
    output = [fish_game(*sampleset[j,:]) for j in range(len(sampleset))]
    # Perform analysis
    results = sobol.analyze(problem, np.asarray(output), calc_second_order=False,print_to_console=False)
    # Store estimates
    ST_estimates[:,i]=results['ST']
    S1_estimates[:,i]=results['S1']

I can then plot the evolution of these estimates as n increases:

Evolution of first order and total order indices with increasing number of samples (n)

What these figures tell us is that choosing an n below 1000 for this model would potentially misestimate our indices, especially the first order ones (S1). As n increases, we see the estimates becoming less noisy and converging to their values. For more complex models, say, with more interactive effects, the minimum n before convergence could actually be a lot higher. A similar experiment by Nossent et al. [3] found that convergence was reached only after n=12,000.

An observation here is that the values of the total indices (ST) are higher than those of the the first order indices (S1), which makes sense, as ST includes both first order effects (captured by S1) and second order effects (i.e. interactions between the factors). Another observation here is that the parameters with the most significant effects (m and K) converge much faster than parameters with less impact on the output (a and b). This was also observed by Nossent et al. [3].

Finally, sensitivity analysis is often performed for the purposes of factor prioritization, i.e. determining (often rank-ordering) the most important parameters for the purposes of, for example, deciding where to devote most calibration efforts in the model or most further analysis to reduce the uncertainty in a parameter. The figures below show the evolution of that rank-ordering as we increase n.

Evolution of parameter ranking based on first order and total order indices with increasing number of samples (n)

These figures show that with a number of samples that is too small, we could potentially misclassify a factor as important or unimportant when it actually is not.

Now, one might ask: how is this useful? I’m trying to minimize my n, but you’ve just made me run way too many model evaluations, multiple times, just to determine how I could have done it faster? Isn’t that backwards?

Well, yes and no. I’ve devoted this time to run this bigger experiment to get insight on the behavior of my model. I have established confidence in my index values and factor prioritization. Further, I now know that n>1500 would probably be unnecessary for this system and even if the model itself or my parameter ranges change. As long as the parameter interactions, and model complexity remain relatively the same, I can leverage this information to perform future sensitivity analyses, with a known minimum n needed.

References:
[1] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola, “Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index,” Computer Physics Communications, vol. 181, no. 2, pp. 259–270, Feb. 2010, doi: 10.1016/j.cpc.2009.09.018.
[2] F. Pianosi and T. Wagener, “A simple and efficient method for global sensitivity analysis based on cumulative distribution functions,” Environmental Modelling & Software, vol. 67, pp. 1–11, May 2015, doi: 10.1016/j.envsoft.2015.01.004.
[3] J. Nossent, P. Elsen, and W. Bauwens, “Sobol’ sensitivity analysis of a complex environmental model,” Environmental Modelling & Software, vol. 26, no. 12, pp. 1515–1525, Dec. 2011, doi: 10.1016/j.envsoft.2011.08.010.

Factor prioritization and factor fixing: how to know what’s important

There have been several blogposts on sensitivity analysis (SA) on this blog, focusing primarily on tools to perform it (e.g., SALib) and visualize outputs. Today I’ll be providing some more information on how to decide which factors are most important in affecting our output and which are largely inconsequential. Picking what is actually important for what we care about is obviously largely subjective and case-dependent, but this post is meant to provide some support to that exercise. I will performing a Global Sensitivity Analysis of a system resulting in a rank-ordering of the most important factors driving variability in the output (i.e., factor prioritization), which can be used to decide which are the least influential factors that can be fixed to simplify the model (i.e., factor fixing) [1].

The scripts I’ll be using can be found here, and I’ll be using a fishery model to demonstrate, as a simplified representation of a socio-ecological system we’re trying to manage. The procedure I’ll be following has been based on the work found in [2-4].

The idea is this:
I generate 1000 samples of uncertain factors that might be driving variability in my outcome (let’s call this Set 1). I apply a certain SA method on the samples and the outcomes and get sensitivity indices for each of my factors, ranking them from most important to least. Where do I draw the line between important and not important?
We can create a Set 2, using only the T most important factors from our Set 1 sample, and fixing all other factors to their default values.
We can also create a Set 3, now fixing the T most important factors to defaults and using the sampled values of all other factors from Set 1.

If we classified our important and unimportant factors correctly, then the correlation coefficient between the model outputs of Set 2 and Set 1 should approximate 1 (since we’re fixing all factors that don’t matter), and the correlation coefficient between outputs from Set 3 and Set 1 should approximate 0 (since the factors we sampled are inconsequential to the output).

Here’s how it’s done using SALib and the Delta Method (in the interest of space I’ll only share the most important snippets of code, you need the full scripts to make it run, which are in this repository) :

First we set up our problem using SALib nomenclature, generate 1000 samples using all factors (which will be our Set 1) and run the model for all 1000 samples. Finally we analyze our output using the Delta method. (This should take a couple minutes to run on your personal computer.)

# Set up dictionary with system parameters
problem = {
  'num_vars': 9,
  'names': ['a', 'b', 'c', 'd','h',
            'K','m','sigmaX','sigmaY'],
  'bounds': [[ 0.002, 2],
             [0.005, 1],
             [0.2, 1],
             [0.05, 0.2],
             [0.001, 1],
             [100, 5000],
             [0.1, 1.5],
             [0.001, 0.01],
             [0.001, 0.01]]
}

defaultvalues = np.array([0.005, 0.5, 0.5, 0.1, 0.1, 2000, 0.7, 0.004, 0.004])

# Generate samples
nsamples = 1000
X_Set1 = latin.sample(problem, nsamples) # This is Set 1

# Run model for all samples
output = [fish_game(*X_Set1[j,:]) for j in range(nsamples)]

# Perform analysis
results = delta.analyze(problem, X_Set1, np.asarray(output), print_to_console=True)

This will produce output like below, telling as the Delta indices of each of the sampled parameters, the confidence internals of those, the First order Sobol indices of the parameters, and their equivalent confidence intervals.

Parameter delta delta_conf S1 S1_conf
a 0.102206 0.021648 0.052453 0.033510
b 0.139056 0.018379 0.065019 0.022922
c 0.090550 0.016505 0.006749 0.007823
d 0.076542 0.005375 0.003923 0.009140
h 0.097057 0.016910 0.021070 0.009275
K 0.267461 0.020434 0.190670 0.057397
m 0.252351 0.040149 0.315562 0.031664
sigmaX 0.076175 0.014001 0.005930 0.005333
sigmaY 0.075390 0.015346 0.004970 0.011557

Without further analysis, one simple way of determining whether a parameter is unimportant is to check whether the confidence interval of its value overlaps 0 (i.e., subtract delta_conf from delta). For our particular results, this doesn’t seem to be the case for any of our delta values, though it does happen for some of the S1 values (c, d, sigmaY). You can refer to this post for discussion on what this might mean.
Looking at the delta values, we can clearly see two factors coming up top (K and m), followed by b, and a closely behind it. The rest of the parameters are reduced in their importance in small decrements after that. So where should we draw the line of importance? Another simple way is to use a threshold (say, 0.1) as a cutoff value [3], but one could argue over including a and not h, given how close their indices are and the wider confidence interval of a (see also the appendix below on this).

But, let’s continue with our analysis. What I am doing below is the following. First, I sort the factors from most to least important based on my results for the delta indices. Then, I create my Sets 2 and 3 on which I’ll be iteratively replacing the values of important factors with either those from Set 1 or with defaults. Finally, I loop through all possible numbers of important factors (1 to 9), generate Sets 2 and 3, calculate outputs for all samples in each, and calculate their correlation with the outputs from Set 1. (This should take 20-30 minutes to run on your personal computer.)

# Sort factors by importance
factors_sorted = np.argsort(results['delta'])[::-1]

# Set up DataFrame of default values to use for experiment
X_defaults = np.tile(defaultvalues,(nsamples, 1))

# Create initial Sets 2 and 3
X_Set2 = np.copy(X_defaults)
X_Set3 = np.copy(X_Set1)

for f in range(1, len(factors_sorted)+1):
    ntopfactors = f
    
    for i in range(ntopfactors): #Loop through all important factors
        X_Set2[:,factors_sorted[i]] = X_Set1[:,factors_sorted[i]] #Fix use samples for important
        X_Set3[:,factors_sorted[i]] = X_defaults[:,factors_sorted[i]] #Fix important to defaults
    
    # Run model for all samples    
    output_Set2 = [fish_game(*X_Set2[j,:]) for j in range(nsamples)]
    output_Set3 = [fish_game(*X_Set3[j,:]) for j in range(nsamples)]
    
    # Calculate coefficients of correlation
    coefficient_S1_S2 = np.corrcoef(output,output_Set2)[0][1]
    coefficient_S1_S3 = np.corrcoef(output,output_Set3)[0][1]

I can also plot the outputs from each iteration, which should look something like this (this is animated to show all figures, in the interest of space):

The figures above tell us the following:
If we choose one important factor (K) and fix all other parameters our outputs don’t really capture the variability of outcomes produced when considering all nine (this is also a case against one-at-a-time type analyses). The coefficient of correlation between Sets 1 and 2 is pretty low (0.44) suggesting we’re still missing important parameters. We’re doing a better job by actually fixing our most important parameter and varying all others (figure on the right, with R=0.763).
Adding the second most important factor (m), shifts things significantly to the right direction, by increasing our coefficient on the right and reducing the one on the left to R=0.203.
There is only a slight improvement with the addition of the third factor (b), but with the inclusion of the fourth (a), our reduced model is already looking very close to the full, with R=0.94. Our counter model excluding these four factors (on the right) also has a very low coefficient of R=0.025.
One could consider this performance sufficient, with the model reduced to four parameters instead of nine. Further adding parameter h and then c would further improve the values to a near perfect match between Set 2 and Set 1, but this is where subjectivity takes over, depending on the cost of adding these variables and how much we care about fidelity in this case.
It is also clear that it is likely safe to fix the last three parameters, as in this case they don’t have any consequential effects on our outcomes.

References:
[1] Saltelli, Andrea, et al.  Global Sensitivity Analysis: The Primer. (2008)
[2] T. H. Andres, “Sampling methods and sensitivity analysis for large parameter sets,” Journal of Statistical Computation and Simulation, vol. 57, no. 1–4, pp. 77–110, Apr. 1997, doi: 10.1080/00949659708811804.
[3] Y. Tang, P. Reed, T. Wagener, and K. van Werkhoven, “Comparing sensitivity analysis methods to advance lumped watershed model identification and evaluation,” Hydrology and Earth System Sciences, vol. 11, no. 2, pp. 793–817, Feb. 2007, doi: https://doi.org/10.5194/hess-11-793-2007.
[4] J. Nossent, P. Elsen, and W. Bauwens, “Sobol’ sensitivity analysis of a complex environmental model,” Environmental Modelling & Software, vol. 26, no. 12, pp. 1515–1525, Dec. 2011, doi: 10.1016/j.envsoft.2011.08.010.


Appendix:
Another way to identify a threshold of importance to classify parameters, is to add a dummy parameter to your model, that does nothing. Reperforming my SA for this same system including the dummy, produces this:

Parameter delta delta_conf S1 S1_conf
a 0.105354 0.019236 0.040665 0.020949
b 0.144955 0.023576 0.050471 0.014810
c 0.075516 0.009578 0.003889 0.006113
d 0.081177 0.011604 0.004186 0.007235
h 0.101583 0.010008 0.032759 0.021343
K 0.261329 0.022876 0.174340 0.038246
m 0.258345 0.024750 0.325690 0.052234
sigmaX 0.071862 0.008620 0.001681 0.006720
sigmaY 0.077337 0.009344 0.003131 0.006918
dummy 0.072546 0.008313 0.004176 0.009567

Even though the dummy does absolutely nothing in our model, it was still given a non-zero delta index by the analysis (0.07). One could use this as the cutoff value of non-importance and choose to fix parameters c, sigmaX, and sigmaY.

MOEAFramework Training Part 4: Processing Metrics and Creating Visualizations

Part 4 wraps up the MOEAFramework training by taking the metrics generated in Part 3 and visualizing them to gain general insight about algorithm behavior and assess strengths and weaknesses of the algorithms.

The .metrics files stored in the data_metrics folder look like the following:

Picture1

Metrics are reported every 1000 NFE and a .metrics file will be created for each seed of each parameterization of each algorithm. There are different ways to proceed with merging/processing metrics depending on choice of visualization. Relevant scripts that aren’t in the repo can be found in this zipped folder along with example data.

Creating Control Maps

When creating control maps, one can average metrics across seeds for each parameterization or use best/worst metrics to try to understand the best/worse performance of the algorithm. If averaging metrics, it isn’t unusual to find that all metrics files may not have the same number of rows, therefore rendering it impossible to average across them. Sometimes the output is not reported as specified. This is not unusual and rather just requires you to cut down all of your metric files to the greatest number of common rows. These scripts are found in ./MOEA_Framework_Group/metrics

1.Drag your metrics files from data_metrics into ./MOEA_Framework_Group/metrics and change all extensions to .txt (use ren .metrics .txt in command prompt to do so)

2.Use Cutting_Script.R to find the maximum number of rows that are common among all seeds. This will create new metric files in the folder, Cut_Files

Now these files can be averaged and grouped with their corresponding parameter values.

1.Seed_Merge.R: Creates a text file with the average hypervolume for each parameterization for each algorithm (i.e. hypervolume_Borg.txt)

2.Add Borg_Samples.txt and NSGAII_Samples.txt to the folder

3.Make_Final_Table.R: takes the population values from the sample file and the hypervolume values for each parameterization and puts them into a final matrix in a form to be accepted by the control map code.

In order to create control maps, you need to first find the reference set hypervolume, because all metrics are normalized to this overall hypervolume. This can be done using the following command and the HypervolumeEval java class written by Dave Hadka.

$ java -cp MOEAFramework-2.12-Demo.jar HypervolumeEval ./data_ref/lake.ref >> lake_ref.hypervolume

Finally, use Control_Map_Borg.py and Control_Map_NSGAII.py make your control maps.

Picture1

Control Maps for Borg and NSGAII

Initial population size on the x-axis can be regarded as a proxy for different parameterizations and the y-axis shows the number of NFE. The color represents the percentage of the overall reference hypervolume that is achieved. Control maps highlight a variety of information about an algorithm: Controllability (sensitivity to parameterization), Effectiveness (quality of approximation sets), and Efficiency (how many NFE it takes to achieve high quality solutions)

Ideally, we would want to see a completely dark blue map that would indicate that the algorithm is able to find high quality solutions very quickly for any parameterization. We can see that this is not the case for either of the algorithms above. Any light streaks indicate that for that particular parameterization, the algorithm had a harder time achieving high quality solutions. Borg is generally robust to parameterization and as seen, if allowed more NFE, it will likely result in a more even blue plot.

Creating Attainment Plots

To create attainment plots:

1.Drag metrics back from Cut_Files back into the data_metrics_new directory on the Cube.

2.Use the average_metrics.sh script to average out the metrics to obtain a set of average metrics across seeds for each algorithm.

3.Concatenate parameter files using: cat NSGAII_lake_*.average >> NSGAII_Concatenate_Average_Metrics.txt

4.Use Example_Attain.m to find the best metric values and to calculate the probability of attainment of the best metrics.

5.Create attainment vectors with build_attainment_matrix.py

6.Plot with color_mesh.py

Picture1

Attainment plots for each metric and algorithm

Attainment plots highlight the following:

Reliability of the algorithm: In general, we would like to see minimal attainment variability which would suggest that our algorithm reliably produces solutions of high quality across seeds. The white circles show the algorithm’s single best run for each metric. The gradient shows the probability of obtaining the best metric value. You can see here that the algorithms are able to reliably obtain high generational distance metrics. However, remember that generational distance is an easy metric to meet. For the other two metrics, one can see that while NSGAII obtains the best metric values, Borg has a slightly higher reliability of obtaining high hypervolume values which arguably is a more important quality that demonstrates robustness in the algorithm.

There are some extra visualizations that can be made to demonstrate algorithmic performance.

Reference Set Contribution

How much of the reference set is contributed by each algorithm?

1.Copy MOEAFramework jar file into data_ref folder

2.Add a # at the end of the individual algorithm sets

3.java -cp MOEAFramework-2.12-Demo.jar org.moeaframework.analysis.sensitivity.SetContribution -e 0.01,0.01,0.0001,0.001 -r lake.ref Borg_lake.set NSGAII_lake.set > lake_set_contribution.txt

ReferenceSet.PNG

lake_set_contribution.txt, as seen above, reports the percentage (as a decimal) of the reference set that is contributed by each algorithm and includes unique and non-unique solutions that could have been found by both algorithms. Typically, these percentages are shown in a bar chart and would effectively display the stark difference between the contribution made by Borg and NSGAII in this case.

Random Seed Analysis

A random seed analysis is a bit of a different experiment and requires just one parameterization, the default parameterization for each algorithm. Usually around 50 seeds of the defaults are run and the corresponding average hypervolume is shown as solid line while the 5th and 95th percentile confidence interval across seeds is shown as shading. Below is an example of such a plot for a different test case:

Picture3.jpg

Random seed analysis plot of hypervolume as a function of NFE

This style of plot is particularly effective at showcasing default behavior, as most users are likely to use the algorithms “straight out of the box”. Ideally, the algorithms have monotonically increasing hypervolume that reaches close to 1 in a small number of functional evaluations and also thin shading to indicate low variability among seeds. Any fluctuations in hypervolume indicates deterioration in the algorithm, which is a result of losing non-dominated solutions and is an undesirable quality of an algorithm.