A quick and straightforward introduction to LIME

In this blog post, we will be discussing the many household uses of Citrus aurantiifolia, or the common lime.

Just kidding, we’ll be talking about a completely different type of LIME, namely Local Interpretable Model-Agnostic Explanations (at this point you may be thinking that the former’s scientific name becomes the easier of the two to say). After all, this is the WaterProgramming blog.

On a more serious note though, LIME is one of the two widely known model-agnostic explainable AI (xAI) methods, alongside Shapley Additive Explanations (SHAP). This post is intended as an introduction to LIME in particular; we will set up the motivation for using xAI methods and then walk through a brief example application using the North Carolina Research Triangle system.

Before we proceed, here’s a quick graphic to get oriented:

The figure above illustrates three main concepts: Artificial Intelligence (AI), the methods used to achieve AI (one of which is Machine Learning, or ML), and the methods used to explain how such methods achieved their version of AI (explainable AI, or more catchily known as xAI). For more explanation on the different categories of AI and their methods, please refer to this post by IBM’s Data and AI team and this SpringerLink article by Sarker (2022), respectively.

Model-agnostic vs model-specific

Model-specific methods

As shown in the figure, model-specific xAI methods are techniques that can only be used on the type of model they were designed for. Here’s a quick rundown of common model types and their available selection of xAI methods:

  • Decision trees
    • Decision tree visualization (e.g. Classification and Regression Tree (CART) diagrams)
    • Feature importance rankings
  • Neural networks (NN), including deep neural networks
    • Coefficient (feature weight) analysis
    • Neural network neuron/layer activation visualization
    • Attention (input sequence) mechanisms
    • Integrated gradients
    • DeepLIFT
    • GradCAM
  • Reinforcement learning
    • Causal models

Further information on the mathematical formulation of these methods can be found in Holzinger et al. (2022). Such methods account for the underlying structure of the model that they are used for, and therefore require some understanding of the model formulation to fully comprehend (e.g. interpreting NN activation visualizations). While less flexible than their agnostic counterparts, model-specific xAI methods provide more granular information on how specific types of information are processed by the model, and therefore how changing input sequences, or model structure, can affect model predictions.

Model-agnostic methods

Model-agnostic xAI (as its name suggests) relies solely on analyzing the input-output pairs of a model, and can therefore be applied to a wide range of machine learning models regardless of model structure or type. It can (very loosely) be thought of as sensitivity analysis applied to AI methods (for more on this connection, please refer to Scholbeck et al. (2023) and Razavi et al. (2021)). SHAP and LIME both fall under this set of methods, and both approximately follow the same process: perturb the input, then identify how the output changes. Note that this set of methods provides little insight into the specifics of model formulation and how it affects model predictions. Nonetheless, it affords a higher degree of flexibility, and does not bind you to one specific model.
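To make the perturb-and-observe idea concrete, here’s a minimal sketch of such a probe (the model object and instance are hypothetical placeholders, and this is a crude local sensitivity check, not LIME itself, which additionally fits a weighted surrogate model around the instance):

import numpy as np

def perturb_and_observe(model, x, feature_idx, delta=0.1, n_samples=100):
    """Jitter one feature of instance x and record how the model's
    predicted probability responds (crude local sensitivity probe)."""
    x = np.asarray(x, dtype=float)
    perturbed = np.tile(x, (n_samples, 1))
    # perturb only the chosen feature
    perturbed[:, feature_idx] += np.random.uniform(-delta, delta, size=n_samples)
    baseline = model.predict_proba(x.reshape(1, -1))[0, 1]
    return model.predict_proba(perturbed)[:, 1] - baseline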

Why does this matter?

Let’s think about this in the context of water resources systems management and planning. Assume you are a water utility responsible for reliably delivering water to 280,000 residents on a daily basis. In addition, you are responsible for planning the next major water supply infrastructure project. Using a machine learning model to inform your short- and long-term management and planning decisions without interrogating how it arrived at its recommendations implicitly assumes that the model will make sensible decisions that balance all stakeholders’ needs while remaining equitable. More often than not, this assumption can be incorrect and may lead to (sometimes funny, but mostly unintentional and detrimental) cascading implications for the most vulnerable (for some well-narrated examples of how ML went awry, please refer to Brian Christian’s “The Alignment Problem”).

Having said that, using xAI as a next step in the general framework of adopting AI into our decision-making processes can help us better understand why a model makes its predictions, how those predictions came to be, and their degree of usefulness to a decision maker. In the rest of this post, I will demonstrate how LIME can be used to do exactly that.

The next section will establish the three components we will need to apply LIME to our problem:

  1. An input (feature) set and the training dataset
  2. The model predictions
  3. The LIME explainer

A quick example using the North Carolina Research Triangle

The input (feature) set and training dataset

The Research Triangle region in North Carolina consists of six main water utilities that deliver water to their namesake cities: OWASA (Orange County), Durham, Cary, Raleigh, Chatham, and Pittsboro (Gold et al., 2023). All utilities have collaboratively decided to measure their system robustness using a satisficing metric (Starr 1962; Herman et al. 2015), where a utility is considered robust in a given state of the world if the following criteria are met:

  1. Their reliability meets or exceeds 98%
  2. Their worst-case cost of drought mitigation actions amounts to no more than 10% of their annual volumetric revenue
  3. They do not restrict demand more than 20% of the time

If all three criteria are met, then they are considered “successful” (represented as a 1). Otherwise, they have “failed” (represented as a 0). We have 1,000 training data points of success or failure, each representing a state of the world (SOW) in which a utility fails or succeeds to meet its satisficing criteria. This is our training dataset. Each SOW consists of a unique combination of features that include inflow, evaporation and water demand Fourier series coefficients, as well as socioeconomic, policy and infrastructure construction factors. This is our feature set.
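As a rough illustration, the three criteria translate into a binary success/failure label along these lines (the column names here are hypothetical placeholders, not the headers of the actual training file):

import pandas as pd

# hypothetical per-SOW performance table for a single utility
perf = pd.DataFrame({
    'reliability':      [0.99, 0.97, 0.985],
    'worst_case_cost':  [0.05, 0.12, 0.08],   # fraction of annual volumetric revenue
    'restriction_freq': [0.10, 0.25, 0.15],   # fraction of time demand is restricted
})

# "success" (1) only if all three satisficing criteria are met
success = ((perf['reliability'] >= 0.98) &
           (perf['worst_case_cost'] <= 0.10) &
           (perf['restriction_freq'] <= 0.20)).astype(int)
print(success.tolist())   # [1, 0, 1]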

Feel free to follow along this portion of the post using the code available in this Google Colab Notebook.

The model prediction

To generate our model prediction, let’s first code up our model. In this example, we will use Extreme Gradient Boosting, otherwise known as XGBoost (xgboost), as our prediction model, alongside the LIME package. Let’s first install both of them:

pip install lime
pip install xgboost

We will also need to import all the needed libraries:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import lime
import lime.lime_tabular
import xgboost as xgb
from copy import deepcopy

Now let’s set up and run XGBoost! We first need to load our feature and training datasets:

satisficing = pd.read_csv('satisficing_all_utilites.csv')
du_factors = pd.read_csv('RDM_inputs_final_combined_headers.csv')  

# get all utility and feature names
utility_names = satisficing.columns[1:]
du_factor_names = du_factors.columns

There should be seven utility names (six, plus one that represents the entire region) and thirteen DU factors (or feature names). In this example, we will be focusing only on Pittsboro.

# select the utility 
utility = 'Pittsboro'

# convert to numpy arrays to be passed into xgboost function
du_factors_arr = du_factors.to_numpy()
satisficing_utility_arr = satisficing[utility].values

# initialize the figure object
fig, ax = plt.subplots(1, 1, figsize=(5, 5))  
perform_and_plot_xgb(utility, ax, du_factors_arr, satisficing_utility_arr, du_factor_names)

Note the perform_and_plot_xgb function being used – this function is not shown here for brevity, but you can view the full version of this function in this Google Colab Notebook.
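For orientation, here is a rough sketch of what such a function might look like. This is a simplified stand-in of my own, not the notebook’s actual implementation: it fits the full XGBoost classifier (the xgb_model used by LIME later), plus a two-feature surrogate on the demand-growth factors to shade the factor map, reusing the imports from the blocks above:

def perform_and_plot_xgb(utility, ax, features, labels, feature_names):
    """Sketch: fit XGBoost and draw an approximate two-feature factor map."""
    model = xgb.XGBClassifier(n_estimators=100, max_depth=3)
    model.fit(features, labels)

    # two-feature surrogate used only for shading the factor map
    d2 = list(feature_names).index('Demand2')
    d3 = list(feature_names).index('Demand3')
    surrogate = xgb.XGBClassifier(n_estimators=100, max_depth=3)
    surrogate.fit(features[:, [d2, d3]], labels)

    # evaluate the surrogate on a grid to shade success/failure regions
    gx = np.linspace(features[:, d2].min(), features[:, d2].max(), 100)
    gy = np.linspace(features[:, d3].min(), features[:, d3].max(), 100)
    xx, yy = np.meshgrid(gx, gy)
    zz = surrogate.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

    ax.contourf(xx, yy, zz, levels=[0.0, 0.5, 1.0], cmap='BrBG', alpha=0.5)
    ax.scatter(features[:, d2], features[:, d3], c=labels,
               cmap='Reds_r', edgecolor='grey', s=20)
    ax.set_xlabel('Demand2')
    ax.set_ylabel('Demand3')
    ax.set_title(utility)
    return model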

The figure above is called a factor map; it shows mid-term (Demand2) and long-term (Demand3) demand growth rates on its x- and y-axes respectively. The green region denotes the range of demand growth rates where XGBoost has predicted that Pittsboro will successfully meet its satisficing criteria, and the brown region is otherwise. Each point is a sample from the original training dataset, where the color (white is 1 – success, and red is 0 – failure) denotes whether Pittsboro actually meets its satisficing criteria. In this case we can see that Pittsboro quickly transitions to failure when its mid- and long-term demand growth are both higher than expected (multiplier values above 1.0 on both axes).

The LIME explainer

Before we perform LIME, let’s first select an interesting point using the figure above.

In the previous section, we can see that the XGBoost algorithm predicts that Pittsboro’s robustness is affected most by mid-term (Demand2) and long-term (Demand3) demand growth. However, there is a point (indicated using the arrow and the brown circle below) where this prediction did not match the actual data point.

To better understand why this specific point was predicted to be a “successful” SOW when the true data point was labeled a “failure” SOW, let’s take a look at how the XGBoost algorithm made its decision.

First, let’s identify the index of this point:

# select an interesting point 
interesting_point_range_du = du_factors[(du_factors['Demand2'] < 0) & (du_factors['Demand3'] < 0)].index
interesting_point_satisficing = satisficing_utility_arr[interesting_point_range_du]
interesting_point_range_sat = np.where(satisficing_utility_arr == 0)
interesting_point = np.intersect1d(interesting_point_range_du, interesting_point_range_sat)[0]

This will return an index of 704. Next, we’ll run LIME to break down how this (mis)classification was made:

# instantiate the lime explainer
explainer = lime.lime_tabular.LimeTabularExplainer(du_factors_arr, 
                                                   mode='classification', 
                                                   feature_names=du_factor_names)

# explain the interesting instance
# (xgb_model is the fitted XGBoost classifier trained earlier; see the Colab Notebook)
exp = explainer.explain_instance(du_factors_arr[interesting_point], 
                                 xgb_model.predict_proba, 
                                 num_features=len(du_factor_names))

exp.show_in_notebook(show_table=True)

The code above, if run successfully, should produce the following figure.

Here’s how to interpret it:

  • The prediction probability bars on the far left show the model’s prediction. In this case, the XGBoost model classifies this point as a “failure” SOW with 94% confidence.
  • The tornado plot in the middle shows the feature contributions – the degree to which each SOW feature influenced the decision. It also reveals why the factor map misclassified this data point as a “success” although it was a failure: the factor map was plotted using only the top two features across the entire dataset (mid- and long-term demand growth), and did not account for short-term demand growth rates (Demand1) or the permitting time required for constructing the water treatment plant (JLWTP permit), both of which influenced this particular classification.
  • The table on the far right lists the values of all the features of this specific SOW.

Using LIME has therefore enabled us to identify the cause of XGBoost’s misclassification, allowing us to understand that the model needed information on short-term demand and permitting time to make the correct prediction. From here, it is possible to further dive into the types of SOWs and their specific characteristics that would cause them to be more vulnerable to short-term demand growth and infrastructure permitting time as opposed to mid- and long-term demand growth.

Summary

Okay, so I lied a bit – it wasn’t quite so “brief” after all. Nonetheless, I hope you learned a little about explainable AI, how to use LIME, and how to interpret its outcomes. We also walked through a quick example using the good ol’ Research Triangle case study. Do check out the Google Colab Notebook if you’re interested in how this problem was coded.

With that, thank you for sticking with me – happy learning!

References

Amazon Web Services. (n.d.). What’s the Difference Between AI and Machine Learning? Machine Learning & AI. https://aws.amazon.com/compare/the-difference-between-artificial-intelligence-and-machine-learning/

Btd. (2024, January 7). Explainable AI (XAI): Model-specific interpretability methods. Medium. https://baotramduong.medium.com/explainable-ai-model-specific-interpretability-methods-02e23ebceac1

Christian, B. (2020). The alignment problem: Machine Learning and human values. Norton & Company.

Gold, D. F., Reed, P. M., Gorelick, D. E., & Characklis, G. W. (2023). Advancing Regional Water Supply Management and infrastructure investment pathways that are equitable, robust, adaptive, and cooperatively stable. Water Resources Research, 59(9). https://doi.org/10.1029/2022wr033671

Herman, J. D., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2015). How should robustness be defined for water systems planning under change? Journal of Water Resources Planning and Management, 141(10). https://doi.org/10.1061/(asce)wr.1943-5452.0000509

Holzinger, A., Saranti, A., Molnar, C., Biecek, P., & Samek, W. (2022). Explainable AI methods – A brief overview. xxAI – Beyond Explainable AI, 13–38. https://doi.org/10.1007/978-3-031-04083-2_2

IBM Data and AI Team. (2023, October 16). Understanding the different types of artificial intelligence. IBM Blog. https://www.ibm.com/blog/understanding-the-different-types-of-artificial-intelligence/

Sarker, I. H. (2022, February 10). AI-based modeling: Techniques, applications and research issues towards automation, intelligent and Smart Systems – SN Computer Science. SpringerLink. https://link.springer.com/article/10.1007/s42979-022-01043-x#Sec6

Scholbeck, C. A., Moosbauer, J., Casalicchio, G., Gupta, H., Bischl, B., & Heumann, C. (2023, December 20). Position paper: Bridging the gap between machine learning and sensitivity analysis. arXiv.org. http://arxiv.org/abs/2312.13234

Starr, M. K. (1963). Product design and decision theory. Prentice-Hall, Inc.

Exploring time-evolving vulnerability with the newly published interactive tutorial in the Addressing Uncertainty in MultiSector Dynamics Research eBook

We recently published two new Jupyter Notebook tutorials as technical appendices to our eBook on Addressing Uncertainty in MultiSector Dynamics Research developed as part of the Integrated MultiSector, Multiscale Modeling (IM3) project, supported by the Department of Energy Office of Science’s MultiSector Dynamics Program. The eBook provides an overview of diagnostic modeling, perspectives on model evaluation, and a framework for basic methods and concepts used in sensitivity analysis. The technical appendices demonstrate key concepts introduced in the text and provide example Python code to use as a starting point for future analysis. In this post, I’ll discuss the concepts introduced in one of the new Python tutorials, Time-evolving scenario discovery for infrastructure pathways. This post will focus on the notebook’s connections to topics in the main text of the eBook rather than detailing the code demonstrated in the notebook. For details on the test case and code used in the tutorial, see the tutorial I posted last year.

In this post, I’ll first give a brief overview of the example water supply planning test case used in the tutorial, then discuss the methodological steps used to explore uncertainty and vulnerability in the system. The main topics discussed in this post are the design of experiments, factor mapping, and factor prioritization.

The Bedford-Greene water supply test case

The Bedford-Greene infrastructure investment planning problem (Figure 1) is a stylized water resources test case designed to reflect the challenges faced when evaluating infrastructure systems that evolve over time and are subject to uncertain system inputs. The Bedford-Greene system contains two water utilities developing an infrastructure investment and management strategy to confront growing water demands and climate change. The test case was chosen for an eBook tutorial because it contains complex and evolving dynamics driven by strongly coupled human-natural system interactions. To explore these dynamics, the tutorial walks through an exploratory modeling experiment that evaluates how a large ensemble of uncertainties influences system performance and how these influences evolve over time.

In the Bedford-Greene system, modelers do not know or cannot agree upon the probability distributions of key system inputs, a condition referred to as “deep uncertainty” (Kwakkel et al., 2016). In the face of deep uncertainties, we perform an exploratory experiment to understand how a large ensemble of future scenarios may generate vulnerability for the water supply system.

Figure 1: The Bedford-Greene Water Resources Test Case

Setting up the computational experiment

The first step in the exploratory experiment is to define which factors in the mathematical model of the system are considered uncertain and how to sample the uncertainty space. The specific uncertain factors and their variability can be elicited through expert opinion, historical observation, values in literature, or physical meaning. In the Bedford-Greene system, we define 13 uncertain factors drawn from literature on real-world water supply systems (Gorelick et al., 2022). The uncertain factors used in this tutorial can be found in Table 1.

Factor Name | Plausible Range | Description
Near-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the first 15 years of the planning period
Mid-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the second 15 years of the planning period
Long-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the third 15 years of the planning period
Bond Term | 0.8 to 1.2 | A scaling factor on the number of years over which infrastructure capital costs are repaid as debt service
Bond Interest Rate | 0.6 to 1.2 | A scaling factor that adjusts the fixed interest rate on bonds for infrastructure
Discount Rate | 0.8 to 1.2 | The rate at which infrastructure investment costs are discounted over time
Restriction Efficacy | 0.8 to 1.2 | A scaling factor on how effective water use restrictions are at reducing demands
Infrastructure Permitting Period | 0.75 to 1.5 | A scaling factor on estimated permitting periods for infrastructure projects
Infrastructure Construction Time | 1 to 1.2 | A scaling factor on estimated construction times for infrastructure projects
Inflow Amplitude | 0.8 to 1.2 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Frequency | 0.2 to 0.5 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Phase | -pi/2 to pi/2 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Table 1: Deep Uncertainties sampled for the Bedford-Greene System

After the relevant uncertainties and their plausible ranges have been identified, the next step is to define a sampling strategy. A sampling strategy is often referred to as a design of experiments, a term that dates back to the work of Ronald Fisher in the context of laboratory or field-based experiments (Fisher, 1936). The design of experiments is a methodological choice that should be carefully considered before starting computational experiments. It should be chosen to balance the computational cost of the exploratory experiment with the amount of information needed to accurately characterize system vulnerability. An effective design of experiments allows us to explore complex relationships within the model and evaluate the interactions of system inputs. Five commonly used designs of experiments are overviewed in Chapter 3.3 of the main text of the eBook.

In the Bedford-Greene test case, we employ a Latin Hypercube Sampling (LHS) strategy, shown in Figure 2d. With this sampling technique, a 13-dimensional hypercube is generated for the 13 factors shown in Table 1, with each factor divided into an equal number of levels, to obtain 2,000 different samples of future scenarios. 2,000 samples were chosen here based on testing from similar water supply test cases (Trindade et al., 2020), but the sample size must be determined on a case-by-case basis. The LHS design guarantees sampling from every level of the uncertainty space without overlaps and will generate a diverse coverage of the entire space. When the number of samples is much greater than the number of uncertain factors, LHS effectively approximates the more computationally expensive full factorial sampling scheme shown in Figure 2a without needing to constrain samples to discrete levels for each factor, as done in fractional factorial sampling, shown in Figure 2c. For more details on each sampling scheme and general information on design of experiments, see Chapter 3.3 of the eBook.

Figure 2: Alternative designs of experiments reproduced from Figure 3.3 of the eBook main text. a) full factorial design sampling of three factors at four levels with a total of 64 samples; b) the exponential growth of a necessary number of samples when applying full factorial design at four levels; c) fractional factorial design of three factors at four levels at a total of 32 samples; d) Latin Hypercube sample of three factors with uniform distributions for a total of 32 samples.
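A minimal sketch of generating such a design with SciPy’s quasi-Monte Carlo module is shown below (the bounds arrays are abbreviated stand-ins; the full per-factor ranges are in Table 1):

import numpy as np
from scipy.stats import qmc

n_factors, n_samples = 13, 2000
sampler = qmc.LatinHypercube(d=n_factors, seed=42)
unit_sample = sampler.random(n=n_samples)   # samples in the unit hypercube [0, 1)^13

# per-factor lower/upper bounds, e.g. demand growth multipliers span [-0.25, 2.0]
l_bounds = np.array([-0.25] * 3 + [0.8] * 10)   # illustrative only -- see Table 1
u_bounds = np.array([2.0] * 3 + [1.2] * 10)     # illustrative only -- see Table 1
lhs_design = qmc.scale(unit_sample, l_bounds, u_bounds)   # shape (2000, 13)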

A final step in our experimental setup is to determine which model outputs are relevant to model users. In the Bedford-Greene test case, we specify five performance criteria along with performance thresholds that the water utilities would like their infrastructure investment policy to meet under all future conditions. The performance criteria and thresholds are shown in Table 2. These values are based on water supply literature, and relevant criteria and thresholds should be determined on a case-by-case basis.

Performance criteria | Threshold
Reliability | < 99%
Restriction Frequency | > 20%
Worst-case cost | > 10% of annual revenue
Peak financial cost | > 80% of annual revenue
Stranded assets | > $5/kgal unit cost of expansion
Table 2: Performance criteria and thresholds

Discovering consequential scenarios

To explore uncertainty in the Bedford-Greene system, we run the ensemble of uncertainties developed by our design of experiments through a water resources systems model and examine the outputs of each sampled combination of uncertainty. In this system, we’re interested in understanding 1) which uncertainties have the most impact on system vulnerability, 2) which combinations of uncertainties lead to consequential outcomes for the water supply system, and 3) how vulnerability evolves over time. Chapter 3.2 of the eBook introduces diagnostic approaches that can help us answer these questions. In this tutorial, we utilize gradient-boosted trees, a machine-learning algorithm that uses an ensemble of shallow trees to generate an accurate classifier (for more on boosting, see Bernardo’s post). Gradient-boosted trees are particularly well suited to infrastructure investment problems because they are able to capture non-linear and non-differentiable boundaries in the uncertainty space, which often occur as a result of discrete capacity expansions. Gradient-boosted trees are also resistant to overfitting, easy to interpret, and provide a simple means of ranking the importance of uncertainties. For more background on gradient-boosted trees for scenario discovery, see this post from last year.

Gradient-boosted trees provide a helpful measure of feature importance: the percentage decrease in the impurity of the ensemble of trees associated with each factor. We can use this measure to examine how each uncertainty contributes to the success or failure of the region’s infrastructure investment and management policy. Infrastructure investments fundamentally alter the water utilities’ storage-to-capacity ratios and levels of debt burden, which will impact their vulnerability over time. To account for these changes, we examine feature importance over three different time periods. The results of our exploratory modeling process are shown in Figure 3. We observe that the importance of various uncertainties evolves over time for both water utilities. For example, while near-term demand growth is a key factor for both utilities in all three time periods, restriction effectiveness is a key uncertainty for Bedford in the near- and mid-term but not in the long-term, likely indicating that infrastructure investment reduces the utility’s need to rely on water use restrictions. Greene is not sensitive to restriction effectiveness in the near-term or long-term, but is very sensitive in the mid-term. This likely indicates that the utility uses restrictions as a bridge to manage high demands before infrastructure investments have been fully constructed.

Figure 3: factor importance for the two utilities. Darker colors indicate that uncertainties have higher predictive value for discovering consequential scenarios.
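As a sketch, the per-horizon feature importance workflow looks roughly like this (with synthetic stand-in labels; the real labels come from evaluating the policy in each sampled SOW):

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
lhs_design = rng.uniform(size=(2000, 13))   # stand-in for the LHS sample
# stand-in success/failure labels for each planning horizon
labels = {h: rng.integers(0, 2, size=2000) for h in ('10yr', '22yr', '45yr')}

importance = {}
for horizon, y in labels.items():
    gbc = GradientBoostingClassifier(n_estimators=250, learning_rate=0.1, max_depth=3)
    gbc.fit(lhs_design, y)   # classify success vs. failure at this horizon
    # impurity-decrease share per factor; the importances sum to 1
    importance[horizon] = gbc.feature_importances_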

To learn more about how vulnerability for the two water utilities evolves, we use factor mapping (eBook Chapter 3.2) to delineate regions of the uncertainty space that lead to consequential model outputs. The factor maps in Figures 4 and 5 complement the factor ranking in Figure 3 by providing additional information about which combinations of uncertainties generate vulnerability for the two utilities. While near-term demand growth and restriction effectiveness appear to generate vulnerability for Bedford in the near-term, Figure 4 reveals that the vast majority of sampled future states of the world meet the performance criteria. When evaluated using a 22-year planning horizon, however, failures emerge as a consequence of high demand and low restriction effectiveness. When evaluated across a 45-year planning horizon, the utility appears extremely vulnerable to high demand, indicating that the infrastructure investment policy is likely insufficient to maintain water supply reliability.

Figure 4: Factor maps for Bedford

Greene’s factor maps tell a different story. In the near-term, the utility is vulnerable to high-demand scenarios. In the mid-term, the vulnerable regions have transformed, and two failure modes are apparent. First, the utility is vulnerable to a combination of high near-term demand and low restriction effectiveness, indicating the potential for water supply reliability failures. Second, the utility is vulnerable to low-demand scenarios, highlighting a potential financial failure from over-investment in infrastructure. When analyzed across the 45-year planning horizon, the utility is vulnerable to only low-demand futures, indicating a severe financial risk from over-investment. These factor maps provide important context to the factor priorities shown in Figure 3. While the factor prioritization does highlight the importance of demand growth for Greene, it does not indicate which ranges of uncertainty generate vulnerability. Evaluating the system across time reveals that though the utility is always sensitive to demand growth, the consequences of demand growth and the range that generates vulnerability completely transform over the planning period.

Figure 5: Factor maps for Greene

Concluding thoughts

The purpose of this post was to provide additional context to the eBook tutorial on time-evolving scenario discovery. The Bedford-Greene test case was chosen because it represents a tightly coupled human-natural system with complex and nonlinear dynamics. The infrastructure investments made by the two water utilities fundamentally alter the system’s state dynamics over time, necessitating an approach that can capture how vulnerability evolves. Through a carefully designed computational experiment and scenario discovery using gradient-boosted trees, we discover multiple failure modes for both water utilities, which can help regional decision-makers monitor policy performance and adapt to changing conditions. While each application will be different, the code in this tutorial can be used as a starting point for applying this methodology to other human-natural systems. As with all tutorials in the eBook, the Jupyter notebook ends with a section on how to apply this methodology to your problem.

References

Kwakkel, J. H., Walker, W. E., & Haasnoot, M. (2016). Coping with the wickedness of public policy problems: approaches for decision making under deep uncertainty. Journal of Water Resources Planning and Management, 142(3), 01816001.

Fisher, R.A. (1936). Design of experiments. Br Med J, 1(3923):554–554

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772.

Time-evolving scenario discovery for infrastructure pathways

Our recently published eBook, Addressing Uncertainty in MultiSector Dynamics Research, provides several interactive tutorials for hands-on training in model diagnostics and uncertainty characterization. This blog post is a preview of an upcoming extension of these trainings featuring an interactive tutorial on time-evolving scenario discovery for the development of adaptive infrastructure pathways for water supply planning. This post builds off the prior tutorial on gradient-boosted trees for scenario discovery.

I’ll first introduce a stylized water supply test case featuring two water utilities seeking to develop a cooperative infrastructure investment and management policy over a 45-year planning horizon. I’ll then demonstrate how the utilities can explore evolving vulnerability across the planning period. All code for this demo is available on GitHub. The code is written in Python, but the workflow is model-agnostic and can be paired with simulation models in any language.

Background

The Bedford-Greene metropolitan area (Figure 1) is a stylized water resources test case containing two urban water utilities seeking to develop an infrastructure investment and management strategy to confront growing demands and changing climate. The utilities have agreed to jointly finance and construct a new water treatment plant on Lake Classon, a large regional resource. Both utilities have also identified a set of individual infrastructure options to construct if necessary.

Figure 1: The Bedford-Greene metropolitan area

The utilities have formulated a cooperative and adaptive regional water supply management strategy that uses a risk-of-failure (ROF) metric to trigger both short-term drought mitigation actions (water use restrictions and treated transfers between utilities) and long-term infrastructure investment decisions (Figure 2a). ROFs represent a dynamic measure of the utilities’ evolving capacity-to-demand ratios. Both utilities have specified a set of ROF thresholds to trigger drought mitigation actions and plan to actively monitor their short-term ROF on a weekly basis. When a utility’s ROF crosses a specified threshold, the utility will implement drought mitigation actions in the following week. The utilities will also monitor long-term ROF on an annual basis, and trigger infrastructure investment if long-term risk crosses a threshold dictated by the policy. The utilities have also specified a construction order for available infrastructure options.

Figure 2: a) the regional infrastructure investment and management policy. b) adaptive pathways generated across 2,000 deeply uncertain SOWs

The utilities performed a Monte Carlo simulation to evaluate how this policy responds to a wide array of future states of the world (SOWs), each representing a different sample of uncertainties including demand growth rates, changes to streamflows, and financial variables.

The ROF-based policies respond to each SOW by generating a unique infrastructure pathway – a sequence of infrastructure investment decisions over time. Infrastructure pathways across 2,000 SOWs are shown in Figure 2b. Three clusters summarizing infrastructure pathways are plotted as green lines, which represent the median week that options are triggered. The frequency with which each option is triggered across all SOWs is plotted as the shading behind the lines. Bedford relies on the Joint Water Treatment facility and short-term measures (water use restrictions and transfers) to maintain supply reliability. Greene constructs the Fulton Creek reservoir in addition to the Joint Treatment plant and does not strongly rely on short-term measures to combat drought.

The utilities are now interested in evaluating the robustness of their proposed policy, characterizing how uncertainties generate vulnerability and understanding how this vulnerability may evolve over time.

Time-evolving robustness

To measure the robustness of the infrastructure investment and management policy, the two utilities employ a satisficing metric, which measures the fraction of SOWs where the policy is able to meet a set of performance criteria. The utilities have specified five performance criteria that measure the policy’s ability to maintain both supply reliability and financial stability. Performance criteria are shown in Table 1.

Performance criteria | Threshold
Reliability | < 99%
Restriction Frequency | > 20%
Worst-case cost | > 10% of annual revenue
Peak financial cost | > 80% of annual revenue
Stranded assets | > $5/kgal unit cost of expansion
Table 1: Satisficing criteria

Figure 3 shows the evolution of robustness over time for the two utilities. While the cooperative policy is very robust after the first ten years of the planning horizon, it degrades sharply for both utilities over time. Bedford meets the performance criteria in nearly 100% of sampled SOWs after the first 10 years of the planning horizon, but its robustness is reduced to around 30% by the end of the 45-year planning period. Greene has a robustness of over 90% after the first 10 years and degrades to roughly 60% after 45 years. These degradations suggest that the cooperative infrastructure investment and management policy is insufficient to successfully maintain long-term performance in challenging future scenarios. But what is really going on here? The robustness metric aggregates performance across the five criteria shown in Table 1, giving us a general picture of evolving performance, but leaving questions about the nature of the utilities’ vulnerability.

Figure 3: Robustness over time

Figure 4 provides some insight into how utility vulnerability evolves over time, showing the fraction of failure SOWs that can be attributed to each performance criterion when performance is measured in the near term (next 10 years), mid-term (next 22 years), and long-term (next 45 years). It reveals that the vulnerability of the two utilities evolves in very different ways over the planning period. Early in the planning period, all of Bedford’s failures can be attributed to supply reliability. As the planning horizon progresses, Bedford’s failures diversify into failures in restriction frequency and worst-case drought management cost, indicating that the utility is generally unable to manage future drought. Bedford likely needs more infrastructure investment than is specified by the policy to maintain supply reliability.

In contrast to Bedford’s performance, Greene begins with vulnerability to supply reliability, but its vulnerability shifts over time to become dominated by failures in peak financial cost and stranded assets – measures of the utility’s financial stability. This shift indicates that while the infrastructure investments specified by the cooperative policy mitigate supply failures by the end of the planning horizon, these investments drive the utility into financial failure in many future scenarios.

Figure 4: Failures over time
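In code, both the robustness in Figure 3 and the per-criterion failure attribution in Figure 4 reduce to simple fractions over boolean pass/fail arrays. A sketch with synthetic stand-in data:

import numpy as np

rng = np.random.default_rng(1)
criteria = ['reliability', 'restriction_freq', 'worst_case_cost',
            'peak_financial_cost', 'stranded_assets']
# stand-in: True where a SOW passes each criterion at one planning horizon
passes = rng.random((2000, len(criteria))) < 0.9

meets_all = passes.all(axis=1)
robustness = meets_all.mean()   # fraction of SOWs meeting all criteria (Figure 3)

# among failing SOWs, the fraction violating each criterion (Figure 4-style)
violations = ~passes[~meets_all]
attribution = dict(zip(criteria, violations.mean(axis=0)))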

Factor mapping and factor ranking

To understand how and why the vulnerability evolves over time, we perform factor mapping. Figure 5 below shows the uncertainty space projected onto the two most influential factors for Bedford across three planning horizons. Each point represents a sampled SOW; red points represent SOWs that resulted in failure, while white points represent SOWs that resulted in success. The color in the background shows the predicted regions of success and failure from the boosted trees classification.

Figure 5 indicates that Bedford’s vulnerability is primarily driven by rapid and sustained demand growth, and that this vulnerability increases over time. When evaluated using a 22-year planning horizon, the utility only appears vulnerable to extreme values of near-term demand growth combined with low values of restriction effectiveness. This indicates that the utility is relying on restrictions to mitigate supply failures, and is vulnerable when they do not work as anticipated. When evaluated over the 45-year planning horizon, Bedford’s failure is driven by rapid and sustained demand growth. If near-term demand grows faster than anticipated (scaling factor > 1.0 on the horizontal axis), the utility will likely fail to meet its performance criteria. If near-term demand is lower than anticipated, the utility may still fail to meet performance criteria under conditions of high mid-term demand growth. These results provide further evidence that the infrastructure investment and management policy is insufficient to meet Bedford’s long-term water supply needs.

Figure 5: Time-evolving factor maps for Bedford

Greene’s vulnerability (Figure 6) evolves very differently from Bedford’s. Greene is vulnerable to high-demand scenarios in the near term, indicating that its current infrastructure is insufficient to meet rapidly growing demands. Greene can avoid this failure under scenarios where the construction permitting time multiplier is the lowest, indicating that new infrastructure investment can meet the utility’s near-term supply needs. When evaluated across a 22-year planning horizon, the utility fails when near-term demand is high and restriction effectiveness is low, a similar failure mode to Bedford. However, the 22-year planning horizon reveals a second failure mode – low demand growth. This failure mode is responsible for the stranded assets failures shown in Figure 4. This failure mode increases when evaluated across the 45-year planning horizon, and is largely driven by low-demand futures in which the utility does not generate the revenue to cover the debt service payments needed to fund infrastructure investment.

Figure 6: Time-evolving factor maps for Greene

The factor maps in Figures 5 and 6 only show the two most influential factors determined by gradient boosted trees; however, the utilities are also vulnerable to other sampled uncertainties. Figure 7 shows the factor importance as determined by gradient boosted trees for both utilities across the three planning horizons. While near-term demand growth is important for both utilities under all three planning horizons, the importance of other factors evolves over time. For example, restriction effectiveness plays an important role for Greene under the 22-year planning horizon but disappears under the 45-year planning horizon. In contrast, the bond interest rate is important for predicting success over the 45-year planning horizon, but does not appear important over the 10- or 22-year planning horizons. These findings highlight how assumptions about the planning period can have a large impact on modeling outcomes.

Figure 7: Factor rankings over time

MORDM IX: Discovering scenarios of consequence

In the previous blog post, we performed a simple sensitivity analysis using the Delta Moment-Independent Global Sensitivity Analysis method to identify the decision variables that will most likely cause changes in performance should their recommended values change. In this post, we will end the MORDM series by performing scenario discovery with Gradient Boosted Trees (Boosted Trees) to identify combinations of socioeconomic and/or hydroclimatic scenarios that result in the utilities being unable to meet their satisficing criteria.

Revisiting the concept of the satisficing criteria

The satisficing criteria method is one of three approaches for defining robust strategies as outlined by Lempert and Collins (2007). It defines a robust strategy as a portfolio or set of actions that, according to a set of predetermined criteria outlined by the decision-maker(s), performs reasonably well across a wide range of possible scenarios. It does not have to be optimal (Simon, 1959). In the context of the Research Triangle, this approach is quantified using the domain criterion method (Starr, 1962), where robustness is calculated as the fraction of states of the world (SOWs) in which a portfolio meets or exceeds the criteria (Herman et al., 2014). This approach was selected for two reasons: the nature of the visualization used to communicate alternatives and uncertainty for the test case, and the elimination of the need to know the probability distributions of the uncertain SOWs.
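Written out, the domain criterion computes the robustness R of a portfolio theta over the N sampled SOWs as (one standard way of expressing it):

R(\theta) = \frac{1}{N} \sum_{j=1}^{N} \Lambda_{\theta,j},
\qquad
\Lambda_{\theta,j} =
\begin{cases}
1 & \text{if portfolio } \theta \text{ meets all satisficing criteria in SOW } j \\
0 & \text{otherwise}
\end{cases}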

The remaining two approaches are the regret-based approach (choosing a solution that minimizes performance degradation relative to degree of uncertainty in SOWs) and the open options approach (choosing a portfolio that keeps as many options open as possible).

For the Research Triangle, the satisficing criteria are as follows:

  1. Supply reliability (Reliability) should be at least 98% in all SOWs.
  2. The frequency of water-use restrictions (Restriction frequency) should be no more than 10% across all SOWs.
  3. The worst-case cost of drought mitigation actions (Worst-case cost) should be no more than 10% of annual volumetric revenue across all SOWs.

This brings us to the following question: under what conditions do the portfolios fail to meet these satisficing criteria?

Gradient Boosted Trees for scenario discovery

One drawback of using ROF metrics to define the portfolio action rules within the system is the introduction of SOW combinations that cause failure (failure regions) that are non-linear and non-convex (Trindade et al, 2019). It thus becomes necessary to use a model-free, unbiased approach that can classify non-linear success-failure regions while remaining easy to interpret.

Cue Gradient Boosted Trees, or Boosted Trees. This is a tree-based machine-learning method that uses “rough and moderately inaccurate rules of thumb” to generate “a single, highly accurate prediction rule”. This aggregation of rough rules of thumb is referred to as a ‘weak learning model’ (Freund and Schapire, 1999), and the final prediction rule is the ‘strong learning model’ that results from the boosting process. The transformation from a weak to a strong model is conducted by creating weak classifier algorithms that are only slightly better than random guessing and forcing them to continue classifying the scenarios that they previously classified incorrectly. These algorithms are iteratively updated to improve their ability to classify regions of success or failure, turning them into strong learning models.

In this post, we will implement Boosted Trees using the GradientBoostingClassifier function from the scikit-learn Python library. Before beginning, we should first install scikit-learn. In the command line, type in the following:

pip install scikit-learn

Done? Great – let’s get to the code!

# -*- coding: utf-8 -*-
"""
Created on Sun June 19 18:29:04 2022

@author: Lillian Lau
"""

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from copy import deepcopy
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style='white')

'''
FUNCTION to check whether performance criteria are met
'''
def check_satisficing(objs, objs_col, satisficing_bounds):
    meet_low = objs[:, objs_col] >= satisficing_bounds[0]
    meet_high = objs[:, objs_col] <= satisficing_bounds[1]

    meets_criteria = np.hstack((meet_low, meet_high)).all(axis=1)

    return meets_criteria

Here, we import all required libraries and define a function, check_satisficing, that evaluates the performance of a given objective to see if it meets the criteria set above.

'''
Name all file headers and compSol to be analyzed
'''
obj_names = ['REL_C', 'RF_C', 'INF_NPC_C', 'PFC_C', 'WCC_C', \
        'REL_D', 'RF_D', 'INF_NPC_D', 'PFC_D', 'WCC_D', \
        'REL_R', 'RF_R', 'INF_NPC_R', 'PFC_R', 'WCC_R', \
        'REL_reg', 'RF_reg', 'INF_NPC_reg', 'PFC_reg', 'WCC_reg']

'''
Keys:
    CRE: Cary restriction efficiency
    DRE: Durham restriction efficiency
    RRE: Raleigh restriction efficiency
    DMP: Demand multiplier
    BTM: Bond term multiplier
    BIM: Bond interest rate multiplier
    IIM: Infrastructure interest rate multiplier
    EMP: Evaporation rate multiplier
    STM: Streamflow amplitude multiplier
    SFM: Streamflow frequency multiplier
    SPM: Streamflow phase multiplier
'''

rdm_headers_dmp = ['CRE', 'DRE', 'RRE']
rdm_headers_utilities = ['DMP', 'BTM', 'BIM', 'IIM']
rdm_headers_inflows = ['STM', 'SFM', 'SPM']
rdm_headers_ws = ['EMP', 'CRR PTD', 'CRR CTD', 'LM PTD', 'LM CTD', 'AL PTD',
                  'AL CTD', 'D PTD', 'D CTD', 'NRR PTDD', 'NRR CTD', 'SCR PTD',
                  'SCT CTD', 'GC PTD', 'GC CTD', 'CRR_L PT', 'CRR_L CT',
                  'CRR_H PT', 'CRR_H CT', 'WR1 PT', 'WR1 CT', 'WR2 PT',
                  'WR2 CT', 'DR PT', 'DR CT', 'FR PT', 'FR CT']

rdm_headers_ws_drop = ['CRR PTD', 'CRR CTD', 'LM PTD', 'LM CTD', 'AL PTD',
                       'AL CTD', 'D PTD', 'D CTD', 'NRR PTDD', 'NRR CTD',
                       'SCR PTD', 'SCT CTD', 'GC PTD', 'GC CTD']

rdm_all_headers = ['CRE', 'DRE', 'RRE', 'DMP', 'BTM', 'BIM', 'IIM',
                   'STM', 'SFM', 'SPM', 'EMP', 'CRR_L PT', 'CRR_L CT',
                   'CRR_H PT', 'CRR_H CT', 'WR1 PT', 'WR1 CT', 'WR2 PT',
                   'WR2 CT', 'DR PT', 'DR CT', 'FR PT', 'FR CT']

all_headers = rdm_all_headers

utilities = ['Cary', 'Durham', 'Raleigh', 'Regional']

N_RDMS = 100
N_SOLNS = 69

'''
1 - Load DU factor files
'''
rdm_factors_directory = 'YourFilePath/WaterPaths/TestFiles/'
rdm_dmp_filename = rdm_factors_directory + 'rdm_dmp_test_problem_reeval.csv'
rdm_utilities_filename = rdm_factors_directory + 'rdm_utilities_test_problem_reeval.csv'
rdm_inflows_filename = rdm_factors_directory + 'rdm_inflows_test_problem_reeval.csv'
rdm_watersources_filename = rdm_factors_directory + 'rdm_water_sources_test_problem_reeval.csv'

rdm_dmp = pd.read_csv(rdm_dmp_filename, sep=",", names=rdm_headers_dmp)
rdm_utilities = pd.read_csv(rdm_utilities_filename, sep=",", names=rdm_headers_utilities)
rdm_inflows = pd.read_csv(rdm_inflows_filename, sep=",", names=rdm_headers_inflows)
rdm_ws_full = np.loadtxt(rdm_watersources_filename, delimiter=",")
rdm_ws = pd.DataFrame(rdm_ws_full[:, :len(rdm_headers_ws)], columns=rdm_headers_ws)

rdm_ws = rdm_ws.drop(rdm_headers_ws_drop, axis=1)

dufs = pd.concat([rdm_dmp, rdm_utilities, rdm_inflows, rdm_ws], axis=1, ignore_index=True)
dufs.columns = rdm_all_headers
dufs_np = dufs.to_numpy()

duf_numpy = dufs_np[:100, :]

all_params = duf_numpy

Next, we define shorthand names for the performance objectives and the DU SOWs (here referred to as ‘robust decision-making parameters’, or RDMs). One SOW is defined by a vector containing one sampled value from each of the deeply uncertain demand, policy implementation and hydroclimatic factors. But how are these SOWs generated?

The RDM files shown in the code block above actually list multipliers for the baseline input parameters. These multipliers scale the inputs up or down, widening the envelope of uncertainty and enabling the generation of more challenging scenarios, as shown in the figure above. Next, remember how we used 1,000 different hydroclimatic realizations to perform optimization with Borg? We will use the same 1,000 realizations and pair each of them with each set of these newly generated, more extreme scenarios, as visualized in the image below. This will result in a total of 1,000 x 100 = 100,000 different SOWs.
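A sketch of how this scaling and pairing might be organized in NumPy (the array names and shapes here are illustrative, not from the actual WaterPaths workflow):

import numpy as np

rng = np.random.default_rng(2)
baseline_demand = rng.random((1000, 52))   # stand-in: 1,000 realizations x 52 weeks
dmp = rng.uniform(0.5, 2.0, size=100)      # stand-in for the 100 sampled demand multipliers

# each of the 100 RDM multiplier sets scales every one of the 1,000 realizations,
# yielding 1,000 x 100 = 100,000 SOWs in total
scaled_demand = baseline_demand[None, :, :] * dmp[:, None, None]
print(scaled_demand.shape)   # (100, 1000, 52)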

Each portfolio discovered during the optimization step is then evaluated over these SOWs (shown above) to identify if they meet or violate the satisficing criteria. The procedure to assess the ability of a portfolio to meet the criteria is performed below:

'''
Load objectives and robustness files
'''
out_directory = '/scratch/lbl59/blog/WaterPaths/post_processing/'

# objective values across all RDMs
objs_filename = out_directory + 'meanObjs_acrossSoln.csv'
objs_arr = np.loadtxt(objs_filename, delimiter=",")
objs_df = pd.DataFrame(objs_arr[:100, :], columns=obj_names)

'''
Determine the type of factor maps to plot.
'''
factor_names = all_headers
'''
Extract each utility's set of performance objectives and robustness
'''
Cary_all = objs_df[['REL_C', 'RF_C', 'INF_NPC_C', 'PFC_C', 'WCC_C']].to_numpy()
Durham_all = objs_df[['REL_D', 'RF_D', 'INF_NPC_D', 'PFC_D', 'WCC_D']].to_numpy()
Raleigh_all = objs_df[['REL_R', 'RF_R', 'INF_NPC_R', 'PFC_R', 'WCC_R']].to_numpy()
Regional_all = objs_df[['REL_reg', 'RF_reg', 'INF_NPC_reg', 'PFC_reg', 'WCC_reg']].to_numpy()

# Cary satisficing criteria
rel_C = check_satisficing(Cary_all, [0], [0.98, 1.0])
rf_C = check_satisficing(Cary_all, [1], [0.0, 0.1])
wcc_C = check_satisficing(Cary_all, [4], [0.0, 0.1])
satisficing_C = rel_C*rf_C*wcc_C
print('satisficing_C: ', satisficing_C.sum())

# Durham satisficing criteria
rel_D = check_satisficing(Durham_all, [0], [0.98, 1.0])
rf_D = check_satisficing(Durham_all, [1], [0.0, 0.1])
wcc_D = check_satisficing(Durham_all, [4], [0.0, 0.1])
satisficing_D = rel_D*rf_D*wcc_D
print('satisficing_D: ', satisficing_D.sum())

# Raleigh satisficing criteria
rel_R = check_satisficing(Raleigh_all, [0], [0.98,1.0])
rf_R = check_satisficing(Raleigh_all, [1], [0.0,0.1])
wcc_R = check_satisficing(Raleigh_all, [4], [0.0,0.1])
satisficing_R = rel_R*rf_R*wcc_R
print('satisficing_R: ', satisficing_R.sum())

# Regional satisficing criteria
rel_reg = check_satisficing(Regional_all, [0], [0.98, 1.0])
rf_reg = check_satisficing(Regional_all, [1], [0.0, 0.1])
wcc_reg = check_satisficing(Regional_all, [4], [0.0, 0.1])
satisficing_reg = rel_reg*rf_reg*wcc_reg
print('satisficing_reg: ', satisficing_reg.sum())

satisficing_dict = {'satisficing_C': satisficing_C, 'satisficing_D': satisficing_D,
                    'satisficing_R': satisficing_R, 'satisficing_reg': satisficing_reg}

utils = ['satisficing_C', 'satisficing_D', 'satisficing_R', 'satisficing_reg']

Following this, we apply the GradientBoostingClassifier to extract the parameters that most influence the ability of a portfolio to meet the satisficing criteria:

'''
Fit a boosted tree classifier and extract important features
'''
for j in range(len(utils)):
    gbc = GradientBoostingClassifier(n_estimators=200,
                                     learning_rate=0.1,
                                     max_depth=3)

    # fit the classifier to each utility's sd
    # change depending on utility being analyzed!!
    criteria_analyzed = utils[j]
    df_to_fit = satisficing_dict[criteria_analyzed]
    gbc.fit(all_params, df_to_fit)

    feature_ranking = deepcopy(gbc.feature_importances_)
    feature_ranking_sorted_idx = np.argsort(feature_ranking)
    feature_names_sorted = [factor_names[i] for i in feature_ranking_sorted_idx]

    feature1_idx = len(feature_names_sorted) - 1
    feature2_idx = len(feature_names_sorted) - 2

    feature_figpath = 'YourFilePath/WaterPaths/post_processing/BT_Figures/'
    print(feature_ranking_sorted_idx)

    feature_figname = feature_figpath + 'BT_' + criteria_analyzed + '.jpg'

    fig = plt.figure(figsize=(8, 8))
    ax = fig.gca()
    ax.barh(np.arange(len(feature_ranking)), feature_ranking[feature_ranking_sorted_idx])
    ax.set_yticks(np.arange(len(feature_ranking)))
    ax.set_yticklabels(feature_names_sorted)
    ax.set_xlim([0, 1])
    ax.set_xlabel('Feature ranking')
    plt.tight_layout()
    plt.savefig(feature_figname)

    '''
    7 - Create factor maps
    '''
    # select top 2 factors
    fm_figpath = 'YourFilePath/WaterPaths/post_processing/BT_Figures/'
    fm_figname = fm_figpath + 'factor_map_' + criteria_analyzed + '.jpg'

    selected_factors = all_params[:, [feature_ranking_sorted_idx[feature1_idx],
                                     feature_ranking_sorted_idx[feature2_idx]]]
    gbc_2_factors = GradientBoostingClassifier(n_estimators=200,
                                               learning_rate=0.1,
                                               max_depth=3)

    # change this one depending on utility and compSol being analyzed
    gbc_2_factors.fit(selected_factors, df_to_fit)

    x_data = selected_factors[:, 0]
    y_data = selected_factors[:, 1]

    x_min, x_max = (x_data.min(), x_data.max())
    y_min, y_max = (y_data.min(), y_data.max())

    xx, yy = np.meshgrid(np.arange(x_min, x_max*1.001, (x_max-x_min)/100),
                         np.arange(y_min, y_max*1.001, (y_max-y_min)/100))

    dummy_points = list(zip(xx.ravel(), yy.ravel()))

    z = gbc_2_factors.predict_proba(dummy_points)[:,1]
    z[z<0] = 0
    z = z.reshape(xx.shape)

    fig_factormap = plt.figure(figsize = (10,8))
    ax_f = fig_factormap.gca()
    ax_f.contourf(xx, yy, z, [0, 0.5, 1], cmap='RdBu', alpha=0.6, vmin=0, vmax=1)

    ax_f.scatter(selected_factors[:,0], selected_factors[:,1],
                 c=df_to_fit, cmap='Reds_r', edgecolor='grey',
                 alpha=0.6, s=100, linewidths=0.5)


    ax_f.set_xlabel(factor_names[feature_ranking_sorted_idx[feature1_idx]], size=16)
    ax_f.set_ylabel(factor_names[feature_ranking_sorted_idx[feature2_idx]], size=16)
    ax_f.legend(loc='upper center', bbox_to_anchor=(0.5, -0.08),
                fancybox=True, shadow=True, ncol=3, markerscale=0.5)

    plt.savefig(fm_figname)

The code will result in the following four factor maps:

These factor maps all agree on one thing: the demand growth rate (evaluated via the demand growth multiplier, DMP) is the DU factor that the system is most vulnerable to. This means two things:

  1. Each utility’s success is contingent upon different degrees of demand growth. Cary and Durham, being the smaller utilities with fewer resources, are the most susceptible to changes in demand beyond baseline projections. Raleigh can accommodate a 20% higher demand growth rate than initially planned for, likely due to a larger reservoir capacity.
  2. The utilities should be wary of their demand growth projections. Planning and operating their water supply systems solely based on the baseline projection of demand growth could be catastrophic for both individual utilities and the region as a whole, because a small deviation from projected demand, or a failure to account for uncertainty in demand growth, could result in a utility permanently failing to meet its satisficing criteria.

MORDM Series Summary

We have come very far since the beginning of our MORDM tutorial using the Research Triangle test case. Here’s a brief recap:

Designing and generating states of the world

  1. We first generated a series of synthetic streamflows using the Kirsch Method, enabling the simulation of more severe, deeply uncertain hydroclimatic events within the region.
  2. Next, we generated a set of risk-of-failure (ROF) tables to approximate the risk that a utility’s ability to meet weekly demand would be compromised and visualized the effects of different ROF thresholds on system storage dynamics.

Searching for alternatives

  1. Using the resulting ROF tables, we performed simulation-optimization to search for a Pareto-approximate set of candidate actions for each of the Triangle’s utilities, using WaterPaths and the Borg MOEA.
  2. The individual and regional performance of these Pareto-approximate actions was then visualized and explored using parallel axis plots.

Defining robustness measures and identifying controls

  1. We defined robustness using the satisficing criteria and calculated the robustness of these sets of actions (portfolios) against the effects of challenging states of the world (SOWs) to discover how they fail.
  2. We then performed a simple sensitivity analysis across the average performance of all sixty-nine portfolios of actions, and identified that each utility’s use of water use restrictions and investment in new supply infrastructure, coupled with uncertain demand growth rates, most strongly affect performance.
  3. Finally, we used Boosted Trees to discover consequential states of the world that most affect a portfolio’s ability to succeed in meeting all criteria, or to fail catastrophically.

This brings us to the end of our MORDM series! As usual, all files used in this series can be found in the Git Repository here. Hope you found this useful, and happy coding!

References

Freund, Y., Schapire, R., & Abe, N. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771-780.

Herman, J., Zeff, H., Reed, P., & Characklis, G. (2014). Beyond optimality: Multistakeholder robustness tradeoffs for regional water portfolio planning under deep uncertainty. Water Resources Research, 50(10), 7692-7713. doi: 10.1002/2014wr015338

Lempert, R., & Collins, M. (2007). Managing the Risk of Uncertain Threshold Responses: Comparison of Robust, Optimum, and Precautionary Approaches. Risk Analysis, 27(4), 1009-1026. doi: 10.1111/j.1539-6924.2007.00940.x

Simon, H. A. (1959). Theories of Decision-Making in Economics and Behavioral Science. The American Economic Review, 49(3), 253–283. http://www.jstor.org/stable/1809901

Starr, M. (1963). Product design and decision theory. Journal Of The Franklin Institute, 276(1), 79. doi: 10.1016/0016-0032(63)90315-6

Trindade, B., Reed, P., & Characklis, G. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances In Water Resources, 134, 103442. doi: 10.1016/j.advwatres.2019.103442

A step-by-step tutorial for scenario discovery with gradient boosted trees

Our recently published eBook, Addressing Uncertainty in Multisector Dynamics Research, provides several interactive tutorials for hands-on training in model diagnostics and uncertainty characterization. The purpose of this post is to expand upon these trainings by providing a tutorial demonstrating gradient boosted trees for scenario discovery. I’ll first provide some brief background on scenario discovery and gradient boosted trees, then demonstrate a Python implementation on a water supply planning problem. All code here is written in Python, but the workflow is model agnostic and can be paired with simulation models in any language. I’ve included my code within the text below, but all code and data for this post can also be found in this git repository.

Scenario discovery with gradient boosted trees

In water resources planning and management, decision makers are often faced with uncertainty about how their system will change in the future. Traditionally, planners have confronted this uncertainty by developing prespecified narrative scenarios, which reduce the multitude of possible future conditions into a small subset of important future states of the world (a prominent example is the ‘scenario matrix framework’ used to evaluate climate change (O’Neill et al., 2014)). While this approach has intuitive appeal, it may increase system vulnerability if future conditions do not evolve as decision makers expect (for a detailed critique of scenario based planning see Reed et al., 2022). This vulnerability is especially apparent for systems facing deep uncertainty, where decision makers do not know or cannot agree upon the probability density functions of key system inputs (Kwakkel et al., 2016).

Scenario discovery (Groves and Lempert, 2007) is an exploratory modeling centered approach that seeks to discover consequential future scenarios using computational experiments rather than relying on prespecified information. To perform scenario discovery, decision makers first identify a set of relevant uncertainties and their plausible ranges. Next, an ensemble of these uncertainties is developed by sampling across parameter ranges. Candidate policies are then evaluated across this ensemble and machine learning or data mining algorithms are used to examine which combinations of uncertainties generate vulnerability in the system. These combinations can then be used to develop narrative scenarios to inform implementation and monitoring efforts or new policy development.

A core element of the scenario discovery process is the algorithm used to classify future states of the world. Popular algorithms include PRIM, CART, and logistic regression. Recently, gradient boosted trees have been applied as an alternative classification algorithm. Gradient boosted trees have advantages over other scenario discovery algorithms because they can easily capture nonlinear and non-differentiable boundaries in the uncertainty space (which are particularly prevalent in water supply planning problems that have discrete capacity expansion options), are highly resistant to overfitting and provide a clear means of ranking the importance of uncertain factors (Trindade et al., 2020). For a comprehensive overview of gradient boosted trees, see Bernardo’s post here.

Test case: the Sedento Valley

To demonstrate gradient boosted trees for scenario discovery we’ll use the Sedento Valley water supply planning test case (Trindade et al., 2020). In the Sedento Valley, three water utilities seek to discover cooperative water supply management and infrastructure investment portfolios to meet several conflicting objectives in a system facing deep uncertainty. In this post, we’ll investigate how these deep uncertainties (which include demand growth, the efficacy of water use restrictions, financial variables and parameters governing infrastructure permitting and construction time) impact a utility’s ability to maintain three performance criteria: keeping reliability > 98%, restriction frequency < 10% and worst-case cost less than 10% of annual revenue. For simplicity, we’ll focus on one regional water utility named Watertown.

Step 1: create a sample of deeply uncertain states of the world

To start the scenario discovery process, we generate an ensemble of deep uncertainties that represent future states of the world (SOWs). Here, we’ll use Latin Hypercube Sampling with an implementation I found in the Surrogate Modeling Toolbox.

import numpy as np
from smt.sampling_methods import LHS

'''
This script will generate 1000 Latin Hypercube Samples (LHS)
of deeply uncertain system parameters for the Sedento Valley
'''


# create an array storing the ranges of deeply uncertain parameters
DU_factor_limits = np.array([
    [0.9, 1.1], # Watertown restriction efficacy 
    [0.9, 1.1], # Dryville restriction efficacy
    [0.9, 1.1], # Fallsland restriction efficacy
    [0.5, 2.0], # Demand growth rate multiplier
    [1.0, 1.2], # Bond term
    [0.6, 1.0], # Bond interest rate
    [0.6, 1.4], # Discount rate
    [0.75, 1.5], # New River Reservoir permitting time
    [1.0, 1.2], # New River Reservoir construction time
    [0.75, 1.5], # College Rock Reservoir (low) permitting time
    [1.0, 1.2], # College Rock Reservoir (low) construction time
    [0.75, 1.5], # College Rock Reservoir (high) permitting time
    [1.0, 1.2], # College Rock Reservoir (high) construction time
    [0.75, 1.5], # Water Reuse permitting time
    [1.0, 1.2], # Water Reuse construction time
    [0.8, 1.2], # Inflow amplitude
    [0.2, 0.5], # Inflow frequency
    [-1.57, 1.57]]) # Inflow phase

# Use the smt package to set up the LHS sampling
sampling = LHS(xlimits=DU_factor_limits)

# We will create 1000 samples
num = 1000

# Create the actual sample
x = sampling(num)

# save to a csv file
np.savetxt('DU_factors.csv', x, delimiter=',')
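If you’d rather not add smt as a dependency, SciPy ships an equivalent sampler in scipy.stats.qmc. Here’s a minimal sketch under that assumption (it produces a statistically equivalent, but not identical, sample):

from scipy.stats import qmc

# sample the 18-dimensional unit hypercube, then rescale to the DU factor ranges
sampler = qmc.LatinHypercube(d=DU_factor_limits.shape[0])
unit_sample = sampler.random(n=1000)
x_alt = qmc.scale(unit_sample, DU_factor_limits[:, 0], DU_factor_limits[:, 1])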

Step 2: Evaluate performance across SOWs

Next, we’ll evaluate the performance of our policy across the LHS sample of DU factors. For the Sedento Valley test case, we use WaterPaths, an open-source simulation system for integrated water supply portfolio management and infrastructure investment planning (for more see Trindade et al., 2020). This step is not included in the git repository as it requires high-performance computing for this system, but results can be found in the “Model_output.csv” file. For simulation details, see Gold et al., 2022.
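While the WaterPaths simulations themselves aren’t reproduced here, it’s worth confirming the results file has the expected shape before moving on. A small sketch (the column ordering is an assumption, consistent with the criteria checks in Step 3):

import numpy as np

# one row per SOW; columns 0-2 are assumed to hold reliability,
# restriction frequency, and worst-case cost, per the checks in Step 3
objectives = np.loadtxt('Model_output.csv', skiprows=1, delimiter=',')
print(objectives.shape)   # expect 1000 rows, one per sampled SOW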

Step 3: Convert model output into a boolean array for classification

To perform classification, we need to convert the results of our simulations to a binary array classifying each SOW as a “success” or “failure” based on whether the policy met the performance criteria under the SOW. First, we define a small function to determine if an SOW meets a set of criteria, then we apply this function to our results. We also load the DU factor LHS sample.

import numpy as np

# First, define a function to check whether performance criteria are met
def check_criteria(objectives, crit_objs, crit_vals):
    """
    Determines if an objective meets a given set of criteria for a set of SOWs

    inputs:
        objectives: np array of all objectives across a set of SOWs
        crit_objs: the column index of the objective in question
        crit_vals: an array containing [min, max] of the values 
    
    returns:
        meets_criteria: a boolean numpy array flagging the SOWs that meet both min and max criteria

    """
    
    # check max and min criteria for each objective
    meet_low = objectives[:, crit_objs] >= crit_vals[0]
    meet_high = objectives[:, crit_objs] <= crit_vals[1]

    # check if max and min criteria are met at the same time
    meets_criteria = np.hstack((meet_low, meet_high)).all(axis=1)

    return meets_criteria


##### Load data and pre-process #####

# load objectives and create input array of boolean values for SD input
Reeval_objectives = np.loadtxt('Model_output.csv', skiprows=1, delimiter=',')
REL = check_criteria(Reeval_objectives, [0], [.979, 1])
RF = check_criteria(Reeval_objectives, [1], [0, 0.10])
WCC = check_criteria(Reeval_objectives, [2], [0, 0.10])
# a SOW is a "success" only if it meets all three criteria simultaneously
SD_input = np.vstack((REL, RF, WCC)).all(axis=0)


# load DU factors
# np.savetxt wrote DU_factors.csv without a header row, so don't skip any lines
DU_factors = np.loadtxt('DU_factors.csv', delimiter=',')
DU_names = ['Watertown Rest. Eff.', 'Dryville Rest. Eff.', 'Fallsland Rest. Eff.',
            'Demand Growth Rate', 'Bond Term', 'Bond Interest',
            'Discount Rate', 'NRR Perm.', 'NRR Const.', 'CRR L Perm.',
            'CRR L Const.', 'CRR H Perm.', 'CRR H Const.', 'WR1 Perm.',
            'WR1 Const.', 'Inflows A', 'Inflows m', 'Inflows p']
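Before fitting a classifier, it’s worth checking how balanced the two classes are; a heavily skewed split can make classification results misleading. A quick sanity check (the printed counts will depend on your simulation results):

# quick sanity check: how many SOWs are classified as successes?
print(f'{SD_input.sum()} of {len(SD_input)} SOWs meet all three criteria')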

Step 4: Fit a boosted trees classifier

After we’ve formatted the data, we’re ready to perform boosted trees classification. There are several packages for boosted trees in Python; here we’ll use the implementation from scikit-learn. We’ll use an ensemble of 200 trees with depth 3 and a learning rate of 0.1. These parameters need to be tuned for the individual problem; I found this nice post that goes into detail on parameter tuning, and a small tuning sketch follows the code below.

##### Boosted Tree Classification #####

from sklearn.ensemble import GradientBoostingClassifier

# create a gradient boosted classifier object
gbc = GradientBoostingClassifier(n_estimators=200,
                                 learning_rate=0.1,
                                 max_depth=3)

# fit the classifier
gbc.fit(DU_factors, SD_input)
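If you’d like to tune these hyperparameters rather than hand-pick them, a cross-validated grid search is one straightforward option. A minimal sketch (the grid values below are illustrative assumptions, not values from the original analysis):

from sklearn.model_selection import GridSearchCV

# illustrative search grid -- tune the ranges for your own problem
param_grid = {'n_estimators': [100, 200, 500],
              'learning_rate': [0.01, 0.1],
              'max_depth': [2, 3, 4]}

search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=5)
search.fit(DU_factors, SD_input)
print(search.best_params_)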

Step 5: Examine which DU factors have the most impact on performance criteria

Now we’re ready to examine the results of our classification. First, we’ll examine how important each DU factor is to the classification results generated by boosted trees. To rank the importance of each DU factor, we examine the percentage decrease in impurity of the ensemble of trees that is associated with each factor. In plain English, this is a measure of how helpful each DU factor is to correctly classifying SOWs. This information is generated during the fit of the classifier above and is easily accessible as an attribute of our scikit-learn classifier.

For our example, one deep uncertainty, the demand growth rate, clearly stands out as the most influential, as shown in the figure below. A second, the restriction efficacy for Watertown (the utility we’re focusing on), also stands out with notably higher importance than the remaining factors. All other DU factors have little impact on the classification in this case.

##### Factor Ranking #####

from copy import deepcopy
import matplotlib.pyplot as plt

# Extract the feature importances
feature_importances = deepcopy(gbc.feature_importances_)

# rank the feature importances and plot
importances_sorted_idx = np.argsort(feature_importances)
sorted_names = [DU_names[i] for i in importances_sorted_idx]

fig = plt.figure(figsize=(8,8))
ax = fig.gca()
ax.barh(np.arange(len(feature_importances)), feature_importances[importances_sorted_idx])
ax.set_yticks(np.arange(len(feature_importances)))
ax.set_yticklabels(sorted_names)
ax.set_xlim([0,1])
ax.set_xlabel('Feature Importance')
plt.tight_layout()
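Note that impurity-based importances can be biased toward correlated or high-cardinality features. scikit-learn’s permutation importance offers a quick cross-check; a minimal sketch, reusing the classifier fit above:

from sklearn.inspection import permutation_importance

# shuffle each DU factor in turn and measure the resulting drop in accuracy
perm = permutation_importance(gbc, DU_factors, SD_input,
                              n_repeats=10, random_state=0)
for i in perm.importances_mean.argsort()[::-1]:
    print(DU_names[i], round(perm.importances_mean[i], 3))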

Step 6: Create factor maps

Finally, we visualize the results of our classification through factor mapping. In the plot below, we show the uncertainty space projected onto the two most influential factors, demand growth rate and restriction efficacy. Each point represents a sampled SOW; red points represent SOWs that resulted in failure, while white points represent SOWs that resulted in success. The color in the background shows the predicted regions of success and failure from the boosted trees classification.

Here we observe that high levels of demand growth are the primary source of vulnerability for the water utility. When restriction efficacy is lower than our estimate (multiplier < 1), the utility faces failures at demand growth levels of about 1.7 times the estimated values. When restriction effectiveness is at or above estimates, the acceptable scaling of demand growth rises to about 1.8.

Taken as a whole, these results provide valuable insights for decision makers. From our original 18 deep uncertainties, we have discovered that two are critical for the success of our water supply management policy. Further, we have defined thresholds within the uncertainty space that define scenarios that lead to failure. We can use this information to inform monitoring efforts for the water supply policy, or to inform a new problem formulation that tailors actions to mitigate these vulnerabilities.

##### Factor Mapping #####

# Select the top two factors discovered above
selected_factors = DU_factors[:, [3,0]]

# Fit a classifier using only these two factors
gbc_2_factors = GradientBoostingClassifier(n_estimators=200,
                                 learning_rate=0.1,
                                 max_depth=3)
gbc_2_factors.fit(selected_factors, SD_input)

# plot prediction contours
x_data = selected_factors[:,0]
y_data = selected_factors[:,1]

x_min, x_max = (x_data.min(), x_data.max())
y_min, y_max = (y_data.min(), y_data.max())

# create a grid to makes predictions on
xx, yy = np.meshgrid(np.arange(x_min, x_max * 1.001, (x_max - x_min) / 100),
                        np.arange(y_min, y_max * 1.001, (y_max - y_min) / 100))
                        
dummy_points = list(zip(xx.ravel(), yy.ravel()))

z = gbc_2_factors.predict_proba(dummy_points)[:, 1]
z[z < 0] = 0.
z = z.reshape(xx.shape)

# plot the factor map        
fig = plt.figure(figsize=(10,8))
ax = fig.gca()
ax.contourf(xx, yy, z, [0, 0.5, 1], cmap='RdBu',
            alpha=0.6, vmin=0, vmax=1)
ax.scatter(selected_factors[:,0], selected_factors[:,1],
           c=SD_input, cmap='Reds_r', edgecolor='grey',
           alpha=0.6, s=100, linewidths=0.5)
ax.set_xlim([.5, 2])
ax.set_ylim([.9,1.1])
ax.set_xlabel('Demand Growth Multiplier')
ax.set_ylabel('Restriction Eff. Multiplier')

References

Gold, D. F., Reed, P. M., Gorelick, D. E., & Characklis, G. W. (2022). Power and Pathways: Exploring Robustness, Cooperative Stability, and Power Relationships in Regional Infrastructure Investment and Water Supply Management Portfolio Pathways. Earth’s Future, 10(2), e2021EF002472.

Groves, D. G., & Lempert, R. J. (2007). A new analytic method for finding policy-relevant scenarios. Global Environmental Change, 17(1), 73-85.

Kwakkel, J. H., Walker, W. E., & Haasnoot, M. (2016). Coping with the wickedness of public policy problems: approaches for decision making under deep uncertainty. Journal of Water Resources Planning and Management, 142(3), 01816001.

O’Neill, B. C., Kriegler, E., Riahi, K., Ebi, K. L., Hallegatte, S., Carter, T. R., … & van Vuuren, D. P. (2014). A new scenario framework for climate change research: the concept of shared socioeconomic pathways. Climatic change, 122(3), 387-400.

Reed, P.M., Hadjimichael, A., Malek, K., Karimi, T., Vernon, C.R., Srikrishnan, V., Gupta, R.S., Gold, D.F., Lee, B., Keller, K., Thurber, T.B., & Rice, J.S. (2022). Addressing Uncertainty in Multisector Dynamics Research [Book]. Zenodo. https://doi.org/10.5281/zenodo.6110623

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772.