In the previous blog post, we performed a simple sensitivity analysis using the Delta Moment-Independent Global Sensitivity Analysis method to identify the decision variables most likely to cause changes in performance should their recommended values change. In this post, we will close out the MORDM series by performing scenario discovery with Gradient Boosted Trees (Boosted Trees) to identify the combinations of socioeconomic and/or hydroclimatic scenarios in which the utilities are unable to meet their satisficing criteria.
Revisiting the concept of the satisficing criteria
The satisficing criteria method is one of three approaches for defining robust strategies as outlined by Lempert and Collins (2007). It defines a robust strategy as a portfolio or set of actions that, according to a set of predetermined criteria outlined by the decision-maker(s), performs reasonably well across a wide range of possible scenarios. It does not have to be optimal (Simon, 1959). In the context of the Research Triangle, this approach is quantified using the domain criterion method (Starr, 1963), where robustness is calculated as the fraction of states of the world (SOWs) in which a portfolio meets or exceeds the criteria (Herman et al., 2014). This approach was selected for two reasons: the nature of the visualization used to communicate alternatives and uncertainty for the test case, and the elimination of the need to know the probability distributions of the uncertain SOWs.
The remaining two approaches are the regret-based approach (choosing a solution that minimizes performance degradation relative to degree of uncertainty in SOWs) and the open options approach (choosing a portfolio that keeps as many options open as possible).
For the Research Triangle, the satisficing criteria are as follows:
- Supply reliability (Reliability) should be at least 98% in all SOWs.
- The frequency of water-use restrictions (Restriction frequency) should be no more than 10% across all SOWs.
- The worst-case cost of drought mitigation actions (Worst-case cost) should be no more than 10% across all SOWs.
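As a toy illustration of the domain criterion, robustness for the reliability criterion can be computed as the fraction of SOWs meeting the 98% threshold. The reliability values below are randomly generated for illustration only, not drawn from the Research Triangle test case:

```python
# Toy illustration of the domain criterion: robustness is the fraction of
# SOWs in which a portfolio meets the criterion. Reliability values here are
# synthetic, not from the Research Triangle simulations.
import numpy as np

rng = np.random.default_rng(42)
reliability = rng.uniform(0.90, 1.0, size=1000)  # hypothetical reliability per SOW

# Robustness = fraction of SOWs meeting the 98% reliability criterion
robustness = np.mean(reliability >= 0.98)
print(f"Robustness: {robustness:.2f}")
```

The same pattern applies to the restriction frequency and worst-case cost criteria, with the inequality flipped so that smaller values count as successes.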
This brings us to the following question: under what conditions do the portfolios fail to meet these satisficing criteria?
Gradient Boosted Trees for scenario discovery
One drawback of using ROF metrics to define the portfolio action rules within the system is the introduction of SOW combinations that cause failure (failure regions) that are non-linear and non-convex (Trindade et al., 2019). It thus becomes necessary to use a model-free, unbiased approach that can classify non-linear success-failure regions while remaining easy to interpret.
Cue Gradient Boosted Trees, or Boosted Trees. This is a tree-based machine learning method that uses “rough and moderately inaccurate rules of thumb” to generate “a single, highly-accurate prediction rule”. This aggregation of rough rules of thumb is referred to as a ‘weak learning model’ (Freund and Schapire, 1999), and the final prediction rule is the ‘strong learning model’ that results from the boosting process. The transformation from a weak to a strong model is conducted via a process of creating weak classifier algorithms that are only slightly better than random guessing and forcing them to continue classifying the scenarios that they previously classified incorrectly. These algorithms are iteratively updated to improve their ability to classify regions of success or failure, turning them into strong learning models.
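As a minimal sketch of this weak-to-strong progression (on a synthetic dataset, not the Research Triangle data), we can track how classification accuracy improves as shallow “weak learner” trees are added to the ensemble:

```python
# Toy sketch of boosting: shallow trees (weak learners) are added one stage at
# a time, each focusing on examples the ensemble previously misclassified.
# The dataset is synthetic and purely illustrative.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)  # non-linear classes

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=1)  # depth-1 stumps: weak learners
gbc.fit(X, y)

# staged_predict yields the ensemble's predictions after each boosting stage,
# showing the weak-to-strong progression on the training data
accs = [np.mean(pred == y) for pred in gbc.staged_predict(X)]
print(f"Accuracy after 1 stage: {accs[0]:.2f}; after 100 stages: {accs[-1]:.2f}")
```

A single stump classifies the non-linear classes poorly; the accumulated ensemble does substantially better.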
In this post, we will implement Boosted Trees using the GradientBoostingClassifier function from the scikit-learn Python library. Before beginning, we should first install the library. In the command line, type in the following:
pip install scikit-learn
Done? Great – let’s get to the code!
# -*- coding: utf-8 -*-
"""
Created on Sun June 19 18:29:04 2022
@author: Lillian Lau
"""
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from copy import deepcopy
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style='white')
'''
FUNCTION to check whether performance criteria are met
'''
def check_satisficing(objs, objs_col, satisficing_bounds):
    # flag the rows in which the selected objective lies within [lower, upper]
    meet_low = objs[:, objs_col] >= satisficing_bounds[0]
    meet_high = objs[:, objs_col] <= satisficing_bounds[1]
    meets_criteria = np.hstack((meet_low, meet_high)).all(axis=1)
    return meets_criteria
Here, we import all required libraries and define a function, check_satisficing, that evaluates the performance of a given objective to see if it meets the criteria set above.
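To see check_satisficing in action, here is a quick sanity check on a made-up two-portfolio objective array, where column 0 holds reliability and the bounds require values in [0.98, 1.0]:

```python
# Sanity check of check_satisficing on hypothetical objective values.
import numpy as np

def check_satisficing(objs, objs_col, satisficing_bounds):
    # True where the selected column(s) fall within [lower, upper]
    meet_low = objs[:, objs_col] >= satisficing_bounds[0]
    meet_high = objs[:, objs_col] <= satisficing_bounds[1]
    return np.hstack((meet_low, meet_high)).all(axis=1)

objs = np.array([[0.99, 0.05],
                 [0.95, 0.20]])  # made-up values: [reliability, restriction freq]

meets = check_satisficing(objs, [0], [0.98, 1.0])
print(meets)  # → [ True False]
```

The first portfolio clears the 98% reliability bound; the second does not.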
'''
Name all file headers and compSol to be analyzed
'''
obj_names = ['REL_C', 'RF_C', 'INF_NPC_C', 'PFC_C', 'WCC_C', \
'REL_D', 'RF_D', 'INF_NPC_D', 'PFC_D', 'WCC_D', \
'REL_R', 'RF_R', 'INF_NPC_R', 'PFC_R', 'WCC_R', \
'REL_reg', 'RF_reg', 'INF_NPC_reg', 'PFC_reg', 'WCC_reg']
'''
Keys:
CRE: Cary restriction efficiency
DRE: Durham restriction efficiency
RRE: Raleigh restriction efficiency
DMP: Demand multiplier
BTM: Bond term multiplier
BIM: Bond interest rate multiplier
IIM: Infrastructure interest rate multiplier
EMP: Evaporation rate multiplier
STM: Streamflow amplitude multiplier
SFM: Streamflow frequency multiplier
SPM: Streamflow phase multiplier
'''
rdm_headers_dmp = ['CRE', 'DRE', 'RRE']
rdm_headers_utilities = ['DMP', 'BTM', 'BIM', 'IIM']
rdm_headers_inflows = ['STM', 'SFM', 'SPM']
rdm_headers_ws = ['EMP', 'CRR PTD', 'CRR CTD', 'LM PTD', 'LM CTD', 'AL PTD',
'AL CTD', 'D PTD', 'D CTD', 'NRR PTDD', 'NRR CTD', 'SCR PTD',
'SCT CTD', 'GC PTD', 'GC CTD', 'CRR_L PT', 'CRR_L CT',
'CRR_H PT', 'CRR_H CT', 'WR1 PT', 'WR1 CT', 'WR2 PT',
'WR2 CT', 'DR PT', 'DR CT', 'FR PT', 'FR CT']
rdm_headers_ws_drop = ['CRR PTD', 'CRR CTD', 'LM PTD', 'LM CTD', 'AL PTD',
'AL CTD', 'D PTD', 'D CTD', 'NRR PTDD', 'NRR CTD',
'SCR PTD', 'SCT CTD', 'GC PTD', 'GC CTD']
rdm_all_headers = ['CRE', 'DRE', 'RRE', 'DMP', 'BTM', 'BIM', 'IIM',
'STM', 'SFM', 'SPM', 'EMP', 'CRR_L PT', 'CRR_L CT',
'CRR_H PT', 'CRR_H CT', 'WR1 PT', 'WR1 CT', 'WR2 PT',
'WR2 CT', 'DR PT', 'DR CT', 'FR PT', 'FR CT']
all_headers = rdm_all_headers
utilities = ['Cary', 'Durham', 'Raleigh', 'Regional']
N_RDMS = 100
N_SOLNS = 69
'''
1 - Load DU factor files
'''
rdm_factors_directory = 'YourFilePath/WaterPaths/TestFiles/'
rdm_dmp_filename = rdm_factors_directory + 'rdm_dmp_test_problem_reeval.csv'
rdm_utilities_filename = rdm_factors_directory + 'rdm_utilities_test_problem_reeval.csv'
rdm_inflows_filename = rdm_factors_directory + 'rdm_inflows_test_problem_reeval.csv'
rdm_watersources_filename = rdm_factors_directory + 'rdm_water_sources_test_problem_reeval.csv'
rdm_dmp = pd.read_csv(rdm_dmp_filename, sep=",", names=rdm_headers_dmp)
rdm_utilities = pd.read_csv(rdm_utilities_filename, sep=",", names=rdm_headers_utilities)
rdm_inflows = pd.read_csv(rdm_inflows_filename, sep=",", names=rdm_headers_inflows)
rdm_ws_full = np.loadtxt(rdm_watersources_filename, delimiter=",")
rdm_ws = pd.DataFrame(rdm_ws_full[:, :len(rdm_headers_ws)], columns=rdm_headers_ws)
rdm_ws = rdm_ws.drop(rdm_headers_ws_drop, axis=1)
dufs = pd.concat([rdm_dmp, rdm_utilities, rdm_inflows, rdm_ws], axis=1, ignore_index=True)
dufs.columns = rdm_all_headers
dufs_np = dufs.to_numpy()
duf_numpy = dufs_np[:100, :]
all_params = duf_numpy
Next, we define shorthand names for the performance objectives and the DU SOWs (here referred to as ‘robust decision-making parameters’, or RDMs). One SOW is defined by a vector that contains one sampled parameter each from the deeply uncertain demand, policy implementation, and hydroclimatic realizations. But how are these SOWs generated?
The RDM files shown in the code block above actually list multipliers for the baseline input parameters. These multipliers scale the inputs up or down, widening the envelope of uncertainty and enabling the generation of more challenging scenarios, as shown in the figure above. Next, remember how we used 1000 different hydroclimatic realizations to perform optimization with Borg? We will use the same 1000 realizations and pair each of them with one set of these newly-generated, more extreme scenarios, as visualized in the image below. This will result in a total of (1000 x 100) different SOWs.
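A minimal sketch of this scaling step is shown below. The baseline values and multiplier ranges here are invented for illustration; the study's actual sampling bounds differ:

```python
# Sketch of assembling one DU SOW: sampled multipliers scale baseline inputs.
# Baseline values and multiplier ranges are assumptions for illustration only.
import numpy as np

baseline = {'demand_growth': 1.0, 'bond_term': 30.0, 'evap_rate': 1.0}

rng = np.random.default_rng(1)
multipliers = {'demand_growth': rng.uniform(0.5, 2.0),  # widen demand envelope
               'bond_term': rng.uniform(0.8, 1.2),
               'evap_rate': rng.uniform(0.9, 1.1)}

# One SOW: each baseline input scaled by its sampled multiplier
sow = {name: baseline[name] * multipliers[name] for name in baseline}
print(sow)
```

Repeating the sampling step 100 times yields the 100 RDM rows loaded from the CSV files above.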
Each portfolio discovered during the optimization step is then evaluated over these SOWs (shown above) to identify if they meet or violate the satisficing criteria. The procedure to assess the ability of a portfolio to meet the criteria is performed below:
'''
Load objectives and robustness files
'''
out_directory = '/scratch/lbl59/blog/WaterPaths/post_processing/'
# objective values across all RDMs
objs_filename = out_directory + 'meanObjs_acrossSoln.csv'
objs_arr = np.loadtxt(objs_filename, delimiter=",")
objs_df = pd.DataFrame(objs_arr[:100, :], columns=obj_names)
'''
Determine the type of factor maps to plot.
'''
factor_names = all_headers
'''
Extract each utility's set of performance objectives and robustness
'''
Cary_all = objs_df[['REL_C', 'RF_C', 'INF_NPC_C', 'PFC_C', 'WCC_C']].to_numpy()
Durham_all = objs_df[['REL_D', 'RF_D', 'INF_NPC_D', 'PFC_D', 'WCC_D']].to_numpy()
Raleigh_all = objs_df[['REL_R', 'RF_R', 'INF_NPC_R', 'PFC_R', 'WCC_R']].to_numpy()
Regional_all = objs_df[['REL_reg', 'RF_reg', 'INF_NPC_reg', 'PFC_reg', 'WCC_reg']].to_numpy()
# Cary satisficing criteria
rel_C = check_satisficing(Cary_all, [0], [0.98, 1.0])
rf_C = check_satisficing(Cary_all, [1], [0.0, 0.1])
wcc_C = check_satisficing(Cary_all, [4], [0.0, 0.1])
satisficing_C = rel_C*rf_C*wcc_C
print('satisficing_C: ', satisficing_C.sum())
# Durham satisficing criteria
rel_D = check_satisficing(Durham_all, [0], [0.98, 1.0])
rf_D = check_satisficing(Durham_all, [1], [0.0, 0.1])
wcc_D = check_satisficing(Durham_all, [4], [0.0, 0.1])
satisficing_D = rel_D*rf_D*wcc_D
print('satisficing_D: ', satisficing_D.sum())
# Raleigh satisficing criteria
rel_R = check_satisficing(Raleigh_all, [0], [0.98,1.0])
rf_R = check_satisficing(Raleigh_all, [1], [0.0,0.1])
wcc_R = check_satisficing(Raleigh_all, [4], [0.0,0.1])
satisficing_R = rel_R*rf_R*wcc_R
print('satisficing_R: ', satisficing_R.sum())
# Regional satisficing criteria
rel_reg = check_satisficing(Regional_all, [0], [0.98, 1.0])
rf_reg = check_satisficing(Regional_all, [1], [0.0, 0.1])
wcc_reg = check_satisficing(Regional_all, [4], [0.0, 0.1])
satisficing_reg = rel_reg*rf_reg*wcc_reg
print('satisficing_reg: ', satisficing_reg.sum())
satisficing_dict = {'satisficing_C': satisficing_C, 'satisficing_D': satisficing_D,
'satisficing_R': satisficing_R, 'satisficing_reg': satisficing_reg}
utils = ['satisficing_C', 'satisficing_D', 'satisficing_R', 'satisficing_reg']
Following this, we apply the GradientBoostingClassifier to extract the parameters that most influence the ability of a portfolio to meet the satisficing criteria:
'''
Fit a boosted tree classifier and extract important features
'''
for j in range(len(utils)):
    gbc = GradientBoostingClassifier(n_estimators=200,
                                     learning_rate=0.1,
                                     max_depth=3)

    # fit the classifier to each utility's scenario discovery data
    # change depending on utility being analyzed!!
    criteria_analyzed = utils[j]
    df_to_fit = satisficing_dict[criteria_analyzed]
    gbc.fit(all_params, df_to_fit)

    # rank the DU factors by their importance to the classification
    feature_ranking = deepcopy(gbc.feature_importances_)
    feature_ranking_sorted_idx = np.argsort(feature_ranking)
    feature_names_sorted = [factor_names[i] for i in feature_ranking_sorted_idx]

    # indices of the two most important factors
    feature1_idx = len(feature_names_sorted) - 1
    feature2_idx = len(feature_names_sorted) - 2

    feature_figpath = 'YourFilePath/WaterPaths/post_processing/BT_Figures/'
    print(feature_ranking_sorted_idx)
    feature_figname = feature_figpath + 'BT_' + criteria_analyzed + '.jpg'

    fig = plt.figure(figsize=(8, 8))
    ax = fig.gca()
    ax.barh(np.arange(len(feature_ranking)), feature_ranking[feature_ranking_sorted_idx])
    ax.set_yticks(np.arange(len(feature_ranking)))
    ax.set_yticklabels(feature_names_sorted)
    ax.set_xlim([0, 1])
    ax.set_xlabel('Feature ranking')
    plt.tight_layout()
    plt.savefig(feature_figname)

    '''
    7 - Create factor maps
    '''
    # select top 2 factors
    fm_figpath = 'YourFilePath/WaterPaths/post_processing/BT_Figures/'
    fm_figname = fm_figpath + 'factor_map_' + criteria_analyzed + '.jpg'

    selected_factors = all_params[:, [feature_ranking_sorted_idx[feature1_idx],
                                      feature_ranking_sorted_idx[feature2_idx]]]
    gbc_2_factors = GradientBoostingClassifier(n_estimators=200,
                                               learning_rate=0.1,
                                               max_depth=3)

    # change this one depending on utility and compSol being analyzed
    gbc_2_factors.fit(selected_factors, df_to_fit)

    x_data = selected_factors[:, 0]
    y_data = selected_factors[:, 1]

    x_min, x_max = (x_data.min(), x_data.max())
    y_min, y_max = (y_data.min(), y_data.max())

    # grid of dummy points spanning the top-2-factor space
    xx, yy = np.meshgrid(np.arange(x_min, x_max * 1.001, (x_max - x_min) / 100),
                         np.arange(y_min, y_max * 1.001, (y_max - y_min) / 100))

    dummy_points = list(zip(xx.ravel(), yy.ravel()))

    # predicted probability of meeting the criteria at each grid point
    z = gbc_2_factors.predict_proba(dummy_points)[:, 1]
    z[z < 0] = 0
    z = z.reshape(xx.shape)

    fig_factormap = plt.figure(figsize=(10, 8))
    ax_f = fig_factormap.gca()
    ax_f.contourf(xx, yy, z, [0, 0.5, 1], cmap='RdBu', alpha=0.6, vmin=0, vmax=1)
    ax_f.scatter(selected_factors[:, 0], selected_factors[:, 1],
                 c=df_to_fit, cmap='Reds_r', edgecolor='grey',
                 alpha=0.6, s=100, linewidths=0.5)
    ax_f.set_xlabel(factor_names[feature_ranking_sorted_idx[feature1_idx]], size=16)
    ax_f.set_ylabel(factor_names[feature_ranking_sorted_idx[feature2_idx]], size=16)
    ax_f.legend(loc='upper center', bbox_to_anchor=(0.5, -0.08),
                fancybox=True, shadow=True, ncol=3, markerscale=0.5)

    plt.savefig(fm_figname)
The code will result in the following four factor maps:
These factor maps all agree on one thing: the demand growth rate (evaluated via the demand multiplier, DMP) is the DU factor to which the system is most vulnerable. This has two implications:
- Each utility’s success is contingent upon different degrees of demand growth. Cary and Durham, being the smaller utilities with fewer resources, are the most susceptible to changes in demand beyond baseline projections. Raleigh can accommodate a demand growth rate up to 20% higher than initially planned for, likely due to its larger reservoir capacity.
- The utilities should be wary of their demand growth projections. Planning and operating their water supply systems solely based on the baseline projection of demand growth could be catastrophic both for individual utilities and for the region as a whole, as a small deviation from projected demand, or a failure to account for uncertainty in planning for demand growth, could result in a utility permanently failing to meet its satisficing criteria.
MORDM Series Summary
We have come very far since the beginning of our MORDM tutorial using the Research Triangle test case. Here’s a brief recap:
Designing and generating states of the world
- We first generated a series of synthetic streamflows using the Kirsch Method, enabling the simulation of multiple scenarios featuring more severe, deeply uncertain hydroclimatic events within the region.
- Next, we generated a set of risk-of-failure (ROF) tables to approximate the risk that a utility’s ability to meet weekly demand would be compromised and visualized the effects of different ROF thresholds on system storage dynamics.
Searching for alternatives
- Using the resulting ROF tables, we performed simulation-optimization to search for a Pareto-approximate set of candidate actions for each of the Triangle’s utilities using WaterPaths and the Borg MOEA.
- The individual and regional performance of these Pareto-approximate actions were then visualized and explored using parallel axis plots.
Defining robustness measures and identifying controls
- We defined robustness using the satisficing criteria and calculated the robustness of these sets of actions (portfolios) against the effects of challenging states of the world (SOWs) to discover how they fail.
- We then performed a simple sensitivity analysis across the average performance of all sixty-nine portfolios of actions, and identified that each utility’s use of water-use restrictions and investment in new supply infrastructure, coupled with uncertain demand growth rates, most strongly affect performance.
- Finally, we used Boosted Trees to discover consequential states of the world that most affect a portfolio’s ability to succeed in meeting all criteria, or to fail catastrophically.
This brings us to the end of our MORDM series! As usual, all files used in this series can be found in the Git Repository here. Hope you found this useful, and happy coding!
References
Freund, Y., & Schapire, R. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771-780.
Herman, J., Zeff, H., Reed, P., & Characklis, G. (2014). Beyond optimality: Multistakeholder robustness tradeoffs for regional water portfolio planning under deep uncertainty. Water Resources Research, 50(10), 7692-7713. doi: 10.1002/2014wr015338
Lempert, R., & Collins, M. (2007). Managing the Risk of Uncertain Threshold Responses: Comparison of Robust, Optimum, and Precautionary Approaches. Risk Analysis, 27(4), 1009-1026. doi: 10.1111/j.1539-6924.2007.00940.x
Simon, H. A. (1959). Theories of Decision-Making in Economics and Behavioral Science. The American Economic Review, 49(3), 253–283. http://www.jstor.org/stable/1809901
Starr, M. (1963). Product design and decision theory. Journal Of The Franklin Institute, 276(1), 79. doi: 10.1016/0016-0032(63)90315-6
Trindade, B., Reed, P., & Characklis, G. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances In Water Resources, 134, 103442. doi: 10.1016/j.advwatres.2019.103442