Magnitude-varying sensitivity analysis and visualization (Part 1)

Various posts on this blog have discussed sensitivity analysis techniques before. The purpose of this post is to show an application of those methods and demonstrate how they can be used in an exploratory manner for the purposes of robust decision making (RDM). RDM aims to evaluate the performance of a policy/strategy/management plan over an ensemble of deeply uncertain parameter combinations – commonly referred to as “states of the world” (SOWs) – and then identify the policies that are most robust to those uncertainties. Most importantly, this process allows the decision maker to examine the implications of their assumptions about the world (or how it will unfold) for their candidate strategies [1].

This is Part 1 of a two-part post. In this first post, I’ll introduce the types of figures I’ll be talking about, along with some visualization code. In the second post (up in a couple of days), I’ll discuss sensitivity analysis for the system, as well as some more visuals. All the code and data to produce the figures below can be found in this repository.

Now assume the performance of a system is described by a time-series produced as an output of our model. This might be a streamflow we care about, reservoir releases, nutrient loading, or any other time-series produced by a run of our model. For the purposes of this example, I’ll use a time-series from the system I’ve been working on, which represents historical shortages for an agricultural user.


Fig. 1: Historical data in series

We can sort and rank these data, in the style of a flow duration curve, which allows us to easily read off levels for the median shortage (50th percentile), the worst case (99th), and so on. There are multiple reasons one might care about these values (instead of, say, just looking at the mean, or at the time-series as presented in Fig. 1): we are often concerned with the levels and frequencies of our extremes when making decisions about systems (e.g., “how bad is the worst case?”, “how rare is the worst case?”); we might like to know how often we exceed a certain threshold (e.g., “how many years exceed an annual shortage of 1000 af?”); or we might simply want to maintain the distributional information of the series we care about in an easily interpretable format.
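As a minimal sketch of how Fig. 2 might be produced (the array name hist and the percentile computation are my own assumptions, but P and hist_sort match the plotting code further below):

import numpy as np
import matplotlib.pyplot as plt

# hist: hypothetical 1-D array holding the historical annual shortages
hist_sort = np.sort(hist)  # ascending: mild shortages first, worst case last

# Percentile level of each sorted value (0-100), used as the x-axis
P = np.arange(1, len(hist_sort) + 1) / len(hist_sort) * 100

fig, ax = plt.subplots()
ax.plot(P, hist_sort, c='black', linewidth=2)
ax.set_xlabel('Shortage magnitude percentile')
ax.set_ylabel('Annual shortage (af)')
plt.savefig('historical_data_percentiles.png')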


Fig. 2: Historical data sorted by percentile

For the purposes of an exploratory experiment, we would like to see how this time-series of model output might change under different conditions (or SOWs). There are multiple ways one might go about this [2]; in this study, we used Latin Hypercube Sampling [3] to sample a broad range of parameters that we thought could affect the system, producing 1000 parameter combinations. We then re-simulated the system under each combination and saved the equivalent outputs for this time-series. We would like to see how this output changes across all the sampled runs.
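For illustration, here is one way such a sample could be generated with SciPy’s Latin Hypercube sampler (available in scipy.stats.qmc since SciPy 1.7; the number of parameters and the bounds below are placeholders, not the values used in the study):

import numpy as np
from scipy.stats import qmc

n_params = 5     # placeholder: number of deeply uncertain parameters
n_sows = 1000    # number of SOWs, as in the experiment

sampler = qmc.LatinHypercube(d=n_params, seed=42)
unit_sample = sampler.random(n=n_sows)  # samples in the unit hypercube

# Scale each column to its parameter's range (placeholder bounds)
l_bounds = np.zeros(n_params)
u_bounds = np.ones(n_params) * 2.0
sows = qmc.scale(unit_sample, l_bounds, u_bounds)  # shape (1000, n_params)

Each of the 1000 rows is then one SOW under which the system is re-simulated.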


Fig. 3: Historical data vs. experiment outputs (under 1000 SOWs)

Another way of visualizing this information, if we’re not interested in seeing all the individual lines, is to look at the range of outputs. To produce Fig. 4, I used the fill_between function in matplotlib, filling between the max and min values at each percentile level.
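A sketch of that call, reusing the array names from the density-plot code further below (expData_sort holds the sorted experiment outputs, one column per SOW):

import numpy as np
import matplotlib.pyplot as plt

# P, hist_sort as above; expData_sort: 2-D array of sorted outputs,
# rows = percentile levels, columns = the 1000 SOWs
fig, ax = plt.subplots()
ax.fill_between(P, np.min(expData_sort, axis=1), np.max(expData_sort, axis=1), color='#4286f4', alpha=0.3)
ax.plot(P, hist_sort, c='black', linewidth=2, label='Historical record')
ax.set_xlabel('Shortage magnitude percentile')
ax.legend(loc='upper left')
plt.savefig('experiment_data_range.png')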


Fig. 4: Historical data vs. range of experiment outputs

By looking at the individual lines or the range alone, we miss one piece of potentially valuable information: we have little to no idea of the density of outputs within our experiment. We can see the max and min of the range, and the lines thinning out at the edges, but it’s very difficult to infer the density of outputs anywhere within our samples. To address this, I’ve written a little function that loops through 10 frequency levels (you can also think of them as percentiles) and uses the fill_between function again.

The only tricky thing to figure out was how to appropriately represent each layer of increasing opacity in the legend – the layers are all the same color and transparency, but appear darker as they’re overlaid. I pulled two tricks for this. First, I needed a function that calculates the effective alpha (transparency) of the overlaid layers, as alpha is not cumulative in matplotlib: overlaying a layer of alpha a on an accumulated opacity x yields an opacity of x + a(1 − x), so, for example, two objects with transparency 0.2 together will appear as a single object with transparency 0.36.

def alpha(i, base=0.2):
    # Effective opacity after overlaying (i + 1) layers of opacity `base`,
    # applying the overlay rule x -> x + base*(1 - x) at each step
    l = lambda x: x + base - x*base
    ar = [l(0)]
    for j in range(i):
        ar.append(l(ar[-1]))
    return ar[-1]
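As a quick sanity check, alpha(0, base=0.2) returns 0.2 and alpha(1, base=0.2) returns 0.36, matching the example above.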

Second, I needed proxy artists representing the color at each layer. These are the handles in the code below, produced with every loop iteration.

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# Assumed inputs: P (x-axis percentile levels), p (the 10 frequency levels,
# e.g. [100, 90, ..., 10]), expData_sort (sorted experiment outputs, one
# column per SOW), and hist_sort (sorted historical record)
handles = []
labels = []
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(len(p)):
    # Successive translucent fills overlay into increasingly opaque bands
    ax.fill_between(P, np.min(expData_sort, axis=1), np.percentile(expData_sort, p[i], axis=1), color='#4286f4', alpha=0.1)
    ax.plot(P, np.percentile(expData_sort, p[i], axis=1), linewidth=0.5, color='#4286f4', alpha=0.3)
    # Proxy artist for the legend, using the effective overlaid alpha
    handle = matplotlib.patches.Rectangle((0, 0), 1, 1, color='#4286f4', alpha=alpha(i, base=0.1))
    handles.append(handle)
    label = "{:.0f} %".format(100 - p[i])
    labels.append(label)
ax.plot(P, hist_sort, c='black', linewidth=2, label='Historical record')
ax.set_xlim(0, 100)
ax.legend(handles=handles, labels=labels, framealpha=1, fontsize=8, loc='upper left', title='Frequency in experiment', ncol=2)
ax.set_xlabel('Shortage magnitude percentile', fontsize=12)
plt.savefig('experiment_data_density.png')

Fig. 5: Historical data vs. frequency of experiment outputs

This allows us to draw some conclusions about how events of different magnitudes/frequencies shift under the SOWs we evaluated. For this particular case, it seems that high-frequency, small shortages (left-hand side) are becoming smaller and/or less frequent, whereas low-frequency, large shortages (right-hand side) are becoming larger and/or more frequent. Of course, any probabilistic inference here depends on the samples we chose, but it serves the exploratory purposes of this analysis.

References:

[1]: Bryant, Benjamin P., and Robert J. Lempert. “Thinking inside the Box: A Participatory, Computer-Assisted Approach to Scenario Discovery.” Technological Forecasting and Social Change 77, no. 1 (January 1, 2010): 34–49. https://doi.org/10.1016/j.techfore.2009.08.002.

[2]: Herman, Jonathan D., Patrick M. Reed, Harrison B. Zeff, and Gregory W. Characklis. “How Should Robustness Be Defined for Water Systems Planning under Change?” Journal of Water Resources Planning and Management 141, no. 10 (2015): 4015012. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000509.

[3]: McKay, M. D., R. J. Beckman, and W. J. Conover. “A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code.” Technometrics 21, no. 2 (1979): 239–45. https://doi.org/10.2307/1268522.

Code Sample: Stacked Bars and Lines in Matlab

I needed to make a plot that superimposes a “stacked” bar graph with a line in Matlab. I got started by referencing this post on Matlab Central, which used “function handles” to pass plotting commands into plotyy. This got me thinking about some other Matlab features, which are explained in the following code sample. Hope this helps in your work too!

The data is from the USGS’s estimate of U.S. Water Use for 2005.


%How to plot data from USGS, Estimated Water Use in the U.S. in 2005
%as a stacked bar/line graph in Matlab.

%Specifically, we want the US population as a line, and the different
%water use types as stacked bars.

%The data, where each data value is for a different 5 year period
population = [151, 164, 180, 194, 206, 216, 230, 242, 252, 267, 285, 301];
totalwithdrawal = [180, 240, 270, 310, 370, 420, 430, 397, 404, 399, 413, 410];
public = [14, 17, 21, 24, 27, 29, 33, 36.4, 38.8, 40.2, 43.2, 44.2];
irrigation = [89, 110, 110, 120, 130, 140, 150, 135, 134, 130, 139, 128];
thermo = [40, 72, 100, 130, 170, 200, 210, 187, 194, 190, 195, 201];

%Place the data in a matrix, where each row is a
%different water use type (e.g., thermoelectric)
data = [thermo; irrigation; public;
 totalwithdrawal-(public+irrigation+thermo)];

%Now, transpose the data so each data type has its own column
data = data';

%To use different plot types in the plotyy environment, you
%can use Matlab's 'anonymous function' feature. stackedbar
%and prettyline below are function handles that can only
%be used inside of this script.
stackedbar = @(x, A) bar(x, A, 'stack');
prettyline = @(x, y) plot(x, y, 'k', 'LineWidth', 5);

%There's a version of plotyy that accepts function handles as an
%argument. We'll pass stackedbar and prettyline in there as the
%last arguments to this function.
[ax, h1, h2] = plotyy(1950:5:2005, data, 1950:5:2005, population, stackedbar, prettyline);

%You'll notice the line and bar axes are not agreeing with each other.
%We can fix this using the ax variable created above.
set(ax(1), 'XLim', [1945 2010]);
set(ax(2), 'XLim', [1945 2010]);

%Get rid of one of the sets of ticks, to make sure
%the figure looks nice.
set(ax(2), 'XTick', []);

%Changing the colormap will change the colors of the
%stacked bars. For example:
colormap summer;

%Finally write the legend. Note that, for whatever reason,
%the line is first in the list of legend entries.
legend('Population', 'Thermoelectric Power', 'Irrigation', 'Public Supply', 'Other', 'Location', 'NorthWest');

Here’s the result!