Policy Diagnostics with Time-Varying and State Space PDFs

Some of my work has focused on “policy diagnostics,” analyzing how policies (in this case, multi-reservoir operating policies) that favor different objectives perform under different conditions and why. This can guide analysts in choosing a policy to implement, or even in determining which objectives policies should be optimized for (cough, cough, see Quinn et al., 2017). One of the more effective ways we’ve found to analyze these policies is by examining their probabilistic behavior through time-varying PDFs and state-space PDFs. This blog post will illustrate these two types of figures and provide sample code for creating them. The code for the versions of these figures generated in the above paper can be found here.

Below is an example of how time-varying PDFs can provide insights into system behavior, using the Red River basin. These plots show the probability of the water level in Hanoi (y axis in both figures) being at different levels on different days of the year (x axis in both figures), from the beginning of the monsoon in May to the end of the dry season in April. Red shades represent high probabilities and blue shades represent low probabilities. The left plot shows these dynamics for a policy minimizing the 100-yr annual maximum water level, while the right plot shows them for a policy maximizing the 100-yr average hydropower production. The flood-minimizing policy has a lower probability of overtopping the dikes and crossing a stakeholder-elicited alarm level of 11.25 m (Second Alarm) compared to the hydropower-maximizing policy. However, this reduction in the probability of high floodwaters requires a higher probability of crossing a lower stakeholder-elicited alarm level of 6 m (First Alarm), highlighting a tradeoff between reducing severe floods and nuisance floods. There are also different dynamics during the dry season, where the flood-minimizing solution releases more to both meet agricultural demand at the time of planting and lower the reservoir level in advance of the next monsoon. There is a bifurcation in the high probability density streak during this time, suggesting that how much is released depends on which need governs: lowering the reservoir to an acceptable level before the flood season, or meeting the agricultural demand.

To create this figure, we simply need an N x 365 matrix of the water level on each day (column) of N different annual simulations (rows). Let’s call this matrix ‘data’. We then need to reformat ‘data’ into a Y x 365 matrix, where Y is the number of “bins” along the y axis (between ymin and ymax) that we are going to group our data into to make a histogram for each day. Finally, we just need to count how many data points occur in each bin, and then divide this count by the total number of simulated years, N. This is shown using the function ‘getTimeVaryingProbs.py’ below assuming we have two datasets we want to plot, ‘data1’ and ‘data2’.

import numpy as np

def getTimeVaryingProbs(data, N, Y, ymin, ymax):
    '''Finds the probability of being at a specific water level (y) on a given day.'''
    probMatrix = np.zeros([Y,365])
    step = (ymax-ymin)/Y
    for i in range(np.shape(probMatrix)[0]):
        for j in range(np.shape(probMatrix)[1]):
            count = ((data[:,j] < ymax-step*i) & (data[:,j] >= ymax-step*(i+1))).sum()
            probMatrix[i,j] = count/N

    return probMatrix

probMatrix1 = getTimeVaryingProbs(data1, 100000, 366, 0, 15)
probMatrix2 = getTimeVaryingProbs(data2, 100000, 366, 0, 15)
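With 100,000 simulated years, the double loop above can be slow. Below is a minimal sketch of an alternative that lets ‘np.histogram’ do the counting for each day. The name ‘getTimeVaryingProbsVectorized’ is just something I made up for illustration, and it infers N from the shape of ‘data’ rather than taking it as an argument. One small difference to be aware of: ‘np.histogram’ includes values exactly equal to ymax in the top bin, while the loop above excludes them.

```python
import numpy as np

def getTimeVaryingProbsVectorized(data, Y, ymin, ymax):
    '''Sketch: same idea as getTimeVaryingProbs, counting with np.histogram.'''
    data = np.asarray(data, dtype=float)
    N, nDays = data.shape  # N simulated years (rows), one column per day
    edges = np.linspace(ymin, ymax, Y + 1)  # Y equal-width bins
    probMatrix = np.zeros((Y, nDays))
    for j in range(nDays):
        counts, _ = np.histogram(data[:, j], bins=edges)
        # np.histogram orders bins from low to high; flip them so that
        # row 0 is the highest water level, matching the loop version
        probMatrix[:, j] = counts[::-1] / N
    return probMatrix
```

A quick sanity check for either version: if every simulated water level falls between ymin and ymax, each column of the probability matrix should sum to 1.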

After calling ‘getTimeVaryingProbs.py’ to generate ‘probMatrix1’ and ‘probMatrix2’, we can plot the time-varying PDF of each of these using ‘imshow’. Since we want to compare the two side-by-side, we need to make sure they’re normalized over the same range. We do this by finding the lowest and highest probabilities over the two matrices and normalizing our color map over that range:

import numpy as np
from matplotlib import pyplot as plt
import matplotlib as mpl

# find the lowest and highest probability between two probability matrices
probMin = min(np.min(probMatrix1), np.min(probMatrix2))
probMax = max(np.max(probMatrix1), np.max(probMatrix2))

fig = plt.figure()
ax1 = fig.add_subplot(121)
# 'RdYlBu_r' maps high probabilities to red and low probabilities to blue
sm = ax1.imshow(probMatrix1, cmap='RdYlBu_r', origin='upper', norm=mpl.colors.Normalize(vmin=probMin, vmax=probMax))
ax2 = fig.add_subplot(122)
sm = ax2.imshow(probMatrix2, cmap='RdYlBu_r', origin='upper', norm=mpl.colors.Normalize(vmin=probMin, vmax=probMax))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
cbar = fig.colorbar(sm, cax=cbar_ax)
cbar.ax.set_ylabel('Probability Density',fontsize=16)
fig.show()
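By default, ‘imshow’ labels the axes with row and column indices rather than dates and water levels. Here is a rough sketch of a helper for converting between the two. The name ‘getAxisTicks’ is hypothetical, and it assumes column 0 of the probability matrix is 1 May of a 365-day (non-leap) year, so adjust the month list if your simulated year starts on a different day.

```python
import numpy as np

def getAxisTicks(Y, ymin, ymax, nYTicks=4):
    '''Sketch: tick positions and labels for a Y x 365 probability matrix.'''
    # ASSUMPTION: column 0 is 1 May of a 365-day (non-leap) year
    monthNames = ['May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',
                  'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr']
    monthLengths = np.array([31, 30, 31, 31, 30, 31, 30, 31, 31, 28, 31, 30])
    # column index of the first day of each month
    xTicks = np.concatenate(([0], np.cumsum(monthLengths)[:-1]))

    # with origin='upper', the top edge of row 0 (position -0.5) is ymax
    step = (ymax - ymin) / Y
    yLevels = np.linspace(ymin, ymax, nYTicks)
    yTicks = (ymax - yLevels) / step - 0.5
    return xTicks, monthNames, yTicks, yLevels

xTicks, monthNames, yTicks, yLevels = getAxisTicks(366, 0, 15)

# Applied to one of the axes from the plotting snippet above:
# ax1.set_xticks(xTicks)
# ax1.set_xticklabels(monthNames)
# ax1.set_yticks(yTicks)
# ax1.set_yticklabels(['%.1f' % y for y in yLevels])
```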

In some cases, it may be helpful to plot a log transformation of the probability matrices, as was done in the above paper since streamflows are highly skewed.
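One wrinkle with a log transformation is that empty bins have a probability of zero, and the log of zero is undefined. The sketch below is one way to handle that, not necessarily what was done in the paper: it clips zeros to a floor before taking log10. The function name ‘logTransformProbs’ and the ‘floor’ argument are my own inventions for illustration.

```python
import numpy as np

def logTransformProbs(probMatrix, floor=None):
    '''Sketch: log10 of a probability matrix, with zeros clipped to a floor.'''
    probMatrix = np.asarray(probMatrix, dtype=float)
    if floor is None:
        positive = probMatrix[probMatrix > 0]
        if positive.size == 0:
            raise ValueError('probMatrix has no nonzero entries')
        # ASSUMPTION: empty bins take the color of the smallest nonzero probability
        floor = positive.min()
    return np.log10(np.maximum(probMatrix, floor))
```

When comparing two matrices side by side, it is probably safest to pass the same ‘floor’ to both calls and then compute ‘probMin’ and ‘probMax’ from the transformed matrices. Another route is to leave the matrices alone and swap ‘Normalize’ for matplotlib’s ‘LogNorm’, which needs a strictly positive vmin.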

Below is an example of how state-space PDFs can provide insights into system behavior, again using the Red River basin as an example. These plots show the probability of the water level in Hanoi (y axis in both figures) being at different levels when the total storage in the reservoirs upstream is at different levels (x axis in both figures). Red shades again represent high probabilities and blue shades represent low probabilities. The left plot shows these dynamics for a compromise policy optimized to one set of objectives, while the right plot shows them for a compromise policy optimized to a different set of objectives. The compromise policy on the left fills up the reservoirs without releasing much water downstream, resulting in a high probability streak along the bottom of the plot at low water levels. This will favor hydropower production. However, when the largest reservoirs fill up, they are forced to spill, resulting in a spike in the water level downstream. This occurs before the smaller reservoirs have filled up, and in wet years, results in overtopping before total system storage has been reached. Consequently, this policy does not make full use of the total system storage for flood protection. The compromise policy on the right, however, increases the system storage and water level simultaneously, releasing some of what initially comes in to leave empty capacity for future flood events. This strategy makes better use of the full system capacity, only resulting in overtopping when maximum system storage has been reached. The difference in the behavior of these two compromise solutions highlights the need to test rival framings of objective functions for multi-objective optimization, as some formulations may suffer unintended consequences like the formulation on the left.

To create this figure, we need two N x 365 matrices, one of the water level on each day (column) of N different annual simulations (rows) and another of the total system storage. Let’s call these matrices ‘h’ and ‘s’, respectively. We then need to use these matrices to populate a Y x X probability matrix, where Y is the number of bins along the y axis (water level, h) between ymin and ymax, and X the number of bins along the x axis (storage, s) between xmin and xmax. This probability matrix will represent a 2D histogram of how many data points lie in a combined water level and storage bin. We again just need to count how many data points occur in each bin, and then divide this count by the total number of simulated points (365N). This is shown using the function ‘getJointProbs.py’ below assuming we have two joint datasets, (h1,s1) and (h2,s2), that we want to plot.

def getJointProbs(h, s, Y, X, ymax, ymin, xmax, xmin):
    '''Finds the probability of being at a specific water level (h) and storage (s) jointly'''
    probMatrix = np.zeros([Y,X])
    yStep = (ymax-ymin)/np.shape(probMatrix)[0]
    xStep = (xmax-xmin)/np.shape(probMatrix)[1]
    for i in range(np.shape(s)[0]):
        for j in range(np.shape(s)[1]):
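            # figure out which "box" the simulated s and h are in
            # (row 0 is the highest water level, column 0 the lowest storage)
            row = int(np.floor((ymax-h[i,j])/yStep))
            col = int(np.floor((s[i,j]-xmin)/xStep))
            # skip points outside the plotted range; negative indices would otherwise wrap around
            if 0 <= row < np.shape(probMatrix)[0] and 0 <= col < np.shape(probMatrix)[1]: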
                probMatrix[row,col] = probMatrix[row,col] + 1
            
    # calculate probability of being in each box
    probMatrix = probMatrix/(np.shape(s)[0]*np.shape(s)[1])

    return probMatrix

probMatrix1 = getJointProbs(h1, s1, 100, 100, 15, 0, 3.0E10, 0.5E10)
probMatrix2 = getJointProbs(h2, s2, 100, 100, 15, 0, 3.0E10, 0.5E10)
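Looping over 365N points in pure Python can take a while. As a sketch of an alternative, ‘np.histogram2d’ can do the 2D counting in a single call. The name ‘getJointProbsVectorized’ is hypothetical; I kept the same argument order as ‘getJointProbs’. Points outside the plotted range are ignored in the counts but still included in the denominator. The treatment of values that land exactly on a bin edge may differ slightly from the loop above.

```python
import numpy as np

def getJointProbsVectorized(h, s, Y, X, ymax, ymin, xmax, xmin):
    '''Sketch: joint probability of water level (h) and storage (s) bins.'''
    h = np.asarray(h, dtype=float)
    s = np.asarray(s, dtype=float)
    # the first histogram dimension is h (rows), the second is s (columns)
    counts, _, _ = np.histogram2d(h.ravel(), s.ravel(), bins=[Y, X],
                                  range=[[ymin, ymax], [xmin, xmax]])
    # np.histogram2d orders the h bins from low to high; flip them so that
    # row 0 is the highest water level, as imshow with origin='upper' expects
    return counts[::-1, :] / h.size
```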

After calling ‘getJointProbs.py’ to generate ‘probMatrix1’ and ‘probMatrix2’, we can again plot the state space PDF of each of these using ‘imshow’ as illustrated in the second snippet of code above. Now go analyze how your reservoirs are probabilistically operating as a system!
