Policy Diagnostics with Time-Varying and State Space PDFs

Some of my work has focused on “policy diagnostics,” analyzing how policies (in this case, multi-reservoir operating policies) that favor different objectives perform under different conditions, and why. This can guide analysts in choosing a policy to implement, or even in determining the objectives policies should be optimized for (cough, cough, see Quinn et al., 2017). One of the more effective ways we’ve found to analyze these policies is by examining their probabilistic behavior through time-varying PDFs and state-space PDFs. This blog post will illustrate these two types of figures and provide sample code for creating them. The code for the versions of these figures generated in the above paper can be found here.

Below is an example of how time-varying PDFs can provide insights into system behavior, using the Red River basin as an example. These plots show the probability of the water level in Hanoi (y axis in both figures) being at different levels on different days of the year (x axis in both figures), from the beginning of the monsoon in May to the end of the dry season in April. Red shades represent high probabilities and blue shades represent low probabilities. The left plot shows these dynamics for a policy minimizing the 100-yr annual maximum water level, while the right plot shows them for a policy maximizing the 100-yr average hydropower production. The flood-minimizing policy has a lower probability of overtopping the dikes and crossing a stakeholder-elicited alarm level of 11.25 m (Second Alarm) than the hydropower-maximizing policy. However, this reduction in the probability of high floodwaters comes at the cost of a higher probability of crossing a lower stakeholder-elicited alarm level of 6 m (First Alarm), highlighting a tradeoff between reducing severe floods and reducing nuisance floods. There are also different dynamics during the dry season, when the flood-minimizing solution releases more water, both to meet agricultural demand at the time of planting and to lower the reservoir level in advance of the next monsoon. There is a bifurcation in the high-probability streak during this time, suggesting that how much needs to be released depends on whether the driving requirement is lowering the reservoir to an acceptable pre-flood-season level or meeting the agricultural demand.

To create this figure, we simply need an N x 365 matrix of the water level on each day (columns) of N different annual simulations (rows). Let’s call this matrix ‘data’. We then need to summarize ‘data’ in a Y x 365 probability matrix, where Y is the number of “bins” along the y axis (between ymin and ymax) into which we group the data to form a histogram for each day. Finally, we just need to count how many data points fall in each bin and divide that count by the total number of simulated years, N. This is shown in the function ‘getTimeVaryingProbs’ below, assuming we have two datasets we want to plot, ‘data1’ and ‘data2’.

import numpy as np

def getTimeVaryingProbs(data, N, Y, ymin, ymax):
    '''Finds the probability of being at a specific water level (y) on a given day.'''
    probMatrix = np.zeros([Y,365])
    step = (ymax-ymin)/Y
    for i in range(np.shape(probMatrix)[0]):
        for j in range(np.shape(probMatrix)[1]):
            # count the simulated years whose water level on day j falls in bin i
            # (bin 0 = highest water level band, to match origin='upper' in imshow)
            count = ((data[:,j] < ymax-step*i) & (data[:,j] >= ymax-step*(i+1))).sum()
            probMatrix[i,j] = count/N

    return probMatrix

probMatrix1 = getTimeVaryingProbs(data1, 100000, 366, 0, 15)
probMatrix2 = getTimeVaryingProbs(data2, 100000, 366, 0, 15)
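As an aside, the per-day histogram can also be computed with np.histogram instead of explicit bin loops. Below is a sketch of an equivalent implementation (the function name is mine; it matches getTimeVaryingProbs up to the handling of values exactly equal to ymax, which np.histogram places in the top bin):

import numpy as np

def getTimeVaryingProbsVectorized(data, N, Y, ymin, ymax):
    '''Same daily histograms as getTimeVaryingProbs, computed with np.histogram.'''
    edges = np.linspace(ymin, ymax, Y+1)  # Y bins need Y+1 edges
    probMatrix = np.zeros([Y, np.shape(data)[1]])
    for j in range(np.shape(data)[1]):
        counts, _ = np.histogram(data[:,j], bins=edges)
        # flip so row 0 is the highest water level bin (matches origin='upper')
        probMatrix[:,j] = counts[::-1]/N
    return probMatrix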

After calling ‘getTimeVaryingProbs’ to generate ‘probMatrix1’ and ‘probMatrix2’, we can plot the time-varying PDF of each using ‘imshow’. Since we want to compare the two side by side, we need to make sure they’re normalized over the same range. We do this by finding the lowest and highest probabilities across the two matrices and normalizing our color map over that range:

import numpy as np
from matplotlib import pyplot as plt
import matplotlib as mpl

# find the lowest and highest probability between two probability matrices
probMin = min(np.min(probMatrix1), np.min(probMatrix2))
probMax = max(np.max(probMatrix1), np.max(probMatrix2))

fig = plt.figure()
ax1 = fig.add_subplot(121)
# use the reversed colormap so red = high probability, as described above
sm = ax1.imshow(probMatrix1, cmap='RdYlBu_r', origin='upper', norm=mpl.colors.Normalize(vmin=probMin, vmax=probMax))
ax2 = fig.add_subplot(122)
sm = ax2.imshow(probMatrix2, cmap='RdYlBu_r', origin='upper', norm=mpl.colors.Normalize(vmin=probMin, vmax=probMax))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
cbar = fig.colorbar(sm, cax=cbar_ax)
cbar.ax.set_ylabel('Probability Density', fontsize=16)
plt.show()
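Since the discussion above revolves around the two alarm levels, it can also help to map the axes to days and meters with imshow’s ‘extent’ argument and mark those levels. Here is a sketch for a single panel (the 0-15 m range and 365 days match the calls above; the axis labels are my own):

from matplotlib import pyplot as plt
import matplotlib as mpl

fig = plt.figure()
ax = fig.add_subplot(111)
# extent=(left, right, bottom, top) maps bin indices to days and meters;
# with origin='upper', row 0 of probMatrix1 is drawn at the top (15 m)
sm = ax.imshow(probMatrix1, cmap='RdYlBu_r', origin='upper',
               norm=mpl.colors.Normalize(vmin=probMin, vmax=probMax),
               extent=(0, 365, 0, 15), aspect='auto')
ax.axhline(11.25, color='k', linestyle='--')  # Second Alarm (11.25 m)
ax.axhline(6.0, color='k', linestyle=':')     # First Alarm (6 m)
ax.set_xlabel('Day of Year (beginning in May)')
ax.set_ylabel('Water Level in Hanoi (m)')
plt.show()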

In some cases, it may be helpful to plot a log transformation of the probability matrices, as was done in the above paper, since streamflows are highly skewed.
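A minimal sketch of one way to do this, assuming zero-probability cells are masked out so that the logarithm is defined, is to hand ‘imshow’ a shared LogNorm:

import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt

# mask zero-probability cells so the log transform is defined
masked1 = np.ma.masked_where(probMatrix1 <= 0, probMatrix1)
masked2 = np.ma.masked_where(probMatrix2 <= 0, probMatrix2)
# a single LogNorm shared by both panels keeps their color scales comparable
logNorm = mpl.colors.LogNorm(vmin=min(masked1.min(), masked2.min()),
                             vmax=max(masked1.max(), masked2.max()))
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.imshow(masked1, cmap='RdYlBu_r', origin='upper', norm=logNorm)
ax2 = fig.add_subplot(122)
ax2.imshow(masked2, cmap='RdYlBu_r', origin='upper', norm=logNorm)
plt.show()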

Below is an example of how state-space PDFs can provide insights into system behavior, again using the Red River basin as an example. These plots show the probability of the water level in Hanoi (y axis in both figures) being at different levels when the total storage in the reservoirs upstream is at different levels (x axis in both figures). Red shades again represent high probabilities and blue shades represent low probabilities. The left plot shows these dynamics for a compromise policy optimized to one set of objectives, while the right plot shows them for a compromise policy optimized to a different set of objectives. The compromise policy on the left fills up the reservoirs without releasing much water downstream, resulting in a high probability streak along the bottom of the plot at low water levels. This will favor hydropower production. However, when the largest reservoirs fill up, they are forced to spill, resulting in a spike in the water level downstream. This occurs before the smaller reservoirs have filled up, and in wet years, results in overtopping before total system storage has been reached. Consequently, this policy does not make full use of the total system storage for flood protection. The compromise policy on the right, however, increases the system storage and water level simultaneously, releasing some of what initially comes in to leave empty capacity for future flood events. This strategy makes better use of the full system capacity, only resulting in overtopping when maximum system storage has been reached. The difference in the behavior of these two compromise solutions highlights the need to test rival framings of objective functions for multi-objective optimization, as some formulations may suffer unintended consequences like the formulation on the left.

To create this figure, we need two N x 365 matrices: one of the water level on each day (columns) of N different annual simulations (rows), and another of the total system storage. Let’s call these matrices ‘h’ and ‘s’, respectively. We then use these matrices to populate a Y x X probability matrix, where Y is the number of bins along the y axis (water level, h) between ymin and ymax, and X is the number of bins along the x axis (storage, s) between xmin and xmax. This probability matrix represents a 2D histogram of how many data points lie in each combined water level and storage bin. We again just need to count how many data points fall in each bin and divide that count by the total number of simulated points (365N). This is shown in the function ‘getJointProbs’ below, assuming we have two joint datasets, (h1,s1) and (h2,s2), that we want to plot.

def getJointProbs(h, s, Y, X, ymax, ymin, xmax, xmin):
    '''Finds the probability of jointly being at a specific water level (h) and storage (s).'''
    probMatrix = np.zeros([Y,X])
    yStep = (ymax-ymin)/np.shape(probMatrix)[0]
    xStep = (xmax-xmin)/np.shape(probMatrix)[1]
    for i in range(np.shape(s)[0]):
        for j in range(np.shape(s)[1]):
            # figure out which "box" the simulated s and h are in
            # (row 0 = highest water level band, to match origin='upper' in imshow)
            row = int(np.floor((ymax-h[i,j])/yStep))
            col = int(np.floor((s[i,j]-xmin)/xStep))
            # skip points that fall outside the plotted ranges
            if 0 <= row < np.shape(probMatrix)[0] and 0 <= col < np.shape(probMatrix)[1]:
                probMatrix[row,col] = probMatrix[row,col] + 1

    # divide the counts by the total number of simulated points (365*N)
    probMatrix = probMatrix/(np.shape(s)[0]*np.shape(s)[1])

    return probMatrix

probMatrix1 = getJointProbs(h1, s1, 100, 100, 15, 0, 3.0E10, 0.5E10)
probMatrix2 = getJointProbs(h2, s2, 100, 100, 15, 0, 3.0E10, 0.5E10)
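For reference, the same counts can be computed without explicit loops using np.histogram2d; here is a sketch of an equivalent implementation (the function name is mine, and it matches getJointProbs up to bin-edge conventions at the upper boundaries):

import numpy as np

def getJointProbsVectorized(h, s, Y, X, ymax, ymin, xmax, xmin):
    '''Same joint histogram as getJointProbs, computed with np.histogram2d.'''
    # first range pair bounds h (water level), second bounds s (storage)
    counts, _, _ = np.histogram2d(h.flatten(), s.flatten(), bins=[Y, X],
                                  range=[[ymin, ymax], [xmin, xmax]])
    # flip the rows so row 0 corresponds to the highest water level,
    # matching the convention of getJointProbs and origin='upper'
    return counts[::-1,:]/np.size(h)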

After calling ‘getJointProbs’ to generate ‘probMatrix1’ and ‘probMatrix2’, we can again plot the state-space PDF of each using ‘imshow’, as illustrated in the second snippet of code above.
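If you want the axes in physical units rather than bin indices, imshow’s ‘extent’ argument can map them; below is a minimal sketch for one panel (the storage units are whatever ‘s’ was simulated in, and the axis labels are my assumption):

import numpy as np
from matplotlib import pyplot as plt
import matplotlib as mpl

fig = plt.figure()
ax = fig.add_subplot(111)
# x axis spans the storage bins (xmin to xmax), y axis the water level bins
sm = ax.imshow(probMatrix1, cmap='RdYlBu_r', origin='upper',
               norm=mpl.colors.Normalize(vmin=np.min(probMatrix1), vmax=np.max(probMatrix1)),
               extent=(0.5E10, 3.0E10, 0, 15), aspect='auto')
ax.set_xlabel('Total System Storage')
ax.set_ylabel('Water Level in Hanoi (m)')
fig.colorbar(sm, ax=ax, label='Probability Density')
plt.show()

Now go analyze how your reservoirs are probabilistically operating as a system!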

Making Valgrind Easy

Some of this blog’s readers and authors (most notably, Joe Kasprzyk) read the title of this post and thought, “wait, there already is a post about Valgrind on this blog.” And you are right, so in this post I will build on the legacy Joe left us in his post about Valgrind and get into the details of how to use its basic functionalities to get your code right.

Common mistakes when coding in C++

Suppose we have the following code:

#include <stdio.h>

int main() {
    int *var = new int[5]; // you wouldn't do this if the size was always 5, but this makes the argument clear.
    int n = 5;
    int m;

    if (m > n) {
        printf("Got into if statement.\n");
        for (int i = 0; i < 6; ++i) {
            var[i] = i;
        }
    }

    printf("var[5] equals %d\n", var[n]);
}

Saving the code above in a file called test.cpp, compiling it with g++ to create an executable called "test," and running it with "./test" will return the following output:

bernardoct@DESKTOP-J6145HK ~
$ g++ test.cpp -o test

bernardoct@DESKTOP-J6145HK ~
$ ./test
Got into if statement.
var[5] equals 5

Great, it ran and did not crash (in code this simple, g++’s -Wall flag would have issued a warning saying m was not initialized, but in more complex code such a warning may not be issued). However, it would actually have been great if this code had crashed, because that would make us look into it and figure out that it has 3 problems:

  1. We did not assign a value to variable m (it was created but not initialized), so how did the code determine that m was greater than n to get into the code inside the if statement?
  2. The pointer array var was created with length 5, meaning its elements are numbered 0 to 4. If the for-loop runs from 0 to 5 but element 5 does not exist, how did the code fill it in with the value of variable i when i was 5 in the loop? From the printf statement that returned 5, we know var[5] equals 5.
  3. The pointer array var was not destroyed after the code no longer needed it. This is not necessarily a problem in this case, but if this were a function called over and over within a model, there is a chance the RAM would be filled with these seemingly inoffensive pointer arrays and the computer would freeze (or the node, if running on a cluster, would possibly crash and have to be rebooted).

Given that C++ will not necessarily crash even in the presence of such errors, one way of making sure your code is clean is by running it through Valgrind. However, most people who have used Valgrind on a model with a few hundred or thousand lines of code have been discouraged by its possibly long and cryptic-looking output. Do not let this intimidate you: the output is actually fairly easy to read once you either learn what to look for or use Valkyrie, a graphical user interface for Valgrind.

Generating and interpreting Valgrind’s output

The first thing that needs to be done for Valgrind to give you meaningful output is to re-compile your code with the -O0 and -g flags: the former prevents the compiler from modifying your code to make it more efficient but unintelligible to Valgrind (or to debuggers), and the latter allows Valgrind (and debuggers) to pinpoint the lines of code where issues happen and originate. The code should therefore be compiled as shown below:

bernardoct@DESKTOP-J6145HK ~
$ g++ -O0 -g test.cpp -o test

Now it is time to run your code with Valgrind to perform some memory checking. Valgrind itself takes flags that dictate the type of analysis to be performed. Here we are interested in checking for memory misuse (instead of profiling, checking for thread safety, etc.), so the first flag should be --tool=memcheck (not strictly required, since memcheck is the default tool, but good for clarity). Now that we have specified that we want Valgrind to run a memory check, we should specify that we want it to look in detail for memory leaks and tell us where the errors happen and originate, which can be done by passing the flags --leak-check=full and --track-origins=yes. This way, the complete call to run Valgrind on our test program is:

bernardoct@DESKTOP-J6145HK ~
$ valgrind --tool=memcheck --leak-check=full --track-origins=yes ./test

Important: Beware that your code will take orders of magnitude longer to run with Valgrind than it would otherwise. This means you should run something as small as possible but still representative — e.g. instead of running your stochastic model with 1,000 realizations and a simulation time of 50 years, consider running 2 realizations simulating 2 years, so that Valgrind analyzes the year-long simulation and the transitions between realizations and years. Also, if running your code on a cluster, load the Valgrind module in your submission script with module load valgrind-xyz (you can find the exact name of the Valgrind module by running module avail on the terminal) and replace the call to your model in the submission script with the Valgrind call above. If running Valgrind on code that uses MPI, use mpirun valgrind ./mycode -flags.

When called, Valgrind will instrument our test executable and, based on the collected information, print the following on the screen:

==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test
==385==
==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)
==385==
Got into if statement.
==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
var[5] equals 5
==385==
==385== HEAP SUMMARY:
==385==     in use at exit: 20 bytes in 1 blocks
==385==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==385==
==385== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== LEAK SUMMARY:
==385==    definitely lost: 20 bytes in 1 blocks
==385==    indirectly lost: 0 bytes in 0 blocks
==385==      possibly lost: 0 bytes in 0 blocks
==385==    still reachable: 0 bytes in 0 blocks
==385==         suppressed: 0 bytes in 0 blocks
==385==
==385== For counts of detected and suppressed errors, rerun with: -v
==385== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)

Seeing that Valgrind’s output is 5 times as long as the test code itself can be somewhat disheartening, but the information it contains is really useful. The first block of the output is the header; it will always be printed, so that you know the version of Valgrind you are using, the command it ran for your code, and so on. In our example, the header is:

==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test

After that, Valgrind reports the errors it found during the execution of your code. Errors are always reported as a description of the error in good old English, followed by where in your code it happens. Let’s look at the first error found by Valgrind:

==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)

This tells us that there is an if statement (conditional statement) on line 9 of test.cpp in which at least one side of the logical test involves an uninitialized variable. As pointed out by Valgrind, line 9 of test.cpp holds our problematic if statement, which compares the initialized variable n to the uninitialized variable m, which holds whatever value the computer last left at that memory address.

The second error block is the following:

==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)

This means that your code is writing to a memory location it did not allocate for its use. This block says that the illegal write, so to speak, happened on line 12 of test.cpp through a variable created on line 5 of test.cpp using the new[] operator. These lines correspond to var[i] = i; and to int *var = new int[5];. With this, we learn that either var was created too short on line 5 of test.cpp or that the for loop that assigns values to var goes one or more steps too far.

Similarly, the next block tells us that our printf statement, used to print the value of var[5] on the screen, read past the amount of memory allocated to var in its declaration on line 5 of test.cpp, as shown below:

==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)

The last thing Valgrind will report is the information about memory leaks, which are accounted for when the program is done running. The output about memory leaks for our example is:

==385== HEAP SUMMARY:
==385==     in use at exit: 20 bytes in 1 blocks
==385==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==385==
==385== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== LEAK SUMMARY:
==385==    definitely lost: 20 bytes in 1 blocks
==385==    indirectly lost: 0 bytes in 0 blocks
==385==      possibly lost: 0 bytes in 0 blocks
==385==    still reachable: 0 bytes in 0 blocks
==385==         suppressed: 0 bytes in 0 blocks

The important points to take away from this last block are that:

  1. there were 20 bytes of memory leaks, meaning that if this were a function in your code, every time it was called it would leave 20 bytes of garbage sitting in the RAM. This may not sound like a big deal, but imagine if your code left 1 MB of garbage in the RAM for each of the 100,000 times a function is called: there go 100 GB of RAM, along with everything else you were doing on your computer at the time, because the computer will likely freeze and have to go through a hard reset.
  2. the memory you allocated and did not free was allocated on line 5 of test.cpp, when you used the new[] operator to allocate the integer pointer array.

It is important to notice here that if we increased the amount of memory allocated by the new[] operator on line 5 from 5 to 6 integers, the last two errors (invalid read and invalid write) would disappear; initializing m and adding delete[] var; at the end would take care of the other two. This means that if you run your code with Valgrind and see hundreds of errors, chances are that modifying a few lines of code will get rid of most of them.

Valkyrie — a graphical user interface for Valgrind

Another way of going through Valgrind’s output is by using Valkyrie (now installed on the login node of Reed’s cluster, The Cube). If you are analyzing your code from your own computer with a Linux terminal (it does not work with Cygwin, but you can install a native Ubuntu terminal on Windows 10 by following instructions posted here) and do not have Valkyrie installed yet, you can install it by running the following on your terminal:

bernardoct@DESKTOP-J6145HK ~
$ sudo apt-get install valkyrie

Valkyrie works by reading an xml file exported by Valgrind containing the information about the errors it found. To export this file, you need to pass the flags --xml=yes and --xml-file=valgrind_output.xml (or whatever name you want to give the file) to Valgrind, which makes the call to Valgrind become:

bernardoct@DESKTOP-J6145HK ~
$ valgrind --tool=memcheck --leak-check=full --track-origins=yes --xml=yes --xml-file=valgrind_output.xml ./test

Now you should have a file called “valgrind_output.xml” in the directory you are calling Valgrind from. To open it with Valkyrie, first launch Valkyrie by typing valkyrie on your terminal (if on Windows 10, you need to have Xming installed and running, which can be done by following the instructions at the end of this post). If on a cluster, besides having Xming running you also need to have ssh’ed into the cluster with the -X flag (e.g. by running ssh -X username@my.cluster.here) from either Cygwin or a native Linux terminal. After opening Valkyrie, click on the green folder button and select the xml file, as in the screenshot below.

valkyrie_screenshot.png

After opening the xml file generated by Valgrind, Valkyrie should look like in the screenshot below:

valkyrie_screenshot2.png

Now you can begin from a collapsed list of errors and unfold each error to see its details. Keep in mind that Valkyrie is not your only GUI option for Valgrind, as IDEs like JetBrains’ CLion and Qt Creator come with Valgrind integration. Now go check your code!

PS: Thanks to folks on Reddit for the comments which helped improve this post.

What is resilience in water resources systems?

Ah, resilience. Everybody talks about it, everybody has an idea (and sometimes a precise one) about what it means, but nobody seems to agree on what exactly it means. This post provides an overview of that topic for water resources systems: why people talk about it, how we can understand it, why it matters for adaptation, and why it is more of an exciting research topic than a buzzword. I prepared a video answering all four of those questions in just over 4 minutes. Here it is.

An additional question would be: what are the connections with other topics abundantly covered in this blog, such as robustness? Clues are in the video (a.k.a. you should definitely watch it), but the short answer is that resilience can be thought of as robustness, or as recovery when we are not robust. Academics and practitioners alike tend to focus on one of these aspects depending on the characteristics of the problem(s) they work on.

That short answer can look a bit too easy, but it works whatever it is we are assessing robustness or resilience to: events, long-term changes, surprises (the state of the world is not what we thought), etc.

In practice, that means that when we are concerned with resilience, we are concerned with three problems at once: 1) a robustness problem within a safe operating space that we define according to our preferences, perceptions, the state of the system, etc., 2) a recovery problem outside of that safe space, and 3) the problem of finding where the limit between the two lies. Everything that has been done (in the Reed group and beyond) on robustness still applies when we operate our system within our safe space! Part of my work now is eliciting what is similar and what is different in the recovery problem compared with the robustness problem. Another part is understanding what shapes the limit of a safe operating space.

More to follow on that soon… in the meantime, anyone interested in how resilience amounts to solving two separate problems, depending on the state of our system, can look up that work from my PhD thesis (yes, it even contains an application of the shallow lake problem!).

Geeky appendix

How did I make this video? I used the free and open-source Kazam software to record myself, screen and voice. Then I edited the video a bit with OpenShot, an equally free and open-source tool (you have to leave your main screen to stop the recording with Kazam, so I had to edit that out). Both were easy to use and I would recommend them if you want to make this kind of video. One last thing: it is definitely interesting to record oneself talking! It is not easy to keep the hesitations and quirks out of your speech for 4 minutes…

A completely non-exhaustive list of tutorial resources for scientific computing

This is a short blog post to put together a list of resources with tutorials (similar to what’s usually found on this blog) for various programming languages. It is by no means exhaustive, so please comment if you feel there’s an important one I left out.

Matlab:

https://www.arnevogel.com/

Kinda new blog with Matlab tutorials on numerical methods

https://blogs.mathworks.com/loren/

One of the MathWorks blogs, great tutorials.

https://www.mathworks.com/support/learn-with-matlab-tutorials.html

Matlab tutorials from MathWorks

matlab

More Matlab tutorials, a lot of material on many topics

Python:

https://glowingpython.blogspot.com/

Not updated very frequently, but good data analysis and visualization tutorials are posted

https://pythonprogramming.net/

Updated regularly, some great data visualization and analysis tutorials. The tutorials come with videos. I really like this site.

http://treyhunner.com/

Updated regularly (about once a month) with python tutorials and general python coding insights. I really like the writing style of this author. 

https://docs.scipy.org/doc/scipy/reference/tutorial/

Tutorials on using SciPy

C and C++:

https://www.cprogramming.com/

C and C++ programming tutorials, tips and tricks

http://www.cplusplus.com/articles/

Not really updated anymore but some good basic tutorials are listed

https://blog.knatten.org/

Hadn’t been updated in a while, but it looks like it’s been picked up again. Good for general C++ programming and good practice.

http://www.bfilipek.com/

C++ tutorials

General:

https://towardsdatascience.com/

This is a great general resource not devoted to a particular language. They cover topics in data science, machine learning and programming in general. Some great tutorials for Python and R. 

https://projecteuler.net/

Mathematical programming problems to help with learning any language

https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md

Free programming books repository

Reddits (some of the bigger ones):

/r/matlab (General on matlab, community provides help with coding)

/r/programming (General on programming)

/r/learnprogramming (Community provides help with debugging questions)

/r/python (General on python, community provides help with coding)

/r/learnpython (Community provides help with python questions, smaller than /r/python)

/r/cpp (General on C++)

/r/cpp_questions (Community provides help with C++ questions)

I’ve also recently made /r/sci_comp, which has very little activity for the moment, but the aim is to create a community with general resources on coding for scientific applications.