MORDM Basics IV: Visualizing ROF-Storage Dynamics (finally)

The previous post described a simple, two-objective test case in which the city of Cary employed risk-of-failure (ROF) triggers as levers to adjust for its preferred tradeoff between its objectives. The example showed how ROF triggers allowed Cary to account for future uncertainty in its system inputs, enabling it to visualize how its risk appetite would affect its desired outcomes.

In meeting these objectives, different risk thresholds would have affected Cary’s response to extreme events such as floods and droughts, and its ability to fulfill demand. Simply analyzing the tradeoffs between objectives that result from a range of ROF trigger values only presents one side of the story. It is vital to visualize how these performance objectives and tradeoffs manifest in the system’s capacity (pun intended) to store enough water in times of scarcity, and by extension, its ability to fulfill its customers’ demand for water.

Using ROFs allows us to more concretely measure how the dynamics of both storage and demand fulfillment evolve and change over time for a given risk tolerance. In the long term, these dynamics will influence when and where new water infrastructure is built to cope with storage requirements and demand growth, but this is a topic for a future blog post. This week, we will focus on unpacking the dynamic evolution of storage and demand in response to different ROF trigger values.

As a quick refresher, our system is the water supply utility of Cary, a city within the Research Triangle region of North Carolina (Trindade et al., 2017). Cary imposes water-use restrictions whenever the weekly ROF exceeds the threshold of risk it is willing to tolerate (α); while a restriction is in place, only 50% of demand is met. More frequent water-use restrictions help to maintain reservoir levels and ensure reliability, which was defined in the previous blog post. However, the decision to implement restrictions (or not) will impact the storage levels of the system. With this in mind, we will first examine how storage responds to the triggering of a water-use restriction. For context, we consider a scenario in which Cary’s inflow timeseries is only 20% of its original levels. Figure 1 below shows the inflow, demand, and storage timeseries for this scenario.
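Before turning to the figures, the weekly trigger logic can be made concrete with a small sketch (in R, purely for illustration). This is not the actual simulation code; the function and argument names (apply_restriction, weekly_rof, alpha) are placeholders.

apply_restriction <- function(weekly_rof, alpha, demand) {
  if (weekly_rof > alpha) {
    # the ROF exceeds the tolerated threshold: a restriction is triggered
    # and only 50% of demand is met this week
    return(0.5 * demand)
  }
  demand  # no restriction: full demand is met
}

# example: with alpha = 5%, a week with an ROF of 8% triggers a restriction
apply_restriction(weekly_rof = 0.08, alpha = 0.05, demand = 1.2)  # returns 0.6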

Figure 1: The hydrologic timeseries for Cary given that no water restrictions are implemented in a scenario where inflows are 20% of the original levels.

Cary’s challenge becomes apparent in Figure 1. While inflows decrease over time (note the fewer peaks), demand grows steadily and has effectively tripled by the end of the period. This results in periods during which storage levels drop to zero, which first occurs shortly after 2040. Also note that the frequency of these low-storage periods increases in the second half of the record. The following questions can thus be posed:

  1. How does the system’s ROF change with increasing demand and decreasing supply?
  2. How does risk tolerance affect the implementation of water-use restrictions during drought?
  3. How will the system reservoir levels respond to different levels of risk tolerance?

Figure 2: The length of each pink bar denotes the week during which the first water-use restriction was implemented for a given α-value. This is an indicator of the responsiveness of the system to a drought, or a decrease in storage levels. The blue line indicates the percent of storage filled with water.

To answer the first question, it is useful to identify how different values of α affect the first instance of a water-use restriction. Figure 2, generated using 'rof_dynamics.py', demonstrates that lower risk tolerances result in earlier implementation of restrictions. This is reasonable: an actor who is more risk-averse will quickly implement water-use restrictions to maintain reliable storage levels during a drought, whereas an actor who is more willing to tolerate the chance of low reservoir levels will delay implementing them. The blue line juxtaposed on top of the bars indicates the inflows to the reservoir. After the first period of low flows between weeks 30 and 40, the plot shows that the inflows do not recover and are likely insufficient to refill the reservoir to its initial levels. With a lower α, an actor is likely to implement restrictions almost immediately after observing merely a few weeks of low inflows. In contrast, an actor who opts for a higher α will resort to restrictions only after seeing an extended period of low flows, during which they can be more certain that restrictions are absolutely necessary.

Answering the second and third questions first requires that periods of drought be more definitively quantified. To do this, the standardized streamflow indicator (SSI6) was used. The SSI6 identifies time periods during which the standardized inflow falls below its 6-month rolling mean (Herman et al., 2016). A drought period is detected when SSI6 < 0 for three consecutive months and SSI6 < -1 at least once during that three-month period. Juxtaposing the storage-restriction dynamics with these periods of drought will allow us to see where restrictions were imposed and their impacts on reservoir levels for a given demand timeseries.
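To make the detection criterion concrete, a rough sketch of the check is shown below (in R, purely for illustration; it assumes a monthly SSI6 series has already been computed and is not the code used to generate the figures).

detect_drought <- function(ssi6) {
  in_drought <- rep(FALSE, length(ssi6))
  for (t in 1:(length(ssi6) - 2)) {
    window <- ssi6[t:(t + 2)]
    # drought: SSI6 < 0 for three consecutive months and SSI6 < -1 at least once
    if (all(window < 0) && any(window < -1)) {
      in_drought[t:(t + 2)] <- TRUE
    }
  }
  in_drought
}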

Figures 3 and 4 visualize how the system’s storage levels respond to drought (the red bars in the lower subplot) by implementing water-use restrictions (the dark red lines in the upper subplot), given α = 1% and α = 15% respectively. Predictably, restrictions coincide with periods of drought as defined by the SSI6. However, with a lower risk tolerance, periods of restriction are longer and more frequent. As Figure 3 shows, an actor with a lower risk tolerance may implement restrictions even where only a slight risk of failure exists.

Figure 3: Storage dynamics given α=1%. (Upper subplot) The blue lines indicate the reservoir storage levels in billion gallons per week. The yellow lines are the weekly ROF values, or the likelihood that the percent of water stored will drop below 20% of the reservoir levels. The grey lines indicate where water-use restrictions are implemented, and the red dashed line denotes α=1%. (Lower subplot) The zones where droughts were detected using the SSI6 method (Herman et al., 2016) are highlighted in red.

Compared to α = 1%, an actor who is willing to tolerate higher ROF values (Figure 4, for example) will implement restrictions less frequently and for shorter periods of time. Although this means that demand is less likely to be disrupted, it also puts water supplies at a higher risk of dropping to critical levels (<20%), as restrictions may not be implemented even during times of drought.

Figure 4: Storage dynamics given α=15%. (Upper subplot) The blue lines indicate the reservoir storage levels in billion gallons per week. The yellow lines are the weekly ROF values, or the likelihood that the percent of water stored will drop below 20% of the reservoir levels. The grey lines indicate where water-use restrictions are implemented, and the red dashed line denotes α=15%. (Lower subplot) The zones where droughts were detected using the SSI6 method (Herman et al., 2016) are highlighted in red.

There is one important thing to note when comparing Figures 3 and 4. When the periods of water-use restriction coincide for both α-values (between 2040 and 2050), the actor with the lower tolerance implements restrictions at the beginning of both drought periods. This decision makes the biggest difference in terms of reservoir storage levels. By implementing water-use restrictions early and for a longer period of time, Cary’s reservoir levels are consistently kept above 50% of full capacity (given a full capacity of 7.45 BG). An actor with a higher risk tolerance, by contrast, allowed water levels to drop below 30% of full capacity during periods of drought.

Although this seems undesirable, recall that the system is said to have failed only if storage drops below 20% of full capacity. Herein lies the power of using an ROF metric: questions 2 and 3 can be answered by generating storage-restriction response figures like those shown above, which allow an actor to examine the consequences of varying levels of risk tolerance on their ability to fulfill demand while maintaining sufficient water levels. This can improve judgement on how much risk a utility can actually tolerate without adversely impacting the socioeconomic systems that depend on the water supply utility. In addition, using ROFs enables a utility to better estimate when new infrastructure really needs to be built, instead of making premature investments as a result of unwarranted risk aversion.

To briefly summarize this blog post, we have shown how different risk tolerance levels affect the decisions made by an actor, and how these decisions in turn impact the system. Not shown here is the ability of an ROF to evolve over time given climate change and the construction of new water supply infrastructure. In the next blog post, we will briefly discuss the role of ROFs in mapping out adaptation pathways for a utility, how ROFs form the basis of dynamic and adaptive pathways and their associated operating policies, and connect this to the concept of the soft path (Gleick, 2002) in water supply management.

References

Gleick, P., 2002. Water management: Soft water paths. Nature, 418(6896), p.373.

Herman, J., Zeff, H., Lamontagne, J., Reed, P. and Characklis, G., 2016. Synthetic Drought Scenario Generation to Support Bottom-Up Water Supply Vulnerability Assessments. Journal of Water Resources Planning and Management, 142(11), p.04016050.

Trindade, B., Reed, P., Herman, J., Zeff, H. and Characklis, G., 2017. Reducing regional drought vulnerabilities and multi-city robustness conflicts using many-objective optimization under deep uncertainty. Advances in Water Resources, 104, pp.195-209.

NetCDF Operators

This post is an introduction to the Linux-based Climate Data Operators (CDO) and netCDF Operators (NCO), which allow you to perform various operations on netCDF files through the command line. I found these commands to be really nifty when I was working with pre-industrial control runs from a GCM. The output was being written at a daily timestep, across 1200 years, and for the whole world, so it was absolutely essential that I cut the size of the files down as much as I could before transferring them to my own computer.

The official documentation and installation instructions can be found here for NCO and here for CDO, but if you’re working on a supercomputer, the libraries will likely already be installed. I will outline how I used some of these functions for my pre-industrial runs.

Concatenation

Some of the NCO commands have size limits of 40 GB, so it’s important to use the right order of operations when processing your files, which will differ depending on your ultimate goal. My ultimate goal was to get the 500-hPa geopotential height anomalies across the whole 1200-year period for the Western US specifically. Assuming you have a directory with all the netCDF files, the first step is to concatenate the data, since my run was broken into many smaller files. The easiest way to do this is with the following command, which will take all the netCDF files in the directory (using the *) and merge them into a file called merged_file.nc:

cdo mergetime *.nc merged_file.nc

Return Individual Monthly Files

When calculating anomalies, you will need to determine a mean geopotential height value for each of the 12 months, and then calculate daily deviations with respect to these monthly means. The first step is to split the merged file by month with the following command:

cdo splitmon merged_file.nc zg500.mon

This command will return 12 files of the form zg500.mon$i.nc (depending on your CDO version, the month number may be zero-padded, e.g. zg500.mon01.nc).

Return Monthly Means and Daily Anomalies

The next step is to calculate a monthly mean for each of these files. For example, for January use:

cdo timmean zg500.mon1.nc zg500.mean.mon1.nc

Return Anomalies

Now we subtract the means from each monthly file to return the daily anomalies for each month, which will be of the form: zg500.mean.mon${i}.anom.nc. If you want to combine the last two steps into one loop, you can use the code below:

for i in $(seq 1 12)
do
  cdo timmean zg500.mon${i}.nc zg500.mean.mon${i}.nc
  cdo sub zg500.mon${i}.nc zg500.mean.mon${i}.nc zg500.mean.mon${i}.anom.nc
done 

Cut Down to Geographical Area of Interest

Finally, we need to cut the data down to just the Western US. We use ncks (netCDF Kitchen Sink) from NCO, which is probably the most versatile of all the functions (hence the name). This command is one that has the 40 GB limit, which is why I had to wait until I had monthly files before I could cut them down geographically. We must first specify the variable of interest using the -v flag. In my case, I only had one variable to extract, but you can also extract multiple variables in one command. Then denote the ranges of latitude and longitude using the -d flags. It is very important to include the periods at the end of each lat/lon bound (even if your bounds are integers); the decimal point tells ncks to interpret the bounds as coordinate values rather than as indices.

for i in $(seq 1 12)
do
  ncks -v zg500 -d lon,180.,260. -d lat,30.,60. zg500.mean.mon${i}.anom.nc -o zg500.mean.mon${i}.cut.anom.region.nc
done 

Ultimately, you will get 12 files of the form zg500.mean.mon${i}.cut.anom.region.nc. And that’s it! You can concatenate the monthly files back together and try to resort the data back into the correct sequence according to time. I was unsuccessful at finding a quick way to do this with the command-line tools, but it is possible. I found it much easier to handle this step in R: I imported each of the 12 anomaly files, assigned a time vector, concatenated each monthly anomaly matrix into a larger matrix, and then sorted according to date (a rough sketch of this is included below). If your files are small enough by the end of the process, this is likely the easiest way to take care of the resorting.
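Here is a minimal sketch of that resorting step, assuming the ncdf4 package and the file names produced above; the variable and dimension names may need adjusting for your own output.

library(ncdf4)

anom_list <- list()
time_list <- list()
for (i in 1:12) {
  nc <- nc_open(sprintf("zg500.mean.mon%d.cut.anom.region.nc", i))
  zg <- ncvar_get(nc, "zg500")   # assumed dimensions: lon x lat x time
  time_list[[i]] <- ncvar_get(nc, "time")
  nc_close(nc)
  # flatten each time step so every month becomes a (time x grid cell) matrix
  anom_list[[i]] <- t(apply(zg, 3, c))
}

# stack all months and reorder the rows into chronological sequence
anom <- do.call(rbind, anom_list)
time <- unlist(time_list)
anom <- anom[order(time), ]

Enjoy!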

Visualizing large directed networks with ggraph in R

How you choose to visualize complex multidimensional data can significantly shape the insights your audience derives from the plots. My colleague Antonia has written a couple of excellent blog posts on analyzing geospatial networks in Python using the NetworkX library which can be found here and here. I generally lean towards Python for coding but have recently come around on R, mostly because of how easy it is in R to make pretty network visualizations. I will go over some basic network visualizations in R using the igraph and ggraph libraries in this blog post. All the code and data I am using can be found here.

The data I will be using in this post is a processed and cleaned csv-file of Upper Colorado River Basin (UCRB) user interactions obtained from CDSS. The Colorado River is one of the most important river systems in North America, supplying water to millions of Americans. The river has been facing a record 20-year drought, a situation further complicated by the prior appropriation doctrine, under which senior users can put junior ones out of priority until their own needs are met.

Shown below is a snippet of the dataset. Column A shows the user whose water right was put out of priority in 2002 by a water right in column C. Column E shows the total number of days in 2002 that a water right in column A was put out of priority by one in column C. The rest of the columns contain user attributes.

Now on to turning this lifeless spreadsheet into some pretty pictures. We begin by importing all the necessary libraries.

library(tidyverse)
library(igraph)
library(ggraph)
library(dplyr)
library(ggplot2)

Next we will import the csv-file shown above, and create a list of nodes and edges. This information can be used by the igraph library to create a directed network object. In this directed network, the source node of each edge is priorityWdid (column C) while the destination node is analysisWdid (column A), since the former is putting a call on the latter to divert flow away.

# read network .csv file
data <- read.csv('network_csv_files/priorityWdid/v2_network_2002.csv')

# create nodes
from <- unique(data[c('priorityWdid','priorityStructure', 'priorityNetAbs', 
                      'priorityStreamMile')]) %>% rename(wdid = priorityWdid) %>%
  rename(structure = priorityStructure) %>% rename(netAbs = priorityNetAbs) %>%
  rename(streamMile = priorityStreamMile)
to <- unique(data[c('analysisWdid','analysisStructure', 'analysisNetAbs',
                    'analysisStreamMile')]) %>% rename(wdid = analysisWdid) %>% 
  rename(structure = analysisStructure) %>% rename(netAbs = analysisNetAbs) %>%
  rename(streamMile = analysisStreamMile)
nodes <- unique(rbind(from, to))

# create edges
edges <- data[c('priorityWdid', 'analysisWdid', 'sumWtdCount')] %>% rename(from = priorityWdid) %>% rename(to = analysisWdid)

# create network using igraph package
network <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)

This network has over 1400 nodes! That would be an unreadable mess of a visualization, so let us filter down this network to only include interactions between the 200 senior-most water rights in the UCRB.

# only include top 200 users by seniority
users <- read.csv('../data/CDSS_WaterRights.csv')
users <- head(users[order(users$Priority.Admin.No),], 200)

top_users <- users$WDID
for (i in 1:length(top_users)){
  top_users[i] <- toString(top_users[i])
}

# create subnetwork with only top 200 users by seniority
sub_nodes <- intersect(nodes$wdid, top_users)
subnet <- induced.subgraph(network, sub_nodes)

Excellent! Only 37 of the top 200 users in the UCRB interacted with each other in 2002. This is a much more manageable number for plotting. Let us start with the most basic visualization. In keeping with ggplot syntax, we can conveniently just keep adding plot specifications with a “+”.

# basic network graph
ggraph(subnet, layout = 'stress') +
  ggtitle('2002') + 
  geom_edge_link() +
  geom_node_point() +
  theme_graph()

Alright this is nice, but not particularly insightful. We have no idea which user each node corresponds to, and this figure would have us believe that all nodes and edges were created equal. When we constructed the network we actually added in a bunch of node and edge attributes, which we can now use to make our visuals more informative.

Shown below is a more helpful visualization. I went ahead and added some attributes to the network and labelled all the nodes with their structure names. The nodes are sized by their out-degree and colored by their stream mile. The edge widths are determined by the total number of days the source node put the destination node out of priority. To do this, I leveraged an amazing feature of ggplot2 and ggraph called aesthetic mapping, which is a quick way to map variables to visual cues on a plot. It automatically scales the data and creates a legend, which we can then further customize.
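One note before the code: point size below is mapped to a deg node attribute, which is not created by the network-construction code earlier in the post. A minimal sketch of how one might attach it, assuming the out-degree described above is the intended measure:

# attach out-degree as a node attribute so it can be mapped to point size
V(subnet)$deg <- degree(subnet, mode = "out")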

# plot graph in circular layout
ggraph(subnet, layout = "circle") +
  ggtitle('2002: "circle" Layout') +
  geom_edge_link(aes(width = sumWtdCount), alpha = 0.8, color = 'skyblue', 
                 arrow = arrow(length = unit(2, 'mm')), end_cap = circle(2, 'mm')) +
  labs(edge_width = "Number of days") + 
  geom_node_point(aes(size = deg, colour=streamMile)) +
  labs(colour = "Stream mile") +
  labs(size = "Out-degree") +
  scale_color_gradient(low = "skyblue", high = "darkblue") +
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = structure), repel = TRUE, size=2) +
  theme_graph()

The network above uses a circle layout because it’s the easiest to read and is replicable. But there are actually a number of layouts available to choose from. Here is another one of my favorites, graphopt. While this layout is harder to read, it does a better job of revealing clusters in the network. The only change I had to make to the plotting code above was to swap out the word ‘circle’ for ‘graphopt’ (plus setting a seed, since this layout is not deterministic)!

# plot graph in graphopt layout
set.seed(1998)
ggraph(subnet, layout = "graphopt") +
  ggtitle('2002: "graphopt" Layout') +
  geom_edge_link(aes(width = sumWtdCount), alpha = 0.8, color = 'skyblue', 
                 arrow = arrow(length = unit(2, 'mm')), end_cap = circle(2, 'mm')) +
  labs(edge_width = "Number of days") + 
  geom_node_point(aes(size = deg, colour=streamMile)) +
  labs(colour = "Stream mile") +
  labs(size = "Out-degree") +
  scale_color_gradient(low = "skyblue", high = "darkblue") +
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = structure), repel = TRUE, size=2) +
  theme_graph()

The graph above would be a lot easier to read if it weren’t for the long labels cluttering everything up. One way to deal with this is to scale the opacity (alpha) of the text by the degree of the node. This way only the important and central nodes will have prominent labels. Again, all I have to do is add two extra words to the geom_node_text() call in the code block above. Notice that I also set show.legend to FALSE because I don’t want a legend entry for text opacity in my plot.

ggraph(subnet, layout = "graphopt") +
  ggtitle('2002: "graphopt" Layout') +
  geom_edge_link(aes(width = sumWtdCount), alpha = 0.8, color = 'skyblue', 
                 arrow = arrow(length = unit(2, 'mm')), end_cap = circle(2, 'mm')) +
  labs(edge_width = "Number of days") + 
  geom_node_point(aes(size = deg, colour=streamMile)) +
  labs(colour = "Stream mile") +
  labs(size = "Out-degree") +
  scale_color_gradient(low = "skyblue", high = "darkblue") +
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = structure, alpha = deg), repel = TRUE, size=2, show.legend = F) +
  theme_graph()

This is just a small sampling of the possibilities for network visualization in R. I have only just begun exploring the igraph and ggraph libraries, but the syntax is fairly intuitive, and the resultant plots are highly customizable. The data-to-viz blog is a pretty incredible resource to look at other network visualizations in R, if you are interested.