Radial convergence diagram (aka chord diagram) for sensitivity analysis results and other inter-relationships between data

TL;DR: a Python script for radial convergence plots can be found here.

You might have encountered this type of graph before; they’re usually used to present relationships between different entities/parameters/factors, and they typically look like this:

From https://datavizcatalogue.com/methods/non_ribbon_chord_diagram.html

In the context of our work, I have seen them used to present sensitivity analysis results, where we are interested both in the individual significance of a model parameter and in the extent of its interaction with others. For example, in Butler et al. (2014) they were used to present first-, second-, and total-order parameter sensitivities as produced by a Sobol’ sensitivity analysis.

From Butler et al. (2014)

I set out to write a Python script to replicate them. Calvin Whealton has written a similar script in R, and the same functionality also exists within Rhodium. I just wanted something with a bit more flexibility, so I wrote this script, which produces two types of these graphs: one with straight lines and one with curved lines (which are prettier, IMO). The script takes dictionary items as input, either directly from SALib and Rhodium (if you are using it to display Sobol’ results) or by importing them (to display anything else). You’ll need one package to get this to run: NetworkX. It facilitates the arrangement of the nodes in a circle, and it’s generally a very stable and useful package to have.
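For context, here is a hedged sketch of how such a dictionary can be produced with SALib; the model and parameter ranges below are made up, and sobol.analyze returns a dictionary with 'S1', 'ST', and 'S2' entries that the plotting script reads.

from SALib.sample import saltelli
from SALib.analyze import sobol
import numpy as np

# Hypothetical three-parameter problem
problem = {
    'num_vars': 3,
    'names': ['a', 'b', 'c'],
    'bounds': [[0.0, 1.0]] * 3,
}

param_values = saltelli.sample(problem, 1024)  # Saltelli sampling scheme
Y = param_values.sum(axis=1)                   # stand-in for a real model
Si = sobol.analyze(problem, Y)                 # dict with 'S1', 'ST', 'S2'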

Graph with straight lines
Graph with curved lines

I made these graphs to display the results of a Sobol’ analysis I performed on the model parameters of a system I am studying (a, b, c, d, h, K, m, sigmax, and sigmay). The node size indicates the first-order index (S1) of each parameter, the node border thickness indicates the total-order index (ST), and the thickness of the line between two nodes indicates the second-order index (S2). The colors, thicknesses, and sizes can be easily changed to fit your needs. The script for these can be found here, and I will briefly discuss what it does below.

After loading the necessary packages (networkx, numpy, itertools, and matplotlib) and the data, there are some parameters that can be adjusted for the figure generation. First, we can define a significance value for the indices (here set to 0.01); to keep all values, just set it to 0. Then we have some stylistic variables that define the thicknesses and sizes of the lines and nodes. They can be changed to get the look of the graph to your liking.

# Set min index value, for the effects to be considered significant
index_significance_value = 0.01
node_size_min = 15 # Max and min node size
node_size_max = 30
border_size_min = 1 # Max and min node border thickness
border_size_max = 8
edge_width_min = 1 # Max and min edge thickness
edge_width_max = 10
edge_distance_min = 0.1 # Max and min distance of the edge from the center of the circle
edge_distance_max = 0.6 # Only applicable to the curved edges

The rest of the code should just do the work for you. It basically does the following:

  • Define basic variables and functions that help draw circles and curves, and get angles and distances between points.
  • Set up a graph with all parameters as nodes, and draw all second-order (S2) indices as lines (edges in the network) connecting the nodes. For every S2 index, we need a Source parameter, a Target parameter, and the Weight of the line, given by the S2 index itself (a minimal sketch of this step is given after this list). If you’re using this script for other data, different information can be mapped to the line thickness, or all lines could be the same.
  • Draw the nodes and lines in a circular shape and adjust the node sizes, borders, and line thicknesses to show their relative importance/weight. Also, annotate each node with a text label and adjust its location accordingly. This produces the graph with the straight lines.
  • For the graph with the curved lines, define a function that generates their x and y coordinates, and then plot them using matplotlib.
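To make this concrete, below is a minimal, self-contained sketch of the graph setup and of one way to trace a curved edge with a quadratic Bézier curve pulled toward the circle’s center. The parameter names and index values are made up, and curved_edge is an illustration of the idea rather than the exact function in the script.

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

index_significance_value = 0.01

# Hypothetical parameter names and Sobol' indices (stand-ins for real output)
names = ['a', 'b', 'c', 'd']
S1 = [0.30, 0.10, 0.20, 0.05]
S2 = {('a', 'b'): 0.04, ('a', 'c'): 0.02, ('b', 'd'): 0.03}

# One node per parameter; one edge (Source, Target, Weight) per significant S2
G = nx.Graph()
G.add_nodes_from(names)
for (source, target), weight in S2.items():
    if weight > index_significance_value:
        G.add_edge(source, target, weight=weight)

pos = nx.circular_layout(G)  # arrange the nodes on a circle

def curved_edge(p0, p1, pull=0.6, n=50):
    # Quadratic Bezier curve from p0 to p1, bent toward the circle center;
    # pull=0 gives a straight chord, larger values bend further inward
    p0, p1 = np.asarray(p0), np.asarray(p1)
    control = (p0 + p1) / 2 * (1 - pull)  # chord midpoint shifted toward (0, 0)
    t = np.linspace(0, 1, n)[:, None]
    curve = (1 - t)**2 * p0 + 2 * (1 - t) * t * control + t**2 * p1
    return curve[:, 0], curve[:, 1]

# Draw curved edges with width proportional to S2, then nodes sized by S1
for source, target in G.edges():
    x, y = curved_edge(pos[source], pos[target])
    plt.plot(x, y, color='gray', linewidth=100 * G[source][target]['weight'])
nx.draw_networkx_nodes(G, pos, node_size=[3000 * s for s in S1])
nx.draw_networkx_labels(G, pos)
plt.axis('equal')
plt.axis('off')
plt.show()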

Lastly, I would like to acknowledge this script by Enrico Ubaldi, on which I based mine.

Google Earth Pro: the cool, the trade-offs and the limits

This post is an account of my recent experience with Google Earth Pro (GEP), a free tool meant to make the visualization of geographical information intuitive and accessible. This account contains tidbits that I thought others might find useful, but it is not meant to be an unbiased or comprehensive resource.

My goal in using GEP was to set up a short video tour of the reservoirs on the main stem of the Upper Snake River, which has its upper reaches around the Teton Range (WY, USA) before flowing through much of southern Idaho. Here is the video, produced following the instructions in this post from the blog:


The cool

GEP is indeed an intuitive way for people who have only limited experience with geographic information systems (GIS) to put together nice-looking videos quickly. It relies on the increasingly large amounts of geographic data available online. For instance, polygons made of points with geographical coordinates, used to delineate political boundaries or catchments among other things, are increasingly easy to find, be it in formats specific to GEP (KML, KMZ) or in traditional GIS formats such as SHP (the shapefile format, which can be imported into GEP with… “File => Import”; I told you it was intuitive). This ability to find the right data may be further helped by the recent launch of a dataset search engine by Google.

This video, after several tests, superimposes three layers on the satellite images of the land surface:

  1. Pins marking the dams’ locations. Please refer to this post to learn all you need to know about setting and customizing such placemarks.
  2. The network of all streams with an average flow of more than 10 m³/s in the Columbia River basin. This is mainly for easy visualization of where the Snake River and its tributaries are. That data was obtained from this useful USGS resource, where all the major basins in the US have data in GEP-ready KML format (basin boundaries, tributaries with their average flows, land use, etc.).
  3. An invisible layer containing the shapes of the reservoirs’ lakes, used to zoom in on the reservoir rather than on the dam itself. I got the shapes of waterbodies in the Upper Snake River basin from an amazing resource for water resources practitioners (see “The limits” below to learn how to access this resource, but also to understand why it must be handled with care). Hint: since the list of waterbodies is dauntingly large and includes every little puddle in the area, I had to zoom in on the desired reservoir so as to select only its shape, by doing “File => Import” and then, when prompted, choosing to import only the features in my line of sight.

The trade-offs

The trade-offs when making this kind of short intro video of your study area are between video quality and required disk space. This is especially true if the video covers a rather large area, as the GEP satellite images embed a level of detail that is amazing at the scale of this basin. When creating my video, I therefore needed several tries before getting acceptable quality, and the result eats up 500 MB of disk space for 43 seconds (!!!!). Any video with lower resolution just looked downright disgusting, whereas any larger video may have serious problems running on your laptop. Think of it as a Goldilocks zone that you have to find if you want a video you can embed in an oral presentation.

(NB: to embed this video in a presentation in a foolproof kind of way, the easiest approach is to have the PPT file link to the video stored in the same folder.)

The limits

The limits of Google Earth show in its plain refusal to display features it deems too big. To understand this, let us look at this USGS webpage, where a sizable amount of hydrological information can be downloaded. In particular, it is possible to download data exactly for the Upper Snake area (Subregion 1704 in the picture below). These data can include the waterbodies discussed above (in “The cool”), which are contained in the NHD tick box in the picture. By ticking “Watershed Boundary Dataset (WBD)”, one can also download basin and subbasin boundaries.

The USGS download options, including the Watershed Boundary Dataset (WBD)

Why then did I not represent the Upper Snake boundaries in my video? Well, the Upper Snake polygon, in SHP format at least, is too big, so GEP just… refused to display it. I tried to represent the HU-6 and HU-8 subbasins (smaller than the HU-4-sized Upper Snake under this classification system), and:

  • The HU-6 level just divides the basin into “headwaters” (the smaller part upstream of Palisades) and “the rest”; the former feature was smaller and GEP plotted it, but it did not plot the latter.
  • The HU-8 subbasins are all individually small enough that GEP agrees to plot them. So I could have plotted the watershed boundaries by plotting ALL of the HU-8 subbasins, but spoiler alert: this looked horrendous.

Takeaway: GEP only displays features that are individually of limited size. So maybe using a more powerful and recent desktop computer than the one I had would have done the trick… but be aware that there will always be a limit. Also, assuming that GEP is just taking its time loading the data, and that staring blankly at the screen in the meantime is not going to be too painful, is not a good idea. Take a nap instead, or do something else, then admit that the damn thing just did not display.

A solution to this, of course, is to load the larger features into GIS software and make them lighter by eliminating points from the polygon without visibly altering its shape (see the sketch below)… but that kind of defeats the purpose of GEP, which is to avoid having to become a GIS geek, doesn’t it?
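For what it’s worth, here is a minimal sketch of that lightening step in Python, assuming geopandas is installed (with a GDAL build that includes a KML driver); the file names and tolerance are hypothetical:

import geopandas as gpd

# Hypothetical shapefile of the HU-4 Upper Snake boundary
basins = gpd.read_file('WBDHU4_upper_snake.shp')

# Douglas-Peucker simplification: drop vertices while keeping the outline
# within ~100 m of the original (tolerance is in the layer's CRS units)
basins['geometry'] = basins.geometry.simplify(tolerance=100)

# Write a much lighter file that GEP should accept
basins.to_file('upper_snake_light.kml', driver='KML')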

Introducing OpenMORDM

In this blog post, I will introduce a new open-source library called OpenMORDM. I will begin by defining MORDM (Multi-Objective Robust Decision Making), provide instructions for installing the software, and conclude with a case study.

What is MORDM?

Robust Decision Making (RDM) is an analytic framework developed by Robert Lempert and his collaborators at the RAND Corporation that helps identify potential robust strategies for a particular problem, characterize the vulnerabilities of such strategies, and evaluate trade-offs among them [Lempert et al., 2006]. Multi-Objective Robust Decision Making (MORDM) is an extension of RDM that explicitly includes the use of multi-objective optimization to discover robust strategies and explore the trade-offs among multiple competing performance objectives [Kasprzyk et al., 2013]. We encourage users to read these two articles to gain insight into these methods.

Lempert, R. J., D. G. Groves, S. W. Popper, and S. C. Bankes (2006). A General, Analytic Method for Generating Robust Strategies and Narrative Scenarios. Management Science, 52(4):514-528.

Kasprzyk, J. R., S. Nataraj, P. M. Reed, and R. J. Lempert (2013). Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software, 42:55-71.


Announcing version 1.0 of pareto.py

Jon and I are pleased to announce version 1.0 of pareto.py, the free, open-source ε-nondominated sorting utility. Find the ε-nondominated solutions in your data today! ε-nondominated sorting lets you specify the resolution of your objectives to obtain a set of meaningfully distinct efficient points in your data set. (Confused? See this post for more context.)

ε-nondominated sorting: Red ε-boxes are dominated; yellow solutions are ε-nondominated.
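For intuition, here is a tiny sketch of the ε-box idea that underlies ε-nondominated sorting: each solution is mapped to a grid cell whose size is set by the per-objective resolutions, and dominance is then decided between cells. This illustrates the concept only; it is not pareto.py’s actual code.

import math

def eps_box(row, epsilons):
    # Grid-cell index of a (minimization) objective vector: two solutions
    # falling in the same box are considered meaningfully indistinct
    return tuple(math.floor(v / e) for v, e in zip(row, epsilons))

print(eps_box([0.23, 1.7], [0.1, 0.5]))  # -> (2, 3)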

  • available on GitHub
  • licensed under LGPL
  • pure Python implementation with minimal dependencies (only standard library, does not require numpy)
  • ε-nondominated sorting
  • “importable”: use pareto.py from other Python scripts (see the sketch after this list)
  • “pipelineable”: include pareto.py in a Unix pipeline
  • tag-along variables: columns (even non-numeric ones) other than those to be sorted may be retained in the output
  • verbatim transcription of input: nondominated rows in the output appear exactly the same as they did in the input
  • flexible column specification: e.g. 0-5 6 10-7
  • mix minimization and maximization objectives
  • skip header rows, commented rows, and blank lines
  • annotate each row in the output with the file it came from, including stdin
  • add line numbers to annotations
  • index columns from the end of the row, allowing rows of varying lengths to be sorted together
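
As an example of the “importable” use, here is a hedged sketch that assumes the eps_sort interface described in the project README; the data, objective columns, and ε values below are made up.

import pareto

# Each table is a list of rows; the third column is a tag-along label
table = [[0.20, 3.5, 'run-1'],
         [0.10, 4.0, 'run-2'],
         [0.30, 3.0, 'run-3']]

# Keep the ε-nondominated rows, minimizing columns 0 and 1
# with resolutions 0.05 and 0.5 respectively
nondominated = pareto.eps_sort([table], objectives=[0, 1],
                               epsilons=[0.05, 0.5])
print(nondominated)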

(source code for the impatient)

(previous announcement: version 0.2)

Announcing pareto.py: a free, open-source nondominated sorting utility

Jon Herman and I are pleased to announce version 0.2 of pareto.py, a free, open-source nondominated sorting utility. Released under the LGPL, pareto.py performs a nondominated sort on the data in any number of input files. To the best of our knowledge, this is the only standalone nondominated sorting utility available.

Implemented using only the Python standard library, pareto.py features:

  • epsilon-nondominated sorting
  • “importable” design: can be used from other Python scripts
  • tag-along variables: columns (even non-numeric ones) other than those to be sorted may be retained in the output
  • verbatim transcription of input: nondominated rows in the output appear exactly the same as they did in the input
  • optionally skip header rows, commented rows, and blank lines

We welcome feature requests, bug reports, and pull requests. If you’re in a hurry, here is the source code for version 0.2.

(Edit: bump to version 0.2)