Speeding up algorithm diagnosis by epsilon-sorting runtime files

Sometimes we need to calculate runtime metrics of Multiobjective Evolutionary Algorithm (MOEA) results in order to plan future runs, or to compare the performances of two different MOEAs. These calculations may prove challenging if small epsilon values were used during optimization because each Pareto front in each runtime file may contain thousands of solutions, which may make the hypervolume calculations take days to complete.. A possible solution may be to filter the Pareto fronts by using bigger epsilons in order to decrease the number of solutions in each front.

The following Python script can be used to split the Pareto fronts within each runtime file generated with BorgMOEA or MOEAFramework, apply new epsilon-dominance to each front, and recombine them into new smaller runtime files. The script and all runtime files must be in the same directory. The script output will be multiple files named after the original .runtime files but with extension .runtime_sorted instead.


import numpy as np
from glob import glob
from os import makedirs, system
from os.path import splitext, basename

def apply_epsilon_dominance(epsilons, number_of_objectives):
    files_fronts = []

    # Create a folder for each runtime file and store each Pareto front as a 
    # separate file.
    print '\nSpliting Pareto fronts.\n'
    for file in glob('*runtime'):
        print file

        s = open(file, 'r')
        fronts = s.read().split('#\r\n')
        folder = splitext(file)[0]
        makedirs(folder)
        for i in range(len(fronts) - 1):
            of = open(folder + "/" + str(i) + ".set", 'w')
            of.write(fronts[i])
            of.write('#')

    # Applies new epsilons to each Pareto front
    print '\nApplying new epsilons to each Pareto front.\n'
    for file in glob('*runtime'):
        print file
        folder = splitext(file)[0]
        for i in range(len(fronts) - 1):
            system('java -cp MOEAFramework-2.0-Executable.jar org.'
                'moeaframework.analysis.sensitivity.ResultFileMerger -d ' + 
                str(number_of_objectives) + ' ./'+ folder + '/' + str(i) + 
                '.set -e ' + epsilons + ' -o ' + folder + '/' + 
                str(i).zfill(2) + '.sorted')

    # Combines epsilon sorted Pareto fronts in new runtime files with 
    # extension .runtime_sorted
    print '\nCombining sorted fronts into new runtime_sorted files.\n'
    for file in glob('*runtime'):
        print file
        folder = splitext(file)[0]
        output_str = ''
        
        of = open(folder + '.runtime_sorted', 'w')

        for f in glob(folder + '/*.sorted'):
            output_str += open(folder + '/' + basename(f)).read() + "#\n"
            
        of.write(output_str)


# epsilons and number of objectives to be used
epsilons = '0.005,0.02,0.02,0.01,0.01,0.01'
number_of_objectives = 6

# Calls the function to apply epsilon dominance
apply_epsilon_dominance(epsilons, number_of_objectives)
Advertisements

From Writing NetCDF Files in C to Loading NetCDF Files in R

So much data from such little models…

It’s been my experience that even simple models can generate lots of data. If you’re a regular reader of this blog, I can imagine you’ve had similar experiences as well. My most recent experience with this is the work I’ve done with the Dynamic Integrated Climate-Economic model (DICE). I had inherited a port of the 2007 version of the model, which would print relevant output to the screen. During my initial runs with the model, I would simply redirect the output to ascii files for post-processing. I knew that eventually I would be adding all sorts of complexity to this model, ultimately leading to high-dimensional model output and rendering the use of ascii files as impractical. I knew that I would need a better way to handle all this data. So in updating the model to the 2013 version, I decided to incorporate support for netCDF file generation.

You can find details about the netCDF file format through Unidata (a University Cooperation for Atmospheric Research [UCAR] Community Program) and through some of our previous blog posts (here, here, and here). What’s important to note here is that netCDF is a self-describing file format designed to manage high-dimensional hierarchical data sets.

I had become accustomed to netCDF files in my previous life as a meteorologist. Output from complex numerical weather prediction models would often come in netCDF format. While I had never needed to generate my own netCDF output files, I found it incredibly easy and convenient to process them in R (my preferred post-processing and visualization software). Trying to incorporate netCDF output support in my simple model seemed daunting at first, but after a few examples I found online and a little persistence, I had netCDF support incorporated into the DICE model.

The goal of this post is to guide you through the steps to generate and process a netCDF file. Some of our earlier posts go through a similar process using the Python and Matlab interfaces to the netCDF library. While I use R for post-processing, I generally use C/C++ for the modeling; thus I’ll step through generating a netCDF file in C and processing the generated netCDF file in R on a Linux machine.

Edit:  I originally put a link to following code at the bottom of this post.  For convenience, here’s a link to the bitbucket repository that contains the code examples below.

Writing a netCDF file in C…

Confirm netCDF installation

First, be sure that netCDF is installed on your computing platform. Most scientific computing clusters will have the netCDF library already installed. If not, contact your system administrator to install the library as a module. If you would like to install it yourself, Unidata provides the source code and great documentation to step you through the process. The example I provide here isn’t all that complex, so any recent version (4.0+) should be able to handle this with no problem.

Setup and allocation

Include the header files

With the netCDF libraries installed, you can now begin to code netCDF support into your model. Again, I’ll be using C for this example. Begin by including the netCDF header file with your other include statements:

#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

Setup an error handler

The netCDF library includes a nice way of handling possible errors from the various netCDF functions. I recommend writing a simple wrapper function that can take the returned values of the netCDF functions and produce the appropriate error message if necessary:

void ncError(int val)
{
  printf("Error: %s\n", nc_strerror(val));
  exit(2);
}

Generate some example data

Normally, your model will have generated important data at this point. For the sake of the example, let’s generate some data to put into a netCDF file:

  // Loop control variables
  int i, j, k;
  
  // Define the dimension sizes for
  // the example data.
  int dim1_size = 10;
  int dim2_size = 5;
  int dim3_size = 20;
  
  // Define the number of dimensions
  int ndims = 3;
  
  // Allocate the 3D vectors of example data
  float x[dim1_size][dim2_size][dim3_size]; 
  float y[dim1_size][dim2_size][dim3_size];
  float z[dim1_size][dim2_size][dim3_size];
  
  // Generate some example data
  for(i = 0; i < dim1_size; i++) {
      for(j = 0; j < dim2_size; j++) {
          for(k = 0; k < dim3_size; k++) {
              x[i][j][k] = (i+j+k) * 0.2;
              y[i][j][k] = (i+j+k) * 1.7;
              z[i][j][k] = (i+j+k) * 2.4;
          }
      }
    }

This generates three variables, each with three different size dimensions. Think of this, for example, as variables on a 3-D map with dimensions of [latitude, longitude, height]. In my modeling application, my dimensions were [uncertain state-of-the-world, BORG archive solution, time].

Allocate variables for IDs

Everything needed in creating a netCDF file depends on integer IDs, so the next step is to allocate variables for the netCDF file id, the dimension ids, and the variable ids:

// Allocate space for netCDF dimension ids
int dim1id, dim2id, dim3id;
  
// Allocate space for the netcdf file id
int ncid;
  
// Allocate space for the data variable ids
int xid, yid, zid;

Each one of these IDs will be returned through reference by the netCDF functions. While we’re at it, let’s make a variable to hold the return status of the netCDF function calls:

// Allocate return status variable
int retval;

Define the meta-data

Now we will start to build the netCDF file. This is a two-part process. The first part is defining the meta-data for the file and the second part is assigning the data.

Create an empty netCDF file

First, create the file:

// Setup the netcdf file
if((retval = nc_create("example.nc", NC_NETCDF4, &ncid))) { ncError(retval); }

Note that we store the return status of the function call in retval and test the return status for an error. If there’s an error, we pass retval to our error handler. The first parameter to the function call is the name of the netCDF file. The second parameter is a flag that determines the type of netCDF file. Here we use the latest-and-greatest type of NETCDF4, which includes the HDF5/zlib compression features. If you don’t need these features, or you need a version compatible with older versions of netCDF libraries, then use the default or 64-bit offset (NC_64BIT_OFFSET) versions. The third parameter is the netCDF integer ID used for assigning variables to this file.

 Add the dimensions

Now that we have a clean netCDF file to work with, let’s add the dimensions we’ll be using:

 // Define the dimensions in the netcdf file
 if((retval = nc_def_dim(ncid, "dim1_size", dim1_size, &dim1id))) { ncError(retval); }
 if((retval = nc_def_dim(ncid, "dim2_size", dim2_size, &dim2id))) { ncError(retval); }
 if((retval = nc_def_dim(ncid, "dim3_size", dim3_size, &dim3id))) { ncError(retval); }
  
 // Gather the dimids into an array for defining variables in the netcdf file
 int dimids[ndims];
 dimids[0] = dim1id;
 dimids[1] = dim2id;
 dimids[2] = dim3id;

Just as before, we catch and test the function return status for any errors. The function nc_def_dim() takes four parameters. First is the netCDF file ID returned when we created the file. The second parameter is the name of the dimension. Here we’re using “dimX_size” – you would want to use something descriptive of this dimension (i.e. latitude, time, solution, etc.). The third parameter is the size of this dimension (i.e. number of latitude, number of solutions, etc.). The last is the ID for this dimension, which will be used in the next step of assigning variables. Note that we create an array of the dimension IDs to use in the next step.

 Add the variables

The last step in defining the meta-data for the netCDF file is to add the variables:

// Define the netcdf variables
if((retval = nc_def_var(ncid, "x", NC_FLOAT, ndims, dimids, &xid))) { ncError(retval); }
if((retval = nc_def_var(ncid, "y", NC_FLOAT, ndims, dimids, &yid))) { ncError(retval); }
if((retval = nc_def_var(ncid, "z", NC_FLOAT, ndims, dimids, &zid))) { ncError(retval); }

The nc_def_var() function takes 6 parameters. These include (in order) the netCDF file ID, the variable name to be displayed in the file, the type of data the variable contains, the number of dimensions of the variable, the IDs for each of the dimensions, and the variable ID (which is returned through reference). The type of data in our example is NC_FLOAT, which is a 32-bit floating point. The netCDF documentation describes the full set of data types covered. The IDs for each dimension are passed as that combined array of dimension IDs we made earlier.

 Optional: Add variable attributes

This part is optional, but is incredibly useful and true to the spirit of making a netCDF file. When sharing a netCDF file, the person receiving the file should have all the information they need about the data within the file itself. This can be done by adding “attributes”. For example, let’s add a “units” attribute to each of the variables:

 // OPTIONAL: Give these variables units
 if((retval = nc_put_att_text(ncid, xid, "units", 2, "cm"))) { ncError(retval); }
 if((retval = nc_put_att_text(ncid, yid, "units", 4, "degC"))) { ncError(retval); }
 if((retval = nc_put_att_text(ncid, zid, "units", 1, "s"))) { ncError(retval); }

The function nc_put_att_text() puts a text-based attribute onto a variable. The function takes the netCDF ID, the variable ID, the name of the attribute, the length of the string of characters for the attribute, and the text associated with the attribute. In this case, we’re adding an attribute called “units”. Variable ‘x’ has units of “cm”, which has a length of 2. Variable ‘y’ has units of “degC”, which has a length of 4 (and so on). You can apply text-based attributes as shown here or numeric-based attributes using the appropriate nc_put_att_X() function (see documentation for the full list of numeric attribute functions). You can also apply attributes to dimensions by using the appropriate dimension ID or set a global attribute using the ID “0” (zero).

 End the meta-data definition portion

At this point, we’ve successfully created a netCDF file and defined the necessary meta-data. We can now end the meta-data portion:

 // End "Metadata" mode
  if((retval = nc_enddef(ncid))) { ncError(retval); }

…and move on to the part 2 of the netCDF file creation process.

Populate the file with data

Put your data into the netCDF file

Here, all we do is put data into the variables we defined in the file:

 // Write the data to the file
 if((retval = nc_put_var(ncid, xid, &x[0][0][0]))) { ncError(retval); }
 if((retval = nc_put_var(ncid, yid, &y[0][0][0]))) { ncError(retval); }
 if((retval = nc_put_var(ncid, zid, &z[0][0][0]))) { ncError(retval); }

The function nc_put_var() takes three parameters: the netCDF file ID, the variable ID, and the memory address of the start of the multi-dimensional data array. At this point, the data will be written to the variable in the netCDF file. There is a way to write to the netCDF file in data chunks, which can help with memory management, and a way to use parallel I/O for writing data in parallel to the file, but I have no experience with that (yet). I refer those interested in these features to the netCDF documentation.

Finalize the netCDF file

That’s it! We’re done writing to the netCDF file. Time to close it completely:

 // Close the netcdf file
 if((retval = nc_close(ncid))) { ncError(retval); }

Compile and run the code

Let’s compile and run the code to generate the example netCDF file:

gcc -o netcdf_example netcdf_write_example.c -lnetcdf

Some common problems people run into here are not including the netCDF library flag at the end of the compilation call, not having the header files in the include-path, and/or not having the netCDF library in the library-path. Check your user environment to make sure the netCDF paths are included in your C_INCLUDE_PATH and LIBRARY_PATH:

env | grep –i netcdf

Once the code compiles, run it to generate the example netCDF file:

./netcdf_example

If everything goes according to plan, there should be a file called “example.nc” in the same directory as your compiled code. Let’s load this up in R for some post-processing.

 Reading a netCDF file in R…

Install and load the “ncdf4” package

To start using netCDF files in R, be sure to install the netCDF package “ncdf4”:

install.packages("ncdf4")
library(ncdf4)

Note that there’s also an “ncdf” package. The “ncdf” package reads and writes the classic (default) and 64-bit offset versions of netCDF file. I recommend against using this package as the new package “ncdf4” can handle the old file versions as well as the new netCDF4 version.  Turns out the “ncdf” package has been removed from the CRAN repository.  It’s just as well since the new “ncdf4” package obsoletes the “ncdf” package.


Open the netCDF file

With the library installed and sourced, let’s open the example netCDF file we just created:

 nc <- nc_open("example.nc")

This stores an open file handle to the netCDF file.

View summary of netCDF file

Calling or printing the open file handle will produce a quick summary of the contents of the netCDF file:

 print(nc)

This summary produces the names of the available variables, the appropriate dimensions, and any global/dimension/variable attributes.

Extract variables from the netCDF file

To extract those variables, use the command:

x <- ncvar_get(nc, "x")
y <- ncvar_get(nc, "y")
z <- ncvar_get(nc, "z")

At this point, the data you extracted from the netCDF file are loaded into your R environment as 3-dimensional arrays. You can treat these the same as you would any multi-dimensional array of data (i.e. subsetting, plotting, etc.). Note that the dimensions are reported in reverse order from which you created the variables.

dim(x)

 Close the netCDF file

When you’re done, close the netCDF file:

nc_close(nc)

And there you have it! Hopefully this step-by-step tutorial has helped you incorporate netCDF support into your project. The code I described here is available through bitbucket.

Happy computing!

~Greg

Visualization strategies for multidimensional data

This is the first part of a series of blog posts on multidimensional data visualization strategies.   The main objectives of this first part are:

  1. Show you how to expand plotting capabilities by modifying matplotlib source code.
  2. Generate a tailored 6-D Pareto front plot with completely customized legends.
  3. Provide a glimpse of a recently developed Pareto front video repository in R.

1. Expanding matplotlib capabilities

Keeping in mind that matplotlib is an opensource project developed in the contributors’ free time, there is no guarantee that features that contributors make will be added straightaway.  In my case, I needed the marker rotation capabilities in a 3 D scatter plot.  Luckily, someone already had figured out how to do so and started a pull request in the matplotlib github repository but this change has not yet been implemented.  Since I couldn’t wait for the changes to happen, here’s the straightforward solution that I found:

Here’s  the link to the  pull request that I am referring to.

First, I located where Matplotlib lives in my computer, the path in my case is:

C:/Python27/matplotlib

Then, I located the files that the contributor changed.  The files’ paths are circled in red in the following snippets of the pull request:

githubsnippet.png

github_snippet2.png

I located those files in my local matplotlib folder, which in my case are:

C:/Python27/matplotlib/axes/_axes.py

C:/Python27/matplotlib/collections.py

In the previous snippets, the lines of code that were added to the original script are highlighted in green and those that were removed are highlighted in red.  Hence, to access the clean version I clicked on the view button and selected the entire script and copied and pasted it in my local matplotlib code.  For this exercise I ended changing only a couple of scripts: the axes.py and the collections.py.

NOTE:  If you ever need to undertake this type of solution, make sure you paste the lines of code in the right places, do this part carefully.   Also, it’s always a good idea to make backups of the original files in case something goes irreversibly wrong.  Or you can always uninstall and install, no big deal.

2. Generate a tailored 6D Pareto front plot with customized legends.

Matplotlib allows visualization of 5 objectives quite easily, but scaling to 6 or more objectives can be a bit tricky.  So, lets walk through our  6 D  plots in Matplotlib. We will learn how to do one of the following plots:

Pie Day  Plot:

pie.png

St. Patrick’s Day  Plot:

stpatricks.png

2.1. Required libraries:

The following are the only libraries that you’ll need.   I import seaborn sometimes because it looks fancy but it’s totally unnecessary in this case, which is why it is commented out.

import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
#import seaborn

2.2. Importing data:

The data file that I used consists of 6 space-separated columns, if your data has another delimiter you can just add it like so:   data= np.loadtxt(‘sample_data.txt, delimiter=’,’).  I am also multiplying the first five columns by -1 because I want to remove the negatives, this is specific to my data, you may not require to do so.

data= np.loadtxt('sample_data.txt')

#Organizing the data by objectives
obj1 = data[:,0]*-1
obj2 = data[:,1]*-1
obj3 = data[:,2]*-1
obj4 = data[:,3]*-1
obj5 = data[:,4]*-1
obj6 = data[:,5]

2.3. Object-based plotting:

To allow more customization, we need to move to a more object-based way to make the plots.  That is, storing  elements of the  plots in variables.

&lt;span class=&quot;n&quot;&gt;
fig = plt.figure() # create a figure object
ax = fig.add_subplot(111, projection='3d') # create an axes object in the figure

2.4. Setting marker options:

Any mathtext symbol can be used as a marker.  In order to use rotation to represent an additional objective  it’s preferable if the marker has a single axis of symmetry so that the rotation is distinguishable.  Here are some marker options:

pie=r'$\pi$' #pie themed option
arrow = u'$\u2193$' # Arrows
clover=r'$\clubsuit$' #Saint Patrick's theme
heart=r'$\heartsuit$' # Valentine's theme
marker=pie #this is were you provide the marker options

More marker options can be found in : http://matplotlib.org/users/mathtext.html

2.4.  Scatter 6D plot:

The first three objectives are plotted in a 3-D scatter plot, in the x,y, and z axis respectively.  The fourth objective is represented by color, the fifth by size and the sixth by rotation.  Note that the rotation is scaled in degrees.  This is the step were I had to modify matplotlib source code to enable the ‘angles’ option shown below.  Also, it may be required to scale the size objective to have the desired marker size in your plot.  You can also plot the ideal point by adding a second scatter plot specifying the ideal values for each objective.  Finally, we assign the size objective “objs” and rotation objective “objr”, this will be useful later on when setting up the legend for these two objectives.

rot_angle=180 #rotation angle multiplier
scale=2000 #size objective multiplier
#Plotting 6 objectives:
im= ax.scatter(obj1, obj2, obj3, c=obj4, s= obj5*scale, marker=marker, angles=obj6*rot_angle, alpha=1, cmap=plt.cm.summer_r)
ax.scatter(1,1,0, marker=pie, c='seagreen', s=scale, alpha=1)
objs=obj5 #size objective
objr=obj6 #rotation objective

2.5.  Main axis labels and limits:

This is extremely straightforward, you can set the x,y, and z labels and specify their limits as follows:

#Main axis labels:
ax.set_xlabel('Objective 1')
ax.set_ylabel('Objective 2')
ax.set_zlabel('Objective 3')
#Axis limits:
plt.xlim([0,1])
plt.ylim([0,1])
ax.set_zlim3d(0, 1)

2.6.  Color bar options:

The colorbar limits and labels can also be specified, as shown in the code below.  There are many colormap options in matplotlib, some of the most popular ones are: jet, hsv and spectral.   As an example, if you want to change the colormap in the code shown in part 2.4, do cmap= plt.cm.hsv.  To reverse the colormap attach an ‘_r ‘ like so: cmap= plt.cm.hsv_r.  There is also a color brewer package for the more artistic plotter.

# Set the color limits.. not necessary here, but good to know how.
im.set_clim(0.0, 1.0)

#Colorbar label:
cbar = plt.colorbar(im)
cbar.ax.set_ylabel('Objective 4')

2.6.  Size and rotation legends:

This is were it gets interesting.  The first couple of lines get the labels for legend and chose which ones to display.  This allows for much flexibility when creating the legends.  As you can see in the code below, you can show markers that correspond to the maximum and the minimum objective values to orient the reader.  You can assign the spacing between lines in the legend, the  title, weather you want to frame your legend or not, the location in the figure, etc.  Line 22 of the following code shows how to add more than one legend.  There are many options for an entirely customized legend in the legend documentation which you can explore for more options.

&lt;pre&gt;handles, labels = ax.get_legend_handles_labels()
display = (0,1,2)

#Code for size and rotation legends begins here for Objectives 5 and 6:
min_size=np.amin(objs)
max_size=np.amax(objs)

#Custom size legend:
size_max = plt.Line2D((0,1),(0,0), color='k', marker=marker, markersize=max_size,linestyle='')
size_min = plt.Line2D((0,1),(0,0), color='k', marker=marker, markersize=min_size,linestyle='')
legend1= ax.legend([handle for i,handle in enumerate(handles) if i in display]+[size_max,size_min],
[label for i,label in enumerate(labels) if i in display]+[&quot;%.2f&quot;%(np.amax(objs)), &quot;%.2f&quot;%(np.amin(objs))], labelspacing=1.5, title='Objective 6', loc=1, frameon=True, numpoints=1, markerscale=1)

markersize=15
#Custom rotation legend
rotation_max = plt.Line2D((0,1),(0,0),color='k',marker=r'$\Uparrow$', markersize=15, linestyle='')
rotation_min = plt.Line2D((0,1),(0,0),color='k', marker=r'$\Downarrow$', markersize=15, linestyle='')
ax.legend([handle for i,handle in enumerate(handles) if i in display]+[rotation_max,rotation_min],
[label for i,label in enumerate(labels) if i in display]+[&quot;%.2f&quot;%(np.amax(objr)), &quot;%.2f&quot;%(np.amin(objr))], labelspacing=1.5, title='Objective 5',loc=2, frameon=True, numpoints=1, markerscale=1)

plt.gca().add_artist(legend1)

plt.show()

You can find the full code for the previous example in the following github repository:

https://github.com/JazminZatarain/Visualization-of-multidimensional-data/blob/master/paretoplot6d.py

3. Generate 6D Pareto front and runtime videos in R.

And last but not least, let me direct everyone to Calvin’s repository: https://github.com/calvinwhealton/ParetoFrontMovie.  Where  you can find the paretoMovieFront6D.R script which enables the exploration of  the evolution of a  6D Pareto front.   It is an extremely flexible tool and it has around 50 customization options to adapt your video or your plot to your visual needs, all you need is your runtime output, so check it out.  I made the tiniest contribution to this repository so I feel totally entitled to talk about it.   Here is a snippet of the video:

video.png

A beginner’s guide to narrating the multiobjective tradeoff story

A posteriori decision making aided by state of the art multiobjective evolutionary algorithms can improve upon traditional a priori weighting techniques for solving complex engineering problems. Though advanced computing power is essential for enumerating the Pareto front of “non-dominated” solutions, computers are not able to perfectly capture stakeholder preferences or choose solutions that best fit those preferences. In a multiobjective decision setting, it is the job of the analyst to guide decision makers through the multiobjective search process and allow them to understand the logical progression of the analysis. In other words, it is the job of the analyst to use the MOEA results to narrate the “multiobjective tradeoff story”. Visual analytics are a key tool in this narration, particularly the use of parallel axis and 3-D scatter plots. This post will introduce some basic visual analytic tools and explain some helpful techniques that may be used for this narration. Examples are provided from 200 Pareto optimal solutions generated by MOEA search on a 5 objective problem formulation, where each objective was to be minimized.

Visualizing tradeoffs: 3-D scatter plots and parallel axis plots

Heterogeneous plots have been found to improve user’s ability to comprehend complex data sets. Two types of plots that can be used in tandem are parallel axis plots and 3D-scatter plots. 3D-scatter plots allow decision makers to plot 3 objectives against each other in 3 dimensional plot. The 3D-scatter plot is not limited to 3 objectives however; analysts can include additional objectives by modifying the size, color, shape and orientation of the plotted points. A 3D-scatter plot of the example data generated using Matlab can be found below.

Example_3D_Scatter

In the figure above, objective 1 is plotted on the x-axis, objective 2 on the y-axis,  and objective 3 on the z-axis. Objective 4 is displayed through point size and objective 5 is displayed through point color

Another helpful visualization tool is the parallel axis plot. Parallel axis plots display each objective on a separate axis, whose scale is normalized to the range of scales given in the objective values. Each of solution is plotted as a line across the axes, with the objective function value for each objective plotted on its respective axis. Crossing lines represent tradeoffs between objectives for two different solutions. Two key benefits of parallel axis plots are their ability to scale to an unlimited number of dimensions and present different coordinates in a uniform manner. A parallel axis plot for the given data can be found below.

Screen Shot 2016-03-08 at 5.22.18 PM

The above plot shows the 5 objectives on parallel axes. Each line represents a different solution from the Pareto set. Crossing lines indicate tradeoffs between objectives. Line colors are based off the objective 1 value for each solution. The axis furthest to the right represents the row number for each solution in the csv file containing solutions.

The above plot was generated using the Web Cornell Tool, which can be found here: http://reed.cee.cornell.edu/parallel-axis/. This simple and easy to use tool allows decision makers to quickly make visually appealing parallel axis plots without writing a single line of code. To create a plot, you only need a csv file of Pareto optimal solutions. A detailed description of the tool can be found in Bernardo’s original post: https://waterprogramming.wordpress.com/2015/03/24/creating-parallel-axes-plots/

Narrating the MO Tradeoff Story Chapter 1: Plotting the baseline

Once you’ve learned how to create your 3-D scatter and parallel axis plots, you are ready to begin narrating the multiobjective tradeoff story. A key advantage of using a multiobjective problem formulation is that they may bring to light tradeoffs inherent to the system that decision makers had not previously been aware of. Decision makers are often hesitant to deviate from their current or “baseline” course of action without substantial evidence that alternate courses of action may be beneficial. By evaluating the baseline course of action along with alternate solutions from a search based on the multiobjective formulation, analysts can explicitly examine the tradeoffs they currently face, and objectively judge them against those discovered through search. The 3-D scatter plot and parallel axis plots below show how this may be done.

Screen Shot 2016-03-08 at 5.39.17 PM

In this plot, the baseline solution is shown as the diamond. 

Screen Shot 2016-03-08 at 5.48.29 PM

The baseline solution is highlighted in red, while the solutions found through MOEA search are plotted in the background.

Through use of the two plots above, the analyst can make a compelling argument to decision makers that that deviating from the baseline solution may be of beneficial. It is apparent that the baseline does very well in objectives 1, but rather poorly compared to the solutions found through MOEA search in objectives 2, 3, 4 and 5. Scenarios like this may result from baseline operating procedures that were based on single objective problem formulations where between conflicting objectives were not fully understood.

Chapter 2: Narrowing the field of potential solutions

Once decision makers have been made aware of the tradeoffs that are inherent to the baseline solution, the analyst can begin to guide them to find the most preferable solution from the Pareto optimal set. One potential pitfall of discovering solutions through MOEA search is the sheer number of Pareto optimal solutions that may be generated. A helpful tool to narrow the field of potential solutions is the technique known as brushing. To “brush” the data, decision makers simply choose acceptable/desirable ranges of each objective value and then filter the data to display only the Pareto optimal values that fall within that range. The Web Cornell Tool has two simple to use features that make it easy for the user to brush the data. Users can simply select acceptable values on each objectives axis or create 2-D vectors between objectives that specify the slope of tradeoff lines. A brushed version of the parallel axis plot and scatter plots can be found below. The brakes on the parallel axes depict the ranges of solutions deemed “acceptable” for consideration.

Screen Shot 2016-03-08 at 5.56.01 PM

A brushed version of the parallel axis plot leave decision makers with a much more manageable set of potential solutions

brushed

A brushed version of the 3D-scatter plot.

Chapter 3 and beyond…

The narration of the multiobjective tradeoff story is not meant to be a method for the analyst to dictate which decision should be made, but rather to provide insight into the nature of tradeoffs within those decisions. It is then up to the decision makers to use this information to come up with a policy that most closely fits their preferences. This post has provided two simple ways of using visual analytics to understand multiobjective tradeoffs and two techniques for narrating the “tradeoff story”. Future posts will take a deeper dive into multiobjective tradeoff analysis, including the search for system robustness and methodology for mapping the effects of solutions to different stakeholders.

Making Movies of Time-Evolving Global Maps with Python

Making Movies of Time-Evolving Global Maps with Python

Hi All,

These past few months I’ve been working with the Global Change Assessment Model (GCAM) which is an integrated assessment model (IAM) that combines models of the global climate and economic systems. I wrote an earlier post on compiling GCAM on a Unix cluster.  This post discusses some visualization tools I’ve developed for GCAM output.

GCAM models energy and agriculture systems at a regional level, where the world is composed of 32 regions.  We’re interested in tracking statistics (like the policy cost of stabilization) over time and across regions.  This required three things:

  1. The ability to draw a global map.
  2. The ability to shade individual political units on that map.
  3. The ability to animate this map.

Dr. Jon Herman has already posted a good example of how to do (1) in python using matplotlib’s Basemap.  We’ll appropriate some of his example for this example.  The Basemap has the option to draw coastlines and boundaries, but these boundaries are not tied to shapes, meaning that you can’t assign different colors to individual countries (task (2) above).  To do that, we need a shapefile containing information about political boundaries.  You can find these for free from a number of sources online, but I like Natural Earth.  They provide data on many different scales. For this application I downloaded their coarsest data set.  To give each country a shade which is tied to data, we use matplotlib’s color map.  The basic plan is to generate a colored map for each time-step in our data, and then to animate the maps using the convert linux command.

Now that we’ve described roughly how we’ll proceed, a word about the data we’re dealing with and how I’ve handled it.  GCAM has 32 geo-political regions, some of which are individual countries (like the USA or China), while others are groups of countries (like Australia & New Zealand). I stored this information using a list of lists (i.e. a 32-element list, where each element is a list of countries in that region). I’ve creatively named this variable list_list in this example (see code below). For each of the regions GCAM produces a time series of policy costs as a fraction of GDP every 5 years from 2020-2100. I’ve creatively named this variable data. We want to tie the color of a country in each time to its policy cost relative to costs across countries and times.  To do this, I wrote the following (clumsy!) Python function, which I explain below.


def world_plot(data,idx,MN,MX):
 from mpl_toolkits.basemap import Basemap
 import matplotlib.pyplot as plt
 from matplotlib.patches import Polygon
 from matplotlib.collections import PatchCollection
 import matplotlib.cm as cm
 import matplotlib as mpl
 import numpy as np

 norm = mpl.colors.Normalize(vmin=MN, vmax=MX)
 cmap = cm.coolwarm
 colors=cm.ScalarMappable(norm=norm, cmap=cmap)
 colors.set_array(data)
 a = np.zeros([32,4])
 a = colors.to_rgba(data)

 fig = plt.figure(figsize=(10,10))
 ax = fig.add_subplot(111)

 m = Basemap(projection='robin', lon_0=0,resolution='c')
 m.drawmapboundary(fill_color='white', zorder=-1)
 m.drawparallels(np.arange(-90.,91.,30.), labels=[1,0,0,1], dashes=[1,1], linewidth=0.25, color='0.5',fontsize=14)
 m.drawmeridians(np.arange(0., 360., 60.), labels=[1,0,0,1], dashes=[1,1], linewidth=0.25, color='0.5',fontsize=14)

 year = [1990,2005,2010,2015,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100]
 GCAM_32 = ['PRI','USA','VIR']
 GCAM_1 = ['BDI','COM','DJI','ERI','ETH','KEN','MDG','MUS','REU','RWA','SDS','SDN','SOM','UGA','SOL']
 GCAM_2 = ['DZA','EGY','ESH','LBY','MAR','TUN','SAH']
 GCAM_3 = ['AGO','BWA','LSO','MOZ','MWI','NAM','SWZ','TZA','ZMB','ZWE']
 GCAM_4 = ['BEN','BFA','CAF','CIV','CMR','COD','COG','CPV','GAB','GHA','GIN','GMB','GNB','GNQ','LBR','MLI','MRT','NER','NGA','SEN','SLE','STP','TCD','TGO']
 GCAM_6 = ['AUS','NZL']
 GCAM_7 = ['BRA']
 GCAM_8 = ['CAN']
 GCAM_9 = ['ABW','AIA','ANT','ATG','BHS','BLZ','BMU','BRB','CRI','CUB','CYM','DMA','DOM','GLP','GRD','GTM','HND','HTI','JAM','KNA','LCA','MSR','MTQ','NIC','PAN','SLV','TTO','VCT']
 GCAM_10 = ['ARM','AZE','GEO','KAZ','KGZ','MNG','TJK','TKM','UZB']
 GCAM_11 = ['CHN','HKG','MAC']
 GCAM_13 = ['BGR','CYP','CZE','EST','HUN','LTU','LVA','MLT','POL','ROM','SVK','SVN']
 GCAM_14 = ['AND','AUT','BEL','CHI','DEU','DNK','ESP','FIN','FLK','FRA','FRO','GBR','GIB','GRC','GRL','IMN','IRL','ITA','LUX','MCO','NLD','PRT','SHN','SMR','SPM','SWE','TCA','VAT','VGB','WLF']
 GCAM_15 = ['BLR','MDA','UKR']
 GCAM_16 = ['ALB','BIH','HRV','MKD','MNE','SCG','SRB','TUR','YUG']
 GCAM_17 = ['CHE','ISL','LIE','NOR','SJM']
 GCAM_18 = ['IND']
 GCAM_19 = ['IDN']
 GCAM_20 = ['JPN']
 GCAM_21 = ['MEX']
 GCAM_22 = ['ARE','BHR','IRN','IRQ','ISR','JOR','KWT','LBN','OMN','PSE','QAT','SAU','SYR','YEM']
 GCAM_23 = ['PAK']
 GCAM_24 = ['RUS']
 GCAM_25 = ['ZAF']
 GCAM_26 = ['GUF','GUY','SUR','VEN']
 GCAM_27 = ['BOL','CHL','ECU','PER','PRY','URY']
 GCAM_28 = ['AFG','ASM','BGD','BTN','LAO','LKA','MDV','NPL']
 GCAM_29 = ['KOR']
 GCAM_30 = ['BRN','CCK','COK','CXR','FJI','FSM','GUM','KHM','KIR','MHL','MMR','MNP','MYS','MYT','NCL','NFK','NIU','NRU','PCI','PCN','PHL','PLW','PNG','PRK','PYF','SGP','SLB','SYC','THA','TKL','TLS','TON','TUV','VNM','VUT','WSM']
 GCAM_31 = ['TWN']
 GCAM_5 = ['ARG']
 GCAM_12 = ['COL']

 list_list = [GCAM_1,GCAM_2,GCAM_3,GCAM_4,GCAM_5,GCAM_6,GCAM_7,GCAM_8,GCAM_9,GCAM_10,GCAM_11,GCAM_12,GCAM_13,GCAM_14,GCAM_15,GCAM_16,GCAM_17,GCAM_18,GCAM_19,GCAM_20,GCAM_21,GCAM_22,GCAM_23,GCAM_24,GCAM_25,GCAM_26,GCAM_27,GCAM_28,GCAM_29,GCAM_30,GCAM_31,GCAM_32]
 m.readshapefile('ne_110m_admin_0_countries_lakes/ne_110m_admin_0_countries_lakes','comarques')
 num = len(list_list)
 for info, shape in zip(m.comarques_info,m.comarques):
 for i in range(num):
 if info['adm0_a3'] in list_list[i]:
 patches1 = []
 patches1.append( Polygon(np.array(shape), True) )
 ax.add_collection(PatchCollection(patches1,facecolor=a[i,:],edgecolor='k',linewidths=1.,zorder=2))
 ax.set_title('Policy Cost',fontsize=25,y=1.01)#GDP Adjusted Policy Cost#Policy Cost#Policy Cost Reduction from Technology
 plt.annotate('%s'%year[idx],xy=(0.1,0.2),xytext=(0.1,0.2),xycoords='axes fraction',fontsize=30)
 cb = m.colorbar(colors,'right')
 cb.ax.tick_params(labelsize=14)
 filename = &amp;quot;out/map_%s.png&amp;quot; %(str(idx).rjust(3,&amp;quot;0&amp;quot;))
 plt.show()
 fig.savefig(filename)
 return

The function’s name is world_plot and it’s inputs are:

  1. The raw data for a specific time step.
  2. The index of the time step for the map we are working with (e.g. idx=0 for 2020).
  3. The minimum and maximum of the data across countries and time.

(1) is plotted, (2) is used to name the resulting png figure (line 73), and (3) is used to scale the color colormap (line 11).  On lines 2-8 we import the necessary Python packages, which in this case are pretty standard Matplotlib packages and numpy.  On lines 11-16 we generate a numpy array which contains the rgba color code for each of the data points in data.  In lines 18-19 we create the pyplot figure object.

On lines 21-24 we create and format the Basemap object.  Note that on line 21 I’ve selected the Robinson projection, but that the Basemap provides many options.

Lines 26-60 are specific for this application, and certainly could have been handled more compactly if I wanted to invest the time.  year is a list of time steps for our GCAM experiment, and lines 27-58 contain lists of three letter ID codes for each GCAM region, which are assembled into a list of lists (creatively called list_list) on line 60.

On line 61 we read the data from the shapefile database which was downloaded from Natural Earth. From lines 63-68 we loop through the info and shape attributes of the shapefile database, and determine which of the GCAM geo-political units each of the administrative units in the database is associated with.  Once this is determined, the polygon associated with that administrative unit is given the correct color (lines 66-68).

Lines 69-72 are doing some final formatting and labeling, and in lines 73-75 we are giving the file a unique name (tied to the time step plotted) and saving the images to some output directory.

When we put this function into a loop over time, we generate a sequence of figures looking something like this:

test_017

To convert this sequence of PNGs to a gif file, we use the convert command in linux (or in my case Cygwin).  So, we go to the command line and cd into the directory where we’ve saved our figures and type:

convert -delay 45 -loop 0 *.png globe_Cost_Reduction_faster.gif

Here the delay flag controls the framerate of the gif (in milliseconds), the loop flag controls whether the gif repeats, next I’m using a wildcat to include all of the pngs in the output directory, and the final input is the resulting name of the gif. The final product:

globe_GDP_Cost_Low_faster