Profiling C++ code with Callgrind

Often times, we have to write code to perform tasks whose complexity vary from mundane, such as retrieving and organizing data, to highly complex, such as simulations CFD simulations comprising the spine of a project. In either case, depending on the complexity of the task and amount of data to be processed, it may happen for the newborn code to leave us staring at an underscore marker blinking gracefully for hours on a command prompt during its execution until the results are ready, leading to project schedule delays and shortages of patience. Two standard and preferred approaches to the problem of time intensive codes are to simplify the algorithm and to make the code more efficient. In order to better select the parts of the code to work on, it is often useful to first find the parts of the code in which more time by profiling the code.

In this post, I will show how to use Callgrind, part of Valgrind, and KCachegrind to profile C/C++ codes on Linux — unfortunately, Valgrind is not available for Windows or Mac, although it can be ran on cluster from which results can be downloaded and visualized on Windows with QCachegrind. The first step is to install Valgrind and KCachegrind by typing the following commands in the terminal of a Debian based distribution, such as Ubuntu (equivalent yum commands area available for Red Hat based distributions):

$ sudo apt-get install valgrind
$ sudo apt-get install kcachegrind

Now that the required tools are installed, the next step is to compile your code with GCC/G++ (with a make file, cmake, IDE or by running the compiler directly from the terminal) and then run the following command in a terminal (type ctrl+shift+T to open the terminal):

$ valgrind --tool=callgrind path/to/your/compiled/program program_arguments

Callgrind will then run your program with some instrumentation added to its execution to measure time expenditures and cache use by each function in your code. Because of the instrumentation, Your code will take considerably longer to run under Callgrind than it typically would, so be sure to run a representative task that is as small as possible when profiling your code. During its execution, Callgrind will output a report similar to the one below on terminal itself:

==12345== Callgrind, a call-graph generating cache profiler
==12345== Copyright (C) 2002-2015, and GNU GPL'd, by Josef Weidendorfer et al.
==12345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==12345== Command: path/to/your/compiled/program program_arguments
==12345==
==12345== For interactive control, run 'callgrind_control -h'.
IF YOUR CODE OUTPUTS TO THE TERMINAL, THE OUTPUT WILL BE SHOWN HERE.
==12345== 
==12345== Events    : Ir
==12345== Collected : 4171789731
==12345== 
==12345== I   refs:      4,171,789,731

The report above shows that it collected 4 billion events in order to generate the comprehensive report saved in the file callgrind.out.12345 — 12345 is here your process id, shown in the report above. Instead of submerging your soul into a sea of despair by trying to read the output file in a text editor, you should load the file into KCachegrind by typing:

$ kcachegrind calgrind.out.12345

You should now see a screen like the one below:

kcachegrind_initial.png

The screenshot above shows the profiling results for my code. The left panel shows the functions called by my code sorted by total time spent inside each function. Because functions call each other, callgrind shows two cost metrics as proxies for time spent in each function: Incl., showing the total cost of a function, and self, showing the time spent in each function itself discounting the callees. By clicking on “Self” to order to functions by the cost of the function itself, we sort the functions by the costs of their own codes, as shown below:

Untitled_sorted

Callgrind includes functions that are native to C/C++ in its analysis. If one of them appears in the highest positions of the left panel, it may be the case to try to use a different function or data structure that performs a similar task in a more efficient way. Most of the time, however, our functions are the ones in most of the top positions in the list. In the example above, we can see that a possible first step I can take to improve the time performance of my code is to make function “ContinuityModelROF::shiftStorage” more efficient. A few weeks ago, however, the function “ContinuityModel::continuityStep” was ranked first with over 30% of the cost, followed by a C++ map related function. I then replaced a map inside that function by a pointer vector, resulting in the drop of my function’s cost to less than 5% of the total cost of the code.

In case KCachegrind shows that a given function that is called from multiple places in the code is costly, you may want to know which function is the main culprit behind the costly calls. To do this, click on the function of interest (in this case, “_memcpy_sse2_unalight”) in the left panel, and then click on “Callers” in the right upper panel and on “Call Graph” in the lower right panel. This will show in list and graph forms the calls made to the function by other functions, and the asociated percent costs. Unfortunately, I have only the function “ContinuityModelROF::calculateROF” calling “_memcpy_sse2_unalight,” hence the simple graph, but the graph would be more complex if multiple functions made calls to “_memcpy_sse2_unalight.”

I hope this saves you at least the time spend reading this post!

Map making in Matlab

Map making in Matlab

Greetings,

This weeks post will cover the basics of generating maps in Matlab.  Julie’s recent post showed how to do some of this in Python, but, Matlab is also widely used by the community.  You can get a lot done with Matlab, but in this post we’ll just cover a few of the basics.

We’ll start off by plotting a map of the continental United States, with the states.  We used three  this with three commands: usamap, shaperead, and geoshow.  usamap creates an empty map axes having the Lambert Projection covering the area of the US, or any state or collection of states.  shaperead reads shapefiles (duh) and returns a Matlab geographic data structure, composed of both geographic data and attributes.  This Matlab data structure then interfaces really well with various Matlab functions (duh).  Finally, geoshow plots geographic data, in our case on the map axes we defined.  Here’s some code putting it all together.

hold on
figure1 = figure;
ax = usamap('conus');

set(ax, 'Visible', 'off')
latlim = getm(ax, 'MapLatLimit');
lonlim = getm(ax, 'MapLonLimit');
states = shaperead('usastatehi',...
 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']);
geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5])
tightmap
hold off

Note that ‘usastatehi’ is a shapefile containing the US states (duh) that’s distributed with Matlab. The above code generates this figure:

conus_blank

Now, suppose we wanted to plot some data, say a precipitation forecast, on our CONUS map.  Let’s assume our forecast is being made at many points (lat,long).  To interpolate between the points for plotting we’ll use Matlab’s griddata function.  Once we’ve done this, we use the Matlab’s contourm command.  This works exactly like the normal contour function, but the ‘m’ indicates it plots map data.

xi = min(x):0.5:max(x);
yi = min(y):0.5:max(y);
[XI, YI] = meshgrid(xi,yi);
ZI = griddata(x,y,V,XI,YI);

hold on
figure2 = figure;
ax = usamap('conus');

set(ax, 'Visible', 'off')
latlim = getm(ax, 'MapLatLimit');
lonlim = getm(ax, 'MapLonLimit');
states = shaperead('usastatehi',...
 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']);
geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5])

contourm(YI,-1*XI,ZI)
tightmap
hold off

Here x, y, and V are vectors of long, lat, and foretasted precipitation respectively.  This code generates the following figure:

conus_contour

Wow!  Louisiana is really getting hammered!  Let’s take a closer look.  We can do this by changing the entry to usamap to indicate we want to consider only Louisiana.  Note, usamap accepts US postal code abbreviations.

ax = usamap('LA');

Making that change results in this figure:

LA_contour

Neat!  We can also look at two states and add annotations.  Suppose, for no reason in particular, you’re interested in the location of Tufts University relative to Cornell.  We can make a map to look at this with the textm and scatterm functions.  As before, the ‘m’ indicates the functions  plot on a map axes.

hold on
figure4 = figure;
ax = usamap({'MA','NY'});

set(ax, 'Visible', 'off')
latlim = getm(ax, 'MapLatLimit');
lonlim = getm(ax, 'MapLonLimit');
states = shaperead('usastatehi',...
 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']);
geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5])
scatterm(42.4075,-71.1190,100,'k','filled')
textm(42.4075+0.2,-71.1190+0.2,'Tufts','FontSize',30)

scatterm(42.4491,-76.4842,100,'k','filled')
textm(42.4491+0.2,-76.4842+0.2,'Cornell','FontSize',30)
tightmap
hold off

This code generates the following figure.

Cornell_Tufts

Cool! Now back to forecasts.  NOAA distributes short term Quantitative Precipitation Forecasts (QPFs) for different durations every six hours.  You can download these forecasts in the form of shapefiles from a NOAA server.  Here’s an example of a 24-hour rainfall forecast made at 8:22 AM UTC on April 29.

fill_94qwbg

Wow, that’s a lot of rain!  Can we plot our own version of this map using Matlab!  You bet!  Again we’ll use usamap, shaperead, and geoshow.  The for loop, (0,1) scaling, and log transform are simply to make the color map more visually appealing for the post.  There’s probably a cleaner way to do this, but this got the job done!

figure5 = figure;
ax = usamap('conus');
S=shaperead('94q2912','UseGeoCoords',true);

set(ax, 'Visible', 'off')
latlim = getm(ax, 'MapLatLimit');
lonlim = getm(ax, 'MapLonLimit');
states = shaperead('usastatehi',...
 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']);
geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5])
p = colormap(jet);

N = max(size(S));
d = zeros(N,1);
for i = 1:N
 d(i) = log(S(i).QPF);
end

y=floor(((d-min(d))/range(d))*63)+1;
col = p(y,:);
for i = 1:N
 geoshow(S(i),'FaceColor',col(i,:),'FaceAlpha',0.5)%,'SymbolSpec', faceColors)
end

This code generates the following figure:

conus_shape

If you are not plotting in the US, Matlab also has a worldmap command.  This works exactly the same as usamap, but now for the world (duh).  Matlab is distibuted with a shapefile ‘landareas.shp’ which contains all of the land areas in the world (duh).  Generating a global map is then trivial:

figure6 = figure;

worldmap('World')
land = shaperead('landareas.shp', 'UseGeoCoords', true);
geoshow(land, 'FaceColor', [0.15 0.5 0.15])

Which generates this figure.

globe

 

Matlab also comes with a number of other included that might be of interest.  For instance, shapefiles detailing the locations of major world cities, lakes, and rivers.  We can plot those with the following code:

figure7 = figure;

worldmap('World')
land = shaperead('landareas.shp', 'UseGeoCoords', true);
geoshow(land, 'FaceColor', [0.15 0.5 0.15])
lakes = shaperead('worldlakes', 'UseGeoCoords', true);
geoshow(lakes, 'FaceColor', 'blue')
rivers = shaperead('worldrivers', 'UseGeoCoords', true);
geoshow(rivers, 'Color', 'blue')
cities = shaperead('worldcities', 'UseGeoCoords', true);
geoshow(cities, 'Marker', '.', 'Color', 'red')

Which generates the figure:

globe_river

But suppose we’re interested in one country or a group of countries.  worldmap works in the same usamap does.  Also, you can plot continents, for instance Europe.

worldmap('Europe')

Europe.png

Those are the basics, but there are many other capabilities, including 3-D projections. I can cover this in a later post if there is interest.

contour

That’s it for now!

Using Borg in Parallel and Serial with a Python Wrapper – Part 2

This blog post is Part 2 of a two-part series that will demonstrate how I have coupled a pure Python simulation model with the Borg multi-objective evolutionary algorithm (MOEA). I recommend reading Part 1 of this series before you read Part 2. In Part 1, I explain how to get Borg and provide sample code showing how you can access Borg’s serial and/or parallelized (master-slave) implementations through a Python wrapper (borg.py). In Part 2, I provide details for more advanced simulation-optimization setups that require you pass additional information from the borg wrapper into the simulation model (the “function evaluation”) other than just decision variable values.

In Part 1, the example simulation model I use (PySedSim) is called through a function handle “Simulation_Caller” in the example_sim_opt.py file. Borg needs only this function handle to properly call the simulation model in each “function evaluation”. Borg’s only interaction with the simulation model is to pass the simulation model’s function handle (e.g., “Simulation_Caller”) the decision variables, and nothing else. In many circumstances, this is all you need.

However, as your simulation-optimization setup becomes more complex, in order for your simulation model (i.e., the function evaluation) to execute properly, you may need to pass additional arguments to the simulation model from Borg other than just the decision variables. For example, in my own use of Borg in a simulation-optimization setting, in order to do optimization I first import a variety of assumptions and preferences to set up a Borg-PySedSim run. Some of those assumptions and preferences are helpful to the simulation model (PySedSim) in determining how to make use of the decision variable values Borg feeds it. So, I would like to pass those relevant assumptions and preferences directly into the Borg wrapper (borg.py), so the wrapper can in turn pass them directly into the simulation model along with the decision variable values.

Before I show how to do this, let me provide a more concrete example of how/why I am doing this in my own research. In my current work, decision variable values represent parameters for a reservoir operating policy that is being optimized. The simulation model needs to know how to take the decision variable values and turn them into a useful operating policy that can be simulated. Some of this information gets imported in order to run Borg, so I might as well pass that information directly into the simulation model while I have it on hand, rather than importing it once again in the simulation model.

To do what I describe above, we just need to modify the two functions in the example_sim_opt.py module so that a new argument “additional_inputs” is passed from borg to the simulation handle.  Using my python code from blog post 1, I provide code below that is modified in the Simulation_Caller() function on lines 5, 21, 22 and 27; and in the Optimization() function on lines 55, 56 and 70. After that code, I then indicate how I modify the borg.py wrapper so it can accept this information.

import numpy as np
import pysedsim # This is your simulation model
import platform  # helps identify directory locations on different types of OS

def Simulation_Caller(vars, additional_inputs):
    '''
    Purpose: Borg calls this function to run the simulation model and return multi-objective performance.

    Note: You could also just put your simulation/function evaluation code here.

    Args:
        vars: A list of decision variable values from Borg
        additional_inputs: A list of python data structures you want to pass from Borg into the simulation model.
    Returns:
        performance: policy's simulated objective values. A list of objective values, one value each of the objectives.
    '''

    borg_vars = vars  # Decision variable values from Borg

    # Unpack lists of additional inputs from Borg (example assumes additional inputs is a python list with two items)
    borg_dict_1 = additional_inputs[0]
    borg_dict_2 = additional_inputs[1]

    # Reformat decision variable values as necessary (.e.g., cast borg output parameters as array for use in simulation)
    op_policy_params = np.asarray(borg_vars)
    # Call/run simulation model with decision vars and additional relevant inputs, return multi-objective performance:
    performance = pysedsim.PySedSim(decision_vars = op_policy_params, sim_addl_input_1 = borg_dict_1, sim_addl_input_2 = borg_dict_2)
    return performance

def Optimization():

    '''

    Purpose: Call this method from command line to initiate simulation-optimization experiment

    Returns:
        --pareto approximate set file (.set) for each random seed
        --Borg runtime file (.runtime) for each random seed

    '''

    import borg as bg  # Import borg wrapper

    parallel = 1  # 1= master-slave (parallel), 0=serial

    # The following are just examples of relevant MOEA specifications. Select your own values.
    nSeeds = 25  # Number of random seeds (Borg MOEA)
    num_dec_vars = 10  # Number of decision variables
    n_objs = 6  # Number of objectives
    n_constrs = 0  # Number of constraints
    num_func_evals = 30000  # Number of total simulations to run per random seed. Each simulation may be a monte carlo.
    runtime_freq = 1000  # Interval at which to print runtime details for each random seed
    decision_var_range = [[0, 1], [4, 6], [-1,4], [1,2], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1]]
    epsilon_list = [50000, 1000, 0.025, 10, 13, 4]  # Borg epsilon values for each objective
    borg_dict_1 = {'simulation_preferences_1': [1,2]}  # reflects data you want Borg to pass to simulation model
    borg_dict_2 = {'simulation_preferences_2': [3,4]}  # reflects data you want Borg to pass to simulation model

    # Where to save seed and runtime files
    main_output_file_dir = 'E:\output_directory'  # Specify location of output files for different seeds
    os_fold = Op_Sys_Folder_Operator()  # Folder operator for operating system
    output_location = main_output_file_dir + os_fold + 'sets'

    # If using master-slave, start MPI. Only do once.
    if parallel == 1:
        bg.Configuration.startMPI()  # start parallelization with MPI

    # Loop through seeds, calling borg.solve (serial) or borg.solveMPI (parallel) each time
    for j in range(nSeeds):
        # Instantiate borg class, then set bounds, epsilon values, and file output locations
        borg = bg.Borg(num_dec_vars, n_objs, n_constrs, Simulation_Caller, add_sim_inputs = [borg_dict_1, borg_dict_2])
        borg.setBounds(*decision_var_range)  # Set decision variable bounds
        borg.setEpsilons(*epsilon_list)  # Set epsilon values
        # Runtime file path for each seed:
        runtime_filename = main_output_file_dir + os_fold + 'runtime_file_seed_' + str(j+1) + '.runtime'
        if parallel == 1:
            # Run parallel Borg
            result = borg.solveMPI(maxEvaluations='num_func_evals', runtime=runtime_filename, frequency=runtime_freq)

        if parallel == 0:
            # Run serial Borg
            result = borg.solve({"maxEvaluations": num_func_evals, "runtimeformat": 'borg', "frequency": runtime_freq,
                                 "runtimefile": runtime_filename})

        if result:
            # This particular seed is now finished being run in parallel. The result will only be returned from
            # one node in case running Master-Slave Borg.
            result.display()

            # Create/write objective values and decision variable values to files in folder "sets", 1 file per seed.
            f = open(output_location + os_fold + 'Borg_DPS_PySedSim' + str(j+1) + '.set', 'w')
            f.write('#Borg Optimization Results\n')
            f.write('#First ' + str(num_dec_vars) + ' are the decision variables, ' + 'last ' + str(n_objs) +
                    ' are the ' + 'objective values\n')
            for solution in result:
                line = ''
                for i in range(len(solution.getVariables())):
                    line = line + (str(solution.getVariables()[i])) + ' '

                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f.write(line[0:-1]+'\n')
            f.write("#")
            f.close()

            # Create/write only objective values to files in folder "sets", 1 file per seed. Purpose is so that
            # the file can be processed in MOEAFramework, where performance metrics may be evaluated across seeds.
            f2 = open(output_location + os_fold + 'Borg_DPS_PySedSim_no_vars' + str(j+1) + '.set', 'w')
            for solution in result:
                line = ''
                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f2.write(line[0:-1]+'\n')
            f2.write("#")
            f2.close()

            print("Seed %s complete") %j

    if parallel == 1:
        bg.Configuration.stopMPI()  # stop parallel function evaluation process

def Op_Sys_Folder_Operator():
    '''
    Function to determine whether operating system is (1) Windows, or (2) Linux

    Returns folder operator for use in specifying directories (file locations) for reading/writing data pre- and
    post-simulation.
    '''

    if platform.system() == 'Windows':
        os_fold_op = '\\'
    elif platform.system() == 'Linux':
        os_fold_op = '/'
    else:
        os_fold_op = '/'  # Assume unix OS if it can't be identified

    return os_fold_op
 

 

Next, you will need to acquire the Borg wrapper using the instructions I specified in my previous blog post. You will need to make only two modifications: (1) modify the Borg class in borg.py so it accepts the inputs you want to pass to the simulation; and (2) some additional internal accounting in borg.py to ensure those inputs are passed to the borg.py methods that deal with your function handle. I will address these two in order.

First, modify the Borg class in borg.py so it now accepts an additional input (I only show some of the borg.py code here, just to indicate where changes are being made):


class Borg:
    def __init__(self, numberOfVariables, numberOfObjectives, numberOfConstraints, function, epsilons = None, bounds = None, directions = None, add_sim_inputs=None):

    # add_sim_inputs is the new input you will pass to borg

 

Then, modify the portion of the borg.py wrapper where self.function is called, so it can accommodate any simulation inputs you have specified.


if add_pysedsim_inputs is None:
    self.function = _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions)
else:
    # More simulation inputs are specified and can be passed to the simulation handle
    self.function = _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions, addl_inputs=add_sim_inputs)

After the above, the last step is to modify the _functionWrapper method in borg.py:


def _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions=None, addl_inputs=None):
    # addl_inputs will be passed into the simulation model
    def innerFunction(v,o,c):
    global terminate
    try:
        if addl_inputs is None:
            result = function(*[v[i] for i in range(numberOfVariables)])
        else:
            result = function([v[i] for i in range(numberOfVariables)], addl_inputs)

Making Watershed Maps in Python

This post builds off of earlier posts by Jon Lamontagne and Jon Herman on making global maps in Python using matplotlib and basemap. However rather than making a global map, I’ll show how to zoom into a particular region, here the Red River basin in East Asia. To make these maps, you’ll need to have basemap installed (from github here, or using a Windows installer here).

The first step is to create a basemap. Both Jons used the ‘robin’ global projection to do this in their posts. Since I’m only interested in a particular region, I just specify the bounding box using the lower and upper latitudes and longitudes of the region I’d like to plot. As Jon H points out, you can also specify the resolution (‘f’ = full, ‘h’ =high, ‘i’ = intermediate, ‘l’ = low, ‘c’ = crude), and you can even use different ArcGIS images for the background (see here). I use ‘World_Shaded_Relief’. It’s also possible to add a lot of features such as rivers, countries, coastlines, counties, etc. I plot countries and rivers. The argument ‘zorder’ specifies the order of the layering from 1 to n, where 1 is the bottom layer and n the top.


from mpl_toolkits.basemap import Basemap
from matplotlib import pyplot as plt

fig = plt.figure()
fig.set_size_inches([17.05,8.15])
ax = fig.add_subplot(111)

# plot basemap, rivers and countries
m = basemap(llcrnrlat=19.5, urcrnrlat=26.0, llcrnrlon=99.6, urcrnr=107.5, resolution='h')
m.arcgisimage(service='World_Shaded_Relief')
m.drawrivers(color='dodgerblue',linewidth=1.0,zorder=1)
m.drawcountries(color='k',linewidth=1.25)

The above code makes the following image (it takes some time, since I’m using high resolution):

Now let’s add a shaded outline of the Red River basin. To do this, you need a shapefile of the basin. The FAO provides a shapefile of major watersheds in the world, from which you can extract the watershed you’re interested in using ArcGIS (see instructions here). In this shapefile, the Red River is labeled by its name in Vietnamese, ‘Song Hong.’ I chose not to draw the bounds of the basin in my map because it would be too busy with the country borders. Instead, I shaded the region gray (facecolor=’0.33′) with a slightly darker border (edgecolor=’0.5′) and slight transparency (alpha=0.5). To do that, I had to collect all of the patches associated with the shapefile (which I called ‘Basin’ when reading it in) that needed to be shaded.


from matplotlib.patches import Polygon
from matplotlib.collections import Patch Collection

# plot Red River basin
m.readshapefile('RedRiverBasin_WGS1984', 'Basin', drawbounds=False)
patches = []
for info, shape in zip(m.Basin_info, m.Basin):
    if info['OBJECTID'] == 1: # attribute in attribute table of shapefile
        patches.append(Polygon(np.array(shape), True))

ax.add_collection(PatchCollection(patches, facecolor='0.33', edgecolor='0.5', alpha=0.5))

This creates the following image:

Now let’s add the locations of major dams and cities in the basin using ‘scatter‘. You could again do this by adding a shapefile, but I’m just going to add their locations manually, either by uploading their latitude and longitude coordinates from a .csv file or by passing them directly.


import numpy as np

# plot dams
damsLatLong = np.loadtxt('DamLocations.csv', delimiter=',', skiprows=1, usecols=[1,2])
x, y = m(damsLatLong[:,1], damLatLong[:,0]) # m(longitude, latitude)
m.scatter(x, y, c='k', s = 150, marker = '^')

# plot Hanoi
x, y = m(105.8342, 21.0278)
m.scatter(x, y, facecolor='darkred', edgecolor='darkred', s=150)

This makes the following image:

If we want to label the dams and cities, we can add text specifying where on the map we’d like them to be located. This may require some guess-and-check work to determine the best place (comment if you know a better way!). I temporarily added gridlines to the map to aid in this process using ‘drawparallels‘ and ‘drawmeridians‘.


# label dams and Hanoi
plt.text(104.8, 21.0, 'Hoa Binh', fontsize=18, ha='center', va='center', color='k')
plt.text(104.0, 21.7, 'Son La', fontsize=18, ha='center', va='center', color='k')
plt.text(105.0, 21.95, 'Thac Ba', fontsize=18, ha='center', va='center', color='k')
plt.text(105.4, 22.55, 'Tuyen Quang', fontsize=18, ha='center', va='center', color='k')
plt.text(105.8, 21.2, 'Hanoi', fontsize=18, ha='center', va='center', color='k')

Now our map looks like this:

That looks nice, but it would be helpful to add some context as to where in the world the Red River basin is located. To illustrate this, we can create an inset of the greater geographical area by adding another set of axes with its own basemap. This one can be at a lower resolution.


from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes

# plot inset of greater geographic area
axins = zoomed_inset_axes(ax, 0.1, loc=1) # locations
axins.set_xlim(90, 115) # longitude boundaries of inset map
axins.set_ylim(8, 28) # latitude boundaries of inset map

# remove tick marks from inset axes
plt.xticks(visible=False)
plt.yticks(visible=False)

# add basemap to inset map
m2 = Basemap(llcrnrlat=8.0, urcrnclat=28.0, llcrnr=90.0, urcrnrlon=115.0, resolution='l', ax=axins)
m2.arcgisimage(service='World_Shaded_Relief')
m2.drawcountries(color='k', linewidth=0.5)

This image looks like this:

Now let’s highlight a country of interest (Vietnam) in green and also add the Red River basin in light gray again.


# plot Vietnam green in inset
m2.readshapefile('VN_borders_only_WGS1984', 'Vietnam', drawbounds=False)
patches2 = []
for info, shape in zip(m2.Vietnam_info, m2.Vietnam):
    if info['Joiner'] == 1:
        patches2.append(Polygon(np.array(shape), True))

axins.add_collection(PatchCollection(patches2, facecolor='forestgreen', edgecolor='0.5', alpha=0.5))

# shade Red River basin gray in inset
axins.add_collection(PatchCollection(patches, faceolor='0.33', edgecolor='0.5', alpha=0.5)

Now our map looks like this:

Finally, let’s label the countries in the inset. Some of the countries are too small to fit their name inside, so we’ll have to create arrows pointing to them using ‘annotate‘. In this function, ‘xy’ specifies where the arrow points to and ‘xytext’ where the text is written relative to where the arrow points.


# label countries
plt.text(107.5, 25.5, 'China', fontsize=11, ha='center', va='center', color='k')
plt.text(102.5, 20.2, 'China', fontsize=11, ha='center', va='center', color='k')
plt.text(101.9, 15.5, 'China', fontsize=11, ha='center', va='center', color='k')
plt.text(9.5, 21.0, 'China', fontsize=11, ha='center', va='center', color='k')

# add arrows to label Vietnam and Cambodia 
plt.annotate('Vietnam', xy=(108.0, 14.0), xycoords='data', xytext=(5.0, 20.0), textcoords='offset points', \ 
    color='k', arrowprops=dict(arrowstyle='-'), fontsize=11)
plt.annotate('Cambodia', xy=(104.5, 12.0), xycoords='data', xytext=(-60.0, -25.0), textcoords='offset points', \ 
    color='k', arrowprops=dict(arrowstyle='-'), fontsize=11)

Now our map looks like this:

I think that’s pretty good, so let’s save it ;). See below for all the code used to make this map, with all the import statements at the beginning rather than sporadically inserted throughout the code!

If you’re looking for any other tips on how to make different types of maps using basemap, I recommend browsing through the basemap toolkit documentation and this basemap tutorial, where I learned how to do most of what I showed here.


from mpl_toolkits.basemap import Basemap
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from matplotlib import pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import numpy as np

# set-up Vietnam basemap
fig = plt.figure()
fig.set_size_inches([17.05, 8.15])
ax = fig.add_subplot(111)

# plot basemap, rivers and countries
m = Basemap(llcrnrlat=19.5,urcrnrlat=26.0,llcrnrlon=99.6,urcrnrlon=107.5,resolution='h')
m.arcgisimage(service='World_Shaded_Relief')
m.drawrivers(color='dodgerblue',linewidth=1.0,zorder=1)
m.drawcountries(color='k',linewidth=1.25)

# plot Red River basin
m.readshapefile('RedRiverBasin_WGS1984','Basin',drawbounds=False)
patches = []
for info, shape in zip(m.Basin_info, m.Basin):
    if info['OBJECTID'] == 1:
        patches.append(Polygon(np.array(shape), True))

ax.add_collection(PatchCollection(patches, facecolor='0.33',edgecolor='0.5',alpha=0.5))

# plot dams
damsLatLong = np.loadtxt('DamLocations.csv',delimiter=',',skiprows=1,usecols=[1,2])
x, y = m(damsLatLong[:,1], damsLatLong[:,0])
m.scatter(x, y, c='k', s=150, marker='^')

# plot Hanoi
x, y = m(105.8342, 21.0278)
m.scatter(x, y, facecolor='darkred', edgecolor='darkred', s=150)

# label reservoirs and Hanoi
plt.text(104.8, 21.0, 'Hoa Binh', fontsize=18, ha='center',va='center',color='k')
plt.text(104.0, 21.7, 'Son La', fontsize=18, ha='center', va='center', color='k')
plt.text(105.0, 21.95, 'Thac Ba', fontsize=18, ha='center', va='center', color='k')
plt.text(105.4, 22.55, 'Tuyen Quang', fontsize=18, ha='center', va='center', color='k')
plt.text(105.8, 21.2, 'Hanoi', fontsize=18, ha='center', va='center', color='k')

# plot inset of greater geographic area
axins = zoomed_inset_axes(ax, 0.1, loc=1)
axins.set_xlim(90, 115)
axins.set_ylim(8,28)

plt.xticks(visible=False)
plt.yticks(visible=False)

m2 = Basemap(llcrnrlat=8.0,urcrnrlat=28.0,llcrnrlon=90.0,urcrnrlon=115.0,resolution='l',ax=axins)
m2.arcgisimage(service='World_Shaded_Relief')
m2.drawcountries(color='k',linewidth=0.5)

# plot Vietnam green in inset
m2.readshapefile('VN_borders_only_WGS1984','Vietnam',drawbounds=False)
patches2 = []
for info, shape in zip(m2.Vietnam_info, m2.Vietnam):
    if info['Joiner'] == 1:
        patches2.append(Polygon(np.array(shape), True))

axins.add_collection(PatchCollection(patches2, facecolor='forestgreen',edgecolor='0.5',alpha=0.5))

# shade Red River basin gray in inset
axins.add_collection(PatchCollection(patches, facecolor='0.33',edgecolor='0.5',alpha=0.5))

# label countries
plt.text(107.5, 25.5, 'China', fontsize=11, ha='center',va='center',color='k')
plt.text(102.5, 20.2, 'Laos', fontsize=11, ha='center', va='center', color='k')
plt.text(101.9, 15.5, 'Thailand', fontsize=11, ha='center', va='center', color='k')
plt.text(96.5, 21.0, 'Myanmar', fontsize=11, ha='center', va='center', color='k')

plt.annotate('Vietnam', xy=(108.0,14.0), xycoords='data', xytext=(5.0,20.0), textcoords='offset points', \
    color='k',arrowprops=dict(arrowstyle='-'),fontsize=11)
plt.annotate('Cambodia', xy=(104.5,12.0), xycoords='data', xytext=(-60.0,-25.0), textcoords='offset points', \
    color='k',arrowprops=dict(arrowstyle='-'),fontsize=11)

fig.savefig('RedRiverMap.png')
fig.clf()

A visual introduction to data compression through Principle Component Analysis

Principle Component Analysis (PCA) is a powerful tool that can be used to create parsimonious representations of a multivariate data set. In this post I’ll code up an example from Dan Wilks’ book Statistical Methods in the Atmospheric Sciences to visually illustrate the PCA process. All code can be found at the bottom of this post.

As with many of the examples in Dr. Wilks’ excellent textbook, we’ll be looking at minimum temperature data from Ithaca and Canandaigua, New York  (if anyone is interested, here is the distance between the two towns).  Figure 1 is a scatter plot of the minimum temperature anomalies at each location for the month of January 1987.

raw_data

Figure 1: Minimum temperature anomalies in Ithaca and Canandaigua, New York in January 1987

As you can observe from Figure 1, the two data sets are highly correlated, in fact, they have a Pearson correlation coefficient of 0.924. Through PCA, we can identify the primary mode of variability within this data set (its largest “principle component”) and use it to create a single variable which describes the majority of variation in both locations. Let x define the matrix of our minimum temperature anomalies in both locations. The eigenvectors (E) of the covariance matrix of x describe the primary modes variability within the data set. PCA uses these eigenvectors to  create a new matrix, u,  whose columns contain the principle components of the variability in x.

u = xE

Each element in u is a linear combination of the original data, with eigenvectors in E serving as a kind of weighting for each data point. The first column of u corresponds to the eigenvector associated with the largest eigenvalue of the covariance matrix. Each successive column of u represents a different level of variability within the data set, with u1 describing the direction of highest variability, u2 describing the direction of the second highest variability and so on and so forth. The transformation resulting from PCA can be visualized as a rotation of the coordinate system (or change of basis) for the data set, this rotation is shown in Figure 2.

PCA visualization

Figure 2: Geometric interpretation of PCA

As can be observed in Figure 2, each data point can now be described by its location along the newly rotated axes which correspond to its corresponding value in the newly created matrix u. The point (16, 17.8), highlighted in Figure 2, can now be described as (23, 6.6) meaning that it is 23 units away from the origin in the direction of highest variability and 6.6 in the direction of second highest variability. As shown in Figure 2, the question of “how different from the mean” each data point is can mostly be answered by looking at its  corresponding u1 value.

Once transformed, the original data can be recovered through a process known as synthesis. Synthesis  can be thought of as PCA in reverse. The elements in the original data set x, can be approximated using the eigenvalues of the covariance matrix and the first principle component, u1.

\tilde{x} = \tilde{u}\tilde{E}^T

Where:

\tilde{x}\hspace{.1cm} is\hspace{.1cm} the\hspace{.1cm} reconstructed\hspace{.1cm} data\hspace{.1cm} set

\tilde{u}\hspace{.1cm} is\hspace{.1cm} the\hspace{.1cm} PCs\hspace{.1cm} used \hspace{.1cm} for \hspace{.1cm} reconstruction\hspace{.1cm} (in\hspace{.1cm} our\hspace{.1cm} case\hspace{.1cm} the\hspace{.1cm} first\hspace{.1cm} column)

\tilde{E}\hspace{.1cm} is \hspace{.1cm} the \hspace{.1cm} eigenvector\hspace{.1cm} of \hspace{.1cm} the \hspace{.1cm} PCs \hspace{.1cm} used

For our data set, these reconstructions seem to work quite well, as can be observed in Figure 3.

final_synth.png

 

 

Data compression through PCA can be a useful alternative tolerant methods for dealing with multicollinearity, which I discussed in my previous post. Rather than running a constrained regression, one can simply compress the data set to eliminate sources of multicollinearity. PCA can also be a helpful tool for identifying patterns within your data set or simply creating more parsimonious representations of a complex set of data. Matlab code used to create the above plots can be found below.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ithaca_Canandagua_PCA
% By: D. Gold
% Created: 3/20/17
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% This script will perform Principle Component analysis on minimum
% temperature data from Ithaca and Canadaigua in January, 1987 provided in 
% Appendix A of Wilks (2011). It will then estimate minimum temperature
% values of both locations using the first principle component.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% create data sets
clear all

% data from appendix Wilks (2011) Appendix A.1
Ith = [19, 25, 22, -1, 4, 14, 21, 22, 23, 27, 29, 25, 29, 15, 29, 24, 0,... 
 2, 26, 17, 19, 9, 20, -6, -13, -13, -11, -4, -4, 11, 23]';

Can = [28, 28, 26, 19, 16, 24, 26, 24, 24, 29, 29, 27, 31, 26, 38, 23,... 
 13, 14, 28, 19, 19, 17, 22, 2, 4, 5, 7, 8, 14, 14, 23]';

%% center the data, plot temperature anomalies, calculate covariance and eigs

% center the data
x(:,1) = Ith - mean(Ith);
x(:,2) = Can - mean(Can);

% plot anomalies
figure
scatter(x(:,1),x(:,2),'Filled')
xlabel('Ithaca min temp anomaly ({\circ}F)')
ylabel('Canandagua min temp anomaly ({\circ}F)')

% calculate covariance matrix and it's corresponding Eigenvalues & vectors
S = cov(x(:,1),x(:,2));
[E, Lambda]=eigs(S);

% Identify maximum eigenvalue, it's column will be the first eigenvector
max_lambda = find(max(Lambda)); % index of maximum eigenvalue in Lambda
idx = max_lambda(2); % column of max eigenvalue

%% PCA
U = x*E(:,idx);

%% synthesis
x_syn = E(:,idx)*U'; % reconstructed values of x

% plot the reconstructed values against the original data
figure
subplot(2,1,1)
plot(1:31,x(:,1) ,1:31, x_syn(1,:),'--')
xlim([1 31])
xlabel('Time (days)')
ylabel('Ithaca min. temp. anomalies ({\circ}F)')
legend('Original', 'Reconstruction')
subplot(2,1,2)
plot(1:31, x(:,2),1:31, x_syn(2,:)','--')
xlim([1 31])
xlabel('Time (days)')
ylabel('Canadaigua min. temp. anomalies ({\circ}F)')
legend('Original', 'Reconstruction')

 

Sources:

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences. Amsterdam: Elsevier Academic Press.

Using Borg in Parallel and Serial with a Python Wrapper – Part 1

Simulation and optimization are frequently used to solve complex water resources and environmental systems problems. By itself, a simulation model begs the question “what to simulate?” Similarly, by itself, an optimization model begs the question “is the solution really best?” For this reason, simulation and optimization models are frequently coupled.

This blog post is part 1 of a multi-part series that will demonstrate how I have coupled a pure Python simulation model with the multi-objective evolutionary optimization algorithm Borg. In this post, I will show how you can access Borg’s serial and/or parallelized (master-slave) implementations through a Python wrapper (borg.py).

Please see this previous blog post for some background about Borg, and how to obtain it. My instructions below assume you have access to the Borg files.

In the setup I will describe below, Borg parameterizes and iteratively refines solutions (e.g., reservoir operating policies) to a problem, optimizing them in response to their simulated performance with respect to multiple objectives.

You will need the following Borg files (see link above for how to download these):

  • Serial (i.e., borg.c, libborg.so, etc.) and/or master-slave (i.e., borgms.c, libborgms.so, etc.) implementations of Borg, depending upon your ability to parallelize.
  • Python wrapper for Borg (borg.py), which will allow you to to access Borg easily in Python.

You will need to create the following files yourself (I provide sample code below for these files):

  • example_sim_opt.py—A python module that should contain two main functions:
    1. A simulation caller, which takes decision variables and returns multi-objective performance. This function is called “Simulation_Caller” in the example_sim_opt.py code below.
    2. An optimization function, which calls the Borg MOEA through its python wrapper borg.py. This borg.py wrapper provides extensive docstring documentation regarding required arguments, returned values, etc., so I do suggest reading through the wrapper if you have questions (e.g., about the python data types of arguments and returns).

Note that the file and function names above are just example names. You can name the above files whatever you want. Just be sure to modify the code I provide below to reflect the new names.

A sample of code for example_sim_opt.py is as follows:

import numpy as np
import pysedsim # This is your simulation model
import platform  # helps identify directory locations on different types of OS

def Simulation_Caller(vars):
    '''
    Purpose: Borg calls this function to run the simulation model and return multi-objective performance.

    Note: You could also just put your simulation/function evaluation code here.

    Args:
        vars: A list of decision variable values from Borg

    Returns:
        performance: policy's simulated objective values. A list of objective values, one value each of the objectives.
    '''

    borg_vars = vars  # Decision variable values from Borg

    # Reformat decision variable values as necessary (.e.g., cast borg output parameters as array for use in simulation)
    op_policy_params = np.asarray(borg_vars)
    # Call/run simulation model, return multi-objective performance:
    performance = pysedsim.PySedSim(decision_vars = op_policy_params)
    return performance

def Optimization():

    '''

    Purpose: Call this method from command line to initiate simulation-optimization experiment

    Returns:
        --pareto approximate set file (.set) for each random seed
        --Borg runtime file (.runtime) for each random seed

    '''

    import borg as bg  # Import borg wrapper

    parallel = 1  # 1= master-slave (parallel), 0=serial

    # The following are just examples of relevant MOEA specifications. Select your own values.
    nSeeds = 25  # Number of random seeds (Borg MOEA)
    num_dec_vars = 10  # Number of decision variables
    n_objs = 6  # Number of objectives
    n_constrs = 0  # Number of constraints
    num_func_evals = 30000  # Number of total simulations to run per random seed. Each simulation may be a monte carlo.
    runtime_freq = 1000  # Interval at which to print runtime details for each random seed
    decision_var_range = [[0, 1], [4, 6], [-1,4], [1,2], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1]]
    epsilon_list = [50000, 1000, 0.025, 10, 13, 4]  # Borg epsilon values for each objective

    # Where to save seed and runtime files
    main_output_file_dir = 'E:\output_directory'  # Specify location of output files for different seeds
    os_fold = Op_Sys_Folder_Operator()  # Folder operator for operating system
    output_location = main_output_file_dir + os_fold + 'sets'

    # If using master-slave, start MPI. Only do once.
    if parallel == 1:
        bg.Configuration.startMPI()  # start parallelization with MPI

    # Loop through seeds, calling borg.solve (serial) or borg.solveMPI (parallel) each time
    for j in range(nSeeds):
        # Instantiate borg class, then set bounds, epsilon values, and file output locations
        borg = bg.Borg(num_dec_vars, n_objs, n_constrs, Simulation_Caller)
        borg.setBounds(*decision_var_range)  # Set decision variable bounds
        borg.setEpsilons(*epsilon_list)  # Set epsilon values
        # Runtime file path for each seed:
        runtime_filename = main_output_file_dir + os_fold + 'runtime_file_seed_' + str(j+1) + '.runtime'
        if parallel == 1:
            # Run parallel Borg
            result = borg.solveMPI(maxEvaluations='num_func_evals', runtime=runtime_filename, frequency=runtime_freq)

        if parallel == 0:
            # Run serial Borg
            result = borg.solve({"maxEvaluations": num_func_evals, "runtimeformat": 'borg', "frequency": runtime_freq,
                                 "runtimefile": runtime_filename})

        if result:
            # This particular seed is now finished being run in parallel. The result will only be returned from
            # one node in case running Master-Slave Borg.
            result.display()

            # Create/write objective values and decision variable values to files in folder "sets", 1 file per seed.
            f = open(output_location + os_fold + 'Borg_DPS_PySedSim' + str(j+1) + '.set', 'w')
            f.write('#Borg Optimization Results\n')
            f.write('#First ' + str(num_dec_vars) + ' are the decision variables, ' + 'last ' + str(n_objs) +
                    ' are the ' + 'objective values\n')
            for solution in result:
                line = ''
                for i in range(len(solution.getVariables())):
                    line = line + (str(solution.getVariables()[i])) + ' '

                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f.write(line[0:-1]+'\n')
            f.write("#")
            f.close()

            # Create/write only objective values to files in folder "sets", 1 file per seed. Purpose is so that
            # the file can be processed in MOEAFramework, where performance metrics may be evaluated across seeds.
            f2 = open(output_location + os_fold + 'Borg_DPS_PySedSim_no_vars' + str(j+1) + '.set', 'w')
            for solution in result:
                line = ''
                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f2.write(line[0:-1]+'\n')
            f2.write("#")
            f2.close()

            print("Seed %s complete") %j

    if parallel == 1:
        bg.Configuration.stopMPI()  # stop parallel function evaluation process

def Op_Sys_Folder_Operator():
    '''
    Function to determine whether operating system is (1) Windows, or (2) Linux

    Returns folder operator for use in specifying directories (file locations) for reading/writing data pre- and
    post-simulation.
    '''

    if platform.system() == 'Windows':
        os_fold_op = '\\'
    elif platform.system() == 'Linux':
        os_fold_op = '/'
    else:
        os_fold_op = '/'  # Assume unix OS if it can't be identified

    return os_fold_op
 

The following is an example of how you would submit a batch script on a Linux cluster to run a parallelized simulation-optimization experiment using the example_sim_opt.py and borg.py files. Note that in the parallelized version, simulations (i.e., “function evaluations”) are being run in parallel by separate processors.

You would need the following two files:

  1. example_sim_opt_caller.py. This is a python file that is used to call example_sim_opt.py
  2. example_sim_opt_batch_script.pbs. This is batch script that runs example_sim_opt_caller.py in parallel on a cluster using open MPI.

Example code for example_sim_opt_caller.py:


'''
Purpose: To initiate the optimization process, which will iteratively call the simulation model.
'''

import example_sim_opt  # Import main optimization module that uses borg python wrapper

# Module within example
example_sim_opt.Optimization()

Example code for example_sim_opt_batch_script.pbs:


#PBS -l nodes=8:ppn=16
#PBS -l walltime=24:00:00
#PBS -j oe
#PBS -o pysedsim_output.out

cd $PBS_O_WORKDIR
source /etc/profile.d/modules.sh
module load openmpi-1.6.5-intel-x86_64/gnu
module load python-2.7.5
mpirun python example_sim_opt_caller.py

You could then run the above batch script with a command such as:

qsub example_sim_opt_batch_script.pbs

Using Rhodium for RDM Analysis of External Dataset

In my last blog post, I showed how to run an MORDM experiment using Rhodium. This process included the multi-objective optimization to an assumed state of the world (SOW) as well as the re-evaluation of the Pareto-approximate solutions on alternative SOWs, before using sensitivity and classification tools such as PRIM and CART for the scenario discovery analysis. However, there may be cases where you have to run the optimization and re-evaluation outside of Rhodium, for instance if your model is in another programming language than Python. There are two ways you can do this while still using Rhodium for the scenario discovery. The first option is to run the model through the Executioner. Another option is to run the model separately and import the output into the same format as is generated by Rhodium for post-analysis. I will explain the second method here for the fish game described in my last post.

The first step is to read the decision variables and objectives from the optimization into 2D arrays. Then the uncertainties, levers and responses can be defined as before, except they no longer have to be associated with an object of the class ‘Model‘.


# read in output of optimization
variables = np.loadtxt('FishGame/FishGame.resultfile',usecols=[0,1])
objectives = np.loadtxt('FishGame/FishGame.resultfile',usecols=[2,3])

# make maximization objectives positive
maxIndices = [0]
objectives[:,maxIndices] = -objectives[:,maxIndices]

# define X of XLRM framework
uncertainties = [UniformUncertainty("a", 1.5, 4.0),
    UniformUncertainty("b0", 0.25, 0.67)]

# define L of XLRM framework
levers = [RealLever("vars", 0.0, 1.0, length=2)]

# define R of XLRM framework
responses = [Response("NPVharvest", Response.MAXIMIZE),
    Response("std_z", Response.MINIMIZE)]

Note: If you are interested in using Rhodium’s plotting tools to visualize the results of the optimization, you can still make the uncertainties, levers and responses attributes of a model object. However, you will have to create a model function to instantiate the model. This is sloppy, but you can fake this by just creating a function that takes in the decision variables and model parameters, and returns the objective values, but doesn’t actually perform any calculations.


def fishGame(vars,
    a = 1.75, # rate of prey growth
    b0 = 0.6, # initial rate of predator growth
    F = 0, # rate of change of radiative forcing per unit time
    S = 0.5): # climate sensitivity)

    NPVharvest = None
    std_z = None

    return (NPVharvest, std_z)

model = Model(fishGame)

# define all parameters to the model that we will be studying
model.parameters = [Parameter("vars"),
    Parameter("a"),
    Parameter("b0"),
    Parameter("F"),
    Parameter("S")]

If using Rhodium for the optimization, this function would actually perform the desired calculation and Platypus could be used for the optimization. Since we have already performed the optimization, we just need to reformat the output of the optimization into that used by Rhodium for the RDM analysis. This can be done by mimicking the output structure that would be returned by the function ‘optimize‘.


# find number of solutions
nsolns = np.shape(objectives)[0]

# properly format output of optimization
output = DataSet()
for i in range(nsolns):
    env = OrderedDict()
    offset = 0

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = list(variables[i,:])
        else:
            env[lever.name] = list(variables[i,offset:offset+lever.length])
            
        offset += lever.length

    for j, response in enumerate(responses):
        env[response.name] = objectives[i,j]

    output.append(env)

# write output to file
with open("FishGame/FishGame_data.txt","w") as f:
    json.dump(output, f)

Next we need to read in the uncertain parameters that were sampled for the re-evaluation and format the results of the re-evaluation into the same format as would be output by calling ‘evaluate‘ within Rhodium. Below is an example with the first solution (soln_index=0).

# read in LH samples of uncertain parameters and determine # of samples
LHsamples = np.loadtxt('FishGame/LHsamples.txt')
nsamples = np.shape(LHsamples)[0]

# load policy from optimization
soln_index = 0
policy = output[soln_index]

# load its objective values from re-evaluation and make maximization objectives positive
objectives = np.loadtxt('FishGame/MORDMreeval/FishGame_Soln' + str(soln_index+1) + '.obj')
objectives[:,maxIndices] = -objectives[:,maxIndices]

# convert re-evaluation output to proper format
results = DataSet()
for j in range(nsamples):
    env = OrderedDict()
    offset = 0

    for k, uncertainty in enumerate(uncertainties):
        env[uncertainty.name] = LHsamples[j,k]

    for k, response in enumerate(responses):
        env[response.name] = objectives[j,k]

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = list(variables[soln_index,:])
        else:
            env[lever.name] = list(variables[soln_index,offset:offset+lever.length])

        offset += lever.length

    results.append(env)

# write results to file
with open("FishGame/FishGame_Soln" + str(soln_index+1) + "_reeval.txt","w") as f:
    json.dump(results, f)

Finally, you have to define the metrics.


# calculate M of XLRM framework
metric = ["Profitable" if v["NPVharvest"] >= 3.0 else "Unprofitable" for v in results]

Then you can run PRIM and CART.  This requires defining the names, or ‘keys’, of the uncertain parameters. If you created a fake model object, you can pass ‘include=model.uncertainties.keys()’ to the functions Prim() and Cart(). If not, you have to create your own list of ‘keys’ as I do below.


keys = []
for i in range(len(uncertainties)):
    keys.append(uncertainties[i].name)

# run PRIM and CART on metrics
p = Prim(results, metric, include=keys, coi="Profitable")
box = p.find_box()
box.show_details()
plt.show()

c = Cart(results, metrics[j], include=keys)
c.print_tree(coi="Profitable")
c.show_tree()
plt.show()

The above code creates the following two figures.

pasting_trajectory_24figure_1

 

If you had run the analysis using Sobol samples, you could use the SALib wrapper to calculate sensitivity indices and make bar charts or radial convergence plots of the results. (Note: My previous post did not show how to make these plots, but has since been updated. Check it out here.)


import seaborn as sns
from SALib.analyze import sobol
from SALib.util import read_param_file

# Read the parameter range file and Sobol samples
problem = read_param_file('FishGame/uncertain_params.txt')
param_values = np.loadtxt('FishGame/SobolSamples.txt')

# Load the first solution
Y = np.loadtxt('FishGame/SobolReeval/FishGame_Soln' + (soln_index+1) + '.obj')

# Evaluate sensitivity to the first objective, NPVharvest
obj_index = 0
Si = sobol.analyze(problem, Y[:,obj_index], calc_second_order=True, conf_level=0.95, print_to_console=False)
pretty_result = get_pretty_result(Si)

sns.set()
fig1 = pretty_result.plot()
fig2 = pretty_result.plot_sobol(threshold=0.01,groups={"Prey Growth Parameters" : ["a"],
        "Predator Growth Parameters" : ["b0"]})

def get_pretty_result(result):
    pretty_result = SAResult(result["names"] if "names" in result else problem["names"])

    if "S1" in result:
        pretty_result["S1"] = {k : float(v) for k, v in zip(problem["names"], result["S1"])}
    if "S1_conf" in result:
        pretty_result["S1_conf"] = {k : float(v) for k, v in zip(problem["names"], result["S1_conf"])}
    if "ST" in result:
        pretty_result["ST"] = {k : float(v) for k, v in zip(problem["names"], result["ST"])}
    if "ST_conf" in result:
        pretty_result["ST_conf"] = {k : float(v) for k, v in zip(problem["names"], result["ST_conf"])}
    if "S2" in result:
        pretty_result["S2"] = _S2_to_dict(result["S2"], problem)
    if "S2_conf" in result:
        pretty_result["S2_conf"] = _S2_to_dict(result["S2_conf"], problem)
    if "delta" in result:
        pretty_result["delta"] = {k : float(v) for k, v in zip(problem["names"], result["delta"])}
    if "delta_conf" in result:
        pretty_result["delta_conf"] = {k : float(v) for k, v in zip(problem["names"], result["delta_conf"])}
    if "vi" in result:
        pretty_result["vi"] = {k : float(v) for k, v in zip(problem["names"], result["vi"])}
    if "vi_std" in result:
        pretty_result["vi_std"] = {k : float(v) for k, v in zip(problem["names"], result["vi_std"])}
    if "dgsm" in result:
        pretty_result["dgsm"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm"])}
    if "dgsm_conf" in result:
        pretty_result["dgsm_conf"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm_conf"])}
    if "mu" in result:
        pretty_result["mu"] = {k : float(v) for k, v in zip(result["names"], result["mu"])}
    if "mu_star" in result:
        pretty_result["mu_star"] = {k : float(v) for k, v in zip(result["names"], result["mu_star"])}
    if "mu_star_conf" in result:
        pretty_result["mu_star_conf"] = {k : float(v) for k, v in zip(result["names"], result["mu_star_conf"])}
    if "sigma" in result:
        pretty_result["sigma"] = {k : float(v) for k, v in zip(result["names"], result["sigma"])}

    return pretty_result

def _S2_to_dict(matrix, problem):
    result = {}
    names = list(problem["names"])
    for i in range(problem["num_vars"]):
        for j in range(i+1, problem["num_vars"]):
            if names[i] not in result:
                result[names[i]] = {}
            if names[j] not in result:
                result[names[j]] = {}

            result[names[i]][names[j]] = result[names[j]][names[i]] = float(matrix[i][j])

    return result

soln1_obj1_barchartsoln1_obj1_radialplot
 

So don’t feel like you need to run your optimization and re-evaluation in Python in order to use Rhodium!