January 30, 2017 by Julie

Using Rhodium for RDM Analysis of External Dataset

In my last blog post, I showed how to run an MORDM experiment using Rhodium. This process included the multi-objective optimization to an assumed state of the world (SOW) as well as the re-evaluation of the Pareto-approximate solutions on alternative SOWs, before using sensitivity and classification tools such as PRIM and CART for the scenario discovery analysis. However, there may be cases where you have to run the optimization and re-evaluation outside of Rhodium, for instance if your model is in another programming language than Python. There are two ways you can do this while still using Rhodium for the scenario discovery. The first option is to run the model through the Executioner. Another option is to run the model separately and import the output into the same format as is generated by Rhodium for post-analysis. I will explain the second method here for the fish game described in my last post.

The first step is to read the decision variables and objectives from the optimization into 2D arrays. Then the uncertainties, levers and responses can be defined as before, except they no longer have to be associated with an object of the class ‘Model‘.


# read in output of optimization
variables = np.loadtxt('FishGame/FishGame.resultfile',usecols=[0,1])
objectives = np.loadtxt('FishGame/FishGame.resultfile',usecols=[2,3])

# make maximization objectives positive
maxIndices = [0]
objectives[:,maxIndices] = -objectives[:,maxIndices]

# define X of XLRM framework
uncertainties = [UniformUncertainty("a", 1.5, 4.0),
    UniformUncertainty("b0", 0.25, 0.67)]

# define L of XLRM framework
levers = [RealLever("vars", 0.0, 1.0, length=2)]

# define R of XLRM framework
responses = [Response("NPVharvest", Response.MAXIMIZE),
    Response("std_z", Response.MINIMIZE)]

Note: If you are interested in using Rhodium’s plotting tools to visualize the results of the optimization, you can still make the uncertainties, levers and responses attributes of a model object. However, you will have to create a model function to instantiate the model. This is sloppy, but you can fake this by just creating a function that takes in the decision variables and model parameters, and returns the objective values, but doesn’t actually perform any calculations.


def fishGame(vars,
    a = 1.75, # rate of prey growth
    b0 = 0.6, # initial rate of predator growth
    F = 0, # rate of change of radiative forcing per unit time
    S = 0.5): # climate sensitivity)

    NPVharvest = None
    std_z = None

    return (NPVharvest, std_z)

model = Model(fishGame)

# define all parameters to the model that we will be studying
model.parameters = [Parameter("vars"),
    Parameter("a"),
    Parameter("b0"),
    Parameter("F"),
    Parameter("S")]

If using Rhodium for the optimization, this function would actually perform the desired calculation and Platypus could be used for the optimization. Since we have already performed the optimization, we just need to reformat the output of the optimization into that used by Rhodium for the RDM analysis. This can be done by mimicking the output structure that would be returned by the function ‘optimize‘.


# find number of solutions
nsolns = np.shape(objectives)[0]

# properly format output of optimization
output = DataSet()
for i in range(nsolns):
    env = OrderedDict()
    offset = 0

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = list(variables[i,:])
        else:
            env[lever.name] = list(variables[i,offset:offset+lever.length])
            
        offset += lever.length

    for j, response in enumerate(responses):
        env[response.name] = objectives[i,j]

    output.append(env)

# write output to file
with open("FishGame/FishGame_data.txt","w") as f:
    json.dump(output, f)

Next we need to read in the uncertain parameters that were sampled for the re-evaluation and format the results of the re-evaluation into the same format as would be output by calling ‘evaluate‘ within Rhodium. Below is an example with the first solution (soln_index=0).

# read in LH samples of uncertain parameters and determine # of samples
LHsamples = np.loadtxt('FishGame/LHsamples.txt')
nsamples = np.shape(LHsamples)[0]

# load policy from optimization
soln_index = 0
policy = output[soln_index]

# load its objective values from re-evaluation and make maximization objectives positive
objectives = np.loadtxt('FishGame/MORDMreeval/FishGame_Soln' + str(soln_index+1) + '.obj')
objectives[:,maxIndices] = -objectives[:,maxIndices]

# convert re-evaluation output to proper format
results = DataSet()
for j in range(nsamples):
    env = OrderedDict()
    offset = 0

    for k, uncertainty in enumerate(uncertainties):
        env[uncertainty.name] = LHsamples[j,k]

    for k, response in enumerate(responses):
        env[response.name] = objectives[j,k]

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = list(variables[soln_index,:])
        else:
            env[lever.name] = list(variables[soln_index,offset:offset+lever.length])

        offset += lever.length

    results.append(env)

# write results to file
with open("FishGame/FishGame_Soln" + str(soln_index+1) + "_reeval.txt","w") as f:
    json.dump(results, f)

Finally, you have to define the metrics.


# calculate M of XLRM framework
metric = ["Profitable" if v["NPVharvest"] >= 3.0 else "Unprofitable" for v in results]

Then you can run PRIM and CART. This requires defining the names, or ‘keys’, of the uncertain parameters. If you created a fake model object, you can pass ‘include=model.uncertainties.keys()’ to the functions Prim() and Cart(). If not, you have to create your own list of ‘keys’ as I do below.


keys = []
for i in range(len(uncertainties)):
    keys.append(uncertainties[i].name)

# run PRIM and CART on metrics
p = Prim(results, metric, include=keys, coi="Profitable")
box = p.find_box()
box.show_details()
plt.show()

c = Cart(results, metrics[j], include=keys)
c.print_tree(coi="Profitable")
c.show_tree()
plt.show()

The above code creates the following two figures.

pasting_trajectory_24 figure_1

If you had run the analysis using Sobol samples, you could use the SALib wrapper to calculate sensitivity indices and make bar charts or radial convergence plots of the results. (Note: My previous post did not show how to make these plots, but has since been updated. Check it out here.)


import numpy as np
import seaborn as sns
from rhodium import *
from SALib.analyze import sobol
from SALib.util import read_param_file

def _S2_to_dict(matrix, problem): 
    result = {} names = list(problem["names"]) 
    for i in range(problem["num_vars"]): 
        for j in range(i+1, problem["num_vars"]): 
            if names[i] not in result: 
                result[names[i]] = {} 
             if names[j] not in result: 
                result[names[j]] = {} 

            result[names[i]][names[j]] = result[names[j]][names[i]] = float(matrix[i][j]) 

    return result

def get_pretty_result(result):
    pretty_result = SAResult(result["names"] if "names" in result else problem["names"])

    if "S1" in result:
        pretty_result["S1"] = {k : float(v) for k, v in zip(problem["names"], result["S1"])}
    if "S1_conf" in result:
        pretty_result["S1_conf"] = {k : float(v) for k, v in zip(problem["names"], result["S1_conf"])}
    if "ST" in result:
        pretty_result["ST"] = {k : float(v) for k, v in zip(problem["names"], result["ST"])}
    if "ST_conf" in result:
        pretty_result["ST_conf"] = {k : float(v) for k, v in zip(problem["names"], result["ST_conf"])}
    if "S2" in result:
        pretty_result["S2"] = _S2_to_dict(result["S2"], problem)
    if "S2_conf" in result:
        pretty_result["S2_conf"] = _S2_to_dict(result["S2_conf"], problem)
    if "delta" in result:
        pretty_result["delta"] = {k : float(v) for k, v in zip(problem["names"], result["delta"])}
    if "delta_conf" in result:
        pretty_result["delta_conf"] = {k : float(v) for k, v in zip(problem["names"], result["delta_conf"])}
    if "vi" in result:
        pretty_result["vi"] = {k : float(v) for k, v in zip(problem["names"], result["vi"])}
    if "vi_std" in result:
        pretty_result["vi_std"] = {k : float(v) for k, v in zip(problem["names"], result["vi_std"])}
    if "dgsm" in result:
        pretty_result["dgsm"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm"])}
    if "dgsm_conf" in result:
        pretty_result["dgsm_conf"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm_conf"])}
    if "mu" in result:
        pretty_result["mu"] = {k : float(v) for k, v in zip(result["names"], result["mu"])}
    if "mu_star" in result:
        pretty_result["mu_star"] = {k : float(v) for k, v in zip(result["names"], result["mu_star"])}
    if "mu_star_conf" in result:
        pretty_result["mu_star_conf"] = {k : float(v) for k, v in zip(result["names"], result["mu_star_conf"])}
    if "sigma" in result:
        pretty_result["sigma"] = {k : float(v) for k, v in zip(result["names"], result["sigma"])}

    return pretty_result

# Read the parameter range file and Sobol samples 
problem = read_param_file('FishGame/uncertain_params.txt') 
param_values = np.loadtxt('FishGame/SobolSamples.txt') 

# Load the first solution 
Y = np.loadtxt('FishGame/SobolReeval/FishGame_Soln' + (soln_index+1) + '.obj') 

# Evaluate sensitivity to the first objective, NPVharvest 
obj_index = 0 
Si = sobol.analyze(problem, Y[:,obj_index], calc_second_order=True, conf_level=0.95, print_to_console=False) 
pretty_result = get_pretty_result(Si)

# Plot sensitivity
sns.set() 
fig1 = pretty_result.plot() 
fig2 = pretty_result.plot_sobol(threshold=0.01,groups={"Prey Growth Parameters" : ["a"], "Predator Growth Parameters" : ["b0"]})

soln1_obj1_barchart soln1_obj1_radialplot

So don’t feel like you need to run your optimization and re-evaluation in Python in order to use Rhodium!

January 21, 2017 by Bernardo Trindade

Alluvial Plots

We all love parallel coordinates plots and use them all the time to display our high dimensional data and tell our audience a good story. But sometimes we may have large amounts of data points whose tradeoffs’ existence or lack thereof cannot be clearly verified, or the data to be plotted is categorical and therefore awkwardly displayed in a parallel coordinates plot.

One possible solution to both issues is the use of alluvial plots. Alluvial plots work similarly to parallel coordinates plots, but instead of having ranges of values in the axes, it contains bins whose sizes in an axis depends on how many data points belong to that bin. Data points that fall within the same categories in all axes are grouped into alluvia (stripes), whose thicknesses reflect the number of data points in each alluvium.

Next are two examples of alluvial plots, the fist displaying categorical data and the second displaying continuous data that would normally be plotted in a parallel coordinates plot. After the examples, there is code available to generate alluvial plots in R (I know, I don’t like using R, but creating alluvial plots in R is easier than you think).

Categorical data

The first example (Figure 1) comes from the cran page for the alluvial plots package page. It uses alluvial plots to display data about all Titanic’s passengers/crew and group them into categories according to class, sex, age, and survival status.

Figure 1 – Titanic passenger/crew data. Yellow alluvia correspond to survivors and gray correspond to deceased. The size of each bin represents how many data points (people) belong to that category in a given axis, while the thickness of each alluvium represent how many people fall within the same categories in all axes. Source: https://cran.r-project.org/web/packages/alluvial/vignettes/alluvial.html.

Figure 1 shows that most of the passengers were male and adults, that the crew represented a substantial amount of the total amount of people in the Titanic, and that, unfortunately, there were more deceased than survivors. We can also see that a substantial amount of the people in the boat were male adult crew members who did not survive, which can be inferred by the thickness of the grey alluvium that goes through all these categories — it can also be seen by the lack of an alluvia hitting the Crew and Child bins, that (obviously) there were no children crew members. It can be also seen that 1st class female passengers was the group with the greatest survival rate (100%, according to the plot), while 3rd class males had the lowest (ballpark 15%, comparing the yellow and gray alluvia for 3rd class males).

Continuous data

The following example shows the results of policy modeling for a fictitious water utility using three different policy formulations. Each data point represents the modeled performance of a given candidate policy in six objectives, one in each axis. Given the uncertainties associated with the models used to generate this data, the client utility company is more concerned about whether or not a candidate policy would meet certain performance criteria according to the model (Reliability > 99%, Restriction Frequency < 20%, and Financial Risk < 10%) than about the actual objective values. The utility also wants to have a general idea of the tradeoffs between objectives.

Figure 2 was created to present the modeling results to the client water utility. The colored alluvia represent candidate policies that meet the utility’s criteria, and grey lines represent otherwise. The continuous raw data used to generate this plot was categorized following ranges whose values are meaningful to the client utility, with the best performing bin always put in the bottom of the plot. It is important to notice that the height of the bins represent the number of policies that belong to that bin, meaning that the position of the gap between two stacked bins does not represent a value in an axis, but the fraction of the policies that belong to each bin. It can be noticed from Figure 2 that it is relatively difficult for any of the formulations to meet the Reliability > 99% criteria established by the utility. It is also striking that a remarkably small number of policies from the first two formulations and none of the policies from the third formulation meet the criteria established by the utilities. It can also be easily seen by following the right alluvia that the vast majority of the solutions with smaller net present costs of infrastructure investment obtained with all three formulations perform poorly in the reliability and restriction frequency objectives, which denotes a strong tradeoff. The fact that such tradeoffs could be seen when the former axis is on the opposite side of the plot to the latter two is a remarkable feature of alluvial plots.

Figure 2 – Alluvial plot displaying modeled performance of candidate long-term planning policies. The different subplots show different formulations (1 in the top, 3 in the bottom).

The parallel coordinates plots in Figure 3 displays the same information as the alluvial plot in Figure 2. It can be readily seen that the analysis performed above, especially when it comes to the tradeoffs, would be more easily done with Figure 2 than with Figure 3. However, if the actual objective values were important for the analysis, Figure 3 would be needed either by itself or in addition to Figure 2, the latter being used likely as a pre-screening or for a higher level analysis of the results.

Figure 3 – Parallel coordinates plot displaying modeled performance of candidate long-term planning policies. The different subplots show different formulations (1 in the top, 3 in the bottom).

The R code used to create Figure 1 can be found here. The code below was used to create Figure 2 — The packages “alluvia”l and “dplyr” need to be installed before attempting to use the provided code, for example using the R command install.packages(package_name). Also, the user needs to convert its continuous data into categorical data, so that each row corresponds to a possible combination of bins in all axis (one column per axis) plus a column (freqs) representing the frequencies with which each combination of bins is seen in the data.

# Example datafile: snippet of file "infra_tradeoffs_strong_freqs.csv"
Reliability, Net Present Cost of Inf. Investment, Peak Financial Costs, Financial Risk, Restriction Frequency, Jordan Lake Allocation, freqs
2<99,0<60,0<25,0<10,2>20,0<70,229
0>99,2>60,0<25,0<10,2>20,0<70,0
2<99,2>60,0<25,0<10,2>20,0<70,168
0>99,0<60,2>25,0<10,2>20,0<70,0
2<99,0<60,2>25,0<10,2>20,0<70,3
0>99,2>60,2>25,0<10,2>20,0<70,2
2<99,2>60,2>25,0<10,2>20,0<70,45
0>99,0<60,0<25,2>10,2>20,0<70,0
2<99,0<60,0<25,2>10,2>20,0<70,317
0>99,2>60,0<25,2>10,2>20,0<70,0
2<99,2>60,0<25,2>10,2>20,0<70,114

# load packages and prepare data
library(alluvial)
library(dplyr)

itss <- read.csv('infra_tradeoffs_strong_freqs.csv')
itsw <- read.csv('infra_tradeoffs_weak_freqs.csv')
itsn <- read.csv('infra_tradeoffs_no_freqs.csv')

# preprocess the data (convert do dataframe)
itss %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>%
summarise(n = sum(freqs)) -> its_strong
itsw %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>%
summarise(n = sum(freqs)) -> its_weak
itsn %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>%
summarise(n = sum(freqs)) -> its_no

# setup output file
svg(filename="tradeoffs_3_formulations.svg",
width=8,
height=8,
pointsize=18)
p <- par(mfrow=c(3,1))
par(bg = 'white')

# create the plots
alluvial(
its_strong[,1:6],
freq=its_strong$n,
col = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_strong$Financial.Risk == "0<10", "blue", "grey"),
border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_strong$Financial.Risk == "0<10", "blue", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_strong$n < 1
)
alluvial(
its_weak[,1:6],
freq=its_weak$n,
col = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_weak$Financial.Risk == "0<10", "chartreuse2", "grey"),
border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_weak$Financial.Risk == "0<10", "chartreuse2", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_weak$n < 1
)
alluvial(
its_no[,1:6],
freq=its_no$n,
col = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_no$Financial.Risk == "0<10", "red", "grey"),
border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" &
its_no$Financial.Risk == "0<10", "red", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_no$n < 1
)
dev.off()

Water Programming: A Collaborative Research Blog

Tips and tricks on programming, evolutionary algorithms, and doing research

Month: January 2017

Using Rhodium for RDM Analysis of External Dataset

Alluvial Plots