Using Rhodium for RDM Analysis of External Dataset

In my last blog post, I showed how to run an MORDM experiment using Rhodium. This process included the multi-objective optimization to an assumed state of the world (SOW) as well as the re-evaluation of the Pareto-approximate solutions on alternative SOWs, before using sensitivity and classification tools such as PRIM and CART for the scenario discovery analysis. However, there may be cases where you have to run the optimization and re-evaluation outside of Rhodium, for instance if your model is written in a programming language other than Python. There are two ways you can do this while still using Rhodium for the scenario discovery. The first option is to run the model through the Executioner. The second is to run the model separately and convert its output into the same format Rhodium generates, then import that for post-analysis. I will explain the second method here for the fish game described in my last post.

The first step is to read the decision variables and objectives from the optimization into 2D arrays. Then the uncertainties, levers and responses can be defined as before, except they no longer have to be associated with an object of the class 'Model'.


import json
import numpy as np
import matplotlib.pyplot as plt
from collections import OrderedDict
from rhodium import *

# read in output of optimization
variables = np.loadtxt('FishGame/FishGame.resultfile',usecols=[0,1])
objectives = np.loadtxt('FishGame/FishGame.resultfile',usecols=[2,3])

# make maximization objectives positive
maxIndices = [0]
objectives[:,maxIndices] = -objectives[:,maxIndices]

# define X of XLRM framework
uncertainties = [UniformUncertainty("a", 1.5, 4.0),
    UniformUncertainty("b0", 0.25, 0.67)]

# define L of XLRM framework
levers = [RealLever("vars", 0.0, 1.0, length=2)]

# define R of XLRM framework
responses = [Response("NPVharvest", Response.MAXIMIZE),
    Response("std_z", Response.MINIMIZE)]

Note: If you are interested in using Rhodium's plotting tools to visualize the results of the optimization, you can still make the uncertainties, levers and responses attributes of a model object. However, you will have to create a model function to instantiate the model. This is sloppy, but you can fake it by creating a function that takes in the decision variables and model parameters and returns the objective values without actually performing any calculations.


def fishGame(vars,
    a = 1.75, # rate of prey growth
    b0 = 0.6, # initial rate of predator growth
    F = 0, # rate of change of radiative forcing per unit time
    S = 0.5): # climate sensitivity

    # dummy model function: return placeholder objective values without
    # performing any calculations
    NPVharvest = None
    std_z = None

    return (NPVharvest, std_z)

model = Model(fishGame)

# define all parameters to the model that we will be studying
model.parameters = [Parameter("vars"),
    Parameter("a"),
    Parameter("b0"),
    Parameter("F"),
    Parameter("S")]

If using Rhodium for the optimization, this function would actually perform the desired calculation and Platypus could be used for the optimization. Since we have already performed the optimization, we just need to reformat its output into the structure used by Rhodium for the RDM analysis. This can be done by mimicking the output that would be returned by the function 'optimize'.


# find number of solutions
nsolns = np.shape(objectives)[0]

# properly format output of optimization
output = DataSet()
for i in range(nsolns):
    env = OrderedDict()
    offset = 0

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = variables[i,offset]
        else:
            env[lever.name] = list(variables[i,offset:offset+lever.length])

        offset += lever.length

    for j, response in enumerate(responses):
        env[response.name] = objectives[i,j]

    output.append(env)

# write output to file
with open("FishGame/FishGame_data.txt","w") as f:
    json.dump(output, f)

Next we need to read in the uncertain parameters that were sampled for the re-evaluation and format the results of the re-evaluation into the same format as would be output by calling 'evaluate' within Rhodium. Below is an example with the first solution (soln_index=0).

# read in LH samples of uncertain parameters and determine # of samples
LHsamples = np.loadtxt('FishGame/LHsamples.txt')
nsamples = np.shape(LHsamples)[0]

# load policy from optimization
soln_index = 0
policy = output[soln_index]

# load its objective values from re-evaluation and make maximization objectives positive
objectives = np.loadtxt('FishGame/MORDMreeval/FishGame_Soln' + str(soln_index+1) + '.obj')
objectives[:,maxIndices] = -objectives[:,maxIndices]

# convert re-evaluation output to proper format
results = DataSet()
for j in range(nsamples):
    env = OrderedDict()
    offset = 0

    for k, uncertainty in enumerate(uncertainties):
        env[uncertainty.name] = LHsamples[j,k]

    for k, response in enumerate(responses):
        env[response.name] = objectives[j,k]

    for lever in levers:
        if lever.length == 1:
            env[lever.name] = variables[soln_index,offset]
        else:
            env[lever.name] = list(variables[soln_index,offset:offset+lever.length])

        offset += lever.length

    results.append(env)

# write results to file
with open("FishGame/FishGame_Soln" + str(soln_index+1) + "_reeval.txt","w") as f:
    json.dump(results, f)

Finally, you have to define the metrics.


# calculate M of XLRM framework
metric = ["Profitable" if v["NPVharvest"] >= 3.0 else "Unprofitable" for v in results]

Then you can run PRIM and CART.  This requires defining the names, or ‘keys’, of the uncertain parameters. If you created a fake model object, you can pass ‘include=model.uncertainties.keys()’ to the functions Prim() and Cart(). If not, you have to create your own list of ‘keys’ as I do below.


# collect the names of the uncertain parameters
keys = [u.name for u in uncertainties]

# run PRIM and CART on metrics
p = Prim(results, metric, include=keys, coi="Profitable")
box = p.find_box()
box.show_details()
plt.show()

c = Cart(results, metric, include=keys)
c.print_tree(coi="Profitable")
c.show_tree()
plt.show()

The above code creates the following two figures.

[Figures: PRIM box details and CART tree]

If you had run the analysis using Sobol samples, you could use the SALib wrapper to calculate sensitivity indices and make bar charts or radial convergence plots of the results. (Note: My previous post did not show how to make these plots, but has since been updated. Check it out here.)


import numpy as np
import seaborn as sns
from rhodium import *
from SALib.analyze import sobol
from SALib.util import read_param_file

def _S2_to_dict(matrix, problem):
    result = {}
    names = list(problem["names"])
    for i in range(problem["num_vars"]):
        for j in range(i+1, problem["num_vars"]):
            if names[i] not in result:
                result[names[i]] = {}
            if names[j] not in result:
                result[names[j]] = {}

            result[names[i]][names[j]] = result[names[j]][names[i]] = float(matrix[i][j])

    return result

def get_pretty_result(result):
    pretty_result = SAResult(result["names"] if "names" in result else problem["names"])

    if "S1" in result:
        pretty_result["S1"] = {k : float(v) for k, v in zip(problem["names"], result["S1"])}
    if "S1_conf" in result:
        pretty_result["S1_conf"] = {k : float(v) for k, v in zip(problem["names"], result["S1_conf"])}
    if "ST" in result:
        pretty_result["ST"] = {k : float(v) for k, v in zip(problem["names"], result["ST"])}
    if "ST_conf" in result:
        pretty_result["ST_conf"] = {k : float(v) for k, v in zip(problem["names"], result["ST_conf"])}
    if "S2" in result:
        pretty_result["S2"] = _S2_to_dict(result["S2"], problem)
    if "S2_conf" in result:
        pretty_result["S2_conf"] = _S2_to_dict(result["S2_conf"], problem)
    if "delta" in result:
        pretty_result["delta"] = {k : float(v) for k, v in zip(problem["names"], result["delta"])}
    if "delta_conf" in result:
        pretty_result["delta_conf"] = {k : float(v) for k, v in zip(problem["names"], result["delta_conf"])}
    if "vi" in result:
        pretty_result["vi"] = {k : float(v) for k, v in zip(problem["names"], result["vi"])}
    if "vi_std" in result:
        pretty_result["vi_std"] = {k : float(v) for k, v in zip(problem["names"], result["vi_std"])}
    if "dgsm" in result:
        pretty_result["dgsm"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm"])}
    if "dgsm_conf" in result:
        pretty_result["dgsm_conf"] = {k : float(v) for k, v in zip(problem["names"], result["dgsm_conf"])}
    if "mu" in result:
        pretty_result["mu"] = {k : float(v) for k, v in zip(result["names"], result["mu"])}
    if "mu_star" in result:
        pretty_result["mu_star"] = {k : float(v) for k, v in zip(result["names"], result["mu_star"])}
    if "mu_star_conf" in result:
        pretty_result["mu_star_conf"] = {k : float(v) for k, v in zip(result["names"], result["mu_star_conf"])}
    if "sigma" in result:
        pretty_result["sigma"] = {k : float(v) for k, v in zip(result["names"], result["sigma"])}

    return pretty_result

# Read the parameter range file and Sobol samples 
problem = read_param_file('FishGame/uncertain_params.txt') 
param_values = np.loadtxt('FishGame/SobolSamples.txt') 

# Load the first solution 
Y = np.loadtxt('FishGame/SobolReeval/FishGame_Soln' + str(soln_index+1) + '.obj')

# Evaluate sensitivity to the first objective, NPVharvest 
obj_index = 0 
Si = sobol.analyze(problem, Y[:,obj_index], calc_second_order=True, conf_level=0.95, print_to_console=False) 
pretty_result = get_pretty_result(Si)

# Plot sensitivity
sns.set() 
fig1 = pretty_result.plot() 
fig2 = pretty_result.plot_sobol(threshold=0.01,groups={"Prey Growth Parameters" : ["a"], "Predator Growth Parameters" : ["b0"]})

[Figures: bar chart and radial convergence plot of Sobol sensitivity indices for solution 1, objective 1]

So don’t feel like you need to run your optimization and re-evaluation in Python in order to use Rhodium!

Rhodium – Open Source Python Library for (MO)RDM

Last year Dave Hadka introduced OpenMORDM (Hadka et al., 2015), an open source R package for Multi-Objective Robust Decision Making (Kasprzyk et al., 2013). If you liked the capabilities of OpenMORDM but prefer coding in Python, you’ll be happy to hear Dave has also written an open source Python library for robust decision making (RDM) (Lempert et al., 2003), including multi-objective robust decision making (MORDM): Rhodium.

Rhodium is part of Project Platypus, which also contains Python libraries for multi-objective optimization (Platypus), a standalone version of the Patient Rule Induction method (PRIM) algorithm (Friedman and Fisher, 1999) implemented in Jan Kwakkel’s EMA Workbench, and a cross-language automation tool for running models (Executioner). Rhodium uses functions from both Platypus and PRIM, as I will briefly show here, but Jazmin will describe Platypus in more detail in a future blog post.

Dave provides an IPython notebook with clear instructions on how to use Rhodium, with the lake problem given as an example. To give another example, I will walk through the fish game. The fish game is a chaotic predator-prey system in which the population of fish, x, and their predators, y, are co-dependent. Predator-prey systems are typically modeled by the classic Lotka-Volterra equations:

1) \frac{dx}{dt} = \alpha x - \beta x y

2) \frac{dy}{dt} = \delta x y - \gamma y

where α is the growth rate of the prey (fish), β is the rate of predation on the prey, δ is the growth rate of the predator, and γ is the death rate of the predator. This model assumes exponential growth of the prey, x, and exponential death of the predator. Based on a classroom exercise given at RAND, I modify the Lotka-Volterra model of the prey population for logistic growth (see the competitive Lotka-Volterra equations):

3) \frac{dx}{dt} = \alpha x - r x^2 - \beta x y

Discretizing equations 1 and 3 yields:

4) x_{t+1} = (\alpha + 1)x_t (1 - \frac{r}{\alpha + 1} x_t) - \beta x_t y_t and

5) y_{t+1} = (1 - \gamma)y_t + \delta x_t y_t

RAND simplifies equation 4 by letting a = α + 1, r/(α + 1) = 1 and β = 1, and simplifies equation 5 by letting b = 1/δ and γ = 1. This yields the following equations:

6) x_{t+1} = a x_t(1-x_t) - x_t y_t,

7) y_{t+1} = \frac{x_t y_t}{b}.
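
To see where equation 7 comes from, note that setting \gamma = 1 in equation 5 removes the (1 - \gamma)y_t term entirely, and substituting \delta = \frac{1}{b} into the remaining term gives y_{t+1} = \delta x_t y_t = \frac{x_t y_t}{b}; equation 6 follows directly from equation 4 with a = \alpha + 1, \frac{r}{\alpha + 1} = 1 and \beta = 1.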

In this formulation, the parameter a controls the growth rate of the fish and b controls the growth rate of the predators. The growth rate of the predators is dependent on the temperature, which is increasing due to climate change according to the following equation:

8) C \frac{dT}{dt} = (F_0 + Ft) - \frac{T}{S}

where C is the heat capacity, assumed to be 50 W/m2/K/yr, F0 is the initial value of radiative forcing, assumed to be 1.0 W/m2, F is the rate of change of radiative forcing, S is the climate sensitivity in units of K/(W/m2), and T is the temperature increase from equilibrium, initialized at 0. The dependence of b on the temperature increase is given by:

9) b = \text{max} \Bigg( b_0 e^{-0.3T},0.25 \Bigg).

The parameters a, b, F, and S could all be considered deeply uncertain, but for this example I will use (unrealistically optimistic) values of F = 0 and S = 0.5 and assume possible ranges for a and b0 of 1.5 < a < 4 and 0.25 < b0 < 0.67. Within these bounds, different combinations of a and b parameters can lead to point attractors, strange attractors, or collapse of the predator population.
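
As a preview of how these climate dynamics will be simulated in the model code later in this post, a forward-Euler step of equation 8 followed by the update of b from equation 9 can be sketched as below; the function name and packaging are mine, and the full model embeds these two lines directly in its time loop.

import numpy as np

# one forward-Euler step of equation 8, C*dT/dt = (F0 + F*t) - T/S,
# followed by the temperature-dependent predator growth parameter of equation 9
def step_climate(T, t, b0, F=0.0, S=0.5, F0=1.0, C=50.0):
    T = T + (F0 + F*t - T/S)/C
    b = max(b0*np.exp(-0.3*T), 0.25)
    return T, b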

The goal of the game is to design a strategy for harvesting some number of fish, z, at each time step, assuming that only the fish population can be observed, not the predator population. The population of the fish then becomes:

10) x_{t+1} = a x_t(1-x_t) - x_t y_t - z_t

For this example, I assume the user employs a strategy of harvesting some weighted average of the fish population in the previous two time steps:

11) z_t = \begin{cases}  \text{min} \Bigg( \alpha\beta x_t + \alpha(1-\beta)x_{t-1},x_{t} \Bigg),  t \geq 2\\  \alpha\beta x_t, t = 1  \end{cases}

where 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1 are the weights of the harvesting strategy (distinct from the Lotka-Volterra parameters above). The user is assumed to have two objectives: 1) to maximize the net present value of their total harvest over T time steps, and 2) to minimize the standard deviation of their harvests over T time steps:

12) Maximize: NPV = \sum^T_{t=1} 1.05^{-t} z_t

13) Minimize: s_z = \sqrt{\frac{1}{T-1} \sum^T_{t=1} (z_t - \bar{z})^2}.

As illustrated in the figure below, depending on the values of a and b0, the Pareto-optimal sets for each “future” (each with initial populations of x0 = 0.48 and y0 = 0.26) can have very different shapes and attainable values.

Future    1      2      3      4      5
a         1.75   1.75   3.75   3.75   2.75
b0        0.6    0.3    0.6    0.3    0.45

[Figure: Pareto sets for the five futures]

For this MORDM experiment, I first optimize to an assumed state of the world (SOW) in which a = 1.75 and b0 = 0.6. To do this, I have to write a function that takes in the decision variables for the optimization problem as well as any potentially uncertain model parameters, and returns the objectives. Here the decision variables are represented by the vector ‘vars’, the uncertain parameters are passed at default values of a = 1.75, b0 = 0.6, F = 0 and S = 0.5, and the returned objectives are NPVharvest and std_z.


import os
import math
import json
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import brentq as root
from rhodium import *
from rhodium.config import RhodiumConfig
from platypus import MapEvaluator

RhodiumConfig.default_evaluator = MapEvaluator()

def fishGame(vars,
    a = 1.75, # rate of prey growth
    b0 = 0.6, # initial rate of predator growth
    F = 0, # rate of change of radiative forcing per unit time
    S = 0.5): # climate sensitivity

    # Objectives are:
    # 1) maximize (average NPV of harvest) and 
    # 2) minimize (average standard deviation of harvest)
    # x = population of prey at time 0 to t
    # y = population of predator at time 0 to t
    # z = harvested prey at time 1 to t

    tSteps = 100
    x = np.zeros(tSteps+1)
    y = np.zeros(tSteps+1)
    z = np.zeros(tSteps)

    # initialize predator and prey populations
    x[0] = 0.48
    y[0] = 0.26

    # Initialize climate parameters
    F0 = 1
    C = 50
    T = 0
    b = max(b0*np.exp(-0.3*T),0.25)

    # find harvest at time t based on policy
    z[0] = harvest(x, 0, vars)

    #Initialize NPV of harvest
    NPVharvest = 0

    for t in range(tSteps):
        x[t+1] = max(a*x[t]*(1-x[t]) - x[t]*y[t] - z[t],0)
        y[t+1] = max(x[t]*y[t]/b,0)
        if t < tSteps-1:
            z[t+1] = harvest(x, t+1, vars)

        NPVharvest = NPVharvest + z[t]*(1+0.05)**(-(t+1))

        #Calculate next temperature and b values
        T = T + (F0 + F*(t+1) - (1/S)*T)/C
        b = max(b0*np.exp(-0.3*T),0.25)

    # Calculate minimization objective after the simulation
    std_z = np.std(z)

    return (NPVharvest, std_z)

def harvest(x, t, vars):
    if t > 0:
        harvest = min(vars[0]*vars[1]*x[t] + vars[0]*(1-vars[1])*x[t-1],x[t])
    else:
        harvest = vars[0]*vars[1]*x[t]

    return harvest

Next, the model class must be defined, as well as its parameters, objectives (or “responses”) and whether they need to be minimized or maximized, decision variables (or “levers”) and uncertainties.


model = Model(fishGame)

# define all parameters to the model that we will be studying
model.parameters = [Parameter("vars"), Parameter("a"), Parameter("b0"), Parameter("F"), Parameter("S")]

# define the model outputs
model.responses = [Response("NPVharvest", Response.MAXIMIZE), Response("std_z", Response.MINIMIZE)]

# some parameters are levers that we control via our policy
model.levers = [RealLever("vars", 0.0, 1.0, length=2)]

# some parameters are exogeneous uncertainties, and we want to better
# understand how these uncertainties impact our model and decision making
# process
model.uncertainties = [UniformUncertainty("a", 1.5, 4.0), UniformUncertainty("b0", 0.25, 0.67)]

The model can then be optimized using a multi-objective evolutionary algorithm (MOEA) in Platypus, and the output written to a file. Here I use NSGA-II.


output = optimize(model, "NSGAII", 100)
with open("data.txt", "w") as f:
    json.dump(output, f)

The results can be easily visualized with simple commands. The Pareto sets can be plotted with 'scatter2d' or 'scatter3d', both of which allow brushing on one or more objective thresholds. Here I first brush on solutions with a NPV of harvest ≥ 1.0, and then add a condition that the standard deviation of harvest be ≤ 0.01.


# Use Seaborn settings for pretty plots
sns.set()

# Plot the points in 2D space
scatter2d(model, output)
plt.show()

# The optional interactive flag will show additional details of each point when
# hovering the mouse
# Most of Rhodium's plotting functions accept an optional expr argument for
# classifying or highlighting points meeting some condition
scatter2d(model, output, x="NPVharvest", brush=Brush("NPVharvest >= 1.0"))
plt.show()

scatter2d(model, output, brush="NPVharvest >= 1.0 and std_z <= 0.01")
plt.show()

The above code creates the following images:

[Figures: 2D scatter plot of the Pareto set, unbrushed and with the two brushing conditions]

Rhodium can also plot kernel density estimates of the solutions, or of those attaining certain objective values.


# Kernel density estimation plots show density contours for samples. By
# default, it will show the density of all sampled points
kdeplot(model, output, x="NPVharvest", y="std_z")
plt.show()

# Alternatively, we can show the density of all points meeting one or more
# conditions
kdeplot(model, output, x="NPVharvest", y="std_z", brush=["NPVharvest >= 1.0", "std_z <= 0.01"], alpha=0.8)
plt.show()

[Figures: kernel density plots, unbrushed and brushed]

Scatterplots of all pairwise objective combinations can also be plotted, along with histograms of the marginal distribution of each objective illustrated in the pairwise scatterplots. These can also be brushed by objective thresholds specified by the user.


# Pairwise scatter plots show 2D scatter plots for all outputs
pairs(model, output)
plt.show()

# We can also highlight points meeting one or more conditions
pairs(model, output, brush=["NPVharvest >= 1.0", "std_z <= 0.01"])
plt.show()

# Joint plots show a single pair of parameters in 2D, their distributions using
# histograms, and the Pearson correlation coefficient
joint(model, output, x="NPVharvest", y="std_z")
plt.show()

[Figures: pairwise scatter plots, brushed pairwise scatter plots, and joint plot]

Finally, tradeoffs can also be viewed on parallel axes plots, which can also be brushed on user-specified objective values.


# A parallel coordinates plot to view interactions among responses
parallel_coordinates(model, output, colormap="rainbow", zorder="NPVharvest", brush=Brush("NPVharvest > 1.0")) 
plt.show()

[Figure: parallel coordinates plot brushed on NPVharvest > 1.0]

But the real advantage of Rhodium is not visualization but uncertainty analysis. First, PRIM can be used to identify “boxes” best describing solutions meeting user-specified criteria. I define solutions with a NPV of harvest ≥ 1.0 as profitable, and those below unprofitable.


# The remaining figures look better using Matplotlib's default settings
mpl.rcdefaults()

# We can manually construct policies for analysis. A policy is simply a Python
# dict storing key-value pairs, one for each lever.
#policy = {"vars" : [0.02]*2}

# Or select one of our optimization results
policy = output[8]

# construct a specific policy and evaluate it against 1000 states-of-the-world
SOWs = sample_lhs(model, 1000)
results = evaluate(model, update(SOWs, policy))
metric = ["Profitable" if v["NPVharvest"] >= 1.0 else "Unprofitable" for v in results]

# use PRIM to identify the key uncertainties if we require NPVharvest >= 1.0
p = Prim(results, metric, include=model.uncertainties.keys(), coi="Profitable")
box = p.find_box()
box.show_details()
plt.show()

This will first show the smallest box with the greatest density but lowest coverage.

[Figure: smallest PRIM box, with the greatest density but lowest coverage]

Clicking on “Back” will show the next largest box with slightly lower density but greater coverage, while “Next” moves in the opposite direction. In this case, since the smallest box is shown, “Next” moves full circle to the largest box with the lowest density, but greatest coverage, and clicking “Next” from this figure will start reducing the box size.

[Figure: a larger PRIM box, with lower density but greater coverage]
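
If you prefer inspecting the peeling/pasting trajectory programmatically instead of clicking through it, the standalone PRIM package that Rhodium builds on also exposes a coverage-versus-density tradeoff plot (treat this call as an assumption about its API):

# assumed API of the standalone PRIM package: plot coverage vs. density
# for every box on the peeling/pasting trajectory
box.show_tradeoff()
plt.show()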

Classification And Regression Trees (CART; Breiman et al., 1984) can also be used to identify hierarchical conditional statements classifying successes and failures in meeting the user-specified criteria.

# use CART to identify the key uncertainties
c = Cart(results, metric, include=model.uncertainties.keys())
c.print_tree(coi="Profitable")
c.show_tree()
plt.show()

[Figure: CART tree]

Finally, Dave has wrapped Rhodium around Jon Herman’s SALib for sensitivity analysis. Here’s an example of how to run the Method of Morris.


# Sensitivity analysis using Morris method
print(sa(model, "NPVharvest", policy=policy, method="morris", nsamples=1000, num_levels=4, grid_jump=2))

[Output: printed Morris sensitivity indices]

You can also create tornado and spider plots from one-at-a-time (OAT) sensitivity analysis.


# oat sensitivity
fig = oat(model, "NPVharvest", policy=policy, nsamples=1000)

[Figures: OAT tornado and spider plots]

Finally, you can visualize the output of Sobol sensitivity analysis with bar charts of the first and total order sensitivity indices, or as radial plots showing the interactions between parameters. In these plots the filled circles on each parameter represent their first order sensitivity, the open circles their total sensitivity, and the lines between them the second order indices of the connected parameters. You can even create groups of similar parameters with different colors for easier visual analysis.

Si = sa(model, "NPVharvest", policy=policy, method="sobol", nsamples=1000, calc_second_order=True)
fig1 = Si.plot()
fig2 = Si.plot_sobol(threshold=0.01)
fig3 = Si.plot_sobol(threshold=0.01,groups={"Prey Growth Parameters" : ["a"],
            "Predator Growth Parameters" : ["b0"]})

[Figures: Sobol indices bar chart, radial plot, and grouped radial plot]

As you can see, Rhodium makes MORDM analysis very simple! Now if only we could reduce uncertainty…

Works Cited

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone (1984). Classification and Regression Trees. Wadsworth.

Friedman, J. H. and N. I. Fisher (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9(2), 123-143.

Hadka, D., Herman, J., Reed, P., and Keller, K. (2015). An open source framework for many-objective robust decision making. Environmental Modelling & Software, 74, 114-129.

Kasprzyk, J. R., S. Nataraj, P. M. Reed, and R. J. Lempert (2013). Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software, 42, 55-71.

Lempert, R. J., S. W. Popper, and S. C. Bankes (2003). Shaping the Next One Hundred Years: New Methods for Quantitative, Long-Term Policy Analysis. RAND Corporation.

Announcing version 1.0 of pareto.py

Jon and I are pleased to announce version 1.0 of pareto.py, the free, open-source ε-nondominated sorting utility. Find the ε-nondominated solutions in your data today! ε-nondominated sorting lets you specify the resolution of your objectives to obtain a set of meaningfully distinct efficient points in your data set. (Confused? See this post for more context.)

[Figure: ε-nondominated sorting. Red ε-boxes are dominated, yellow solutions are ε-nondominated.]

  • available on GitHub
  • licensed under LGPL
  • pure Python implementation with minimal dependencies (only standard library, does not require numpy)
  • ε-nondominated sorting
  • “importable”: use pareto.py from other Python scripts (see the sketch after this list)
  • “pipelineable”: include pareto.py in a Unix pipeline
  • tag-along variables: columns (even non-numeric ones) other than those to be sorted may be retained in the output
  • verbatim transcription of input: nondominated rows in the output appear exactly the same as they did in the input
  • flexible column specification: e.g. 0-5 6 10-7
  • mix minimization and maximization objectives
  • skip header rows, commented rows, and blank lines
  • annotate each row in the output with the file it came from, including stdin
  • add line numbers to annotations
  • index columns from the end of the row, allowing rows of varying lengths to be sorted together
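
For example, here is a minimal sketch of the “importable” interface, assuming the eps_sort function described in the project README:

# minimal sketch of the importable interface; assumes pareto.py is on the
# Python path and exposes eps_sort as described in the README
import pareto

# rows of [objective 0, objective 1, tag-along id]
table = [[1.0, 5.0, 'a'],
         [2.0, 2.0, 'b'],
         [1.1, 4.9, 'c'],
         [4.0, 1.0, 'd']]

# sort on columns 0 and 1 (minimized by default) at a resolution of 0.5 each
nondominated = pareto.eps_sort([table], objectives=[0, 1], epsilons=[0.5, 0.5])
for row in nondominated:
    print(row)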

(source code for the impatient)

(previous announcement: version 0.2)

Announcing pareto.py: a free, open-source nondominated sorting utility

Jon Herman and I are pleased to announce version 0.2 of pareto.py, a free, open-source nondominated sorting utility. Released under the LGPL, pareto.py performs a nondominated sort on the data in any number of input files. To the best of our knowledge, this is the only standalone nondominated sorting utility available.

Implemented using only the Python standard library, pareto.py features:

  • epsilon-nondominated sorting
  • “importable” design: can be used from other Python scripts
  • tag-along variables: columns (even non-numeric ones) other than those to be sorted may be retained in the output
  • verbatim transcription of input: nondominated rows in the output appear exactly the same as they did in the input
  • optionally skip header rows, commented rows, and blank lines

We welcome feature requests, bug reports, and pull requests. If you’re in a hurry, here is the source code for version 0.2.

(Edit: bump to version 0.2)

“The Problem” is the Problem Formulation! Definitions and Getting Started

Based on a conversation I had with Eric Simmons, I’d like to make a few comments about some basic terminology you need to think about when starting out in the group.  One of the most important things you can learn early on is how to frame “the problem”.  This requires a set of terms that we’re fairly precise about, so it’s good to know them in detail.  Here goes:

Scope of the Problem

What is it that you are trying to solve?  Let’s use the example of acid mine drainage.  We all know that acid mine drainage is a “problem” per se, but it’s important to define the scope of the problem when putting it in the context of many-objective analysis.  Generally speaking, we want to answer:

  • What needs to be done?: In our hypothetical example, we may want to prevent future acid mine drainage from occurring, or remediate some environmental damages that have already taken place.
  • Who is going to do it?: We must think about the decision makers actually making the decision.  For our example, a governmental agency that is funding the work may have a different set of priorities (and a different level of decisions) than an engineering firm hired to carry out the technical work.  Define the stakeholders, those who stand to gain or lose in the process.  Do the decision makers have to keep the interests of the stakeholders in mind? For example, the governmental body that makes a decision about how to fund an acid mine drainage project may have to think about the people who own property around the affected site.
  • What are the limits of the analysis?: In fluid mechanics, we draw a “control volume” around a system, which indicates the area that we’d like to study.  Similarly, are there limitations to what we are allowed to do in the system?  Perhaps the best option is to move everyone out of the affected area, but this may not be feasible.

Analysis Components

The paradigm of many-objective analysis allows us to generate alternatives for a system, evaluate them with respect to multiple performance outcomes, and visualize the results to negotiate a final solution, if needed.  But we’re getting ahead of ourselves!  When formulating a problem, we have to think about the following analysis components:

Decision Variables

Decision variables answer the question: How is it that I am going to do the task at hand?  Let’s think about acid mine drainage.  In our “scope” section we discussed that we might want to prevent the problem, or remediate a problem that has already occurred.  For the purpose of remediation, perhaps we want to design a new system that can pump out the contaminated water, and treat it offsite.

Our actions are encoded in decision variables within our problem.  Decision variables for this problem may be the pumping rate, the location of wells that are used to pump the water, or a treatment chemical that can be added to the water.  Each of these decision variables should be coded in some way — typically as real or integer-valued numbers that can be modified.

Decision variables are also called levers in the policy analysis literature.  It’s a nice way to think of them: if I push a lever, something happens.  In the realm of public policy, a lever may be something like the interest rate.  The federal reserve changes the interest rate in the hopes of affecting the economy in some way — in an engineering system, you may design the diameter of a pipe in a certain way as to influence the performance of your system.
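
For example, using the Rhodium library discussed earlier on this blog, the acid mine drainage levers might be encoded like this (the names, ranges, and categories are purely illustrative):

from rhodium import RealLever, IntegerLever, CategoricalLever

# purely illustrative encoding of acid mine drainage decision variables
levers = [RealLever("pumping_rate", 0.0, 500.0),   # continuous pumping rate
          IntegerLever("well_location", 0, 9),     # index of a candidate well site
          CategoricalLever("treatment_chemical", ["lime", "soda ash", "none"])]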

Objectives

We’ve thought about what may change in our system or design, but we need ways of measuring the performance of the system.  In our acid mine drainage system, it could be the cost of the remediation system, the reduction of contaminants over time, and the disturbance to the landowners around the project area.

These evaluation objectives most likely conflict with each other: the “do nothing” strategy has optimal cost and no disturbance around the project area, but it has poor performance as far as reducing contaminants.  In contrast, systems that have great contaminant reduction do so at high costs and high disruption to the landowners.  Multiobjective evolutionary algorithms give us high-quality alternatives that are “nondominated” with respect to multiple objectives: solutions for which no other solution is at least as good in every objective and strictly better in at least one (see the sketch at the end of this subsection).  In our example, it is the lowest cost at each level of contaminant reduction and land disturbance (or conversely, the highest level of contaminant reduction at each cost and land disturbance level).  Finding these solutions is great for the decision makers: they can see good levels of performance while deciding how much to spend, how much disturbance to allow, and so forth.

In the policy analysis literature, objectives are also called “measures” — how do you measure your performance?
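
To make the definition concrete, here is a minimal dominance check for minimization objectives (purely illustrative, not part of any library):

# a dominates b if it is at least as good in every (minimized) objective
# and strictly better in at least one
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# two conflicting [cost, contaminant] outcomes: neither dominates the other,
# so both are nondominated
print(dominates([1.0, 0.9], [2.0, 0.5]))  # False
print(dominates([2.0, 0.5], [1.0, 0.9]))  # False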

Constraints

Often, we want to set a limit on the values of objectives to consider.  For example, there may be a regulatory limit on how poor the performance of a remediation system can be.  Or, we may have other targets that need to be met, such as resource limitations.  These can be encoded as constraints in our problem, as sketched below.
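
In Rhodium, for instance, such a limit can be attached to a model as a constraint expression over the model's outputs (the response name and threshold here are illustrative):

from rhodium import Constraint

# illustrative regulatory limit on a model response named "contaminant_reduction"
model.constraints = [Constraint("contaminant_reduction >= 0.8")]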

Simulation, or the Relationship Between Actions and Outcomes

Now that we have our analysis components in place, we need to figure out a way to calculate the actual results.  This is something that’s often glossed over in classical formulations.

Let’s think about a linear program for resource allocation.  One might say: Maximize the net profit from making widgets A and B, such that resource limitations are met.  Here, the relationship between “actions” and “outcomes” is simple: a linear equation maps the actions (the number of A and B that are manufactured) with the outcome (the amount of profit made).  There are some clear limitations with this, though:

  • Are there economies of scale? That is, is it cheaper to make the 10,000th unit of A than the 1st unit of A?
  • Are there non-linear effects for making the products? Maybe the workers get tired after making their 15th unit of A, and the quality suffers.
  • The optimal mix of A and B assumes that some resources are “tight” and others have quantities that are unused.  Milan Zeleny explored this in great detail, showing that the optimal mix changes depending on modifying your resource constraints.
  • Classical optimization solvers can only solve problems that have certain properties — e.g., linear, convex, and so forth
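
To make the widget problem above concrete, here is a minimal sketch with made-up profit and resource numbers, using scipy:

from scipy.optimize import linprog

# made-up data: maximize profit from widgets A and B under two resource limits
profit = [40, 30]          # profit per unit of A and B
use = [[1, 2],             # resource 1 consumed per unit of A and B
       [3, 1]]             # resource 2 consumed per unit of A and B
avail = [40, 60]           # available amounts of resources 1 and 2

# linprog minimizes, so negate the profit vector to maximize
res = linprog([-p for p in profit], A_ub=use, b_ub=avail, bounds=[(0, None)]*2)
print(res.x, -res.fun)     # optimal production mix and maximum profit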

A much better approach is to link a trusted simulation of the system to a competent solver such as a multiobjective evolutionary algorithm.  While these solvers make no claim of “global optimality,” the challenges above show how much the classical approach can limit the usefulness of its solutions in real policy recommendations.  Each design is fully evaluated inside a simulation, and you can trust that the fidelity of the objectives or measures that come out of the system matches what the simulation would calculate as the outcomes.

In general, the policy analysis literature would call this a “relationship” between actions and outcomes.  If we don’t have a good simulation model, it may be a survey-based collection of experts’ opinions about the system. Even in situations like this, though, we can still use this data within a search procedure to recommend non-dominated policies or designs.