Re-evaluating solutions using Python subprocesses

Our series of Python examples continues!  If you’ve missed our previous posts, we have some tutorials in parts one and two, tips on setting up Python and Eclipse, cluster submission guides, and so forth.

Matt has been helping me get up to speed using Python.  He always told me that some processes are a lot easier in Python than in C++, and I suppose I didn’t believe him until recently.  Here is my first shot at a Python program that’s all my own, and I wanted to share it with you here.  The code is below, and some comments follow.  The group has recently started several GitHub code repositories!  You can find this code sample in my repository.

import re
from subprocess import Popen
from subprocess import PIPE

LINE_BUFFERED = 1

def main():

    #Define the command that you would like to run through the pipe.  This will typically be your
    #executable set up to work with MOEAFramework, and its associated arguments.  Specifically, here
    #we are working with the LRGV problem.
    cmd = ['./lrgvForMOEAFramework', '-m', 'std-io', '-c', 'combined', '-b', 'AllDecAll']

    #Verify the command
    print("The command to run is: %s" % cmd)

    #Use Popen to open a child process.  The standard in, out, and error streams are sent through pipes.
    #Note that we never read from stderr below, so a model that writes a lot of error output could
    #eventually fill that pipe; redirect stderr to a file if that is a concern for your executable.
    child = Popen(cmd, cwd='/home/joka0958/re-evaluator_2013-04-05/', bufsize=LINE_BUFFERED,
                  universal_newlines=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)

    #The current version of the model writes some lines to the console when it initializes.
    #When using this Python child process, we need to intentionally send and receive all output
    #(i.e., it doesn't automatically happen for us).  Here there are 3 initialization lines to catch:
    print("Reading initializer lines.")
    for i in range(3):
        line = child.stdout.readline()
        if line:
            print(line.rstrip())
        else:
            raise Exception("Evaluator died!")

    #Now we want to step through an existing Borg output file, which already contains decision variables
    #and objectives.  For each line we read in the decision variables from the file, and then evaluate
    #those decision variables in our external program.
    myFilename = "AllDecAllExperimentData.txt"
    with open(myFilename, 'r') as fp:
        for line in fp:
            if "#" in line:
                #This branch preserves the generation separators and header lines that appeared
                #in the original file.
                print(line.rstrip())
            else:
                #Read in all the variables on the line (this is both decisions and objectives).
                #The + in the regular expression collapses repeated separators, so we never try
                #to convert an empty string to a float.
                allVariables = [float(xx) for xx in re.split(r"[ ,\t]+", line.strip())]

                #Only keep what you want (here, the first 8 values are the decision variables)
                variables = allVariables[0:8]

                #We want to send the decision variables, separated by a space, and terminated by
                #a newline character
                decvarsAsString = ' '.join('%f' % v for v in variables) + '\n'

                #We send that string to the child process and catch the result.
                print("Sending to process")
                child.stdin.write(decvarsAsString)
                child.stdin.flush() #flush so that the program knows the line was sent

                #Now obtain the program's result
                print("Result:")
                outputLine = child.stdout.readline()
                print(outputLine.rstrip())

                #Since this is in a loop, it operates on every line in the input file.

    #Close the child's input to signal end-of-data, then wait for it to exit cleanly.
    child.stdin.close()
    child.wait()

if __name__ == "__main__":
    main()

Basically, your child process gets created at the beginning of this script and then hangs out, waiting for input. The cool thing about the way input and output are handled here is that you have complete control over what gets sent to the child program and what gets read back from it. You can also send multiple solutions to the program and read their output sequentially.

Python also gives you a lot of control over input and output formatting in a relatively simple fashion. Notice how we move back and forth between strings and floats effortlessly. I also like how the control structures (for loops, if/else statements) are straightforward and easy to read.
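For example, the round trip between text and floating-point values is just a couple of expressions:

line = "0.25 1.50 3.75\n"
values = [float(x) for x in line.split()]            #strings -> floats
lineOut = ' '.join('%f' % v for v in values) + '\n'  #floats -> string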

Feel free to adapt this code for your own purposes, and provide comments below!

Many Objective Robust Decision Making (MORDM): Concepts and Methods

This post provides an informal discussion of how to carry out the Many Objective Robust Decision Making (MORDM) procedure. The blog post was written by Jon Herman and Joe Kasprzyk. For the journal article describing MORDM, please click here.

Introduction

Numerical simulations of engineered systems define the relationship between decisions (inputs) and some measures of performance (objective values). The relationship between decisions and performance often depends on exogenous factors beyond the control of the decision maker, e.g., climate, economic variables, etc., which are liable to be highly uncertain. When such models account for uncertainty, they typically do so by calculating the expected value of performance under well-characterized probability distributions. They do not, however, account for deep uncertainty, where decision makers do not agree on the full set of risks to a system or their associated probabilities [1,2]. Robust Decision Making (RDM) is designed to address this challenge by identifying sets of decisions that perform well across a range of assumptions on deeply uncertain variables (i.e., decisions that are robust to uncertain states of the world).

This is an important distinction: by measuring performance across uncertain states of the world, RDM avoids the common problem of assigning probabilities to these outcomes. Instead, decision makers can explore which scenarios lead to vulnerabilities, and then determine a posteriori how likely these outcomes might be. Thus, RDM can shed light on two key questions:

  • Which deeply uncertain variables (and combinations thereof) are most responsible for changes in performance?
  • Which candidate solutions are most robust to these uncertain variables?

In our research, we have combined concepts from RDM and many-objective analysis to propose a new framework, Many Objective Robust Decision Making (MORDM). The MORDM process consists of four main steps: (1) problem formulation, (2) generating alternatives, (3) uncertainty analysis, and (4) scenario discovery and tradeoff analysis [3,4,5].

1. Problem Formulation

A “problem” in the context of RDM is defined by: exogenous uncertain variables, decision variables, a simulation model, and objective values. Following [6], these can be described with the acronym XLRM: uncertainties (X), decisions or “levers” (L), relationship between decisions and performance (R), and measures of performance (M).

Many of the existing applications that use the tools discussed on this blog will already have decision variables (levers), measures of performance, and a quantitative relationship or simulation. The new concept for creating MORDM analyses of these problems will be to identify a set of uncertain variables (X) that will collectively account for the primary exogenous sources of uncertainty in the system. The idea is to convert these concepts from the realm of deep uncertainty (i.e., stakeholders cannot agree on the full range of risks to the system) to a set of quantitative variables (creating an ensemble of feasible “states of the world” that describe uncertainties).

No two models will have the same set of uncertain variables, but here are some helpful guidelines:

  • Does the model contain variables that reflect future change? Is it possible that these values will be different than currently projected?
  • Does the model contain assumptions about the current state of the world that may not be correct? Many assumptions in the model will be well-defined from data, but others will likely be more suspect. It is worth exploring what impact these assumptions have on performance.
  • Are there any variables omitted from the model because they do not matter in the current state of the world, but which could become relevant in the future?

Again, this is not a definitive list; your set of uncertain variables will be specific to your application. Once you have the XLRM components defined, you can start the next step.
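If it helps to keep the four components straight, here is a minimal sketch of how the XLRM pieces might be written down in Python for a hypothetical reservoir problem. All of the variable names and ranges below are invented for illustration; substitute your own model's quantities.

#Hypothetical XLRM bookkeeping for a reservoir model (all names and ranges illustrative)
uncertainties = {                    # X: deeply uncertain variables and sampling ranges
    'Inflows':      (0.8, 1.2),     #multiplier on historical inflows
    'Evaporation':  (0.8, 1.2),     #multiplier on evaporation rates
    'DemandGrowth': (0.0, 0.05),    #annual demand growth rate
}

levers = ['reservoirSize', 'hedgingThreshold']        # L: decision variables

def relationship(decisions, state_of_the_world):      # R: the simulation model
    #Run your model here and return the performance measures
    raise NotImplementedError

measures = ['reliability', 'cost', 'vulnerability']   # M: performance measures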

2. Generating Alternatives

Alternatives are sets of model simulations (decisions and performance measures) of interest in the base state of the world. These are the solutions that will be subjected to the sources of uncertainty, X, defined above (this occurs later in Step #3). Different approaches exist for generating alternatives. Bryant and Lempert (2010) [7] propose a Latin hypercube sample over the decision variable space. Kasprzyk et al. (2013) [8] propose using a set of Pareto-approximate solutions found using a multi-objective evolutionary algorithm (MOEA) in an extension known as Many-Objective RDM. The MORDM approach confers several advantages: it allows the analysis of multiple performance objectives, and it ensures that decision makers are starting from a set of the best known solutions in the base state of the world. That is, the decision makers will be exploring the uncertainties associated with solutions that they would be likely to choose in the absence of RDM analysis.
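To make the first approach concrete, here is a minimal pure-Python Latin hypercube sampler over the decision variable space. The bounds are invented for illustration; in practice you could also use MOEAFramework's SampleGenerator, shown in Step 3 below.

import random

def latin_hypercube(bounds, num_samples):
    #Generate a Latin hypercube sample over the given (lower, upper) bounds.
    num_vars = len(bounds)
    samples = [[0.0] * num_vars for _ in range(num_samples)]
    for j, (lower, upper) in enumerate(bounds):
        #Divide [0,1) into num_samples strata, draw once per stratum, then shuffle
        #so the strata are paired randomly across dimensions
        strata = [(i + random.random()) / num_samples for i in range(num_samples)]
        random.shuffle(strata)
        for i in range(num_samples):
            samples[i][j] = lower + strata[i] * (upper - lower)
    return samples

#Illustrative bounds for two decision variables
alternatives = latin_hypercube([(0.0, 1.0), (10.0, 100.0)], num_samples=100)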

To generate alternatives using the MORDM approach, you will need to perform a multi-objective optimization on your problem. This has been covered in more detail elsewhere, but here are some links to get started. For software, see MOEAFramework and Borg; for documentation about these, see here, here, and here.

3. Uncertainty Analysis

Uncertainty analysis involves running the set of alternatives generated above through a range of states of the world defined by the deeply uncertain variables (X). These states of the world can be generated, for example, with a Latin hypercube sample of the uncertain variables. The following Bash example shows how to generate such a sample using MOEAFramework:

#!/bin/bash

JAVA_ARGS="-Xmx256m -classpath MOEAFramework-1.16-Executable.jar"
NUM_SAMPLES=10000
METHOD=latin
RANGES_FILENAME=RDMFactors.txt
OUTPUT_FILENAME=RDMSamples.txt
CSV_FILENAME=RDMSamples.csv

java ${JAVA_ARGS} org.moeaframework.analysis.sensitivity.SampleGenerator -m ${METHOD} -n ${NUM_SAMPLES} -p ${RANGES_FILENAME} -o ${OUTPUT_FILENAME}

# (Optional) The default output is space-separated. Convert it to a comma-separated file as follows:
sed 's/ /,/g' ${OUTPUT_FILENAME} > ${CSV_FILENAME}

This example generates 10,000 Latin hypercube samples of the variables defined in RDMFactors.txt, which contains the name, lower, and upper bound for each variable, like so:

Inflows 0.8 1.2
Evaporation 0.8 1.2
...

The uncertain variables should be explored over reasonable ranges of values, but should not be restricted to only those scenarios considered “possible”. By the definition of deep uncertainty, the system is likely to encounter scenarios previously considered impossible, so it is valuable to run the RDM analysis even for extreme scenarios. Remember, we’re running a series of “what-if” experiments, not trying to determine the most likely future scenario.

There is no requirement for how many samples to generate. The more uncertain variables you have, the more samples you will want to run to get good coverage of the space. The sample size used here (10,000) provides reasonably good coverage for experiments on the order of tens of variables.

Once you’ve generated your set of uncertain states of the world (stored in RDMSamples.txt above), run each alternative solution for the entire ensemble of states of the world. For example, if you generated 100 alternatives in Step #2 and an ensemble of 10,000 states of the world in this step, you will need to perform 100 * 10,000 = 1 million model evaluations. This will be trivial for some models and prohibitively expensive for others; adjust accordingly. Some model-specific modifications will be required to perform these evaluations. You’ll need to read in the uncertain variable values from RDMSamples.txt and the decision variables defined for your set of alternatives, and make sure these are assigned properly within the model. Depending on the complexity of your model, you may also need access to a computing cluster.
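Here is a minimal sketch of that bookkeeping in Python, assuming one alternative per line in a file called alternatives.txt (both that file name and the evaluate_model stand-in are hypothetical; in practice evaluate_model might wrap your simulation through a subprocess pipe, as in the first post above):

def evaluate_model(decisions, state_of_the_world):
    #Hypothetical wrapper around your simulation: set the decision variables and
    #uncertain factors, run the model, and return the list of performance measures.
    raise NotImplementedError

#Read the alternatives (decision variable sets) and the sampled states of the world
with open('alternatives.txt') as f:
    alternatives = [[float(x) for x in line.split()] for line in f]
with open('RDMSamples.txt') as f:
    states = [[float(x) for x in line.split()] for line in f]

#Evaluate every alternative in every state of the world, saving results as we go
with open('RDMResults.txt', 'w') as out:
    for i, alternative in enumerate(alternatives):
        for j, state in enumerate(states):
            objectives = evaluate_model(alternative, state)
            out.write('%d %d %s\n' % (i, j, ' '.join('%f' % v for v in objectives)))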

These model evaluations should output the performance measures calculated for each solution in each state of the world. Again, depending on the size of your experiment and the number of performance measures, this may be quite a bit of data. Make sure you save these outputs, either in files or a database, for the next step.

4. Scenario Discovery and Tradeoff Analysis

With our alternatives evaluated across all sampled states of the world, it’s now possible to address the two questions posed at the top of this post. First, which deeply uncertain variables, and combinations thereof, are most responsible for changes in performance? And second, which candidate solutions are most robust to these changes, and what visualization techniques can we use to identify them?

The first question can be answered using the process of scenario discovery [9,10], in which clustering analyses find combinations of uncertain variables that best predict a particular outcome defined in terms of performance measure thresholds. The outcome defined by these thresholds can be either good or bad, but typically it will reflect a critical vulnerability in the system. Following Kasprzyk et al. (2013), the MORDM approach allows these thresholds to be defined in terms of multiple objectives. Lempert et al. (2008) [11] compared different clustering approaches and favored the Patient Rule Induction Method (PRIM, [12]) for its ease of use and interactivity. PRIM works by identifying subregions of the uncertain variable space in which the performance thresholds are likely to be crossed. It returns which uncertainties are most likely to contribute to these vulnerabilities and, importantly, at which values this is likely to occur. An implementation of PRIM in the R language is freely available (Bryant, 2009).
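Whichever tool you use, the input to scenario discovery is a label for each sampled state of the world indicating whether the performance thresholds were crossed. A minimal sketch, assuming hypothetical threshold values and the RDMResults.txt layout from the sketch in Step 3:

#Label each evaluated state of the world as vulnerable (1) or acceptable (0) based
#on thresholds for two performance measures (both threshold values are hypothetical)
RELIABILITY_MIN = 0.98
COST_MAX = 5.0e6

labels = []
with open('RDMResults.txt') as f:
    for line in f:
        fields = [float(x) for x in line.split()]
        reliability, cost = fields[2], fields[3]   #first two columns are indices
        labels.append(1 if (reliability < RELIABILITY_MIN or cost > COST_MAX) else 0)

This label vector, paired with the sampled uncertain variable values, is exactly what PRIM takes as input.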

The second question, the selection of a robust solution, is a highly interactive process and thus cannot follow a concrete set of steps. Particularly in the case of MORDM, identifying a robust solution strongly depends on the ability to visualize data in multiple dimensions (see Kasprzyk et al., 2013 for examples). Ideally, a robust solution will have good performance in the base state of the world, as well as minimal deviation from that performance across the ensemble of sampled states of the world. It is not uncommon for the solutions with the best performance in the base state of the world to deviate badly elsewhere, since choosing them amounts to overfitting to the base state without considering deep uncertainties. The outcome of this analysis will be model-specific, however: some uncertain variables may not affect performance at all, while others may have major impacts.
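One simple way to quantify that robustness, offered here as a sketch rather than a prescribed metric, is the fraction of sampled states of the world in which a solution still meets its performance thresholds:

def robustness(objective_rows, meets_thresholds):
    #objective_rows: one list of performance measures per state of the world
    #meets_thresholds: function returning True if a row satisfies all thresholds
    satisfied = sum(1 for row in objective_rows if meets_thresholds(row))
    return satisfied / float(len(objective_rows))

#Illustrative data: require reliability >= 0.98 and cost <= 5.0e6 (hypothetical thresholds)
rows_for_solution = [[0.99, 4.2e6], [0.97, 4.8e6], [0.99, 5.4e6]]
score = robustness(rows_for_solution, lambda r: r[0] >= 0.98 and r[1] <= 5.0e6)
print(score)   #only 1 of the 3 states satisfies both thresholds here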

This has been a high-level overview of the concepts and methods related to RDM. For in-depth studies and example figures, please refer to the references below. Thanks for reading!

References:

[1] Knight, F.H. 1921. Risk, Uncertainty, and Profit. Houghton Mifflin, Boston, MA.

[2] Lempert, R.J. 2002. A new decision sciences for complex systems. Proceedings of the National Academy of Sciences 99, 7309-7313.

[3] See [2].

[4] Bryant, B.P., Lempert, R.J., 2010. Thinking inside the box: a participatory, computer-assisted approach to scenario discovery. Technological Forecasting and Social Change 77, 34-49.

[5] Kasprzyk, J.R., Nataraj, S., Reed, P.M., Lempert, R.J., 2013. Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software 42, 55-71.

[6] Lempert, R.J., Popper, S.W., Bankes, S.C., 2003. Shaping the Next One Hundred Years: New Methods for Quantitative, Long-term Policy Analysis. RAND, Santa Monica, CA.

[7] See [4].

[8] See [5].

[9] Lempert, R.J., Bryant, B.P., Bankes, S.C., 2008. Comparing algorithms for scenario discovery. Technical Report WR-557-NSF. RAND.

[10] Lempert, R.J., 2012. Scenarios that illuminate vulnerabilities and robust responses. Climatic Change.

[11] See [9].

[12] Friedman, J.H., Fisher, N., 1999. Bump hunting in high-dimensional data. Statistics and Computing 9, 123-143.