MORDM VII: Optimality, robustness, and reevaluation under deep uncertainty

In the previous MORDM post, we visualized the reference set of performance objectives for the North Carolina Research Triangle and conducted a preliminary multi-criterion robustness analysis using two criteria: (1) regional reliability should be at least 98%, and (2) regional restriction frequency should be not more than 20%. Using these metrics, we found that Pareto-optimality does not guarantee satisfactory robustness, a statement that is justified by showing that not all portfolios within the reference set satisfied the robustness criteria.

In this post, we will explain the differences between optimality and robustness, and justify the importance of robust optimization instead of sole reliance on a set of optimal solutions (aka an optimal portfolio). To demonstrate the differences better, we will also reevaluate the Pareto optimal set of solutions under a more challenging set of states of the world (SOWs), a method first used in Herman et al (2013, 2014) and Zeff et al (2014). The formal term for this method, Deeply Uncertain (DU) Re-evaluation was coined in a 2019 paper by Trindade et al.

Optimality vs robustness

The descriptions of optimality are many. From a purely technical perspective, a Pareto optimal set is a set of decision variables or solutions that maps to a Pareto front, or a set of performance objectives where improving one objective cannot be improved without causing performance degradation in another. For the purposes of this blog post, we shall use the definition of optimality as laid out by Beyer and Sendoff in their 2007 paper:

The global optimal design…depends on the…(objective) functions and constraints…however, these functions always represent models and/or approximations of the real world.

Beyer and Sendhoff (2007)

In other words, a Pareto reference set is only optimal within the bounds of the model it was generated from. This makes sense; models are only approximations of the real world. It is difficult and computationally expensive to have bounds on the degree of certainty to which the model optimum maps to the true optimum due to uncertainties driven by human action, natural variability, and incomplete knowledge. Optimization is static in relation to reality – the set of solutions found do not change with time, and only account for the conditions within the model itself. Any deviation from this set of solutions or unaccounted differences between the actual system and model may result in failure (Herman et al, 2015; Read et al 2014).

This is why searching the set of optimal solutions for robust solutions is important. Herman et al (2015) quotes an earlier works by Matalas and Fiering (1977) that defines robustness as the insensitivity a system’s portfolio to uncertainty. Within the MORDM context, robustness was found to be best defined using the multi-criterion satisficing robustness measure (Herman et al, 2015), which refers to the ability of a solution to meet one or more requirements (or criteria) set by the decision-makers when evaluated under a set of challenging scenarios. More information on alternative robustness measures can be found here.

In this blog post, we will begin to explore this concept of robustness by conducting DU Re-evaluation, where we will perform the following steps:

Generate a set of ROF tables from a more challenging set of SOWs

Recall that we previously stored our Pareto-optimal solution set in a .csv file names ‘NC_refset.csv’ (find the original Git Repository here). Now, we will write a quick Python script (called rof_tables_reeval.py in the Git Repository) using MPI4PY that will parallelize and speed up the ROF table generation and the bash script to submit the job. More information on parallelization using MPI4PY can be found in this handy blog post by Dave Gold.

First, create a Python virtual environment within the folder where all your sourcecode is kept and activate the virtual environment. I called mine python3_venv:

python3 -m venv python_venv
source python_venv/bin/activate

Next, install the numpy and mpi4py libraries:

pip install numpy mpi4py

Then write the Python script as follows:

# -*- coding: utf-8 -*-
"""
Created on Tues March 1 2022 16:16
@author: Lillian Bei Jia Lau
"""

from mpi4py import MPI
import numpy as np
import subprocess, sys, time
import os

# 5 nodes, 50 RDMs per node
# 16 tasks per node
# 5 RDMs per task
comm = MPI.COMM_WORLD
rank = comm.Get_rank() # up to 20 processes
print('rank = ', rank)

N_RDMs_needed = 100
N_REALIZATIONS = 100
N_RDM_PER_NODE = 20
N_TASKS_PER_NODE = 10 
N_RDM_PER_TASK = 2 # each task handles two RDMs
N_TASKS = 50 # rank ranges from 0 to 50

DATA_DIR = "/scratch/lbl59/blog/WaterPaths/"
SOLS_FILE_NAME = "NC_dvs_all_noheader.csv"
N_SOLS = 1

OMP_NUM_THREADS = 32

for i in range(N_RDM_PER_TASK):
    current_RDM = rank + (N_TASKS * i)

    command_gen_tables = "./waterpaths -T {} -t 2344 -r {} -d {} -C 1 -O rof_tables_reeval/rdm_{} -e 0 \
            -U TestFiles/rdm_utilities_test_problem_reeval.csv \
            -W TestFiles/rdm_water_sources_test_problem_reeval.csv \
            -P TestFiles/rdm_dmp_test_problem_reeval.csv \
            -s {} -f 0 -l {} -R {}\
            -p false".format(OMP_NUM_THREADS, N_REALIZATIONS, DATA_DIR, current_RDM, SOLS_FILE_NAME, N_SOLS, current_RDM)

    print(command_gen_tables)
    os.system(command_gen_tables)

comm.Barrier()

Before proceeding, a quick explanation on what all this means:

  • Line 12: We are parallelizing this job across 5 nodes on Cornell’s THECUBE (The Cube) computing cluster.
  • Lines 19-28: On each node, we are submitting 10 tasks to each of the 5 nodes requested. Each task, in turn, is handling 2 robust decision-making (RDM) multiplier files that scale up or scale down a hydroclimatic realization make a state of the world more (or less) challenging. In this submission, we are creating 400 different variations of each hydroclimatic scenario using the 400 RDM files, and running it across only one solution
  • Line 16 and 31: The ‘rank’ is the order of the tasks in which there are submitted. Since there are 10 tasks over 5 nodes, there will be a total of 50 tasks being submitted. Note and understand how the current_RDM is calculated.
  • Lines 32 to 37: This is the command that you are going to submit to The Cube. Note the -C and -O flags; a value of 1 for the -C flag tells WaterPaths to generate ROF tables, and the -O tells WaterPaths to output each ROF table file into a folder within rof_tables_reeval/rdm_{}for each RDM. Feel free to change the filenames as you see fit.

To accompany this script, first create the following folders: output, out_reeval, and rof_tables_reeval. The output folder will contain the simulation results from running the 1 solution across the 100 hydroclimatic realizations. The out_reeval folder will store any output or error messages such as script runtime.

Then, write the following bash submission script:

#!/bin/bash
#SBATCH -n 50 -N 5 -p normal
#SBATCH --job-name=rof_tables_reeval
#SBATCH --output=out_reeval/rof_tables_reeval.out
#SBATCH --error=out_reeval/rof_tables_reeval.err
#SBATCH --time=200:00:00
#SBATCH --mail-user=lbl59@cornell.edu
#SBATCH --mail-type=all

export OMP_NUM_THREADS=32

module load openmpi3/3.1.4
module spider py3-mpi4py
module spider py3-numpy/1.15.3

START="$(date +%s)"

mpirun python3 rof_tables_reeval.py

DURATION=$[ $(date +%s) - ${START} ]

echo ${DURATION}

You can find the bash script under the filename rof_table_gen_reeval.sh. Finally, submit the script using the following line:

sbatch ./rof_table_gen_reeval.sh

The run should take roughly 5 hours. We’re good for some time!

Re-evaluate your solutions (and possibly your life choices, while you’re at it)

Once the ROF tables are generated, it’s time to get a little hands-on with the underlying WaterPaths code. Navigate to the following PaperTestProblem.cpp file using:
cd /yourfilepath/WaterPaths/src/Problem/

Follow the next steps carefully.

  1. Delete PaperTestProblem.cpp and replace it with the file PaperTestProblem-reeval.cpp. It can be found in the the main Git Repository.
  2. Rename the latter file PaperTestProblem.cpp – it will be the new PaperTestProblem file that will be able to individually read each RDM scenario’s ROF tables.
  3. Re-make WaterPaths by calling make clean and then make gcc in the command line. This will ensure that WaterPaths has no problems running the new PaperTestProblem.cpp file.

Next, write the following Python script (called run_du_reeval.py in the Git repository):

# -*- coding: utf-8 -*-
"""
Created on Tues March 1 2022 16:16

@author: Lillian Bei Jia Lau
"""

from mpi4py import MPI
import numpy as np
import subprocess, sys, time
import os

N_RDMs_needed = 100  # N_TASKS_PER_NODE * N_RDM_PER_TASK * num nodes
N_REALIZATIONS = 100
N_RDM_PER_NODE = 20
N_TASKS_PER_NODE = 10 # rank ranges from 0 to 15
N_RDM_PER_TASK = 2 # each task handles five RDMs
N_TASKS = 50

comm = MPI.COMM_WORLD
rank = comm.Get_rank() # up to 20 processes
print('rank = ', rank)

DATA_DIR = "/scratch/lbl59/blog/WaterPaths/"
SOLS_FILE_NAME = "NC_dvs_all_noheader.csv"
N_SOLS = 69
OMP_NUM_THREADS = 32

for i in range(N_RDM_PER_TASK):
    current_RDM = rank + (N_TASKS * i)

    command_run_rdm = "./waterpaths -T {} -t 2344 -r {} -d {} -C -1 -O rof_tables_reeval/rdm_{} -e 0 \
            -U TestFiles/rdm_utilities_test_problem_reeval.csv \
            -W TestFiles/rdm_water_sources_test_problem_reeval.csv \
            -P TestFiles/rdm_dmp_test_problem_reeval.csv \
            -s {} -R {} -f 0 -l 69\
            -p false".format(OMP_NUM_THREADS, N_REALIZATIONS, DATA_DIR, \
                    current_RDM , SOLS_FILE_NAME, current_RDM)

    print(command_run_rdm)
    os.system(command_run_rdm)

comm.Barrier()

Note the change in the -C flag; its value is now -1, telling WaterPaths that it should import the ROF table values from the folder indicated by the -O flag. The resulting objective values for each RDM will be saved in the output folder we previously made.

The accompanying bash script, named du_reeval.sh is as follows:

#!/bin/bash
#SBATCH -n 50 -N 5 -p normal
#SBATCH --job-name=mordm_training_du_reeval
#SBATCH --output=out_reeval/mordm_training_du_reeval.out
#SBATCH --error=out_reeval/mordm_training_du_reeval.err
#SBATCH --time=200:00:00
#SBATCH --mail-user=lbl59@cornell.edu
#SBATCH --mail-type=all

export OMP_NUM_THREADS=32
module load openmpi3/3.1.4
module spider py3-numpy/1.15.3

START="$(date +%s)"

mpirun python3 run_du_reeval.py

DURATION=$[ $(date +%s) - ${START} ]

echo ${DURATION}

This run should take approximately three to four days. After that, you will have 1000 files containing 69 objective value sets resulting from running the 69 solutions across 1000 deeply-uncertain states of the world.

Summary

In this post, we defined optimality and robustness. We demonstrated how to run a DU re-evaluation across 100 challenging SOWs to observe how these ‘optimal’ solutions perform in more extreme scenarios. This is done to show that optimality is bound by current model states, and any deviation from the expected circumstances as defined by the model may lead to degradations in performance.

In the next blog post, we will be visualizing these changes in performance using a combination of sensitivity analysis, scenario discovery, and tradeoff analysis.

References

Beyer, H. and Sendhoff, B., 2007. Robust optimization – A comprehensive survey. Computer Methods in Applied Mechanics and Engineering, 196(33-34), pp.3190-3218.

Herman, J., Reed, P., Zeff, H. and Characklis, G., 2015. How Should Robustness Be Defined for Water Systems Planning under Change?. Journal of Water Resources Planning and Management, 141(10), p.04015012.

Herman, J., Zeff, H., Reed, P. and Characklis, G., 2014. Beyond optimality: Multistakeholder robustness tradeoffs for regional water portfolio planning under deep uncertainty. Water Resources Research, 50(10), pp.7692-7713.

Matalas, N. C., and Fiering, M. B. (1977). “Water-resource systems planning.” Climate, climatic change, and water supply, studies in geophysics, National Academy of Sciences, Washington, DC, 99–110.

Read, L., Madani, K. and Inanloo, B., 2014. Optimality versus stability in water resource allocation. Journal of Environmental Management, 133, pp.343-354.

Zeff, H., Kasprzyk, J., Herman, J., Reed, P. and Characklis, G., 2014. Navigating financial and supply reliability tradeoffs in regional drought management portfolios. Water Resources Research, 50(6), pp.4906-4923.

Debug in Real-time on SLURM

Debugging a code by submitting jobs to a supercomputer is an inefficient process. It goes something like this:

  1. Submit job and wait in queue
  2. Check for errors/change code
  3. (repeat endlessly until your code works)

Debugging in Real-Time:

There’s a better way to debug that doesn’t require waiting for the queue every time you want to check your code. On SLURM, you can debug in real-time like so:
  1. Request a debugging or interactive node and wait in queue
  2. Check for errors/change code continuously until code is fixed or node has timed out

Example (using Summit supercomputer at University of Colorado Boulder):

  1. Log into terminal (PuTTY, Cygwin, etc.)
  2. Navigate to directory where the file to be debugged is located using ‘cd’ command
  3. Load SLURM
    • $module load slurm
  4. Enter the ‘sinteractive’ command
    • $sinteractive
  5. Wait in line for permission to use the node (you will have a high priority with a debugging QOS so it shouldn’t take long)
  6. Once you are granted permission, the node is yours! Now you can debug to your hearts content (or until you run out of time).
I’m usually debugging shell scripts on Unix. If you want advice on that topic check out this link. I prefer the ‘-x’ command (shown below) but there are many options available.
Debugging shell scripts in Unix using ‘-x’ command: 
 $bash -x mybashscript.bash
Hopefully this was helpful! Please feel free to edit/comment/improve as you see fit.

Some ideas for your Bash submission scripts

I’ve been playing around with some design options for PBS submission scripts that may help people doing cluster work.  Some things to look for in the source code:

  • You can use a list in bash that contains multiple text entries, and then access those text entries to create strings for your submissions.  Note that you can actually display the text first (see the ‘echo ${PBS}’) before you do anything; that way you aren’t requesting thousands of jobs that have a typo in them!
  • Using “read” allows the bash programmer to interact with the user.  Well, in reality you are usually both the programmer and the user.  But lots of times, I want to write a script and try it out first, before I submit hundreds of hours of time on the cluster.  The flags below can help with that process.
  • I added commands to compile the source code before actually submitting the jobs.  Plus, by using flags and pauses intelligently, you can bail out of the script if there’s a problem with compilation.
#!/bin/bash
NODES=32
WALLHOURS=5

PROBLEMS=("ProblemA" "ProblemB")
NSEEDS=10
SEEDS=$(seq 1 ${NSEEDS}) #note there are multiple ways to declare lists and sequences in bash

NFES=1000000
echo "NFEs is ${NFES}" #echo statements can improve usability of the script, especially if you're modifying it a lot for various trials

ASSUMEPERMISSIONFLAG=No #This is for pausing the submission script later

echo "Compile? Y or N."

read COMPILEFLAG
if [ "$COMPILEFLAG" = "Y" ]; then
    echo "Cleaning.."
    make clean -f MakefileParallel
    echo "Compiling.."
    make -f MakefileParallel
else
        echo "Not compiling."
fi

for PROBINDEX in ${!PROBLEMS[*]}
do
    PROBLEM=${PROBLEMS[$PROBINDEX]} #note the syntax to pull a list member out here
    echo "Problem is ${PROBLEM}"

    for SEED in ${SEEDS}
    do
        NAME=${PROBLEM}_${SEED} #Bash is really nice for manipulating strings like this
        echo "Submitting: ${NAME}"

        #Here is the actual PBS command, with bash variables used in place of different experimental parameters.  Note the use of getopt-style command line parsing to pass different arguments into the myProgram executable.  This implementation is also designed for parallel processing, but it can also be used for serial jobs too.

        PBS="#PBS -l nodes=32\n\
        #PBS -N ${NAME}\n\
        #PBS -l walltime=05:00:00\n\
        #PBS -j oe\n\
        #PBS -o ${NAME}.out\n\
        cd \$PBS_O_WORKDIR\n\
        module load openmpi/intel\n\
        mpirun ./myProgram -b ${PROBLEM} -c combined -f ${NFES} -s ${SEED}"

        #The first echo shows the user what is about to be passed to PBS.  The second echo then pipes it to the command qsub, and actually submits the job.

        echo ${PBS}

        if [ "$ASSUMEPERMISSIONFLAG" = "No" ]; then

            echo "Continue submitting? Y or N."

            read SUBMITFLAG

            #Here, the code is designed to just keep going after the user says Y once.  You can redesign this for your own purposes.  Also note that this code is fairly brittle in that the user MUST say Y, not y or yes.  You can build that functionality into the if statements if you'd like it.

            if [ "$SUBMITFLAG" = "Y" ]; then
                 ASSUMEPERMISSIONFLAG=Yes #this way, the user won't be asked again
                 echo -e ${PBS} | qsub
                 sleep 0.5
                 echo "done."
            fi
        else
            echo -e ${PBS} | qsub
            sleep 0.5
            echo "done."
         fi
    done
done

Using linux “cut”

The following code takes a file that has 16 columns and outputs a file with 5 of those columns.  Some notes:

  • Don’t use PATH as a variable.  The program won’t work, because PATH is a system variable!
  • Note the C++ style syntax of the loop.  Versions of bash greater than 3.0 allow you to use curly brackets, like: for i in {1..50}.  But when you want to use variables inside the range, you have to do something else, such as my example.  Others are discussed here.
  • The star of this script is the ‘cut’ command.  d tells what delimiter you’d like.  f tells what fields you want to cut.
  • Then there are some simple commands around cut.  ‘cat’ displays the contents of the file.  Then, the | operator pipes the output of cat into the next command, which is cut.  Finally, you then use the > operator to direct the output of this command into a new file.
  • Save this file on the cluster or a Linux system as myFileNameHere.sh.  Then, to run the code, simply type “sh myFileNameHere.sh”

#!/bin/bash

# Cut out only the objective function values from the CBorg output files.

MYPATH=./output/
INPUT_NAME_BASE=CBorg_LRGV_
OUTPUT_NAME_ADDENDUM=_ObjOnly
EXTENSION=.out
START_COLUMN=9
FINISH_COLUMN=13
NUM_SEEDS=50

echo "Beginning..."
for ((I=1; I<=$NUM_SEEDS; I++));
do
 echo "Processing $I"
 cat ${MYPATH}${INPUT_NAME_BASE}${I}${EXTENSION} | cut -d ' ' -f ${START_COLUMN}-${FINISH_COLUMN} > ${MYPATH}${INPUT_NAME_BASE}${I}${OUTPUT_NAME_ADDENDUM}${EXTENSION}
done
echo "Totally done."