Simple Bash shell scripts that have made my life easier

I’ve recently been using Bash shell scripts to make my workflow on Linux systems more efficient, and I thought I would share some of them here. I’m fairly new to Linux, so this post is not meant to be a comprehensive guide to writing shell scripts; rather, I hope the scripts in this post can serve as examples for those who are also learning Linux and are unsure where or how to start writing shell scripts. I didn’t write any of these from scratch: most of the scripts are based on files shared with me by group members Julie Quinn, Bernardo Trindade and Jazmin Zatarian Salazar. If you’re interested in learning more about any of the commands used in these scripts, I’ve put some references I found useful at the end of this post. If you’re more experienced in writing shell scripts, please feel free to put tips or suggestions in the comments.

1. A simple script for making directories

For my research I’m processing the results of a Monte Carlo simulation for several solutions found through multi-objective search, and I needed to make folders in several locations to store the output from each solution. My first instinct was to make each directory separately with the mkdir command, but this quickly got tedious. Instead, I used a Bash script to loop through all the solution numbers and create a new directory for each. For more on using loops in Bash, check out this reference.

#!/bin/bash

# This script will create directories named "Solution_*" for
# a set of numbered solutions

# specify solution numbers
SOLUTIONS=('162' '1077' '1713' '1725' '1939' '2191' '2290' '2360')

# create a variable to store the string "Solution_"
DIRECTORY="Solution_" 

# loop over solution numbers
for i in "${SOLUTIONS[@]}"
do
    # create a separate directory for each solution
    # (-p avoids an error if the directory already exists)
    mkdir -p "${DIRECTORY}${i}"
done
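If you prefer to skip the explicit loop, Bash’s pattern substitution can add the prefix to every element of the array at once. This is a sketch of an alternative, not how the script above works: "${SOLUTIONS[@]/#/Solution_}" replaces the empty match at the start (#) of each element with "Solution_", producing one directory name per solution.

```shell
#!/bin/bash
# Alternative sketch: create Solution_162, Solution_1077, ... without a loop.

SOLUTIONS=('162' '1077' '1713' '1725' '1939' '2191' '2290' '2360')

# "${SOLUTIONS[@]/#/Solution_}" expands to one word per element, each with
# "Solution_" prepended (# anchors the pattern at the start of the element).
# mkdir -p does not fail if a directory already exists, so re-running is safe.
mkdir -p "${SOLUTIONS[@]/#/Solution_}"
```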

2. Calling a Java function and saving the output

The MOEA Framework is a Java library with all sorts of useful tools. I used it to generate 1024 Latin hypercube samples across a given range for each of the 8 solutions mentioned above. A shell script makes it easy to set up the arguments the MOEA Framework needs, call the Java tool and save the output in your desired file format. The MOEA Framework’s sample generator writes a space-separated .txt file; this script uses the “sed” command to replace each space with a comma and save the result as a .csv file. More on “sed” can be found in the references at the end of this post.

#!/bin/bash
# this shell script will call the MOEA framework's Latin Hypercube
# Sampling tool to create 1024 samples from a set of
# prespecified ranges for each of 8 solutions

# create variables to store Java arguments
JAVA_ARGS="-Xmx1g -classpath MOEAFramework-1.16-Executable.jar"
NUM_SAMPLES=1024
METHOD=latin

# these are the solutions we will create samples from
SOLUTIONS=('162' '1077' '1713' '1725' '1939' '2191' '2290' '2360')

# loop through solutions
for i in ${SOLUTIONS[@]}
do
    # define names for input (ranges) and output file names
    RANGES_FILENAME=${i}ranges.txt
    OUTPUT_FILENAME=Solution${i}_Samples.txt
    CSV_FILENAME=Solution${i}_Samples.csv

    # Call MOEA framework from JAVA using specified arguments to
    # create LHS Samples, specify OUTPUT_FILENAME as output
    # JAVA_ARGS is left unquoted on purpose so it splits into separate options
    java ${JAVA_ARGS} org.moeaframework.analysis.sensitivity.SampleGenerator -m ${METHOD} -n ${NUM_SAMPLES} -p "${RANGES_FILENAME}" -o "${OUTPUT_FILENAME}"

    # Use the sed command to create a new comma-separated values file
    # from the original output .txt file (each space becomes a comma)
    sed 's/ /,/g' "${OUTPUT_FILENAME}" > "${CSV_FILENAME}"

    # remove the intermediate .txt file
    rm "${OUTPUT_FILENAME}"
done
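The sed command above replaces every single space with a comma, which works as long as columns are separated by exactly one space. If a file might contain runs of spaces, tabs or trailing whitespace (an assumption; I have not checked whether the MOEA Framework output ever does), each extra space would become an empty CSV field. A slightly more defensive sketch, using a hypothetical helper named to_csv:

```shell
#!/bin/bash
# to_csv [FILE...]  (hypothetical helper; reads stdin if no file is given)
# Converts whitespace-delimited text to comma-separated values.
to_csv() {
  # 1) collapse every run of spaces/tabs into a single comma
  # 2) strip a leading or trailing comma left by padded lines
  sed -E 's/[[:space:]]+/,/g; s/^,//; s/,$//' "$@"
}

printf '0.1 0.2  0.3 \n' | to_csv
# prints 0.1,0.2,0.3
```

Inside the loop above, the sed line could then read to_csv "${OUTPUT_FILENAME}" > "${CSV_FILENAME}".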

3. A piping example

Piping allows you to link programs together by making the output of one program or function the input to another. The script below was originally written by my friend Shrutarshi Basu for a class project we worked on together. It processes the output of the Borg MOEA for 10 random seeds (numbered 0 through 9) of the DTLZ2 benchmark problem across several different algorithmic configurations, called “masters” in the code (for more on this see Jazmin’s post here). In addition to calling Java tools from the MOEA Framework, Basu uses piping to link the Linux commands “tac”, “sed”, “grep” and “cut”. For more on each of these commands, see the links at the bottom of this post.


#!/bin/bash
# Usage: bash this_script.sh LAST_MASTER
# $1 is the index of the last master; masters are numbered 0..$1
: "${1:?usage: $0 LAST_MASTER}"

output=dtlz2.volume

# loop over each of the 10 seeds (0 through 9)
for i in {0..9}
do
    obj=DTLZ2_S${i}.obj

    # loop over masters
    for m in $(seq 0 "$1")
    do
        runtime=DTLZ2_S${i}_M${m}.runtime
        mobj=DTLZ2_S${i}_M${m}.obj

        # extract the objectives of the last block: reverse the file,
        # keep lines up to the first "//" separator, drop the separator,
        # keep fields 15-19 (the five objectives, matching -d 5 below)
        # and restore the original row order
        echo "Extracting objectives"
        tac "${runtime}" | sed -n '1,/\/\// p' | grep -v "//" | cut -d' ' -f15-19 | tac > "${mobj}"
    done

    # combine objectives into one file
    echo "Combining objectives"
    java -cp ../../moea.jar org.moeaframework.analysis.sensitivity.ResultFileSeedMerger \
        -d 5 -e 0.01,0.01,0.01,0.01,0.01 \
        -o "${obj}" DTLZ2_S${i}_M*.obj

    # calculate the hypervolume
    echo "Finding final hypervolume"
    hvol=$(java -cp ../../moea.jar HypervolumeEval "${obj}")

    printf "%s %s\n" "$i" "$hvol" >> "${output}"
    echo "Done with seed $i"
done
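To see what the tac/sed/grep/cut pipeline does, it can help to run it on a toy file. The sketch below wraps the same pipeline in a hypothetical function, extract_last_block, and feeds it a made-up file with two blocks separated by lines starting with "//" (real Borg runtime files have many more columns, which is why the script keeps fields 15-19). Reversing the file with tac lets sed stop at the first "//" it meets, which in this toy layout is the separator just before the last block; the final tac restores the original row order.

```shell
#!/bin/bash
# extract_last_block FILE FIELDS  (hypothetical helper for illustration)
# Prints the rows after the last "//" line of FILE, keeping only FIELDS
# (a cut field list such as 1-2 or 15-19).
extract_last_block() {
  tac "$1" |                 # reverse: the last block now comes first
    sed -n '1,/\/\// p' |    # print from line 1 through the first "//" line
    grep -v "//" |           # drop the "//" separator itself
    cut -d' ' -f"$2" |       # keep only the requested columns
    tac                      # restore the original row order
}

# Toy runtime file (made-up format) with two snapshots
cat > toy.runtime <<'EOF'
//NFE=100
1 2 x
3 4 x
//NFE=200
5 6 x
7 8 x
EOF

extract_last_block toy.runtime 1-2
# prints:
# 5 6
# 7 8
```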

Additional References and Links


Introduction to Docker

In this post we’ll learn the principles of Docker, and how to use Docker with large quantities of input / output data.

1. What is Docker?

Docker is a way to build lightweight, virtual-machine-like environments, called containers, from a file called the Docker file. Such an environment can be rebuilt anywhere with the help of that Docker file, which makes Docker a great way to port models together with the architecture used to run them (e.g., the Cube: yes, the Cube can be ported in that way, with the right Docker file, even though that is not the topic of this post). Building the Docker file creates an image, and a container is a running instance of that image, which one can log on to and work in. Unlike a full virtual machine, a container shares the operating system kernel of its host. Containers are meant to be transient: removing a container does not affect the image it was created from.

2. Basic Docker commands

This part assumes that we already have a working Docker file. A Docker file contains a series of instructions for building the image we want to work in.

To build an image for the WBM model from a Docker file, let us go to the folder where the Docker file is and enter:

docker build -t myimage -f Dockerfile .

The call docker build means that we want to build an image from a Docker file; -t means that we name, or “tag”, our image, here by giving it the name “myimage”; -f specifies which Docker file we are using, in case there are several in the current folder (by default, Docker looks for a file named Dockerfile); and “.” makes the current folder the build context, i.e., the set of files that the Docker file can copy into the image. Options -t and -f are optional in theory, but the tag -t is very important, as it gives a name to your built image. Without it, we would have to refer to the image by its hexadecimal ID every time we want to run a container from it, which is tedious and error-prone.

Once the Docker image is built, we can run it; in other words, start a container on the computer / cluster / cloud where we are working. To do that, we enter:

docker run -dit myimage

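The three options are as follows: -d (“detached”) means that we do not directly enter the container and instead have it running in the background, while the call returns the container’s hexadecimal ID; -i means that we keep the standard input open; finally, -t allocates a pseudo-terminal for the container. Together with -i, this is what typically keeps a container whose main process is a shell running in the background. The last argument is the name of the Docker image (here, “myimage”).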

We can now check that the container is running by listing all the running containers with:

docker ps

In particular, this displays the hexadecimal ID associated with each running container. After that, we can enter the container by typing:

 docker exec -i -t hexadecimalID /bin/bash 

where -i is the same as before and -t again allocates a pseudo-terminal, while hexadecimalID is the ID of the running container (which we retrieved with docker ps). The final argument, /bin/bash, is the command to run inside the container: here, it starts a Bash shell.

Once in the container, we can run all the processes we want. Once we are ready to exit the container, we can exit it by typing… exit.

Once outside of the container, we can re-enter it as long as it still runs. If we want it to stop running, we use the following command to “kill” it (not my choice of words!):

 docker kill hexadecimalID 

A shortcut to calling all these commands in succession is to use the following version of docker run:

 docker run -it myimage /bin/bash 

This command logs us onto the container as if we had typed run and exec at the same time (using the shell /bin/bash). Note that option -d is not used in this call. Also note that upon typing exit, we will not only exit the container but also stop it, because the shell we exit is the container’s main process. This means that we don’t have to retrieve its hexadecimal ID to log on to the container, nor to kill it.

Even if the container is not running anymore, it can be restarted and re-entered by retrieving its hexadecimal ID. The docker ps command only lists running containers, so to list all the containers, including those that are no longer running, we type:

 docker ps -a

We can then restart and re-enter the container with the following commands:


docker restart hexadecimalID

docker exec -it hexadecimalID /bin/bash

Note the absence of options for docker restart. Once we are truly done with a container, it can be removed from the list of stopped containers by using:

 docker rm hexadecimalID 

Note that you can only remove a container that is not running.
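Copying hexadecimal IDs by hand gets tedious. Since docker run -d prints the new container’s ID, a few shell functions can keep it in a variable instead. The function names below are hypothetical helpers of my own; they only wrap the commands shown above.

```shell
#!/bin/bash
# Hypothetical helpers around the commands above; each takes an image name
# or a container ID as its only argument.

start_container() {
  # -dit: detached, stdin open, pseudo-terminal; prints the container ID
  docker run -dit "$1"
}

enter_container() {
  docker exec -it "$1" /bin/bash
}

remove_container() {
  # kill first, because docker rm refuses to remove a running container
  docker kill "$1" && docker rm "$1"
}

# Example session (requires Docker and an image tagged "myimage"):
#   cid=$(start_container myimage)
#   enter_container "$cid"
#   remove_container "$cid"
```

Alternatively, docker run accepts a --name option (e.g., docker run -dit --name wbm myimage), and that readable name can be used anywhere a container ID is expected.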

3. Working with large input / output data sets

Building large quantities of data directly into the image when calling docker build has three major drawbacks. First, building the Docker image will take much more time, because all that data has to be transferred every time we call docker build. This wastes a lot of time if we are tinkering with the structure of our container and running the Docker file several times. Second, the image will take up a lot of space on the disk, which can prove problematic if we are not careful (it is so easy to build new images and run new containers!). Third, output data will be generated within the container and will need to be copied out of it before the container is removed.

An elegant workaround is to “mount” input and output directories into the container with the -v option of the docker run command, mapping each host directory to a path inside the container:

 docker run -it -v /path/to/inputs:/inputs -v /path/to/outputs:/outputs myimage /bin/bash 

or

 docker run -dit -v /path/to/inputs:/inputs -v /path/to/outputs:/outputs myimage 

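The -v option is short for “volume” (its long form is --volume). Each -v argument has the form host_path:container_path, where the host path should be absolute (a name without a slash is treated as a named Docker volume instead); /inputs and /outputs above are example locations inside the container. This way, the inputs and outputs directories (on the same host as the container) are used directly by the container. If new outputs are produced, they can be written directly to the mounted output directory, and that data will be kept in that directory after exiting / killing the container. It is also worth noting that we don’t need to call -v again if we restart the container after killing it.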
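Because -v needs absolute host paths, it can help to resolve them in the script that launches the container. The sketch below assumes the image reads from /inputs and writes to /outputs inside the container; those container paths, and the function name run_with_data, are hypothetical. The :ro suffix mounts the inputs read-only, so the container cannot modify them by accident.

```shell
#!/bin/bash
# run_with_data IMAGE INPUT_DIR OUTPUT_DIR  (hypothetical helper)
# Starts a detached container with the two host directories mounted.
run_with_data() {
  local image=$1 inputs outputs
  # cd + pwd turns a relative host path into an absolute one (and fails if missing)
  inputs=$(cd "$2" && pwd) || return 1
  outputs=$(cd "$3" && pwd) || return 1
  docker run -dit \
    -v "${inputs}:/inputs:ro" \
    -v "${outputs}:/outputs" \
    "$image"
}

# Example (requires Docker):
#   mkdir -p inputs outputs
#   cid=$(run_with_data myimage inputs outputs)
```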

A side issue with Docker is how to manage user permissions on the outputs a container produces, but 1) that issue arises whether or not we use the -v option, and 2) this is a tale for another post.

Acknowledgements: thanks to Julie Quinn and Bernardo Trindade from this research group, who started exploring Docker right before me, making it that much easier for me to get started. Thanks also to the Cornell-based IT support of the Aristotle cloud, Bennet Wineholt and Brandon Baker.


Enhance your (Windows) remote terminal experience with MobaXterm

Jazmin and Julie recently introduced me to a helpful program for Windows called “MobaXterm” that has significantly sped up my workflow when running remotely on the Cube (our cluster here at Cornell). MobaXterm bills itself as an “all in one” toolbox for remote computing. The program’s interface includes a terminal window as well as a graphical SFTP browser. You can link the terminal to the SFTP browser so that, as you move through folders in the terminal, the browser follows you. The SFTP browser allows you to view and edit files using your text editor of choice on your Windows desktop, a feature that I find quite helpful for making quick edits to shell scripts or pieces of code as I go.


A screenshot of the MobaXterm interface. The graphical SFTP browser is on the left, while the terminal is on the right (note the checked box in the center of the left panel that links the browser to the terminal window).


You can set up a remote Cube session using MobaXterm with the following steps:

  1. Download MobaXterm using this link
  2. Follow the installation instructions.
  3. Open MobaXterm and select the “Session” icon in the upper left corner.
  4. In the session popup window, select a new SSH session in the upper left, enter “thecube.cac.cornell.edu” as the name of the remote host and enter your username.
  5. When the session opens, check the box below the SFTP browser on the left to link the browser to your terminal.
  6. Run your stuff!

Note that on a Linux system, you can simply link your file browser window to your terminal window and get the same functionality as MobaXterm. MobaXterm is not available for Mac, but Cyberduck and FileZilla are decent alternatives. An alternative graphical SFTP browser for Windows is WinSCP, though I prefer MobaXterm because of its linked terminal/SFTP interface.

For those new to remote computing, ssh or UNIX commands in general, I’d recommend checking out the following posts to get familiar with running on a remote cluster:


Debug in Real-time on SLURM

Debugging a code by submitting jobs to a supercomputer is an inefficient process. It goes something like this:

  1. Submit job and wait in queue
  2. Check for errors/change code
  3. (repeat endlessly until your code works)

Debugging in Real-Time:

There’s a better way to debug that doesn’t require waiting for the queue every time you want to check your code. On SLURM, you can debug in real-time like so:
  1. Request a debugging or interactive node and wait in queue
  2. Check for errors/change code continuously until code is fixed or node has timed out

Example (using Summit supercomputer at University of Colorado Boulder):

  1. Log into terminal (PuTTY, Cygwin, etc.)
  2. Navigate to directory where the file to be debugged is located using ‘cd’ command
  3. Load SLURM
    • $ module load slurm
  4. Enter the ‘sinteractive’ command
    • $ sinteractive
  5. Wait in line for permission to use the node (you will have a high priority with a debugging QOS, so it shouldn’t take long)
  6. Once you are granted permission, the node is yours! Now you can debug to your heart’s content (or until you run out of time).
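As far as I know, sinteractive is a wrapper provided by some clusters (such as Summit) rather than a standard SLURM command. On clusters without it, SLURM’s own srun can usually request an interactive shell in a similar way. The sketch below is an assumption-laden example: the function name interactive_node is hypothetical, and the time limit and QOS name (debug) are placeholders that depend on your cluster’s configuration.

```shell
#!/bin/bash
# interactive_node [TIME_LIMIT] [QOS]  (hypothetical helper)
# Requests one task on a compute node and opens an interactive Bash shell there.
# The defaults below are placeholders; check your cluster's documentation.
interactive_node() {
  local time_limit=${1:-00:30:00} qos=${2:-debug}
  # --pty connects the terminal to the remote shell; bash -i makes it interactive
  srun --time="$time_limit" --ntasks=1 --qos="$qos" --pty bash -i
}

# Usage on a login node:
#   interactive_node 01:00:00 debug
```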
I’m usually debugging shell scripts on Unix. If you want advice on that topic, check out this link. I prefer the ‘-x’ option (shown below), which prints each command as it is executed, but there are many options available.
Debugging shell scripts in Unix using the ‘-x’ option:
 $ bash -x mybashscript.bash
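A small demonstration of what -x prints: each traced command is written to standard error with a "+" prefix before it runs. The sketch below also sets PS4, the variable Bash prints in front of each traced line, so that the trace includes line numbers; the demo script’s name and contents are made up.

```shell
#!/bin/bash
# Write a tiny (made-up) script to trace
cat > traced_demo.sh <<'EOF'
PS4='+ line ${LINENO}: '
greeting="hello"
echo "$greeting world"
EOF

# Run it with tracing on; the trace goes to stderr, normal output to stdout
bash -x traced_demo.sh
# stdout: hello world
# stderr (trace), roughly:
#   + PS4='+ line ${LINENO}: '
#   + line 2: greeting=hello
#   + line 3: echo 'hello world'
```

Inside a script, set -x turns the same tracing on and set +x turns it off again, which helps when only one section needs debugging.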
Hopefully this was helpful! Please feel free to edit/comment/improve as you see fit.

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 3 of 3: Metrics-based analysis of algorithm performance)

Now that you have your desired metrics based on part 2 of this series, it is possible to gain more insight into your algorithm performance. When I performed this analysis for the actual study, I used the AWRAnalysis.java, Analysis_Attainment_LakeProblem.sh and HypervolumeEval.java files found in the GitHub repository, as explained in the README. I later discovered it was possible to do this within the framework, so that option will be discussed here.

It is possible to calculate the hypervolume of a Pareto Approximate Front within the framework using the SetHypervolume class. For more information on the MOEAFramework classes, please see the following link (http://moeaframework.org/javadoc/index.html).

I used the following command. (Note the use of version 2.3: I reran this command today to check that I remembered it correctly, although there now seems to be a version 2.4. It is always best to use the newest version.)


java -cp MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.SetHypervolume myLake4ObjStoch.reference -e 0.01,0.01,0.0001,0.0001 myLake4ObjStoch.reference

This returns a hypervolume value between 0 and 1 that is useful for threshold calculations as shown below.

To calculate threshold attainments, use the Analysis class. Below is an example of performing attainment analysis within the framework instead of using AWRAnalysis.java.  This approach generates a huge number of files, which are best understood when plotted, a subject for a future post.


#!/bin/bash
#source setup_LTM.sh

dim=4
problem=myLake4ObjStoch
epsilon="0.01,0.01,0.0001,0.0001"

algorithms="Borg eMOEA eNSGAII GDE3 MOEAD NSGAII"
seeds="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50"
percentiles="`seq 1 1 100`"
thresholds=(`seq 0.01 0.01 1.0`)

#compute averages across metrics
#echo "Computing averages across metrics..."
#for algorithm in ${algorithms}
#do
# echo "Working on: " ${algorithm}
# java -classpath `cygpath -wp $CLASSPATH` org.moeaframework.analysis.sensitivity.MetricFileStatistics --mode average --output $WORK/metrics/${algorithm}_${problem}.average $WORK/metrics/${algorithm}_${problem}_*.metrics
#done
#echo "Done!"

#compute search control metrics (for best and attainment)
echo "Computing hypervolume search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=${1:-0}  # index of the first threshold used (script argument; 0 if omitted)
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 0 --threshold ${thresholds[$counter]} --hypervolume 0.7986 ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/Hypervolume_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
 done
echo "Done!"

echo "Computing generational distance search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=${1:-0}  # index of the first threshold used (script argument; 0 if omitted)
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 1 --threshold ${thresholds[$counter]} ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/GenDist_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
done
echo "Done!"

echo "Computing additive epsilon indicator search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=${1:-0}  # index of the first threshold used (script argument; 0 if omitted)
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 4 --threshold ${thresholds[$counter]} ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/EpsInd_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
done
echo "Done!"
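The three loops above differ only in the --metric index, the output file label and the extra --hypervolume argument, so they could be folded into one function. This is a sketch, not the script I ran: it assumes the threshold counter starts at 0 (so percentile p uses the p-th threshold), and the DRY_RUN switch, which prints the java commands instead of running them, is a hypothetical addition for checking the loop logic without Java.

```shell
#!/bin/bash
# Sketch: one function instead of three near-identical loops.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
problem=myLake4ObjStoch
algorithms="Borg eMOEA eNSGAII GDE3 MOEAD NSGAII"
thresholds=($(seq 0.01 0.01 1.0))

# run_analysis METRIC LABEL [EXTRA_ARGS...]
run_analysis() {
  local metric=$1 label=$2
  shift 2
  local extra=("$@")   # e.g. --hypervolume 0.7986 for metric 0
  local algorithm percentile out cmd
  for algorithm in ${algorithms}; do
    echo "Working on: ${algorithm}"
    for percentile in $(seq 1 100); do
      # percentile p uses thresholds[p-1], as in the original when counter starts at 0
      cmd=(java -classpath MOEAFramework-2.3-Demo.jar
        org.moeaframework.analysis.sensitivity.Analysis
        --parameterFile "./${algorithm}_params.txt"
        --parameters "./${algorithm}_Latin"
        --metric "${metric}" --threshold "${thresholds[percentile-1]}"
        "${extra[@]}"
        "./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average")
      out="./test/${label}_${percentile}_${algorithm}.txt"
      if [ "$DRY_RUN" = 1 ]; then
        echo "${cmd[*]} > ${out}"
      else
        "${cmd[@]}" > "${out}"
      fi
    done
  done
}

# Equivalent of the three loops above:
#   run_analysis 0 Hypervolume --hypervolume 0.7986
#   run_analysis 1 GenDist
#   run_analysis 4 EpsInd
```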

I did encounter some caveats while working through this process. Scripts for handling them and instructions are provided in the Diagnostic-Source README on Github. One caveat that is not covered there is increasing the speed of the hypervolume calculation. Please see Dave Hadka’s Hypervolume repository for more information (https://github.com/dhadka/Hypervolume).

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 2 of 3: Calculate metrics for Analysis) Tori Ward

This post continues from Part 1, which provided examples of using the MOEAFramework to generate Pareto approximate fronts for a comparative diagnostic study.

Once one has finished generating all of the approximate fronts and respective reference sets one hopes to analyze, metrics may be calculated within the MOEAFramework. I calculated metrics for both my local reference sets and all of my individual approximations of the Pareto front. The metrics for the individual approximations were averaged for each parameterization across all seeds to determine the expected performance for a single seed.

Calculate Metrics

Local Reference Set Metrics

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
NAME=${ALGORITHM}_${PROBLEM}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileEvaluator \
--b ${PROBLEM} --i ./SOW4/${ALGORITHM}_${PROBLEM}.reference \
--r ./SOW4/reference/${PROBLEM}.reference --o ./SOW4/${ALGORITHM}_${PROBLEM}.localref.metrics"
echo -e $PBS | qsub
done

Individual Set Metrics

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileEvaluator \
--b ${PROBLEM} --i ./SOW4/sets/${ALGORITHM}_${PROBLEM}_${SEED}.set \
--r ./SOW4/reference/${PROBLEM}.reference --o ./SOW4/metrics/${ALGORITHM}_${PROBLEM}_${SEED}.metrics"
echo -e $PBS | qsub
done
done
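Because each seed is evaluated in a separate job, a job that fails or hits the walltime could go unnoticed, and the averaging step would then quietly work with fewer seeds. A quick check like the sketch below, using a hypothetical helper named check_metrics and the file-name pattern from the script above, lists missing or empty .metrics files before averaging.

```shell
#!/bin/bash
# check_metrics DIR PROBLEM NSEEDS ALGORITHM...  (hypothetical helper)
# Lists per-seed metrics files that are missing or empty; returns non-zero if any are.
check_metrics() {
  local dir=$1 problem=$2 nseeds=$3
  shift 3
  local missing=0 algorithm seed file
  for algorithm in "$@"; do
    for seed in $(seq 1 "$nseeds"); do
      file="${dir}/${algorithm}_${problem}_${seed}.metrics"
      # -s is true only if the file exists and is not empty
      if [ ! -s "$file" ]; then
        echo "missing or empty: ${file}"
        missing=$((missing + 1))
      fi
    done
  done
  echo "${missing} file(s) missing or empty"
  [ "$missing" -eq 0 ]
}

# Usage, before averaging:
#   check_metrics ./SOW4/metrics myLake4ObjStoch 50 NSGAII GDE3 eNSGAII MOEAD eMOEA Borg
```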

Average Individual Set Metrics across seeds for each parameterization

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N moeaevaluations
#PBS -j oe
#PBS -l walltime=96:00:00

cd "$PBS_O_WORKDIR"

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

# Average the performance metrics across all seeds
for ALGORITHM in ${ALGORITHMS[@]}
do
echo -n "Averaging performance metrics for ${ALGORITHM}..."
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.SimpleStatistics \
-m average --ignore -o ./metrics/${ALGORITHM}_${PROBLEM}.average ./metrics/${ALGORITHM}_${PROBLEM}_*.metrics
echo "done."
done

At the end of this script, I also calculated the set contribution mentioned in Part 1 (each algorithm’s percent contribution to the global reference set) by including the following lines.

# Calculate set contribution
echo ""
echo "Set contribution:"
java ${JAVA_ARGS} org.moeaframework.analysis.sensitivity.SetContribution \
-e 0.01,0.01,0.001,0.01 -r ./reference/${PROBLEM}.reference ./reference/*_${PROBLEM}.combined

Part 3 covers using the MOEAFramework for further analysis of these metrics.

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 1 of 3: Generate Pareto approximate fronts)

This three part series is an overview of the algorithm diagnostics I performed in my Lake Problem study with the hope that readers may apply the steps to any problem of interest. All of the source code for my study, including the scripts used for the diagnostics can be found at https://github.com/VictoriaLynn/Lake-Problem-Diagnostics.

The first step to using the MOEAFramework for comparative algorithm diagnostics was to create the simulation model on which I would be assessing algorithm performance. The Lake Problem was written in C++. The executable alone could be used for optimization with Borg, and I created a Java stub to connect the problem to the MOEAFramework (https://github.com/VictoriaLynn/Lake-Problem-Diagnostics/blob/master/Diagnostic-Source/myLake4ObjStoch.java). Additional information on this aspect of a comparative study can be found in examples 4 and 5 for the MOEAFramework (http://moeaframework.org/examples.html) and in Chapter 5 of the manual. I completed the study using version 2.1, which was the newest at the time. I used the all-in-one executable instead of the source code, although I compiled my simulation code within the examples subfolder of the source code.

Once I had developed an appropriate simulation model to represent my problem, I could begin the diagnostic component of my study. I first chose algorithms of interest and determined the range of parameters from which I would like to sample. To determine parameter ranges, I consulted Table 1 of the 2013 AWR article by Reed et al.

Reed, P., et al. Evolutionary Multiobjective Optimization in Water Resources: The Past, Present, and Future. (Editor Invited Submission to the 35th Anniversary Special Issue), Advances in Water Resources, 51:438-456, 2013.

Example parameter files and the ones I used for my study can be found at https://github.com/VictoriaLynn/Lake-Problem-Diagnostics/tree/master/Diagnostic-Source/params. Once I had established parameter files for sampling, I found chapter 8 of the MOEAFramework manual to be incredibly useful. Below I walk through the steps I took in generating approximations of the Pareto optimal front for my problem across multiple seeds, algorithms, and parameterizations. All of the commands have been consolidated into the file Lake_Problem_Comparative_Study.sh on GitHub, but I had many separate files during my study, which I present as separate steps here. It may have been possible to automate the whole process, but I liked breaking it up into separate scripts to make sure I checked that the output made sense after each step.

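Step 1: Generate Parameter Samples

To generate parameter samples for each algorithm, I used the following code, which I kept in a file called sample_parameters.sh. I ran all .sh scripts using the general command sh script_name.sh. Because these scripts use Bash arrays, bash script_name.sh is the safer choice on systems where sh is not Bash (on Ubuntu, for example, sh is dash, which does not support arrays).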

#!/bin/bash

NSAMPLES=500
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=(Borg MOEAD eMOEA NSGAII eNSGAII GDE3)
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"

# Generate the parameter samples
echo -n "Generating parameter samples..."
for ALGORITHM in ${ALGORITHMS[@]}
do
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.SampleGenerator \
--method ${METHOD} --n ${NSAMPLES} --p ${ALGORITHM}_params.txt \
--o ${ALGORITHM}_${METHOD}
done
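Since I liked checking the output after each step, here is a sketch of one such check for Step 1: it confirms that each algorithm’s sample file has NSAMPLES rows. It assumes that SampleGenerator writes one sample per line with no header, and the helper name check_samples is hypothetical.

```shell
#!/bin/bash
# check_samples NSAMPLES METHOD ALGORITHM...  (hypothetical helper)
# Returns non-zero if any ${ALGORITHM}_${METHOD} file is missing or has the wrong row count.
check_samples() {
  local nsamples=$1 method=$2
  shift 2
  local status=0 algorithm file rows
  for algorithm in "$@"; do
    file="${algorithm}_${method}"
    if [ ! -f "$file" ]; then
      echo "${file}: missing"
      status=1
      continue
    fi
    rows=$(wc -l < "$file")   # counts newline-terminated lines
    if [ "$rows" -ne "$nsamples" ]; then
      echo "${file}: ${rows} rows, expected ${nsamples}"
      status=1
    fi
  done
  return "$status"
}

# Usage after Step 1:
#   check_samples 500 Latin Borg MOEAD eMOEA NSGAII eNSGAII GDE3 && echo "sample files look complete"
```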

Step 2: Optimize the problem using algorithms of interest

This step had two parts: optimization with Borg and optimization with the MOEAFramework algorithms. To optimize using Borg, one needs to request Borg at http://borgmoea.org/. This is the only step that needs to be completed outside of the MOEAFramework. I then used the following script to generate approximations to the Pareto front for all 500 samples and 50 random seeds. The -l and -u flags give the lower and upper bounds for the decision variable values. Fortunately, it should soon be possible to type one value and specify the number of variables with that bound, instead of typing all 100 values as shown here.

#!/bin/bash
#50 random seeds

NSEEDS=50
PROBLEM=myLake4ObjStoch
ALGORITHM=Borg

SEEDS=$(seq 1 ${NSEEDS})

for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
./BorgExec -v 100 -o 4 -c 1 \
-l 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 \
-u 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 \
-e 0.01,0.01,0.0001,0.0001 -p Borg_params.txt -i Borg_Latin -s ${SEED} -f ./sets/${ALGORITHM}_${PROBLEM}_${SEED}.set -- ./LakeProblem4obj_control "
echo -e $PBS | qsub
done
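In the meantime, the long bound lists do not have to be typed by hand: a small shell helper can build them. This is a sketch with a hypothetical function, repeat_csv; it only builds the strings and does not change how BorgExec itself is called.

```shell
#!/bin/bash
# repeat_csv VALUE COUNT  (hypothetical helper)
# Prints VALUE repeated COUNT times, separated by commas.
repeat_csv() {
  local value=$1 count=$2 out="" i
  for ((i = 0; i < count; i++)); do
    # ${out:+,} adds a comma only once out is non-empty
    out+="${out:+,}${value}"
  done
  printf '%s\n' "$out"
}

NVARS=100
LOWER=$(repeat_csv 0 "${NVARS}")
UPPER=$(repeat_csv 0.1 "${NVARS}")

# Inside the PBS string above, the two literal lists could then become:
#   ./BorgExec -v ${NVARS} -o 4 -c 1 -l ${LOWER} -u ${UPPER} \
```

Because the PBS string is expanded when the submission script runs, ${LOWER} and ${UPPER} would be filled in before the job reaches qsub.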

Optimization with the MOEAFramework allowed me to submit jobs for all remaining algorithms and seeds with one script, as shown below. In my study, I actually submitted epsilon-dominance algorithms (included the -e flag) and point-dominance algorithms (did not include the -e flag) separately; however, it is my understanding that it would have been fine to submit jobs for all algorithms with the epsilon flag, especially since I converted all point-dominance approximations of the Pareto front to epsilon dominance when generating reference sets.


#!/bin/bash

NSEEDS=50
PROBLEM=myLake4ObjStoch
ALGORITHMS=(MOEAD GDE3 NSGAII eNSGAII eMOEA)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.Evaluator -p \
${ALGORITHM}_params.txt -i ${ALGORITHM}_Latin -b ${PROBLEM} \
-a ${ALGORITHM} -e 0.01,0.01,0.0001,0.0001 -s ${SEED} -o ./sets/${NAME}.set"
echo -e $PBS | qsub
done

done

Step 3: Generate combined approximation set for each algorithm and global reference set

Next, I generated a reference set for each algorithm’s performance. This was useful as it made it easier to generate the global reference set for all algorithms across all seeds and parameterizations, and it allowed me to calculate a percent contribution for each algorithm to the global reference set. Below is the script for the algorithm reference sets:

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

# Generate the combined approximation sets for each algorithm
for ALGORITHM in ${ALGORITHMS[@]}
do
echo -n "Generating combined approximation set for ${ALGORITHM}..."
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileMerger \
-b ${PROBLEM} -e 0.01,0.01,0.0001,0.0001 -o ./SOW4/reference/${ALGORITHM}_${PROBLEM}.combined \
./SOW4/sets/${ALGORITHM}_${PROBLEM}_*.set
echo "done."
done

In the same file, I added the following lines to generate the global reference set while running the same script.
# Generate the reference set from all combined approximation sets
echo -n "Generating reference set..."
java ${JAVA_ARGS} org.moeaframework.util.ReferenceSetMerger \
-e 0.01,0.01,0.0001,0.0001 -o ./SOW4/reference/${PROBLEM}.reference ./SOW4/reference/*_${PROBLEM}.combined > /dev/null
echo "done."

If one wants to keep the decision variables associated with the reference set solutions, it is possible to use org.moeaframework.analysis.sensitivity.ResultFileMerger on all of the pertinent .set files.

A final option for reference sets is to generate local reference sets for each parameterization of each algorithm. This was done with the following script:

#!/bin/bash
NSEEDS=50
ALGORITHMS=( GDE3 eMOEA Borg NSGAII eNSGAII MOEAD)
PROBLEM=myLake4ObjStoch

SEEDS=$(seq 1 ${NSEEDS})

# Evaluate all algorithms for all seeds
for ALGORITHM in ${ALGORITHMS[@]}
do
java -cp MOEAFramework-2.1-Demo.jar org.moeaframework.analysis.sensitivity.ResultFileSeedMerger -d 4 -e 0.01,0.01,0.0001,0.0001 \
--output ./SOW4/${ALGORITHM}_${PROBLEM}.reference ./SOW4/objs/${ALGORITHM}_${PROBLEM}*.obj
done

Part 2 of this post walks through my calculation of metrics.