Plotting Code added to Lake Problem Diagnostics Repository

The Reed group Matlab scripts I used to generate attainment and control maps after performing algorithm diagnostics are now on Github (https://github.com/VictoriaLynn/Lake-Problem-Diagnostics/tree/master/Plotting-Code). I tried to provide comments in the code wherever others may need to make changes to use it on their results. Otherwise, it should be possible to use them as is.

I used Illustrator to improve the Matlab figures. See Jon’s post for tips on using Illustrator.

Advertisements

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 3 of 3: Metrics-based analysis of algorithm performance)

Now that you have your desired metrics based on part 2 of this series, it is possible to gain more insight into your algorithm performance. When I performed this analysis for the actual study, I used the AWRAnalysis.java, Analysis_Attainment_LakeProblem.sh and HypervolumeEval.java files found in the Github repository as explained in the README. I later discovered it was possible to do this within the framework, so that option will be discussed here.

It is possible to calculate the hypervolume of a Pareto Approximate Front within the framework using the SetHypervolume class. For more information on the MOEAFramework classes, please see the following link (http://moeaframework.org/javadoc/index.html).

I used the following command: (Note the change to version 2.3 because I reran this command today to check I remembered it correctly although it seems there is now a version 2.4. It is always best to use the newest version.)


java –cp MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.SetHypervolume myLake4ObjStoch.reference –e 0.01,0.01,0.0001,0.0001 myLake4ObjStoch.reference

This returns a hypervolume value between 0 and 1 that is useful for threshold calculations as shown below.

To calculate threshold attainments, use the Analysis class. Below is an example of performing attainment analysis within the framework instead of using AWRAnalysis.java.  This approach generates a huge number of files, which are best understood when plotted, a subject for a future post.


#!/bin/bash
#source setup_LTM.sh

dim=4
problem=myLake4ObjStoch
epsilon="0.01,0.01,0.0001,0.0001"

algorithms="Borg eMOEA eNSGAII GDE3 MOEAD NSGAII"
seeds="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50"
percentiles="`seq 1 1 100`"
thresholds=(`seq 0.01 0.01 1.0`)

#compute averages across metrics
#echo "Computing averages across metrics..."
#for algorithm in ${algorithms}
#do
# echo "Working on: " ${algorithm}
# java -classpath `cygpath -wp $CLASSPATH` org.moeaframework.analysis.sensitivity.MetricFileStatistics --mode average --output $WORK/metrics/${algorithm}_${problem}.average $WORK/metrics/${algorithm}_${problem}_*.metrics
#done
#echo "Done!"

#compute search control metrics (for best and attainment)
echo "Computing hypervolume search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=$1
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 0 --threshold ${thresholds[$counter]} --hypervolume 0.7986 ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/Hypervolume_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
 done
echo "Done!"

echo "Computing generational distance search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=$1
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 1 --threshold ${thresholds[$counter]} ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/GenDist_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
done
echo "Done!"

echo "Computing additive epsilon indicator search control metrics..."
for algorithm in ${algorithms}
do
 echo "Working on: " ${algorithm}
 counter=$1
 for percentile in ${percentiles}
 do
 java -classpath MOEAFramework-2.3-Demo.jar org.moeaframework.analysis.sensitivity.Analysis --parameterFile ./${algorithm}_params.txt --parameters ./${algorithm}_Latin --metric 4 --threshold ${thresholds[$counter]} ./SOW6/metrics/average_replace_NaNs/${algorithm}_${problem}.average > ./test/EpsInd_${percentile}_${algorithm}.txt
 counter=$((counter+1))
 done
done
echo "Done!"

I did encounter some caveats while working through this process. Scripts for handling them and instructions are provided in the Diagnostic-Source README on Github. One caveat that is not covered there is increasing the speed of the hypervolume calculation. Please see Dave Hadka’s Hypervolume repository for more information (https://github.com/dhadka/Hypervolume).

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 2 of 3: Calculate metrics for Analysis) Tori Ward

This post continues from Part 1, which provided examples of using the MOEAFramework to generate Pareto approximate fronts for a comparative diagnostic study.

Once one has finished generating all of the approximate fronts and respective reference sets one hopes to analyze, metrics may be calculated within the MOEAFramework. I calculated metrics for both my local reference sets and all of my individual approximations of the Pareto front. The metrics for the individual approximations were averaged for each parameterization across all seeds to determine the expected performance for a single seed.

Calculate Metrics

Local Reference Set Metrics

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
NAME=${ALGORITHM}_${PROBLEM}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileEvaluator \
--b ${PROBLEM} --i ./SOW4/${ALGORITHM}_${PROBLEM}.reference \
--r ./SOW4/reference/${PROBLEM}.reference --o ./SOW4/${ALGORITHM}_${PROBLEM}.localref.metrics"
echo -e $PBS | qsub
done

Individual Set Metrics

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileEvaluator \
--b ${PROBLEM} --i ./SOW4/sets/${ALGORITHM}_${PROBLEM}_${SEED}.set \
--r ./SOW4/reference/${PROBLEM}.reference --o ./SOW4/metrics/${ALGORITHM}_${PROBLEM}_${SEED}.metrics"
echo -e $PBS | qsub
done
done

Average Individual Set Metrics across seeds for each parameterization

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N moeaevaluations
#PBS -j oe
#PBS -l walltime=96:00:00

cd "$PBS_O_WORKDIR"

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

# Average the performance metrics across all seeds
for ALGORITHM in ${ALGORITHMS[@]}
do
echo -n "Averaging performance metrics for ${ALGORITHM}..."
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.SimpleStatistics \
-m average --ignore -o ./metrics/${ALGORITHM}_${PROBLEM}.average ./metrics/${ALGORITHM}_${PROBLEM}_*.metrics
echo "done."
done

At the end of this script, I also calculated the set contribution I mentioned earlier by including the following lines.

# Calculate set contribution
echo ""
echo "Set contribution:"
java ${JAVA_ARGS} org.moeaframework.analysis.sensitivity.SetContribution \
-e 0.01,0.01,0.001,0.01 -r ./reference/${PROBLEM}.reference ./reference/*_${PROBLEM}.combined

Part 3 covers using the MOEAFramework for further analysis of these metrics.

Algorithm Diagnostics Walkthrough using the Lake Problem as an example (Part 1 of 3: Generate Pareto approximate fronts)

This three part series is an overview of the algorithm diagnostics I performed in my Lake Problem study with the hope that readers may apply the steps to any problem of interest. All of the source code for my study, including the scripts used for the diagnostics can be found at https://github.com/VictoriaLynn/Lake-Problem-Diagnostics.

The first step to using the MOEAFramework for comparative algorithm diagnostics was to create the simulation model on which I would be assessing algorithm performance. The Lake Problem was written in C++. The executable alone could be used for optimization with Borg and I created a java stub to connect the problem to the MOEAFramework. (https://github.com/VictoriaLynn/Lake-Problem-Diagnostics/blob/master/Diagnostic-Source/myLake4ObjStoch.java).  Additional information on this aspect of a comparative study can be found in examples 4 and 5 for the MOEAFramework (http://moeaframework.org/examples.html) and in Chapter 5 of the manual. I completed the study using version 2.1, which was the newest at the time. I used the all in one executable instead of the source code although I compiled my simulation code within the examples subfolder of the source code.

Once I had developed an appropriate simulation model to represent my problem, I could begin the diagnostic component of my study. I first chose algorithms of interest and determined the range of parameters from which I would like to sample. To determine parameter ranges, I consulted Table 1 of the 2013 AWR article by Reed et al.

Reed, P., et al. Evolutionary Multiobjective Optimization in Water Resources: The Past, Present, and Future. (Editor Invited Submission to the 35th Anniversary Special Issue), Advances in Water Resources, 51:438-456, 2013.

Example parameter files and the ones I used for my study can be found at https://github.com/VictoriaLynn/Lake-Problem-Diagnostics/tree/master/Diagnostic-Source/params. Once I had established parameter files for sampling, I found chapter 8 of the MOEAFramework manual to be incredibly useful.  Below I walk through the steps I took in generating approximations of the Pareto optimal front for my problem across multiple seeds, algorithms, and parameterizations.   All of the commands have been consolidated into the file Lake_Problem_Comparative_Study.sh on Github, but I had many separate files during my study, which will be separated into steps here. It may have been possible to automate the whole process, but I liked breaking it up into separate scripts to make sure I checked that the output made sense after each step.

Step 1: Generate Parameter Samples To generate parameter samples for each algorithm, I used the following code, which I kept in a file called sample_parameters.sh. I ran all .sh scripts using the general command sh script_name.sh.

NSAMPLES=500
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=(Borg MOEAD eMOEA NSGAII eNSGAII GDE3)
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"

# Generate the parameter samples
echo -n "Generating parameter samples..."
for ALGORITHM in ${ALGORITHMS[@]}
do
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.SampleGenerator \
--method ${METHOD} --n ${NSAMPLES} --p ${ALGORITHM}_params.txt \
--o ${ALGORITHM}_${METHOD}
done

Step 2: Optimize the problem using algorithms of interest This step had two parts: optimization with Borg and optimization with the MOEAFramework algorithms. To optimize using Borg, one needs to request Borg at http://borgmoea.org/. This is the only step that needs to be completed outside of the MOEAFramework. I then used the following script to generate approximations to the Pareto front for all 500 samples and 50 random seeds. The –l and –u flags indicate upper and lower bounds for decision variable values. Fortunately, it should soon be possible to type one value and specify the number of variables with that bound instead of typing all 100 values as shown here.

#!/bin/bash
#50 random seeds

NSEEDS=50
PROBLEM=myLake4ObjStoch
ALGORITHM=Borg

SEEDS=$(seq 1 ${NSEEDS})

for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
./BorgExec -v 100 -o 4 -c 1 \
-l 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 \
-u 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1 \
-e 0.01,0.01,0.0001,0.0001 -p Borg_params.txt -i Borg_Latin -s ${SEED} -f ./sets/${ALGORITHM}_${PROBLEM}_${SEED}.set -- ./LakeProblem4obj_control "
echo -e $PBS | qsub
done

Optimization with the MOEAFramework allowed me to submit jobs for all remaining algorithms and seeds with one script as shown below. In my study, I actually submitted epsilon dominance algorithms (included –e flag) and point dominance algorithms (did not include –e flag) separately; however, it is my understanding that it would have been fine to submit jobs for all algorithms with the epsilon flag, especially since I converted all point dominance approximations to the Pareto front to epsilon dominance when generating reference sets.


#!/bin/bash

NSEEDS=50
PROBLEM=myLake4ObjStoch
ALGORITHMS=(MOEAD GDE3 NSGAII eNSGAII eMOEA)

SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

for ALGORITHM in ${ALGORITHMS[@]}
do
for SEED in ${SEEDS}
do
NAME=${ALGORITHM}_${PROBLEM}_${SEED}
PBS="\
#PBS -N ${NAME}\n\
#PBS -l nodes=1\n\
#PBS -l walltime=96:00:00\n\
#PBS -o output/${NAME}\n\
#PBS -e error/${NAME}\n\
cd \$PBS_O_WORKDIR\n\
java ${JAVA_ARGS}
org.moeaframework.analysis.sensitivity.Evaluator -p
${ALGORITHM}_params.txt -i ${ALGORITHM}_Latin -b ${PROBLEM}
-a ${ALGORITHM} -e 0.01,0.01,0.0001,0.0001 -s ${SEED} -o ./sets/${NAME}.set"
echo -e $PBS | qsub
done

done

Step 3: Generate combined approximation set for each algorithm and Global reference set Next, I generated a reference set for each algorithm’s performance. This was useful as it made it easier to generate the global reference set for all algorithms across all seeds and parameterizations and it allowed me to calculate a percent contribution for each algorithm to the global reference set. Below is the script for the algorithm reference sets:

#!/bin/bash

NSAMPLES=50
NSEEDS=50
METHOD=Latin
PROBLEM=myLake4ObjStoch
ALGORITHMS=( NSGAII GDE3 eNSGAII MOEAD eMOEA Borg)

JAVA_ARGS="-cp MOEAFramework-2.1-Demo.jar"
set -e

# Generate the combined approximation sets for each algorithm
for ALGORITHM in ${ALGORITHMS[@]}
do
echo -n "Generating combined approximation set for
${ALGORITHM}..."
java ${JAVA_ARGS} \
org.moeaframework.analysis.sensitivity.ResultFileMerger \
-b ${PROBLEM} -e 0.01,0.01,0.0001,0.0001 -o ./SOW4/reference/${ALGORITHM}_${PROBLEM}.combined \
./SOW4/sets/${ALGORITHM}_${PROBLEM}_*.set
echo "done."
done

In the same file, I added the following lines to generate the global reference set while running the same script.
# Generate the reference set from all combined approximation sets
echo -n "Generating reference set..."
java ${JAVA_ARGS} org.moeaframework.util.ReferenceSetMerger \
-e 0.01,0.01,0.0001,0.0001 -o ./SOW4/reference/${PROBLEM}.reference ./SOW4/reference/*_${PROBLEM}.combined > /dev/null
echo "done."

If one wants to keep the decision variables associated with the reference set solutions, it is possible to use org.moeaframework.analysis.sensitivity.ResultFileMerger on all of the pertinent .set files.

A final option for reference sets is to generate local reference sets for each parameterization of each algorithm. This was done with the following script:

#!/bin/bash
NSEEDS=50
ALGORITHMS=( GDE3 eMOEA Borg NSGAII eNSGAII MOEAD)
PROBLEM=myLake4ObjStoch

SEEDS=$(seq 1 ${NSEEDS})

# Evaluate all algorithms for all seeds
for ALGORITHM in ${ALGORITHMS[@]}
do
java -cp MOEAFramework-2.1-Demo.jar org.moeaframework.analysis.sensitivity.ResultFileSeedMerger -d 4 -e 0.01,0.01,0.0001,0.0001 \
--output ./SOW4/${ALGORITHM}_${PROBLEM}.reference ./SOW4/objs/${ALGORITHM}_${PROBLEM}*.obj
done

Part 2 of this post walks through my calculation of metrics.

Overleaf: LaTeX Collaboration and Publication

Calling all LaTeX users! Life is about to get much better thanks to the help of Overleaf. This intent of this brief post is to introduce you to the free collaboration and publication system, Overleaf.

To get started, go to https://www.overleaf.com and sign up. Once you sign up, you may start a project by using a provided template, loading your own .tex files, or starting with a blank file. You may also import your bibliography from your own .bib file, or from citation library tools such as citeulike, zotero, and Mendeley. My personal favorite features are the collaboration capabilities. You may share your projects with collaborators, granting read/write access. There is an intelligent comment tool that easily dates any and all Microsoft Word commenting features. Change history that records the time and contributor of each change is just found with a simple click. Version control is also no sweat. You may save versions throughout the writing, editing, and collaboration process with unique labels. Lastly, you may publish to a number of open source publishers and repositories that will allow others to discover and cite your research.

There are many additional capabilities of Overleaf that I have not described here. However, the website is visually appealing and intuitive for any user. To me, these characteristics are the main advantages of a tool like Overleaf over using git and GitHub for collaboration and version control for a LaTeX document. I hope this tool may help with any of your scientific collaboration needs.

Programming language R is gaining prominence in the scientific community

Recently, R Balaji mentioned this piece in Nature, that talked about how many scientists are using the programming language R as an alternative to expensive, proprietary analysis packages.  It’s worth a read!

Also, we have covered R on this blog, with the posts below.

Parallel Plots in R

Citing Packages in R

R code examples on Jon Herman’s GitHub repository

As a quick reminder, the best way to navigate this site is to use the search box to the right of the text, as well as the categories and keywords listed below the box.  And if there’s something that we don’t cover and you would like to write about it, join us as a collaborator!  Email joseph.kasprzyk “at” colorado.edu for more information.

Thanks for a great 2014! Our year in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.  We’d like to share it below!

Also, on behalf of all my collaborators I’d like to personally thank you for coming to this blog and contributing.  It wouldn’t be anything without you, the community!  I hope you had a great holiday season and I wish you a healthy, happy, and productive 2015.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 33,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.