Parallelization of C/C++ and Python on Clusters

There are multiple ways of parallelizing C/C++ and Python codes to be used on clusters. The quick and dirty way, which often gets the job done, is to parallelize the code with MPI and launch one process per core across multiple nodes. Though easy to implement, this strategy may result in a fair waste of results because:

  • With too many processes on one node, inefficient use of memory may make the code memory bottle-necked, meaning that a lot of time will be spent accessing memory and the cores will be sitting twiddling their thumbs.
  • If you have a 48-core node with 192 GB of memory (as the Skylake nodes on Stampede 2) and your code uses 10 GB of RAM per process, you will not be able to have more than 19 out of 48 cores working at the same time because more than that would exceed the node’s memory, which would crash the node, which would get you kicked out of the cluster for a while without a warning — if you do not know how much RAM your code requires, you should look into the Remora tool.

If you are working on big runs on systems that charge per usage time or if you are creating a code you will have to run multiple times, it may be worth looking into more clever parallelization strategies. Some of the techniques described in the next subsections will also make your code two to four times faster on your desktop/laptop and are rather straight-forward to implement. Most of the code in this blog post can be freely downloaded from this repository. Lastly, if the #include statements in the examples below are not displayed as followed by a library between “<>”, please refer to the mentioned online repository.

Hybrid Parallelization in C/C++

The word hybrid refers in this context to a mixed use of shared and distributed memory parallelization techniques. In distributed parallelization techniques, such as the Message Passing Interface (MPI), each process has its own dedicated chunk of memory within a RAM chip. This means that each process will have its own copy of the input data (repeated across processes), variables within the program, etc. Conversely, shared memory parallelization techniques such as OpenMP allow all threads (parallel unit of computation) to access the same block of memory, which can make a mess in your code with variables being changed seemingly out of nowhere by various parallel runs, but which lessens the amount of RAM needed and in some cases allow for more efficient memory utilization.

If running the Borg Master-Slave multiobjective evolutionary optimization algorithm (Borg MS) with hybrid parallelization, Borg MS will have each function evaluation performed by one MPI process, which could run in parallel with OpenMP (if your model is setup for that) across the cores designated for that process. As soon as a function evaluation is complete, the slave MPI process in charge of it will submit the results back to Borg MS’s master process and be given another set of decision variables to work on. The figure below is a schematic drawing of such hybrid parallelization scheme:

BorgHybridParallelization.png

OpenMP

Parallelizing a for-loop in C or C++ in which each iteration is independent — such as in Monte Carlo simulations — is straight-forward:

#include
#include "path/to/my_model.h"

int main(int argc, char *argv[])
{
    int n_monte_carlo_simulations = omp_get_num_threads(); // One model run per core.
    vector system_inputs = ;
    vector results(n_monte_carlo_simulations, 0);
#pragma omp parallel for
    for (int i = 0; i < n_monte_carlo_simulations; ++i)
    {
        results[i] = RunMonteCarloSimulation(system_inputs[i]);
    }
}

The only caveat is that you have to be sure that the independent simulations are not writing over each other. For example, if by mistake you had written results[0] instead of results[i], the final value on results[0] would be that of whichever thread (parallel simulation) finished last, while the rest of the vector would be of zeros. It is important to notice here that the vector system_inputs is read only once by the parallel code and all n_monte_carlo_simulations threads would have access to it in memory.

Though this example is not rocket science, there is a lot more to OpenMP that can be found here and here.

MPI

MPI stands for Message Passing Interface. It creates processes which pass messages (information packages) to each other. The most basic MPI code is one that creates independent processes — each with its own allocated chunk of memory that cannot be accessed by any other process — and just has them running independently in parallel without communicating with each other. This is great to distribute Monte Carlo runs across various nodes, although it may lead to the limitations mentioned in the beginning of this post. Below is an example of a very basic MPI code to run Monte Carlo simulations in parallel:

#include
#include "path/to/my_model.h"

int main(int argc, char *argv[])
{
    MPI_Init(NULL, NULL);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    vector system_inputs = ;
    RunMonteCarloSimulation(system_inputs[world_rank]);

    MPI_Finalize();
    return 0;
}

I will not cover in this post how to pass the output of the RunMonteCarloSimulation function back to a results vector, as in the previous example, because details of MPI is outside the scope of this post — a good MPI tutorial can be found here and great examples, including ones for Monte Carlo runs, can be found here. To run this code, you would have to compile it with mpicc mpi_demo.c -o MPIDemo and call it with the mpirun or mpiexec commands, which should be followed by the -n flag to specify the number of processes to be created, or mpirun -n 4 ./MPIDemo. The code above would run as many MonteCarlo simulations as there are processes (four processes if -n 4 was used), each with a rank (here a type of ID), and can be spread across as many nodes as you can get on a cluster. Each process of the code above would run on a core, which may result in the memory and/or processing limitations listed in the beginning of this post.

Mixing OpenMP and MPI

The ideal situation for most cases is when you use MPI to distribute processes across nodes and OpenMP to make sure that as many cores within a node as it makes sense to use are being used. To do this, you can use the structure of the code below.

#include
#include
#include
#include "path/to/my_model.cpp"

int main(int argc, char *argv[])
{
    MPI_Init(NULL, NULL);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int n_cores = omp_get_num_threads(); // get number of cores available for MPI process.
    vector system_inputs = ;
    vector results(n_cores , 0);
#pragma omp parallel for
    for (int i = 0; i < n_cores ; ++i)
    {
        // printf("C hybrid. Process-thread: %s-%d\n", world_rank, omp_get_thread_num());
        results[i] = RunMonteCarloSimulation(system_inputs[i]);
    }

    MPI_Finalize();
    return 0;

}

If you comment lines 4 and 20 and uncomment line 19, compile the code with mpicc hybrid.c -o Hybrid -fopenmp, set the number of threads in the terminal to eight by running set OMP_NUM_THREADS=8, and call the resulting executable with the command mpirun -n 4 ./Hybrid, the executable will spawn four processes, each of which spawning eight threads and printing the following output:

C hybrid. Process-thread: 0-0
C hybrid. Process-thread: 0-1
C hybrid. Process-thread: 0-4
C hybrid. Process-thread: 0-5
C hybrid. Process-thread: 0-7
C hybrid. Process-thread: 0-6
C hybrid. Process-thread: 0-2
C hybrid. Process-thread: 0-3
C hybrid. Process-thread: 1-6
C hybrid. Process-thread: 1-0
C hybrid. Process-thread: 1-1
C hybrid. Process-thread: 1-3
C hybrid. Process-thread: 1-5
C hybrid. Process-thread: 1-4
C hybrid. Process-thread: 1-7
C hybrid. Process-thread: 1-2
C hybrid. Process-thread: 2-6
C hybrid. Process-thread: 2-0
C hybrid. Process-thread: 2-3
C hybrid. Process-thread: 2-7
C hybrid. Process-thread: 2-2
C hybrid. Process-thread: 2-4
C hybrid. Process-thread: 2-5
C hybrid. Process-thread: 2-1
C hybrid. Process-thread: 3-0
C hybrid. Process-thread: 3-1
C hybrid. Process-thread: 3-3
C hybrid. Process-thread: 3-4
C hybrid. Process-thread: 3-6
C hybrid. Process-thread: 3-7
C hybrid. Process-thread: 3-2
C hybrid. Process-thread: 3-5

The code above can be used to have one process per node (and therefore one copy of system_inputs per node) using as many threads as there are cores in that node — depending on the number of cores, it is sometimes more efficient to have two or three MPI processes per node, each using half or a third of the cores in that node. Below is a PBS job submission script that can be used to generate the output above — job submission scripts for Slurm to be used on Stampede 2 can be found here.

#!/bin/bash
#PBS -N test_hybrid
#PBS -l nodes=2:ppn=16
#PBS -l walltime=:00:30
#PBS -o ./output/hybrid_c.out
#PBS -e ./error/hybrid_c.err
module load openmpi-1.10.7-gnu-x86_64
export OMP_NUM_THREADS=8

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 16 entries per node.
# MPI can use this to run 16 tasks on each node.
export TASKS_PER_NODE=2
uniq "$PBS_NODEFILE"|awk -v TASKS_PER_NODE="$TASKS_PER_NODE" '{for(i=0;i nodefile

#cat nodefile
mpiexec --hostfile nodefile -n 4 -x OMP_NUM_THREADS ./Hybrid

Parallelizing Python on a Cluster

Python was not designed with shared-memory parallelization in mind, so its native parallelization library, the Multiprocessing library, does distributed-memory parallelization instead of shared. The issue with the Multiprocessing library is that it does not work across nodes, so you still need something like MPI. Here, we will use MPI4Py.

The code below parallelizes the function call across cores within the same node — please refer to this blog post for details about and other ways to use the Multiprocessing library.

from multiprocessing import Process, cpu_count

def do_something_useful(rank, shared_process_number):
    # Do something useful here.
    print 'Python hybrid, MPI_Process-local_process (not quite a thread): {}-{}'.format(rank, shared_process_number)

# Create node-local processes
shared_processes = []
#for i in range(cpu_count()):
for i in range(8):
    p = Process(target=do_something_useful, args=(i,))
    shared_processes.append(p)

# Start processes
for sp in shared_processes:
    sp.start()

# Wait for all processes to finish
for sp in shared_processes:
    sp.join()

By adding MPI4Py to the code above, following the same logic of the C/C++ example, we get the following result:

from mpi4py import MPI
from multiprocessing import Process, cpu_count

def do_something_useful(rank, shared_process_number):
    # Do something useful here.
    print 'Python hybrid, MPI_Process-local_process (not quite a thread): {}-{}'.format(rank, shared_process_number)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

print 'Calling Python multiprocessing process from Python MPI rank {}'.format(rank)

# Create shared-ish processes
shared_processes = []
#for i in range(cpu_count()):
for i in range(8):
    p = Process(target=do_something_useful, args=(rank, i))
    shared_processes.append(p)

# Start processes
for sp in shared_processes:
    sp.start()

# Wait for all processes to finish
for sp in shared_processes:
    sp.join()

comm.Barrier()

Calling the code above with mpirun -n 4 python hybrid_pure_python.py will first result in four MPI processes being spawned with ranks 1 through 4, each spawning eight local processes. The output should look similar to the one below:

Calling Python multiprocessing process from Python MPI rank 0
Calling Python multiprocessing process from Python MPI rank 1
Python hybrid, MPI_Process-local_process (not quite a thread): 1-0
Python hybrid, MPI_Process-local_process (not quite a thread): 0-0
Python hybrid, MPI_Process-local_process (not quite a thread): 1-1
Python hybrid, MPI_Process-local_process (not quite a thread): 0-1
Python hybrid, MPI_Process-local_process (not quite a thread): 1-2
Python hybrid, MPI_Process-local_process (not quite a thread): 0-2
Python hybrid, MPI_Process-local_process (not quite a thread): 0-3
Python hybrid, MPI_Process-local_process (not quite a thread): 1-3
Calling Python multiprocessing process from Python MPI rank 2
Calling Python multiprocessing process from Python MPI rank 3
Python hybrid, MPI_Process-local_process (not quite a thread): 1-4
Python hybrid, MPI_Process-local_process (not quite a thread): 0-4
Python hybrid, MPI_Process-local_process (not quite a thread): 0-5
Python hybrid, MPI_Process-local_process (not quite a thread): 1-5
Python hybrid, MPI_Process-local_process (not quite a thread): 1-6
Python hybrid, MPI_Process-local_process (not quite a thread): 0-6
Python hybrid, MPI_Process-local_process (not quite a thread): 1-7
Python hybrid, MPI_Process-local_process (not quite a thread): 0-7
Python hybrid, MPI_Process-local_process (not quite a thread): 2-0
Python hybrid, MPI_Process-local_process (not quite a thread): 3-0
Python hybrid, MPI_Process-local_process (not quite a thread): 3-1
Python hybrid, MPI_Process-local_process (not quite a thread): 2-1
Python hybrid, MPI_Process-local_process (not quite a thread): 3-2
Python hybrid, MPI_Process-local_process (not quite a thread): 2-2
Python hybrid, MPI_Process-local_process (not quite a thread): 3-3
Python hybrid, MPI_Process-local_process (not quite a thread): 2-3
Python hybrid, MPI_Process-local_process (not quite a thread): 3-4
Python hybrid, MPI_Process-local_process (not quite a thread): 2-4
Python hybrid, MPI_Process-local_process (not quite a thread): 2-5
Python hybrid, MPI_Process-local_process (not quite a thread): 3-5
Python hybrid, MPI_Process-local_process (not quite a thread): 3-6
Python hybrid, MPI_Process-local_process (not quite a thread): 3-7
Python hybrid, MPI_Process-local_process (not quite a thread): 2-6
Python hybrid, MPI_Process-local_process (not quite a thread): 2-7

Below is a PBS submission script that can be used with the code above:

#!/bin/bash
#PBS -N test_hybrid
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:01:00
#PBS -o ./output/hybrid_python.out
#PBS -e ./error/hybrid_python.err
module load python-2.7.5
export OMP_NUM_THREADS=8

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 16 entries per node.
# MPI can use this to run 16 tasks on each node.
export TASKS_PER_NODE=2
uniq "$PBS_NODEFILE"|awk -v TASKS_PER_NODE="$TASKS_PER_NODE" '{for(i=0;i nodefile

#cat nodefile
mpiexec --hostfile nodefile -n 4 -x OMP_NUM_THREADS python hybrid_pure_python.py

Parallelizing Monte Carlo Runs of C/C++ (or anything else) with Python

Very often we have to run the same model dozens to thousands of times for Monte-Carlo-style analyses. One option is to go on a cluster and submit dozens to thousand of jobs, but this becomes a mess for the scheduler to manage and cluster managers tend to not be very fond of such approach. A cleaner way of performing Monte Carlo runs in parallel is to write a code in Python or other language that spawns MPI processes which then call the model to be run. This can be done in Python with the subprocess library, as in the example below:

from mpi4py import MPI
import subprocess

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

print 'Calling OpenmpTest from Python MPI rank {}'.format(rank)
function_call = ['./OpenmpTest', '{}'.format(rank)]
subprocess.call(function_call)

comm.Barrier()

Here, OpenmpTest is an executable compiled from the C-OpenMP code below:

#include
#include 

int main(int argc, char *argv[])
{
#pragma omp parallel for
    for (int i = 0; i < omp_get_num_threads(); ++i)
    {
        printf("C OpenMP. Process-thread: %s-%d\n", argv[1], i);
    }
    return 0;
}

Submitting the code above with following PBS job submission script should result in the output below.

#!/bin/bash
#PBS -N test_hybrid
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:01:00
#PBS -o ./output/hybrid_python_c.out
#PBS -e ./error/hybrid_python_c.err
module load python-2.7.5
export OMP_NUM_THREADS=8

set -x
cd "$PBS_O_WORKDIR"

# Construct a copy of the hostfile with only 16 entries per node.
# MPI can use this to run 16 tasks on each node.
export TASKS_PER_NODE=2
uniq "$PBS_NODEFILE"|awk -v TASKS_PER_NODE="$TASKS_PER_NODE" '{for(i=0;i nodefile

#cat nodefile
mpiexec –hostfile nodefile -n 4 -x OMP_NUM_THREADS python hybrid_calls_c.py

Output:

Calling OpenmpTest from Python MPI rank 0
Calling OpenmpTest from Python MPI rank 1
Calling OpenmpTest from Python MPI rank 2
Calling OpenmpTest from Python MPI rank 3
C OpenMP. Process-thread: 0-7
C OpenMP. Process-thread: 0-5
C OpenMP. Process-thread: 0-0
C OpenMP. Process-thread: 0-2
C OpenMP. Process-thread: 0-4
C OpenMP. Process-thread: 0-1
C OpenMP. Process-thread: 0-3
C OpenMP. Process-thread: 0-6
C OpenMP. Process-thread: 1-1
C OpenMP. Process-thread: 1-2
C OpenMP. Process-thread: 1-6
C OpenMP. Process-thread: 1-4
C OpenMP. Process-thread: 1-5
C OpenMP. Process-thread: 1-0
C OpenMP. Process-thread: 1-3
C OpenMP. Process-thread: 1-7
C OpenMP. Process-thread: 2-3
C OpenMP. Process-thread: 2-7
C OpenMP. Process-thread: 2-0
C OpenMP. Process-thread: 2-4
C OpenMP. Process-thread: 2-6
C OpenMP. Process-thread: 2-5
C OpenMP. Process-thread: 2-1
C OpenMP. Process-thread: 2-2
C OpenMP. Process-thread: 3-5
C OpenMP. Process-thread: 3-1
C OpenMP. Process-thread: 3-4
C OpenMP. Process-thread: 3-2
C OpenMP. Process-thread: 3-7
C OpenMP. Process-thread: 3-3
C OpenMP. Process-thread: 3-0
C OpenMP. Process-thread: 3-6

End of the Story

As highlighted in the examples above (all of which can be download from here), creating parallel code is not rocket science and requires very few lines of code. The OpenMP and Multiprocessing parallelization tricks can save hours if you have to run a model several times on your personal computer, and paired with MPI these tricks can save you from hours to days in time and thousands of node-hours on clusters. I hope you make good use of it.

Performing random seed analysis and runtime diagnostics with the serial Borg Matlab wrapper

Search with Multiobjective Evolutionary Algorithms (MOEAs) is inherently stochastic. MOEAs are initialized with a random population of solutions that serve as the starting point for the multiobjective search, if the algorithm gets “lucky”, the initial population may contain points in an advantageous region of the decision space  that give the algorithm a head start on the search. On the other hand, the initial population may only contain solutions in difficult regions of the decision space, which may slow the discovery of quality solutions. To overcome the effects of initial parameterization, we perform a random seed analysis which involves running an ensemble of searches, each starting with a randomly sampled set of initial conditions which we’ll here on refer to as a “random seed”. We combine search results across all random seeds to generate a “reference set” which contains only the best (Pareto non-dominated) solutions across the ensemble.

Tracking the algorithm’s performance during search is an important part of a random seed analysis. When we use MOEAs to solve real world problems (ie. problems that don’t have analytical solutions), we don’t know the true Pareto set a priori. To determine if an algorithm has discovered an acceptable approximation of the true Pareto set, we must measure it’s performance across the search, and only end our analysis if we can demonstrate the search has ceased improving (of course this is not criteria for true convergence as it is possible the algorithm has simply failed to find better solutions to the problem, this is why performing rigorous diagnostic studies such as Zatarain et al., 2016 is important for understanding how various MOEAs perform in real world problems). To measure MOEA search performance, we’ll use hypervolume , a metric that captures both convergence and diversity of a given approximation set (Knowles and Corne, 2002; Zitzler et al., 2003). Hypervolume represents the fraction of the objective space that is dominated by an approximation set, as shown in Figure 1 (from Zatarain et al., 2017). For more information on MOEA performance metrics, see Joe’s post from 2013.

hv

Figure 1: A 2 objective example of hypervolume from Zatarain et al,. 2017. To calculate hypervolume, an offset, delta, is taken from the bounds of the approximation set to construct a “reference point”. The hypervolume is a measure of the volume of the objective space between the approximation set and the reference point. A larger hypervolume indicates a better approximation set.

This post will demonstrate how to perform a random seed analysis and runtime diagnostics using the Matlab wrapper for the serial Borg MOEA (for background on the Borg MOEA, see Hadka and Reed, 2013). I’ll use the DTLZ2 3 objective test problem as an example, which tasks the algorithm with approximating a spherical Pareto-optimal front (Deb et al,. 2002). I’ve created a Github repository with relevant code, you can find it here.

In this demonstration, I’ll use the Matlab IDE and Bash shell scripts called from a Linux terminal (Window’s machines can use Cygwin, a free Linux emulator). If you are unfamiliar with using a Linux terminal, you can find a tutorial here. To perform runtime diagnostics, I’ll use the MOEAFramework, a Java library that you can download here (the demo version will work just fine for our purposes).

A modified Matlab wrapper that produces runtime files

In order to track search performance across time, we need snapshots of Borg’s archive during the search. In the parallel “master-worker” and “multi-master” versions of Borg, these snapshots are generated by the Borg C library in the form of “runtime” files. The snapshots provided by the runtime files contain information on the number of function evaluations completed (NFE), elapsed time, operator probabilities, number of improvements, number of restarts, population size, archive size and the decision variables and objectives within the archive itself.

To my knowledge, the current release of the serial Borg Matlab wrapper does not print runtime files. To perform runtime diagnostics, I had to modify the wrapper file, nativeborg.cpp. I’ve posted my edited version to the aforementioned Github repository.

Performing random seed analysis and runtime diagnostics

To perform a random seed analysis and runtime diagnostics with the Matlab wrapper, follow these steps:

1) Download the Borg MOEA and compile the Matlab wrapper

To request access to the Borg MOEA, complete step 2 of Jazmin’s introduction to Borg, found here . To run Borg with Matlab you must compile a MEX file, instructions for compiling for Windows can be found here, and here for Linux/Mac.

Once you’ve downloaded and compiled Borg for Matlab, clone the Github repository I’ve created and replace the nativeborg.cpp file from the Borg download with the edited version from the repository. Next, create three new folders in your working directory, one called “Runtime” and another called “Objectives” and the third called “metrics”. Make sure your working directory contains the following files:

  • borg.c
  • borg.h
  • mt19937ar.c
  • mt19937ar.h
  • nativeborg.cpp (version from the Git repository)
  • borg.m
  • DTLZ2.m (test problem code, supplied from Github repository)
  • calc_runtime_metrics.sh
  • append_hash.sh
  • MOEAFramework-2.12-Demo.jar

2) Use Matlab to run the Borg MOEA across an ensemble of random seeds

For this example we’ll use 10 seeds with 30,000 NFE each. We’ll print runtime snapshots every 500 NFE.

To run DTLZ2 across 10 seeds,  run the following script in Matlab:

for i = [1:10]
    [vars, objs, runtime] = borg(12,3,0, @DTLZ2, 30000, zeros(1,12),ones(1,12), [0.01, 0.01, 0.01], {'frequency',500, 'seed', i});
    objFile = sprintf('Objectives/DTLZ2_3_S%i.obj',i);
    dlmwrite(objFile, objs, 'Delimiter', ' ');
end

The for loop above iterates across 10 random initialization of the algorithm. The first line within the for loop calls the Borg MOEA and returns decision variables (vars), objective values (objs) and a struct with basic runtime information. This function will also produce a runtime file, which will be printed in the Runtime folder created earlier (more on this later).

The second line within the for loop creates a string containing the name of a file to store the seed’s objectives and the third line prints the final objectives to this file.

3) Calculate the reference set across random seeds using the MOEAFramework

The 10 .obj files created in step two containing the final archives from each random seed. For our analysis, we want to generate a “reference set” of the best solutions across all seeds. To generate this set, we’ll use built in tools from the MOEAFramework. The MOEAFramework requires that all .obj files have “#” at the end of the file, which is annoying to add in Matlab. To get around this, I’ve written a simple Bash script called “append_hash.sh”.

In your Linux terminal navigate to the working directory with your files (the folder just above Objectives) and run the Bash script like this:

 ./append_hash.sh 

Now that the hash tags have been appended to each .obj files, create an overall reference set by running the following command in your Linux Terminal.

java -cp MOEAFramework-2.12-Demo.jar org.moeaframework.analysis.sensitivity.ResultFileSeedMerger -d 3 -e 0.01,0.01,0.01 -o Borg_DTLZ2_3.reference Objectives/*.obj

This command calls the MOEAFramework’s ResultFileMerger tool, which will merge results across random seeds. The -d flag specifies the number of objectives in our problem (3), the -e flag specifies the epsilons for each objective (.01 for all 3 objectives), the -o flag specifies the name of our newly created reference set file and the Objectives/*.obj tells the MOEAFramework to merge all files in the Objectives folder that have the extension “.obj”. This command will generate a new file named “Borg_DTLZ2_3.reference”, which will contain 3 columns, each corresponding to one objective. If we load this file into matlab and plot, we get the following plot of our Pareto approximate set.

Reference_set_front

Figure 2: The reference set generated by the Borg Matlab wrapper using 30,000 NFE.

4) Calculate and visualize runtime hypervolumes

We now have a reference set representing the best solutions across our random seeds. A final step in our analysis is to examine runtime data to understand how the search progressed across function evaluations. We’ll again use the MOEAFramework to examine each seed’s hypervolume at the distinct runtime snapshots provided in the .runtime files. I’ve written a Bash script to call the MOEAFramework, which is provided in the Git repository as “calc_runtime_metrics.sh” and reproduced below:

#/bin/bash

NSEEDS=10
SEEDS=$(seq 1 ${NSEEDS})
JAVA_ARGS="-cp MOEAFramework-2.12-Demo.jar"

for SEED in ${SEEDS}
do
	java ${JAVA_ARGS} org.moeaframework.analysis.sensitivity.ResultFileEvaluator -d 3 -i ./Runtime/runtime_S${SEED}.runtime -r Borg_DTLZ2_3.reference -o ./metrics/Borg_DTLZ2_3_S${SEED}.metrics
done

To execute the script in your terminal enter:

./calc_runtime_metrics.sh

The above command will generate 10 .metrics files inside the metrics folder, each .metric file contains MOEA performance metrics for one randome seed, hypervolume is in the first column, each row represents a different runtime snapshot. It’s important to note that the hypervolume calculated by the MOEAFramework here is the absolute hypervolume, but what we really want in this situation is the relative hypervolume to the reference set (ie the hypervolume achieved at each runtime snapshot divided by the hypervolume of the reference set). To calculate the hypervolume of the reference set, follow the steps presented in step 2 of Jazmin’s blog post (linked here), and divide all runtime hypervolumes by this value.

To examine runtime peformance across random seeds, we can load each .metric file into Matlab and plot hypervolume against NFE. The runtime hypervolume for the DTLZ2  3 objective test case I ran is shown in Figure 3 below.

Runtime_hv

Figure 3: Runtime results for the DTLZ2 3 objective test case

Figure 3 shows that while there is some variance across the seeds, they all approach the hypervolume of the reference set after about 10,000 NFE. This leveling off of our search across many initial parameterizations indicates that our algorithm has likely converged to a final approximation of our Pareto set. If this plot had yielded hypervolumes that were still increasing after the 30,000 NFE, it would indicate that we need to extend our search to a higher number of NFE.

References

Deb, K., Thiele, L., Laumanns, M. Zitzler, E., 2002. Scalable multi-objective optimization test problems, Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02, (1),  825-830

Hadka, D., Reed, P., 2013. Borg: an auto-adaptive many-objective evolutionary computing
framework. Evol. Comput. 21 (2), 231–259.

Knowles, J., Corne, D., 2002. On metrics for comparing nondominated sets. Evolutionary
Computation, 2002. CEC’02. Proceedings of the 2002 Congress on. 1. IEEE, pp. 711–716.

Zatarain Salazar, J., Reed, P.M., Herman, J.D., Giuliani, M., Castelletti, A., 2016. A diagnostic assessment of evolutionary algorithms for multi-objective surface water
reservoir control. Adv. Water Resour. 92, 172–185.

Zatarain Salazar, J. J., Reed, P.M., Quinn, J.D., Giuliani, M., Castelletti, A., 2017. Balancing exploration, uncertainty and computational demands in many objective reservoir optimization. Adv. Water Resour. 109, 196-210

Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Da Fonseca, V.G., 2003. Performance
assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol.
Comput. 7 (2), 117–132.

MOEAFramework Training Part 1: Connecting an External Problem

The goal of this training is to step a user through becoming familiar with the capabilities of MOEAFramework, a free and open source Java library created by Dave Hadka, that allows the user to design, execute, and test out a variety of popular multi-objective evolutionary algorithms (MOEAs). In this series, we will demonstrate the capabilities of MOEAFramework in 4 parts. Part 1 demonstrates how to hook up an external optimization problem to MOEAFramework. Part 2 will cover the optimization of the problem using a variety of algorithms. Part 3 will illustrate how to calculate metrics to assess the performance of the algorithms. Finally, Part 4 will step through generation of relevant figures from the metrics that convey the effectiveness, efficiency, reliability, and controllability of the algorithms.

The example test case that will be used throughout this tutorial is the DPS version of the Lake Problem. The code is written and adapted by Dr. Julianne Quinn and can be found here. However, the tutorial will be set up so that the user can easily swap in their own problem formulation.

Finally, this guide is specifically built to connect an external problem that is written in C++ but Dave Hadka’s Beginner Guide to the MOEA Framework, has examples of how to create the Java executable for problems written in a language other than C++.

Connecting Your Problem

Before starting this training, it is recommended to set up a directory in a Linux environment with Java installed. If you are a student in Reed Research Group, create a folder in your Cube directory called “MOEA_Diagnostics_Tutorial”. Throughout the training, we will be adding various subdirectories and files to this folder. For this part of the tutorial, you will need the following in your directory:

  1. MOEAFramework Demo .jar file which can be found by clicking on “Demo Application” on the far right.
  2. The C++ version of the Lake Problem
  3. SOWs_Type6.txt (natural inflows read in by the .cpp file)
  4. moeaframework.c and moeaframework.h found here

Once you have the .cpp file, the next step is to get it set up to be recognized by MOEAFramework. I have made these changes in this file in my GitHub repo. If you compare with the file in 2, you will see some changes.

First, I have commented out all parts of the code related to Borg, which the problem was originally set up to be optimized with. For this tutorial, we will be optimizing with multiple algorithms and comparing their performance. Secondly, in lines 55-57, I have use the #define directive to declare the number of variables, objectives, and constraints to be constants. Lines 300-309 is where the direct connection to MOEAFramework comes in. MOEA_Init initializes the communication between C++ and MOEAFramework and takes the number of objectives and constraints as arguments. Then we can start reading and evaluating solutions. MOEA_Read_doubles extracts the decision variables and stores them in an array, vars. Line 304 calls the lake_problem function that we would like to evaluate, along with the variables, and constraints. This results in the objs array being filled. Finally, MOEA_Write writes the objectives back into the framework. When all solutions are done being read and written, MOEA_Terminate closes the connection. The important thing here is just to make sure that you are passing the names of the arguments of your lake_problem function correctly.

Next, we must make an executable of the C++ file that Java can read. The makefile is located here and takes lake.cpp, moeaframework.c, moeaframework.h, and some relevant libraries and compiles them into an executable called “lake”. In order to run this file, simply type “make”.

Finally, we must turn our executable into a java class. The relevant java file can be found here. This file can be found in the MOEAFramework documentation, but must be tailored to an external problem. Lines 21-25 import in the relevant tools from MOEAFramework that help to configure and solve the problem. Lines 40, 42, 43, and 48 show the first change from the original file. Here we insert the name of our executable that was generated in the last step, “lake”. In lines 53, 58, and 63, we state the number of decision variables, objectives, and constraints. Finally, in lines 67-75, we create a newSolution ( ) method to specify that our solution should have 6 real-valued decision variables between 0 and 1.

In order to create a class file, simply type:

 

javac -classpath MOEAFramework-2.12-Demo.jar lake.java 

 

A file called lake.class will be created.

The first part of training is done! Next time, we will set up some scripts, call these executables, and optimize the lake problem with a variety of different algorithms.

 

 

 

Geospatial Mapping in R

Introduction

Let me start this blog post by stating the obvious: Geospatial maps are interesting to look at and certainly make papers and presentations prettier and more impressive; however, those are not the only reasons that such maps exist. They are used to communicate various types of information including geographical locations of regions in the world.

Why R?

Several available platforms have been used for drawing spatial maps and conducting geospatial data management. An eminent example is ArcGIS, which is a popular, flexible, and user-friendly geospatial mapping tool. Although ArcGIS is powerful and has many features, I am personally interested in open source, and Linux-friendly software.

Although there are several GIS tools such as Python, GRASS, QGIS, and UbuntuGIS, in this blog post, I will explain how R as an alternative tool can be used for geospatial analysis and for drawing spatial maps. R offers several advantages. First, R is an open-source platform, whereas GIS is relatively expensive. Second, R is script based. In some situations, you might have to generate several hundred maps from post-processed results; a tool such as R could offer faster and more-flexible data processing. You can run R on Linux machines and computer clusters and link it to other models that work under the Linux operating system. Different packages in R have been developed for geospatial analysis. In this exercise, I am going to focus on “RGDAL,” a widely used R package. RGDAL is the R distribution of Geospatial Data Abstraction Library (GDAL).

I recently moved to Cornell University, and I am eager to learn more about this region, so I decided to focus on the Susquehanna River Basin (SRB) located in US mid-Atlantic. The SRB drains parts of New York, Pennsylvania, and Maryland to the Chesapeake Bay. Before I get entirely sidetracked by my interest in SRB, let’s go back to the original intention of this blog post, which is making geospatial maps in R.

Prerequisites

Download all necessary data from the following links, then unzip and save them in your preferred folder.

1- Susquehanna River Basin Boundary from here

2- Major Watersheds in the Susquehanna River Basin from here

3- Susquehanna River from here

Open a new R Script in your R-Studio, then install the following R packages, you can use the following commands to install and load the packages:

# install.packages("rgdal")
# install.packages("ggplot2")
# install.packages("RColorBrewer")

library(rgdal)
library(ggplot2)
library(RColorBrewer)

Step 1- Map of Susquehanna River Basin

The first map of this exercise is a simple map of Susquehanna River Basin.

# I) The first step is to draw the map of SRB using the following code

SRB_Boundary <- readOGR(dsn = "spatial maps/Code/Shapefiles/srb/srb.shp")
plot(SRB_Boundary, col="gray90",
     main="Figure 1", sub="Susquehanna River Basin", cex.main=2.5, cex.sub=2.5)

# II) Then we can add Susquehanna River to the map

SR=readOGR("spatial maps/Code/Shapefiles/WtrTrails/WtrTrails.shp")
plot(SR, col="skyblue3", add=T, lwd=2)

# III) Adding information from the attribute table
#Shapefiles usually contain helpful information, such as name of objects, 
#sub-basins, area/length of objects, etc. 
#We are often interested in adding some of that information to our maps. 
#Here is how we can do it in R:

text(SRB_Boundary$NAME, x=coordinates(SRB_Boundary)[1],
     y=coordinates(SRB_Boundary)[2]*1.2, cex=1.2, col="darkblue", font=2)

text(paste("Area=27,500 square miles"), x=coordinates(SRB_Boundary)[1],
     y=(coordinates(SRB_Boundary)[2]*1.15), font=3, cex=1, col="darkblue")

# Let's add coordinates to the map

llgridlines(SRB_Boundary, plotLabels = T, cex=1.5)

Step 2- Selection of objects from an attribute table

If you have already worked with Arc-GIS you probably used its selection tools. What we are doing here is equivalent to selection from the attribute table. If you are not familiar with attribute tables this short explanation from esri should be helpful.

#I) Let's add SRB map again

plot(SRB_Boundary, col="gray90", 
     main="Figure 2", sub="Sub-basins greater than 800 square kilometer", 
     cex.main=2.5, cex.sub=2.5)

#II) Then we can add all the subbasins in SRB to the map

Subbasin <- readOGR(dsn = "spatial maps/Code/Shapefiles/wshedmjr/wshedmjr.shp")
plot(Subbasin, add =T, col=alpha("darkolivegreen1", 0.9))

# III) For this exercise, we are going to select large sub-basins of SRB
# with area of greater than 800 square kilometer

LargestBasins=which(Subbasin$SQM>800) # square kilometer

# IV) Now we are going to change the color of these selected features on the map

plot(Subbasin[LargestBasins,], add =T, col=alpha("seagreen", 0.9))

Step 3- Adding a legend to the map

In this part of the exercise, we are going to add a legend to the map

# I) SRB map 

 plot(SRB_Boundary, col="gray70",lwd=4,
       main="Figure 3", sub="Precipitation contour lines", cex.main=2.5, cex.sub=2.5)

# Now let's add precipitation contours to the SRB map

 isohyet=readOGR(dsn = "spatial maps/Code/Shapefiles/precip_iso/precip_iso.shp")
 plot(isohyet, add=T, col=bpy.colors(11), lwd=4)
   
# We can use the following script to add a legend to the map
llgridlines(SRB_Boundary, plotLabels = T, cex=1.5)

legend("right",box.col = "white", legend = unique(isohyet$INCHES), 
       fill=bpy.colors(11), cex=1.75, title = "Precipitation (inches)")

In this short tutorial, we went over some basic features of RGDAL. However, R can be used for more sophisticated geospatial analysis tasks, which I might cover in my future blog posts.