This post walks through the implementation of the two parallel versions of the Borg MOEA: master-slave and multi-master. The implementation is demonstrated using the DTLZ2 example provided with the multi-master source code. The post is organized in four sections: the first describes the main file, where the problem is defined and the required libraries are specified; the second describes the Makefile used to compile and create the executables for the DTLZ2 problem; the third describes the submission file used to manage the distribution of jobs on a cluster; and the fourth walks through a job submission example.
Both of the parallel Borg implementations are described in detail in the following papers:
Hadka, D., and Reed, P.M., “Large-scale Parallelization of the Borg MOEA for Many-Objective Optimization of Complex Environmental Systems”, Environmental Modelling & Software, v69, 353-369, 2015.
Reed, P.M. and Hadka, D., “Evolving Many-Objective Water Management to Exploit Exascale Computing”, Water Resources Research, v50, n10, 8367–8373, 2014.
1. Main file
1.1. Required headers
The main file is the file where the main function is specified: dtlz2_mm.c, where dtlz2 is the test problem we are solving and mm refers to the multi-master implementation. First, you will need the MPI header, mpi.h, the message-passing library required for parallel computers and clusters. You will also need the Borg multi-master header file, borgmm.h; if your working directory is different from the multi-master directory, you will need to specify the path, for instance: "./mm-borg-moea/borgmm.h".
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>
#include "borgmm.h"
1.2. Problem definition
In the following lines the DTLZ2 problem is defined as it would be in the serial version. The function below is responsible for reading the decision variables and evaluating the problem. We will be using the 2-objective version of this problem; however, you can scale it up. The rule is nvars = nobjs + 9. Hence, if you want to try 5 objectives you can simply change lines 2 and 3 to nvars = 14 and nobjs = 5.
#define PI 3.14159265358979323846
int nvars = 11;
int nobjs = 2;
void dtlz2(double* vars, double* objs, double* consts) {
    int i;
    int j;
    int k = nvars - nobjs + 1;
    double g = 0.0;

    for (i=nvars-k; i<nvars; i++) {
        g += pow(vars[i] - 0.5, 2.0);
    }

    for (i=0; i<nobjs; i++) {
        objs[i] = 1.0 + g;

        for (j=0; j<nobjs-i-1; j++) {
            objs[i] *= cos(0.5*PI*vars[j]);
        }

        if (i != 0) {
            objs[i] *= sin(0.5*PI*vars[nobjs-i-1]);
        }
    }
}
1.3. Enabling multiple seeds
The following lines enable the use of random seeds as external arguments. This links your main file to your submission file, where multiple seeds are submitted to the cluster. This segment is coupled with the submission file discussed in section 3.
int main(int argc, char* argv[]) {
unsigned int seed = atoi(argv[1]);
1.4. Variable declarations
In lines 1-2 of the following code, a couple of loop variables, j and rank, are declared; these will be used later in sections 1.7 and 1.9. The maximum number of function evaluations is also declared. We will print out the runtime file and the file with the Pareto set; hence, the output file names and file handle are declared in lines 4-7.
int j;
int rank;
int NFE = 100000;
char outputFilename[256];
FILE* outputFile = NULL;
char runtime[256];
char timing[256];
1.5. Parallel borg parameters and core allocation
In the following segment, we specify some key parameters for the parallel Borg. First, all multi-master runs need to call startup (line 1). The variable argc stands for "argument count" and holds the number of arguments passed to the program; argv stands for "argument vector" and holds the arguments themselves. The number of islands is specified in line 2. If you want to use the master-slave configuration, simply assign one island. The only difference from the pure master-slave code is that the latter uses uniform population samples, whereas the multi-master code uses Latin hypercube population samples. Line 3 specifies the maximum wallclock time estimated for your job, and line 4 the maximum number of function evaluations. Finally, global Latin hypercube initialization is specified in line 5 to ensure each island gets a well-sampled distribution of solutions.
BORG_Algorithm_ms_startup(&argc, &argv);
BORG_Algorithm_ms_islands(2);
BORG_Algorithm_ms_max_time(0.1);
BORG_Algorithm_ms_max_evaluations(NFE);
BORG_Algorithm_ms_initialization(INITIALIZATION_LATIN_GLOBAL);
Keep the core allocation in mind: if you have 32 available cores, a master-slave configuration allocates one core to the master and leaves the remaining 31 cores available for function evaluations. If you use a configuration with 2 masters, one core is allocated to the controller and two to the masters (1 core per master), leaving 29 cores available for function evaluations. If you are using a small cluster, I wouldn't recommend going higher than 4 masters.
1.6. Problem definition
The next segment creates the DTLZ2 problem. In line 1, the number of decision variables, objectives and constraints is specified; the last argument, dtlz2, references the function that evaluates the DTLZ2 problem shown in section 1.2. In lines 2-4 the lower and upper bounds for each decision variable are set to 0 and 1. In lines 5-7 the epsilon values used by the Borg MOEA, which define the problem resolution for each objective, are set to 0.01.
BORG_Problem problem = BORG_Problem_create(nvars, nobjs, 0, dtlz2);
for (j=0; j<nvars; j++) {
    BORG_Problem_set_bounds(problem, j, 0.0, 1.0);
}
for (j=0; j<nobjs; j++) {
    BORG_Problem_set_epsilon(problem, j, 0.01);
}
1.7. Printing runtime output
We specify the output frequency in line 1: since we specified 100,000 maximum function evaluations, Borg will provide runtime output every 1,000 NFE, for 100 snapshots in total. Lines 2 and 3 set the file names for the Pareto sets and the runtime dynamics. The %d gets replaced by the index of the seed and the %%d by the index of the master; make sure that the sets and runtime folders exist in your working directory.
BORG_Algorithm_output_frequency((int)NFE/100);
sprintf(outputFilename, "./sets/DTLZ2_S%d.set", seed);
sprintf(runtime, "./runtime/DTLZ2_S%d_M%%d.runtime", seed);
BORG_Algorithm_output_runtime(runtime);
1.8. Parallelizing seeds
The MPI_Comm_rank routine gets the rank of this process. The rank is used to ensure each parallel process uses a different random seed.
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
BORG_Random_seed(37*seed*(rank+1));
1.9. Printing the Pareto set
The optimization is performed by the multi-master Borg MOEA on the problem in line 1. Only the controller process will return a non-NULL result; the controller aggregates all of the Pareto optimal solutions generated by each master. We then print the Pareto optimal solutions to a separate file and free any allocated memory. The final lines shut down the parallel processes and exit the program.
BORG_Archive result = BORG_Algorithm_ms_run(problem);

if (result != NULL) {
    outputFile = fopen(outputFilename, "w");

    if (!outputFile) {
        BORG_Debug("Unable to open final output file\n");
    } else {
        BORG_Archive_print(result, outputFile);
        fclose(outputFile);
    }

    BORG_Archive_destroy(result);
}

BORG_Algorithm_ms_shutdown();
BORG_Problem_destroy(problem);

return EXIT_SUCCESS;
}
2. Makefile
The following makefile compiles the different versions of the Borg MOEA (serial, master-slave and multi-master) as well as the DTLZ2 example for each version, and generates the executables. The gcc compiler is used for the serial version, while the mpicc compiler wrapper is required for the parallel versions. From your terminal, access the mm-borg-moea directory and type make to compile the dtlz2 examples.
CC = gcc
MPICC = mpicc
CFLAGS = -O3
LDFLAGS = -Wl,-R,\.
LIBS = -lm
UNAME_S = $(shell uname -s)
ifneq (, $(findstring SunOS, $(UNAME_S)))
LIBS += -lnsl -lsocket -lresolv
else ifneq (, $(findstring MINGW, $(UNAME_S)))
# MinGW is not POSIX compliant
else
POSIX = yes
endif
compile:
	$(CC) $(CFLAGS) $(LDFLAGS) -o dtlz2_serial.exe dtlz2_serial.c borg.c mt19937ar.c $(LIBS)
ifdef POSIX
	$(CC) $(CFLAGS) $(LDFLAGS) -o borg.exe frontend.c borg.c mt19937ar.c $(LIBS)
endif
	$(MPICC) $(CFLAGS) $(LDFLAGS) -o dtlz2_ms.exe dtlz2_ms.c borgms.c mt19937ar.c $(LIBS)
	$(MPICC) $(CFLAGS) $(LDFLAGS) -o dtlz2_mm.exe dtlz2_mm.c borgmm.c mt19937ar.c $(LIBS)

.PHONY: compile
3. Submission file
This is an example of a submission file. It uses the Portable Batch System (PBS) to manage the distribution of batch jobs across the available nodes in the cluster. In the following script, the -N flag sets the name of the job, and the -l flags list the required resources: here we request 1 node with 16 processors per node and 5 hours of wallclock time. The -j oe flag joins the output and error streams into a single file, and the -o flag sets the path of the output stream. The cd $PBS_O_WORKDIR line changes to the directory from which the job was submitted. We then loop over multiple seeds and call mpirun with the name of the executable generated by compiling the examples in section 2. The script is saved as a bash file, for instance mpi-dtlz2.sh.
#!/bin/bash
#PBS -N dtlz2
#PBS -l nodes=1:ppn=16
#PBS -l walltime=5:00:00
#PBS -j oe
#PBS -o output
cd $PBS_O_WORKDIR
NSEEDS=9
SEEDS=$(seq 1 ${NSEEDS})
for SEED in ${SEEDS}
do
mpirun ./dtlz2_mm.exe ${SEED}
done
4. Submitting and managing jobs in a cluster
Once our submission file is ready, we can make it executable using the following command:
chmod +x ./mpi-dtlz2.sh
Submit it to the cluster as such:
qsub mpi-dtlz2.sh
You can also delete a job using the qdel command and the job identifier:
qdel 24590
You can also hold a job using the qhold command and the job identifier:
qhold 24590
Show status of the jobs:
qstat
Display information about active, queued or recently completed jobs:
showq