Using Borg in Parallel and Serial with a Python Wrapper – Part 2

This blog post is Part 2 of a two-part series that will demonstrate how I have coupled a pure Python simulation model with the Borg multi-objective evolutionary algorithm (MOEA). I recommend reading Part 1 of this series before you read Part 2. In Part 1, I explain how to get Borg and provide sample code showing how you can access Borg’s serial and/or parallelized (master-slave) implementations through a Python wrapper (borg.py). In Part 2, I provide details for more advanced simulation-optimization setups that require you to pass additional information from the Borg wrapper into the simulation model (the “function evaluation”) other than just decision variable values.

In Part 1, the example simulation model I use (PySedSim) is called through a function handle “Simulation_Caller” in the example_sim_opt.py file. Borg needs only this function handle to properly call the simulation model in each “function evaluation”. Borg’s only interaction with the simulation model is to pass the simulation model’s function handle (e.g., “Simulation_Caller”) the decision variables, and nothing else. In many circumstances, this is all you need.

However, as your simulation-optimization setup becomes more complex, you may need to pass additional arguments from Borg to the simulation model (i.e., the function evaluation) beyond just the decision variables in order for it to execute properly. For example, in my own use of Borg in a simulation-optimization setting, I first import a variety of assumptions and preferences to set up a Borg-PySedSim run. Some of those assumptions and preferences help the simulation model (PySedSim) determine how to make use of the decision variable values Borg feeds it. So, I would like to pass those relevant assumptions and preferences directly into the Borg wrapper (borg.py), so the wrapper can in turn pass them directly into the simulation model along with the decision variable values.

Before I show how to do this, let me provide a more concrete example of how/why I am doing this in my own research. In my current work, decision variable values represent parameters for a reservoir operating policy that is being optimized. The simulation model needs to know how to take the decision variable values and turn them into a useful operating policy that can be simulated. Some of this information gets imported in order to run Borg, so I might as well pass that information directly into the simulation model while I have it on hand, rather than importing it once again in the simulation model.

To do what I describe above, we just need to modify the two functions in the example_sim_opt.py module so that a new argument “additional_inputs” is passed from Borg to the simulation handle. Starting from my Python code in blog post 1, the code below modifies the Simulation_Caller() function (its signature, the unpacking of the additional inputs, and the call to the simulation model) and the Optimization() function (the definition of the two dictionaries of additional inputs and the Borg instantiation that passes them along). After that code, I then indicate how I modify the borg.py wrapper so it can accept this information.

import numpy as np
import pysedsim # This is your simulation model
import platform  # helps identify directory locations on different types of OS

def Simulation_Caller(vars, additional_inputs):
    '''
    Purpose: Borg calls this function to run the simulation model and return multi-objective performance.

    Note: You could also just put your simulation/function evaluation code here.

    Args:
        vars: A list of decision variable values from Borg
        additional_inputs: A list of python data structures you want to pass from Borg into the simulation model.
    Returns:
        performance: policy's simulated objective values. A list of objective values, one value for each objective.
    '''

    borg_vars = vars  # Decision variable values from Borg

    # Unpack lists of additional inputs from Borg (example assumes additional inputs is a python list with two items)
    borg_dict_1 = additional_inputs[0]
    borg_dict_2 = additional_inputs[1]

    # Reformat decision variable values as necessary (e.g., cast Borg output parameters as an array for use in the simulation)
    op_policy_params = np.asarray(borg_vars)
    # Call/run simulation model with decision vars and additional relevant inputs, return multi-objective performance:
    performance = pysedsim.PySedSim(decision_vars = op_policy_params, sim_addl_input_1 = borg_dict_1, sim_addl_input_2 = borg_dict_2)
    return performance

def Optimization():

    '''

    Purpose: Call this method from command line to initiate simulation-optimization experiment

    Returns:
        --pareto approximate set file (.set) for each random seed
        --Borg runtime file (.runtime) for each random seed

    '''

    import borg as bg  # Import borg wrapper

    parallel = 1  # 1= master-slave (parallel), 0=serial

    # The following are just examples of relevant MOEA specifications. Select your own values.
    nSeeds = 25  # Number of random seeds (Borg MOEA)
    num_dec_vars = 10  # Number of decision variables
    n_objs = 6  # Number of objectives
    n_constrs = 0  # Number of constraints
    num_func_evals = 30000  # Number of total simulations to run per random seed. Each simulation may be a Monte Carlo.
    runtime_freq = 1000  # Interval at which to print runtime details for each random seed
    decision_var_range = [[0, 1], [4, 6], [-1,4], [1,2], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1]]
    epsilon_list = [50000, 1000, 0.025, 10, 13, 4]  # Borg epsilon values for each objective
    borg_dict_1 = {'simulation_preferences_1': [1,2]}  # reflects data you want Borg to pass to simulation model
    borg_dict_2 = {'simulation_preferences_2': [3,4]}  # reflects data you want Borg to pass to simulation model

    # Where to save seed and runtime files
    main_output_file_dir = 'E:\\output_directory'  # Specify location of output files for different seeds
    os_fold = Op_Sys_Folder_Operator()  # Folder operator for operating system
    output_location = main_output_file_dir + os_fold + 'sets'

    # If using master-slave, start MPI. Only do once.
    if parallel == 1:
        bg.Configuration.startMPI()  # start parallelization with MPI

    # Loop through seeds, calling borg.solve (serial) or borg.solveMPI (parallel) each time
    for j in range(nSeeds):
        # Instantiate borg class, then set bounds, epsilon values, and file output locations
        borg = bg.Borg(num_dec_vars, n_objs, n_constrs, Simulation_Caller, add_sim_inputs = [borg_dict_1, borg_dict_2])
        borg.setBounds(*decision_var_range)  # Set decision variable bounds
        borg.setEpsilons(*epsilon_list)  # Set epsilon values
        # Runtime file path for each seed:
        runtime_filename = main_output_file_dir + os_fold + 'runtime_file_seed_' + str(j+1) + '.runtime'
        if parallel == 1:
            # Run parallel Borg
            result = borg.solveMPI(maxEvaluations=num_func_evals, runtime=runtime_filename, frequency=runtime_freq)

        if parallel == 0:
            # Run serial Borg
            result = borg.solve({"maxEvaluations": num_func_evals, "runtimeformat": 'borg', "frequency": runtime_freq,
                                 "runtimefile": runtime_filename})

        if result:
            # This seed is now finished. When running master-slave Borg, the
            # result is returned from only one node (the master).
            result.display()

            # Create/write objective values and decision variable values to files in folder "sets", 1 file per seed.
            f = open(output_location + os_fold + 'Borg_DPS_PySedSim' + str(j+1) + '.set', 'w')
            f.write('#Borg Optimization Results\n')
            f.write('#First ' + str(num_dec_vars) + ' are the decision variables, ' + 'last ' + str(n_objs) +
                    ' are the ' + 'objective values\n')
            for solution in result:
                line = ''
                for i in range(len(solution.getVariables())):
                    line = line + (str(solution.getVariables()[i])) + ' '

                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f.write(line[0:-1]+'\n')
            f.write("#")
            f.close()

            # Create/write only objective values to files in folder "sets", 1 file per seed. Purpose is so that
            # the file can be processed in MOEAFramework, where performance metrics may be evaluated across seeds.
            f2 = open(output_location + os_fold + 'Borg_DPS_PySedSim_no_vars' + str(j+1) + '.set', 'w')
            for solution in result:
                line = ''
                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f2.write(line[0:-1]+'\n')
            f2.write("#")
            f2.close()

            print("Seed %s complete") %j

    if parallel == 1:
        bg.Configuration.stopMPI()  # stop parallel function evaluation process

def Op_Sys_Folder_Operator():
    '''
    Function to determine whether operating system is (1) Windows, or (2) Linux

    Returns folder operator for use in specifying directories (file locations) for reading/writing data pre- and
    post-simulation.
    '''

    if platform.system() == 'Windows':
        os_fold_op = '\\'
    elif platform.system() == 'Linux':
        os_fold_op = '/'
    else:
        os_fold_op = '/'  # Assume unix OS if it can't be identified

    return os_fold_op

Next, you will need to acquire the Borg wrapper using the instructions I specified in my previous blog post. You will then make only two modifications: (1) modify the Borg class in borg.py so it accepts the inputs you want to pass to the simulation; and (2) add some internal accounting in borg.py to ensure those inputs are passed along to the borg.py methods that deal with your function handle. I will address these two in order.

First, modify the Borg class in borg.py so it now accepts an additional input (I only show some of the borg.py code here, just to indicate where changes are being made):


class Borg:
    def __init__(self, numberOfVariables, numberOfObjectives, numberOfConstraints, function, epsilons=None, bounds=None, directions=None, add_sim_inputs=None):
        # add_sim_inputs is the new input you will pass to Borg

Then, modify the portion of the borg.py wrapper where self.function is assigned, so it can accommodate any simulation inputs you have specified.


if add_sim_inputs is None:
    self.function = _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions)
else:
    # More simulation inputs are specified and can be passed to the simulation handle
    self.function = _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions, addl_inputs=add_sim_inputs)

After the above, the last step is to modify the _functionWrapper function in borg.py:


def _functionWrapper(function, numberOfVariables, numberOfObjectives, numberOfConstraints, directions=None, addl_inputs=None):
    # addl_inputs will be passed into the simulation model
    def innerFunction(v, o, c):
        global terminate
        try:
            if addl_inputs is None:
                result = function(*[v[i] for i in range(numberOfVariables)])
            else:
                result = function([v[i] for i in range(numberOfVariables)], addl_inputs)
            # (the remainder of innerFunction is unchanged from the original borg.py)
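
To see why this works, note that _functionWrapper builds a closure: the additional inputs are captured once when the wrapper is created, and Borg keeps calling a function of the decision variables alone. Below is a minimal, self-contained sketch of that pattern; make_wrapped_function and toy_simulation are hypothetical stand-ins for illustration, not part of borg.py:

# Sketch of the closure pattern used by _functionWrapper above.
def make_wrapped_function(function, addl_inputs=None):
    def inner(dec_vars):
        if addl_inputs is None:
            return function(*dec_vars)
        return function(dec_vars, addl_inputs)
    return inner  # the additional inputs stay captured inside 'inner'

def toy_simulation(dec_vars, extra):
    # Stand-in for PySedSim: one fake 'objective' that uses the extra inputs
    return [sum(dec_vars) * len(extra['simulation_preferences_1'])]

wrapped = make_wrapped_function(toy_simulation,
                                addl_inputs={'simulation_preferences_1': [1, 2]})
print(wrapped([0.1, 0.2, 0.3]))  # Borg-style call: decision variables only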


Using Borg in Parallel and Serial with a Python Wrapper – Part 1

Simulation and optimization are frequently used to solve complex water resources and environmental systems problems. By itself, a simulation model begs the question “what to simulate?” Similarly, by itself, an optimization model begs the question “is the solution really best?” For this reason, simulation and optimization models are frequently coupled.

This blog post is part 1 of a multi-part series that will demonstrate how I have coupled a pure Python simulation model with the multi-objective evolutionary optimization algorithm Borg. In this post, I will show how you can access Borg’s serial and/or parallelized (master-slave) implementations through a Python wrapper (borg.py).

Please see this previous blog post for some background about Borg, and how to obtain it. My instructions below assume you have access to the Borg files.

In the setup I will describe below, Borg parameterizes and iteratively refines solutions (e.g., reservoir operating policies) to a problem, optimizing them in response to their simulated performance with respect to multiple objectives.

You will need the following Borg files (see link above for how to download these):

  • Serial (i.e., borg.c, libborg.so, etc.) and/or master-slave (i.e., borgms.c, libborgms.so, etc.) implementations of Borg, depending upon your ability to parallelize.
  • Python wrapper for Borg (borg.py), which will allow you to access Borg easily in Python.

You will need to create the following files yourself (I provide sample code below for these files):

  • example_sim_opt.py—A python module that should contain two main functions:
    1. A simulation caller, which takes decision variables and returns multi-objective performance. This function is called “Simulation_Caller” in the example_sim_opt.py code below.
    2. An optimization function, which calls the Borg MOEA through its python wrapper borg.py. This borg.py wrapper provides extensive docstring documentation regarding required arguments, returned values, etc., so I do suggest reading through the wrapper if you have questions (e.g., about the python data types of arguments and returns).

Note that the file and function names above are just example names. You can name the above files whatever you want. Just be sure to modify the code I provide below to reflect the new names.

A sample of code for example_sim_opt.py is as follows:

import numpy as np
import pysedsim # This is your simulation model
import platform  # helps identify directory locations on different types of OS

def Simulation_Caller(vars):
    '''
    Purpose: Borg calls this function to run the simulation model and return multi-objective performance.

    Note: You could also just put your simulation/function evaluation code here.

    Args:
        vars: A list of decision variable values from Borg

    Returns:
        performance: policy's simulated objective values. A list of objective values, one value for each objective.
    '''

    borg_vars = vars  # Decision variable values from Borg

    # Reformat decision variable values as necessary (e.g., cast Borg output parameters as an array for use in the simulation)
    op_policy_params = np.asarray(borg_vars)
    # Call/run simulation model, return multi-objective performance:
    performance = pysedsim.PySedSim(decision_vars = op_policy_params)
    return performance

def Optimization():

    '''

    Purpose: Call this method from command line to initiate simulation-optimization experiment

    Returns:
        --pareto approximate set file (.set) for each random seed
        --Borg runtime file (.runtime) for each random seed

    '''

    import borg as bg  # Import borg wrapper

    parallel = 1  # 1= master-slave (parallel), 0=serial

    # The following are just examples of relevant MOEA specifications. Select your own values.
    nSeeds = 25  # Number of random seeds (Borg MOEA)
    num_dec_vars = 10  # Number of decision variables
    n_objs = 6  # Number of objectives
    n_constrs = 0  # Number of constraints
    num_func_evals = 30000  # Number of total simulations to run per random seed. Each simulation may be a Monte Carlo.
    runtime_freq = 1000  # Interval at which to print runtime details for each random seed
    decision_var_range = [[0, 1], [4, 6], [-1,4], [1,2], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1]]
    epsilon_list = [50000, 1000, 0.025, 10, 13, 4]  # Borg epsilon values for each objective

    # Where to save seed and runtime files
    main_output_file_dir = 'E:\\output_directory'  # Specify location of output files for different seeds
    os_fold = Op_Sys_Folder_Operator()  # Folder operator for operating system
    output_location = main_output_file_dir + os_fold + 'sets'

    # If using master-slave, start MPI. Only do once.
    if parallel == 1:
        bg.Configuration.startMPI()  # start parallelization with MPI

    # Loop through seeds, calling borg.solve (serial) or borg.solveMPI (parallel) each time
    for j in range(nSeeds):
        # Instantiate borg class, then set bounds, epsilon values, and file output locations
        borg = bg.Borg(num_dec_vars, n_objs, n_constrs, Simulation_Caller)
        borg.setBounds(*decision_var_range)  # Set decision variable bounds
        borg.setEpsilons(*epsilon_list)  # Set epsilon values
        # Runtime file path for each seed:
        runtime_filename = main_output_file_dir + os_fold + 'runtime_file_seed_' + str(j+1) + '.runtime'
        if parallel == 1:
            # Run parallel Borg
            result = borg.solveMPI(maxEvaluations=num_func_evals, runtime=runtime_filename, frequency=runtime_freq)

        if parallel == 0:
            # Run serial Borg
            result = borg.solve({"maxEvaluations": num_func_evals, "runtimeformat": 'borg', "frequency": runtime_freq,
                                 "runtimefile": runtime_filename})

        if result:
            # This seed is now finished. When running master-slave Borg, the
            # result is returned from only one node (the master).
            result.display()

            # Create/write objective values and decision variable values to files in folder "sets", 1 file per seed.
            f = open(output_location + os_fold + 'Borg_DPS_PySedSim' + str(j+1) + '.set', 'w')
            f.write('#Borg Optimization Results\n')
            f.write('#First ' + str(num_dec_vars) + ' are the decision variables, ' + 'last ' + str(n_objs) +
                    ' are the ' + 'objective values\n')
            for solution in result:
                line = ''
                for i in range(len(solution.getVariables())):
                    line = line + (str(solution.getVariables()[i])) + ' '

                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f.write(line[0:-1]+'\n')
            f.write("#")
            f.close()

            # Create/write only objective values to files in folder "sets", 1 file per seed. Purpose is so that
            # the file can be processed in MOEAFramework, where performance metrics may be evaluated across seeds.
            f2 = open(output_location + os_fold + 'Borg_DPS_PySedSim_no_vars' + str(j+1) + '.set', 'w')
            for solution in result:
                line = ''
                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f2.write(line[0:-1]+'\n')
            f2.write("#")
            f2.close()

            print("Seed %s complete") %j

    if parallel == 1:
        bg.Configuration.stopMPI()  # stop parallel function evaluation process

def Op_Sys_Folder_Operator():
    '''
    Function to determine whether operating system is (1) Windows, or (2) Linux

    Returns folder operator for use in specifying directories (file locations) for reading/writing data pre- and
    post-simulation.
    '''

    if platform.system() == 'Windows':
        os_fold_op = '\\'
    elif platform.system() == 'Linux':
        os_fold_op = '/'
    else:
        os_fold_op = '/'  # Assume unix OS if it can't be identified

    return os_fold_op

The following is an example of how you would submit a batch script on a Linux cluster to run a parallelized simulation-optimization experiment using the example_sim_opt.py and borg.py files. Note that in the parallelized version, simulations (i.e., “function evaluations”) are being run in parallel by separate processors.

You would need the following two files:

  1. example_sim_opt_caller.py. This is a python file that is used to call example_sim_opt.py
  2. example_sim_opt_batch_script.pbs. This is a batch script that runs example_sim_opt_caller.py in parallel on a cluster using Open MPI.

Example code for example_sim_opt_caller.py:


'''
Purpose: To initiate the optimization process, which will iteratively call the simulation model.
'''

import example_sim_opt  # Import main optimization module that uses borg python wrapper

# Call the Optimization function within the example_sim_opt module
example_sim_opt.Optimization()

Example code for example_sim_opt_batch_script.pbs:


#PBS -l nodes=8:ppn=16
#PBS -l walltime=24:00:00
#PBS -j oe
#PBS -o pysedsim_output.out

cd $PBS_O_WORKDIR
source /etc/profile.d/modules.sh
module load openmpi-1.6.5-intel-x86_64/gnu
module load python-2.7.5
mpirun python example_sim_opt_caller.py

You could then run the above batch script with a command such as:

qsub example_sim_opt_batch_script.pbs

Basic implementation of the parallel Borg MOEA

This post walks through the implementation of the different parallel versions of the Borg MOEA: master-slave and multi-master slave. The implementation is demonstrated using the DTLZ2 example provided with the multi-master source code. We will break this post into four sections: the first one describes the Main file where the problem is defined and the required libraries are specified; the second part describes the Makefile used to compile and create the executables for the DTLZ2 problem; the third section describes the Submission file that manages the distribution of jobs on a cluster; and finally we'll cover a Job submission example in the fourth section.

Both of the parallel Borg implementations are described in detail in the following papers:

Hadka, D., and Reed, P.M., “Large-scale Parallelization of the Borg MOEA for Many-Objective Optimization of Complex Environmental Systems”, Environmental Modelling & Software, v69, 353-369, 2015.

Reed, P.M. and Hadka, D., “Evolving Many-Objective Water Management to Exploit Exascale Computing”, Water Resources Research, v50, n10, 8367–8373, 2014.

1. Main file

1.1. Required headers

The main file refers to the file where we specify the main function: dtlz2_mm.c. Here dtlz2 is the test problem we are solving, and mm refers to the multi-master implementation. First, you will need the mpi.h header, a message-passing library required for parallel computers and clusters. You will also need to provide the Borg multi-master header file, borgmm.h; if your working directory is different from the multi-master directory, you will need to specify the path, for instance: “./mm-borg-moea/borgmm.h”.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>
#include "borgmm.h"

1.2. Problem definition

In the following lines the DTLZ2 problem is defined as it would be in the serial version. The following function is responsible for reading the decision variables and evaluating the problem. We will be using the 2-objective version of this problem; however, you can scale it up. The rule is nvars = nobjs + 9. Hence, if you want to try up to 5 objectives you can simply change the nvars and nobjs declarations to: nvars = 14 and nobjs = 5.

#define PI 3.14159265358979323846
int nvars = 11;
int nobjs = 2;
void dtlz2(double* vars, double* objs, double* consts) {
	int i;
	int j;
	int k = nvars - nobjs + 1;
	double g = 0.0;

	for (i=nvars-k; i<nvars; i++) {
		g += pow(vars[i] - 0.5, 2.0);
	}

	for (i=0; i<nobjs; i++) {
		objs[i] = 1.0 + g;

		for (j=0; j<nobjs-i-1; j++) {
			objs[i] *= cos(0.5*PI*vars[j]);
		}

		if (i != 0) {
			objs[i] *= sin(0.5*PI*vars[nobjs-i-1]);
		}
	}
}

1.3. Enabling multiple seeds

The following lines enable random seeds to be passed in as external arguments. This links your main file to your submission file, where multiple seeds are submitted to the cluster. This segment is coupled with the submission file discussed in section 3.

int main(int argc, char* argv[]) {
	unsigned int seed = atoi(argv[1]);

1.4. Variable declarations

In the following code, a couple of loop variables, j and rank, are declared; these will be used later in sections 1.7 and 1.9. The maximum number of function evaluations (NFE) is also declared. We will print out the runtime file and the file with the Pareto set; hence, character buffers for the output file names are also declared in lines 4-7.

int j;
int rank;
int NFE = 100000;
char outputFilename[256];
FILE* outputFile = NULL;
char runtime[256];
char timing[256];

1.5. Parallel borg parameters and core allocation

In the following segment, we specify some key parameters for the parallel Borg MOEA. First, all multi-master runs need to call startup (line 1). The variable name argc stands for “argument count”: argc contains the number of arguments we pass to the program. The variable name argv stands for “argument vector”: these are the arguments that we pass to the program. Then, the number of islands is specified in line 2. If you want to use the master-slave configuration, you can simply assign one island. The only difference from the pure master-slave code is that the latter uses uniform population samples, whereas the multi-master code uses Latin hypercube population samples. Line 3 specifies the maximum wallclock time estimated for your job. Finally, global Latin hypercube initialization is specified in line 5 to ensure each island gets a well-sampled distribution of solutions.

BORG_Algorithm_ms_startup(&argc, &argv);
BORG_Algorithm_ms_islands(2);
BORG_Algorithm_ms_max_time(0.1);
BORG_Algorithm_ms_max_evaluations(NFE);
BORG_Algorithm_ms_initialization(INITIALIZATION_LATIN_GLOBAL);

Keep in mind the core allocation: if you have 32 available cores for a computation, then when using a master-slave configuration, one core will be allocated to the master and the remaining 31 cores will be available for function evaluations. If you use a multi-master configuration with two masters, one core will be allocated to the controller and two will be allocated to the masters (one core per master), leaving 29 cores available for function evaluations. If you are using a small cluster, I wouldn't recommend going higher than 4 masters.
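
As a quick illustration of this accounting, here is a sketch in Python (assuming one core per master, plus one controller core only when there is more than one master):

# Sketch of the core accounting described above. Assumption: one core per
# master, plus one controller core only in multi-master (>1 master) runs.
def function_evaluation_cores(total_cores, n_masters):
    overhead = n_masters + (1 if n_masters > 1 else 0)
    return total_cores - overhead

print(function_evaluation_cores(32, 1))  # master-slave: 31 worker cores
print(function_evaluation_cores(32, 2))  # 2 masters + 1 controller: 29 cores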

1.6. Problem definition

The next segment creates the DTLZ2 problem. In line 1, the number of decision variables, objectives, and constraints is specified; the last argument, dtlz2, references the function that evaluates the DTLZ2 problem shown in section 1.2. In lines 3-5, the lower and upper bounds for each decision variable are set to 0 and 1. In lines 7-9, the epsilon values used by the Borg MOEA are set to 0.01, defining the problem resolution for each objective.

BORG_Problem problem = BORG_Problem_create(nvars, nobjs, 0, dtlz2);

for (j=0; j<nvars; j++) {
	BORG_Problem_set_bounds(problem, j, 0.0, 1.0);
}

for (j=0; j<nobjs; j++) {
	BORG_Problem_set_epsilon(problem, j, 0.01);
}

1.7. Printing runtime output

We specify the output frequency in line 1; since we specified 100,000 maximum function evaluations, Borg will provide runtime output every 1,000 NFE (100 outputs in total). Lines 2 and 3 set the file names for the Pareto sets and the runtime dynamics. The %d gets replaced by the index of the seed and the %%d gets replaced by the index of the master; make sure that the sets and runtime folders exist in your working directory.

	BORG_Algorithm_output_frequency((int)NFE/100);
	sprintf(outputFilename, "./sets/DTLZ2_S%d.set", seed);
	sprintf(runtime, "./runtime/DTLZ2_S%d_M%%d.runtime", seed);
	BORG_Algorithm_output_runtime(runtime);

1.8. Parallelizing seeds

The MPI_Comm_rank routine gets the rank of this process. The rank is used to ensure each parallel process uses a different random seed.

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    BORG_Random_seed(37*seed*(rank+1));

1.9. Printing the Pareto set

The optimization is performed by the multi-master Borg MOEA in line 1. Only the controller process will return a non-NULL result; the controller aggregates all of the Pareto optimal solutions generated by each master. The code then prints the Pareto optimal solutions to a separate file and frees any allocated memory. Lines 13-15 shut down the parallel processes and exit the program.


	BORG_Archive result = BORG_Algorithm_ms_run(problem);

	if (result != NULL) {
		outputFile = fopen(outputFilename, "w");
		if (!outputFile) {
			BORG_Debug("Unable to open final output file\n");
		}
		BORG_Archive_print(result, outputFile);
		BORG_Archive_destroy(result);
		fclose(outputFile);
	}

	BORG_Algorithm_ms_shutdown();
	BORG_Problem_destroy(problem);
	return EXIT_SUCCESS;
}

2. Makefile

The following makefile compiles the different versions of the Borg MOEA (serial, master-slave, and multi-master slave), as well as the DTLZ2 examples for each version, and generates the executables. The gcc compiler is required for the serial version, while the mpicc compiler is required for the parallel versions. From your terminal, access the mm-borg-moea directory and type make to compile the dtlz2 examples.

CC = gcc
MPICC = mpicc
CFLAGS = -O3
LDFLAGS = -Wl,-R,\.
LIBS = -lm
UNAME_S = $(shell uname -s)

ifneq (, $(findstring SunOS, $(UNAME_S)))
	LIBS += -lnsl -lsocket -lresolv
else ifneq (, $(findstring MINGW, $(UNAME_S)))
	# MinGW is not POSIX compliant
else
	POSIX = yes
endif

compile:
	$(CC) $(CFLAGS) $(LDFLAGS) -o dtlz2_serial.exe dtlz2_serial.c borg.c mt19937ar.c $(LIBS)

ifdef POSIX
	$(CC) $(CFLAGS) $(LDFLAGS) -o borg.exe frontend.c borg.c mt19937ar.c $(LIBS)
endif

	$(MPICC) $(CFLAGS) $(LDFLAGS) -o dtlz2_ms.exe dtlz2_ms.c borgms.c mt19937ar.c $(LIBS)
	$(MPICC) $(CFLAGS) $(LDFLAGS) -o dtlz2_mm.exe dtlz2_mm.c borgmm.c mt19937ar.c $(LIBS)
.PHONY: compile

3. Submission file

This is an example of a submission file. The parallel Borg MOEA uses the portable batch system (PBS) to manage the distribution of batch jobs across the available nodes in the cluster. In the following script, the -N flag is the name of the job and the -l flag is the list of required resources; in this file we are requesting 1 node with 16 processors per node, and 5 hours of wallclock time. The -j flag joins the output and error streams, and the -o flag gives the path of the output stream. The cd $PBS_O_WORKDIR line changes to the directory from which the job was submitted. We then run multiple seeds in a loop, calling mpirun with the name of the executable that was generated by compiling the examples in section 2. The following script is saved as a bash file, for instance: mpi-dtlz2.sh.

#!/bin/bash
#PBS -N dtlz2
#PBS -l nodes=1:ppn=16
#PBS -l walltime=5:00:00
#PBS -j oe
#PBS -o output

cd $PBS_O_WORKDIR

NSEEDS=9
SEEDS=$(seq 0 ${NSEEDS})

for SEED in ${SEEDS}
do
  mpirun ./dtlz2_mm.exe ${SEED}
done

4.  Submitting and managing jobs in a cluster

Once our submission file is ready, we can make it executable using the following command:

chmod +x ./mpi-dtlz2.sh

Submit it to the cluster as such:

qsub mpi-dtlz2.sh

You can also delete a job using the qdel command and the job identifier:

qdel 24590

You can also hold a job using the qhold command and the job identifier:

qhold 24590

Show status of the jobs:

qstat

Displays information about active, queued or recently completed jobs:

showq

Simple Python script to create a command to call the Borg executable

Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within the ‘jrkasprzyk’ GitHub account.

It saves you having to type out a long Borg command when using the ‘command line interface’, especially when there are many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon of an objective, and Decision contains the name, lower bound, and upper bound of a decision variable. A rough sketch of these objects appears below.
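
A minimal sketch (the attribute names here are illustrative and may differ from those in the actual repository):

# Sketch of the helper objects described above; attribute names are
# illustrative and may differ from those in the 'misc_scripts' repository.
class Objective:
    def __init__(self, name, epsilon):
        self.name = name
        self.epsilon = epsilon

class Decision:
    def __init__(self, name, lower, upper):
        self.name = name
        self.lower = lower
        self.upper = upper

# Assembling a long Borg command-line call from these objects:
decisions = [Decision('x%d' % (i + 1), 0.0, 1.0) for i in range(12)]
objectives = [Objective('f%d' % (i + 1), 0.01) for i in range(3)]
command = './borg.exe -v %d -o %d -e %s -l %s -u %s' % (
    len(decisions), len(objectives),
    ','.join(str(o.epsilon) for o in objectives),
    ','.join(str(d.lower) for d in decisions),
    ','.join(str(d.upper) for d in decisions))
print(command)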

You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python such as sampling of the decision space, and different sorts of processing of MOEA output.

On constraints within MOEAs

When thinking about setting up a problem formulation for an MOEA like Borg, it’s helpful to spend a minute (or 15!) talking about our friend the constraint.  I’ve written about problem formulations in the past, and we even have a video about problem formulations, but there’s a subtlety that I thought would be helpful to discuss again here!

A comment about constraints in classical mathematical programming formulations

Classical mathematical programming formulations are chock-full of constraints.  Hundreds of constraints, even!  This is due to the fact that when you are setting up a linear program (LP) or something similar, the set of constraints allows the program to have some amount of realism.  The classical linear programming water allocation problem (for example, something you would find in the classic Revelle et al textbook)  is something like:

Decisions: water allocations to different actors in different periods of time

Objective: minimize cost of the allocation plan, or maximize the net benefits of the plan

Constraints:

  1. Ensure that demands are met

  2. Ensure that, at each supply node, inflow equals outflow

  3. Ensure that the capacity of the conveyance system is not exceeded

  4. Ensure that all decision variables are greater than or equal to zero

  5. Keep the water quality values above or below some threshold

We have proposed 5 categories of constraints, and there may be even more.  But mathematically, this simple formulation above could have hundreds of constraints, if you have to solve each equation for a particular point in time, or at a particular spatial location.

This is perfectly fine!  If you are solving an LP, I encourage you to put more constraints in there, because constraints are really the only way that you can make your solutions look more physically accurate.  LP theory will tell you that, if there are one or more optimal solutions, at least one of them will be at the intersection of two or more of the constraints.  So, the constraints are important (more on this later).

How the simulation-optimization of MOEAs is different

The MOEA approach we talk about on this blog is a simulation-optimization approach.  What that means is that instead of having a set of a few hundred constraints to model your problem, you are actually using a simulation model of the system to model it.  This is going to fundamentally change the way that your decisions and constraints are defined, and it may even change your objectives.  Let’s talk about each in turn, using our simple water allocation example as, well, as an example.

Decisions: It would probably make more sense to optimize rules about how the water is allocated, instead of actually treating each decision variable as a separate volume of water over time.  There are a number of reasons why this is.  In the formulation above, if each decision variable is an allocation of water in a particular period of time, once you start to plan over long planning horizons, the problem will get unwieldy, with thousands of variables.  Plus, if you say that decision variable x_83 is the allocation of water to user 5, at t = 3, then how are you to plan what happens if you don’t have accurate or deterministic data for what happens at t = 3?  But that’s really the purpose of another post.  We’re here to talk about:

Constraints: So, in the above formulation, constraints were designed to create physical reality to your system.  For example, if the decision variables are volumes of water, and you have a basic estimation of the concentration of some pollutant, you can create a constraint that says the number of kilograms of a pollutant should be less than a particular value.  In that fashion, you can think of hundreds of constraints, and we just talked about this, right?

In the simulation-optimization approach, though, the simulation model provides physical reality to the system, not the constraint set.  So it is better to treat constraints as limits on acceptable performance.  For example, if you are calculating reliability, you can say that all solutions should have at least some lower threshold of reliability (say, 98%, for example).  Or, don’t have constraints at all and instead just handle the value as an objective!  The point of doing tradeoff analysis with an MOEA is to find diversity of solutions, after all.  If you find that performance is too poor, or that you have too many solutions, you can impose constraints next time.  Or, you can also change the epsilon resolution within Borg in order to change the number of points in the tradeoff analysis; see this paper by Kollat and Reed where they showed the same tradeoff set at different resolutions, for example.

Constraints can also help with the representation of decisions within your formulation.  After all, the most basic constraint in any LP is that xi >= 0, right?  But think about creative ways to change the representation of decision variables such that all of the algorithm’s solutions are feasible.  This will greatly help the search process!

For example, consider a problem where you have to create drought restriction plans at different levels.  In other words, imagine that the value of restriction at level 2 needed to be greater than the value at level 1.  Mathematically, you could set this up as a constraint (x2 >= x1), but it will be very difficult to meet, especially with an initially random population within an MOEA, and especially when you have, say, 5 or 10 or 100 of these levels that each need to be greater as you go up.  So, instead, you can do something like this.  Let xi’ be the decision variable in the algorithm, and xi (without a prime) be the actual decision variable used in your model.  Then, let:

x1 = x1′

x2 = x1 + x2′

The algorithm actually searches on x1′ and x2′.  Now, if you set your bounds up correctly, the algorithm can never generate an infeasible solution, and you are spending all your computational resources within search actually finding good, feasible solutions instead of just trying to find a single set of feasible solutions with which to work.  Something similar to this was used in Zeff et al. (2014).
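
Here is a minimal sketch of that reparameterization (assuming the primed increments are bounded to be nonnegative): the algorithm searches on the increments, and the model reconstructs restriction levels that are nondecreasing by construction.

# Sketch of the reparameterization above: the MOEA searches on nonnegative
# increments x_i', and the model rebuilds the actual restriction levels x_i
# as cumulative sums, so every sampled solution satisfies x2 >= x1, etc.
def increments_to_levels(primed_vars):
    levels, running_total = [], 0.0
    for increment in primed_vars:  # each increment >= 0 by its bounds
        running_total += increment
        levels.append(running_total)
    return levels

print(increments_to_levels([0.2, 0.1, 0.3]))  # ordered levels, roughly [0.2, 0.3, 0.6]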

Some comments on how constraint handling works within Borg

The concepts of epsilon non-domination, and non-domination, are used to determine whether solutions survive within the search process.  The general concept of non-domination is, informally, that a solution’s objective function values are better than another solution’s in at least one objective, and at least as good in all other objectives.  But that’s not what we’re here to talk about!  We are here to talk about constraints, remember? 🙂

Well, remember that the definition of a feasible and infeasible solution is the same in LP-land as it is for us.  Feasible solutions meet all constraints, infeasible solutions violate one or more constraints. An infeasible solution is defined to not be acceptable to the decision maker.  This is an extremely important point to remember, so it is worth repeating!  An infeasible solution is unacceptable. Because of this, it’s up to you to set up the constraints to keep acceptable solutions, and throw away unacceptable (infeasible) solutions.

However, there is an important distinction between, say, an LP and what we do.  Remember the simplex algorithm for solving LPs?  What it does is intelligently find a corner of the feasible space (defined by your constraint set), and then it basically walks through the corners, trying each one of them to determine if the corner is optimal.  Notice what’s going on here, though.  Most of the time, at least for a simple problem, it is easy to find a feasible solution in the simplex method, and then you are off to the races.  If you can’t find a feasible solution, you’re in a lot of trouble!

Similarly, you also need to find one feasible solution to get started with MOEA search…preferably you’ll have many many feasible solutions!  But what if it is difficult to find a feasible solution? This is going to impact your ability to find solutions, and if your function evaluation count, or search duration, is not long enough you can have a failed search.

So let’s get to the point of this section, which is explaining how constraint handling works within Borg.  As explained in Reed et al., most of the algorithms we’ve typically tested within this blogosphere use restricted tournament selection. There are three conditions where a solution can dominate b:

  1. Both solutions are feasible, and based on objective function performance, a dominates b

  2. Solution a is feasible, and b is infeasible.

  3. Both a and b are infeasible, and the sum of a’s constraint violations is less than the sum of b‘s constraint violations

So as you can see, item 3 is very interesting!  What item 3 means, is that Borg or these other algorithms will actually keep some infeasible solutions in the population or archive while all solutions are infeasible.  This is important because it helps move the population toward eventually finding a feasible region of the decision space!  But, according to item 2, any solution that is feasible will beat an infeasible solution, which is consistent with the fact that infeasible solutions are unacceptable, and oh gosh I already said that a few times didn’t I!
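
The three conditions above can be condensed into a few lines (a minimal sketch, assuming minimization and that constraint violations are reported as nonnegative magnitudes with zero meaning feasible; this illustrates the concept, not Borg's actual source code):

# Sketch of the three comparison rules above (minimization assumed;
# violation == 0 means feasible, larger values mean worse violations).
def a_beats_b(a_objs, b_objs, a_violation, b_violation):
    if a_violation == 0 and b_violation == 0:
        # 1. Both feasible: fall back to Pareto dominance on the objectives
        return (all(x <= y for x, y in zip(a_objs, b_objs)) and
                any(x < y for x, y in zip(a_objs, b_objs)))
    if a_violation == 0:
        return True   # 2. A feasible solution beats an infeasible one
    if b_violation == 0:
        return False
    # 3. Both infeasible: the smaller total constraint violation wins
    return a_violation < b_violation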

By the way, you can also use penalty functions to handle constraints, instead of using this built in restricted tournament selection feature.  An example of that is in the Kollat and Reed paper that I already linked to.  Plus there are other ways to do this too!  By no means is this an exhaustive discussion, but I hope what we’ve talked about is helpful.  Anyway:

Some informal recommendations on how to use constraints within Borg if you choose to use them

  1. If you don’t have to use constraints, don’t use them!  Optimize without them and see how you do (see above).

  2. Make the constraint violations that you’re reporting to Borg meaningful.  Notice that item 3 in the restricted tournament selection actually uses the values that you report from constraint handling.  I’ve seen some folks just say: if infeasible, c = 100000000.  But that won’t actually help the restricted tournament mechanism move the set toward meaningful, feasible solutions.  For example, let’s say that r gives you a value of reliability for a system, and reliability must be above some required value R.  You could set your constraint violation to c = R - r, which will give you the difference between the performance you are required to have and the performance you actually have (see the short sketch after this list).  If all of your solutions are infeasible initially, the algorithm will favor a solution that has, say, 1% reliability lower than R over a solution that has 20% reliability lower than R.

  3. If you have an expensive simulation, skip the objective function calculations if the solution is infeasible.  The objective function values for an infeasible solution are not really used for anything in restricted tournament selection.  So you can report the constraint violation, but don’t spend time calculating objective functions that will never be used.
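
To make the second recommendation concrete, here is a minimal sketch of a meaningful constraint violation for the reliability example above (reporting zero for a feasible solution is an assumption about the convention you choose):

# Sketch of recommendation 2 above: report how far reliability r falls
# short of the requirement R, and report zero when the requirement is met.
def reliability_violation(r, R=0.98):
    return max(0.0, R - r)

print(reliability_violation(0.97))  # slightly infeasible: violation of ~0.01
print(reliability_violation(0.78))  # badly infeasible: violation of ~0.2
print(reliability_violation(0.99))  # feasible: violation of 0.0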

Two quick (important) caveats here:  One, remember that infeasible solutions are unacceptable.  And the reason why I keep bringing this up is the fact that you can technically shoot yourself in the foot here, and impose constraints that are too strict, where a real decision maker may actually want solutions that violate your imposed constraint.  So be careful when you define them!  Two, if you do skip the objective function calculation be sure to still report a value of your objectives to Borg, even if that value is just a placeholder.  This is important, to make sure the Borg procedure keeps running.  If you have questions about these caveats please leave them in the comment box below.

To conclude

This has been quite a long post so thanks for sticking with me til the end.  Hopefully you’ve learned something, I know I did!  We can continue the discussion on this blog (for the contributors) and on the comment boxes below (for everyone)!  Have a great day.

Basic Borg MOEA use for the truly newbies (Part 2/2)

This post will walk you through a simple example using the Borg MOEA to solve the 3-objective instance of DTLZ2.  I suggest first reading Part 1 of this post.  Once you have compiled the Borg MOEA, you are ready to follow this example.

The DTLZ2 problem, named after its creators Deb, Thiele, Laumanns, and Zitzler, is a very popular function for testing MOEA performance, since its Pareto-optimal solutions are analytically known and it is easily scalable to any number of objectives.

Step 1. Set the DTLZ2, 3 objective instance

In your borg folder, open the dtlz2.py file. You should see the following script:


from sys import *
from math import *

nvars = 11
nobjs = 2
k = nvars - nobjs + 1

while True:
    # Read the next line from standard input
    line = raw_input()

    # Stop if the Borg MOEA is finished
    if line == "":
        break

    # Parse the decision variables from the input
    vars = map(float, line.split())

    # Evaluate the DTLZ2 problem
    g = 0

    for i in range(nvars-k, nvars):
        g = g + (vars[i] - 0.5)**2

    objs = [1.0 + g]*nobjs

    for i in range(nobjs):
        for j in range(nobjs-i-1):
            objs[i] = objs[i] * cos(0.5 * pi * vars[j])
        if i != 0:
            objs[i] = objs[i] * sin(0.5 * pi * vars[nobjs-i-1])

    # Print objectives to standard output, flush to write immediately
    print " ".join(["%0.17f" % obj for obj in objs])
    stdout.flush()

You can easily scale the number of objectives in the DTLZ2 function by modifying lines 4 and 5 of the code (the nvars and nobjs values).  Since we want to solve the 3-objective configuration of the problem, we will set nobjs equal to 3.  In general, number of variables = number of objectives + 9; in this case, nvars = 12. Alternatively, if we wanted to work with the 5-objective instance, we would have nobjs = 5 and nvars = 14.


from sys import *
from math import *

nvars = 12
nobjs = 3
k = nvars - nobjs + 1

Once you have updated the file, save it and go back to your terminal.

Step 2. Run the DTLZ2 3 objective instance from your terminal

From your terminal, make sure you are in the borg folder and type the following command:

./borg.exe -v 12 -o 3 -e 0.01,0.01,0.01 -l 0,0,0,0,0,0,0,0,0,0,0,0 -u 1,1,1,1,1,1,1,1,1,1,1,1 -n 10000 python dtlz2.py > dtlz2.set

The -v and -o flags, as you probably guessed, are the number of decision variables and objectives, respectively. The -e flag stands for the epsilon precision values; here the same epsilon value is specified for all three objectives for simplicity, but remember that they don’t necessarily need to be the same across objectives.  The -l and -u flags are the decision variables’ lower and upper bounds.  These bounds can vary according to the problem.  The -n flag stands for the number of function evaluations; here we specify 10,000, and you can certainly increase this number.  In the last part of the command you call the dtlz2.py function; finally, the optimization results are saved to a separate file using the > “greater than” symbol.  Once you have typed the previous command, you should see a new file in your borg folder called dtlz2.set.  Before we go on, type the following command in your terminal and see what happens:

./borg.exe -h

The -h flag provides information about the different options available.

Step 3. Select the objectives

Now, open your newly generated dtlz2.set file. You will see something like this:

# BORG version: 1.8
# Current time: Thu Jun 25 17:12:00 2015
# Problem: python dtlz2.py
# Number of variables: 12
# Number of objectives: 3
# Number of constraints: 0
# Lower bounds: 0 0 0 0 0 0 0 0 0 0 0 0
# Upper bounds: 1 1 1 1 1 1 1 1 1 1 1 1
# Epsilons: 0.01 0.01 0.01
# Directions: 0 0 0
# Constraint mode: E
# Seed: (time)
1 0.814090298840556947 0.409932057219915436 0.453114658489247812 0.602037485943714312 0.612211413314472264 0.63507815287255176 0.284561154879054812 0.319971228064652446 0.510268965514402262 0.565795181158015859 0.352339424454035932 2.00000000000000014e-17 7.00000000000000035e-17 1.1566219845629353
0.200664515777928404 0.658663937901916241 0.459011341616803015 0.456423615588035791 0.72948415840455394 0.502347551242693147 0.55253340411468177 0.523000325718170678 0.424118159653203652 0.418071947834685986 0.307354991375789255 0.667924369576233357 0.55237113861310283 0.929550654944727661 0.352579197090774399
0.600637603847835599 0.635510585462702893 0.537202278296969316 0.480501397259140872 0.538845435998899225 0.565887992485444302 0.540269661215032948 0.32823362137901485 0.548470138166351484 0.485323707827721162 0.661267336189765076 0.603296309515394924 0.342802362666870974 0.531842633344500104 0.872739730448630957
0.200664515777928404 0.962019489314716703 0.478360997918643505 0.456423615588035791 0.72948415840455394 0.508917943579420218 0.434628101329525951 0.523000325718170678 0.416896640814134301 0.418071947834685986 0.307354991375789255 0.667924369576233357 0.0645572417675578242 1.08080813229192496 0.353051662882838901
0.580252259871913978 0.626070509625966998 0.537503391458319713 0.47945571905554607 0.540634044436276051 0.565466312655102277 0.5375236610793378 0.336204078423532282 0.541224805744518811 0.481480141441438469 0.645345342922852394 0.597006852306095737 0.362763876068478097 0.544895812217087494 0.844603875086156197

This is only a snippet of what you should see; here only five solutions are shown (one solution per row). For an epsilon of 0.01 you should see at least 100 solutions.  The first 12 columns in the file represent the 12 decision variables, and the last 3 columns are the objectives.  In order to visualize the Pareto front, we want to select only the objectives.  To do so, type the following command:

awk 'BEGIN {FS=" "}; /^#/ {print $0}; /^[-]?[0-9]/ {printf("%s %s %s\n", $13,$14,$15)}' dtlz2.set > dtlz2.obj

The previous command is fully described in Joe’s post.  Once you run it you should see a new file called dtlz2.obj, with three columns, one for each of the objectives.
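
If awk is not available (for instance, on a Windows machine), the same selection can be done with a few lines of Python (a sketch assuming the 12-variable, 3-objective layout used in this example):

# Python equivalent of the awk one-liner above: keep the comment lines and
# write out columns 13-15 (the three objectives) of each solution row.
with open('dtlz2.set') as f_in, open('dtlz2.obj', 'w') as f_out:
    for line in f_in:
        if line.startswith('#'):
            f_out.write(line)
        else:
            columns = line.split()
            if columns:  # skip any blank lines
                f_out.write(' '.join(columns[12:15]) + '\n')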

Step 4. Visualize the Pareto front

The Pareto front of the 3-objective DTLZ2 problem is the first octant of a unit sphere centered at the origin, with all the decision variables equal to 0.5.  For the 3-objective case that we solved in this example, the Pareto approximate front generated by the Borg MOEA is depicted in the figure below:

[Figure: Pareto approximate front generated by the Borg MOEA for the 3-objective DTLZ2 instance]

With your results, attempt to generate a similar plot in the language of your choice.  Once you have visualized the Pareto Front, you will have completed this exercise!
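
If you happen to be working in Python, here is one possible starting point (a sketch assuming numpy and matplotlib are installed):

# One way to plot the 3-objective Pareto approximate front from dtlz2.obj.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

objs = np.loadtxt('dtlz2.obj', comments='#')
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(objs[:, 0], objs[:, 1], objs[:, 2])
ax.set_xlabel('Objective 1')
ax.set_ylabel('Objective 2')
ax.set_zlabel('Objective 3')
plt.show()

# Sanity check: true Pareto-optimal DTLZ2 solutions satisfy
# f1^2 + f2^2 + f3^2 = 1, so these sums should be close to 1.
print(np.square(objs).sum(axis=1))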

Click here to learn more about the DTLZ2 problem.

Compiling shared libraries on Windows (32 bit and 64 bit systems)

While playing with the Borg R wrapper, I ran into difficulties loading the shared borg.dll library into R, as a result of MinGW compiling for a 32-bit system while I was using the 64-bit version of R.  After running the following command from the Windows command line,

gcc -shared -O3 -o borg.dll borg.c mt19937ar.c -lm

I compiled a .dll file that worked fine with the Python wrapper, because I have a 32-bit version of Python and the 32-bit version of the R GUI, but it resulted in error messages when I tried to optimize a problem using Borg in RStudio.  I was able to resolve this issue by installing MinGW-w64 and selecting the x86_64 architecture option, as shown in the following screenshot.

[Screenshot: MinGW-w64 installer with the x86_64 architecture option selected]

Then, I added the following to my Windows path:

C:\Program Files\mingw-w64\x86_64-4.9.2-posix-seh-rt_v3-rev1\mingw64\bin

and ran the following command.  After this, everything ran smoothly with the Borg R wrapper.

x86_64-w64-mingw32-gcc -shared -O3 -o borg.dll borg.c mt19937ar.c -lm 

I found the following site useful for appropriate commands to compile using MinGW-w64.

http://fedoraproject.org/wiki/MinGW/Tutorial