Checkpointing and restoring runs with the Borg MOEA

Introduction

Simulation-optimization is an increasingly popular tool in many engineering fields. As the name implies, simulation-optimization requires two major components: a simulation model of the system we want to optimize (e.g., a water supply system) and an algorithm for iteratively tuning the decision levers within the system to optimize the simulated performance. For example, Multi-Objective Evolutionary Algorithms (MOEAs), such as the Borg MOEA, have been successfully employed across a variety of challenging stochastic multi-objective simulation-optimization problems in water resources and other fields. However, simulation-optimization can require millions of iterative simulation trials in order to adequately explore the decision and uncertainty spaces, especially under multi-objective framings. Given the ever-increasing size and complexity of simulation models, this can become a significant computational challenge.

Running the MOEA in parallel on high-performance computing clusters can greatly alleviate this challenge, but not eliminate it. For example, in my last blog post on hybrid parallelization of the Borg MOEA, I gave the example of optimizing infrastructure investments using a simulation model that takes 10 minutes to run a 30-year realization. Let’s assume we run 48 Monte Carlo realizations per function evaluation (FE) to account for natural hydrologic variability. Even using the maximum job size of 128 nodes on the Stampede2 supercomputer, with 48 CPU each, we will only be able to complete ~20,000 FE per 24 hours of wall clock. To complete 100,000 FE, a relatively low target compared to other similarly complex studies, the MOEA will need to run for ~5 days. This raises a major problem: many computing clusters have strict wall clock limits. On Stampede2, for example, each submission can run for a maximum of 48 hours. As a result, we would be limited to ~40,000 FE for a single Stampede2 run, which may be insufficient given the difficulty of the problem.

Fortunately, the Borg MOEA’s “checkpointing” and “restoring” functionalities allow us to circumvent this issue. Checkpointing is the process of saving “snapshots” of the MOEA’s internal state at regular intervals, and restoring is the act of reading a snapshot into memory at the beginning of a run and continuing the optimization from that starting point. This allows us to effectively split a 5-day optimization into three sequential submissions, in order to comply with a 2-day wall clock limit. This functionality can also be helpful for iteratively assessing convergence, since we don’t necessarily know a priori how many FE will need to be run.

In this blog post, I will demonstrate the use of checkpointing and restoring with the Borg MOEA using a less computationally-demanding example: the Python-based DPS Lake Problem. However, the methods and results described here should be applicable to much larger experiments such as the one described above. See my last blog post and references therein for more details on the Python implementation of the DPS Lake Problem.

Setup

For this post, I ran the following short proof of concept: first I ran five random seeds of the Borg MOEA for 5 minutes each. Each seed was assigned 2 nodes (16 CPU each) on the Cube cluster at Cornell. Each seed completed approximately 6100 FE in this first round. Next, I ran a second round in which each seed began from the 6100-FE checkpoint and ran for another 5 minutes. All code for this post, along with instructions for downloading the correct branch of the Borg MOEA, can be found in this GitHub repository.

Setting up checkpointing only takes a few extra lines of code in the file “borg_lake_msmp.py”. First, we need to define the baseline file name used to store the checkpoint files:

newCheckpointFileBase = 'results/checkpoints/seed' + str(seed) 

Within the simulation, checkpoints will be made at regular intervals. The checkpoint file names will automatically have the FE appended; for example, the checkpoint for seed 5 after 6100 FE is written to “results/checkpoints/seed5_nfe6100.checkpoint”. The write frequency for both checkpoints and runtime metrics can be changed based on the “runtimeFrequency” parameter in “submit_msmp.sh”.

We can also provide a previous checkpoint to read in at the beginning of the run:

if maxEvalsPrevious > 0:
    oldCheckpointFile = 'results/checkpoints/seed' + str(seed) + '_nfe' + str(maxEvalsPrevious) + '.checkpoint'

where “maxEvalsPrevious” is the number of FE for the prior checkpoint that we want to read in. This parameter is input in “submit_msmp.sh”. The previous checkpoint file must be placed in the “checkpoints” folder prior to runtime.

We provide both checkpoint file names to the Borg MOEA, along with other important parameters such as the maximum number of evaluations.

result = borg.solveMPI(maxEvaluations=maxEvals, runtime=runtimeFile, frequency=runtimeFreq, newCheckpointFileBase=newCheckpointFileBase, oldCheckpointFile=oldCheckpointFile, evaluationFile=evaluationFile)

The Python wrapper for the Borg MOEA will use this command to initialize and run the MOEA across all available nodes. For more details on the implementation of the experiment, see the GitHub repository above.
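
Putting these pieces together, the core checkpoint/restart logic reduces to just a few lines. Below is a condensed sketch (not a verbatim excerpt from “borg_lake_msmp.py”); it reuses the variable names introduced above (seed, maxEvals, maxEvalsPrevious, runtimeFile, runtimeFreq, evaluationFile) and only the solveMPI arguments already shown:

# Condensed sketch of the checkpoint/restart logic described above.
newCheckpointFileBase = 'results/checkpoints/seed' + str(seed)

kwargs = dict(maxEvaluations=maxEvals,
              runtime=runtimeFile,
              frequency=runtimeFreq,
              newCheckpointFileBase=newCheckpointFileBase,
              evaluationFile=evaluationFile)

# restore from a previous checkpoint only if one was requested
if maxEvalsPrevious > 0:
    kwargs['oldCheckpointFile'] = ('results/checkpoints/seed' + str(seed) +
                                   '_nfe' + str(maxEvalsPrevious) + '.checkpoint')

result = borg.solveMPI(**kwargs)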

The checkpoint file

So what information is saved in the checkpoint file? This file includes all internal state variables needed to initialize a new instance of the Borg MOEA that begins where the previous run left off. As an example, I will highlight select information from “seed5_nfe6100.checkpoint”, the last checkpoint file (at 6100 FE) for seed 5 after the first round.

First, the file lists basic problem formulation information such as the number of objectives, the decision variable bounds, and the epsilon dominance parameters:

Version: 108000
Number of Variables: 6
Number of Objectives: 4
Number of Constraints: 1
Lower Bounds: -2 0 0 -2 0 0
Upper Bounds: 2 2 1 2 2 1
Epsilons: 0.01 0.01 0.0001 0.0001
Number of Operators: 6

Next, it lists current values for several parameters which evolve over the course of the optimization, such as the selection probabilities for the six recombination operators, the number of times the algorithm has triggered a “restart” due to stalled progress, and the number of solutions in the population:

Operator Selection Probabilities: 0.93103448275862066 0.0086206896551724137 0.017241379310344827 0.025862068965517241 0.0086206896551724137 0.0086206896551724137
Number of Evaluations: 6100
Tournament Size: 3
Window Size: 200
...
Number of Restarts: 0
...
Population Size: 196
Population Capacity: 196

The file then lists each of the 196 solutions currently stored in the population:

Population:
 0.52276605276517329 1.1277550312248441 0.57174902357263913 -0.053873968914707165 1.5491763609435476 0.68045375364276195 -0.2810269062318374 0.18479149938317199 -1 -1 0 0
...
 0.27892075855585108 0.45130064043648516 0.77550656566943632 0.31799415686665755 1.0881848609123497 0.60273027430733062 -0.25193459430454329 0.16022123986158276 -0.98989898989898994 -1 0 0

Each line in this section represents a different solution; the numbers represent the 6 decision variables, 4 objectives, and 1 constraint for this problem, as well as the index of the operator used to create each solution.

Similarly, the file lists the current archive, which is the subset of 64 solutions from the larger population that are non-dominated:

Archive Size: 64
Archive:
 0.65953806401594717 1.3878689423124047 0.5978484500419472 0.15387554544946802 1.1931240566680981 0.62731909408989983 -0.32254774259834024 0.21682247784367689 -1 -1 0 0
...
 0.28214763078942073 0.44992540083980864 0.77495402511037126 0.1548097106329358 1.0881858398758053 0.6850068814635788 -0.2456560390454271 0.15760082516545762 -0.98989898989898994 -1 0 0

The checkpoint file then lists several parameters associated with the most recently used recombination operators:

Number of Improvements: 194
Recency List Size: 50
Recency List Position: 47
Recency List: 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

We see that after 6100 FE, the algorithm is almost exclusively using the operator with index 0; this is consistent with operator 0’s 93% selection probability listed earlier in the file.
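
Because the checkpoint is plain text, it is easy to sanity-check from Python. The short sketch below parses these two fields and prints them side by side; it assumes each entry in the recency list is the index of the operator used to produce a recent offspring, consistent with the interpretation above.

# Quick sanity check of a checkpoint file: compare the operator selection
# probabilities against how often each operator index appears in the recency list.
from collections import Counter

with open('results/checkpoints/seed5_nfe6100.checkpoint') as f:
    lines = [line.strip() for line in f]

probs, recency = [], []
for line in lines:
    if line.startswith('Operator Selection Probabilities:'):
        probs = [float(x) for x in line.split(':')[1].split()]
    elif line.startswith('Recency List:'):
        recency = [int(x) for x in line.split(':')[1].split()]

counts = Counter(recency)
for op, p in enumerate(probs):
    print(f'operator {op}: selection probability {p:.3f}, '
          f'{counts.get(op, 0)} of last {len(recency)} offspring')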

Finally, the checkpoint file closes with the current state of the random number generator:

RNG State Length: 624
RNG State: 2103772999 2815404811 3893540187 ...
RNG Index: 200
Next Gaussian: -0.13978937466755278
Have Next Gaussian: 0

Results

So does it work? Figure 1 shows the convergence of the five seeds according to six different measures of quality for the evolving archive. Performance for all seeds is found to be continuous across the two rounds, suggesting that the checkpointing and restoring is working properly. Additionally, Figure 2 demonstrates the continuity of the auto-adaptive selection probabilities for the Borg MOEA’s different recombination operators.

Figure 1: Performance of 5 random seeds across 2 rounds, quantified by six different measures of approximate Pareto set quality.
Figure 2: Adaptive selection probabilities for the Borg MOEA’s six recombination operators.

The checkpointing process thus allows for continuity of learning across rounds: not only learning about which solutions are most successful (i.e., continuity of the population and archive), but also learning about which strategies for generating new candidate solutions are most successful (i.e., continuity of the recombination operator histories and probabilities). This capability allows for seamlessly sequencing separate “blocks” of function evaluations in situations where wall clock limits constrain individual submissions, or where the convergence behavior of the problem is not well understood a priori.

To parallelize or not to parallelize? On performance of hybrid schemes in Python with the Borg MOEA

Hybrid parallelization is often used when performing noisy simulation-optimization on high-performance computing clusters. One common technique is to divide the available CPUs into a smaller number of “tasks”, each of which uses its subset of CPUs to assess the performance of candidate solutions through parallel evaluation of multiple scenarios. In this post, I will demonstrate how to use Python’s Multiprocessing module to set up a hybrid parallelization scheme with the Borg MOEA, and explore the resulting tradeoff between efficiency and learning rate in order to better understand when one should (and should not) implement such a scheme.

This post will bring together and build upon several previous posts on this blog, including (1) this demonstration by Andrew Dircks for how to use the Borg MOEA’s Python wrapper to optimize multi-objective simulations written in Python, using the Lake Problem as example; (2) this explanation of the Borg MOEA’s hybrid parallelization strategies by David Gold; and (3) this post by Bernardo Trindade on parallelizing Monte Carlo evaluations using Python and C/C++. The present post will give a brief overview of these topics in order to show how they can be brought together, but the reader is referred to the posts above for more detail on any of these topics.

Why hybrid parallelization?

As a motivating example, consider a complex water resources simulation model that takes 10 minutes to run a 30 year simulation. We want to explore the potential for new infrastructure investments to improve multiple objectives such as water supply reliability, cost, etc. We will try to discover high-performing solutions using an MOEA. To account for uncertainty, each candidate investment will be evaluated in 48 different stochastic scenarios.

Given the expense of our simulation model, this will be a major computational challenge. If we evaluate 20,000 candidate solutions (which is quite low compared to other studies), the total run time will be 20000 * 48 * 10 minutes = 6,667 days. Clearly, we cannot afford to run this serially on a laptop. If we have access to a computer cluster, we can parallelize the MOEA search and speed things up drastically. For example, with an XSEDE allocation on TACC Stampede2, I can run jobs with up to 128 Skylake nodes, each of which has 48 CPU, for a total of 6144 CPU per job. With this setup, we could finish the same experiment in ~1.1 days.

With the simplest possible parallelization strategy, we have a single CPU as the “Master” for the Borg MOEA, and 6143 CPU running individual function evaluations (FE). Each FE is allocated a single CPU, which runs 48 stochastic simulations in serial (i.e., there is no hybrid parallelization taking place). At the other extreme, I could set the MOEA to run with an entire node per “task”. This would allocate a full node to the Borg Master process, plus 127 nodes running individual FEs. Each FE would distribute its 48 stochastic simulations across its 48 CPUs in parallel. Thus, we are running fewer concurrent FEs (127 vs. 6143), but each finishes much faster (10 minutes vs. 480 minutes). In between these two extremes, there are intermediate parallelization schemes that allocate 2, 3, 4, 6, 8, 12, 16, or 24 CPU per task (Table 1).

So how to choose a parallelization strategy? The first thing to consider is efficiency. In theory, the scheme without hybrid parallelization (1 CPU per task) is the most efficient, since it allocates only one CPU to the Borg Master process, allowing for 6143 CPU to be simultaneously performing simulations. The 48 CPU/task scheme, on the other hand, allocates an entire node to the Borg Master (even though it only needs a single CPU to perform its duties), so that only 127*48=6096 CPU are running simulations at a given time. Thus, it should be ~1% slower in theory. However, in practice, a variety of factors such as communication or I/O bottlenecks can change this behavior (see below), so it is important to run tests on your particular machine.

If the hybrid parallelization scheme is slower, why should we even consider it? The key point is the increased potential for “learning” within the MOEA loop. Remember that the 1 CPU/task scheme has 6143 FE running concurrently, while the 48 CPU/task scheme has 127 FE. This means that the former will only have approximately 20000/6143=3.3 “generations,” while the latter will have 20000/127=157.5 generations*. Put another way, each FE will take 480 minutes (30% of total 1.1 days) to complete for the 1 CPU/task scheme, compared to 10 minutes (0.6% of total) for the 48 CPU/task scheme. The latter thus provides a more continuous stream of information to the MOEA Master, and therefore many more opportunities for the MOEA to learn what types of solutions are successful through iterative trial and error.

*Note: The Borg MOEA is a steady-state algorithm rather than a generational algorithm, so I use the term “generation” loosely to refer to the number of FE that each task will complete. See Dave Gold’s blog post on the Borg MOEA’s parallelization for more details on generational vs. steady-state algorithms. Note that for this post, I am using the master-worker variant of the Borg MOEA.

| Function evaluations (FE) | Total CPU | Tasks/node | CPU/task | Concurrent FE | Generations |
|---------------------------|-----------|------------|----------|---------------|-------------|
| 20000 | 6144 | 48 | 1  | 6143 | 3.3   |
| 20000 | 6144 | 24 | 2  | 3071 | 6.5   |
| 20000 | 6144 | 16 | 3  | 2047 | 9.8   |
| 20000 | 6144 | 12 | 4  | 1535 | 13.0  |
| 20000 | 6144 | 8  | 6  | 1023 | 19.6  |
| 20000 | 6144 | 6  | 8  | 767  | 26.1  |
| 20000 | 6144 | 4  | 12 | 511  | 39.1  |
| 20000 | 6144 | 3  | 16 | 383  | 52.2  |
| 20000 | 6144 | 2  | 24 | 255  | 78.4  |
| 20000 | 6144 | 1  | 48 | 127  | 157.5 |
Table 1: Scaling of concurrent function evaluations and generations for different hybrid parallelization schemes, using 128 Skylake nodes (48 CPU each) on Stampede2.
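
The values in Table 1 follow directly from the job geometry, so they are easy to reproduce. Here is a short sketch, assuming one task’s worth of CPU is reserved for the Borg Master:

# Reproduce Table 1: concurrent FE and "generations" for each hybrid scheme,
# assuming 128 Skylake nodes with 48 CPU each and 20,000 total FE.
nodes, cpu_per_node, n_fe = 128, 48, 20000
total_cpu = nodes * cpu_per_node  # 6144

print('CPU/task  Concurrent FE  Generations')
for cpu_per_task in [1, 2, 3, 4, 6, 8, 12, 16, 24, 48]:
    # one task (the Borg Master) does not perform function evaluations
    concurrent_fe = total_cpu // cpu_per_task - 1
    generations = n_fe / concurrent_fe
    print(f'{cpu_per_task:8d}  {concurrent_fe:13d}  {generations:11.1f}')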

A simple example: hybrid parallelization of the Lake Problem in Python

Above, I have outlined a purely theoretical tradeoff between efficiency and algorithmic learning. How does this tradeoff play out in practice? What type of hybrid scheme is “best”? I will now explore these questions in more detail using a smaller-scale example. The Lake Problem is a popular example for demonstrating concepts such as non-linear dynamics, deep uncertainty, and adaptive multi-objective control; a search for “Lake Problem” on the Water Programming Blog returns many results (I stopped counting at 30). In this problem, the decision-maker is tasked with designing a regulatory policy that balances the economic benefits of phosphorus with its environmental risks, given the large uncertainty in the complex dynamics that govern lake pollution concentrations over time. In a 2017 article in Environmental Modelling & Software, Julie Quinn demonstrated the merits of Evolutionary Multi-Objective Direct Policy Search (EMODPS) for developing control policies that adaptively update phosphorus releases based on evolving conditions (e.g., current lake concentration).

A Python version of the EMODPS Lake Problem was introduced along with a Python wrapper for the Borg MOEA in this blog post by Andrew Dircks. Additionally, this post by Bernardo Trindade demonstrates how to parallelize Monte Carlo ensembles in Python as well as C/C++. For the present example, I combined aspects of these two posts in order to parallelize the Monte Carlo evaluations within each function evaluation, using the Multiprocessing module in Python. For this blog post, I will highlight a few of the important differences relative to the non-parallel version of the problem in Andrew Dircks’ blog post, but the interested reader is referred to these previous two posts for more detail. All code used to produce these results can be found in this GitHub repository.

The lake_mp.py file specifies the analysis that the Borg MOEA should run for each function evaluation – we will evaluate each candidate operating policy using an ensemble of 48 simulations, each of which uses a different 100-year stochastic realization of hydrology. This ensemble will be divided among a smaller number of CPUs (say, 16) assigned to each function evaluation. Thus, we will have to set up our code a bit differently than in the serial version. First, consider the overall “problem” function which is called by the Borg Master. Within this function, we can set up shared arrays for each objective. The Array class from the multiprocessing module allows for thread-safe access to the same memory array from multiple CPUs. Note that this module requires shared memory (i.e., the CPUs must be on the same node). If you want to share memory across multiple nodes, you will need to use the mpi4py module, rather than multiprocessing.

from multiprocessing import Process, Array
# create shared memory arrays which each process can access/write.
discounted_benefit = Array('d', nSamples)
yrs_inertia_met = Array('i',  nSamples)
yrs_Pcrit_met = Array('i', nSamples)
average_annual_P = Array('d', nSamples*nYears)

Now we need to figure out which simulations should be performed by each CPU, and dispatch those simulations using the Process class from the multiprocessing module.

# assign MC runs to different processes
nbase = int(nSamples / nProcesses)
remainder = nSamples - nProcesses * nbase
start = 0
shared_processes = []
for proc in range(nProcesses):
    nTrials = nbase if proc >= remainder else nbase + 1
    stop = start + nTrials
    p = Process(target=dispatch_MC_to_procs, args=(proc, start, stop, discounted_benefit, yrs_inertia_met, yrs_Pcrit_met, average_annual_P, natFlow, b, q, critical_threshold, C, R, newW))

    shared_processes.append(p)
    start = stop

# start processes
for sp in shared_processes:
    sp.start()

# wait for all processes to finish
for sp in shared_processes:
    sp.join()

The “args” given to the Process function contains information needed by the individual CPUs, including the shared memory arrays. Once we have appended the processes for each CPU, we start them, and then wait for all processes to finish (“join”).

The “target” in the Process function tells each CPU which subset of Monte Carlo samples to evaluate, and runs the individual evaluations in a loop:

#Sub problem to dispatch MC samples from individual processes
def dispatch_MC_to_procs(proc, start, stop, discounted_benefit, yrs_inertia_met, yrs_Pcrit_met, average_annual_P, natFlow, b, q, critical_threshold, C, R, newW):
    lake_state = np.zeros([nYears + 1])
    Y = np.zeros([nYears])
    for s in range(start, stop):
        LakeProblem_singleMC(s, discounted_benefit, yrs_inertia_met, yrs_Pcrit_met, average_annual_P, lake_state, natFlow[s, :], Y, b, q, critical_threshold, C, R, newW)

Within the loop, we call LakeProblem_singleMC to run each Monte Carlo evaluation. After running the simulation, this function can store its sub-objectives in the shared arrays:

# fill in results for MC trial in shared memory arrays, based on index of this MC sample (s)
average_annual_P[(s*nYears):((s+1)*nYears)] = lake_state[1:]
discounted_benefit[s] = db
yrs_inertia_met[s] = yim
yrs_Pcrit_met[s] = yPm

Finally, back in the main “problem” function, we can evaluate the objectives by aggregating over the shared memory arrays:

# Calculate minimization objectives (defined in comments at beginning of file)
objs[0] = -1 * np.mean(discounted_benefit)  #average economic benefit
objs[1] = np.max(np.mean(np.reshape(average_annual_P, (nSamples, nYears)), axis=0))  #minimize the max average annual P concentration
objs[2] = -1 * np.sum(yrs_inertia_met) / ((nYears - 1) * nSamples)  #average percent of transitions meeting inertia threshold
objs[3] = -1 * np.sum(yrs_Pcrit_met) / (nYears * nSamples)  #average reliability
constrs[0] = max(0.0, reliability_threshold - (-1 * objs[3]))

time.sleep(slp)

In the last line, I add a half-second sleep after each function evaluation in order to help this example better mimic the behavior of the much slower motivating problem at the beginning of this post. Without the sleep, the Lake Problem is much faster than larger simulation codes, and therefore could exhibit additional I/O and/or master-worker interaction bottlenecks.

Again, the interested reader is referred to the GitHub repository, as well as Andrew Dircks’ original Lake Problem blog post, for more details on how these code snippets fit into the broader program.

This experiment is run on the Cube cluster at Cornell University, using 12 nodes of 16 CPU each. I tested 5 different hybrid parallelization schemes, from 1 to 16 CPU per task (Table 2). Each formulation is run for 5 seeds to account for the random dynamics of evolutionary algorithms. For this experiment, I will show results through 800 function evaluations. Although this is far fewer than standard practice, this short experiment allows us to mimic the behavior of the much larger experiment introduced at the beginning of this post (compare the number of Generations in Tables 1 & 2).

| Function evaluations (FE) | Total CPU | Tasks/node | CPU/task | Concurrent FE | Generations |
|---------------------------|-----------|------------|----------|---------------|-------------|
| 800 | 192 | 16 | 1  | 191 | 4.2  |
| 800 | 192 | 8  | 2  | 95  | 8.4  |
| 800 | 192 | 4  | 4  | 47  | 17.0 |
| 800 | 192 | 2  | 8  | 23  | 34.8 |
| 800 | 192 | 1  | 16 | 11  | 72.7 |
Table 2: Scaling of concurrent function evaluations and generations for different hybrid parallelization schemes, using 12 nodes (16 CPU each) on TheCube.

How do different parallelization schemes compare?

Figure 1 shows the performance of the 1 & 16 CPU/task schemes, in terms of hypervolume of the approximate Pareto set. The hybrid parallelization scheme (16 CPU/task) is found to outperform the serial scheme (1 CPU/task). In general across the five seeds, the parallelized scheme is found to improve more quickly and to reach a higher overall hypervolume after 800 FE. The intermediate parallelization schemes (2, 4, & 8 CPU/task) are found to perform similarly to the 16 CPU/task scheme (see the GitHub repository for figures).

Figure 1: Hypervolume vs. number of function evaluations for parallelization schemes with 1 and 16 CPU per function evaluation.

Figure 2 shows the number of completed function evaluations over time, which helps illuminate the source of this advantage. The serial formulation shows a staircase pattern: large chunks of time with no progress, followed by short bursts with many FE finishing all at once. Overall, the FE arrive in 5 major bursts (with the last smaller than the rest), at approximately 25, 50, 75, 100, and 125 seconds. As we move to 2, 4, 8, and 16 CPU/task, the number of “stairs” is seen to double each time, so that the 16 CPU/task scheme almost approximates a straight line. These patterns are consistent with the generational pattern shown in Table 2.

Figure 2: Runtime vs. number of function evaluations for parallelization schemes with 1, 2, 4, 8, and 16 CPU per function evaluation.

So why does this matter? Elitist evolutionary algorithms use the relative performance of past evaluations to influence the random generation of new candidate solutions. Because a hybrid parallelization strategy evaluates fewer solutions concurrently and has more “generations” of information, the algorithm has more opportunity to learn over the course of the optimization. For example, with the serial scheme, the first generation of 191 candidate solutions is randomly generated with no prior information. The 16 CPU/task scheme, on the other hand, only produces 11 solutions in the first generation, and has had ~17 generations’ worth of learning by the time it reaches 191 FE.

Additionally, the Borg MOEA has a variety of other auto-adaptive features that can improve the performance of the search process over the course of a single run (see Hadka & Reed, 2013). For example, Borg has six different recombination operators that can be used to mate and mutate previously evaluated solutions in order to generate new candidates for evaluation. The MOEA adaptively updates these selection probabilities in favor of the operators which have been most effective at producing successful offspring. The hybrid parallelization allows these probabilities to be adapted earlier and more regularly due to the more continuous accumulation of information (Figure 3). This can improve the performance of the MOEA.

Figure 3: Operator probabilities vs number of function evaluations (NFE) for parallelization schemes with 1 and 16 CPU per function evaluation. The subplots represent the auto-adaptive selection probabilities for the six recombination operators used by the Borg MOEA: differential evolution (DE), parent-centric crossover (PCX), simulated binary crossover (SBX), simplex crossover (SPX), uniform mutation (UM), and unimodal normal distribution crossover (UNDX).

What happens in longer runs?

At this point, it may seem that hybrid parallelization is always beneficial, since it can increase the algorithm’s adaptability (Figure 3) and improve performance (Figure 1) for a fairly negligible computational cost (Figure 2). However, the experiments in Tables 1 & 2 are very constrained in terms of the number of generations that are run. In most contexts, we will have more function evaluations and/or fewer nodes, so that the ratio of nodes to FE is much smaller and thus the number of generations is much larger. In these situations, do the advantages of hybrid parallelization hold up?

Figure 4 shows the results of a longer experiment, which is identical to the one above except for its use of 10,000 FE rather than 800 FE. Over the longer experiment, the advantage of the parallelized scheme (16 CPU/task) is found to diminish, with random seed variability becoming dominant. With more generations, the differences between the two schemes seem to fade.

Figure 4: Same as Figure 1, but for a longer experiment of 10,000 function evaluations.

Figure 5 shows the evolution of the recombination operator selection probabilities over the longer experiment. The serial scheme still lags in terms of adaptation speed, but by the end of the experiment, the two formulations have very similar behavior.

Figure 5: Same as Figure 3, but for a longer experiment of 10,000 function evaluations.

Lastly, Figure 6 shows the runtime as a function of FEs for the two formulations. The parallelized scheme takes approximately 15% longer than the serial scheme. We can compare this to the theoretical slowdown based on the Borg Master overhead: the 16 CPU/task scheme runs 11 concurrent FE, each using 16 CPU, for a total of 176 CPU, while the serial version runs 191 concurrent FE, each using 1 CPU. 191 is 9% larger than 176. The difference between 15% and 9% indicates additional bottlenecks, such as writing to the shared array. This overhead may vary based on the characteristics of the cluster and the computational experiment, so testing should be performed prior to running any large experiment.

Figure 6: Same as Figure 2, but for a longer experiment of 10,000 function evaluations.

Given all this information, what to do? Should you use hybrid parallelization for your function evaluations? The answer, unsatisfyingly, is “it depends.” The most important factor to consider is the number of generations for your optimization, calculated as the total number of FE divided by the number of concurrent FE. If this number is relatively high (say, >10-20), then you may be better off sticking with serial FEs in order to improve efficiency. If the number of generations is small, it may be worthwhile to introduce hybrid parallelization in order to stimulate adaptive learning within the MOEA. Where to draw the line between these two poles will likely vary based on a number of factors, including the number of nodes you are using, the computational architecture of the cluster, the runtime of each function evaluation, and the difficulty of the optimization problem, so testing is recommended to decide what to do in any given scenario.
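
As a rough rule of thumb, the deciding quantity is the implied number of generations. The small helper below (a hypothetical function, not part of any repository) simply encodes that heuristic; the cutoff of roughly 10-20 generations comes from the discussion above and is not a hard rule.

# Heuristic check: given a job geometry, estimate the number of "generations"
# and suggest whether hybrid parallelization of FEs is likely to pay off.
def suggest_scheme(total_fe, total_cpu, cpu_per_task, cutoff=15):
    concurrent_fe = max(1, total_cpu // cpu_per_task - 1)  # reserve one task for the Master
    generations = total_fe / concurrent_fe
    if generations > cutoff:
        advice = 'many generations: serial FEs (1 CPU/task) are likely more efficient'
    else:
        advice = 'few generations: hybrid parallelization may improve MOEA learning'
    return generations, advice

# the two experiments discussed in this post
print(suggest_scheme(20000, 6144, 1))   # Stampede2 example, serial FEs
print(suggest_scheme(800, 192, 16))     # TheCube example, 16 CPU/task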

MORDM Basics V: WaterPaths Tutorial

The MORDM framework poses four fundamental questions, each of which corresponds to a section within the framework shown in Figure 1:

  1. What actions can we take to address the problem?
  2. What worlds are we implementing those actions in?
  3. What are acceptable levels of performance for those actions in all worlds?
  4. What controls failures across those worlds?
Figure 1: The MORDM framework (Herman et al., 2013).

In the previous blog post, we used state-aware ROF triggers to implement drought mitigation and supply management actions for one hydroclimatic and demand scenario. In the first MORDM blog post, we generated multiple deeply-uncertain synthetic realizations of hydroclimatic and demand scenarios. The next logical question would be: how do these actions fare across all worlds and across time?

Setting up the system

To explore this question, we will expand the scope of our test case to include Cary’s two neighboring utilities – Durham and Raleigh – within the Research Triangle. The three utilities aim to form a cooperative water supply and management agreement in which they would like to optimize the following objectives, as stated in Trindade et al (2019):

  1. Maximize supply reliability, REL
  2. Minimize the frequency of implementing water use restrictions, RT
  3. Minimize the net present value of infrastructure investment, NPV
  4. Minimize the financial cost of drought mitigation and debt repayment, FC
  5. Minimize the worst first-percentile cost of the FC

In this example, each objective is a function of short-term risks of failure triggers (stROF) and long term risks of failure triggers (ltROF). The stROFs trigger short-term action, typically on a weekly or monthly basis. The ltROFs trigger action on a multi-year, sometimes decadal, timescale. Recall from a previous post that ROF triggers are state-aware, probabilistic decision rules that dictate how a system responds to risk. Here, we optimize for a Pareto-approximate set of ROF triggers (or risk thresholds) that will result in a range of performance objective tradeoffs. An example of a stROF is the restriction ROF trigger we explored in the post prior to this one.

In addition, an example of an ltROF would be the infrastructure construction ltROF. When this ltROF threshold is crossed, an infrastructure project is ‘built’. Potential infrastructure projects are ordered in a development pathway (Zeff et al 2016), and the ltROF triggers the next infrastructure option in the sequence. The term ‘pathway’ is used as these infrastructure sequences are not fixed, but are state-dependent and can be shuffled to allow the optimization process to discover multiple potential pathways.

Over the next two blog posts, we will explore the interaction between the water-use restriction stROF and the infrastructure ltROF, and how this affects system performance. For now, we will simulate the Research Triangle test case and optimize for the ‘best’ set of ROF triggers using WaterPaths and Borg MOEA.

Using WaterPaths and Borg MOEA

We will be using the WaterPaths utility planning and management tool (Trindade et al, 2020) to simulate the performance of the Research Triangle test case. For clarification, the default simulation within WaterPaths is that of the Research Triangle. The folder that you will be downloading from GitHub already contains all the input, uncertainty and decision variable files required. This tool will be paired with the Borg MOEA (Hadka and Reed, 2013) to optimize the performance objectives in each simulation to obtain a set of Pareto-optimal long- and short-term ROF triggers that result in a non-dominated set of tradeoffs. Jazmin has written a few posts that walk through compiling Borg and installing its different parallel and serial versions, which may be helpful to try out before attempting this exercise.

Once you have Borg downloaded and set up, begin by downloading the GitHub repository into a file location on your machine of choice:

git clone https://github.com/lbl59/WaterPaths.git

Once all the files are downloaded, compile WaterPaths:

make gcc

To optimize the WaterPaths simulation with Borg, first copy the Borg files into the main WaterPaths/borg folder:

mkdir WaterPaths/borg
cp -r Borg/* WaterPaths/borg/

These commands create a folder called borg within the WaterPaths folder and copy all the external Borg files into it.

Next, cd into the /borg folder and run make in your terminal. This should generate a file called libborgms.a. Make a folder called lib within the WaterPaths folder, and copy this file into the WaterPaths/lib folder:

cp libborgms.a ../lib/

Next, cd back into the main folder and use the Makefile to compile the WaterPaths executable with Borg:

make borg

Great! Now, notice that the /rof_tables_test_problem folder is empty. You will need to generate ROF tables within the WaterPaths environment. To do so, use the generate_rof_tables.sh script provided in the GitHub repository. The script should look like this:

#!/bin/bash
#SBATCH -n 16 -N 2 -p normal
#SBATCH --job-name=gen_rof_tables
#SBATCH --output=output/gen_rof_tables.out
#SBATCH --error=error/gen_rof_tables.err
#SBATCH --time=04:00:00
#SBATCH --mail-user=YOURUSERNAME@cornell.edu
#SBATCH --mail-type=all

export OMP_NUM_THREADS=16
cd $SLURM_SUBMIT_DIR
./waterpaths -T ${OMP_NUM_THREADS}\
    -t 2344\
    -r 1000\
    -d /YOURCURRENTDIRECTORY/\
    -C 1\
    -m 0\
    -s sample_solutions.csv\
    -O rof_tables_test_problem/\
    -e 0\
    -U TestFiles/rdm_utilities_test_problem_opt.csv\
    -W TestFiles/rdm_water_sources_test_problem_opt.csv\
    -P TestFiles/rdm_dmp_test_problem_opt.csv\
    -p false

Replace all the ‘YOUR…’ parameters with your system-specific details. Make two new folders: output/ and error/. Then run the script above by entering

sbatch ./generate_rof_tables.sh

into your terminal. This should take approximately 50 minutes. Once the ROF tables have been generated, run the optimize_sedento_valley.sh script provided. It should look like this:

#!/bin/bash
#SBATCH -n 16 -N 3 -p normal
#SBATCH --job-name=sedento_valley_optimization
#SBATCH --output=output/sedento_valley_optimization.out
#SBATCH --error=error/sedento_valley_optimization.err
#SBATCH --time=04:00:00
#SBATCH --mail-user=YOURUSERNAME@cornell.edu
#SBATCH --mail-type=all

DATA_DIR=/YOURDIRECTORYPATH/
N_REALIZATIONS=1000
export OMP_NUM_THREADS=16
cd $SLURM_SUBMIT_DIR

mpirun ./waterpaths -T ${OMP_NUM_THREADS}\
  -t 2344 -r ${N_REALIZATIONS} -d ${DATA_DIR}\
  -C -1 -O rof_tables_test_problem/ -e 3\
  -U TestFiles/rdm_utilities_test_problem_opt.csv\
  -W TestFiles/rdm_water_sources_test_problem_opt.csv\
  -P TestFiles/rdm_dmp_test_problem_opt.csv\
  -b true -o 200 -n 1000

As usual, replace all the ‘YOUR…’ parameters with your system-specific details. Run this script by entering

sbatch ./optimize_sedento_valley.sh

into the terminal. This script runs the Borg MOEA optimization for 1,000 function evaluations, and will output a .set file every 200 function evaluations. At the end of the run, you should have two files within your main folder:

  1. NC_output_MS_S3_N1000.set contains the Pareto-optimal set of decision variables and the performance objective values for the individual utilities and the overall region.
  2. NC_runtime_MS_S3_N1000.runtime contains information on the time it took for 1000 simulations of the optimization of the Research Triangle to complete.

The process should take approximately 1 hour and 40 minutes.

Summary

Congratulations, you are officially the Dr Strange of the Research Triangle! You have successfully downloaded WaterPaths and Borg MOEA, as well as run a simulation-optimization of the North Carolina Research Triangle test case across 1000 possible futures, in which you were Pareto-optimal in more than one. You obtained the .set files containing the Pareto-optimal decision variables and their respective performance objective values. Now that we have optimized the performance of the Research Triangle system, we are ready to examine the performance objectives and the Pareto-optimal ROF trigger values that result in this optimal set of tradeoffs.

In the next blog post, we will process the output of the .set files to visualize the objective space, decision variable space, and the tradeoff space. We will also conduct robustness analysis on the Pareto-optimal set to delve further into the question of “What are acceptable levels of performance for those actions in all worlds?”. Finally, we will explore the temporal interactions between the water use restrictions stROF and the infrastructure construction ltROF, and how supply and demand are both impacted by – and have an effect on – these decision variables.

References

Hadka, D., & Reed, P. (2013). Borg: An auto-adaptive Many-objective evolutionary computing framework. Evolutionary Computation, 21(2), 231–259. https://doi.org/10.1162/evco_a_00075

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772. https://doi.org/10.1016/j.envsoft.2020.104772

Trindade, B. C., Reed, P. M., & Characklis, G. W. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances in Water Resources, 134, 103442. https://doi.org/10.1016/j.advwatres.2019.103442

Zeff, H. B., Herman, J. D., Reed, P. M., & Characklis, G. W. (2016). Cooperative drought adaptation: Integrating infrastructure development, conservation, and water transfers into adaptive policy pathways. Water Resources Research, 52(9), 7327–7346. https://doi.org/10.1002/2016wr018771

Using the Python Borg Wrapper – Lake Problem Example

Python compatibility with the Borg MOEA is highly useful for practical development and optimization of Python models. The Borg algorithm is implemented in C, a strongly-typed, compiled language that offers high efficiency and scalability for complex optimization problems. Often, however, our models are not written in C/C++, but rather in simpler scripting languages like Python, MATLAB, and R. The Borg Python wrapper allows problems written in Python to be optimized by the underlying C algorithm, maintaining efficiency and ease of use. Use of the Python wrapper varies slightly across operating systems and computing architectures. This post will focus on Linux systems, like “The Cube”, our computing cluster here at Cornell. This post will work with the most current implementation of the Borg MOEA, which can be accessed with permission from this site.

The underlying communications between a Python problem and the Borg C source code are handled by the wrapper with ctypes, a library that provides Python compatibility with C types and functions in shared libraries. Shared libraries (conventionally .so files in Linux/Unix) provide dynamic linking, a systems tool that allows for compiled code to be linked and reused with different objects. For our purposes, we can think of the Borg shared library as a way to compile the C algorithm and reuse it with different Python optimization runs, without having to re-compile any code. The shared library gives the wrapper access to the underlying Borg functions needed for optimization, so we need to create this file first. In the directory with the Borg source code, use the following command to create the (serial) Borg shared library.

gcc -shared -fPIC -O3 -o libborg.so borg.c mt19937ar.c -lm

Next, we need to move our shared library into the directory containing the Python wrapper (borg.py) and whatever problem we are optimizing.
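
As a brief aside, this is roughly what ctypes does under the hood: it loads a shared library and exposes its compiled functions to Python. The toy example below (using a hypothetical libtoy.so, not the Borg library itself) only illustrates the mechanism; the actual loading and argument handling for Borg live inside borg.py.

# Conceptual illustration of ctypes dynamic loading (not the Borg API itself).
# Suppose a toy C library was compiled with:
#   gcc -shared -fPIC -o libtoy.so toy.c
# where toy.c contains:  double square(double x) { return x * x; }
from ctypes import CDLL, c_double

lib = CDLL('./libtoy.so')          # load the shared library from this directory
lib.square.restype = c_double      # declare the C return type
lib.square.argtypes = [c_double]   # declare the C argument types
print(lib.square(3.0))             # calls into compiled C code: prints 9.0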

In this post, we’ll be using the Lake Problem DPS formulation to demonstrate the wrapper. Here’s the source code for the problem:

"""
@author: Rohini

#DPS Formulation 

#Objectives:
#1) Maximize expected economic benefit
#2) Minimize worst case average P concentration 
#3) Maximize average inertia of P control policy
#4) Maximize average reliability 

#Constraints: 
#Reliability has to be >85%

#Decision Variables 
#vars: vector of size 3n 
#n is the number of radial basis functions needed to define the policy
#Each has a weight, center, and radius (w1, c1, r1, ..., wn, cn, rn)

#Time Horizon for Planning, T: 100 Years
#Simulations, N: 100 

"""

import numpy as np
from sys import *
from math import *
from scipy.optimize import root
import scipy.stats as ss

# Lake Parameters
b = 0.42
q = 2.0

# Natural Inflow Parameters
mu = 0.03
sigma = np.sqrt(10**-5)

# Economic Benefit Parameters
alpha = 0.4
delta = 0.98

# Set the number of RBFs (n), decision variables, objectives and constraints
n = 2
nvars = 3 * n
nobjs = 4
nYears = 100
nSamples = 100
nSeeds = 2
nconstrs = 1

# Set Thresholds
reliability_threshold = 0.85
inertia_threshold = -0.02


###### RBF Policy ######

#Define the RBF Policy
def RBFpolicy(lake_state, C, R, W):
    # Determine pollution emission decision, Y
    Y = 0
    for i in range(len(C)):
        if R[i] != 0:
            Y = Y + W[i] * ((np.absolute(lake_state - C[i]) / R[i])**3)

    Y = min(0.1, max(Y, 0.01))

    return Y


###### Main Lake Problem Model ######

def LakeProblemDPS(*vars):

    seed = 1234

    #Solve for the critical phosphorus level
    def pCrit(x):
        return [(x[0]**q) / (1 + x[0]**q) - b * x[0]]

    sol = root(pCrit, 0.5)
    critical_threshold = sol.x

    #Initialize arrays
    average_annual_P = np.zeros([nYears])
    discounted_benefit = np.zeros([nSamples])
    yrs_inertia_met = np.zeros([nSamples])
    yrs_Pcrit_met = np.zeros([nSamples])
    lake_state = np.zeros([nYears + 1])
    objs = [0.0] * nobjs
    constrs = [0.0] * nconstrs

    #Generate nSamples of nYears of natural phosphorus inflows
    natFlow = np.zeros([nSamples, nYears])
    for i in range(nSamples):
        np.random.seed(seed + i)
        natFlow[i, :] = np.random.lognormal(
            mean=log(mu**2 / np.sqrt(mu**2 + sigma**2)),
            sigma=np.sqrt(log((sigma**2 + mu**2) / mu**2)),
            size=nYears)

    # Determine centers, radii and weights of RBFs
    C = vars[0::3]
    R = vars[1::3]
    W = vars[2::3]
    newW = np.zeros(len(W))

    #Normalize weights to sum to 1
    total = sum(W)
    if total != 0.0:
        for i in range(len(W)):
            newW[i] = W[i] / total
    else:
        for i in range(len(W)):
            newW[i] = 1 / n

    #Run model simulation
    for s in range(nSamples):
        lake_state[0] = 0
        Y = np.zeros([nYears])

        #find policy-derived emission

        Y[0] = RBFpolicy(lake_state[0], C, R, newW)

        for i in range(nYears):
            lake_state[i + 1] = lake_state[i] * (1 - b) + (
                lake_state[i]**q) / (1 +
                                     (lake_state[i]**q)) + Y[i] + natFlow[s, i]
            average_annual_P[
                i] = average_annual_P[i] + lake_state[i + 1] / nSamples
            discounted_benefit[
                s] = discounted_benefit[s] + alpha * Y[i] * delta**i

            if i >= 1 and ((Y[i] - Y[i - 1]) > inertia_threshold):
                yrs_inertia_met[s] = yrs_inertia_met[s] + 1

            if lake_state[i + 1] < critical_threshold:
                yrs_Pcrit_met[s] = yrs_Pcrit_met[s] + 1

            if i < (nYears - 1):
                #find policy-derived emission
                Y[i + 1] = RBFpolicy(lake_state[i + 1], C, R, newW)

# Calculate minimization objectives (defined in comments at beginning of file)
    objs[0] = -1 * np.mean(discounted_benefit)  #average economic benefit
    objs[1] = np.max(
        average_annual_P)  #minimize the max average annual P concentration
    objs[2] = -1 * np.sum(yrs_inertia_met) / (
        (nYears - 1) * nSamples
    )  #average percent of transitions meeting inertia thershold
    objs[3] = -1 * np.sum(yrs_Pcrit_met) / (nYears * nSamples
                                            )  #average reliability

    constrs[0] = max(0.0, reliability_threshold - (-1 * objs[3]))

    return (objs, constrs)

The important function for this blog post is LakeProblemDPS, which demonstrates how to configure your own problem with the wrapper. Your function must take in *vars, the decision variables, and return objs, a list of objective values (or a tuple of objective values and constraints). Within the problem, refer to vars[i] as the i-th decision variable, for i in [0,nvars-1]. Set the list of objective values in the same manner.
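
For instance, a minimal two-objective toy problem in the format the wrapper expects might look like the sketch below. This is purely illustrative and not part of the Lake Problem; the only requirements are the *vars signature and the returned list of objectives (plus constraints, if the problem has any).

# A toy unconstrained problem in the format expected by the Borg Python wrapper:
# accept the decision variables as *vars and return a list of minimization objectives.
def toyProblem(*vars):
    x, y = vars[0], vars[1]
    objs = [0.0] * 2
    objs[0] = x**2 + y**2            # distance from the origin
    objs[1] = (x - 1)**2 + y**2      # distance from (1, 0)
    return objs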

Once our problem is defined and compatible with the wrapper, we can optimize with Borg. The following code runs the Lake problem optimization for 10,000 function evaluations.

# Serial Borg run with Python wrapper
# ensure libborg.so is compiled and in this directory
from borg import *
from lake import *

maxevals = 10000

# create an instance of the serial Borg MOEA
borg = Borg(nvars, nobjs, nconstrs, LakeProblemDPS)

# set the decision variable bounds and objective epsilons
borg.setBounds(*[[-2, 2], [0, 2], [0, 1]] * int((nvars / 3)))
borg.setEpsilons(0.01, 0.01, 0.0001, 0.0001)

# perform the optimization
# pass in a dictionary of arguments, as defined in borg.py
result = borg.solve({"maxEvaluations": maxevals})

# print the resulting objectives
for sol in result:
    print(sol.getObjectives())

Note the constructor Borg() creates an instance of the Borg algorithm with a specified number of variables, objectives, and constraints. The LakeProblemDPS argument is the objective function to be optimized by this instance of Borg. The setBounds and setEpsilons methods are required. solve() performs the optimization and takes in a dictionary of Borg parameters. See borg.py for a comprehensive list.
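
Since the result can be iterated just like in the print loop above, it is straightforward to save the approximate Pareto front for later analysis. Here is a small sketch using only the getObjectives() accessor shown above (the output filename is arbitrary):

# Write the approximate Pareto front (objective values only) to a CSV file.
with open('lake_objectives.csv', 'w') as f:
    f.write('neg_benefit,max_P,neg_inertia,neg_reliability\n')
    for sol in result:
        f.write(','.join(str(v) for v in sol.getObjectives()) + '\n')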

Using the Python wrapper to run the Parallel Borg MOEA

The previous example uses the serial Borg MOEA, but the wrapper also supports the master-worker and multi-master parallelizations. Configuring a parallel version of the algorithm requires a few additional steps. First, you must compile a shared library of the parallel implementation and move it to the wrapper directory.

For the master-worker version, use:

mpicc -shared -fPIC -O3 -o libborgms.so borgms.c mt19937ar.c -lm

For the multi-master version, use:

mpicc -shared -fPIC -O3 -o libborgmm.so borgmm.c mt19937ar.c -lm

To call the master-worker version, you must explicitly start up and shut down MPI using the Configuration class provided in borg.py. The following code performs a parallel master-worker optimization of the Lake problem:

# Master-worker Borg run with Python wrapper
# ensure libborgms.so is compiled and in this directory
from borg import *
from lake import *

# set max time in hours
maxtime = 0.1

# need to start up MPI first
Configuration.startMPI()

# create an instance of Borg with the Lake problem
borg = Borg(nvars, nobjs, nconstrs, LakeProblemDPS)

# set bounds and epsilons for the Lake problem
borg.setBounds(*[[-2, 2], [0, 2], [0, 1]] * int((nvars / 3)))
borg.setEpsilons(0.01, 0.01, 0.0001, 0.0001)

# perform the optimization
result = borg.solveMPI(maxTime=maxtime)

# shut down MPI
Configuration.stopMPI()

# only the master node returns a result
# print the objectives to output
if result:
    for solution in result:
        print(solution.getObjectives())

This script must be called as a parallel process – here’s a SLURM submission script that can be used to run the optimization on 16 processors (compatible with The Cube):

#!/bin/bash
#SBATCH -J py-wrapper
#SBATCH -o normal.out
#SBATCH -e normal.err
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 16

mpirun python3 mslake.py

sbatch submission.sbatch will allocate one node with 16 processors for the optimization run.

Troubleshooting

Depending on your machine’s MPI version and your shell’s LD_LIBRARY_PATH environment variable, the Borg wrapper may try to access an unavailable mpi shared library. This issue happens on our cluster, the Cube, and causes the following error:

OSError: libmpi.so.0: cannot open shared object file: No such file or directory

In borg.py, the startMPI method attempts to access the nonexistent libmpi.so.0 shared library. To fix this, find the location of your mpi files with:

echo $LD_LIBRARY_PATH

Likely, a directory containing your mpi library (e.g., /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib on the Cube) will print. (Note: if such a path does not exist, set the LD_LIBRARY_PATH environment variable to include your mpi library.) Navigate to this directory and view the file names. On the Cube, libmpi.so.0 (the library the Borg wrapper is trying to access) does not exist, but libmpi.so does (this is a software versioning discrepancy). Back in the startMPI method in borg.py, change the line

CDLL("libmpi.so.0", RTLD_GLOBAL)

to access the existing mpi library. On the cube:

CDLL("libmpi.so", RTLD_GLOBAL)
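
If you are unsure which MPI library name is available on your system, ctypes can also help locate it before you edit borg.py. Below is a quick diagnostic sketch; the reported name will vary by system and MPI installation.

# Diagnostic: ask ctypes which MPI shared library the dynamic loader can find.
from ctypes import CDLL, RTLD_GLOBAL
from ctypes.util import find_library

libname = find_library('mpi')   # e.g. 'libmpi.so' or 'libmpi.so.40', or None if not found
print('MPI library found by the loader:', libname)
if libname is not None:
    CDLL(libname, RTLD_GLOBAL)  # the same kind of call startMPI makes in borg.py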

On Parallelization of the Borg MOEA – Part 1: Parallel Implementations

This post will introduce basic concepts regarding the parallelization of the Borg Multiobjective Evolutionary Algorithm (Borg MOEA). In this post I’ll assume the reader is familiar with the basic architecture of the Borg MOEA, if you’re unfamiliar with the algorithm or would like a refresher, see Hadka and Reed, (2013a) before reading further.

Parallelization Basics

Before we go into parallelization of Borg, let’s quickly define some terminology. Modern High Performance Computing (HPC) resources usually comprise many multi-core processors, each consisting of two or more processing cores. For this post, I’ll refer to an individual processing core as a “processor”. Parallel computing refers to programs that utilize multiple processors to simultaneously perform operations that would otherwise be performed in serial on a single processor.

Parallel computing can be accomplished using either “distributed” or “shared” memory methods. Shared memory methods consist of parallelizing tasks across a group of processors that all read and write from the same memory space. In distributed memory parallelization, each processor maintains its own private memory and data is usually passed between processors using a message passing interface (MPI) library. Parallel Borg applications are coded using distributed memory parallelization, though it’s important to note that it’s possible to parallelize the simulation model that is coupled with Borg using shared memory parallelization. For additional reading on parallelization concepts see Bernardo’s post from April and Barney’s posts from 2017.

Hadka et al., (2012) showed that the quality of search results discovered by the Borg MOEA is strongly dependent on the number of function evaluations (NFE) performed in an optimization run. Efficient parallelization of search on HPC resources can allow not only for the search to be performed “faster” but also may allow more NFE to be run, potentially improving the quality of the final approximation of the Pareto front. Parallelization also offers opportunities to improve the search dynamics of the MOEA, improving the reliability and efficiency of multi-objective search (Hadka and Reed, 2015; Salazar et al., 2017).

Below I’ll discuss two parallel implementations of the Borg MOEA: a simple master-worker implementation that parallelizes function evaluations across multiple processors, and an advanced hybrid multi-population implementation that improves search dynamics and is scalable to petascale HPC resources.

Master-worker Borg

Figure 1: The Master Worker implementation of the Borg MOEA. (Figure from Salazar et al., 2017)

MOEA search is “embarrassingly parallel” since the evaluation of each candidate solution can be done independently of other solutions in a population (Cantu-Paz, 2000). The master-worker paradigm of MOEA parallelization, which has been in use since the early days of evolutionary algorithms (Grefenstette, 1981), utilizes this property to parallelize search. In the master-worker implementation of Borg on a system of P processors, one processor is designated as the “master” and P-1 processors are designated as “workers”. The master processor runs the serial version of the Borg MOEA, but instead of evaluating each candidate solution itself, it sends the decision variables to an available worker, which evaluates the problem with the given decision variables and sends the evaluated objectives and constraints back to the master processor.

Most MOEAs are generational, meaning that they evolve a population in distinct stages known as generations (Coello Coello et al., 2007). During one generation, the algorithm evolves the population to produce offspring, evaluates the offspring, and then adds them back into the population (methods for replacement of existing members vary according to the MOEA). When run in parallel, generational MOEAs must evaluate every solution in a generation before beginning evaluation of the next generation. Since these algorithms need to synchronize function evaluations within a generation, they are known as synchronous MOEAs. Figure 2, from Hadka et al., (2013b), shows a timeline of events for a typical synchronous MOEA. Algorithmic time (TA) represents the time it takes the master processor to perform the serial components of the MOEA. Function evaluation time (TF) is the time it takes to evaluate one offspring, and communication time (TC) is the time it takes to pass information to and from worker nodes. The vertical dotted lines in Figure 2 represent the start of each new generation. Note the periods of idle time that each worker node experiences while it waits for the algorithm to perform serial calculations and communicate. If the function evaluation time is not constant across all nodes, this idle time can increase as the algorithm waits for all solutions in the generation to be evaluated.

Figure 2: Diagram of a synchronous MOEA. Tc represents communication time, TA represents algorithmic time and TF represents function evaluation time. (Figure from Hadka et al., 2013b).

The Borg MOEA is not generational but rather a steady-state algorithm. As soon as an offspring is evaluated by a worker and returned to the master, the next offspring is immediately sent to the worker for evaluation. This is accomplished through use of a queue, for details of Borg’s queuing process see Hadka and Reed, (2015). Since Borg is not bound by the need to synchronize function evaluations within generations, it is known as an asynchronous MOEA. Figure 3, from Hadka et al., (2013b), shows a timeline of a events for a typical Borg run. Note that the idle time has been shifted from the workers to the master processor. When the algorithm is parallelized across a large number of cores, the decreased idle time for each worker has the potential to greatly increase the algorithm’s parallel efficiency. Furthermore, if function evaluation times are heterogeneous, the algorithm is not bottlenecked by slow evaluations as generational MOEAs are.

Figure 3: Diagram of an asynchronous MOEA. Tc represents communication time, TA represents algorithmic time and TF represents function evaluation time. (Figure from Hadka et al., 2013b).
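
To make the asynchronous pattern concrete, below is a bare-bones sketch of steady-state master-worker dispatch written with mpi4py. This is emphatically not the Borg MOEA’s implementation (which is in C and includes the full search machinery); it only illustrates how a master can hand a new candidate to whichever worker returns first, instead of waiting for a whole generation to finish. Run it with at least two MPI processes, e.g. mpirun -n 4 python async_demo.py.

# Minimal asynchronous (steady-state) master-worker dispatch pattern, illustrative only.
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
NFE = 100  # total number of evaluations to perform

def evaluate(x):
    # stand-in for an expensive simulation / function evaluation
    return sum(v**2 for v in x)

if rank == 0:  # master
    results, sent = [], 0
    # prime every worker with an initial candidate
    for worker in range(1, size):
        comm.send([random.random() for _ in range(3)], dest=worker)
        sent += 1
    status = MPI.Status()
    while len(results) < NFE:
        # as soon as ANY worker returns, record its result and immediately
        # give that worker a new candidate (no generational synchronization)
        results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
        if sent < NFE:
            comm.send([random.random() for _ in range(3)], dest=status.Get_source())
            sent += 1
        else:
            comm.send(None, dest=status.Get_source())  # shut the worker down
    print('master collected', len(results), 'evaluations')
else:  # worker
    while True:
        x = comm.recv(source=0)
        if x is None:
            break
        comm.send(evaluate(x), dest=0)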

While the master-worker implementation of the Borg MOEA is an efficient means of parallelizing function evaluations, the algorithm’s search dynamics remain the same as in the serial version, and as the number of processors increases, the search may suffer from communication bottlenecks. The multi-master implementation of the Borg MOEA uses parallelization to not only improve the efficiency of function evaluations but also improve the quality of the multi-objective search.

Multi-Master Borg

Figure 4: The multi-master implementation of the Borg MOEA. (Figure from Salazar et al., 2017)

In population genetics, the “island model” considers distinct populations that evolve independently but periodically interbreed via migration events over the course of the evolutionary process (Cantu-Paz, 2000). Two independent populations may evolve very different survival strategies based on the conditions of their environments (i.e., locally optimal strategies). Offspring of migration events that combine two distinct populations have the potential to combine the strengths developed by both groups. This concept has been utilized in the development of multi-population evolutionary algorithms (also called multi-deme algorithms in the literature) that evolve multiple populations in parallel and occasionally exchange individuals via migration events (Cantu-Paz, 2000). Since MOEAs are inherently stochastic algorithms that are influenced by their initial populations, evolving multiple populations in parallel has the potential to improve the efficiency, quality and reliability of search results (Hadka and Reed, 2015; Salazar et al., 2017). Hybrid parallelization schemes that utilize multiple instances of master-worker MOEAs may further improve the efficiency of multi-population search (Cantu-Paz, 2000). However, using a multi-population MOEA requires specifying parameters, such as the number of islands, the number of processors per island and the migration policy, whose ideal values are not apparent prior to the search.

The multi-master implementation of the Borg MOEA is a hybrid parallelization of the MOEA that seeks to generalize the algorithm’s ease of use and auto-adaptivity while maximizing its parallel efficiency on HPC architectures (Hadka and Reed, 2015). In the multi-master implementation, multiple instances of the master-worker MOEA are run in parallel, and an additional processor is designated as the “controller”. Each master maintains its own epsilon dominance archive and operator probabilities, but regularly updates the controller, which maintains a global epsilon dominance archive and global operator probabilities. If a master detects search stagnation that it cannot overcome via Borg’s automatic restarts, it requests guidance from the controller node, which seeds that master with the contents of the global epsilon dominance archive and the global operator probabilities. This guidance system ensures that migration events between populations only occur when a population is struggling, and that migrants consist only of globally non-dominated solutions. Borg’s adaptive population sizing ensures the injected population is resized appropriately given the global search state.

The use of multiple populations presents an opportunity to improve the sampling of the initial population. The serial and master-worker implementations of Borg generate the initial population by sampling decision variables uniformly at random from their bounds, which has the potential to introduce random bias into the initial search population (Hadka and Reed, 2015). In the multi-master implementation of Borg, the controller node first generates a Latin hypercube sample of the decision variables, then distributes these samples among the masters uniformly at random. This initial sampling strategy adds some overhead to the algorithm’s startup, but ensures that, globally, the algorithm starts with a well-distributed, diverse set of initial solutions, which can help avoid preconvergence (Hadka and Reed, 2015).
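As a small illustration of this initialization idea (this is not Borg’s internal code, just a sketch assuming NumPy and a recent SciPy are available), one can draw a single global Latin hypercube sample of the decision variables and then deal the rows out among the masters at random:

import numpy as np
from scipy.stats import qmc

n_masters = 4
pop_per_master = 100
n_vars = 10
lower = np.zeros(n_vars)  # decision variable lower bounds
upper = np.ones(n_vars)   # decision variable upper bounds

# One globally space-filling sample in [0, 1]^d, rescaled to the decision bounds
sampler = qmc.LatinHypercube(d=n_vars, seed=42)
population = qmc.scale(sampler.random(n=n_masters * pop_per_master), lower, upper)

# Shuffle the rows and split them evenly among the masters
rng = np.random.default_rng(42)
rng.shuffle(population, axis=0)
initial_populations = np.array_split(population, n_masters)

for m, pop in enumerate(initial_populations):
    print("Master %d starts with %d well-spread initial solutions" % (m, len(pop)))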

Conclusion

This post has reviewed two parallel implementations of the Borg MOEA. The next post in this series will discuss how to evaluate parallel performance of a MOEA in terms of search efficiency, quality and reliability. I’ll review recent literature comparing performance of master-worker and multi-master Borg and discuss how to determine which implementation is appropriate for a given problem.

References

Cantu-Paz, E. (2000). Efficient and accurate parallel genetic algorithms (Vol. 1). Springer Science & Business Media.

Hadka, D., Reed, P. M., & Simpson, T. W. (2012). Diagnostic assessment of the Borg MOEA for many-objective product family design problems. In 2012 IEEE Congress on Evolutionary Computation (pp. 1-10). IEEE.

Hadka, D., & Reed, P. (2013a). Borg: An auto-adaptive many-objective evolutionary computing framework. Evolutionary Computation, 21(2), 231-259.

Hadka, D., Madduri, K., & Reed, P. (2013b). Scalability analysis of the asynchronous, master-slave Borg multiobjective evolutionary algorithm. In 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (pp. 425-434). IEEE.

Hadka, D., & Reed, P. (2015). Large-scale parallelization of the Borg multiobjective evolutionary algorithm to enhance the management of complex environmental systems. Environmental Modelling & Software, 69, 353-369.

Salazar, J. Z., Reed, P. M., Quinn, J. D., Giuliani, M., & Castelletti, A. (2017). Balancing exploration, uncertainty and computational demands in many objective reservoir optimization. Advances in Water Resources, 109, 196-210.

Using Borg in Parallel and Serial with a Python Wrapper – Part 1

Simulation and optimization are frequently used to solve complex water resources and environmental systems problems. By itself, a simulation model begs the question “what to simulate?” Similarly, by itself, an optimization model begs the question “is the solution really best?” For this reason, simulation and optimization models are frequently coupled.

This blog post is part 1 of a multi-part series that will demonstrate how I have coupled a pure Python simulation model with the multi-objective evolutionary optimization algorithm Borg. In this post, I will show how you can access Borg’s serial and/or parallelized (master-slave) implementations through a Python wrapper (borg.py).

Please see this previous blog post for some background about Borg, and how to obtain it. My instructions below assume you have access to the Borg files.

In the setup I will describe below, Borg parameterizes and iteratively refines solutions (e.g., reservoir operating policies) to a problem, optimizing them in response to their simulated performance with respect to multiple objectives.

You will need the following Borg files (see link above for how to download these):

  • Serial (i.e., borg.c, libborg.so, etc.) and/or master-slave (i.e., borgms.c, libborgms.so, etc.) implementations of Borg, depending upon your ability to parallelize.
  • Python wrapper for Borg (borg.py), which will allow you to access Borg easily in Python.

You will need to create the following files yourself (I provide sample code below for these files):

  • example_sim_opt.py—A python module that should contain two main functions:
    1. A simulation caller, which takes decision variables and returns multi-objective performance. This function is called “Simulation_Caller” in the example_sim_opt.py code below.
    2. An optimization function, which calls the Borg MOEA through its python wrapper borg.py. This borg.py wrapper provides extensive docstring documentation regarding required arguments, returned values, etc., so I do suggest reading through the wrapper if you have questions (e.g., about the python data types of arguments and returns).

Note that the file and function names above are just example names. You can name the above files whatever you want. Just be sure to modify the code I provide below to reflect the new names.

A sample of code for example_sim_opt.py is as follows:

import numpy as np
import pysedsim # This is your simulation model
import platform  # helps identify directory locations on different types of OS

def Simulation_Caller(vars):
    '''
    Purpose: Borg calls this function to run the simulation model and return multi-objective performance.

    Note: You could also just put your simulation/function evaluation code here.

    Args:
        vars: A list of decision variable values from Borg

    Returns:
        performance: policy's simulated objective values. A list of objective values, one value for each objective.
    '''

    borg_vars = vars  # Decision variable values from Borg

    # Reformat decision variable values as necessary (e.g., cast Borg output parameters as an array for use in the simulation)
    op_policy_params = np.asarray(borg_vars)
    # Call/run simulation model, return multi-objective performance:
    performance = pysedsim.PySedSim(decision_vars = op_policy_params)
    return performance

def Optimization():

    '''

    Purpose: Call this method from command line to initiate simulation-optimization experiment

    Returns:
        --pareto approximate set file (.set) for each random seed
        --Borg runtime file (.runtime) for each random seed

    '''

    import borg as bg  # Import borg wrapper

    parallel = 1  # 1= master-slave (parallel), 0=serial

    # The following are just examples of relevant MOEA specifications. Select your own values.
    nSeeds = 25  # Number of random seeds (Borg MOEA)
    num_dec_vars = 10  # Number of decision variables
    n_objs = 6  # Number of objectives
    n_constrs = 0  # Number of constraints
    num_func_evals = 30000  # Number of total simulations to run per random seed. Each simulation may be a monte carlo.
    runtime_freq = 1000  # Interval at which to print runtime details for each random seed
    decision_var_range = [[0, 1], [4, 6], [-1,4], [1,2], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1]]
    epsilon_list = [50000, 1000, 0.025, 10, 13, 4]  # Borg epsilon values for each objective

    # Where to save seed and runtime files
    main_output_file_dir = r'E:\output_directory'  # Specify location of output files for different seeds (raw string avoids backslash escape issues)
    os_fold = Op_Sys_Folder_Operator()  # Folder operator for operating system
    output_location = main_output_file_dir + os_fold + 'sets'

    # If using master-slave, start MPI. Only do once.
    if parallel == 1:
        bg.Configuration.startMPI()  # start parallelization with MPI

    # Loop through seeds, calling borg.solve (serial) or borg.solveMPI (parallel) each time
    for j in range(nSeeds):
        # Instantiate borg class, then set bounds, epsilon values, and file output locations
        borg = bg.Borg(num_dec_vars, n_objs, n_constrs, Simulation_Caller)
        borg.setBounds(*decision_var_range)  # Set decision variable bounds
        borg.setEpsilons(*epsilon_list)  # Set epsilon values
        # Runtime file path for each seed:
        runtime_filename = main_output_file_dir + os_fold + 'runtime_file_seed_' + str(j+1) + '.runtime'
        if parallel == 1:
            # Run parallel Borg
            result = borg.solveMPI(maxEvaluations=num_func_evals, runtime=runtime_filename, frequency=runtime_freq)

        if parallel == 0:
            # Run serial Borg
            result = borg.solve({"maxEvaluations": num_func_evals, "runtimeformat": 'borg', "frequency": runtime_freq,
                                 "runtimefile": runtime_filename})

        if result:
            # This particular seed is now finished running. When running master-slave Borg, the
            # result will only be returned on one node.
            result.display()

            # Create/write objective values and decision variable values to files in folder "sets", 1 file per seed.
            f = open(output_location + os_fold + 'Borg_DPS_PySedSim' + str(j+1) + '.set', 'w')
            f.write('#Borg Optimization Results\n')
            f.write('#First ' + str(num_dec_vars) + ' are the decision variables, ' + 'last ' + str(n_objs) +
                    ' are the ' + 'objective values\n')
            for solution in result:
                line = ''
                for i in range(len(solution.getVariables())):
                    line = line + (str(solution.getVariables()[i])) + ' '

                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f.write(line[0:-1]+'\n')
            f.write("#")
            f.close()

            # Create/write only objective values to files in folder "sets", 1 file per seed. Purpose is so that
            # the file can be processed in MOEAFramework, where performance metrics may be evaluated across seeds.
            f2 = open(output_location + os_fold + 'Borg_DPS_PySedSim_no_vars' + str(j+1) + '.set', 'w')
            for solution in result:
                line = ''
                for i in range(len(solution.getObjectives())):
                    line = line + (str(solution.getObjectives()[i])) + ' '

                f2.write(line[0:-1]+'\n')
            f2.write("#")
            f2.close()

            print("Seed %s complete" % j)

    if parallel == 1:
        bg.Configuration.stopMPI()  # stop parallel function evaluation process

def Op_Sys_Folder_Operator():
    '''
    Function to determine whether operating system is (1) Windows, or (2) Linux

    Returns folder operator for use in specifying directories (file locations) for reading/writing data pre- and
    post-simulation.
    '''

    if platform.system() == 'Windows':
        os_fold_op = '\\'
    elif platform.system() == 'Linux':
        os_fold_op = '/'
    else:
        os_fold_op = '/'  # Assume unix OS if it can't be identified

    return os_fold_op
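One optional sanity check I like (my own suggestion, not part of the original workflow): before requesting cluster time, call the simulation wrapper once locally with random decision variables drawn from the same bounds you plan to hand to Borg. This assumes your simulation model (pysedsim in this example) is importable on the machine where you run it.

import random
import example_sim_opt  # the module defined above

# Same example bounds as in Optimization(); adjust to your own problem
decision_var_range = [[0, 1], [4, 6], [-1, 4], [1, 2], [0, 1],
                      [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]
test_vars = [random.uniform(lo, hi) for lo, hi in decision_var_range]

performance = example_sim_opt.Simulation_Caller(test_vars)
print("Test decision variables:", test_vars)
print("Simulated objective values:", performance)
# You should get one value per objective (6 in the example above); if the number of objectives
# returned does not match n_objs and epsilon_list, Borg will not behave as intended.

As an aside, Python’s built-in os.sep and os.path.join provide the same cross-platform path handling as the Op_Sys_Folder_Operator helper, if you prefer to avoid the custom function.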
 

The following is an example of how you would submit a batch script on a Linux cluster to run a parallelized simulation-optimization experiment using the example_sim_opt.py and borg.py files. Note that in the parallelized version, simulations (i.e., “function evaluations”) are being run in parallel by separate processors.

You would need the following two files:

  1. example_sim_opt_caller.py. This is a Python file that is used to call example_sim_opt.py.
  2. example_sim_opt_batch_script.pbs. This is a batch script that runs example_sim_opt_caller.py in parallel on a cluster using Open MPI.

Example code for example_sim_opt_caller.py:


'''
Purpose: To initiate the optimization process, which will iteratively call the simulation model.
'''

import example_sim_opt  # Import main optimization module that uses borg python wrapper

# Module within example
example_sim_opt.Optimization()

Example code for example_sim_opt_batch_script.pbs:


#PBS -l nodes=8:ppn=16
#PBS -l walltime=24:00:00
#PBS -j oe
#PBS -o pysedsim_output.out

cd $PBS_O_WORKDIR
source /etc/profile.d/modules.sh
module load openmpi-1.6.5-intel-x86_64/gnu
module load python-2.7.5
mpirun python example_sim_opt_caller.py

You could then run the above batch script with a command such as:

qsub example_sim_opt_batch_script.pbs

Simple Python script to create a command to call the Borg executable

Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within ‘jrkasprzyk’ GitHub.

It saves you from having to type out a long Borg command when using the ‘command line interface’, especially when there are many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon value of an objective, and Decision contains the name, lower bound, and upper bound of a decision variable.

You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python such as sampling of the decision space, and different sorts of processing of MOEA output.
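To convey the idea, here is an illustrative sketch only; the real classes live in the ‘misc_scripts’ repository and their names and fields may differ from what I show here. The sketch assembles a Borg command-line call from lists of Decision and Objective objects, using the same -v, -o, -n, -l, -u, and -e flags discussed elsewhere on this blog.

# Illustrative sketch; see the 'misc_scripts' repository for the actual implementation.
class Objective(object):
    def __init__(self, name, epsilon):
        self.name = name
        self.epsilon = epsilon

class Decision(object):
    def __init__(self, name, lower, upper):
        self.name = name
        self.lower = lower
        self.upper = upper

def borg_command(decisions, objectives, nfe, model_command):
    # Assemble a Borg command-line call from the decision and objective definitions
    return ("./borg.exe -v %d -o %d -n %d -l %s -u %s -e %s -- %s" % (
        len(decisions), len(objectives), nfe,
        ",".join(str(d.lower) for d in decisions),
        ",".join(str(d.upper) for d in decisions),
        ",".join(str(o.epsilon) for o in objectives),
        model_command))

decisions = [Decision("x%d" % i, 0.0, 1.0) for i in range(12)]
objectives = [Objective("f%d" % i, 0.01) for i in range(3)]
print(borg_command(decisions, objectives, 10000, "python dtlz2.py"))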

Basic Borg MOEA use for the truly newbies (Part 2/2)

This post will walk you through a simple example using the Borg MOEA to solve the 3-objective instance of the DTLZ2 problem. I suggest reading Part 1 of this post first. Once you have compiled the Borg MOEA, you are ready to follow this example.

The DTLZ2 problem, named after its creators Deb, Thiele, Laumanns and Zitzler, is a very popular function for testing MOEA performance, since its Pareto-optimal solutions are analytically known and it is easily scalable to any number of objectives.

Step 1. Set the DTLZ2, 3 objective instance

In your borg folder, open the dtlz2.py file. You should see the following script:


from sys import *
from math import *

nvars = 11
nobjs = 2
k = nvars - nobjs + 1

while True:
    # Read the next line from standard input
    line = raw_input()

    # Stop if the Borg MOEA is finished
    if line == "":
        break

    # Parse the decision variables from the input
    vars = map(float, line.split())

    # Evaluate the DTLZ2 problem
    g = 0

    for i in range(nvars-k, nvars):
        g = g + (vars[i] - 0.5)**2

    objs = [1.0 + g]*nobjs

    for i in range(nobjs):
        for j in range(nobjs-i-1):
            objs[i] = objs[i] * cos(0.5 * pi * vars[j])
        if i != 0:
            objs[i] = objs[i] * sin(0.5 * pi * vars[nobjs-i-1])

    # Print objectives to standard output, flush to write immediately
    print " ".join(["%0.17f" % obj for obj in objs])
    stdout.flush()

You can easily scale the number of objectives in the DTLZ2 function by modifying lines 4 and 5 of the code. Since we want to solve the 3-objective configuration of the problem, we set nobjs equal to 3. In general, the number of variables = the number of objectives + 9; in this case nvars = 12. Alternatively, if we wanted to work with the 5-objective instance, we would set nobjs = 5 and nvars = 14.


from sys import *
from math import *

nvars = 12
nobjs = 3
k = nvars - nobjs + 1

Once you have updated the file, save it and go back to your terminal.

Step 2. Run the DTLZ2 3 objective instance from your terminal

From your terminal  make sure you are in the borg folder and type the following command:

./borg.exe -v 12 -o 3 -e 0.01,0.01,0.01 -l 0,0,0,0,0,0,0,0,0,0,0,0 -u 1,1,1,1,1,1,1,1,1,1,1,1 -n 10000 python dtlz2.py > dtlz2.set

The -v and -o flags, as you probably guessed, are the number of decision variables and objectives, respectively. The -e flag specifies the epsilon precision values; here the same epsilon value is used for all three objectives for simplicity, but remember that they don’t necessarily need to be the same across objectives. The -l and -u flags are the decision variables’ lower and upper bounds, which can vary according to the problem. The -n flag is the number of function evaluations; here we specify 10,000, and you can certainly increase that number. The last part of the command calls the dtlz2.py function, and finally the optimization results are redirected to a separate file using the > (“greater than”) symbol. Once you have run the previous command, you should see a new file in your borg folder called dtlz2.set. Before we go on, type the following command in your terminal and see what happens:

./borg.exe -h

The -h flag provides information about the different options available.

Step 3. Select the objectives

Now, open your newly generated dtlz2.set file. You will see something like this:

# BORG version: 1.8
# Current time: Thu Jun 25 17:12:00 2015
# Problem: python dtlz2.py
# Number of variables: 12
# Number of objectives: 3
# Number of constraints: 0
# Lower bounds: 0 0 0 0 0 0 0 0 0 0 0 0
# Upper bounds: 1 1 1 1 1 1 1 1 1 1 1 1
# Epsilons: 0.01 0.01 0.01
# Directions: 0 0 0
# Constraint mode: E
# Seed: (time)
1 0.814090298840556947 0.409932057219915436 0.453114658489247812 0.602037485943714312 0.612211413314472264 0.63507815287255176 0.284561154879054812 0.319971228064652446 0.510268965514402262 0.565795181158015859 0.352339424454035932 2.00000000000000014e-17 7.00000000000000035e-17 1.1566219845629353
0.200664515777928404 0.658663937901916241 0.459011341616803015 0.456423615588035791 0.72948415840455394 0.502347551242693147 0.55253340411468177 0.523000325718170678 0.424118159653203652 0.418071947834685986 0.307354991375789255 0.667924369576233357 0.55237113861310283 0.929550654944727661 0.352579197090774399
0.600637603847835599 0.635510585462702893 0.537202278296969316 0.480501397259140872 0.538845435998899225 0.565887992485444302 0.540269661215032948 0.32823362137901485 0.548470138166351484 0.485323707827721162 0.661267336189765076 0.603296309515394924 0.342802362666870974 0.531842633344500104 0.872739730448630957
0.200664515777928404 0.962019489314716703 0.478360997918643505 0.456423615588035791 0.72948415840455394 0.508917943579420218 0.434628101329525951 0.523000325718170678 0.416896640814134301 0.418071947834685986 0.307354991375789255 0.667924369576233357 0.0645572417675578242 1.08080813229192496 0.353051662882838901
0.580252259871913978 0.626070509625966998 0.537503391458319713 0.47945571905554607 0.540634044436276051 0.565466312655102277 0.5375236610793378 0.336204078423532282 0.541224805744518811 0.481480141441438469 0.645345342922852394 0.597006852306095737 0.362763876068478097 0.544895812217087494 0.844603875086156197

This is only a snippet of what you should see; here only five solutions are shown (one solution per row). For an epsilon of 0.01 you should see at least 100 solutions. The first 12 columns in the file are the 12 decision variables and the last 3 columns are the objectives. In order to visualize the Pareto front, we want to select only the objectives. To do so, type the following command:

awk 'BEGIN {FS=" "}; /^#/ {print $0}; /^[-]?[0-9]/ {printf("%s %s %s\n", $13,$14,$15)}' dtlz2.set > dtlz2.obj

The previous command is fully described in Joe’s post.  Once you run it you should see a new file called dtlz2.obj, with three columns, one for each of the objectives.
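If you prefer to stay in Python rather than awk, a roughly equivalent snippet is shown below; it keeps the commented header lines and writes only the last three columns (the objectives) of each solution row.

# Roughly equivalent to the awk one-liner above
with open("dtlz2.set") as fin, open("dtlz2.obj", "w") as fout:
    for line in fin:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fout.write(line + "\n")  # keep header/footer comment lines
        else:
            cols = line.split()
            fout.write(" ".join(cols[-3:]) + "\n")  # last three columns are the objectives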

Step 4. Visualize the Pareto front

The Pareto front of the DTLZ2 problem is the portion of a unit sphere centered at the origin lying in the first octant, obtained when the last k decision variables (those contributing to the g function) are equal to 0.5. For the 3-objective case that we solved in this example, the Pareto approximate front generated by the Borg MOEA is depicted in the figure below:

[Figure: Pareto approximate front for the 3-objective DTLZ2 instance generated by the Borg MOEA]

With your results, attempt to generate a similar plot in the language of your choice. Once you have visualized the Pareto front, you will have completed this exercise!
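If Python is your language of choice, a minimal matplotlib sketch like the one below will produce a 3-D scatter of the approximation set. It assumes dtlz2.obj contains only the three objective columns produced in Step 3 and that NumPy and matplotlib are installed.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # needed for the 3-D projection on older matplotlib versions

objs = np.loadtxt("dtlz2.obj", comments="#")  # skip the commented header/footer lines

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(objs[:, 0], objs[:, 1], objs[:, 2], s=10)
ax.set_xlabel("Objective 1")
ax.set_ylabel("Objective 2")
ax.set_zlabel("Objective 3")
plt.show()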

Click here to learn more about the DTLZ2 problem.



Compiling, running, and linking a simulation model to Borg: LRGV Example

We have put together some training on using the Borg MOEA, and it is available here: Part I, Part II. This training goes over how to link one of the external C++ problems to the algorithm, and the focus is on using the algorithm itself, not on how to get a complicated model up and running. So I thought I would informally write some notes on how to get a model like the Lower Rio Grande Valley (LRGV) simulation running with Borg. If this post gets more developed, we may eventually merge it into the documentation. But let’s see how it goes!

This post is a work in progress! I am posting it incrementally in order to help the readers, but I will expand on it in the next few days. Last update: September 18, 2014

For this exercise you’ll need:
1. The LRGV source code. Get it from GitHub, or perhaps you already have a copy of it sitting around.
2. A copy of the Borg MOEA. Request it from Borgmoea.org, or perhaps you already have a copy of it sitting around.
3. Some way of compiling C/C++ code and running it. This can be either:
3a. A computer running Linux
3b. A computer running Cygwin, where you do all the computations on your local computer.
3c. Access to a computing cluster. This is probably the preferred method, because you can run longer jobs, and run the jobs simultaneously. Refer to our intro to using a cluster. Check out our video on submitting jobs to the cluster. Etc. In order to use the cluster you need:
3c.1 Ability to transfer files to the cluster. On all platforms (Mac, PC, Linux) you can use the web-based Globus Connect. Or on Windows you can use a program like WinSCP.
3c.2 Ability to perform interactive commands on the cluster. On Mac or Linux simply open your terminal software! On Windows, there are a few options. PuTTY is a simple one. I like to use Cygwin, because it mimics a Linux environment on your PC and you can do things like compile code and run Python, all as if you were running a Linux box even if you’re not.

So I will be running this example on a Windows PC with WinSCP and Cygwin. I already have a copy of Borg and the LRGV model, but nothing is on my cluster yet.

Connect to the cluster and transfer your files

Some of these instructions are going to be specific to your particular cluster. Since I’m in Colorado I’ll be writing specifically about the University of Colorado system, which I also discuss in this post over on our sister blog and a more recent post on this blog by my colleague Ben Livneh. But in general this should mostly apply to your own system too.

Start WinSCP and make sure you have selected the proper system you want to connect to. At Colorado, we have an Authenticator that will connect us to the system based on a one-time code that is refreshed every second. When WinSCP connects to the system, you’ll see your local files on the left, and the remote files on the right. Drag the folders that you want to copy to the system over to the right hand side. You should have Borg in one folder, and the LRGV code in the other.

In essence, you were given the source code for both Borg and LRGV. These are the fundamental instructions that tell the computer how to run the governing equations of the simulation and the optimization algorithm. For most code designs, though, you shouldn’t need to change the source code to change some parameters of how the research is carried out. For example, you can change the population size of Borg without changing the source code. In the LRGV simulation, you can change the volume of permanent rights without changing the source code, too. Now there are different ways to arrange your files when you are doing research, but here’s one option…

Keep the source code for each program in its own separate folder. Then, when you compile, you can move the executable and project data (for the LRGV this is demand.txt, hydro.txt, etc.) into another folder. There are several reasons for this:

  • The source code may have dozens of files. If everything is in one folder, source code is going to be alongside project files, some of which may be extraneous outputs that you often want to delete. If you delete these outputs, you may inadvertently delete important source code. On Unix, there is no ‘undelete’ so be careful!
  • It allows you to maintain multiple copies of the source code if you change something. Granted, source control using GitHub helps you do the same thing. But maybe you made a major change and you want to see what the ‘October 1’ versus the ‘November 1’ version of the code looks like. Keeping separate folders helps you see this.
  • Because you don’t have to change source code that often, it allows you to have a ‘canonical’ version of the code that is used in the study. For example, the folder might be called ‘LRGV_2013-10-08’ because I know the last time I changed the source code was back on 10/8. Then, I can move into my ‘project’ folders and start changing other parameters, variables, etc. that do not require re-compilation but allow me to explore what I’d like to explore.

The last task in this session is to connect to the cluster using your ‘terminal’ program.
On PCs with PuTTY, type your hostname in the opening screen. Then, the program will ask you what you want to login ‘as’, so type your username.
Use a ‘terminal’. On Mac or Linux, simply open up the program called ‘Terminal’. On PCs with Cygwin, find the ‘Cygwin Terminal’. Then, for any of these terminal systems, you’ll want to use ssh to connect. The command is something like, ssh [hostname] but consult your own documentation for the specific command.  If you are rusty on your cluster commands please consult this post.

Congratulations! You’re connected to the cluster and are able to both transfer files and write commands. Time to…

Compile Borg

Source code by itself will not do anything, because it is a set of instructions written in programming language for your computer. But a computer doesn’t understand programming language — the program needs to be in executable form that the computer can actually use. Compilation is the process of converting source code into executables.

On Linux, the ‘makefile’ is a set of commands that does the compiling for you! It’s really easy. For a lot of programs, typing the command ‘make’ could be enough to compile all the files you need. However, sometimes you need to compile different sets of files, and thats why makefiles often have ‘targets’ that tell the computer the exact groups of files that the compiler needs, and what dependencies exist between the different files.  More on makefiles here.

To begin compiling borg, cd into the Borg directory. You’ll notice there is the makefile, and all the source, in this directory. Use more makefile to examine what the makefile looks like. I see several targets in the makefile: all and clean. Clean is designed to delete all the outputs of your compilation. It’s helpful between compilations to make sure that all the code is ‘fresh’ when you use it. The other options are specific to Borg, but the one that I want is all, which is the default. Therefore, to compile borg simply type make. You should see commands that come up on the screen to indicate that the compilation is proceeding successfully! Note, if you need to compile again, or you’ve changed something, ‘make clean’ will clean up the intermediate files.

To take a look at the files in your folder, use the ls command. If you were successful, you should see some files that are highlighted in a different color on your screen, for example we have ‘borg.exe’, ‘dtlz1.exe’, and so forth. The program ‘test.exe’ looks intriguing. To run it, simply type ./test.exe on the command line. The output seems like it worked.

We’ll need to pull out those executable files in a minute, but for the time being leave them in the folder.

Compile the LRGV simulation model

Change into the LRGV directory, and you’ll see two folders. ‘lrgv_bin’ contains the executable, makefile, project data, etc. ‘lrgv_src’ contains the source code. Change into the ‘lrgv_bin’ folder. Notice that here, the makefile has a funky name. It is called ‘MakefileSerial’. The make command for a nonstandard makefile name looks like this: make -f MakefileSerial. Do this command, and you’ll see that the compilation begins. After it’s finished, when you ls the directory notice there is a new file called ‘lrgvForMOEAFramework.’ Voila!

To see the help file for this program, the command is ./lrgvForMOEAFramework -h. That provides the options to you.

If you don’t yet understand all the options, that’s ok. Try the following set of default options to make sure the program is working: ./lrgvForMOEAFramework -m std-io -b AllDecAll -c combined -r 1000. Basically this is saying to run the standard input/output version of the code that is designed to work with Borg; you are running the AllDecAll_control.txt control file, you are running the ‘combined’ calculation mode; and you want to run 1000 Monte Carlo replicates at each function evaluation.

You may think nothing has happened, and I guess that’s true! This is because any program that uses standard i/o to connect with Borg waits for you (or the algorithm, or the processing code) to input values, and it then outputs objective function values and constraint violation values back to you. The cursor is waiting for you, the human, right now, so input some values! Enter them with spaces in between, and when you are done, hit enter.

For the LRGV, the decision variables are in a certain order depending on the model ‘case’ (see the De Novo paper). For case 3, the order is rights, options low, options high, xi (threshold for deciding between low and high option values), alpha2, beta2, alpha, and beta. These values have transformations applied to them. One of these days we will have to write some documentation for the LRGV code but for now take my word for it. Try these random numbers:

0.1 0.3 0.3 0.4 1.4 0.4 1.4 0.4

Hit enter, and you should see some objective function output.

Take a look at AllDecAll_control.txt to see the objectives. (In the LRGV, the list of objectives is actually set in the control file. Not all models are like this, but the good thing here is it allows us to change the problem formulation ‘on the fly’, so to speak). The objectives are cost, critical reliability, surplus, dropped transfers, number of leases, and drought transactions cost.
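If you would rather script this manual test than type values by hand, something like the following sketch should work; it assumes lrgvForMOEAFramework and its control/data files sit in the current directory, and it uses the same sample decision variables as above. The pipe is closed after one solution, which signals the model that no more input is coming.

# Convenience sketch for scripting the std-io test above (my own addition, not from the LRGV docs)
import subprocess

cmd = ["./lrgvForMOEAFramework", "-m", "std-io", "-b", "AllDecAll",
       "-c", "combined", "-r", "1000"]
decision_vars = "0.1 0.3 0.3 0.4 1.4 0.4 1.4 0.4\n"

proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)
out, _ = proc.communicate(decision_vars)  # send one solution, then close stdin and read the output
print("Model returned:", out.strip())     # objective (and constraint) values for this solution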

You have now compiled Borg, and separately compiled the LRGV simulation model. Stay tuned for edits to this post, where we refine the procedure and get the algorithm to talk to the simulation model.

Arranging the files for your job

Above, we talked about how it’s a good idea to have all your executables and project data in one place in order to speed up the solution process. Navigate out to the main folder that contains Borg and LRGV. When you do a ‘ls’ command you should see those two folders listed on the screen. Now let’s make a new folder: ‘mkdir run01’.

Change into the Borg folder (‘cd Borg’, or whatever you are calling the folder that contains the Borg source code), and then copy the borg.exe file to your run directory (‘cp borg.exe ../run01/’).

Similarly, change into the LRGV folder and pull out the executable and the project data. First change into the directory. You can take advantage of some Linux tricks here. Parent folders are denoted with dots: ‘..’ moves up one directory level (as in the cp command above). Also, you can use wildcards so you don’t have to type out the whole folder name. So something like this should work: ‘cd ../LR*/’. When you change into a new folder, the ‘pwd’ command lets you know what folder you are in, and you can always ‘ls’ to list the contents of the directory.

The LRGV project information is contained in the ‘lrgv_bin’ folder. Change into it (‘cd lrgv_bin’) and then copy the executable (‘cp ./lrgvForMOEAFramework ../../run01’). Note, when you copy the executable, you’re copying it two levels up, because you’re within a subfolder of your LRGV directory here. We’re not done yet! The LRGV model requires some other files in this folder that must be included in your run folder. They are demand.txt, hydro.txt, initrights.txt, and so forth. As a hint, you can just copy all the text files over: ‘cp *.txt ../../run01’.

Now change into the run01 folder and do a ‘ls’ to admire your handiwork! The output of ls should show you that you have (i) the borg executable, (ii) the LRGV executable, and (iii) the LRGV data files (hydro.txt etc.) and (iv) the ‘control’ file that has LRGV parameters ([something]_control.txt).  If all of these files are not contained in the run folder, the program will not work.

Linking to the Borg ‘Command Line Interface’

The biggest lesson I learned here is that you need to make sure there are no extraneous fprintf, cerr, cout (etc.) statements in the code that write messages to the console. If you are specifying a socket for communication, this isn’t a problem, but without a specific socket you’ll run into an error. If you download the latest version of the LRGV code from GitHub, this problem should be fixed!

Here is a sample command to run Borg with LRGV. Note that the backslashes simply split the command on multiple lines.
./borg.exe -v 8 -o 6 -c 4 -C E -R runtime.txt -F 100 \
-l 0,0,0,0.1,0,0,0,0 -u 1,1,1,0.4,3,1,3,1 \
-e 0.003,0.002,0.01,0.002,0.003,0.001 -n 1000 \
-- ./lrgvForMOEAFramework -m std-io -b AllDecAll -c combined -r 5000

You are running Borg with 8 decision variables, 6 objectives, and 4 constraints (as indicated by the -v, -o, and -c flags). The -C flag tells Borg that constraints are feasible when the constraint violation variables equal zero. The -l and -u flags set the lower and upper bounds of the decision variables. Epsilon values are given by the -e flag, and the -n flag tells Borg to run for 1000 function evaluations. Runtime output is specified using the -R flag, with the output being printed to runtime.txt. The -F flag tells Borg to write runtime output every 100 function evaluations. Everything to the right of the two dashes is the command for the LRGV model; here we are specifying 5000 Monte Carlo simulations per function evaluation.

This run should take a few minutes to do. The cool thing is you will actually get some good results from only running Borg for 1000 iterations.

Quick Plotting of Borg Results in AeroVis

The first Borg output you will see on the console window will look something like this:

BORG version: 1.4
Current time: Mon Feb 10 09:30:32 2014
Problem: ./lrgvForMOEAFramework -m std-io -b AllDecAll -c combined -r 5000
Number of variables: 8
Number of objectives: 6
Number of constraints: 4
Lower bounds: 0 0 0 0.1 0 0 0 0
Upper bounds: 1 1 1 0.4 3 1 3 1
Epsilons: 0.003 0.002 0.01 0.002 0.003 0.001
Directions: 0 0 0 0 0 0
Constraint mode: E
Seed: (time)

This is a nice way to verify that your flags all got parsed correctly. Everything looks like it is in order. Note that Borg will show you the statement, verbatim, that is used to invoke your model; it is up to the programmer of the simulation model to verify it is working successfully! Also note that the default random seed is the system time. This means that every time you run Borg you will get different results! If you don’t want this to be the case, set the random seed with the -s flag.

Another piece of advice: with the command I typed above, it should take Borg several minutes to produce results! This means that if the program terminates or exits right away, something is wrong. Unfortunately, some input/output errors can cause a crash and you won’t know it. You can verify that things are working by looking at the runtime.txt file periodically.

Alright, so I waited several minutes and Borg finished. The results are a teachable moment! When we look at what happened, it turned out that only one solution was generated. “Why?” you ask. Well, good question. There are four explanations, and they are important because, no matter what problem you’re solving, a lot of real world problems will suffer from these traits:

  1. The LRGV problem is highly constrained. Constraints are given in the AllDecAll_control.txt file. Reliability, critical reliability, cost variability, and drought transactions cost are constrained. Therefore many solutions that are generated by Borg may be infeasible, limiting the output set.
  2. We didn’t run the algorithm for a long time. Given more time, we may have found more solutions.
  3. Epsilon values may have been too coarse. Epsilon values set the precision that the user wants out of their optimization process. The epsilon values used in the trial may have been too coarse, and because of that, there was only one non-dominated solution generated. This may not be a bad thing — it may show there aren’t strong tradeoffs between your objectives. Take a look at Josh Kollat et al’s calibration work for more details on that.
  4. The treatment of uncertainty may affect the process. This is more a comment for other applications and not so much the LRGV. The objectives and constraints were carefully formulated here to be fairly robust to different ensemble sizes. We’ve run the problem with 5000 replicates all the way down to 1000 replicates and it generally works pretty well. However, perhaps you may see a low number of results because it’s hard to get a feasible point with a high number of Monte Carlo replicates. You could run into some issues when you formulate objectives: what if you did an objective like ‘the number of failures in a time series’? Then the number of time steps that you run would directly influence the feasible range of objectives that you could see. That sounds like another post altogether!

Ok so let’s modify the run to get something that has more points. I’ll change two things. The first is, I will make the epsilon resolution much more fine. Second, I am going to change the number of Monte Carlo simulations — this is really not about generating more solutions, but rather just to speed up the process. Here’s the modified command:


./borg.exe -v 8 -o 6 -c 4 -C E -R runtime.txt \
-F 100 -l 0,0,0,0.1,0,0,0,0 -u 1,1,1,0.4,3,1,3,1 \
-e 0.0003,0.0002,0.001,0.002,0.0003,0.0001 -n 1000 \
-- ./lrgvForMOEAFramework -m std-io -b AllDecAll \
-c combined -r 1000

Now after you’re done you are going to see many solutions, a few of which are here:

0.66648339565528636 0.67476996888284801 0.043713294445470278 0.33181321009155323 0.91922773565526716 0.45571096741028849 0.26815793390110088 0.28307276249003427 0.12025764110925453 -0.99891666666666667 0.41695535434092823 0.00012229087558711678 0.00068000000000000005 0
0.65802436611603377 0.17040802931274276 0.89787749927831673 0.37145853665497164 0.73557567609940055 0.28699499841402953 0.66876681220805767 0.64095194612404671 0.11439486913000001 -0.99775000000000003 0.42406373201305358 3.2074893365397502e-06 0.00033 0
0.71927499163180475 0.47214204629854206 0.91211932640347826 0.28733751854346368 1.1157608185160335 0.38838605481614208 1.1937640440664763 0.4668697964761449 0.12171828747183734 -0.9996666666666667 0.48153734145487198 0.00012463055456501487 0.00024000000000000001 0
0.70988260669592018 0.66195116819530686 0.86137019607092546 0.22492238687148389 0.04267660529415044 0.83243266132074933 1.4422130367535617 0.12824331995291713 0.1230293014619977 -0.99916666666666665 0.46583663718635338 0 0.00013999999999999999 0
0.65514335046148675 0.72225758980317201 0 0.32678886191611856 0.018830764515940516 0.67001200457315935 0.16300338695838254 0.22887574793603069 0.1198738899999995 -0.99608333333333332 0.41045367650680376 0 0 0

Ok so now all we have to do is save these results into a file and view them in AeroVis. This should also be the subject of another post, but in the interest of completeness I’ll briefly discuss it here. I should note: we probably should have specified -f as a flag to Borg, which would have indicated that we wanted to save our approximation set output to a file, and then we could have specified a filename. But it’s basically equivalent to copying and pasting, which is what I’m about to do. Also, this is another teachable moment: you should really be looking at the Borg readme and Borg documentation for the full set of commands for using this algorithm!

Anyway, simply copy all the numeric data in between pound signs (or hash tags if you are a fan of #twitter) onto the clipboard. Modern versions of Excel make it really easy to create a comma separated value (CSV) file of pasted data. Basically when you use the ‘paste’ command it will prompt you to use the ‘Text Import Wizard’ (see Microsoft help here), where you should tell the program that the data is space delimited. (Many of my fellow blog mates will probably scream bloody murder about recommending that you process data in Excel. I suppose they are right! But this procedure is really easy especially in modern versions of Excel, and recent versions of AeroVis are designed to seamlessly integrate with Excel. If you are a fan of other platforms, skip this step and somehow parse the data into a comma separated format such that you have headers in the first line and then the data in subsequent lines.)

Once the data is in Excel, add a row with the appropriate header. Basically, Borg outputs the decision variables first, then the objectives. Each row is a different solution. So if you had S solutions, N decision variables, and M objectives, you would basically have a data table that had S rows by N+M columns. The header for our problem would be the following line. In the final CSV file, this will have commas in between each entry, but in Excel each entry is in its own cell (i.e. the first entry is in A1, the second entry is in B1, and so forth)

Rights,OptLow,OptHigh,Xi,Alpha2,Beta2,Alpha,Beta,Cost,CritRel,Surplus,Drop,NumLeases,DrTransCost
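If you would rather skip Excel entirely, a short Python snippet can build the same CSV. It assumes you saved the space-delimited solutions (the block of numbers between the pound signs) to a file, called borg_output.txt here purely for illustration; lines beginning with ‘#’ are skipped automatically.

# Convert space-delimited Borg output (8 decision variables + 6 objectives per row) to CSV
header = ["Rights", "OptLow", "OptHigh", "Xi", "Alpha2", "Beta2", "Alpha", "Beta",
          "Cost", "CritRel", "Surplus", "Drop", "NumLeases", "DrTransCost"]

with open("borg_output.txt") as fin, open("myfile.csv", "w") as fout:
    fout.write(",".join(header) + "\n")
    for line in fin:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                               # skip blank lines and the # header/footer
        fout.write(",".join(line.split()) + "\n")  # one solution per row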

Save the file with some name, myfile.csv for example. Open AeroVis, and then in a separate window navigate to where the file is stored (in the ‘Explorer’ for example). Now simply drag the file into the main window of AeroVis! The ‘Data Import Wizard’ should come up. Click ‘Select All’ to select all the variables, and hit ok.

You should see something like the following [Insert image here]. You can kind of see a faint cone on the bottom left side, but it looks pretty transparent. If you use the ‘Plot Control Buttons’ tool, you can see that the program just assumes the first variable (Rights) is on the first axis (X), the second variable (OptLow) is on the second axis (Y) and so forth.

In the final update to this post, I will discuss good plotting settings to use and how to manipulate the AeroVis plot to learn more about the results you got. Please stay tuned!