Using the Python Borg Wrapper – Lake Problem Example

Python compatibility with the Borg MOEA is highly useful for practical development and optimization of Python models. The Borg algorithm is implemented in C, a strongly-typed, compiled language that offers high efficiency and scalability for complex optimization problems. Often, however, our models are not written in C/C++, rather simpler scripting languages like Python, MATLAB, and R. The Borg Python wrapper allows for problems in Python to be optimized by the underlying C algorithm, maintaining efficiency and ease of use. Use of the Python wrapper varies slightly across operating systems and computing architectures. This post will focus on Linux systems, like “The Cube”, our computing cluster here at Cornell. This post will work with the most current implementation of the Borg MOEA, which can be accessed with permission from this site.

The underlying communications between a Python problem and the Borg C source code are handled by the wrapper with ctypes, a library that provides Python compatibility with C types and functions in shared libraries. Shared libraries (conventionally .so files in Linux/Unix) provide dynamic linking, a systems tool that allows for compiled code to be linked and reused with different objects. For our purposes, we can think of the Borg shared library as a way to compile the C algorithm and reuse it with different Python optimization runs, without having to re-compile any code. The shared library gives the wrapper access to the underlying Borg functions needed for optimization, so we need to create this file first. In the directory with the Borg source code, use the following command to create the (serial) Borg shared library.

gcc -shared -fPIC -O3 -o libborg.so borg.c mt19937ar.c –lm 

Next, we need to move our shared library into the directory containing the Python wrapper (borg.py) and whatever problem we are optimizing.

In this post, we’ll be using the Lake Problem DPS formulation to demonstrate the wrapper. Here’s the source code for the problem:

@author: Rohini

#DPS Formulation 

#Objectives:
#1) Maximize expected economic benefit
#2) Minimize worst case average P concentration 
#3) Maximize average inertia of P control policy
#4) Maximize average reliability 

#Constraints: 
#Reliability has to be <85%

#Decision Variables 
#vars: vector of size 3n 
#n is the number of radial basis functions needed to define the policy
#Each has a weight, center, and radius (w1, c1, r1..wm,cn,rn)

#Time Horizon for Planning, T: 100 Years
#Simulations, N: 100 

"""

import numpy as np
from sys import *
from math import *
from scipy.optimize import root
import scipy.stats as ss

# Lake Parameters
b = 0.42
q = 2.0

# Natural Inflow Parameters
mu = 0.03
sigma = np.sqrt(10**-5)

# Economic Benefit Parameters
alpha = 0.4
delta = 0.98

# Set the number of RBFs (n), decision variables, objectives and constraints
n = 2
nvars = 3 * n
nobjs = 4
nYears = 100
nSamples = 100
nSeeds = 2
nconstrs = 1

# Set Thresholds
reliability_threshold = 0.85
inertia_threshold = -0.02


###### RBF Policy ######

#Define the RBF Policy
def RBFpolicy(lake_state, C, R, W):
    # Determine pollution emission decision, Y
    Y = 0
    for i in range(len(C)):
        if R[i] != 0:
            Y = Y + W[i] * ((np.absolute(lake_state - C[i]) / R[i])**3)

    Y = min(0.1, max(Y, 0.01))

    return Y


###### Main Lake Problem Model ######

def LakeProblemDPS(*vars):

    seed = 1234

    #Solve for the critical phosphorus level
    def pCrit(x):
        return [(x[0]**q) / (1 + x[0]**q) - b * x[0]]

    sol = root(pCrit, 0.5)
    critical_threshold = sol.x

    #Initialize arrays
    average_annual_P = np.zeros([nYears])
    discounted_benefit = np.zeros([nSamples])
    yrs_inertia_met = np.zeros([nSamples])
    yrs_Pcrit_met = np.zeros([nSamples])
    lake_state = np.zeros([nYears + 1])
    objs = [0.0] * nobjs
    constrs = [0.0] * nconstrs

    #Generate nSamples of nYears of natural phosphorus inflows
    natFlow = np.zeros([nSamples, nYears])
    for i in range(nSamples):
        np.random.seed(seed + i)
        natFlow[i, :] = np.random.lognormal(
            mean=log(mu**2 / np.sqrt(mu**2 + sigma**2)),
            sigma=np.sqrt(log((sigma**2 + mu**2) / mu**2)),
            size=nYears)

    # Determine centers, radii and weights of RBFs
    C = vars[0::3]
    R = vars[1::3]
    W = vars[2::3]
    newW = np.zeros(len(W))

    #Normalize weights to sum to 1
    total = sum(W)
    if total != 0.0:
        for i in range(len(W)):
            newW[i] = W[i] / total
    else:
        for i in range(len(W)):
            newW[i] = 1 / n

    #Run model simulation
    for s in range(nSamples):
        lake_state[0] = 0
        Y = np.zeros([nYears])

        #find policy-derived emission

        Y[0] = RBFpolicy(lake_state[0], C, R, newW)

        for i in range(nYears):
            lake_state[i + 1] = lake_state[i] * (1 - b) + (
                lake_state[i]**q) / (1 +
                                     (lake_state[i]**q)) + Y[i] + natFlow[s, i]
            average_annual_P[
                i] = average_annual_P[i] + lake_state[i + 1] / nSamples
            discounted_benefit[
                s] = discounted_benefit[s] + alpha * Y[i] * delta**i

            if i >= 1 and ((Y[i] - Y[i - 1]) > inertia_threshold):
                yrs_inertia_met[s] = yrs_inertia_met[s] + 1

            if lake_state[i + 1] < critical_threshold:
                yrs_Pcrit_met[s] = yrs_Pcrit_met[s] + 1

            if i < (nYears - 1):
                #find policy-derived emission
                Y[i + 1] = RBFpolicy(lake_state[i + 1], C, R, newW)

# Calculate minimization objectives (defined in comments at beginning of file)
    objs[0] = -1 * np.mean(discounted_benefit)  #average economic benefit
    objs[1] = np.max(
        average_annual_P)  #minimize the max average annual P concentration
    objs[2] = -1 * np.sum(yrs_inertia_met) / (
        (nYears - 1) * nSamples
    )  #average percent of transitions meeting inertia thershold
    objs[3] = -1 * np.sum(yrs_Pcrit_met) / (nYears * nSamples
                                            )  #average reliability

    constrs[0] = max(0.0, reliability_threshold - (-1 * objs[3]))

    return (objs, constrs)

The important function for this blog post is LakeProblemDPS, which demonstrates how to configure your own problem with the wrapper. Your function must take in *vars, the decision variables, and return objs, a list of objective values (or a tuple of objective values and constraints). Within the problem, refer to vars[i] as the i-th decision variable, for i in [0,nvars-1]. Set the list of objective values in the same manner.

Once our problem is defined and compatible with the wrapper, we can optimize with Borg. The following code runs the Lake problem optimization for 10,000 function evaluations.

# Serial Borg run with Python wrapper
# ensure libborg.so is compiled and in this directory
from borg import *
from lake import *

maxevals = 10000

# create an instance of the serial Borg MOEA
borg = Borg(nvars, nobjs, nconstrs, LakeProblemDPS)

# set the decision variable bounds and objective epsilons
borg.setBounds(*[[-2, 2], [0, 2], [0, 1]] * int((nvars / 3)))
borg.setEpsilons(0.01, 0.01, 0.0001, 0.0001)

# perform the optimization
# pass in a dictionary of arguments, as defined in borg.py
result = borg.solve({"maxEvaluations": maxevals})

# print the resulting objectives
for sol in result:
    print(sol.getObjectives())

Note the constructor Borg() creates an instance of the Borg algorithm with a specified number of variables, objectives, and constraints. The LakeProblemDPS argument is the objective function to be optimized by this instance of Borg. The setBounds and setEpsilons methods are required. solve() performs the optimization and takes in a dictionary of Borg parameters. See borg.py for a comprehensive list.

Using the Python wrapper to run the Parallel Borg MOEA

The previous example uses the serial Borg MOEA, but the wrapper also supports the master-worker and multi-master parallelizations. Configuring a parallel version of the algorithm requires a few additional steps. First, you must compile a shared library of the parallel implementation and move it to the wrapper directory.

For the master-worker version, use:

mpicc -shared -fPIC -O3 -o libborgms.so borgms.c mt19937ar.c -lm

For the multi-master version, use:

mpicc -shared -fPIC -O3 -o libborgmm.so borgmm.c mt19937ar.c -lm

To call the master-worker version, you must explicitly start up and shut down MPI using the Configuration class provided in borg.py. The following code performs a parallel master-worker optimization of the Lake problem:

# Master-worker Borg run with Python wrapper
# ensure libborgms.so or libborgms.so is compiled and in this directory
from borg import *
from lake import *

# set max time in hours
maxtime = 0.1

# need to start up MPI first
Configuration.startMPI()

# create an instance of Borg with the Lake problem
borg = Borg(nvars, nobjs, nconstrs, LakeProblemDPS)

# set bounds and epsilons for the Lake problem
borg.setBounds(*[[-2, 2], [0, 2], [0, 1]] * int((nvars / 3)))
borg.setEpsilons(0.01, 0.01, 0.0001, 0.0001)

# perform the optimization
result = borg.solveMPI(maxTime=maxtime)

# shut down MPI
Configuration.stopMPI()

# only the master node returns a result
# print the objectives to output
if result:
    for solution in result:
        print(solution.getObjectives())

This script must be called as a parallel process – here’s a SLURM submission script that can be used to run the optimization on 16 processors (compatible for The Cube):

#!/bin/bash
#SBATCH -J py-wrapper
#SBATCH -o normal.out
#SBATCH -e normal.err
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 16

mpirun python3 mslake.py

sbatch submission.sbatch will allocate one node with 16 processors for the optimization run.

Troubleshooting

Depending on your machine’s MPI version and your shell’s LD_LIBRARY_PATH environment variable, the Borg wrapper may try to access an unavailable mpi shared library. This issue happens on our cluster, the Cube, and causes the following error:

OSError: libmpi.so.0: cannot open shared object file: No such file or directory

In borg.py, the startMPI method attempts to access the nonexistent libmpi.so.0 shared library. To fix this, find the location of your mpi files with:

echo $LD_LIBRARY_PATH

Likely, a directory to your mpi library (i.e. /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib on the cube) will print. (Note, if such a path does not exist, set the LD_LIBRARY_PATH environment variable to include your mpi library) Navigate to this directory and view the file names. On the Cube, libmpi.so.0 (the library the Borg wrapper is trying to access) does not exist, but libmpi.so does (this is a software versioning discrepancy). Back in the startMPI method in borg.py, change the line

CDLL("libmpi.so.0", RTLD_GLOBAL)

to access the existing mpi library. On the cube:

CDLL("libmpi.so", RTLD_GLOBAL)