Introduction to Borg Operators Part 1: Simplex Crossover (SPX)

I have always found the connection between Genetic Algorithms and Biological Evolution very interesting. In many ways, Genetic Algorithms like Borg, emulate Biological Evolution. They start with a random population of potential solutions, the best fit of these solutions are selected to produce the future generation. During the production of offspring, crossover and mutation occur, which leads to more variation in the future generation. Strong (Non-dominated) offspring join the larger population and replace some of the previous weaker solutions (which are now dominated), and the process is repeated until some final condition is met.

In this post, I will briefly discuss Borg’s operators, and then dive deeper into Simplex Crossover (SPX) by Tsutsui et Al. (1999). This is the first of a series of posts in which I will discuss the Borg operators.

General Overview of Borg Operators

Borg uses the six operators shown in the figure 1 below to achieve variation in its solutions. In the figure, the larger dots represent the parent solutions, while the smaller dots represent the offspring solutions (Hadka & Reed, 2013).

The question is how does Borg know which operators are best suited for a new problem? Borg auto-adaptively selects operators to based on their performance to aid the search. Based on section 3.4 in Hadka and Reed, 2013: In the first iteration in Borg assigns an equal probability for using each operator. For K operators, this probability is 1/K. With each iteration, these probabilities are updated, so that the operators that produced more solutions in the epsilon box archive, are assigned higher probabilities, while those with less solutions in the epsilon box dominance archive are assigned a lower probability. Hadka and Reed(2013) use the following weighted average equation to obtain this behavior:

equation 1

Where:

  • K is the number of operators used
  • Qi ∈ {Q1,…, Qk} is the probability that an operator is used to produce offspring in the next generation. and is initialized at 1/K
  • Ci ∈ {C1,…, Ck} is the number of solutions in the epsilon box dominance archive produced by each operator i.
  • Sigma is a positive number that prevents any of the operators from reaching a probability of zero.

figure 1

Figure 1 (Hadka & Reed, 2013)

Notice in the bottom three operators in Figure 1 show some similarity: the parents form a simplex (which is a triangle in 2 dimensions). However, the way the offspring are distributed around the parents is different. In Parent Centric Crossover, the offspring form around the parent solutions, in Unimodal Normal Distribution Crossover, the offspring are normally distributed around the center of mass of the three parents, and in Simplex Crossover, the solutions are uniformly distributed inside a simplex that is larger than the simplex created by the three parents (Deb et Al., 2002).

Simplex Crossover (SPX)

Discussion based on Tsutsui et Al., 1999

Simplex Crossover (SPX) is a recombination operator that can use more than two parent solutions to generate offspring. The method involves expanding the simplex formed by the parent solutions by a factor of epsilon, and then sampling offspring from the resulting expanded simplex. The expansion of the simplex is centered around the center of mass of the parent solution vectors, and therefore independent of the coordinate system used. The number of parents must be at least 2 and less than or equal to the number of parameters plus one. That is:

equation 2

Where:

  • m: is the number of parents
  • n: is the number of parameters

The SPX is denoted as SPX-n-m-ε.

Tsutsui et Al., 1999 states that for low dimensional functions, SPX works best with 3 parent solution vectors, and for high dimensional functions SPX works best with 4 parent solution vectors.

The 3 Parent, 2 Parameter Case

It is easiest to see this in the 2-dimensional three parent, 2 parameter case, i.e. SPX-2-3-ε. This can be done in four steps, referring to figure 2:

figure 2

Figure 2 (Tsutsui et Al., 1999)

  1. Let X1, X2, and X3 be the three parent solution vectors. Each of these vectors has two parameters (it’s x and y coordinates). Calculate the center of mass, O, of the parents by computing the average of their two parameters:equation 3.png
  2. Expand the simplex by epsilon at every vertex:equation 4
  3. Produce offspring by using a uniform distribution to sample inside new expanded simplex defined in step 3.

You can see this in Python using the code below:


import numpy as np
import random

def SPX_2D(X1, X2, X3, epsilon, n_offspring=1, m=2):
# m is the number of parameters in each parent vector
# X1, X2, X3 are the parent vectors as np.arrays, each with m parameters (elements) i.e. len(Xi)=m
# epsilon is a value greated that zero by which the simplex is expanded
# n_offspring is the number of offspring you would like to produce

# Step 0: apply checks and inform the user if the input is wrong
    if m<2:
        print("error: at least two parameters needed for 2D SPX")
    elif len(X1)!= len(X2) | len(X1)!=len(X3) | len(X1)!=len(X2):
        print("error: the number of parameters in the parent vectors don't match")
    elif len(X1)!=m:
        print("error: the number of parameters in the parent vectors is not equal to m")

# if the checks pass, the function can proceed
    else:
# Step 1: find the center of mass (CoM) of the three parents
        CoM = 1/3 * (X1 + X2 + X3)

# Step 2: Define the vertices (Vi) of the simplex by expanding the simplex in the direction of Xi-CoM by (1+epsilon)
# note that the equation here is slightly different from the one in the paper, since the one in the paper produces the vector
# that translates the center of mass to the new vertices, and so the vectors need to be added to the center of mass coordinates
# to produce the new vertices.
        V1=CoM+(X1-CoM)*(1+epsilon)
        V2=CoM+(X2-CoM)*(1+epsilon)
        V3=CoM+(X3-CoM)*(1+epsilon)

# Step 3: Produce offspring by sampling n_offspring points from the new simplex defined in step 3 using a uniform distribution
        offspring = [point_on_triangle(V1, V2, V3) for _ in range(n_offspring)]

    return offspring, V1, V2, V3, CoM

######################################################################################################################### 

# Point_on_triangle function
# source: https://stackoverflow.com/questions/47410054/generate-random-locations-within-a-triangular-domain
# Solution by Mark Dickinson (Nov 21, 2017)
# Comments added by Sara

def point_on_triangle(pt1, pt2, pt3):

# pt1, pt2, pt3 are the vertices of a triangle

# find two random numbers s and t, note that random produces a float between 0 and 1 using a uniform distribution
    s, t = sorted([random.random(), random.random()])

# use s & t to calculate a weighted average of each coordinate. This will produce the offspring vector
    return (s * pt1[0] + (t-s)*pt2[0] + (1-t)*pt3[0],
            s * pt1[1] + (t-s)*pt2[1] + (1-t)*pt3[1])

#########################################################################################################################

# Let's try an example
X1=np.array([-2,2])
X2=np.array([4,2])
X3=np.array([1,6])
epsilon=0.3
m=2
n_offspring=1000

offspring, V1, V2, V3, CoM = SPX_2D(X1, X2, X3, epsilon, n_offspring, m)

# Finally, you can plot the parents and offspring to produce Figure 3

import matplotlib.pyplot as plt

# Plot the parents
x1, y1 = zip(X1, X2, X3)
plt.scatter(x1, y1, s=50, label='Parents')

# Plot the center of mass
x2, y2 = zip(CoM)
plt.scatter(x2, y2, s=150, label='Center of Mass')

# Plot the expanded simplex
x3, y3 = zip(V1, V2, V3)
plt.scatter(x3, y3, s=100, label='Simplex Vertices')

# Plot the offspring
x4, y4 = zip(*offspring)
plt.scatter(x4, y4, s=0.05, label='Offspring')

plt.legend()

plt.show()

figure 3.png

Figure 3

General Case

The general case is denoted as SPX-n-m-ε, with n parameters and n parents.

Each parent solution vector is:

equation 5.png

These vectors are in R^n.

Thinking in terms of Biology, this parent vector mimics a chromosome, with m parameters as its different traits.

The general case can be outlined in four steps:

1) Parent vectors are selected from the population pool

2)  R^n is space is then divided as follows:equation 6Divide R^n into h sets of R^m-1 spaces and one R^q space. I found this easier when I thought of it in the context of a chromosome, i.e. large parent vector of length n, being divided into smaller sub-vectors of length m-1, and a remainder vector or length q. See figure 4 below:

figure 4            Figure 4

3) In each R^m-1 space, the m parent vectors are used to generate offspring using the expanded simplex as described in the 2-dimensional, 3 parent, 2 parameters case. Again, using the chromosome analogy, figure 5 shows how this can be depicted for m=3 parents and for example, n=9 parameters. R^9 is divided into four (h=integer(n/(m-1)=4) R^2 (m-1=2) spaces and one R^1 space (q=remainder(n/(m-1)=1).

figure 5

Figure 5

4) The sets of offspring in R^m-1 produced in step 3 together with the vector in R^q (which remains unchanged), are then combined in their correct positions to form an offspring vector in R^n.

The code for the general SPX case can be found in David Hadka’s github.

References

 

Interacting with Plotted Functions Using Jupyter Notebooks ipywidgets with matplotlib

Interacting with Plotted Functions Using Jupyter Notebooks ipywidgets with matplotlib

When exploring the properties of an individual function or a system of equations, I often find myself quickly writing up code in a Jupyter Notebook (overview) to plot functions using matplotlib. While a previous blogpost by Jazmin has gone over creating interactive plots with matplotlib, being able to actively change variables to quickly explore their sensitivity or even teach others is crucial.

The primary reason I started playing with interactive widgets in Jupyter Notebook was to  teach individuals the basic mechanics of a simple crop allocation model that utilizes Positive Mathematical Programming (PMP) (Howitt, 1995). Beyond showing the simple calibration steps via code, allowing students to interact with individual variables allows for them to not only explore the sensitivity of models but also to find where such a model might break.

In this specific case, the marginal cost and revenues for a specific commodity (corn) versus the number of acres planted are graphed. A breaking case where the crop’s input data (the price per ton of corn) causes the model to produce unrealistic results (a negative marginal cost) is produced below.

Plotting Functions in matplotlib

One of the features that is often overlooked in matplotlib is that you can (indirectly) plot functions without much effort. The following commented code is used to demonstrate plotting marginal cost and revenue curves on the same graph.

import matplotlib.pyplot as plt
from numpy import *

#fixed inputs from calibrated model
#alpha and gamma for quadratic formulation from Howitt, 1995
input_params = [743.0, 2.57]

#crop revenue per acre, known
crop_revenues = 1500

#Creating the x-axis domain, controls range of x inputs
t = linspace(0, 350)

#t is the independent variable, corn_MC is dependent variable
corn_MC = input_params[0]  + input_params[1] * t 

#note that you have to multiply t by 0 to populate entire line
corn_MR = crop_revenues + 0 * t

plt.figure(dpi=400) #set resolution of graph, can change. 

#label axes
plt.ylabel('Marginal Cost ($/acre)')
plt.xlabel('Acres of Corn Planted')

#plot as many lines as you'd like, expand them here
plt.plot(t, corn_MC, 'r', label='Marginal Cost')
plt.plot(t, corn_MR, 'b', label='Marginal Revenue')

#change size of ticks on axes
plt.tick_params(labelsize=8)

legend = plt.legend(loc=4, shadow=True)
plt.show()

 

The resulting graph is generated inline in a Jupyter Notebook. Note that the resolution and size of the graphic is adjusted using plot.figure(dpi=400)—the default is much smaller and makes for hard-to-read graphs!

matplotlib_output.png

Interactive Inputs for Graphics in Jupyter Notebook

While graphing a function in Jupyter Notebook using matplotlib is rather straightforward, interactive plots with functions are rarely utilized. Using iPython widgets allows us to easily create these tools for any type of situation, from vertical and horizontal slider bars to progress bars to dropdown to boolean ‘click me’ buttons. You can find any of these features here. For the sake of simplicity, I have simply included three slider bars to demonstrate these features.

Notably, as an expansion of the example above, I have to run scipy.minimize multiple times whenever inputs are recalibrated (for questions on this end, I am more than happy to explain PMP if you contact me directly). To do this, I have to imbed multiple functions into a larger function to carry this out. However, I have not noticed any considerable slowdowns when running this much code.

To carry out the entirety of the example above, I have almost 300 lines of code to fully encapsulate a 3-crop allocation problem. For simplicity’s sake, I defined the function ‘pmp_example_sliders’ to contain all of the code required to calibrate and produce the graphic above (note that input parameters are what are calculated) using Scipy.optimize.minimize. Shadow prices are then derived using scipy.optimize.minimize and used to create the parameters necessary for the  marginal cost curve shown in this example. Note that the full model can be found in the GitHub file.

from ipywidgets import *
import scipy.optimize
%matplotlib inline
from numpy import *
import matplotlib.pyplot as plt

def pmp_example_sliders(corn_price=250, corn_initial_acres=200, x_axis_scale=350):
return plot
###Code for Running Calibration and Graphing Resulting Graph###
#create sliders for three variables to be changed
#have to have function with changing parameters (shown above)
#note that scale for corn_price is form 10 to 500 and stepping by 10
slider = interactive(pmp_example_sliders, corn_price=(10, 500, 10),
corn_initial_acres=(0.01,400, 10), x_axis_scale=(30, 650, 10))

#display the resulting slider
display(slider)

The following graphic results with the aforementioned sliders:

interact_1.PNG

We can easily change the input slider on the corn price from 250 to 410 to change to the following results:

interact_2.PNG

Because of this interactive feature, we not only see how the MC and MR curves interact, but we can observe that the Alpha Parameter (shown below the graph) becomes negative. While not important for this specific blog post, it shows where the crop allocation model produces negative marginal costs for low allocations of corn. Thus, utilizing interactive graphics can help highlight dynamics and drawbacks/limitations for models, helping assist in standalone tutorials.

If you would like to run this yourself, please feel free to download the Jupyter Notebooks from my GitHub. All code is written in Python 3. If you need assistance opening Jupyter Notebook, please refer to my blog post introducing Jupyter Notebook (link) or how to create a shortcut to launch Jupyter Notebook from your current working directory (link).

Introduction to DiscoveryDV

In this blog post, I will outline some of the capabilities of DiscoveryDV, a software created by Josh Kollat, a former PhD student of Dr. Reed and the CEO and founder of DecisionVis. DiscoveryDV allows users to visualize and explore tradeoffs in high dimensional data using pareto sort techniques, a variety of plotting tools, and animations. I will demonstrate these functionalities using a problem that Josh first explored when he was developing VIDEO (Visually Interactive Decision-Making and Design Using Evolutionary Multi-Objective Optimization), a precursor to DiscoveryDV. DiscoveryDV is not available for direct download, but you can request access here.

Premise

The above linked paper uses a long-term groundwater monitoring case study to demonstrate how visualization tools can be helpful in exploring large multi-objective solution sets to aid in decision making. The case study premise is as follows: A PCE contamination plume from an underground storage tank is spreading through an aquifer. PCE data samples are available at 25 pre-determined well sampling locations and there are 1-3 sampling points along the vertical axis of the well that can also be sampled. The issue is addressed through a multi-objective optimization problem with four conflicting objectives.

  1. Minimize Cost: Cost is equivalent to the number of wells chosen to be sampled.
  2. Minimize Concentration Error: Concentration estimation error is calculated between a Kriged map of the plume utilizing all available sampling locations and a Kriged map of the plume which utilizes a subset of the sampling locations.
  3. Minimize Uncertainty: The sum of the estimated variances attained for each estimation location
  4. Minimize Mass: Reflects the error between the total mass of PCE estimated by Kriging the domain based on all available well sampling locations, and the estimated mass of PCE obtained by Kriging the domain based on a subset of well sampling locations.

There are 33 million possible solutions to this optimization problem, so let’s use DiscoveryDV to dig in!

Starting a Project in DiscoveryDV

Figure 1 is a screenshot of the DiscoveryDV interface which is very simple and easy to use. All data sets and plots can be accessed from the storyboard on the right.

fig1.png

Figure 1: DiscoveryDV Interface

There are many pre-loaded data sets in DiscoveryDV and we will be using the LTM example.

  1. From the Examples tab in the toolbar, choose LTM. Notice that a new page has been created in your storyboard.
  2. If you click the > by LTM, you will see the attributes for the project: data files, plots, brushing rules, snapshots, etc.

By clicking the example, DiscoveryDV will load in a CSV file with the pareto optimal results. There are 2569 pareto optimal solutions, so what solution do you choose? This is a very real dilemma for stakeholders. The first step is to visualize the pareto front.

Creating a 3D Plot of the Pareto Surface

Since the problem formulation has 4 objectives, you should expect to see a 3D surface. We can plot the results of the 4th objective as the color of our surface.

In order to make this plot, you can use the N-Dimensional Plot tool. Right click on Plots in the storyboard and choose Create plot -> N-Dimensional Plot. Plot Cost, Error, and Risk on the X,Y, and Z axes respectively and Mass as color. You can rotate the graph and zoom in and out. You can show the utopia point by choosing it in the Glyphs menu. You can also save this image into a PowerPoint by choosing Share>Send To PowerPoint from the plot menu. Finally, you can create an animation, as shown in Figure 2.

2018-11-03-20-29-05

Figure 2: Pareto Surface GIF

It is interesting to note that by solving for the high-order Pareto-optimal surface for four objectives, all the sub-problems based on three objectives, two objectives, or even a single objective are implicitly solved at the same time. It can be unwieldy to look at a whole pareto surface, so sometimes it is advantageous to just look at two-objective pareto fronts to gain a better understanding of the behavior of the objectives.

Creating a 2D Plot to Showcase Two-Objective Tradeoffs

A quick way to visualize the 2D tradeoffs is by using the 2D scatter matrix plot (Figure 3). Along the diagonals are histograms that show the breakdown of solutions in bins. The rest of the plots show the tradeoffs. By hovering over a point, you can see what bin that point lies in along with its location on the other relevant 2D plots.

Figure 3: Two-Objective Tradeoffs

However, we are not necessarily interested in all of the solutions portrayed in the 2D scatter. Rather we would like to see the pareto fronts of the 2-objective formulations to better dissect what is happening in the 3D surface. The 2D scatter matrix doesn’t have pareto sorting functionality, so let’s look at one of these tradeoffs more closely using the XY Plot tool.

  1. Right click on Plots in the storyboard and choose Create plot -> XY Plot.
  2. Plot Cost and Error on the X and Y axis respectively
  3. Perform a pareto sort. The dominated data points should turn transparent.
Picture2

Figure 4: Two-objective Pareto front

From Figure 4, it is clear that increasing cost will decrease error which is somewhat intuitive, but maybe wasn’t clear from the 3D plot. This is the advantage of looking at two-objective tradeoffs. It is up to the decision maker to determine what criteria are important to them, and 2D plots can help them elicit these preferences. Is it acceptable to be cheap but have 35% error? When cost is low, increasing the cost marginally will decrease the error a lot, but when does it become too costly to minimize error? Pareto sorting on a 2D plot gives a clear idea of the optimal solutions for different combinations of objectives and an idea of the tradeoffs associated with those decisions. Furthermore, in DiscoveryDV, a pareto sort can be performed with a single click.

Brushing to Elicit Preferences

At this point, the stakeholder should have a better understanding of the tradeoffs that exist, but we need some way to help narrow down the set of possible options. One way to do this is to brush solutions based on stakeholder preference. The stakeholder can decide on an acceptable range for each objective from the full ranges shown below.

Objective

Range

Minimize Cost 7-46 sampled locations
Minimize Concentration Error 0-34.5 (Kg PCE per Cubic Meter of Aquifer)
Minimize Uncertainty 1284-1563 (Kg PCE per Cubic Meter Aquifer)2
Minimize Mass

 

0-48.64 (Kg PCE)

In order to implement the ranges, expand the brushing tab and add brushing rules for each objective. Adjust to your pre-specified ranges. On the “hiding” axis of the 3D plot, pick the “brushing” option. Only the solutions that meet the range requirements will remain (Figure 5).

Figure 5: Example of a Brushed Pareto Front

Parallel Axis Plots to View Tradeoffs

Now let’s create a set from our brushed data for further inspection. Click the analysis menu option and then click Create Set From > Brushed on Data. On your storyboard, you will see a new member, A, under your set membership. Now we will use a new type of plot, a parallel axis plot, to view the tradeoffs among these solutions.

  1. Right click on Plots in the storyboard and choose Create plot -> Parallel Coordinate Plot.
  2. Place your objectives of interest on the plot: Cost, Error, Risk, and Mass. Perhaps the stakeholder is particularly interested in cost. Let’s place cost on the color axis for extra visibility.

Each line in the parallel axis plot represents one of our solutions and where the lines cross between two axes indicates a tradeoff. Now, on the hiding axis, specify “brushing”. This will narrow down your parallel axis plot to only your brushed solutions.

Figure 6: Full and Brushed Parallel Axis Plots

You can also record an animation of your parallel axis plot and save it as a GIF (Figure 7)! In this GIF, I cycled through the solutions from lowest to highest cost.

2018-11-03-20-38-25_2

Figure 7: Parallel Axis Plot GIF

From here, it is completely up to the decision maker to decide which solution is best. You can double click on solutions of interest in your parallel axis plot to mark them. They will show up as M1 and M2 the marking tab of your storyboard. Then you can then mark them in your 3D plot using the marking axis and add captions for visualization purposes (Figure 8).

Fig8

Figure 8: Brushed 3D Plot with Caption

A user would typically take on the order of hours to create these visualizations from scratch, but in DiscoveryDV, most graphs can be created with a few clicks of the mouse. This is especially usefully when you are trying to quickly visualize large amounts of data. There is quite a bit more functionality that DiscoveryDV offers that wasn’t explored in this blog post, but once the user understands the interface, more complicated graphs can be made such as Figure 9, which is the culminating figure from the linked paper.

Figure 9: Negotiated Design Selection using 2D Tradeoffs