The ABCs of MOEAs

We have recently begun introducing multi-objective evolutionary algorithms (MOEAs) to a few new additions to our research group, and it reminded me of when I began learning the relevant terminology myself barely a year ago. I recalled using Antonia’s glossary of commonly-used terms as I was getting acquainted with the group’s research in general, and figured that it might be helpful to do something similar for MOEAs in particular.

This glossary provides definitions and examples in plain language for terms commonly used to explain and describe MOEAs, and is intended for those who are just being introduced to this optimization tool. It also has a specific focus on the Borg MOEA, which is a class of algorithms used in our group. It is by no means exhaustive, and since the definitions are in plain language, please do leave a comment if I have left anything out or if the definitions and examples could be better written.

Greek symbols

ε-box

Divides up the objective space into n-dimensional boxes with side length ε. Used as a “filter” to prevent too many solutions from being “kept” by the archive. The smaller the size of the ε-box, the finer the “mesh” of the filter, and the more solutions are kept by the archive. Manipulating the value of ε affects convergence and diversity.

Each ε-box can only hold one solution at a time. If two solutions are found that reside in the same ε-box, the solution closest to the lower left corner of the box will be kept, while the other will be eliminated.

ε-dominance

A derivative of Pareto dominance. A solution x is said to ε-dominate solution y if it lies in the lower left corner of an ε-box for at least one objective, and is not ε-dominated by solution y for all other objectives.

ε-progress

ε-progress occurs when the current solution x lies in an different ε-box that dominates the previous solution. Enforces a minimum threshold ( ε ) over which an MOEA’s solution must exceed to avoid search stagnation.

ε-value

The “resolution” of the problem. Can also be interpreted a measure of the degree of error tolerance of the decision-maker. The ε-values can be set according to the discretion of the decision-maker.

A

A posteriori

Derived from Latin for “from the latter”. Typically used in multi-objective optimization to indicate that the search for solutions precedes the decision-making process. Exploration of the trade-offs resulting from different potential solutions generated by the MOEA is used to identify preferred solutions. Used when uncertainties and preferences are not known beforehand.

A priori

Derived from Latin for “from the former”. Typically used in multi-objective optimization to indicate that a set of potential solutions have already been decided beforehand, and that the function of a search is to identify the best solution(s). Used when uncertainties and preferences are known (well-characterized).

Additive ε-indicator

The distance that the known Pareto front must “move” to dominate the true Pareto front. In other words, the gap between the current set of solutions and the true (optimal) solutions. A performance measure of MOEAs that captures convergence. Further explanation can be found here.

Archive

A “secondary population” that stores the non-dominated solutions. Borg utilizes ε-values to bound the size of the archive (an ε-box dominance archive) . That is, solutions that are ε-dominated are eliminated. This helps to avoid deterioration.

C

Conditional dependencies

Decision variables are conditionally dependent on each other if the value of one decision variable affects one or more if its counterparts.

Control maps

Figures that show the hypervolume achieved in relation to the number of objective function evaluations (NFEs) against the population size for a particular problem. Demonstrates the ability of an MOEA to achieve convergence and maintain diversity for a given NFE and population size. An ideal MOEA will be able to achieve a high hypervolume for any given NFE and population size.

Controllability

An MOEA with a high degree of controllability is one that results in fast convergence rates, high levels of diversity, and a large hypervolume regardless of the parameterization of the MOEA itself. That is, a controllable MOEA is insensitive to its parameters.

Convergence

Two definitions:

  1. An MOEA is said to have “converged” at a solution when the subsequent solutions are no better than the previous set of solutions. That is, you have arrived at the best set of solutions that can possibly be attained.
  2. The known Pareto front of the MOEA is approximately the same as the true Pareto front. This definition requires that the true Pareto front be known.

D

Decision variables

Variables that can be adjusted and set by the decision-maker.

Deterioration

Occurs when elements of the current solution are dominated by a previous set of solutions within the archive. This indicates that the MOEA’s ability to find new solutions is diminishing.

Diversity

The “spread” of solutions throughout the objective space. An MOEA is said to be able to maintain diversity if it is able to find many solutions that are evenly spread throughout the objective space.

Dominance resistance

A phenomenon in which an MOEA struggles to produce offspring that dominate non-dominated members of the population. That is, the current set of solutions are no better than the worst-performing solutions of the previous set. An indicator of stagnation.

E

Elitist selection

Refers to the retention of a select number of ‘best’ solutions in the previous population, and filling in the slots of the current generation with a random selection of solutions from the archive. For example, the Borg MOEA utilizes elitist selection during the randomized restarts when the best k-solutions from the previous generation are maintained in the population.

Epistasis

Describes the interactions between the different operators used in Borg MOEA. Refers to the phenomenon in which the heavier applications of one operator suppresses the use of other operators, but does not entirely eliminate the use of the lesser-used operators. Helps with finding new solutions. Encourages diversity and prevents pre-convergence.

G

Generation

A set of solutions generated from one iteration of the MOEA. Consists of both dominated and non-dominated solutions.

Generational

Generational MOEAs apply the selection, crossover and mutation operators all at once to an entire generation of the population. The result is a complete replacement of the entire generation at the next time-step.

Generational distance

The average distance between the known Pareto front and the true Pareto front. The easiest performance metric to meet, and captures convergence of the solutions. Further explanation can be found here.

Genetic algorithm

An algorithm that uses the principles of evolution – selection, mutation and crossover – to search for solutions to a problem given a starting point, or “seed”.

H

Hypervolume

The n-dimensional “volume” covered by the known Pareto front with respect to the total n-dimensional volume of all the objectives of a problem, bounded by a reference point. Captures both convergence and diversity. One of the performance measures of an MOEA. Further explanation can be found here.

I

Injection

The act of “refilling” the population with members of the archive after a restart. Injection can also include filling the remaining slots in the current population with new, randomly-generated solutions or mutated solutions. This helps to maintain diversity and prevent pre-convergence.

L

Latin hypercube sampling (LHS)

A statistical method of sampling random numbers in a way that reflects the true underlying probability distribution of the data. Useful for high-dimensional problems such as those faced in many-objective optimization. More information on this topic can be found here.

M

Many-objective problem

An optimization problem that involves more than three objectives.

Mutation

One of the three operators used in MOEAs. Mutation occurs when a solution from the previous population is slightly perturbed before being injected into the next generation’s population. Helps with maintaining diversity of solutions.

Multi-objective

An optimization problem that traditionally involves two to three objectives.

N

NFE

Number of function evaluations. The maximum number of times an MOEA is applied to and used to update a multi (or many)-objective problem.

O

Objective space

The n-dimensional space defined by the number, n, of objectives as stated by the decision-maker. Can be thought of as the number of axes on an n-dimensional graph.

Offspring

The result of selection, mutation, or crossover in the current generation. The new solutions that, if non-dominated, will be used to replace existing members in the current generation’s population.

Operator

Genetic algorithms typically use the following operators – selection, crossover, and mutation operators. These operators introduce variation in the current generation to produce new, evolved offspring. These operators are what enable MOEAs to search for solutions using random initial solutions with little to no information.

P

Parameters

Initial conditions for a given MOEA. Examples of parameters include population-to-archive ratio, initial population size, and selection ratio.

Parameterization

An MOEA with a high degree of parameterization implies that it requires highly-specific parameter values to generate highly-diverse solutions at a fast convergence rate.

Parents

Members of the current generation’s population that will undergo selection, mutation, and/or crossover to generate offspring.

Pareto-dominance

A solution x is said to Pareto-dominate another solution y if x performs better than y in at least one objective, and performs at least as well as y in all other objectives.

Pareto-nondominance

Both solutions x and y are said to be non-dominating if neither Pareto-dominates the other. That is, there is at least one objective in which solution x that is dominated by solution y and vice versa.

Pareto front

A set of solutions (the Pareto-optimal set) that are non-dominant to each other, but dominate other solutions in the objective space. Also known as the tradeoff surface.

Pareto-optimality

A set of solutions is said to have achieved Pareto-optimality when all the solutions within the same set non-dominate each other, but are dominant to other solutions within the same objective space. Not to be confused with the final, “optimal” set of solutions.

Population

A current set of solutions generated by one evaluation of the problem by an MOEA. Populated by both inferior and Pareto-optimal solutions; can be static or adaptive. The Borg MOEA utilizes adaptive population sizing, of which the size of the population is adjusted to remain proportional to the size of the archive. This prevents search stagnation and the potential elimination of useful solutions.

Pre-convergence

The phenomenon in which an MOEA mistakenly converges to a local optima and stagnates. This may lead the decision-maker to falsely conclude that the “best” solution set has been found.

R

Recombination

One of the ways that a mutation operator acts upon a given solution. Can be thought of as ‘shuffling’ the current solution to produce a new solution.

Rotation

Applying a transformation to change the orientation of the matrix (or vector) of decision variables. Taking the transpose of a vector can be thought of as a form of rotation.

Rotationally invariant

MOEAs that utilize rotationally invariant operators are able to generate solutions for problems and do not require that the problem’s decision variables be independent.

S

Search stagnation

Search stagnation is said to have occurred if the set of current solutions do not ε-dominate the previous set of solutions. Detected by the ε-progress indicator (ref).

Selection

One of the three operators used in MOEAs. The selection operator chooses the ‘best’ solutions from the current generation of the population to be maintained and used in the next generation. Helps with convergence to a set of optimal solutions.

Selection pressure

A measure of how ‘competitive’ the current population is. The larger the population and the larger the tournament size, the higher the selection pressure.

Steady-state

A steady-state MOEA applies its operators to single members of its population at a time. That is, at each step, a single individual (solution) is selected as a parent to be mutated/crossed-over to generate an offspring. Each generation is changed one solution at each time-step.

T

Time continuation

A method in which the population is periodically ’emptied’ and repopulated with the best solutions retained in the archive. For example, Borg employs time continuation during its randomized restarts when it generates a new population with the best solutions stored in the archive and fills the remaining slots with randomly-generated or mutated solutions.

Tournament size

The number of solutions to be ‘pitted against each other’ for crossover or mutation. The higher the tournament size, the more solutions are forced to compete to be selected as parents to generate new offspring for the next generation.

References

Coello, C. C. A., Lamont, G. B., & Van, V. D. A. (2007). Evolutionary Algorithms for Solving Multi-Objective Problems Second Edition. Springer.

Hadjimichael, A. (2017, August 18). Glossary of commonly used terms. Water Programming: A Collaborative Research Blog. https://waterprogramming.wordpress.com/2017/08/11/glossary-of-commonly-used-terms/.

Hadka, D., & Reed, P. (2013). Borg: An Auto-Adaptive Many-Objective Evolutionary Computing Framework. Evolutionary Computation, 21(2), 231–259. https://doi.org/10.1162/evco_a_00075

Kasprzyk, J. R. (2013, June 25). MOEA Performance Metrics. Water Programming: A Collaborative Research Blog. https://waterprogramming.wordpress.com/2013/06/25/moea-performance-metrics/.

Li, M. (n.d.). Many-Objective Optimisation. https://www.cs.bham.ac.uk/~limx/MaOP.html.

What is Latin Hypercube Sampling? Statology. (2021, May 10). https://www.statology.org/latin-hypercube-sampling/.

Lower dimensional visualizations of many-objective Pareto Fronts

Understanding the geometry of a many-objective Pareto front can be challenging due to the high dimensionality of many-objective problems. Often, we use tools such as parallel coordinate plots, glyph plots and pairwise scatter plot matrices to identify trade-offs and select candidate alternatives. While these tools are useful to decision making, they don’t always capture patterns or geometric properties that may be embedded withing many-objective Pareto fronts.

Mathematical tools for visualizing high dimensional data on lower dimensional manifolds have existed for decades, and in recent years they have been applied in many-objective contexts (Filipac and Tusar, 2018). In this post I’ll overview four common methods for representing Pareto fronts on lower dimensional manifolds and demonstrate them on two many-objective test problems: the four objective DTLZ 2 and four objective DTLZ 7 (Deb et al., 2005).

Parallel coordinate plots of the two test problems can be found in Figure 1. DTLZ 2 has a continuous Pareto front, while DTLZ 7 has a highly disconnected Pareto front. Both Pareto fronts visualized here are the analytical true Pareto fronts from the MOEAFramework.

I’ve added the code to produce the plots in this post to my repository on many-objective visualization, which can be found here.

Figure 1: Parallel coordinate plots of the four objective DTLZ 2 (left) and DTLZ 7 (right)

1. Multidimensional Scaling

Multidimensional Scaling (MDS) refers to a family of techniques that seek low dimensional representations of high dimensional spaces by preserving pairwise distances between points (Kruskal, 1978). Classic MDS attempts to preserve the euclidean distance between each pair of points by minimizing a stress function defined as:

stress = (\frac{\sum_i \sum_j (f(x_{ij}) - d_{ij})^2}{\sum_i \sum_j d_{ij}^2})^{1/2}

Where:

f(x_{ij}) is the euclidean distance between points x_i and x_j in the full dimensional space. (Note: extensions of MDS have been developed that substitute this distance for a weakly monotonic transformation of the original points)

d_{ij} is the euclidean distance between points x_i and x_j in the lower dimensional representation

To perform MDS on the test problem Pareto Fronts, I used the Manifold tool from the Yellowbrick package, a machine learning visualization module associated with sklearn. MDS representations of four objective DTLZ 2 and DTLZ 7 and shown in Figure 2. For the highly disconnected DTLZ 7 problem, MDS clearly distinguishes the 8 clusters within the Pareto Front.

Figure 2: MDS representations of the four objective DTLZ 2 (left) and DTLZ 7 (right)

2. IsoMaps

IsoMaps are an extension of MDS that first clusters points using K-nearest neighbors, then maps the points to a lower dimensional space by minimizing the geodesic distances between clusters. To create IsoMap visualizations for the test problems, I again utilized the Yellowbrick manifold function. IsoMap projections for four objective DTLZ 2 and DTLZ 7 are shown in Figure 3. Like MDS, IsoMapping is able to clearly demonstrate the disconnected nature of the DTLZ 7 Pareto front. I should mention that I had to use 30 neighbors to achieve this, which is much higher than the 8-12 neighbors recommended as an unofficial rule of thumb. This could be a result of the highly disconnected nature of DTLZ 7, which may cause problems for IsoMap.

Figure 3: IsoMap representations of the four objective DTLZ 2 (left) and DTLZ 7 (right)

3. Sammon Mapping

Like MDS and IsoMapping, Sammon Mapping (Sammon, 1969) seeks to find a lower dimensional representation that preserves the the distances between each point in the Pareto front from the high dimensional space. Sammon Mapping uses a modified version of stress known as “Sammon Stress”, defined as:

S =\sum_{i} \sum_{j>i} \frac{(d^{*}_{ij} - d_{ij})^2}{d^{*}_{ij}}

Where:

d^{*}_{ij}: is the distance between points x_i and x_j in the full objective space

d_{ij}: is the distance between points x_i and x_j in the lower dimensional space

The key difference between Sammon Stress and the classic MDS stress is that Sammon Stress is normalized by the distance in the high dimensional space rather than the low dimensional representation. This function is usually minimized by gradient descent.

I coded the Sammon maps for the two test problems using an open source implementation from tompollard on Github. Like the other two methods, Sammon mapping highlights the disconnected nature of DTLZ 7 while showing a relatively continuous representation of DTLZ 2 that suggests its spherical shape.

Figure 4: Sammon mapping representations of the four objective DTLZ 2 (left) and DTLZ 7 (right)

4. Self Organizing Maps

Self Organizing Maps (SOM; Kohonen, 1982) use an artificial neural network to create a discrete, low dimensional representation of a high dimensional space. SOMs are a form of unsupervised machine learning that are used in both classification and dimensional reduction applications.

To map the high dimensional data, SOMs start with a uniformly spaced grid of neurons, then implement a competitive learning algorithm to associate each neuron with a set of Pareto front solutions. This video has the best explanation I’ve seen on how SOMs work (thanks to Rohini for sharing it with me). I also found this Gif from Wikicommons to be helpful in understanding SOMs.

Pareto front visualizations using SOMs plot the the original uniform grid of neurons on an x-y plane, and the distance between neurons of the final map as the color. Grid points with dark shading (long distance between final neurons) indicate boundaries between clusters in the high dimensional space. SOMs for the four objective DTLZ 2 and DTLZ 7 problems are shown in Figure 5. The disconnected clusters in DTLZ 7 are clearly apparent while no real structure is shown for DTLZ 2.

Figure 5: SOM representations of the four objective DTLZ 2 (left) and DTLZ 7 (right)

Concluding thoughts

To be perfectly honest, I don’t think that any of the methods described in this post are particularly useful for understanding many-objective optimization results if used on their own. However, they may be a useful complement when exploring solution sets and finding patterns that may not be apparent when visualizing the full dimensional space. Additionally, they are all fairly straight forward to code and can easily be included in an exploratory analysis.

References

Deb, K., Thiele, L., Laumanns, M., & Zitzler, E. (2005). Scalable test problems for evolutionary multiobjective optimization. In Evolutionary multiobjective optimization (pp. 105-145). Springer, London.

Filipič, B., & Tušar, T. (2018, July). A taxonomy of methods for visualizing pareto front approximations. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 649-656).

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69.

Kruskal, J. B. (1978). Multidimensional scaling (No. 11). Sage.

Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on computers, 100(5), 401-409.

Beyond Hypervolume: Dynamic visualization of MOEA runtime

Multi-objective evolutionary algorithms have become an essential tool for decision support in water resources systems. A core challenge we face when applying them to real world problems is that we don’t have analytic solutions to evaluate algorithmic performance, i.e. since we don’t know what solutions are possible before hand, we don’t have a point of reference to assess how well our algorithm is performing. One way we can gain insight into algorithmic performance is by examining runtime dynamics. To aid our understanding of the dynamics of the Borg MOEA, I’ve written a small Python library to read Borg runtime files and build a dynamic dashboard that visualizes algorithmic progress.

The Borg MOEA produces runtime files which track algorithmic parameters and the evolving Pareto approximate set over an optimization run. Often we use these data to calculate performance metrics, which provide information on the relative convergence of an approximation set and the diversity of solutions within it (for background on metrics see Joe’s post here). Commonly, generational distance, epsilon indicator and hypervolume are used to examine quality of the approximation set. An animation of these metrics for the 3 objective DTLZ2 test problem is shown in Figure 1 below.

Figure 1: Runtime metrics for the DTLZ2 test problem. The x-axis is number of function evaluations, the y-axis is the each individual metric

While these metrics provide a helpful picture of general algorithmic performance, they don’t provide insight into how the individual objectives are evolving or Borg’s operator dynamics.

Figure 2 shows a diagnostic dashboard of the same 3 objective DTLZ2 test problem run. I used the Celluloid python package to animate the figures. I like this package because it allows you to fully control each frame of the animation.

Figure 2: DTLZ2 runtime dynamics. The tree objectives are shown in a scatter plot and a parallel axis plot. The third figure plots hypervolume over the optimization run and the bottom figure shows Borg’s operator dynamics. (a higher resolution version of this file can be found here: https://www.dropbox.com/s/0nus5xhwqoqtghk/DTLZ2_runtime.gif?dl=0)

One thing we can learn from this dashboard is that though hypervolume starts to plateau around 3500 NFE, the algorithm is still working to to find solutions that represent an adequately diverse representation of the Pareto front. We can also observe that for this DTLZ2 run, the SPX and SBX operators were dominant. SBX is an operator tailored to problems with independent decision variables, like DTLZ2, so this results make sense.

I’m planning on building off this dashboard to include a broader suite of visualization tools, including pairwise scatter plots and radial plots. The library can be found here: https://github.com/dgoldri25/runtimeDiagnostics

If anyone has suggestions or would like to contribute, I would love to hear from you!

Simple Python script to create a command to call the Borg executable

Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within ‘jrkasprzyk’ GitHub.

It saves you having to type out a long Borg command when using the ‘command line interface’, especially when there are many many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon of an objective, and Decision contains the name, lower bound, and upper bound of an objective.

You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python such as sampling of the decision space, and different sorts of processing of MOEA output.

Training video: Java file for external problems

In the next few weeks, we’ll be adding blog posts relating to our MOEAframework training.  They are a little bit out of order now, but we may rearrange them as things move forward.

Here we continue our discussion of the “external problem” example in MOEAframework.  A nice feature of the software is the ability to link a problem in a different programming language to the Java MOEA software.  In this video I walk through the different parts of the java file and run a quick example.

Training video: MOEAFramework GUI

In the next few weeks, we’ll be adding blog posts relating to our MOEAframework training.  They are a little bit out of order now, but we may rearrange them as things move forward.

The following video deals with using the GUI for MOEAFramework.  The GUI is best suited for those beginning their MOEA training.  The following video shows how to download and operate the GUI as well as two simple problems:

  1. Running two different problems with the same algorithm (DTLZ1-2D and DTLZ1-2D-Rotated with eNSGAII)
  2. Running two different algorithms on the same problem. (DTLZ1-2D-Rotated with eNSGAII and eMOEA)

Training Video: External problems in MOEAFramework

In the next few weeks, we’ll be adding blog posts relating to our MOEAframework training.  They are a little bit out of order now, but we may rearrange them as things move forward.

Today’s post is a video overview of a simple “external” problem for MOEAframework.  External means that the objective function is coded in another language, compiled in an executable, and it communicates with the MOEAframework algorithms via the command line.  I walk through different components of a simple code file, rosenbrock.c, compile it, and show how to interact with the program via the command line.