# MOEA Performance Metrics

“How well is my search algorithm doing?”, you ask.

Well, that’s a great question!

Let’s first look at the example of a single-objective, analytical formulation such as an LP.  Generally speaking we know there is either no solution, exactly one solution, or infinitely many solutions to the LP.  A solution is optimal, but it’s only optimal with respect how you defined your problem.  We’ll take the example of trying to optimize the production of items in a factory and show two potential issues with the classical approach.  First, an LP formulation of such a problem can’t consider economies of scale (i.e., the first unit costs just as much as the 100,000th unit).  You may need a more complicated model of your process, such as a full scale simulation model, to characterize your solution better.  Second, there may be multiple objectives to consider.  You can minimize cost in an LP, but what about resource usage? Time of production? And so on.

The approach we talk about here seeks to address these concerns.  Our approach is a simulation-optimization approach, which means the complex simulation model is embedded in the search — how the system is characterized in the simulation is included when you’re generating alternatives.  And it is many-objective, which means you can generate the explicit tradeoffs between conflicting objectives.  Such a tradeoff is called the Pareto-optimal frontier, and if you can’t prove you have the true Pareto optimal frontier you have the Pareto-approximate frontier.

But there is a challenge!  And the challenge relates to our original question.  For a difficult problem, it’s very hard to prove that you have the true Pareto optimal frontier.  Even if you have a best-known approximation to the frontier, how do you know “how good”  any single run of a MOEA has done, at getting you near that best-known approximation? The best known set (black), and one approximation of it (gray). Preferred solutions are found toward the lower left corner (i.e., a min-min problem)

In the image above, we want to minimize both objectives.  So preferred solutions are near the lower left corner of the figure.  The best known approximation set is shown in black.  A given run of an algorithm produces the gray points.  There’s two concepts we can consider.  The first is convergence — how “close” did we come to finding the black set?  The second is diversity — are there points that are fully spread across the set?  Or is there only a cluster at, say, high values of the first objective?

Visual evaluation of your approximation set can start to help you answer these questions.  It’s especially helpful if you can visualize the MOEA’s performance during the search process itself.  But this can be challenging for high-dimensional problems.  Luckily there are quantitative metrics that can give you a feeling for both convergence and diversity of your approximation sets.

## Quantitative Metrics

For the rest of this post, I wanted to just give an informal description of three important performance metrics.  For a full description, please see a reference such as our recent comparison of MOEA performance on water problems, published in Advances in Water Resources.

Also, users of MOEAframework can easily calculate these metrics for their own problems!

### Generational Distance Generational Distance illustration. Red lines show distance between approximation and the best approximation.

Generational Distance gives a good first approximation of the performance of your approximation set.  The red lines in the figure show a calculation of distance.  For each of the gray points, find the closest point in the best approximation.  The metric is then the average of these distances.  We consider this an “easy” metric to meet, because in a situation where you only have one solution close to the set, you’d have almost perfect Generational Distance performance.

### Epsilon Indicator Epsilon Indicator is the worst-case distance that you’d have to translate your set to have it “dominate” the best approximation.

Epsilon indicator is considered a “harder” metric to meet.  It’s based on the idea of one set “dominating” another: having better performance in both objectives.  Near the top of the figure, we see that the gray points are close to the black points.  But on the lower right side of the figure, the gray points are much further from the black ones.  Epsilon indicator uses the worst-case distance that you would have to translate the approximation set, in order to have it dominate the best-known approximation (see the red arrow in the illustration).

The metric is harder to meet, because now if you have a large gap in your approximation, you will have poorer metric performance.

### Hypervolume The hypervolume compares a multidimensional volume determined by the approximation (red) to the volume determined by the best known approximation (black), relative to a reference point.

The hypervolume metric captures both convergence and diversity.  It does this in a clever way — by looking at the multidimensional “volume” created by each set, relative to a reference point.  Such volumes are illustrated by the red and the black spaces on the figure.  You can see that you can’t really capture good hypervolume unless you have almost every point, across the full spread of objective function performance.  One downside is that the hypervolume is computationally intensive, but this is improving with new algorithms, some of which are supported by MOEAframework.