Part 4 wraps up the MOEAFramework training by taking the metrics generated in Part 3 and visualizing them to gain general insight about algorithm behavior and assess strengths and weaknesses of the algorithms.
The .metrics files stored in the data_metrics folder look like the following:
Metrics are reported every 1000 NFE and a .metrics file will be created for each seed of each parameterization of each algorithm. There are different ways to proceed with merging/processing metrics depending on choice of visualization. Relevant scripts that aren’t in the repo can be found in this zipped folder along with example data.
Creating Control Maps
When creating control maps, one can average metrics across seeds for each parameterization or use best/worst metrics to try to understand the best/worse performance of the algorithm. If averaging metrics, it isn’t unusual to find that all metrics files may not have the same number of rows, therefore rendering it impossible to average across them. Sometimes the output is not reported as specified. This is not unusual and rather just requires you to cut down all of your metric files to the greatest number of common rows. These scripts are found in ./MOEA_Framework_Group/metrics
1.Drag your metrics files from data_metrics into ./MOEA_Framework_Group/metrics and change all extensions to .txt (use ren .metrics .txt in command prompt to do so)
2.Use Cutting_Script.R to find the maximum number of rows that are common among all seeds. This will create new metric files in the folder, Cut_Files
Now these files can be averaged and grouped with their corresponding parameter values.
1.Seed_Merge.R: Creates a text file with the average hypervolume for each parameterization for each algorithm (i.e. hypervolume_Borg.txt)
2.Add Borg_Samples.txt and NSGAII_Samples.txt to the folder
3.Make_Final_Table.R: takes the population values from the sample file and the hypervolume values for each parameterization and puts them into a final matrix in a form to be accepted by the control map code.
In order to create control maps, you need to first find the reference set hypervolume, because all metrics are normalized to this overall hypervolume. This can be done using the following command and the HypervolumeEval java class written by Dave Hadka.
$ java -cp MOEAFramework-2.12-Demo.jar HypervolumeEval ./data_ref/lake.ref >> lake_ref.hypervolume
Finally, use Control_Map_Borg.py and Control_Map_NSGAII.py make your control maps.
Initial population size on the x-axis can be regarded as a proxy for different parameterizations and the y-axis shows the number of NFE. The color represents the percentage of the overall reference hypervolume that is achieved. Control maps highlight a variety of information about an algorithm: Controllability (sensitivity to parameterization), Effectiveness (quality of approximation sets), and Efficiency (how many NFE it takes to achieve high quality solutions)
Ideally, we would want to see a completely dark blue map that would indicate that the algorithm is able to find high quality solutions very quickly for any parameterization. We can see that this is not the case for either of the algorithms above. Any light streaks indicate that for that particular parameterization, the algorithm had a harder time achieving high quality solutions. Borg is generally robust to parameterization and as seen, if allowed more NFE, it will likely result in a more even blue plot.
Creating Attainment Plots
To create attainment plots:
1.Drag metrics back from Cut_Files back into the data_metrics_new directory on the Cube.
2.Use the average_metrics.sh script to average out the metrics to obtain a set of average metrics across seeds for each algorithm.
3.Concatenate parameter files using: cat NSGAII_lake_*.average >> NSGAII_Concatenate_Average_Metrics.txt
4.Use Example_Attain.m to find the best metric values and to calculate the probability of attainment of the best metrics.
5.Create attainment vectors with build_attainment_matrix.py
6.Plot with color_mesh.py
Attainment plots highlight the following:
Reliability of the algorithm: In general, we would like to see minimal attainment variability which would suggest that our algorithm reliably produces solutions of high quality across seeds. The white circles show the algorithm’s single best run for each metric. The gradient shows the probability of obtaining the best metric value. You can see here that the algorithms are able to reliably obtain high generational distance metrics. However, remember that generational distance is an easy metric to meet. For the other two metrics, one can see that while NSGAII obtains the best metric values, Borg has a slightly higher reliability of obtaining high hypervolume values which arguably is a more important quality that demonstrates robustness in the algorithm.
There are some extra visualizations that can be made to demonstrate algorithmic performance.
Reference Set Contribution
How much of the reference set is contributed by each algorithm?
1.Copy MOEAFramework jar file into data_ref folder
2.Add a # at the end of the individual algorithm sets
3.java -cp MOEAFramework-2.12-Demo.jar org.moeaframework.analysis.sensitivity.SetContribution -e 0.01,0.01,0.0001,0.001 -r lake.ref Borg_lake.set NSGAII_lake.set > lake_set_contribution.txt
lake_set_contribution.txt, as seen above, reports the percentage (as a decimal) of the reference set that is contributed by each algorithm and includes unique and non-unique solutions that could have been found by both algorithms. Typically, these percentages are shown in a bar chart and would effectively display the stark difference between the contribution made by Borg and NSGAII in this case.
Random Seed Analysis
A random seed analysis is a bit of a different experiment and requires just one parameterization, the default parameterization for each algorithm. Usually around 50 seeds of the defaults are run and the corresponding average hypervolume is shown as solid line while the 5th and 95th percentile confidence interval across seeds is shown as shading. Below is an example of such a plot for a different test case:
This style of plot is particularly effective at showcasing default behavior, as most users are likely to use the algorithms “straight out of the box”. Ideally, the algorithms have monotonically increasing hypervolume that reaches close to 1 in a small number of functional evaluations and also thin shading to indicate low variability among seeds. Any fluctuations in hypervolume indicates deterioration in the algorithm, which is a result of losing non-dominated solutions and is an undesirable quality of an algorithm.