This blog post follows part 1 of this series and provides a tutorial for the statistical analysis I performed on my diagnostic results. After completing the diagnostic steps briefly described there, I had 900 files that looked like this:

`./SOW4/metrics/Borg_myLake4ObjStoch_1.metrics`

Threshold: 0.75

Best: 0.9254680538114082

Attainment: 0.328

Controllability: 0.0

Efficiency: 1.0

There were 900 files because I evaluated 3 metrics for each of 50 seeds for each of 6 algorithms (3 x 50 x 6 = 900). Before computing any statistics, I first had to read the information of interest from the files. The following snippet of Matlab code does that.

    clc; clear all; close all;
    algorithms = {'Borg', 'eMOEA', 'eNSGAII', 'NSGAII', 'MOEAD', 'GDE3'};
    seeds = (1:1:50);
    metrics = {'GenDist'; 'EpsInd'; 'Hypervolume';};
    % work = sprintf('./working_directory/'); %specify the working directory
    problem = 'Problem';
    %Loop through metrics
    for i=1:length(metrics)
        %Loop through algorithms
        for j=1:length(algorithms)
            %Loop through seeds
            for k=1:length(seeds)
                %open and read files
                filename = ['./' metrics{i} '_' num2str(75) '_' algorithms{j} '_' num2str(seeds(k)) '.txt'];
                fh = fopen(filename);
                if(fh == -1)
                    %stop here: textscan cannot read from an invalid file handle
                    error('Error opening analysisFile: %s', filename);
                end
                values = textscan(fh, '%*s %f', 5, 'Headerlines',1);
                fclose(fh);
                values = values{1};
                threshold(k,j,i) = values(1);
                best(k,j,i) = values(2);
                %Normalize the best Hypervolume value to be between 0 and 1
                if strcmp(metrics{i},'Hypervolume')
                    best(k,j,i) = best(k,j,i)/(threshold(k,j,i)/(75/100));
                end
                attainment(k,j,i) = values(3);
                controllability(k,j,i) = values(4);
                efficiency(k,j,i) = values(5);
            end
        end
    end

The above code loops through all metrics, algorithms, and seeds to load and read from the corresponding files. Each of the numerical values is stored in an appropriately named three-dimensional array, indexed by seed, algorithm, and metric. For this tutorial, I am going to focus on a statistical analysis of the probability of attainment for the Hypervolume metric across all algorithms. Hypervolume is the third metric, so the values of interest are stored in attainment(:,:,3).
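To make the array layout concrete, here is a minimal sketch that uses placeholder data instead of the real metric files. The variable names nSeeds, nAlgorithms, nMetrics, hvAttainment, and firstAlgorithmHv are my own for illustration and do not appear in the original script.

```matlab
% Sizes mirroring the post: 50 seeds, 6 algorithms, 3 metrics.
nSeeds = 50; nAlgorithms = 6; nMetrics = 3;
nFiles = nSeeds * nAlgorithms * nMetrics;   % one file per combination

% Placeholder array; in the real script the values come from the files.
attainment = zeros(nSeeds, nAlgorithms, nMetrics);
attainment(:, :, 3) = 0.5;                  % made-up Hypervolume attainment

% All seeds and all algorithms for the third metric (Hypervolume).
hvAttainment = attainment(:, :, 3);         % 50-by-6 matrix

% All seeds for the first algorithm only.
firstAlgorithmHv = attainment(:, 1, 3);     % 50-by-1 column

fprintf('%d files, Hypervolume slice is %d-by-%d\n', ...
        nFiles, size(hvAttainment, 1), size(hvAttainment, 2));
```

Each column of the Hypervolume slice is one algorithm's sample of 50 seeds, which is the layout the tests below expect.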

Below is code that can be used for the Kruskal-Wallis test, a nonparametric alternative to one-way ANOVA.

    P = kruskalwallis(attainment(:,:,3),algorithms,'off');

The above code performs the kruskalwallis test on the matrix of 50 rows (seeds) and 6 columns (algorithms) storing the Hypervolume attainment values, and determines whether at least one of the groups (columns) differs in performance from the others. In this case the 6 groups are the algorithms. The 'off' argument suppresses the graphical display. More information can be found in the Matlab help for kruskalwallis.
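As a rough illustration of how the returned p-value might be used, the sketch below runs the same call on synthetic data in which one group is deliberately higher than the others. The data, the group labels, and the 0.05 cutoff are assumptions for the example, not results from my study, and kruskalwallis requires the Statistics and Machine Learning Toolbox.

```matlab
rng(1);                                    % fixed seed so the sketch is repeatable
groupNames = {'A', 'B', 'C'};              % hypothetical algorithm labels

% Synthetic attainment values: 50 seeds (rows) by 3 algorithms (columns).
% Column 1 is deliberately much higher than columns 2 and 3.
x = [0.8 + 0.1*rand(50,1), 0.3 + 0.1*rand(50,1), 0.3 + 0.1*rand(50,1)];

p = kruskalwallis(x, groupNames, 'off');   % 'off' suppresses the figures

if p < 0.05
    disp('At least one group differs; pairwise tests are worthwhile.');
else
    disp('No evidence of a difference between groups.');
end
```

A small p-value only says that some group differs, not which one, which is why the pairwise tests come next.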

If this returns a small p-value, indicating that there is a difference between at least 2 of your algorithms, it is worthwhile to continue to the Mann-Whitney U test (ranksum in Matlab). I used the following code to loop through the possible algorithm pairs and determine if there was a statistical difference in the probability of attaining the 75% threshold for hypervolume. I wasn't very concerned about the p-value, so it is overwritten on every iteration, but I stored the binary flag indicating whether or not to reject the null hypothesis in a matrix. In this case, I used an upper-tail test, as algorithms with higher probabilities of attaining a threshold are considered to perform better.

    for i = 1:length(algorithms)
        for j = 1:length(algorithms)
            [P3, Mann_Whitney_U_Hyp(i,j)] = ranksum(attainment(:,i,3),...
                                                    attainment(:,j,3),...
                                                    'alpha', 0.05, 'tail', 'right');
        end
    end
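Before reading the full table, it can help to confirm which direction the 'tail', 'right' option tests. The following sketch uses two synthetic samples (the names high and low and their values are made up for illustration) and swaps their order. It requires the Statistics and Machine Learning Toolbox.

```matlab
rng(1);                           % fixed seed so the sketch is repeatable
high = 0.8 + 0.1*rand(50,1);      % synthetic sample with the larger median
low  = 0.3 + 0.1*rand(50,1);      % synthetic sample with the smaller median

% Right-tailed test: the alternative is that the FIRST sample's median
% is greater than the SECOND sample's median.
[~, hHighFirst] = ranksum(high, low, 'alpha', 0.05, 'tail', 'right');
[~, hLowFirst]  = ranksum(low, high, 'alpha', 0.05, 'tail', 'right');

fprintf('high vs low: h = %d\n', hHighFirst);
fprintf('low vs high: h = %d\n', hLowFirst);
```

The flag is 1 only when the higher sample is passed first, so in the loop a 1 at position (i,j) favors algorithm i over algorithm j.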

In the end, you get a table of zeros and ones like that below.

According to the help for ranksum, a 1 indicates that the null hypothesis that the two medians are the same can be rejected. Because the alternative hypothesis tested in this case was that the median of the first group is higher, I used this table to rank algorithm performance. The outer loop runs over rows and the inner loop over columns, so a 1 in row i and column j means that algorithm i outperformed algorithm j. Borg is considered the best performing algorithm in this case, as its median is higher than the median of every other algorithm at the 5% level of significance. The rest of my ranking goes NSGAII, eNSGAII, eMOEA, GDE3, and MOEA/D.
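One possible way to automate the ranking is to count the ones in each row and sort by that count. This is only a sketch of the idea: the matrix H and the labels A through D below are hypothetical and are not my actual results, and counting wins is one reasonable convention rather than the only one.

```matlab
names = {'A', 'B', 'C', 'D'};     % hypothetical algorithm labels

% Hypothetical outcome matrix: H(i,j) = 1 means the right-tailed test
% concluded that algorithm i has a higher median than algorithm j.
H = [0 1 0 1;
     0 0 0 0;
     1 1 0 1;
     0 1 0 0];

wins = sum(H, 2);                      % number of algorithms each row beats
[~, order] = sort(wins, 'descend');    % most wins first
ranking = names(order);

fprintf('Ranking: %s\n', strjoin(ranking, ', '));
```

Ties in the number of wins keep their original order here, so algorithms that are statistically indistinguishable would need to be inspected by hand.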

Hopefully, this is helpful to at least one person in the future. Feel free to use the snippets in your own code to automate the whole process. 🙂
