A few weeks ago I filmed a video training guide to the Rhodium framework for the annual meeting of the society for Decision Making Under Deep Uncertainty. Rhodium is a Python library that facilitates Many Objective Robust Decision making. The training walks through a demonstration of Rhodium using the Lake Problem. The training introduces a live Jupyter notebook Antonia and I created using Binder.
A parallel axis plot is a simple way to convey a lot of information in a meaningful and easy-to-understand way. Also known as parallel coordinate plots (PCP), it is a visualization technique used to analyze multivariate numerical data (Weitz, 2020), or in the case of multi-objective optimization, to analyze tradeoffs between multiple conflicting objectives. As someone new to the field of multi-objective optimization, I found them particularly helpful as I tried to wrap my head around the multi-dimensional aspects of this field.
There are multiple tools in Python that you can use to generate PCPs. There are several different posts by Bernardoand Jazminthat utilize the Pandas and Plotting libraries to do so. In this post, I would like to explain a little about how you can generate a decent PCP using only Numpy and Matplotlib.
For context, I used a PCP to contrast the non-dominated solutions from the entire reference set of the optimized GAA problem reference set.
For the beginner, the figure above demonstrates three important visualization techniques in generating PCPs: color, brushing, and axis ordering. Firstly, it is important to consider using colors that stand on opposite sides on the color wheel to contrast the different types of information you are presenting. Next, brushing should be used to divert the viewer’s attention away from any information deemed unnecessary, highlight vital information, or to prove a point using juxtaposition. Finally, the ordering of the axes is important, particularly when presenting conflicting objectives. It is best for all axes to be oriented in one “direction of preference”, so that the lines between each adjacent axis can represent the magnitude of the tradeoff between two objectives. Thus, the order in which these axes are placed will significantly affect the way the viewer perceives the tradeoffs, and should be well-considered.
To help with understanding the how to generate a PCP, here is a step-by-step walk-through of the process.
1. Import all necessary libraries, load data and initialize the Matplotlib figure
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
# load data
all_soln = np.loadtxt(open("GAA-reference-set.csv"), delimiter=",")
nd_indices = np.loadtxt(open("non-dominated-index.csv"), delimiter=",")
# identify and list the objectives of the reference set
objs = ['NOISE', 'WEMP', 'DOC', 'ROUGH', 'WFUEL', 'PURCH', 'RANGE', 'LDMAX', 'VCMAX', 'PFPF']
# create an array of integers ranging from 0 to the number of objectives
x = [i for i, _ in enumerate(objs)]
# sharey=False indicates that all the subplot y-axes will be set to different values
fig, ax = plt.subplots(1,len(x)-1, sharey=False, figsize=(15,5))
Two sets of data are loaded:
all_soln: the entire reference set
nd_indices: an array of row indices of the non-dominated solutions from all_soln
In Line 16, we are initializing a figure fig and an array of axis objects ax. I find that having an array of axes helps me better control tick locations and labeling, since I can iterate over them in a loop.
Bear in mind that this is simply an example. It is also possible to obtain the non-dominated set directly from the the reference set by performing a Pareto sort.
2. Normalize the objective values in all_soln
min_max_range = {}
for i in range(len(objs)):
all_soln[:,i] = np.true_divide(all_soln[:,i] - min(all_soln[:,i]), np.ptp(all_soln[:,i]))
min_max_range[objs[i]] = [min(all_soln[:,i]), max(all_soln[:,i]), np.ptp(all_soln[:,i])]
All values in all_soln are normalized by subtracting the minimum value from each objective, then dividing it by the range of values for that objective. The min_max_range dictionary contains the minimum, maximum and range of values for each objective. This will come in handy later on.
3. Iterate through all the axes in the figure and plot each point
I used the enumerate function here. It may seem somewhat confusing at first, but it basically keeps count of your iterations as your are iterating through an object (ie: a list, an array). More information on how it works can be found here.
for i, ax_i in enumerate(ax):
for d in range(len(all_soln)):
if ((d in nd_indices)== False):
if (d == 0):
ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, label='Dominated', linewidth=3)
else:
ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, linewidth=3)
ax_i.set_xlim([x[i], x[i+1]])
for i, ax_i in enumerate(ax):
for d in range(len(all_soln)):
if (d in nd_indices):
if (d == nd_indices[0]):
ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, label='Nondominated', linewidth=3)
else:
ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, linewidth=3)
ax_i.set_xlim([x[i], x[i+1]])
All solutions from the non-dominated set are colored cyan, while the rest of the data is greyed-out. This is an example of brushing. Note that only the first line plotted for both sets are labeled, and that the grey-out data is plotted first. This is so the non-dominated lines are shown clearly over the brushed lines.
4. Write a function to position y-axis tick locations and labels
The set_ticks_for_axis() function is key to this process as it grants you full control over the labeling and tick positioning of your y-axes. It has three inputs:
dim: the index of a value from the objs array
ax_i: the current axis
ticks: the desired number of ticks
def set_ticks_for_axis(dim, ax_i, ticks):
min_val, max_val, v_range = min_max_range[objs[dim]]
step = v_range/float(ticks)
tick_labels = [round(min_val + step*i, 2) for i in range(ticks)]
norm_min = min(all_soln[:,dim])
norm_range = np.ptp(all_soln[:,dim])
norm_step =(norm_range/float(ticks-1))
ticks = [round(norm_min + norm_step*i, 2) for i in range(ticks)]
ax_i.yaxis.set_ticks(ticks)
ax_i.set_yticklabels(tick_labels)
Hello min_max_range! This dictionary essentially makes accessing the extreme values and range of each objective easier and less mistake-prone. It is optional, but I do recommend it.
Overall, this function does two things:
Creates ticks-evenly spaced tick-marks along ax_i.
Labels ax_i with tick labels of size ticks. The tick labels are evenly-spaced values generated by adding step*i to min_val for each iteration i.
A nice thing about this function is that is also preserves the order that the objective values should be placed along the axis, which makes showing a direction of preference easier. It will be used to label each y-axis in our next step.
5. Iterate over and label axes
for dim, ax_i in enumerate(ax):
ax_i.xaxis.set_major_locator(ticker.FixedLocator([dim]))
set_ticks_for_axis(dim, ax_i, ticks=10)
FixedLocator() is a subclass of Matplotlib’s ticker class. As it’s name suggests, it fixes the tick locations and prevents changes to the tick label or location that may possibly occur during the iteration. More information about the subclass can be found here.
Here, you only need to label the x-axis with one label and one tick per iteration (hence Line 2). On the other hand, you are labeling the entire y-axis of ax_i, which is where you need to use set_ticks_for_axis().
6. Create a twin axis on the last axis in ax
ax2 = plt.twinx(ax[-1])
dim = len(ax)
ax2.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x [-1]]))
set_ticks_for_axis(dim, ax2, ticks=10)
ax2.set_xticklabels([objs[-2], objs[-1]])
Creating a twin axis using plt.twinx() enables you to label the last axis with y-ticks. Line 3 and 5 ensure that the x-axis is correctly labeled with the last objective name.
Be sure to remember to label the direction of preference, and one you’ve saved your plot, you’re done!
The source code to generate the following plot can be found here. I hope this makes parallel axis plots a little more understandable and less intimidating.
For part three of this introduction to ggplot, I will go over an example of a user-friendly library that can easily animate your plot: “gganimate”. It works with different types of graphs; I will apply it to the bar plot in order to visualize the variations in sensitivity indices. This would be helpful for time-varying sensitivity analysis, which is an option to prevent information loss and gain understanding of system behavior, compared to analyzing just the aggregated sensitivity indices. You can download the test-case data from here. These are first and total order sensitivity indices corresponded to annual crop yield for several parameters, categorized by different groups (such as parameters that are related to estimate crop phenology and biomass, or parameters that are related to capturing the effect of temperature or soil hydrological variables on the yield). We can create a subset of just one year of data to explore, using the general command line to make a bar plot based on different groups:
We can flip Cartesian coordinates so that the horizontal becomes vertical and the vertical becomes horizontal, by adding coord_flip(). Next, we will use the whole dataset to visualize the annual variations in total order indices. By adding transition_time() and specifying the column corresponding to the variations (in our case, “Year”), we can animate our graph. We can also add a label for each time frame in the “labs/title” section.
With the animate() function, we can specify how long we want to wait to change the frame in the animated graph, and how long we want to pause between repetitions of the time series. The result is a gif file that can be saved using the animate() function.
It is also possible to compare the three different sensitivity indices. Total order and first order indices are provided in the dataset; by subtracting the first order indices from the total order, we can calculate the total interactions each parameter has with the rest.
One of the best ways to provide data to ggplotfunctions is the format that the melt function provides. This format is in fact the format that ggplot prefers. In these cases, we can reshape the data frame (using the reshape2 library) to be able to use a single ggplot command line with the few different layout panels, or different types of graph. We can us melt() function to perform this task, and reshape the data frame to have all the indices’ variables in a row instead of various columns, with the correct corresponding information about which year, group, and parameter label the data belong to. To use the meltfunction, we need to specify columns that identify data values and we use the id.vars argument to do that. More information about the reshape2 package can be found here.
library(reshape2)
annual_S1T_melted<- melt(annual_S1T, id.vars = c("Parameters","Group","Year"), measure.vars = c("Total_Interaction", "S1_sobol","ST_sobol"))
head(annual_S1T_melted)
# Parameters Group Year variable value
#1 a1 A 1990 Total_Interaction 0.003130285
#2 a1 A 1998 Total_Interaction 0.005965492
#3 a1 A 1999 Total_Interaction 0.005563933
#4 a1 A 1991 Total_Interaction 0.007473186
#5 a1 A 1995 Total_Interaction 0.006950200
#6 a1 A 1992 Total_Interaction 0.001914845
Now, we can create the same graph that we saw above for three variables. In this case, we can use facet_grid().