Parallel axis plots for the absolute beginner

A parallel axis plot is a simple way to convey a lot of information in a meaningful and easy-to-understand way. Also known as parallel coordinate plots (PCP), it is a visualization technique used to analyze multivariate numerical data (Weitz, 2020), or in the case of multi-objective optimization, to analyze tradeoffs between multiple conflicting objectives. As someone new to the field of multi-objective optimization, I found them particularly helpful as I tried to wrap my head around the multi-dimensional aspects of this field.

There are multiple tools in Python that you can use to generate PCPs. There are several different posts by Bernardo and Jazmin that utilize the Pandas and Plotting libraries to do so. In this post, I would like to explain a little about how you can generate a decent PCP using only Numpy and Matplotlib.

For context, I used a PCP to contrast the non-dominated solutions from the entire reference set of the optimized GAA problem reference set.

For the beginner, the figure above demonstrates three important visualization techniques in generating PCPs: color, brushing, and axis ordering. Firstly, it is important to consider using colors that stand on opposite sides on the color wheel to contrast the different types of information you are presenting. Next, brushing should be used to divert the viewer’s attention away from any information deemed unnecessary, highlight vital information, or to prove a point using juxtaposition. Finally, the ordering of the axes is important, particularly when presenting conflicting objectives. It is best for all axes to be oriented in one “direction of preference”, so that the lines between each adjacent axis can represent the magnitude of the tradeoff between two objectives. Thus, the order in which these axes are placed will significantly affect the way the viewer perceives the tradeoffs, and should be well-considered.

To help with understanding the how to generate a PCP, here is a step-by-step walk-through of the process.

1. Import all necessary libraries, load data and initialize the Matplotlib figure

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker

# load data
all_soln = np.loadtxt(open("GAA-reference-set.csv"), delimiter=",")
nd_indices = np.loadtxt(open("non-dominated-index.csv"), delimiter=",")

# identify and list the objectives of the reference set
objs = ['NOISE', 'WEMP', 'DOC', 'ROUGH', 'WFUEL', 'PURCH', 'RANGE', 'LDMAX', 'VCMAX', 'PFPF']

# create an array of integers ranging from 0 to the number of objectives                    
x = [i for i, _ in enumerate(objs)]

# sharey=False indicates that all the subplot y-axes will be set to different values
fig, ax  = plt.subplots(1,len(x)-1, sharey=False, figsize=(15,5))

Two sets of data are loaded:

  • all_soln: the entire reference set
  • nd_indices: an array of row indices of the non-dominated solutions from all_soln

In Line 16, we are initializing a figure fig and an array of axis objects ax. I find that having an array of axes helps me better control tick locations and labeling, since I can iterate over them in a loop.

Bear in mind that this is simply an example. It is also possible to obtain the non-dominated set directly from the the reference set by performing a Pareto sort.

2. Normalize the objective values in all_soln

min_max_range = {}

for i in range(len(objs)):
    all_soln[:,i] = np.true_divide(all_soln[:,i] - min(all_soln[:,i]), np.ptp(all_soln[:,i]))
    min_max_range[objs[i]] = [min(all_soln[:,i]), max(all_soln[:,i]), np.ptp(all_soln[:,i])]

All values in all_soln are normalized by subtracting the minimum value from each objective, then dividing it by the range of values for that objective. The min_max_range dictionary contains the minimum, maximum and range of values for each objective. This will come in handy later on.

3. Iterate through all the axes in the figure and plot each point

I used the enumerate function here. It may seem somewhat confusing at first, but it basically keeps count of your iterations as your are iterating through an object (ie: a list, an array). More information on how it works can be found here.

for i, ax_i in enumerate(ax):
    for d in range(len(all_soln)):
        if ((d in nd_indices)== False):
            if (d == 0):
                ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, label='Dominated', linewidth=3)
            else:
                ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, linewidth=3)
    ax_i.set_xlim([x[i], x[i+1]])

for i, ax_i in enumerate(ax):
    for d in range(len(all_soln)):
        if (d in nd_indices):
            if (d == nd_indices[0]):
                ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, label='Nondominated', linewidth=3)
            else:
                ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, linewidth=3)
    ax_i.set_xlim([x[i], x[i+1]])

All solutions from the non-dominated set are colored cyan, while the rest of the data is greyed-out. This is an example of brushing. Note that only the first line plotted for both sets are labeled, and that the grey-out data is plotted first. This is so the non-dominated lines are shown clearly over the brushed lines.

4. Write a function to position y-axis tick locations and labels

The set_ticks_for_axis() function is key to this process as it grants you full control over the labeling and tick positioning of your y-axes. It has three inputs:

  • dim: the index of a value from the objs array
  • ax_i: the current axis
  • ticks: the desired number of ticks
def set_ticks_for_axis(dim, ax_i, ticks):
    min_val, max_val, v_range = min_max_range[objs[dim]]
    step = v_range/float(ticks)
    tick_labels = [round(min_val + step*i, 2) for i in range(ticks)]
    norm_min = min(all_soln[:,dim])
    norm_range = np.ptp(all_soln[:,dim])
    norm_step =(norm_range/float(ticks-1))
    ticks = [round(norm_min + norm_step*i, 2) for i in range(ticks)]
    ax_i.yaxis.set_ticks(ticks)
    ax_i.set_yticklabels(tick_labels)

Hello min_max_range! This dictionary essentially makes accessing the extreme values and range of each objective easier and less mistake-prone. It is optional, but I do recommend it.

Overall, this function does two things:

  1. Creates ticks-evenly spaced tick-marks along ax_i.
  2. Labels ax_i with tick labels of size ticks. The tick labels are evenly-spaced values generated by adding step*i to min_val for each iteration i.

A nice thing about this function is that is also preserves the order that the objective values should be placed along the axis, which makes showing a direction of preference easier. It will be used to label each y-axis in our next step.

5. Iterate over and label axes

for dim, ax_i in enumerate(ax):
    ax_i.xaxis.set_major_locator(ticker.FixedLocator([dim]))
    set_ticks_for_axis(dim, ax_i, ticks=10)

FixedLocator() is a subclass of Matplotlib’s ticker class. As it’s name suggests, it fixes the tick locations and prevents changes to the tick label or location that may possibly occur during the iteration. More information about the subclass can be found here.

Here, you only need to label the x-axis with one label and one tick per iteration (hence Line 2). On the other hand, you are labeling the entire y-axis of ax_i, which is where you need to use set_ticks_for_axis().

6. Create a twin axis on the last axis in ax

ax2 = plt.twinx(ax[-1])
dim = len(ax)
ax2.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x [-1]]))
set_ticks_for_axis(dim, ax2, ticks=10)
ax2.set_xticklabels([objs[-2], objs[-1]])

Creating a twin axis using plt.twinx() enables you to label the last axis with y-ticks. Line 3 and 5 ensure that the x-axis is correctly labeled with the last objective name.

7. Finally, plot the figure

plt.subplots_adjust(wspace=0, hspace=0.2, left=0.1, right=0.85, bottom=0.1, top=0.9)
ax[8].legend(bbox_to_anchor=(1.35, 1), loc='upper left', prop={'size': 14})
ax[0].set_ylabel("$\leftarrow$ Direction of preference", fontsize=12)
plt.title("PCP Example", fontsize=12)
plt.savefig("PCP_example.png")
plt.show()

Be sure to remember to label the direction of preference, and one you’ve saved your plot, you’re done!

The source code to generate the following plot can be found here. I hope this makes parallel axis plots a little more understandable and less intimidating.

References

Weitz, D. (2020, July 27). Parallel Coordinates Plots. Retrieved November 09, 2020, from https://towardsdatascience.com/parallel-coordinates-plots-6fcfa066dcb3

Keen, B.A., Parallel Coordinates in Matplotlib. (2017, May 17). Retrieved November 09, 2020, from https://benalexkeen.com/parallel-coordinates-in-matplotlib/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s