Parallel axis plots for the absolute beginner

A parallel axis plot is a simple way to convey a lot of information in a meaningful and easy-to-understand way. Also known as parallel coordinate plots (PCP), it is a visualization technique used to analyze multivariate numerical data (Weitz, 2020), or in the case of multi-objective optimization, to analyze tradeoffs between multiple conflicting objectives. As someone new to the field of multi-objective optimization, I found them particularly helpful as I tried to wrap my head around the multi-dimensional aspects of this field.

There are multiple tools in Python that you can use to generate PCPs. There are several different posts by Bernardo and Jazmin that utilize the Pandas and Plotting libraries to do so. In this post, I would like to explain a little about how you can generate a decent PCP using only Numpy and Matplotlib.

For context, I used a PCP to contrast the non-dominated solutions from the entire reference set of the optimized GAA problem reference set.

For the beginner, the figure above demonstrates three important visualization techniques in generating PCPs: color, brushing, and axis ordering. Firstly, it is important to consider using colors that stand on opposite sides on the color wheel to contrast the different types of information you are presenting. Next, brushing should be used to divert the viewer’s attention away from any information deemed unnecessary, highlight vital information, or to prove a point using juxtaposition. Finally, the ordering of the axes is important, particularly when presenting conflicting objectives. It is best for all axes to be oriented in one “direction of preference”, so that the lines between each adjacent axis can represent the magnitude of the tradeoff between two objectives. Thus, the order in which these axes are placed will significantly affect the way the viewer perceives the tradeoffs, and should be well-considered.

To help with understanding the how to generate a PCP, here is a step-by-step walk-through of the process.

1. Import all necessary libraries, load data and initialize the Matplotlib figure

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker

# load data
all_soln = np.loadtxt(open("GAA-reference-set.csv"), delimiter=",")
nd_indices = np.loadtxt(open("non-dominated-index.csv"), delimiter=",")

# identify and list the objectives of the reference set
objs = ['NOISE', 'WEMP', 'DOC', 'ROUGH', 'WFUEL', 'PURCH', 'RANGE', 'LDMAX', 'VCMAX', 'PFPF']

# create an array of integers ranging from 0 to the number of objectives                    
x = [i for i, _ in enumerate(objs)]

# sharey=False indicates that all the subplot y-axes will be set to different values
fig, ax  = plt.subplots(1,len(x)-1, sharey=False, figsize=(15,5))

Two sets of data are loaded:

  • all_soln: the entire reference set
  • nd_indices: an array of row indices of the non-dominated solutions from all_soln

In Line 16, we are initializing a figure fig and an array of axis objects ax. I find that having an array of axes helps me better control tick locations and labeling, since I can iterate over them in a loop.

Bear in mind that this is simply an example. It is also possible to obtain the non-dominated set directly from the the reference set by performing a Pareto sort.

2. Normalize the objective values in all_soln

min_max_range = {}

for i in range(len(objs)):
    all_soln[:,i] = np.true_divide(all_soln[:,i] - min(all_soln[:,i]), np.ptp(all_soln[:,i]))
    min_max_range[objs[i]] = [min(all_soln[:,i]), max(all_soln[:,i]), np.ptp(all_soln[:,i])]

All values in all_soln are normalized by subtracting the minimum value from each objective, then dividing it by the range of values for that objective. The min_max_range dictionary contains the minimum, maximum and range of values for each objective. This will come in handy later on.

3. Iterate through all the axes in the figure and plot each point

I used the enumerate function here. It may seem somewhat confusing at first, but it basically keeps count of your iterations as your are iterating through an object (ie: a list, an array). More information on how it works can be found here.

for i, ax_i in enumerate(ax):
    for d in range(len(all_soln)):
        if ((d in nd_indices)== False):
            if (d == 0):
                ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, label='Dominated', linewidth=3)
            else:
                ax_i.plot(objs, all_soln[d, :], color='lightgrey', alpha=0.3, linewidth=3)
    ax_i.set_xlim([x[i], x[i+1]])

for i, ax_i in enumerate(ax):
    for d in range(len(all_soln)):
        if (d in nd_indices):
            if (d == nd_indices[0]):
                ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, label='Nondominated', linewidth=3)
            else:
                ax_i.plot(objs, all_soln[d, :], color='c', alpha=0.7, linewidth=3)
    ax_i.set_xlim([x[i], x[i+1]])

All solutions from the non-dominated set are colored cyan, while the rest of the data is greyed-out. This is an example of brushing. Note that only the first line plotted for both sets are labeled, and that the grey-out data is plotted first. This is so the non-dominated lines are shown clearly over the brushed lines.

4. Write a function to position y-axis tick locations and labels

The set_ticks_for_axis() function is key to this process as it grants you full control over the labeling and tick positioning of your y-axes. It has three inputs:

  • dim: the index of a value from the objs array
  • ax_i: the current axis
  • ticks: the desired number of ticks
def set_ticks_for_axis(dim, ax_i, ticks):
    min_val, max_val, v_range = min_max_range[objs[dim]]
    step = v_range/float(ticks)
    tick_labels = [round(min_val + step*i, 2) for i in range(ticks)]
    norm_min = min(all_soln[:,dim])
    norm_range = np.ptp(all_soln[:,dim])
    norm_step =(norm_range/float(ticks-1))
    ticks = [round(norm_min + norm_step*i, 2) for i in range(ticks)]
    ax_i.yaxis.set_ticks(ticks)
    ax_i.set_yticklabels(tick_labels)

Hello min_max_range! This dictionary essentially makes accessing the extreme values and range of each objective easier and less mistake-prone. It is optional, but I do recommend it.

Overall, this function does two things:

  1. Creates ticks-evenly spaced tick-marks along ax_i.
  2. Labels ax_i with tick labels of size ticks. The tick labels are evenly-spaced values generated by adding step*i to min_val for each iteration i.

A nice thing about this function is that is also preserves the order that the objective values should be placed along the axis, which makes showing a direction of preference easier. It will be used to label each y-axis in our next step.

5. Iterate over and label axes

for dim, ax_i in enumerate(ax):
    ax_i.xaxis.set_major_locator(ticker.FixedLocator([dim]))
    set_ticks_for_axis(dim, ax_i, ticks=10)

FixedLocator() is a subclass of Matplotlib’s ticker class. As it’s name suggests, it fixes the tick locations and prevents changes to the tick label or location that may possibly occur during the iteration. More information about the subclass can be found here.

Here, you only need to label the x-axis with one label and one tick per iteration (hence Line 2). On the other hand, you are labeling the entire y-axis of ax_i, which is where you need to use set_ticks_for_axis().

6. Create a twin axis on the last axis in ax

ax2 = plt.twinx(ax[-1])
dim = len(ax)
ax2.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x [-1]]))
set_ticks_for_axis(dim, ax2, ticks=10)
ax2.set_xticklabels([objs[-2], objs[-1]])

Creating a twin axis using plt.twinx() enables you to label the last axis with y-ticks. Line 3 and 5 ensure that the x-axis is correctly labeled with the last objective name.

7. Finally, plot the figure

plt.subplots_adjust(wspace=0, hspace=0.2, left=0.1, right=0.85, bottom=0.1, top=0.9)
ax[8].legend(bbox_to_anchor=(1.35, 1), loc='upper left', prop={'size': 14})
ax[0].set_ylabel("$\leftarrow$ Direction of preference", fontsize=12)
plt.title("PCP Example", fontsize=12)
plt.savefig("PCP_example.png")
plt.show()

Be sure to remember to label the direction of preference, and one you’ve saved your plot, you’re done!

The source code to generate the following plot can be found here. I hope this makes parallel axis plots a little more understandable and less intimidating.

References

Weitz, D. (2020, July 27). Parallel Coordinates Plots. Retrieved November 09, 2020, from https://towardsdatascience.com/parallel-coordinates-plots-6fcfa066dcb3

Keen, B.A., Parallel Coordinates in Matplotlib. (2017, May 17). Retrieved November 09, 2020, from https://benalexkeen.com/parallel-coordinates-in-matplotlib/

Setting up Python and Eclipse

According to its website, Python is:

…an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site,http://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.

This post covers how to set up Python and the Eclipse development environment.  We also provide a collection of posts on how to use Python for data analysis, starting here.

————————————————————————————————————————————————————–

PYTHON:

————————————————————————————————————————————————————–

The first step is to download Python and its various packages that will likely be useful to you at some point.

Python itself is available at: http://www.python.org/download/

I would recommend downloading and installing version 2.7.2, the latest production release under the 2.X series.  Also, stick with the 32-bit version as most all packages will be available for this version.  Avoid Python 3.X for now.  It is not as widely supported among the various Python packages that you might find useful and as such, should be avoided for now.  Keep in mind that there are some syntax differences as well between versions 2.X and 3.X that would need to be addressed whenever it does come time to update.

Just use the default settings during installation.

NOTE: If you have Cygwin installed on your system, it too has likely installed a version of Python.  Whenever you run Python from the command line, you should be careful to ensure that you are using the version that you expect (i.e., the default Cygwin installed Python versus the one that you installed).  Just be aware of this.  In general, it is easy to identify the version being picked up from the path name.  Also, it is generally best to use the version that you have installed.  It will usually be located in C:\Python27 whereas the Cygwin version will be located in C:\Cygwin\bin.

Now, install the various packages that may be useful. You should always be careful to install a version of the package that matches your version of Python (i.e., 2.7 if you are following my instructions).  Sometimes, if a package is not available for the version you are using (i) you may still be able to use it, or (ii) you may need to make minor tweaks to the package source to get things running. Also, always download the package installers, not the source.  Here are the common ones that you should definately install:

  • NumPy and SciPy available at http://numpy.scipy.org/.  These packages are useful for performing scientific computing within Python.  Download the “win32 superpacks” for each of these packages for the version of Python that you have installed.
  • PIL – the Python Imaging Library available at http://www.pythonware.com/products/pil/.  This package is useful to manipulating image files.
  • matplotlib – a 2D plotting library with Matlab-like syntax available at http://matplotlib.sourceforge.net/.  This package is very good for creating good publication quality figures.  If you starting using it, you will probably notice that the appearance of the figures, even on-screen, is much improved over what Matlab can produce.

The following are some optional packages based on your particular needs:

  • Py2exe – a package for bundling Python scripts into MS Windows executable programs available at http://www.py2exe.org/.  This is what I use to bundle all of the libraries and source code required by AeroVis into a self contained package that can be installed on any Windows system without the need to build or install Python, VTK, Qt, etc.
  • wxPython – GUI package for Python available at http://wxpython.org/.  Note, this is for developing graphical user interfaces (GUIs) for your Python scripts, it is not a GUI for Python.
  • PyQt – another GUI package for Python available at http://www.riverbankcomputing.co.uk/software/pyqt/intro.  PyQt is a set bindings for Nokia’s Qt application framework – a very rich and full featured graphical interface development framework.  AeroVis uses PyQt for its graphical interface.

————————————————————————————————————————————————————–

ECLIPSE:

————————————————————————————————————————————————————–

Now that you have Python and all of your needed packages installed, you can now move on to Eclipse. Eclipse is available from http://www.eclipse.org/downloads/packages/release/indigo/r.  The latest release (and probably the version you should be using) is Indigo.  Since we primarily use Visual Studio for C/C++ development, I would recommend downloading the IDE for Java as this will serve to provide you with a Java environment should you choose to explore this down the road.  I think you should be able to install either the 32-bit or 64-bit versions without issue.  Just make sure you are running a 64-bit OS if you choose to install that version.  When you go to download, Penn State actually has a mirror so choose this.  BTW, don’t choose the BitTorrent option – not a good idea on PSU networks.

Once you have downloaded the zip file containing Eclipse, you just unzip it wherever you want it to be installed.  This includes portable drives etc.  The beauty of Eclipse is that unlike many Windows programs, it is completely self contained and as such, can be run from any location.  Once unzipped, create a shortcut to the Eclipse executable and start it up.

————————————————————————————————————————————————————–

PYDEV:

————————————————————————————————————————————————————–

Now that Eclipse is installed, we can add a Python development environment inside Eclipse that will provide a very nice Python IDE with debugging capabilities, etc.

The install for packages inside Eclipse proceeds a little differently than what you may be used to.

The best option for installing PyDev is probably to install Aptana Studio which includes a variety of development tools.  Go to this site for instructions http://www.aptana.com/downloads/start or read on.

1) In the Eclipse Help menu, select Install New Software
2) Paste this URL into the Work With box: http://download.aptana.com/studio3/plugin/install
3) Check the box for Aptana Studio and click Next
4) Accept the license, etc., and restart Eclipse

Another option is to only install PyDev from within Eclipse, carefully follow the instructions available at: http://pydev.org/manual_101_install.html.  There’s no need for me to rehash all of these instructions here as they are quite good at the PyDev site.

Once PyDev is installed, you should be ready to go.

————————————————————————————————————————————————————–

Let me know if you run into any problems by leaving a comment.

————————————————————————————————————————————————————–

Up Next Time…

Developing and debugging Python scripts and projects in Eclipse