Creating parallel axes plots

This post describes how to create parallel axis plots using the web-based tool available at http://reed.cee.cornell.edu/parallel-axis/, as well as the open source and widely used software Gnuplot 5.0.

A parallel axis plot is a simple and intuitive way of visualizing high-dimensional data (4 or more dimensions). It allows for easy assessment of pairwise correlations between dimensions, easy identification of the properties of each individual point and, where applicable, easy visualization of what an ideal or best solution would be.
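
To make the idea concrete, here is a minimal Python sketch (not part of either tool) of how a table of data becomes a set of lines: each column is rescaled to its own vertical axis, and each row is drawn as a polyline crossing all axes. The function name to_polylines and the sample rows are made up for illustration, and rescaling every axis to the range 0 to 1 is an assumption; real tools may scale their axes differently.

```python
def to_polylines(rows):
    """Rescale each column to [0, 1] and return one polyline per row.

    Each polyline is a list of (axis_number, height) pairs, where
    axis_number is the 1-based position of the column.
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, (value, low, high) in enumerate(zip(row, lows, highs), start=1):
            # A constant column has no spread, so place it mid-axis.
            height = 0.5 if high == low else (value - low) / (high - low)
            line.append((axis, height))
        polylines.append(line)
    return polylines


# Made-up sample data: three points with three dimensions each.
rows = [(0.0, 10.0, 1.0), (1.0, 20.0, 3.0), (0.5, 15.0, 2.0)]
for polyline in to_polylines(rows):
    print(polyline)
```

Lines that cross between two neighboring axes suggest a trade-off between those two dimensions, while lines that run roughly parallel suggest the dimensions move together.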

Several software packages can generate parallel plots from a dataset, with Matlab being the most famous and widely used. However, Matlab is expensive, which makes it impractical for a researcher or business user who has no interest in its other features. This motivated the choice of tools for this post. Before describing how to use the tools, here is a comparison of their features to help you choose the one that best fits your needs:

Comparison between Gnuplot and Web Cornell

Feature                    Web Cornell   Gnuplot
Interactive                Yes           No
Exports figures            No            Yes
GUI instead of scripting   Yes           No
Color schemes              Limited       Infinite
Adjustable font/axis       No            Yes

Bottom line: the Web Cornell tool is more appropriate for actual data analysis and decision making, while Gnuplot is probably more appropriate for creating a figure for a report.

Web Cornell Tool

The first step when using the Web Cornell Tool is to choose the desired color scheme for the plot. Several options are available in the drop down menu, as in the figure below:

Fig1

After the color scheme is selected, load the data file by clicking on the “Choose File” button, as in the figure below. The data file MUST be in comma-separated values (CSV) format with Linux (LF) line breaks.

Fig2
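
If your file was saved on Windows, it probably has CRLF line breaks instead. One possible way to convert it is sketched below in Python; the function names, and the file names in the usage note, are placeholders of mine and not part of the Web Cornell tool.

```python
from pathlib import Path


def to_unix_line_breaks(data: bytes) -> bytes:
    """Return a copy of data with CRLF and lone CR line breaks turned into LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(source: str, target: str) -> None:
    """Read source, fix its line breaks, and write the result to target."""
    Path(target).write_bytes(to_unix_line_breaks(Path(source).read_bytes()))
```

Calling convert_file("mydata.csv", "mydata_unix.csv") would then write a copy of the file that uses LF line breaks only, leaving the original untouched.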

If the “Blues” color scheme was selected, the plot should appear automatically and look similar to the figure below:

Fig3

One interesting feature of the Web Cornell Tool is that it can highlight in the plot a particular solution shown in the table. This allows the user to see how a data point with a particular ID compares to the other data points. For this, the user just needs to place the cursor over the line corresponding to the solution to be highlighted, as in the figure below:

Fig12

Another interesting feature of this tool is the brushing feature. In order to display only the data points with inertia between 0.84 and 0.93, just click on the Inertia axis and drag a box that covers the desired range. The plot should now look like this:

Fig4

Multiple axes can be brushed, looking like this:

Fig5

Brushes can be removed by either pressing the “Reset Brushes” button or by right clicking on a brushed axis.

The brushing shown so far is of the type AND, which means that only points with values within the ranges defined for all brushed axes will be displayed. If the option OR is selected in the Predicate drop-down, points with values within the range of at least one of the brushed axes will be displayed. The outcome should look like this:

Fig7
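
The difference between the two predicates can be sketched in a few lines of Python. This is only an illustration of the logic, not the tool's actual code; the function brush, the sample values, and the Reliability range are made up (only the Inertia range of 0.84 to 0.93 comes from the example above).

```python
def brush(points, brushes, predicate="AND"):
    """Return the points kept by a set of axis brushes.

    points    -- list of dicts mapping axis name to value
    brushes   -- dict mapping axis name to a (low, high) range
    predicate -- "AND": inside every brushed range;
                 "OR": inside at least one brushed range
    """
    if not brushes:
        return list(points)  # no brushes means nothing is filtered out
    combine = all if predicate == "AND" else any
    return [
        point for point in points
        if combine(low <= point[axis] <= high
                   for axis, (low, high) in brushes.items())
    ]


# Made-up sample values, for illustration only.
points = [
    {"id": 1, "Inertia": 0.90, "Reliability": 0.95},
    {"id": 2, "Inertia": 0.88, "Reliability": 0.60},
    {"id": 3, "Inertia": 0.50, "Reliability": 0.97},
    {"id": 4, "Inertia": 0.40, "Reliability": 0.30},
]
brushes = {"Inertia": (0.84, 0.93), "Reliability": (0.90, 1.00)}

print([p["id"] for p in brush(points, brushes, "AND")])  # [1]
print([p["id"] for p in brush(points, brushes, "OR")])   # [1, 2, 3]
```

With AND, only point 1 falls inside both ranges; with OR, points 2 and 3 are also kept because each falls inside one of the two ranges.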

The brushing mode can also be changed. The option used to create the previous plots is the 1D-axes mode. Try playing with the 2D-strums option by selecting it and clicking between axes for a different way of brushing your data.

Gnuplot

Gnuplot is an open source standalone plotting tool that can be used to create several types of plots, including parallel axis plots (version 5.0 and above). The source code and binaries can be downloaded here. You should then click on the folder of version 5.0.0 or later.

Installing on Windows

If using Windows, download the file gp500-winxx-mingw.zip, where xx is the number of bits of your processor (32 or 64). If unsure, download the 64-bit version, since most computers nowadays are 64-bit. To open gnuplot, unzip the file, go to the folder where it was unzipped, and open the file gnuplot/bin/wgnuplot.exe. If this does not work, try downloading the 32-bit version.

Installing on Linux

If using Linux, download the source code, as opposed to an executable, and compile it. Even though Gnuplot is present in the repositories of all major Linux distributions, it is still in version 4.6 to this date, which cannot create parallel plots. In order to compile the source code, download the file gnuplot-5.0.0.tar.gz, extract it in a folder that is convenient for you, open a terminal pointing to that folder, and then type:


$ ./configure
$ make
$ sudo make install

After the installation, open a terminal and type “gnuplot” in order to start it.
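
If you are not sure whether the gnuplot on your machine is recent enough, a small Python check along these lines may help. It assumes that running gnuplot --version prints something like "gnuplot 5.0 patchlevel 0"; the function names are my own.

```python
import re
import subprocess


def parse_version(version_output):
    """Extract (major, minor) from text such as 'gnuplot 5.0 patchlevel 0'."""
    match = re.search(r"gnuplot\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def supports_parallel_plots(version_output):
    """Parallel axis plots need version 5.0 or later."""
    return parse_version(version_output) >= (5, 0)


def installed_version_output(executable="gnuplot"):
    """Ask the installed gnuplot for its version string."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, supports_parallel_plots(installed_version_output()) should return False on a system that still ships version 4.6.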

Setting up gnuplot and a blank script

The first step is to acknowledge that Gnuplot has no official GUI. Given that the GUIs available online are not comprehensive and do not support parallel axes plots, the only alternative is to learn a bit of scripting. The good thing is that its syntax is fairly simple and intuitive (or as intuitive as scripting can be).

Now that Gnuplot is up and running, it is time to create a parallel plot. Although one can always open Gnuplot and type commands in its terminal, it is often better to create a script and load it with gnuplot. To do this, create an empty text file named “example.txt” (the extension does not matter, as long as it is a text file) in the folder of your choice and then point gnuplot to this folder with the “cd” command followed by the folder path in quotes. Now, copy your data file to the same folder where the script is located and type the following command in the gnuplot terminal:


load 'example.txt'

If you did everything right, nothing should happen, since the file has no commands. This is the only command that will be typed directly in the Gnuplot terminal. Everything from now on will be typed in your “example.txt” file.

Creating a script

In order to create a plot based on your data file, you will have to tell Gnuplot which data separator your file uses. If your file is a csv (comma-separated values) file, type the following command in your script file:


set datafile separator ','

If your data file uses a different separator, just replace “,” with your separator. Now, if your file has, say, 7 columns but you want to plot columns 1, 2, 3, and 4 only, add the following command to your script:


plot 'myfile.csv' using 1:2:3:4 with parallel

Now, you just have to run the script again using “load 'example.txt'” and voilà! Your first parallel plot with Gnuplot should appear on your screen, with purple lines that do not look all that great but convey the information. It should look like the figure below:

gnuplot1

This is probably not the prettiest parallel plot you have seen so far, but we can work on embellishing it now.

One common way of making a parallel axes plot easier to understand is to assign the line colors based on a certain data column. In order to do this, we need to tell gnuplot which column to use as the basis for the colors, say column 1, and make the line color variable. The plot command will then look like this:


plot 'myfile.csv' using 1:2:3:4:1 with parallel linecolor variable

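Notice that with the option linecolor variable (lc variable for short), the last column in the list of columns tells gnuplot which column sets the line color, which is why there are now five numbers instead of four. Note that lc variable treats those values as indices into gnuplot's line colors rather than as positions on a continuous gradient; the lc palette option used later in this post produces a true gradient. The plot should look like this now: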

gnuplot2

This simple script plots a very basic parallel axis plot to Gnuplot’s visualization terminal. In order to plot your data to a figure file, like svg or png, as well as to eliminate the borders, name the axes, set up the tics, and fix other details, add the following to the script:


reset
# Sets the terminal to be used and the output file. In this case, I used svg.
set terminal svg size 800,480
set output 'parallel.svg'
# Sets the color palette to be used for the lines' gradient color.
set palette defined (1 "#fee6ce", 2 "#fdae6b", 3 "#e6550d")
# Tells Gnuplot this is a csv file, as opposed to tsv
set datafile separator ','
# Removes the legend, tics on the sides, and graph border.
unset key
unset colorbox
unset ytics
unset border
# Sets the axis titles (one label per axis position).
set xtics ("Phosphorus" 1,"Benefit" 2,"Inertia" 3,"Reliability" 4) nomirror
# Turns on tics for each of the four parallel axes.
set paxis 1 tics 
set paxis 2 tics
set paxis 3 tics
set paxis 4 tics
# Creates the plot. Let's break this down:
# plot 'myfile.csv' - Plots the file named "myfile.csv"
#    according to the parameters in the rest of the line.
# u 1:2:3:4:2 - 1:2:3:4 are the columns in the csv file that you
#    want to be plotted. The last :2 means that the color gradient is
#    to be based on column 2. If this was 1:2:3:4:1, column 1 would be
#    the basis for the color gradient.
# w parallel - Means that this is a parallel plot with axes defined
#    by the u (using) option.
# lc palette - Tells Gnuplot to use the palette defined with set palette
#    for the color gradient.
# lw 2 - Defines the line width as 2. Having it as 3 would give you a
#    thicker line.
plot 'myfile.csv' u 1:2:3:4:2 w parallel lc palette lw 2

The output should be an svg file that looks like the picture below:

parallel
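
For intuition on what set palette defined does with the three colors in the script, here is a rough Python sketch that linearly interpolates between the color stops. It is an approximation for illustration only, not Gnuplot's actual implementation, and it assumes the column used for coloring is rescaled linearly from its minimum and maximum onto the palette; the function names are hypothetical.

```python
def hex_to_rgb(color):
    """Turn '#rrggbb' into a tuple of three integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def palette_color(value, vmin, vmax, stops):
    """Map value in [vmin, vmax] onto a gradient.

    stops -- list of (position, '#rrggbb') pairs sorted by position,
             mirroring the arguments of set palette defined.
    """
    first, last = stops[0][0], stops[-1][0]
    # Rescale the data value to a fraction, clamped to [0, 1].
    fraction = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    fraction = min(max(fraction, 0.0), 1.0)
    position = first + fraction * (last - first)
    # Find the pair of stops around the position and blend them.
    for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
        if position <= p1:
            weight = (position - p0) / (p1 - p0)
            rgb = [round(a + weight * (b - a))
                   for a, b in zip(hex_to_rgb(c0), hex_to_rgb(c1))]
            return "#{:02x}{:02x}{:02x}".format(*rgb)
    return stops[-1][1]


# The same stops as in the script: light, medium, and dark orange.
stops = [(1, "#fee6ce"), (2, "#fdae6b"), (3, "#e6550d")]
print(palette_color(0.0, 0.0, 10.0, stops))   # #fee6ce
print(palette_color(5.0, 0.0, 10.0, stops))   # #fdae6b
print(palette_color(10.0, 0.0, 10.0, stops))  # #e6550d
```

In other words, rows with low values in the coloring column get the lightest orange, rows with high values get the darkest, and everything in between is blended.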

From there you can add the fresh and beautiful parallel axes plot figure to your work!
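
If you need to repeat this for several files or for more columns, typing the set paxis and set xtics lines by hand gets tedious. One option is to generate the script text; the sketch below, in Python, assembles the same commands used in the script above. The function parallel_plot_script and its parameters are hypothetical names of mine.

```python
def parallel_plot_script(data_file, columns, color_column, output="parallel.svg"):
    """Return gnuplot commands for a parallel axes plot.

    columns      -- list of (column_number, axis_title) pairs, 1-based
    color_column -- 1-based column that drives the line color
    """
    # One xtics label per axis, placed at axis positions 1, 2, 3, ...
    xtics = ",".join(
        f'"{title}" {axis}' for axis, (_, title) in enumerate(columns, start=1)
    )
    # Columns to plot, followed by the column used for the color.
    using = ":".join(str(number) for number, _ in columns) + f":{color_column}"
    lines = [
        "reset",
        "set terminal svg size 800,480",
        f"set output '{output}'",
        'set palette defined (1 "#fee6ce", 2 "#fdae6b", 3 "#e6550d")',
        "set datafile separator ','",
        "unset key",
        "unset colorbox",
        "unset ytics",
        "unset border",
        f"set xtics ({xtics}) nomirror",
    ]
    lines += [f"set paxis {axis} tics" for axis in range(1, len(columns) + 1)]
    lines.append(f"plot '{data_file}' u {using} w parallel lc palette lw 2")
    return "\n".join(lines) + "\n"


script = parallel_plot_script(
    "myfile.csv",
    [(1, "Phosphorus"), (2, "Benefit"), (3, "Inertia"), (4, "Reliability")],
    color_column=2,
)
print(script)
```

Saving the printed text as “example.txt” and loading it in Gnuplot should reproduce the script shown earlier, minus the comments.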


5 thoughts on “Creating parallel axes plots”

  1. I have a Python script that may be of interest as well: https://github.com/matthewjwoodruff/parallel.py. It has many of the same strengths and weaknesses as gnuplot, in that it’s a command-line tool. In addition, you have the options to use transparency for the lines, add glyph markers, and do a mixed (or fully rasterized, or fully vectorized) export. The mixed export renders the axes to SVG separately from the lines, which are rasterized, so that you can compose them for publication. This is useful for parallel coordinate plots that have so much data in them that they produce SVGs of overwhelming size. You get nice vector shapes for the axes without breaking your image editor.

  2. Sorry about the delay, I just now received an e-mail notification of your question.

    To my knowledge, you can only add a key if you create a plot from multiple data files. Besides splitting the data into multiple files, you would need to either remove the ‘unset key’ command or add ‘set key [options]’ if you want to get a custom key (e.g. ‘set key out horiz cent top box opaque’). The script would look something like this:


    ...
    set key out horiz cent top box opaque
    ...
    ...
    plot 'durham.csv' u ($2/100):($3/100):($1/100):($4/100) w parallel lw 2 title 'Durham',\
    'raleigh.csv' u ($2/100):($3/100):($1/100):($4/100) w parallel lw 2 title 'Raleigh',\
    'cary.csv' u ($2/100):($3/100):($1/100):($4/100) w parallel lw 2 title 'Cary',\
    'owasa.csv' u ($2/100):($3/100):($1/100):($4/100) w parallel lw 2 title 'OWASA',\
    'axis_aux.csv' u ($2/100):($3/100):($1/100):($4/100) w parallel lw 1 lc rgb 'white' notitle

    In this example, the data of each file would be plotted in a different color and the name displayed in the key would correspond to the title assigned to each file (title ‘Durham’, title ‘OWASA’, etc.). If you want the data in one of the files not to show up in the key, just assign ‘notitle’ instead of ‘title xxx’, as in the last line of the plot command.

    Hope this is clear enough!

