Creating parallel axes plots

This post describes how to create parallel axis plots using the web-based tool available at http://reed.cee.cornell.edu/parallel-axis/, as well as the widely used open source software GNUPlot 5.0.

A parallel axis plot is a simple and intuitive way of visualizing high-dimensional data (4 or more dimensions). It makes it easy to assess correlations between pairs of dimensions, to identify the properties of each individual point and, where applicable, to see what an ideal or best solution would look like.

Several software packages can generate parallel plots from a dataset, with Matlab being among the most widely used. However, Matlab is expensive, which makes it a poor choice for a researcher or business user who has no interest in its other features. This motivated the choice of tools for this post. Before describing how to use them, here is a comparison of their features to help you choose the one that best fits your needs:

Comparison between GNUPlot and Web Cornell

Feature                  | Web Cornell | GNUPlot
Interactive              | Yes         | No
Exports figures          | No          | Yes
GUI instead of scripting | Yes         | No
Color schemes            | Limited     | Infinite
Adjustable font/axis     | No          | Yes

Bottom line, the Web Cornell tool is more appropriate for actual data analysis and decision making but, if creating a figure for a report, GNUPlot would probably be more appropriate.

Web Cornell Tool

The first step when using the Web Cornell Tool is to choose the desired color scheme for the plot. Several options are available in the drop down menu, as in the figure below:

Fig1

After the color scheme is selected, the user should load the data file by clicking on the “Choose File” button, as in the figure below. The data file MUST be in comma-separated values (CSV) format with Linux (LF) line breaks.

Fig2
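If the data file was saved on Windows, it probably uses CRLF line breaks rather than Linux ones. The following is a minimal Python sketch, not part of the Web Cornell Tool, for converting a file to Linux line breaks before loading it. The function names and file names are hypothetical.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line breaks to Linux (LF)."""
    # Replace CRLF first so that a CRLF pair does not become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(source: str, destination: str) -> None:
    """Read `source` as raw bytes and write an LF-only copy to `destination`."""
    Path(destination).write_bytes(to_unix_line_endings(Path(source).read_bytes()))
```

For example, convert_file("mydata_windows.csv", "mydata.csv") would write a copy that is safe to load, assuming both paths are valid on your machine.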

If the “Blues” color scheme was selected, the plot should appear automatically and look similar to the figure below:

Fig3

One interesting feature of the Web Cornell Tool is that it can highlight in the plot a particular solution shown in the table. This allows the user to see how a data point with a particular ID compares to the other data points. For this, the user just needs to place the cursor over the row of the table corresponding to the solution to be highlighted, as in the figure below:

Fig12

Another interesting feature of this tool is the brushing feature. In order to display only the data points with inertia between 0.84 and 0.93, just click on the Inertia axis and drag a box that covers the desired range. The plot should now look like this:

Fig4

Multiple axes can be brushed, looking like this:

Fig5

Brushes can be removed by either pressing the “Reset Brushes” button or by right clicking on a brushed axis.

The brushing shown so far is of the AND type, which means that only points with values within the ranges defined for all brushed axes will be displayed. If the OR option is selected in the Predicate drop-down, points with values within the range of at least one of the brushed axes will be displayed. The outcome should look like this:

Fig7
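The difference between the two predicates can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the tool's actual code. The function name, the data values and the Reliability range are made up for the example.

```python
def brush(points, ranges, predicate="AND"):
    """Return the points that remain visible after brushing.

    points: list of dicts mapping axis name -> value
    ranges: dict mapping each brushed axis name -> (low, high)
    predicate: "AND" keeps points inside every brush,
               "OR" keeps points inside at least one brush
    """
    if predicate not in ("AND", "OR"):
        raise ValueError("predicate must be 'AND' or 'OR'")
    if not ranges:
        # No brushes: every point is displayed.
        return list(points)
    combine = all if predicate == "AND" else any
    return [
        point
        for point in points
        if combine(low <= point[axis] <= high for axis, (low, high) in ranges.items())
    ]


# Hypothetical data with two brushed axes.
points = [
    {"Inertia": 0.90, "Reliability": 0.95},
    {"Inertia": 0.90, "Reliability": 0.50},
    {"Inertia": 0.50, "Reliability": 0.95},
    {"Inertia": 0.50, "Reliability": 0.50},
]
ranges = {"Inertia": (0.84, 0.93), "Reliability": (0.90, 1.00)}

print(len(brush(points, ranges, "AND")))  # prints 1
print(len(brush(points, ranges, "OR")))   # prints 3
```

With AND only the first point survives, since it is the only one inside both brushes. With OR the first three survive, since each of them is inside at least one brush.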

The brushing mode can also be changed. The option used to create the previous plots is the 1D-axes mode. Try playing with the 2D-strums option by selecting it and clicking between axes for a different way of brushing your data.

Gnuplot

Gnuplot is an open source standalone plotting tool that can be used to create several types of plots, including parallel axis plots (v 5.0 and above). The source code and binaries can be downloaded here. You should then click on the folder of version 5.0.0 or later.

Installing on Windows

If using Windows, you should download the file gp500-winxx-mingw.zip, where xx is the number of bits of your processor (32 or 64). If unsure, download the 64-bit version, since most computers nowadays are 64-bit. To open gnuplot, unzip the file, go to the folder where it was unzipped, and open the file gnuplot/bin/wgnuplot.exe. If this does not work, try downloading the 32-bit version.

Installing on Linux

If using Linux, download the source code, as opposed to an executable, and compile it. Even though Gnuplot is present in the repositories of all major Linux distributions, at the time of writing it is still at version 4.6 there, which cannot create parallel plots. To compile the source code, download the file gnuplot-5.0.0.tar.gz, extract it in a folder that is convenient for you, open a terminal pointing to the extracted folder, and then type:


$ ./configure
$ make
$ sudo make install

After the installation, open a terminal and type “gnuplot” in order to start it.

Setting up gnuplot and a blank script

The first step is to acknowledge that Gnuplot has no official GUI. Given that the GUIs available online are not comprehensive and do not support parallel axes plots, the only alternative is to learn a bit of scripting. The good news is that its syntax is fairly simple and intuitive (or as intuitive as scripting can be).

Now that Gnuplot is up and running, it is time to create a parallel plot. Although one can always open Gnuplot and type commands in its terminal, it is often better to create a script and load it with gnuplot. To do this, create an empty text file named “example.txt” (the extension does not matter, as long as you create a text file) in the folder of your choice and then point gnuplot to this folder with the “cd” command, followed by the path to the folder in quotes. Now, copy your data file to the same folder where the script is located and type the following command on the gnuplot terminal:


load 'example.txt'

Creating a script

If you did everything right, nothing should happen, since the file has no commands yet. The load command is the only thing that will be typed directly on the Gnuplot terminal. Everything from now on is a command to be typed in your “example.txt” file.

In order to create a plot based on your data file, you first have to tell Gnuplot which separator your file uses. If your file is a csv (comma-separated values) file, type the following command in your script file:


set datafile separator ','

If your datafile uses a different separator, just replace “,” by your separator. Now, if your file has, say, 7 columns but you want to plot columns 1, 2, 3, and 4 only, add the following command to your script.


plot 'myfile.csv' using 1:2:3:4 with parallel

Now, you just have to run the script again using “load 'example.txt'” and voilà! Your first parallel plot with Gnuplot should have appeared on your screen, with purple lines that do not look all that great but convey the information. It should look like the figure below:

gnuplot1

This is probably not the prettiest parallel plot you have seen so far, but we can work on embellishing it now.

One common way of making a parallel axes plot easier to understand is to assign a color gradient to the lines based on a certain data column. In order to do this, we need to tell gnuplot which column to use as the basis for the color gradient, say column 1, and to make the line color variable. The plot command will then look like this:


plot 'myfile.csv' using 1:2:3:4:1 with parallel linecolor variable

Notice that now that we have the option linecolor variable (lc variable for short), the last column in the list of columns tells gnuplot which column to use for the color, which is why we now have five numbers as opposed to four, as before. The plot should look like this now:

gnuplot2

This simple script plots a very basic parallel axis plot to Gnuplot’s visualization terminal. In order to plot your data to a figure file, like svg or png, as well as to eliminate the borders, name the axes, set up the tics, and fix other details, replace the contents of your script with the following:


reset
# sets the terminal to be used and output file. In this case, I used svg.
set terminal svg size 800,480
set output 'parallel.svg'
# Sets the color palette to be used for the lines' gradient color.
set palette defined (1 "#fee6ce", 2 "#fdae6b", 3 "#e6550d")
# Tells Gnuplot this is a csv file, as opposed to tsv
set datafile separator ','
# Removes the legend, tics on the sides, and graph border.
unset key
unset colorbox
unset ytics
unset border
# Sets the axes titles
set xtics ("Phosphorus" 1,"Benefit" 2,"Inertia" 3,"Reliability" 4) nomirror
# Turns on tics for all axes.
set paxis 1 tics 
set paxis 2 tics
set paxis 3 tics
set paxis 4 tics
# Creates the plot. Let's break this down: 
# plot 'myfile.csv' - Plot file named "myfile.csv"
#    according to the parameters in the rest of the line.
# u 1:2:3:4:2 - 1:2:3:4 are the columns in the csv file that you
#    want to be plotted. The last :2 means that the color gradient is 
#    to be based on column 2. If this was 1:2:3:4:1, column 1 would be
#    the basis for the color gradient.
# w parallel - Means that this is a parallel plot with axes defined as 
#    with the u flag.
# lc palette - Tells Gnuplot to use the palette defined with set palette
#    for the color gradient.
# lw 2 - Defines the line weight as 2. Having it as 3 would give you a
#    thicker line.
plot 'myfile.csv' u 1:2:3:4:2 w parallel lc palette lw 2

The output should be an SVG file that looks like the picture below:

parallel
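When the data file has more columns, writing the xtics and paxis lines by hand gets repetitive. The following is a hedged Python sketch that generates the same script as above from a list of axis labels. The function name and its arguments are hypothetical, and it assumes the columns to plot are the first ones in the file, in order.

```python
def build_parallel_script(csv_name, labels, color_column, output="parallel.svg"):
    """Return the text of a gnuplot script for a parallel axis plot.

    csv_name: name of the csv data file
    labels: axis titles for columns 1..len(labels), in order
    color_column: 1-based column used as the basis for the color gradient
    """
    count = len(labels)
    if not 1 <= color_column <= count:
        raise ValueError("color_column must be one of the plotted columns")
    # Builds: "Phosphorus" 1,"Benefit" 2,...
    xtics = ",".join(f'"{label}" {i}' for i, label in enumerate(labels, start=1))
    # Builds: 1:2:3:4
    columns = ":".join(str(i) for i in range(1, count + 1))
    lines = [
        "reset",
        "set terminal svg size 800,480",
        f"set output '{output}'",
        'set palette defined (1 "#fee6ce", 2 "#fdae6b", 3 "#e6550d")',
        "set datafile separator ','",
        "unset key",
        "unset colorbox",
        "unset ytics",
        "unset border",
        f"set xtics ({xtics}) nomirror",
    ]
    lines += [f"set paxis {i} tics" for i in range(1, count + 1)]
    lines.append(
        f"plot '{csv_name}' u {columns}:{color_column} w parallel lc palette lw 2"
    )
    return "\n".join(lines) + "\n"


script = build_parallel_script(
    "myfile.csv", ["Phosphorus", "Benefit", "Inertia", "Reliability"], 2
)
print(script)
```

Saving the printed text to “example.txt” and loading it in gnuplot should give the same figure as the hand-written script.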

From there you can add the fresh and beautiful parallel axes plot figure to your work!

Terminal basics for the truly newbies


Gaining access to a terminal and learning its basics is the first step on the path to geekdom. Terminal skills will give you programming flexibility and will enable you to accomplish tasks more efficiently on your computer, not to mention that they are requirement number zero for some of the exercises found in this collaborative blog. This post is intended for the beginner crowd; if you are already terminal-cognizant, please stop reading this post and do something productive. On the other hand, if you are wondering what the terminal is, why you need it, how to get it, and how to use it, this post may suit you. Just to be clear, this is not a comprehensive introduction to terminal use; I’m just trying to give you a brief background and point you in the right direction.

What is the terminal?

In simple terms, the terminal is an interface that allows you to type and execute commands rapidly on your computer.

Why do you need it?

The terminal does wonders for a wide range of tasks. The first example that comes to mind is data management: the terminal enables you to create, move, remove and edit files and folders efficiently using a single line, to the point that you may rarely want to use a GUI for this purpose again. As you advance in your programming skills, you’ll find yourself compiling and running programs through the terminal, and doing more cool things, such as working remotely (check out The Cluster and Basic Unix Commands).

How to get it?

If you are a Windows user, you need to install a Unix-like shell such as Cygwin. Cygwin gives you a UNIX-style terminal (which is neat and powerful) to use in place of the Windows Command Prompt (which is unpleasant and less useful).

For Windows users:  Download the latest version of Cygwin from the following link:  https://cygwin.com/install.html, then click on the “setup*.exe” link.  If you have trouble during the installation process, you can find step by step guides (including screenshots) in the following links:

http://www.techomech.com/install-cygwin-on-windows-7/

http://wiki.rootzwiki.com/Step_by_step_guide_how_to_install_cygwin

For Mac users: You don’t need to install a thing! Just press “command + spacebar” and a search bar will pop up; type “terminal”, press enter, and voilà, a terminal will open.

Own the terminal! 

This last part may require some time and patience; however, there are plenty of available resources to help you become a terminal expert or at least become terminal conversant.    I found the following links especially useful:

This first link gives a quick course on terminal use: http://cli.learncodethehardway.org/book/

I also like this life-hacker link; it provides a list of common commands by topic:  http://lifehacker.com/5633909/who-needs-a-mouse-learn-to-use-the-command-line-for-almost-anything

Common terminal nicknames

When looking for more documentation, do not get confused by the lingo: terminal, console, shell, command line and command prompt are often used interchangeably.

Setting Borg parameters from the Matlab wrapper

In previous posts we talked about how to compile and run the Borg Matlab wrapper on Windows and OSX/Linux. To run the optimization after everything is compiled, the function call looks like this:

[vars, objs] = borg(11, 2, 0, @DTLZ2, 1000, zeros(1,11), ones(1,11), 0.01*ones(1,2), parameters);

In order, the arguments are:

- number of decision variables
- number of objectives
- number of constraints
- handle to function to be optimized
- number of function evaluations to perform
- decision lower bounds
- decision upper bounds
- epsilon values
- parameters

The last argument, parameters, is the topic of this post. This is a cell array that allows you to set the parameters without recompiling anything. For example, to set the random seed, the initial population size, and the DE step size, you might do this:

parameters = {'rngstate', 12345, 'initialPopulationSize', 200, 'de.stepSize', 0.3}

before running the borg() function with parameters as the last argument. Notice this follows the Matlab cell array options format of {key, value, key, value ...}, so there must be an even number of items in the cell array.
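The {key, value, key, value ...} rule can be illustrated with a short sketch. It is written in Python rather than Matlab purely to show the pairing logic, and it is not part of the Borg wrapper. The function name is hypothetical.

```python
def pair_options(flat):
    """Turn a flat [key, value, key, value, ...] list into a dictionary.

    Raises an error when the list has an odd number of items or when
    something other than a string sits in a key position.
    """
    if len(flat) % 2 != 0:
        raise ValueError("options must contain an even number of items")
    keys = flat[0::2]    # items 0, 2, 4, ... are the keys
    values = flat[1::2]  # items 1, 3, 5, ... are the values
    for key in keys:
        if not isinstance(key, str):
            raise TypeError(f"option key {key!r} is not a string")
    return dict(zip(keys, values))


options = pair_options(
    ["rngstate", 12345, "initialPopulationSize", 200, "de.stepSize", 0.3]
)
print(options["de.stepSize"])  # prints 0.3
```

A check like this before calling borg() can catch a missing value early, for instance a key left at the end of the list without its value.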

Using the nativeborg.cpp file I tracked down all of the parameters (string keys) that the user can choose:

'rngstate': random seed
'initialPopulationSize'
'minimumPopulationSize'
'maximumPopulationSize'
'injectionRate'
'selectionRatio'
'maxMutationIndex'
'frequency': (for saving approximation sets during runtime)

// these two are enumerated types in the C code, not advised to change them
'restartMode'
'probabilityMode'

// operator-specific parameters
'pm.rate'
'pm.distributionIndex'
'sbx.rate'
'sbx.distributionIndex'
'de.crossoverRate'
'de.stepSize'
'um.rate'
'spx.parents'
'spx.epsilon'
'pcx.parents'
'pcx.eta'
'pcx.zeta'
'undx.parents'
'undx.zeta'
'undx.eta'

That’s all, thanks to Liz Houle at Boulder for the suggestion.

latexdiff: “track changes” for LaTeX

If you’re looking to display changes to a LaTeX document in different colors, say for a paper or thesis revision, you might be interested in latexdiff. This is a command-line tool that most likely came packaged with your LaTeX distribution—try running latexdiff --version in your terminal to see if you have it.

The idea is simple, just run:

latexdiff original.tex revised.tex > diff.tex

Then compile diff.tex into a PDF just like you would any LaTeX file. The result should look something like this:

This is a great tool! It isn’t completely foolproof, and you will sometimes encounter errors when compiling diff.tex, especially if you’ve changed citations between the original and revised versions. These can be resolved manually by editing diff.tex, just keep track of the line numbers where the errors are occurring.
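If you produce diffs often, the command above can be wrapped in a small script. This is a minimal Python sketch that assumes latexdiff is installed and on your PATH. The function names are hypothetical and the file names are the ones used in this post.

```python
import shutil
import subprocess


def latexdiff_command(original, revised):
    """Build the argument list for: latexdiff original.tex revised.tex"""
    return ["latexdiff", original, revised]


def write_diff(original, revised, output="diff.tex"):
    """Run latexdiff and save what it prints to `output`."""
    if shutil.which("latexdiff") is None:
        raise FileNotFoundError("latexdiff was not found on the PATH")
    with open(output, "w", encoding="utf-8") as handle:
        # latexdiff writes the marked-up document to standard output,
        # so it is redirected into the output file here.
        subprocess.run(
            latexdiff_command(original, revised), stdout=handle, check=True
        )
```

Calling write_diff("original.tex", "revised.tex") should leave a diff.tex in the current folder, ready to be compiled like any other LaTeX file.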

A more detailed summary of options for latexdiff is available here:
https://www.sharelatex.com/blog/2013/02/16/using-latexdiff-for-marking-changes-to-tex-documents.html

Thanks for reading!