Getting started with Git and GitHub can be overwhelming. The intent of this post is to provide basic background information and easy-to-follow instructions for a new user of Git and GitHub. After reading this post, I recommend reading Jon Herman’s Intro to git: Part 1 and Part 2 posts for additional information, including greater detail on important commands. Joe Kasprzyk’s post on GitHub Pages is also helpful.

What are Git and GitHub?

Git is an open source VCS (Version Control System). What does that mean? Essentially, it is a tool for managing and sharing file revisions. It may be used for code as well as other file types, such as Microsoft Word documents. Version control is important in group programming collaboration, so you definitely want to “git” on Git. Git is a distributed VCS, which allows you to push (share) and pull (acquire) version changes to and from a remote shared copy. Thus, you may work on your own changes to shared code locally, with the options of sending revisions to the remote master copy and incorporating collaborators’ changes from the remote copy into your local copy. Although Git is particularly useful for code collaboration, it is also beneficial for individual use to reduce headaches from losing changes or breaking code. To learn more about Git and how it differs from other VCSs, please see the Getting Started – Git Basics section of the Git Reference Book.

So, what is GitHub? GitHub hosts Git repositories (essentially project folders) and offers additional collaboration features. BitBucket is another example of a Git host. Since repositories on GitHub are public by default (private repositories are not free), other users can see how your code is evolving over time and offer input – this is the real power of Git / GitHub.

To use GitHub, you must first install Git and then set up GitHub. Both are typically operated from a shell, a command line interface that lets you communicate with the operating system through typed commands rather than by point-and-click. However, if you are uncomfortable using the command line, there are GUIs available for both Git and GitHub.

Basic Terminology

There is quite a bit of lingo that you will want to get a handle on before continuing onward. Below, I have provided a boiled down list of terms you need to know to get started.

Repository (or Repo): Location or “folder” for a project’s files and revision history

Fork: Copy of (or to copy) another user’s repo for you to use and/or edit without affecting the original repo

Clone: Copy of (or to copy) a repo on your local machine rather than on a server

Remote: Copy of a repo on a server that can be updated through syncing with local clones

Master Branch: Primary version of a repo

Branch: Parallel version of a repo that allows you to make changes without affecting the master version

Upstream / Downstream: Upstream refers to previous versions or primary branches and downstream refers to changes on forks or branches you are working on.

Merge: Applying the changes from one branch to another

Commit: Change (or revision) made to a repo. Be sure to write a clear commit message when “saving” or making the commit so that the next user understands the changes.

Pull: Taking changes from a remote repo and merging them with your local branch

Pull Request: Method to propose changes to a remote repo so that its owner or collaborators can review and merge them

Push: Sending updates to a remote repo

Owner: Original creator of a repo

Collaborator: One that is invited to contribute to a repo by the owner

Contributor: One that has contributed to a repo without collaborator access

Steps to Get Started

Follow the outlined steps below to get up-and-running on Git / GitHub. Please provide comments if any steps are unclear.

1. Create a GitHub login

Go to https://github.com/ then pick a username, type in your e-mail address, and create a password to Sign Up for GitHub. Make sure that you use this same e-mail address to set up your identity on Git in Step 3.

2. Install Git

Visit http://git-scm.com/downloads and select the download that is right for your system.

For Windows Installation:

Leave the default components
Opt to use “Git from Git Bash only” to prevent changes to your PATH

3. Set Up Git

After the installation is complete, open Git Bash (Windows) or the terminal (Mac / Linux). Bash is a UNIX shell, which means that you use Unix commands rather than the Windows commands typically used at the Command Prompt.

First, you want to make a few configuration changes to set up your identity so that your commits are labeled. Since you will be using GitHub, no other setup is required for Git.


$ git config --global user.name "Your Name in Quotes"

$ git config --global user.email "Your E-mail in Quotes"
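If you set up several machines, you may prefer to script this step. The sketch below wraps the two commands above in Python; the function names identity_commands and apply_identity are made up for illustration, and it assumes git is already installed and on your PATH.

```python
import subprocess


def identity_commands(name, email):
    """Build the two `git config` commands as argument lists (nothing is run)."""
    return [
        ["git", "config", "--global", "user.name", name],
        ["git", "config", "--global", "user.email", email],
    ]


def apply_identity(name, email):
    """Run both commands; raises CalledProcessError if git reports a failure."""
    for cmd in identity_commands(name, email):
        subprocess.check_call(cmd)
```

Calling apply_identity("Your Name", "you@example.com") changes your global Git settings, so use the same e-mail address you registered with GitHub in Step 1.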

Second, you want to authenticate with GitHub from Git, which means that you will select a communications protocol, HTTPS or SSH, that will allow you to connect to a GitHub repo from Git. Based on your choice, GitHub has very clear instructions on set-up found at https://help.github.com/articles/set-up-git.

4. Download GitHub Desktop Client

If you would like to limit time using the command line, you will want to download the GitHub desktop client (for Windows or for Mac). This is especially helpful if you want to clone with SSH because the desktop client will configure SSH keys for you without use of the command line.

What’s Next?

You are all set to start using Git / GitHub to collaborate on code. You will want to practice creating a repo, forking a repo, making a commit, etc – follow Jon Herman’s posts, Intro to git: Part 1 and Part 2.

Some other helpful resources include:

Git Reference Manual, Book, and Videos

GitHub Help – Bootcamp

How the Heck Do I Use GitHub? – Lifehacker (Adam Dachis)

September 25, 2014 by Jon Herman

Introduction to mpi4py

If you like parallel computing, and you like Python, chances are you’ll like mpi4py. It’s a Python package that provides MPI bindings, which is very helpful if you’re trying to pass arrays or dictionaries between processors without worrying about the details as you would in a lower level language. Installation can be tricky, but we already have mpi4py installed on the Cube cluster for those of you who have accounts.

Getting started is easy:

from mpi4py import MPI
comm = MPI.COMM_WORLD
# print() with a single formatted string works in both Python 2 and Python 3
print("Hello from rank %d out of %d !" % (comm.rank, comm.size))
comm.Barrier() # wait to sync here (not needed for this example)

Then in your submission script:

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l nodes=4:ppn=16
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun python myfile.py

And the output:

...
Hello from rank 12 out of 64 !
Hello from rank 14 out of 64 !
Hello from rank 13 out of 64 !
Hello from rank 51 out of 64 !
Hello from rank 52 out of 64 !
...

Let’s try something more interesting. In this example we’ll do a set of parallel runs of a very simple linear rainfall-runoff model (one bucket, where dS/dt = -kS). The only free parameter is k, so we’ll sample a range of values in the interval (0,1] based on the rank of each process.

(Aside: this will be an example of a model built on top of stockflow, which is a Python wrapper for solving system dynamics ODEs. You can read more about the linear reservoir example in this notebook.)
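If you want to see what the one-bucket model does before bringing in stockflow or MPI, here is a minimal standalone sketch. It is not how stockflow solves the problem: it assumes a simple explicit Euler step, ignores ET, and uses a made-up rainfall pulse instead of the Leaf River data. The names run_reservoir and rmse are mine.

```python
import math


def run_reservoir(k, precip, s0=0.0, dt=1.0):
    """Explicit-Euler solution of dS/dt = P(t) - k*S; returns storage and outflow."""
    storage, outflow = [], []
    s = s0
    for p in precip:
        q = k * s                # outflow is proportional to current storage
        storage.append(s)
        outflow.append(q)
        s = s + dt * (p - q)     # update storage for the next step
    return storage, outflow


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length sequences."""
    n = len(simulated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(simulated, observed)) / n)


# Made-up input: a single 10 mm pulse of rain on day 0, then dry weather.
precip = [10.0] + [0.0] * 9
storage, outflow = run_reservoir(k=0.5, precip=precip)
print(storage[:4])  # [0.0, 10.0, 5.0, 2.5]
```

With k = 0.5 the bucket loses half of its storage every step once the rain stops, which is the discrete version of the exponential decay you would expect from dS/dt = -kS.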

Preamble:

from __future__ import division
from stockflow import simulation
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

Set up the model, with only one state variable (or “stock”):

# Model setup - linear reservoir
tmin = 0
tmax = 365
dt = 1
t = np.arange(tmin,tmax,dt)

data = np.loadtxt('leaf-river-data.txt', skiprows=2)
data_P = data[tmin:tmax,0]
data_PET = data[tmin:tmax,1]
data_Q = data[tmin:tmax,2]

s = simulation(t)
s.stocks({'S': 0})

Based on the processor’s rank, assign a value of the parameter k:

k = (comm.rank+1)/comm.size
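To see which values this assignment produces, you can list them without MPI at all. A small sketch, where k_for_rank is a name I made up and size is set to the 256 processes used in the job script further down:

```python
def k_for_rank(rank, size):
    """Parameter value assigned to a given MPI rank: (rank + 1) / size."""
    return (rank + 1) / float(size)


size = 256  # 16 nodes x 16 processors per node
ks = [k_for_rank(r, size) for r in range(size)]
print(ks[0])   # smallest k, from rank 0: 0.00390625
print(ks[-1])  # largest k, from the last rank: 1.0
```

Because ranks start at 0, adding 1 keeps k away from zero and gives the last rank exactly k = 1. For example, rank 46 out of 256 gets k = 47/256, or about 0.183594.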

Define flows, run the model, and calculate RMSE:

# Flows: precip, ET, and streamflow
s.flow('P', start=None, end='S', f=lambda t: data_P[t])
s.flow('ET', start='S', end=None, f=lambda t: min(data_PET[t], s.S))
s.flow('Q', start='S', end=None, f=lambda t: k*s.S)
s.run()

RMSE = np.sqrt(np.mean((s.Q-data_Q)**2))
comm.Barrier()

At this point we have completed model results sitting on every processor, but we’d like to collect them all on the root node to do some analysis, for example to find the best RMSE value. We can do this with MPI’s gather operation. There is a great beginner tutorial here describing the basic MPI operations.

Qs = comm.gather(s.Q, root=0)
RMSEs = comm.gather(RMSE, root=0)
ks = comm.gather(k, root=0)

We’ve now collected all of the results onto the root node (0). Let’s find the best and worst values of k and print them.

# Only the root has the gathered lists; every other rank received None
if comm.rank == 0:
    best = np.argmin(RMSEs)
    worst = np.argmax(RMSEs)
    print("best k = %f" % ks[best])
    print("best rmse = %f" % RMSEs[best])
    print("worst k = %f" % ks[worst])
    print("worst rmse = %f" % RMSEs[worst])
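The check on the rank matters because gather only returns the list on the root; on every other process it returns None. Here is a serial stand-in you can run on a laptop to convince yourself. The function fake_gather and the numbers are invented for illustration, and plain min/max are swapped in for np.argmin/np.argmax so that nothing needs to be installed.

```python
def fake_gather(values_by_rank, rank, root=0):
    """Mimic comm.gather: the root gets a rank-ordered list, everyone else gets None."""
    return list(values_by_rank) if rank == root else None


# Made-up results for a 4-process run, indexed by rank.
rmse_by_rank = [3.1, 2.4, 2.9, 4.5]
k_by_rank = [0.25, 0.5, 0.75, 1.0]

for rank in range(4):
    RMSEs = fake_gather(rmse_by_rank, rank)
    ks = fake_gather(k_by_rank, rank)
    if rank == 0:
        best = min(range(len(RMSEs)), key=RMSEs.__getitem__)   # like np.argmin
        worst = max(range(len(RMSEs)), key=RMSEs.__getitem__)  # like np.argmax
        print("best k = %f" % ks[best])    # best k = 0.500000
        print("worst k = %f" % ks[worst])  # worst k = 1.000000
```

Without the rank check, every non-root process would try to index into None and crash.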

Pretty simple! If we run this with the following job script:

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l nodes=16:ppn=16
#PBS -j oe

cd $PBS_O_WORKDIR

mpirun python linear_reservoir.py

… it runs on 256 processors almost instantly, and we see this output:

best k = 0.183594
best rmse = 2.439382
worst k = 1.000000
worst rmse = 4.470532

Then we can come back locally and plot the best and worst hydrographs to see what they look like:

This looks a little messy, but you can see that the “best” model run matches the observed data much more closely than the “worst”, which overshoots the peaks considerably.

That’s all for now. Happy HPCing!

September 18, 2014 by Jon Herman

Scientific figures in Illustrator

Once you have a figure sequence almost ready to publish in a journal, it’s time to make your figures look good. This may sound like a vain exercise, but if you consider that the journal article will be around for a long time, it’s worth it. Publication-quality scientific figures are typically in vector format (SVG, PDF, or EPS files) rather than raster format (JPEG, PNG, TIFF, and others) which may become pixelated if resized. You can read this discussion of raster vs. vector formats if you’re interested, or just take my word for it that vector is what you want.
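One way to convince yourself that a vector file is just a description of shapes is to write one by hand. The sketch below builds a tiny SVG containing one exponential decay curve using nothing but string formatting. The function name decay_polyline and the 200 x 100 drawing size are arbitrary choices of mine.

```python
import math


def decay_polyline(rate, width=200, height=100, n=21):
    """Return an SVG <polyline> tracing y = exp(-rate * x) for x in [0, 5]."""
    points = []
    for i in range(n):
        x = 5.0 * i / (n - 1)
        y = math.exp(-rate * x)
        # SVG's y axis points down, so flip it to put y = 1 at the top.
        points.append("%.1f,%.1f" % (x / 5.0 * width, (1.0 - y) * height))
    return '<polyline fill="none" stroke="black" points="%s"/>' % " ".join(points)


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">'
    + decay_polyline(1.0)
    + "</svg>"
)
print(svg)
```

If you save the printed text to a file ending in .svg and open it in a browser, you can zoom in as far as you like and the line stays sharp, because the file stores coordinates rather than pixels.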

Programs capable of editing vector graphics include Adobe Illustrator (paid) and Inkscape (free). This post will be about Illustrator, but keep in mind that Inkscape offers much of the same functionality. You cannot edit vector images with photo editing programs like Photoshop or GIMP, nor can you make publishable vector images with MS Office programs like Excel or PowerPoint. This post will take a figure created in Matlab and clean it up into something that will look nice in a publication using Illustrator.

If you run the following Matlab code …

x = 0:0.1:5;

y(1,:) = exp(-1*x);
y(2,:) = exp(-0.5*x);
y(3,:) = exp(-1.5*x);

h = plot(x,y);
legend(h, {'Thing 1', 'Thing 2', 'Thing 3'});
grid on;
xlabel('Time (hours)');
ylabel('Amount of thing');
title('My plot title');

… it will give you this figure:

Default figure style. Bleh.

You’ve seen this before: the text is too small, the grid lines are noisy, the colors are uninspiring, the plot lines are too thin … and so on. Some of those issues you can fix from inside Matlab, which is a good idea if you plan to regenerate your plot several times. (An example of doing this is available here in the Matlab Plot Gallery). But here we’re going to do all the work in Illustrator instead (which as you may have gathered is not a very “repeatable” process, so only do it once you agree with coauthors on the general layout of the figure). If you’re following along at home, save your Matlab figure in .eps format. Unfortunately Matlab does not have an option to save in SVG format as of this post, which is typically the option of choice in Python or R.

Go find the .eps figure you just created, and open it with Illustrator. If your file extensions aren’t associated with Illustrator, you may need to right-click and select Open With / Adobe Illustrator. When opening the file, you may see some warnings about missing fonts, just click OK and proceed. Note that we are not “inserting” the file into an existing Illustrator document—we are opening the file itself. You should be greeted by our lovely figure, inside of what may be an overwhelming interface the first time you see it. Let’s break down the toolbar on the left-hand side to start with, and we’ll cover other options as we go.

These annotations were professionally done.

This is only the top of the toolbar, but you’ll be using these “tools” (cursor types) probably 95% of the time. The select cursor is the default, and when in doubt you should return to it. The text, line, and shape options are for adding new things to your drawing, which is not so different from, say, Powerpoint. Stay on “Select” for now. Let’s get cleaning.

Step 0: Ungroup the main parts of the figure

When Matlab exports EPS files, it tends to group different parts of the figure together. This can be somewhat unpredictable and makes editing difficult. You’ll notice if you left-click any part of the figure, it will select everything. So, one of the first things you’ll want to do when editing almost any figure is to “ungroup”, which you can do by right-clicking any part of the figure and selecting the ungroup option:

How to ungroup the elements of the figure (everything is grouped by default).

Now you should be able to select individual parts of the figure. Matlab may have saved a big white rectangle in the background of your figure (you can see it selected around the border of the above image), which you should feel free to delete. Illustrator’s white “Artboard” will serve as our background, and you can edit the artboard size by pressing Shift+O.

Step 1: Replace all text

This may sound harsh, but hear me out. What we really want to do is just resize the text, or maybe change the font type, nothing major. But what usually happens here is that the EPS file splits text objects into separate boxes, especially when a decimal point is involved. Which means if you resize, you get the following:

The text elements are split into two, so resizing won’t work. Need to start from scratch.

This is not an indictment of Matlab specifically—in fact, when Matplotlib (Python) exports SVGs, it usually saves text as vector paths, which aren’t fonts at all! Rather than trying to repair this strangeness, just replace all of the text yourself. Choose a clean sans-serif font, unless you want to use Computer Modern serif for equations. I like Gill Sans for figures (Gill Sans MT on Windows), but your mileage may vary. My rule of thumb for text sizing is: Title > Axis labels = Legend > Tick labels, something like this:

Same figure with fonts and text sizes fixed.

That’s already so much better, just by improving the text readability. It looks like we actually spent some time on it, instead of just copypasting the raw Matlab output. You can insert text using the text cursor (shown on the toolbar figure above), and edit its properties from the Window/Type/Character window, pictured below, which you’ll usually want to keep visible.

The Type/Character window is where you change font types, sizes, and tracking.

A quick digression: the other window you’ll usually want to keep open is the Align window, which looks like this:

The align window is a must for lining up multiple elements. Don’t try to do it by hand.

Those little button symbols are actually pretty intuitive. If you select two or more objects, then press one of these “alignment” buttons, it will either left-, center-, or right-align them horizontally (the left three buttons) or vertically (the right three buttons). This comes in handy when you’re trying to line up tick labels that you’ve edited manually, or combining multiple EPS files as subplots in a larger figure file and you want to make sure they’re aligned properly. Even if you don’t learn how to do this right now, just remember that alignment shouldn’t be done manually.

Ok, digression over. When resizing text, pay particular attention to the aspect ratio (the width:height ratio of the object). Never ever resize these out of proportion. Either use the text size in the character window, shown above (recommended), or Shift+Drag when you resize, which will preserve the aspect ratio.

If you take one thing away from this post …

Step 2: Make grid lines solid and lighter

Right now, the grid lines are noisy and distracting—let’s clean them up. Fortunately the grid lines should all be grouped together, so we don’t need to do any fancy selection to get them all selected at the same time. Just left-click on any one of the grid lines and you should see them all highlighted. Then, on the right-hand toolbar, go to the “Stroke” options and deselect the “Dashed Line” checkbox:

Changing line weight in Illustrator.

(I’m not doing the snapshots in parallel with the actual saved figure versions, so the font formatting isn’t shown in this snapshot). This should turn the grid lines into solid lines. But wait, they’re solid black! This is even more distracting than before! Have no fear, just go to the “Color” options on the right-hand toolbar and knock it down to maybe 30% gray or so:

Changing stroke color in Illustrator.

While we’re at it, let’s also bump up the thickness of the plot lines. To select the plot lines, you will need to Ctrl+click, since they are grouped (somehow, bizarrely) with the plot itself. You can Ctrl+Shift+click to select the other lines after that, so that you have all three selected at once. Then go back to the “Stroke” options on the right-hand toolbar (pictured in one of the above images) and increase the line weight to 2px. You should now have a figure that looks something like this:

We’re getting there … thicker lines, thinner grid, and nicer fonts. Only a few steps to go.

This is already a terrific improvement over the original version. I would be fine with publishing it at this point. But, there are a few other things we can do to make it even better.

Step 3: Care about Colors

As we saw in the very first image, Matlab’s default colors are red, green, and blue. This is fine, but you’ll find if you move beyond the defaults, people appreciate the extra effort. Illustrator makes the colors a little less grating when it imports the EPS file, because it converts them to CMYK format and does some magic behind the scenes that I don’t fully understand. But let’s imagine the case where the three lines represent something changing along a continuum (which they are—the exponential decay coefficient, to be exact). In this case, we might want their colors to also lie along a continuous spectrum.

I know of no better place to find discrete color palettes than Colorbrewer (http://colorbrewer2.org/). I am certainly not the first person to sing its praises, so you can go read about it elsewhere. Basically you choose whether your data are “Sequential”, “Diverging”, or “Qualitative”, give it the number of colors you want, and it generates a well-spaced set of colors for you to use. The case I just described would fall into the “Sequential” category, so let’s grab a set of 3 colors (for our three lines) from the single hue blues:

Colorbrewer: if you’re not using it, you should be.

You can see in the bottom-right the RGB values for these three colors. You can change this to CMYK or HEX format if you prefer, but I’ll stick to RGBs in this example. Note if I were more responsible, I would have configured the colors of these lines in the Matlab script, so I could regenerate the figure as many times as I needed to. Oh well, we’re doing it in Illustrator now.
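If you do want to be more responsible and set the colors in your plotting script, you will usually need them in a different form than the 0-255 values Colorbrewer displays. A small conversion sketch follows. The three RGB triplets are what I believe Colorbrewer lists for the 3-class single-hue Blues scheme, so treat them as an assumption and check them against the site; the function names are mine.

```python
def rgb_to_hex(r, g, b):
    """Convert 0-255 RGB components to a '#rrggbb' string."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("RGB components must be in 0..255")
    return "#%02x%02x%02x" % (r, g, b)


def rgb_to_unit(r, g, b):
    """Convert 0-255 RGB components to 0-1 floats."""
    return (r / 255.0, g / 255.0, b / 255.0)


# Assumed values for the 3-class "Blues" scheme; verify before using.
blues = [(222, 235, 247), (158, 202, 225), (49, 130, 189)]
for rgb in blues:
    print(rgb_to_hex(*rgb))
```

Matlab expects colors as 0-1 triplets (the rgb_to_unit form), while Matplotlib accepts either the 0-1 triplets or the hex strings.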

We want to change the color of each line, along with its legend entry, to correspond to the three colors that Colorbrewer gave us above. To do this quicker, we’ll use a neat selection trick. First Ctrl+click to select one of the lines. Then go to the menu option Select / Same / Stroke Color to add the line’s legend entry to your selection (see below). Now you can change the color of both of them at the same time!

Illustrator has an option to select all other elements in the figure that have the same (appearance). In this case we want the same stroke color.

How do we set their color, you ask? Go back to the “Color” options on the right-hand toolbar. The color format might be set to something other than RGB, like CMYK. To change it to RGB, click the small options box in the upper-right corner and select “RGB”:

Change the color format to RGB.

Then enter the RGB values from Colorbrewer. Repeat for the other two plot lines, and you’ll end up with something like this:

Same figure as before, but with a “sequential” color palette. Useful if the lines represent some variable changing along a continuum.

Step 4: “Flat” Legend and Outer Box

At this point, I only have a few remaining grievances about this figure. First, the legend is too small. Second, there is a ton of whitespace in the upper right of the figure that is being wasted. Third, the solid black borders around the legend box, and around the plot itself, are distracting attention from the plot lines (which should be the focus, after all). Let’s see what we can do about this.

Select the legend and its components. Since we ungrouped everything at the beginning of this exercise, this will be annoying. Remember you can Shift+Click to select multiple elements at once. To make it easier to edit the legend, just move it outside the plot area for now. (Once you’re more experienced with Illustrator you can use layers to more easily edit “stuff” that’s sitting on top of other “stuff”).

To make the legend bigger (proportionally), it’s fine to select everything and Shift+Drag one of the corners of the bounding box. This will rescale the text proportionally as well. Make sure after you do this that the line thickness in the legend matches the thickness of the plot lines themselves (an important technicality), using the “Stroke” options in the right-hand toolbar.

Things get a bit confusing here because there may actually be two rectangles behind our legend—one defining the black border, and one defining the white background. If this seems to be the case in your figure (it is in mine), delete the black border completely, and then select the white background. Go to the “Color” options on the right-hand toolbar and change from white to a very light gray, maybe around 5% or so:

Changing the fill color of a rectangle.

See those two squares in the top-left of the Color options box? Those are the fill and stroke, respectively. You can toggle back and forth between the two by clicking them. The icon with the red line through it means there is “no fill” (or “no stroke”, respectively). The options for “none”, “black”, and “white” are always visible at the bottom because these are so common. Otherwise, you choose your own color. If you wanted something besides grayscale here, you could change the color format like we did before.

Take your enlarged legend and move it back inside the plot box. Here’s mine:

The “flat” legend at work. It solves our whitespace problem and is also nicer to look at.

Some may call it a fad, but I absolutely love these “flat” legends. Borders around legends are distracting, whereas a borderless light rectangle looks so clean. It can sit on top of the plot grid lines with no problem at all. And, by including the title inside the legend box, we’ve saved a bunch of whitespace at the top of the figure, too. I also rearranged the legend entries to go in the same vertical order as the plot lines—not required, but I think it makes sense here, especially if you pretend they’re called something besides “Thing 2”.

As a final step in our crusade against dark borders, let’s change the outer box to the same line style as the grid lines. For the record, I’m not completely sure I like the outcome of this, but it’s at least worth discussing. Select one of the black tick marks with Ctrl+Click (they are grouped with something else) and then use the Select / Same / Stroke Color to select all of the remaining black lines in the plot. See if you can remember how to change their line width to 0.333px (matching the grid lines), and their color to grayscale 30%. You should end up with something like this:

The final result! Send it off to Nature and take a well-deserved break.

The border of the plot doesn’t announce itself anymore, but I consider this a good thing. The focus of the plot should be on the values being plotted, not on the box and grid lines. As I said before, this may be overkill, but now this seems like a polished, easily interpretable figure that I would be happy to see in a journal.

By the way, if you save your figure as a PDF and want to include it in a LaTeX document, it’s easy:

\begin{figure}\begin{center}
\includegraphics[width=1.0\columnwidth]{my-figure-filename.pdf}
\caption{My Caption.}
\end{center}\end{figure}

That’s all for this tutorial. Go back and look at the original plot at the top of this post, and then at the final product we just made—what a difference! Don’t despair if it takes you a while to do this, because eventually you’ll learn where all of the options are and you’ll get much faster. Vector graphics are definitely worth learning, partly for clear scientific communication, but also because they just plain look nice.

September 4, 2014 by benlivneh

Running jobs on the supercomputer: JANUS

The power of supercomputing is undeniable. However, there is often a hurdle in syntax to get jobs to run on them. What I’m including below are ways to submit jobs to run on the CU-Boulder supercomputer, JANUS, which I hope will be helpful.

To log on, open up a terminal window (e.g. Terminal on a Mac or Cygwin on a PC) and type:

ssh <username>@login.rc.colorado.edu

To copy items to JANUS from a shell, simply use the following:

scp <path and filename on local machine> <username>@login.rc.colorado.edu:<destination path on JANUS>/

The purpose of the job script is to tell JANUS where to run the job. I will cover two types of job scripts, (1) to submit a job to an entire node, and (2) to submit to a single processor. Note, nodes on JANUS contain multiple processors, usually more than 12, so that if you have a memory intensive job you may wish to submit the former. Also, the jobs that occupy entire nodes offer the user a larger number of total processors to work with (several thousand cores versus several hundred). Nevertheless, here are the examples:

1. Example script to submit to a node is below: The body of text should be saved to a text file with a “.sh” suffix (i.e. shell script). Also notice that lines that begin with “#” are not read by the program, but rather are for comments/documentation. To submit the script, first be sure you’ve loaded the slurm module:

module load slurm

sbatch <path and filename of script>

#!/bin/bash
# Lines starting with #SBATCH are interpreted by slurm as arguments.
#

# Set the name of the job, e.g. MyJob
#SBATCH -J MyJob

#
# Set a walltime for the job. The time format is HH:MM:SS - In this case we run for 12 hours. **Important, this length should be commensurate with the type of node
# you're submitting to, debug is less than 1 hour, but others can be much longer, check the online documentation for assistance

#SBATCH --time=12:00:00
#
# Select one node
#
#SBATCH -N 1

# Select 12 tasks per node (similar to 12 processors per node)
#SBATCH --ntasks-per-node 12
# Set output file name with job number

#SBATCH -o MyJob-%j.out

# Use the standard 'janus' queue. This is confusing as the online documentation is incorrect, use the below to get a simple 12 core node

#SBATCH --qos janus

# The following commands will be executed when this script is run.

# **Important, in order to get 12 commands to run at the same time on your node, enclose them in parentheses "()" and follow them with an ampersand "&"

# to get all jobs to run in the background. Finally, be sure to include a "wait" command at the end, so that the job script waits to terminate until these

# jobs complete. Theoretically you could have more than 12 commands below.

# ** Note replace the XCMDX commands below with the full path to your executable as well as any command line options exactly how you'd run them from the

# command line.

echo The job has begun

(XCMD1X) &

(XCMD2X) &

(XCMD3X) &

(XCMD4X) &

(XCMD5X) &

(XCMD6X) &

(XCMD7X) &

(XCMD8X) &

(XCMD9X) &

(XCMD10X) &

(XCMD11X) &

(XCMD12X) &

# wait ensures that job doesn't exit until all background jobs have completed

wait
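The "&" plus "wait" pattern in this script is worth understanding on its own: start everything, then block until everything is done. If your job is driven from Python rather than from the shell, the same idea can be sketched with the standard subprocess module. The function name run_in_parallel is made up, and the 12 tiny Python processes are only placeholders for your real XCMD commands.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, then block until all have finished."""
    # Popen returns immediately, like putting "&" after a shell command.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks per process, like the shell's "wait" at the end of the script.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder commands standing in for XCMD1X..XCMD12X.
    commands = [[sys.executable, "-c", "print(%d)" % i] for i in range(1, 13)]
    exit_codes = run_in_parallel(commands)
    print(exit_codes)
```

The returned list holds one exit code per command, in the order the commands were given, so a nonzero entry tells you which of the runs failed.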

2. Example script to submit to a single processor is below. The process is almost identical to above, except for 4 things: (i) the queue that we’ll submit to is called ‘serial’, (ii) number of tasks per node is 1, (iii) the number of executable lines is 1, and (iv) we do not need the “wait” command.

#!/bin/bash

# Lines starting with #SBATCH are interpreted by slurm as arguments.

#

# Set the name of the job, e.g. MyJob

#SBATCH -J MyJob

#

# Set a walltime for the job. The time format is HH:MM:SS - In this case we run for 6 hours. **Important, this length should be commensurate with the type of node

# you're submitting to, debug is less than 1 hour, but others can be much longer, check the online documentation for assistance

#SBATCH --time=6:00:00

#

# Select one node

#

#SBATCH -N 1

# Select one task per node (similar to one processor per node)

#SBATCH --ntasks-per-node 1

# Set output file name with job number

#SBATCH -o MyJob-%j.out

# Use the standard 'serial' queue. This is confusing as the online documentation is incorrect, use the below to get a single processor

#SBATCH --qos serial

# The following commands will be executed when this script is run.

# ** Note replace the XCMDX commands below with the full path to your executable as well as any command line options exactly how you'd run them from the

# command line.

echo The job has begun

XCMDX

September 2, 2014 by JR Kasprzyk

Using a local copy of Boost on your cluster account

Boost is a set of libraries for C++. It increases the language’s functionality, allowing you to do all sorts of interesting things (for example it has lots of random number generators). Boost may already be installed on your local research computing cluster. But there are several reasons why it may be a good idea to have your own copy of Boost to use within your user account:

It may be difficult or impossible to actually find the location of your computer’s Boost libraries.
New Boost functions are introduced with each new version of the library. So what if you want to use a function that came out in a later version (e.g., 1.56.0) that is not in the version installed on your cluster?
Perhaps you want to be able to see the source code of the Boost functions within your own account, to better understand how they work.

If so, it’s easy enough to download Boost to your local computer, then upload the files to your user account. Click “current release” on the main Boost website (see the link above). Then download the files to your computer. If you’re on a Windows machine, use a program like 7-zip to unpack all the files (or simply keep the tgz file and unpack it on the cluster, which is probably faster anyway). Then, upload the Boost files to the cluster. I recommend placing the boost_1_56_0 folder inside a lib/ folder in your home directory, so that all your libraries can be in one place.

Here’s the important part: any time you use Boost you need to point to where the libraries are stored. Because of that, you’ll need to know the path of Boost, that is, where the files “live” on the cluster. There is probably a command in your makefile already that starts with -I. All you have to do is add your new Boost path to the command; on my system it looks something like:

CPPFLAGS=-I/home/myUsername/lib/boost_1_56_0
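If you are ever unsure which version a given include path actually provides, Boost records it in the header boost/version.hpp as a single integer called BOOST_VERSION (for example 105600). Here is a small sketch that decodes it; the function names are mine, and the header text is passed in as a string so you can try it without a Boost installation.

```python
import re


def decode_boost_version(number):
    """Split Boost's integer BOOST_VERSION into (major, minor, patch)."""
    return (number // 100000, number // 100 % 1000, number % 100)


def version_from_header(header_text):
    """Find '#define BOOST_VERSION <n>' in the text of boost/version.hpp."""
    match = re.search(
        r"^\s*#\s*define\s+BOOST_VERSION\s+(\d+)", header_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("BOOST_VERSION not found")
    return decode_boost_version(int(match.group(1)))


print(version_from_header("#define BOOST_VERSION 105600\n"))  # (1, 56, 0)
```

To check a real copy, read the contents of boost/version.hpp under your Boost folder (on my system that would be inside /home/myUsername/lib/boost_1_56_0) and pass the text to version_from_header.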

That’s it! Comments, questions, and concerns can be left below.

Water Programming: A Collaborative Research Blog

Tips and tricks on programming, evolutionary algorithms, and doing research

Month: September 2014
