All of the Analysis Code for my Latest Study is on GitHub

I’ve published to GitHub all of the code I wrote for the paper I’m currently working on.  This includes:

  • Python PBS submission script
  • Python scripts to automate reference set generation using MOEAFramework
  • Python scripts to automate hypervolume calculation using MOEAFramework and the WFG hypervolume engine
  • Python / Pandas scripts for statistical summaries of the hypervolume data
  • Python scripts to automate Sobol’ sensitivity analysis using MOEAFramework and tabulate the results.  (If I were starting today, I’d have an SALib version too.)
  • Python / Pandas / Matplotlib figure generation scripts:
    • Control maps for hypervolume attainment
    • Radial convergence plots (“spider plots”) for Sobol’ global sensitivity analysis results
    • Bar charts for Sobol’ global sensitivity analysis results
    • CDF plots (dot / shaded bar, plus actual CDF plots) for hypervolume attainment
    • Parallel coordinate plots
    • Input file generation for AeroVis glyph plotting
    • Joint PDF plots for hypervolume attainment across multiple problems

Not all of the figures I mentioned will turn up in the paper, but I provide them as examples in case they prove helpful.

Intro to git Part 2: Remote Repositories

This is Part 2 of a multi-part series about how to use git for version control. Part 1 described how to get started with git for small projects. This post will detail how to set up and work with remote repositories. Additional posts may happen as needed.

Remote repositories can really speed up your workflow. Think about every time you’ve wanted to transfer files to a remote computer. My previous methods to accomplish this usually involved either using scp in the terminal, or using a Windows GUI to accomplish the same. In either case, you have to identify which files you want to transfer. This can become a burden when you’re making edits to your code. How do you know if you’re working on the local or remote file? How do you know if the directories are synced? Git solves all of these problems.

To set up remote repositories, there are two steps, one on the local system and one on the remote system. Let’s say you’re working with the same repository from Part 1, which just contained a one-line text file.

You can do these two steps in either order, but let’s start by creating a bare repository on the remote system. Log in over ssh as usual and navigate to the directory where you want to host your repository. From that directory, run the following:

mkdir .git
cd .git
git init --bare

This creates a “bare” git repository in the .git directory. A bare repository is not meant to be worked in. It is only intended to receive updates from other repositories. Thus, we will be working on our local copies and pushing them to the remote repo.

Before you log off the remote system, there is one more confusing but necessary step. By default, you can push updates to this bare repo, but the code won’t be copied to any place where you can actually use it. To fix this, stay in your .git directory and create the file hooks/post-receive. As you might have guessed, this file will contain instructions for git to perform after it receives updates over the network. In this file, copy the following:

#!/bin/sh
GIT_WORK_TREE=./../ git checkout -f

(After you save the file, run chmod +x hooks/post-receive to make sure it has execute permissions.)

Basically, we are telling git to take the files it just received and check them out to the directory specified by GIT_WORK_TREE, which in this case is just one level up from here. Bare repositories by definition do not have a working tree, so you will need to preface most git commands by setting the work tree as shown here. For example, this applies if you want to switch branches in the remote repository.
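If you want to rehearse this setup before touching a cluster, here is a minimal sketch that does the whole thing on one machine. The scratch directory and the names site, local, and hello.txt are placeholders of my own, and a plain local path stands in for the user@host:path address you would use over ssh.

```shell
#!/bin/sh
# Sketch only: rehearse the bare repo + post-receive hook on a single machine.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory

# "Remote" side: a bare repository in site/.git with the hook from this post.
mkdir -p "$demo/site/.git"
cd "$demo/site/.git"
git init --bare --quiet
mkdir -p hooks
printf '#!/bin/sh\nGIT_WORK_TREE=./../ git checkout -f\n' > hooks/post-receive
chmod +x hooks/post-receive

# "Local" side: a one-file repository like the one from Part 1.
mkdir "$demo/local"
cd "$demo/local"
git init --quiet
git config user.name "Demo"                # identity for this scratch repo only
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add hello.txt
git commit --quiet -m "Initial commit"

# Push to whichever branch the bare repository's HEAD points at
# (master or main, depending on how git is configured).
remote_branch=$(git --git-dir="$demo/site/.git" symbolic-ref --short HEAD)
git push --quiet "$demo/site/.git" "HEAD:refs/heads/$remote_branch"

# The hook should have checked the file out one level above the bare repo.
cat "$demo/site/hello.txt"
```

If the hook ran, the file shows up in site/ next to the .git directory, which is where the checked-out code lands on the real remote system.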

That’s all you need to do to set up the remote repository. Now back on your local system, navigate to the directory containing your repository and run:

git remote

The remote command will list all remotes associated with this repository. In this case we don’t have any yet, so let’s add the one we just created.

git remote add <repo-name> <user>@<host>:<path/to/directory/.git>

Notice a few things about this command. First, repo-name can be called whatever you want, but you should probably name it based on the location of your remote repository. For example if your remote is located on the PSU clusters, you might call it psu. The second part of the command tells git to transfer files over ssh, using the standard user@host format. The path specified after the : symbol tells git where to find the repository you just created on the remote system.

This last part is important. Git will let you “add” remote repositories without checking whether they exist or not. This check will be performed, however, when you try to push updates. It’s up to you to make sure your path is specified correctly, and that the repository is set up already. That’s why we did that step first.
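One way to catch a bad path before you push is to ask git to contact the remote with git ls-remote. Here is a small sketch of that check. The remote names bogus and real and the scratch paths are made up for illustration, and local paths stand in for ssh addresses.

```shell
#!/bin/sh
# Sketch only: "git remote add" accepts any path; "git ls-remote" actually contacts it.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
git init --quiet "$demo/local"
cd "$demo/local"

# Adding a remote that does not exist succeeds without complaint.
git remote add bogus "$demo/does-not-exist/.git"

# ls-remote fails because there is no repository at that path.
if git ls-remote bogus >/dev/null 2>&1; then
    bogus_check=reachable
else
    bogus_check=unreachable
fi

# A remote that really exists (even an empty bare repo) passes the same check.
git init --bare --quiet "$demo/real.git"
git remote add real "$demo/real.git"
if git ls-remote real >/dev/null 2>&1; then
    real_check=reachable
else
    real_check=unreachable
fi

echo "bogus: $bogus_check"
echo "real:  $real_check"
```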

To see a usable example of adding a remote repo, I would do something like this:

git remote add psu jdh33@cyberstar.psu.edu:~/work/myDirectory/.git

The nice thing about the PSU clusters is that the distributed filesystem will still be available regardless of which cluster (Cyberstar, Lion-XO, etc.) you listed in your add remote command. Now that you’ve created a remote repository, you should be able to run:

git remote -v

and see your remote listed. (The -v flag toggles verbose output, which helpfully includes the host/path of the repository). At this point, you’re ready to push your code to the remote system:

git push <repo-name> <branch-name> (optional)

The optional branch-name parameter is used if you only want to push one branch at a time. I have found that for the initial push, I’ve needed to modify the command as follows (example):

git push psu +master:refs/heads/master

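which spells out the destination branch on the remote (refs/heads/master). The leading + forces the update even when it is not a fast-forward, so use it with care. You may need to do this every time you push a new branch for the first time. But after you do this the first time, you should be able to just run: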

git push psu

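to push to the remote system. What this sends depends on your git version: before git 2.0, the default was to push every local branch that has a same-named branch on the remote, while git 2.0 and later push only the current branch by default. To push all branches explicitly, run git push --all psu. Since pushing is set up to work via ssh, you may be prompted for your remote password when you run these commands unless you have ssh keys enabled (maybe a topic for another post). By the way, git push is another great thing to alias (“gp”, maybe?).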
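Which branches a plain git push psu sends depends on git's push.default setting, so here is a sketch that pushes every branch explicitly with --all and then counts what arrived. The scratch directory is a placeholder of my own, and a local bare repository stands in for the cluster.

```shell
#!/bin/sh
# Sketch only: push all local branches to a remote named psu and verify them.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
git init --bare --quiet "$demo/remote.git"    # stand-in for the remote system

git init --quiet "$demo/local"
cd "$demo/local"
git config user.name "Demo"                # identity for this scratch repo only
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add hello.txt
git commit --quiet -m "Initial commit"
git branch myOtherBranch                   # a second branch to push

git remote add psu "$demo/remote.git"
git push --quiet --all psu                 # push every local branch

# List the branches the remote now has and count them.
git ls-remote --heads psu
remote_branches=$(git ls-remote --heads psu | wc -l)
echo "branches on remote: $remote_branches"
```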

Now if you go check your remote repository, all of your files should be there! The cool thing about this is that git does not resend everything over the network on each push. It sends only the objects the remote is missing, compressed as differences against data that already exists. (In git lingo, these differences are called “deltas”). This results in fast, efficient transfer, without having to figure out which files changed, which need to be updated, etc., since git handles all of this for you.

A last note: you can have as many remotes as you want! Just add new ones using the git remote add command discussed above. This is especially helpful if you need to push changes to multiple remote systems, one after the other. Hopefully you can see the possibilities here; I’m just discovering them myself. Your mileage may vary, but this has really increased my efficiency in using the clusters. You can do all of your editing locally, and then push the changes wherever you need to very quickly.

This post has only covered one-way pushing to remote repositories. Of course you can pull from them, too! This back-and-forth sharing is the heart of standard git usage (distributed development), which would be a good topic for a future post.

Intro to git Part 1: Local version control

The Basics
git is a distributed version control system. I started using it for my projects (with ample starting assistance from Matt), and have been sufficiently convinced by its power and ease-of-use to begin spreading the word. This will be part 1 of a multi-part series, describing how to start using git for small projects — it’s still a useful tool even when you’re the sole contributor. Part 2 will describe how to set up and work with remote repositories. Additional parts may happen as needed. If you would like to know more about git and distributed version control in general, there’s a Google Tech Talk by Linus Torvalds (who led the effort to create git): http://www.youtube.com/watch?v=4XpnKHJAok8.

To get started, make sure that you have git installed on your system. On Linux (Debian-based), you can just do sudo apt-get install git. It’s available as a Cygwin package on Windows, and on PSU clusters using module load git (a good thing to put in your .bashrc if you’re going to be using it a lot). I know there are also straightforward installation processes for native Mac and Windows if you search around for them.

Let’s start out with a simple test repository. (If you prefer, you can also follow these steps using an existing project). Go to an empty directory and create a text file with something written in it, say “Hello World”. Now initialize version control by navigating to this folder in your terminal and typing:

git init

This creates a hidden folder, .git/, which tracks all the changes to the repository. Note that our text file has not been added to the repository yet. You can see this by running:

git status

This is a useful command that you can run at any time to see which files have changed in this repository, and which files in the folder are untracked. You should see that your text file is listed as untracked. Add it to the repository using:

git add yourFile.txt

You can also use wildcards here if you want to add all files and subdirectories to the repository. If you have any files or directories that you want to be ignored by version control, you should create a file called .gitignore and list them there, one file or directory per line.
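As a quick illustration, here is a sketch of a .gitignore in action. The output/ directory and the *.tmp pattern are examples I made up, so swap in whatever your project generates.

```shell
#!/bin/sh
# Sketch only: ignore a directory and a file pattern, then add everything else.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
cd "$demo"
git init --quiet

echo "Hello World" > yourFile.txt
mkdir output
echo "scratch" > output/run1.log           # generated output we do not want tracked
echo "temp" > notes.tmp                    # temporary file we do not want tracked

# One pattern per line: a directory and a wildcard.
printf '%s\n' 'output/' '*.tmp' > .gitignore

git add .                                  # add everything that is not ignored
tracked=$(git ls-files | tr '\n' ' ')
echo "tracked: $tracked"
```

Only .gitignore and yourFile.txt end up staged. It is usually worth committing the .gitignore file itself so the same rules apply wherever the repository goes.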

Our text file has been added to the repository, but hasn’t been committed yet! (“Adding” and “committing” is a two-step process, but we’ll circumvent this in a minute). To commit your file for the first time, do the following:

git commit yourFile.txt -m "Initial commit"

The -m flag specifies the message to go along with this commit. If you don’t use the -m flag, git will open your default text editor and ask you to enter a commit message. This is sort of a pain, so it’s better to write your message as part of the command. Writing good commit messages can help you remember what you were doing, so it’s usually worth the extra time to give a good description of your latest commit. Convention says that commit messages should be written in the imperative mood, e.g. “Fix segfault in main()” rather than “Fixed segfault in main()”, which matches the style of the messages git generates itself (for merges, for example).

So now you created a repository and committed your file (or files, if you’re working on an existing project). You can run git status again to see where you are, and it should say that there are no changes to be committed. You can also run git log to see the history of your most recent commits.

Your Own “Undo” Button
You might be wondering why you would go through all this trouble. git offers obvious benefits as a multi-user version control system, which I’ll get to in a future post. But it’s also very useful even for your personal repositories, just to undo your changes and create different branches to explore new ideas without breaking anything in the master branch. I’ll go over these two topics in this section.

First, let’s undo a commit. Change something in your text file, say, from “Hello World” to “Wello Horld”. Commit it to the repository:

git commit -am "Change my text"

Note that the -a flag here lets us skip the typical add step in the commit process. With this flag, git automatically stages every tracked file that has been modified or deleted, and then commits them. Brand-new (untracked) files still need an explicit git add. The two stages are separate so that you can choose exactly which changes go into each commit, but I always use this flag to simplify the process. I recommend aliasing git commit -am to something like gcam just to speed things up. Anyway, if you run git log at this point, you should see 2 commits listed in your history.
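One thing to watch with -a is that it only picks up files git already tracks. Here is a small sketch showing that a brand-new file is left behind. The file brandNew.txt and the scratch directory are made up for the example.

```shell
#!/bin/sh
# Sketch only: "commit -am" commits changes to tracked files but skips untracked ones.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
cd "$demo"
git init --quiet
git config user.name "Demo"                # identity for this scratch repo only
git config user.email "demo@example.com"

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit --quiet -m "Initial commit"

echo "Wello Horld" > yourFile.txt          # change a tracked file
echo "new" > brandNew.txt                  # create a file git has never seen
git commit --quiet -am "Change my text"

# The tracked change was committed; the new file is still untracked.
leftover=$(git status --porcelain)
echo "$leftover"
```

In the porcelain output, the ?? prefix marks brandNew.txt as untracked, so it would still need a git add before it can be committed.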

But that was a silly change! We need to undo it. Run the following:

git reset --hard HEAD~1

This tells git to reset the repository to 1 commit before the current HEAD. (You could also go back by 2, 3, or 10 commits if you wanted to). If you open your text file, you should see that the text has reverted to “Hello World”; similarly, if you run git log, you should see that we’re back on the first commit, and our second commit has been erased.

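Using reset --hard is usually safe for personal repositories, as long as you’re absolutely sure that you want to go back, since it also throws away any uncommitted changes. However, it’s not recommended for commits that have already been shared with other developers: it rewrites history that others may have built on, and in that case you’d usually want the whole history to stay visible. For shared history, the usual alternative is git revert, which adds a new commit that undoes an earlier one. Either way, use with caution.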
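For comparison, here is a sketch of the history-preserving way to undo the same change with git revert. Instead of erasing the bad commit, it records a new commit that reverses it. The scratch directory is a placeholder of my own.

```shell
#!/bin/sh
# Sketch only: undo a commit with "git revert", keeping the full history.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
cd "$demo"
git init --quiet
git config user.name "Demo"                # identity for this scratch repo only
git config user.email "demo@example.com"

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit --quiet -m "Initial commit"

echo "Wello Horld" > yourFile.txt
git commit --quiet -am "Change my text"

# Create a third commit that reverses the second one.
# --no-edit accepts the default commit message without opening an editor.
git revert --no-edit HEAD

git log --oneline                          # all three commits remain in the history
commit_count=$(git rev-list --count HEAD)
cat yourFile.txt
```

The text file goes back to its original contents, but git log still shows the silly change and its reversal.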

A potentially safer way to “undo” yourself is to create multiple branches, and then merge back with the master branch when you’re satisfied with your changes. If you run:

git branch

with no arguments, it will print out a list of branches in the current repository. Right now you will only have a master branch. The asterisk next to master indicates that you’re on this branch right now. You can create a new branch and switch to it using:

git checkout -b myOtherBranch

Now if you run git branch again, you should see the asterisk next to myOtherBranch. Make some changes in your text file (if you’re not feeling creative, “Goodbye World” should suffice). Commit your changes:

git commit -am "Say goodbye to the world"

Here’s the cool part. This change only happened in myOtherBranch, and not in master. To see this, switch back to the master branch:

git checkout master

and open your text file again. It should say “Hello World” as usual. (Note that running git checkout with the -b flag will create the specified branch. If you omit the -b flag, it will just switch between existing branches).

At this point, you have two options. If you don’t like the changes you made in myOtherBranch, you can delete it:

git branch -d myOtherBranch

(You may not be able to delete myOtherBranch yet since it hasn’t been merged. To force-delete an unmerged branch, use the -D flag rather than -d.) Or, if you decide you do like your changes, you can merge myOtherBranch into master (you should be on the master branch to do this):

git merge myOtherBranch

Now if you run git log from the master branch, you should see that the full commit history has been merged from the other branch. You should also see that your text file now says “Goodbye World”. Once your changes are safely merged, you can delete your other “test” branch since you no longer need it. (Note that merging becomes much more complicated when there are other people involved). I’ve found that this method of branching, testing, and merging (or not, if things go awry) is extremely useful for adding/testing new features one at a time.
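Here is the whole branch, test, merge, and clean-up cycle in one place, as a sketch you can run in a scratch directory. The scratch directory is a placeholder of my own. The script looks up the starting branch name rather than assuming master, since newer git setups may call it something else.

```shell
#!/bin/sh
# Sketch only: create a branch, commit on it, merge it back, and delete it.
set -e
demo=$(mktemp -d)    # hypothetical scratch directory
cd "$demo"
git init --quiet
git config user.name "Demo"                # identity for this scratch repo only
git config user.email "demo@example.com"

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit --quiet -m "Initial commit"
base_branch=$(git symbolic-ref --short HEAD)   # "master" in this post

# Try out a change on a separate branch.
git checkout --quiet -b myOtherBranch
echo "Goodbye World" > yourFile.txt
git commit --quiet -am "Say goodbye to the world"

# Switch back: the change is not visible here until we merge.
git checkout --quiet "$base_branch"
before_merge=$(cat yourFile.txt)

# Happy with the change, so merge it and remove the test branch.
git merge --quiet myOtherBranch
git branch -d myOtherBranch
after_merge=$(cat yourFile.txt)

echo "before merge: $before_merge"
echo "after merge:  $after_merge"
```

Because the branch has been merged by the time it is deleted, the lowercase -d is enough here.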

This should give you an idea of how git works. This was a silly example, but you can see how the same principles could be extended to large code projects. In the next post I’ll talk about pushing to remote repositories, which is particularly useful for remote cluster computing. (If you’re tired of dragging-and-dropping files in WinSCP, stay tuned!)