October 29, 2012 by Jon Herman

Intro to git Part 2: Remote Repositories

This is Part 2 of a multi-part series about how to use git for version control. Part 1 described how to get started with git for small projects. This post will detail how to set up and work with remote repositories. Additional posts may happen as needed.

Remote repositories can really speed up your workflow. Think about every time you’ve wanted to transfer files to a remote computer. My previous methods to accomplish this usually involved either using scp in the terminal, or using a Windows GUI to accomplish the same. In either case, you have to identify which files you want to transfer. This can become a burden when you’re making edits to your code. How do you know if you’re working on the local or remote file? How do you know if the directories are synced? Git solves all of these problems.

To set up remote repositories, there are two steps, one on the local system and one on the remote system. Let’s say you’re working with the same repository from Part 1, which just contained a one-line text file.

You can do these two steps in either order, but let’s start by creating a bare repository on the remote system. Log in over ssh as usual and navigate to the directory where you want to host your repository. From that directory, run the following:

mkdir .git
cd .git
git init --bare

This creates a “bare” git repository in the .git directory. A bare repository is not meant to be worked in. It is only intended to receive updates from other repositories. Thus, we will be working on our local copies and pushing them to the remote repo.
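A quick way to convince yourself that the repository really is bare is to ask git directly. This is only a sketch: it uses a temporary local directory as a stand-in for the remote path, so nothing here touches a real cluster.

```shell
#!/bin/sh
# Sketch: create a bare repository in a scratch directory and inspect it.
set -e

repo="$(mktemp -d)/.git"      # hypothetical stand-in for the remote .git directory
mkdir -p "$repo"
cd "$repo"
git init -q --bare

# Ask git whether this repository is bare.
is_bare=$(git rev-parse --is-bare-repository)
echo "bare repository: $is_bare"

# Commands that need a working tree refuse to run in a bare repository.
if git status >/dev/null 2>&1; then status_ok=yes; else status_ok=no; fi
echo "git status works here: $status_ok"
```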

Before you log off the remote system, there is one more confusing but necessary step. By default, you can push updates to this bare repo, but the code won’t be copied to any place where you can actually use it. To fix this, stay in your .git directory and create the file hooks/post-receive. As you might have guessed, this file will contain instructions for git to perform after it receives updates over the network. In this file, copy the following:

#!/bin/sh
GIT_WORK_TREE=./../ git checkout -f

(After you save the file, run chmod +x hooks/post-receive to make sure it has execute permissions).

Basically, we are telling git to take the files it just received and check them out to the directory specified by GIT_WORK_TREE, which in this case is just one level up from here. Bare repositories by definition do not have a working tree, so you will need to preface most git commands by setting the work tree as shown here. For example, this applies if you want to switch branches in the remote repository.
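If you want to try the hook before involving a real remote system, the whole round trip can be rehearsed on one machine. The sketch below assumes a temporary directory where server/ plays the role of the remote and local/ plays your own machine; the push goes to a plain file path instead of over ssh, and the committer name and email are placeholders.

```shell
#!/bin/sh
# Sketch: rehearse the bare repo + post-receive hook locally (no ssh involved).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)                         # hypothetical scratch location
mkdir -p "$demo/server/.git" "$demo/local"

# "Remote" side: bare repository plus the hook described above.
cd "$demo/server/.git"
git init -q --bare
git symbolic-ref HEAD refs/heads/master   # make HEAD name the branch we will push
mkdir -p hooks
printf '#!/bin/sh\nGIT_WORK_TREE=./../ git checkout -f\n' > hooks/post-receive
chmod +x hooks/post-receive

# Local side: one commit on master, pushed to the stand-in remote by path.
cd "$demo/local"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "Hello World" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git push -q "$demo/server/.git" master

# The hook should have checked the file out one level above the bare repo.
deployed=$(cat "$demo/server/hello.txt")
echo "deployed copy says: $deployed"
```

To switch the deployed copy to another branch later, the same prefix applies: from inside the bare repository, GIT_WORK_TREE=./../ git checkout -f followed by the branch name.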

That’s all you need to do to set up the remote repository. Now back on your local system, navigate to the directory containing your repository and run:

git remote

The remote command will list all remotes associated with this repository. In this case we don’t have any yet, so let’s add the one we just created.

git remote add <repo-name> <user>@<host>:<path/to/repo/.git>

Notice a few things about this command. First, repo-name can be whatever you want, but you should probably name it after the location of your remote repository. For example, if your remote is located on the PSU clusters, you might call it psu. The second part of the command tells git to transfer files over ssh, using the standard user@host format. The path specified after the : symbol tells git where to find the bare repository (the .git directory) you just created on the remote system.

This last part is important. Git will let you “add” remote repositories without checking whether they exist or not. This check will be performed, however, when you try to push updates. It’s up to you to make sure your path is specified correctly, and that the repository is set up already. That’s why we did that step first.
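The sketch below illustrates that behavior with a deliberately wrong path. It assumes a local file path in place of the user@host:path form so it can run anywhere; the remote name nowhere and the path are made up for the example.

```shell
#!/bin/sh
# Sketch: "git remote add" accepts a bad location; the error only shows up on push.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

# Adding a remote that does not exist succeeds without complaint.
git remote add nowhere "$work/does-not-exist/.git"
listed=$(git remote)
echo "configured remotes: $listed"

# The existence check happens here, when git tries to contact the remote.
if git push nowhere HEAD >/dev/null 2>&1; then push_result=ok; else push_result=failed; fi
echo "push result: $push_result"
```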

To see a usable example of adding a remote repo, I would do something like this:

git remote add psu jdh33@cyberstar.psu.edu:~/work/myDirectory/.git

The nice thing about the PSU clusters is that the distributed filesystem will still be available regardless of which cluster (Cyberstar, Lion-XO, etc.) you listed in your add remote command. Now that you’ve created a remote repository, you should be able to run:

git remote -v

and see your remote listed. (The -v flag toggles verbose output, which helpfully includes the host/path of the repository). At this point, you’re ready to push your code to the remote system:

git push <repo-name> <branch-name> (optional)

The optional branch-name parameter is used if you only want to push one branch at a time. I have found that for the initial push, I’ve needed to modify the command as follows (example):

git push psu +master:refs/heads/master

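which spells out the full destination of the branch on the remote (refs/heads/master). The leading + forces the update even if it is not a fast-forward, so leave it off unless you really mean to overwrite the remote branch. You may need to do this every time you push a new branch for the first time. But after you do this the first time, you should be able to just run: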

git push psu

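to push to the remote system. With the git versions available when this was written (before 2.0), this pushes every local branch that has a branch of the same name on the remote. Since git 2.0 the default behavior typically pushes only the current branch (and may ask you to set an upstream first); use git push psu --all if you want to push all branches. Since pushing is set up to work via ssh, you may be prompted for your remote password when you run these commands unless you have ssh keys enabled (maybe a topic for another post). By the way, git push is another great thing to alias: “gp”, maybe?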
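Because the behavior of a bare git push depends on the git version and its push.default setting, it can help to be explicit. The following is a minimal sketch, assuming a local bare repository standing in for the psu remote and a second branch named feature (a made-up name); it compares pushing one branch with pushing all of them.

```shell
#!/bin/sh
# Sketch: push a single branch, then push every branch with --all.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"     # stand-in for the remote repository
git init -q "$demo/local"
cd "$demo/local"
git symbolic-ref HEAD refs/heads/master
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"
git branch feature                        # hypothetical second branch

git remote add psu "$demo/remote.git"

# Push only master, and count the branches the remote now has.
git push -q psu master
after_first=$(git ls-remote --heads psu | wc -l)

# Push every local branch explicitly.
git push -q psu --all
after_all=$(git ls-remote --heads psu | wc -l)

echo "remote branches after first push: $after_first"
echo "remote branches after --all:      $after_all"
```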

Now if you go check your remote repository, all of your files should be there! The cool thing about this is that git only sends what the remote is missing. On later pushes it transfers just the new commits and the objects they introduce, packed and compressed (often as “deltas” against content the remote already has), rather than the whole directory again. This results in fast, efficient transfer, without having to figure out which files changed, which need to be updated, etc., since git handles all of this for you.

A last note: you can have as many remotes as you want! Just add new ones using the git remote add command discussed above. This is especially helpful if you need to push changes to multiple remote systems, one after the other. Hopefully you can see the possibilities here; I’m just discovering them myself. Your mileage may vary, but this has really increased my efficiency in using the clusters. You can do all of your editing locally, and then push the changes wherever you need to very quickly.
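Pushing to several remotes one after the other is easy to script. Here is a rough sketch, assuming two local bare repositories named clusterA and clusterB (both names are invented) in place of real remote systems.

```shell
#!/bin/sh
# Sketch: push the same branch to every configured remote in turn.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q --bare "$demo/clusterA.git"   # hypothetical remote 1
git init -q --bare "$demo/clusterB.git"   # hypothetical remote 2

git init -q "$demo/local"
cd "$demo/local"
git symbolic-ref HEAD refs/heads/master
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

git remote add clusterA "$demo/clusterA.git"
git remote add clusterB "$demo/clusterB.git"

# "git remote" prints one remote name per line, so it can drive a loop.
pushed=0
for r in $(git remote); do
    git push -q "$r" master
    pushed=$((pushed + 1))
done

heads_b=$(git ls-remote --heads clusterB | wc -l)
echo "pushed to $pushed remotes; clusterB now has $heads_b branch(es)"
```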

This post has only covered one-way pushing to remote repositories. Of course you can pull from them, too! This back-and-forth sharing is the heart of standard git usage (distributed development), which would be a good topic for a future post.

October 29, 2012 by Jon Herman

Intro to git Part 1: Local version control

The Basics
git is a distributed version control system. I started using it for my projects (with ample starting assistance from Matt), and have been sufficiently convinced by its power and ease-of-use to begin spreading the word. This will be part 1 of a multi-part series, describing how to start using git for small projects — it’s still a useful tool even when you’re the sole contributor. Part 2 will describe how to set up and work with remote repositories. Additional parts may happen as needed. If you would like to know more about git and distributed version control in general, there’s a Google Tech Talk by Linus Torvalds (who led the effort to create git): http://www.youtube.com/watch?v=4XpnKHJAok8.

To get started, make sure that you have git installed on your system. On Linux (Debian-based), you can just do sudo apt-get install git. It’s available as a Cygwin package on Windows, and on PSU clusters using module load git (a good thing to put in your .bashrc if you’re going to be using it a lot). I know there are also straightforward installation processes for native Mac and Windows if you search around for them.

Let’s start out with a simple test repository. (If you prefer, you can also follow these steps using an existing project). Go to an empty directory and create a text file with something written in it, say “Hello World”. Now initialize version control by navigating to this folder in your terminal and typing:

git init

This creates a hidden folder, .git/, which tracks all the changes to the repository. Note that our text file has not been added to the repository yet. You can see this by running:

git status

This is a useful command that you can run at any time to see which files have changed in this repository, and which files in the folder are untracked. You should see that your text file is listed as untracked. Add it to the repository using:

git add yourFile.txt

You can also use wildcards here if you want to add all files and subdirectories to the repository. If you have any files or directories that you want to be ignored by version control, you should create a file called .gitignore and list them there, one file or directory per line.
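As a small illustration of how .gitignore interacts with adding everything at once, here is a sketch. The file and directory names (output/, debug.log) are invented for the example, and the *.log line relies on the fact that .gitignore entries can also be wildcard patterns.

```shell
#!/bin/sh
# Sketch: add everything in a directory while .gitignore keeps some paths out.
set -e

work=$(mktemp -d)
cd "$work"
git init -q

echo "Hello World" > yourFile.txt
mkdir output
echo "results" > output/run1.dat          # hypothetical generated data
echo "noise" > debug.log                  # hypothetical log file

# One entry per line: a directory and a wildcard pattern.
printf 'output/\n*.log\n' > .gitignore

git add .                                 # add all files and subdirectories

# Ignored paths are not tracked; yourFile.txt and .gitignore itself are.
ignored_tracked=$(git ls-files output debug.log)
tracked_count=$(git ls-files | wc -l)
git ls-files
```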

Our text file has been added to the repository, but hasn’t been committed yet! (“Adding” and “committing” is a two-step process, but we’ll circumvent this in a minute). To commit your file for the first time, do the following:

git commit yourFile.txt -m "Initial commit"

The -m flag specifies the message to go along with this commit. If you don’t use the -m flag, git will open your default text editor and ask you to enter a commit message. This is sort of a pain, so it’s better to write your message as part of the command. Writing good commit messages can help you remember what you were doing, so it’s usually worth the extra time to give a good description of your latest commit. Convention says that commit messages should be written in the imperative mood, e.g. “Fix segfault in main()”, which matches the messages git generates itself (for example when merging).

So now you’ve created a repository and committed your file (or files, if you’re working on an existing project). You can run git status again to see where you are, and it should say that there are no changes to be committed. You can also run git log to see the history of your most recent commits.
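Put together, the steps so far look roughly like the script below. It is a sketch that works in a throwaway directory, and the name and email it sets are placeholders so the commit succeeds on a machine where git has not been configured yet.

```shell
#!/bin/sh
# Sketch: init, add, commit, then check status and history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
echo "Hello World" > yourFile.txt

git init -q
git status --short                        # lists yourFile.txt as untracked
git add yourFile.txt
git commit -q -m "Initial commit"

commits=$(git rev-list --count HEAD)      # number of commits in the history
pending=$(git status --porcelain)         # empty when nothing is left to commit

git log --oneline
echo "commits so far: $commits"
```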

Your Own “Undo” Button
You might be wondering why you would go through all this trouble. git offers obvious benefits as a multi-user version control system, which I’ll get to in a future post. But it’s also very useful even for your personal repositories, just to undo your changes and create different branches to explore new ideas without breaking anything in the master branch. I’ll go over these two topics in this section.

First, let’s undo a commit. Change something in your text file, say, from “Hello World” to “Wello Horld”. Commit it to the repository:

git commit -am "Change my text"

Note that the -a flag here lets us skip the typical add step in the commit process. It automatically stages every tracked file that has been modified or deleted, and then commits them; brand-new (untracked) files still need an explicit git add first. Keeping the two stages separate lets you commit only some of your changes at a time, but I always use this flag to simplify the process. I recommend aliasing git commit -am to something like gcam just to speed things up. Anyway, if you run git log at this point, you should see 2 commits listed in your history.
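One detail worth seeing in action is that -a only picks up files git already tracks. In the sketch below, newFile.txt is a made-up file that has never been added, so it is left out of the commit.

```shell
#!/bin/sh
# Sketch: "git commit -a" stages changes to tracked files, not new files.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

echo "Wello Horld" > yourFile.txt         # change a tracked file
echo "notes" > newFile.txt                # hypothetical new, untracked file

git commit -q -am "Change my text"

committed=$(git show HEAD:yourFile.txt)   # content recorded in the new commit
leftover=$(git status --porcelain)        # what is still uncommitted
echo "committed text: $committed"
echo "still uncommitted: $leftover"
```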

But that was a silly change! We need to undo it. Run the following:

git reset --hard HEAD~1

This tells git to reset the repository to 1 commit before the current HEAD. (You could also go back by 2, 3, or 10 commits if you wanted to). If you open your text file, you should see that the text has reverted to “Hello World”; similarly, if you run git log, you should see that we’re back on the first commit, and our second commit has been erased.

Using reset --hard is usually safe for personal repositories, as long as you’re absolutely sure that you want to revert. However, it’s not recommended for larger projects that are shared among several developers — in that case, you’d usually want the whole history to be visible. I’m not entirely clear on this distinction yet, so use with caution.
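For comparison, git also has a revert command, which undoes a commit by adding a new commit rather than erasing history. The sketch below is an illustration under that assumption, not a recommendation from the original workflow: it undoes the same silly change both ways and counts the commits left in the log.

```shell
#!/bin/sh
# Sketch: undo a commit with "git revert" (keeps history) vs "git reset --hard" (erases it).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"
echo "Wello Horld" > yourFile.txt
git commit -q -am "Change my text"

# Option 1: revert adds a third commit that undoes the second one.
git revert --no-edit HEAD >/dev/null
after_revert=$(git rev-list --count HEAD)
text_after_revert=$(cat yourFile.txt)

# Option 2: reset --hard moves back two commits and drops them from the log.
git reset -q --hard HEAD~2
after_reset=$(git rev-list --count HEAD)
text_after_reset=$(cat yourFile.txt)

echo "commits after revert: $after_revert, text: $text_after_revert"
echo "commits after reset:  $after_reset, text: $text_after_reset"
```

Both routes end with the same file contents; the difference is whether the intermediate commits remain visible to anyone else working from the same history.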

A potentially safer way to “undo” yourself is to create multiple branches, and then merge back with the master branch when you’re satisfied with your changes. If you run:

git branch

with no arguments, it will print out a list of branches in the current repository. Right now you will only have a master branch. The asterisk next to master indicates that you’re on this branch right now. You can create a new branch and switch to it using:

git checkout -b myOtherBranch

Now if you run git branch again, you should see the asterisk next to myOtherBranch. Make some changes in your text file (if you’re not feeling creative, “Goodbye World” should suffice). Commit your changes:

git commit -am "Say goodbye to the world"

Here’s the cool part. This change only happened in myOtherBranch, and not in master. To see this, switch back to the master branch:

git checkout master

and open your text file again. It should say “Hello World” as usual. (Note that running git checkout with the -b flag will create the specified branch. If you omit the -b flag, it will just switch between existing branches).

At this point, you have two options. If you don’t like the changes you made in myOtherBranch, you can delete it:

git branch -d myOtherBranch

(git will refuse to delete myOtherBranch with -d at this point because it hasn’t been merged. To force-delete an unmerged branch, use the -D flag rather than -d). Or, if you decide you do like your changes, you can merge myOtherBranch into master (you should be on the master branch to do this):

git merge myOtherBranch

Now if you run git log from the master branch, you should see that the full commit history has been merged from the other branch. You should also see that your text file now says “Goodbye World”. Once your changes are safely merged, you can delete your other “test” branch since you no longer need it. (Note that merging becomes much more complicated when there are other people involved). I’ve found that this method of branching, testing, and merging (or not, if things go awry) is extremely useful for adding/testing new features one at a time.
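The whole branch, test, and merge cycle from this section fits in a short script. This is a sketch in a throwaway directory with a placeholder identity; it forces the first branch to be called master so it behaves the same on installations whose default branch name differs.

```shell
#!/bin/sh
# Sketch: create a branch, commit on it, merge it into master, delete it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com        # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

git checkout -q -b myOtherBranch          # create the branch and switch to it
echo "Goodbye World" > yourFile.txt
git commit -q -am "Say goodbye to the world"

git checkout -q master
before_merge=$(cat yourFile.txt)          # master has not changed yet

git merge -q myOtherBranch
after_merge=$(cat yourFile.txt)

git branch -q -d myOtherBranch            # allowed now that it has been merged
branches=$(git branch | wc -l)

echo "before merge: $before_merge"
echo "after merge:  $after_merge"
```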

This should give you an idea of how git works. This was a silly example, but you can see how the same principles could be extended to large code projects. In the next post I’ll talk about pushing to remote repositories, which is particularly useful for remote cluster computing. (If you’re tired of dragging-and-dropping files in WinSCP, stay tuned!)

Welcome to our blog!

Welcome to Water Programming! This blog is by Pat Reed’s group at Cornell, who use computer programs to solve problems — Multiobjective Evolutionary Algorithms (MOEAs), simulation models, visualization, and other techniques. Use the search feature and categories on the right panel to find topics of interest. Feel free to comment, and contact us if you want to contribute posts.

To find software: Please consult the Pat Reed group website, MOEAFramework.org, and BorgMOEA.org.

The MOEAFramework Setup Guide: A detailed guide is now available. The focus of the document is connecting an optimization problem written in C/C++ to MOEAFramework, which is written in Java.

The Borg MOEA Guide: We are currently writing a tutorial on how to use the C version of the Borg MOEA, which is being released to researchers here.

Call for contributors: We want this to be a community resource to share tips and tricks. Are you interested in contributing? Please email Lillian Lau at lbl59@cornell.edu. You’ll need a WordPress.com account.

October 27, 2012 by matt

Symbolic Links

A symbolic link (symlink) is a file that’s a pointer to another file (or a directory). One use for symlinks is to have a big file be in two places without using up twice as much space:

ln -s /gpfs/home/abc123/scratch/big_file /gpfs/home/abc123/work/big_file

You can see another use in your home directory on the cluster: ls -l scratch gives us this directory listing:

lrwxrwxrwx 1 mjw5407 mjw5407 21 Jun 22 2011 scratch -> /gpfs/scratch/mjw5407

The l right at the beginning of the line tells us that scratch is a symlink to /gpfs/scratch/mjw5407.

And if for some reason you see that scratch is not a symlink but a regular directory, something has gone wrong. (This happened recently to a cluster user. Check your directory listings!)
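If you would rather have a script do the checking, the shell’s -L test reports whether a path is a symlink. The sketch below builds its own example in a temporary directory (real_scratch and plain are invented names) instead of looking at a real cluster home directory.

```shell
#!/bin/sh
# Sketch: tell a symlink apart from a regular directory.
set -e

demo=$(mktemp -d)
cd "$demo"
mkdir real_scratch                        # hypothetical link target
ln -s "$demo/real_scratch" scratch        # scratch is a symlink
mkdir plain                               # plain is an ordinary directory

# -L is true only for symbolic links; readlink prints where the link points.
if [ -L scratch ]; then kind="symlink -> $(readlink scratch)"; else kind="not a symlink"; fi
if [ -L plain ]; then plain_kind="symlink"; else plain_kind="regular directory"; fi

echo "scratch: $kind"
echo "plain:   $plain_kind"
```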

October 27, 2012 by matt

What’s taking up space on your cluster account?

A quick tip on finding and deleting big files on your cluster account. Use mmlsquota to inspect your quota usage on GPFS filesystems. This will tell you whether you really need to clean up. Use du -hs * to figure out how big your subdirectories are, and ls -lh to inspect the files in a directory. Use (with great caution) rm -rf <directory> to remove a big directory, and rm <filename> to delete a file (both commands are irreversible).
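When there are many subdirectories, sorting the du output makes the biggest one easy to spot. The sketch below is one way to do that; it uses du -sk (sizes in kilobytes) instead of -h so that a plain numeric sort works, and it creates its own small and big directories as example data.

```shell
#!/bin/sh
# Sketch: list subdirectories by size, largest first.
set -e

demo=$(mktemp -d)
cd "$demo"
mkdir small big                                           # example directories
dd if=/dev/urandom of=small/a.dat bs=1024 count=1 2>/dev/null      # about 1 KB
dd if=/dev/urandom of=big/b.dat bs=1024 count=1024 2>/dev/null     # about 1 MB

# du prints "size<TAB>name"; sort numerically in reverse so the largest is first.
du -sk -- * | sort -rn

largest=$(du -sk -- * | sort -rn | head -n 1 | cut -f2)
echo "largest entry: $largest"
```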

October 26, 2012 by JR Kasprzyk

Training video: Cluster job submission basics

In the next few weeks, we’ll be adding blog posts relating to our MOEAFramework training. They are a little bit out of order now, but we may rearrange them as things move forward.

Jon Herman’s helpful comments about submitting jobs on HPC clusters are below.

October 26, 2012 by JR Kasprzyk

Training video: MOEAFramework, submitting multiple random seeds

In the next few weeks, we’ll be adding blog posts relating to our MOEAFramework training. They are a little bit out of order now, but we may rearrange them as things move forward.

This video, by Jon Herman, discusses submitting multiple random seeds in MOEAFramework.

October 26, 2012 by JR Kasprzyk

Training video: MOEAFramework Sobol Sensitivity Analysis

In the next few weeks, we’ll be adding blog posts relating to our MOEAFramework training. They are a little bit out of order now, but we may rearrange them as things move forward.

This video, by Jon Herman, discusses how to use MOEAFramework to do Sobol Sensitivity Analysis.

October 26, 2012 by JR Kasprzyk

Training video: Elements of Problem Formulation

In the next few weeks, we’ll be adding blog posts relating to our MOEAFramework training. They are a little bit out of order now, but we may rearrange them as things move forward.

The following video is Matt Woodruff’s commentary on problem formulation. Enjoy!

October 24, 2012 by Jon Herman

Get more screens … with screen

Tired of switching back and forth between local and remote terminal sessions by logging in and out? Then screen is (one of) the tool(s) for you!

screen will be available on most Linux systems, including the PSU clusters. It is also available as a package in Cygwin. Once you have it installed, try the following sequence of steps:

1. Open a terminal window and type screen. A welcome message will be displayed. Your terminal window is now running the screen program, even though you may not notice a difference.

2. Press (Ctrl+a c): that is, hold down Ctrl and press a, release both keys, then press c. This will create a new “screen”. Note that all screen commands start with Ctrl+a by default.

3. You now have two “screens” running. To see this, press (Ctrl+a w) to display the list of open windows. In Cygwin, this will display in the top bar of your terminal. There will be an asterisk next to the window you’re currently in.

4. Press (Ctrl+a n) to toggle through your “screens”. Try ssh-ing into a remote system in one of the screens, then switching back to the other screen on your local system. This is a great way to avoid logging in and out every time you need to move between systems!

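5. You can close a single screen by pressing (Ctrl+a K) (note the capital K; if that does nothing on your system, try lowercase k, which is the key in the stock screen configuration). You can quit the whole screen program by running screen -X quit.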

There are many other things you can do with this. I just discovered it not long ago, although I suspect the more Linux-minded people among us have known about it for a while. There is a more comprehensive list of commands available here: http://kb.iu.edu/data/acuy.html

Thanks for reading, and feel free to add to this post if you learn/know some good tips.

Water Programming: A Collaborative Research Blog

Tips and tricks on programming, evolutionary algorithms, and doing research
