The Basics
git is a distributed version control system. I started using it for my projects (with ample starting assistance from Matt), and have been sufficiently convinced by its power and ease-of-use to begin spreading the word. This will be part 1 of a multi-part series, describing how to start using git for small projects — it’s still a useful tool even when you’re the sole contributor. Part 2 will describe how to set up and work with remote repositories. Additional parts may happen as needed. If you would like to know more about git and distributed version control in general, there’s a Google Tech Talk by Linus Torvalds (who led the effort to create git): http://www.youtube.com/watch?v=4XpnKHJAok8.
To get started, make sure that you have git installed on your system. On Linux (Debian-based), you can just do sudo apt-get install git. It’s available as a Cygwin package on Windows, and on PSU clusters using module load git (a good thing to put in your .bashrc if you’re going to be using it a lot). I know there are also straightforward installation processes for native Mac and Windows if you search around for them.
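A quick way to confirm that the installation worked is to ask git for its version. This is only a minimal sketch; the version string it prints will differ from system to system, and the variable name git_version is just for illustration.

```shell
# Check whether git is on the PATH and, if so, print its version.
if command -v git >/dev/null 2>&1; then
    git_version=$(git --version)
    echo "Found: $git_version"
else
    echo "git was not found; install it before continuing" >&2
fi
```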
Let’s start out with a simple test repository. (If you prefer, you can also follow these steps using an existing project). Go to an empty directory and create a text file with something written in it, say “Hello World”. Now initialize version control by navigating to this folder in your terminal and typing:
git init
This creates a hidden folder, .git/, which tracks all the changes to the repository. Note that our text file has not been added to the repository yet. You can see this by running:
git status
This is a useful command that you can run at any time to see which files have changed in this repository, and which files in the folder are untracked. You should see that your text file is listed as untracked. Add it to the repository using:
git add yourFile.txt
You can also use wildcards here (for example, git add *.txt), or run git add . to add every file and subdirectory under the current folder. If you have any files or directories that you want version control to ignore, create a file called .gitignore and list them there, one pattern per line.
Our text file has been added to the repository, but hasn’t been committed yet! (“Adding” and “committing” are two separate steps, but we’ll circumvent this in a minute.) To commit your file for the first time, do the following:
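Here is a rough sketch of how .gitignore and a bulk add might fit together. The build/ directory and notes.tmp file are made-up examples of things you would not want tracked, and the whole thing runs in a throwaway directory so it won’t touch a real project.

```shell
set -e
repo=$(mktemp -d)                    # scratch directory for this demo
cd "$repo"
git init -q

echo "Hello World" > yourFile.txt
mkdir build
echo "temporary output" > build/output.log   # hypothetical generated file
echo "scratch" > notes.tmp                   # hypothetical temporary file

# One pattern per line: a directory and a wildcard.
printf 'build/\n*.tmp\n' > .gitignore

git add .                            # stage everything that is not ignored
staged=$(git diff --cached --name-only)
echo "$staged"
```

Only yourFile.txt and .gitignore itself should end up staged; the ignored files stay out of the repository.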
git commit yourFile.txt -m "Initial commit"
The -m flag specifies the message to go along with this commit. If you don’t use the -m flag, git will open your default text editor and ask you to enter a commit message. This is sort of a pain, so it’s better to write your message as part of the command. Writing good commit messages can help you remember what you were doing, so it’s usually worth the extra time to give a good description of your latest commit. Convention says that commit messages should be written in the imperative mood, e.g. “Fix segfault in main()” rather than “Fixed segfault in main()”. This matches the messages git generates itself for merges and reverts.
So now you’ve created a repository and committed your file (or files, if you’re working on an existing project). You can run git status again to see where you are, and it should say that there is nothing to commit. You can also run git log to see the history of your most recent commits.
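Putting the steps so far together, the whole sequence might look something like the sketch below. It works in a temporary directory, and the name and email address are placeholders (git needs some identity configured before it will commit).

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

git status --short                         # prints nothing when there is nothing to commit
git log --oneline                          # one line per commit, newest first
commit_count=$(git rev-list --count HEAD)
echo "Commits so far: $commit_count"
```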
Your Own “Undo” Button
You might be wondering why you would go through all this trouble. git offers obvious benefits as a multi-user version control system, which I’ll get to in a future post. But it’s also very useful even for your personal repositories, just to undo your changes and create different branches to explore new ideas without breaking anything in the master branch. I’ll go over these two topics in this section.
First, let’s undo a commit. Change something in your text file, say, from “Hello World” to “Wello Horld”. Commit it to the repository:
git commit -am "Change my text"
Note that the -a flag here lets us skip the usual add step. With this flag, git automatically stages every tracked file that has been modified or deleted, and then commits them. Brand-new files are not picked up, so those still need a git add first. The two steps are separate so that you can commit only some of your changes at a time, but I always use this flag to simplify the process. I recommend aliasing git commit -am to something like gcam just to speed things up. Anyway, if you run git log at this point, you should see 2 commits listed in your history.
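One thing worth seeing for yourself is that -a only picks up files git already tracks. The sketch below uses a small shell function named gcam as a stand-in for the alias (aliases are not expanded in scripts); newFile.txt and the identity values are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Hypothetical helper standing in for the suggested gcam alias.
gcam() { git commit -q -am "$1"; }

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

echo "Wello Horld" > yourFile.txt          # modify a tracked file
echo "draft" > newFile.txt                 # create a file git has never seen
gcam "Change my text"

git status --short                         # newFile.txt is still listed as untracked
untracked=$(git ls-files --others --exclude-standard)
```

In an interactive shell, the equivalent would be a line such as alias gcam='git commit -am' in your .bashrc.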
But that was a silly change! We need to undo it. Run the following:
git reset --hard HEAD~1
This tells git to reset the repository to 1 commit before the current HEAD. (You could also go back by 2, 3, or 10 commits if you wanted to.) If you open your text file, you should see that the text has reverted to “Hello World”; similarly, if you run git log, you should see that we’re back on the first commit, and our second commit is gone from the history.
Using reset --hard is usually safe for personal repositories, as long as you’re absolutely sure that you want to revert, since it also throws away any uncommitted changes in your working files. However, it’s not recommended for commits that have already been shared with other developers: rewriting history that others have pulled leaves their copies out of sync with yours. In that case you’d usually want the whole history to stay visible, and git revert, which adds a new commit that undoes an earlier one, is the usual choice. Use with caution.
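If you’d like to watch the reset happen from start to finish, here is a small self-contained sketch. It runs in a temporary directory with a placeholder identity, so nothing in a real project is at risk.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

echo "Wello Horld" > yourFile.txt
git commit -q -am "Change my text"

git reset -q --hard HEAD~1                 # move back one commit and restore the files
contents=$(cat yourFile.txt)
commit_count=$(git rev-list --count HEAD)
echo "$contents ($commit_count commit in history)"
```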
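When the history needs to stay intact, one option is git revert, which undoes a commit by adding a new commit on top instead of removing the old one. The sketch below is an illustration in a scratch directory, not a recommendation for any particular workflow; the identity values are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"

echo "Wello Horld" > yourFile.txt
git commit -q -am "Change my text"

git revert --no-edit HEAD                  # new commit that undoes the latest one
contents=$(cat yourFile.txt)
commit_count=$(git rev-list --count HEAD)
git log --oneline                          # all three commits remain visible
```

The file goes back to “Hello World”, but the log keeps both the silly change and the commit that undid it.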
A potentially safer way to “undo” yourself is to create multiple branches, and then merge back with the master branch when you’re satisfied with your changes. If you run:
git branch
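with no arguments, it will print out a list of branches in the current repository. Right now you will only have a master branch. (Depending on your git configuration, the default branch may be named main instead; if so, substitute that name wherever master appears below.) The asterisk next to master indicates that you’re on this branch right now. You can create a new branch and switch to it using: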
git checkout -b myOtherBranch
Now if you run git branch again, you should see the asterisk next to myOtherBranch. Make some changes in your text file (if you’re not feeling creative, “Goodbye World” should suffice). Commit your changes:
git commit -am "Say goodbye to the world"
Here’s the cool part. This change only happened in myOtherBranch, and not in master. To see this, switch back to the master branch:
git checkout master
and open your text file again. It should say “Hello World” as usual. (Note that running git checkout with the -b flag will create the specified branch. If you omit the -b flag, it will just switch between existing branches).
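The branch experiment above can be replayed in one go with a sketch like the following. It records the name of the starting branch in a variable (main_branch, a name chosen here for illustration) so that it works whether your default branch is called master or something else; the identity values are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # name of the branch we started on

git checkout -q -b myOtherBranch           # create the branch and switch to it
echo "Goodbye World" > yourFile.txt
git commit -q -am "Say goodbye to the world"
on_other=$(cat yourFile.txt)

git checkout -q "$main_branch"             # no -b: just switch back
on_main=$(cat yourFile.txt)
echo "myOtherBranch: $on_other / $main_branch: $on_main"
```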
At this point, you have two options. If you don’t like the changes you made in myOtherBranch, you can delete it:
git branch -d myOtherBranch
(You may not be able to delete myOtherBranch yet since it hasn’t been merged. If you want to force-delete an unmerged branch, use the -D flag rather than -d.) Or, if you decide you do like your changes, you can merge myOtherBranch into master. Make sure you are on the master branch, then run:
git merge myOtherBranch
Now if you run git log from the master branch, you should see that the full commit history has been merged from the other branch. You should also see that your text file now says “Goodbye World”. Once your changes are safely merged, you can delete your other “test” branch since you no longer need it. (Note that merging becomes much more complicated when there are other people involved). I’ve found that this method of branching, testing, and merging (or not, if things go awry) is extremely useful for adding/testing new features one at a time.
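To see both outcomes side by side, the sketch below first tries to delete the unmerged branch (which git should refuse), then merges it and deletes it cleanly. As before, it runs in a scratch directory with a placeholder identity, and main_branch is just an illustrative variable holding the starting branch name.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for this demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello World" > yourFile.txt
git add yourFile.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # name of the branch we started on

git checkout -q -b myOtherBranch
echo "Goodbye World" > yourFile.txt
git commit -q -am "Say goodbye to the world"
git checkout -q "$main_branch"

# -d refuses to delete a branch whose commits have not been merged yet.
if git branch -d myOtherBranch 2>/dev/null; then
    echo "unexpected: unmerged branch was deleted"
else
    echo "git refused to delete the unmerged branch"
fi

git merge -q myOtherBranch                 # bring the other branch's commit into this one
merged=$(cat yourFile.txt)
git branch -q -d myOtherBranch             # succeeds now that the work is merged
remaining=$(git branch --list myOtherBranch)
```

Swapping the first -d for -D would have deleted the branch immediately, along with the only reference to its unmerged commit.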
This should give you an idea of how git works. This was a silly example, but you can see how the same principles could be extended to large code projects. In the next post I’ll talk about pushing to remote repositories, which is particularly useful for remote cluster computing. (If you’re tired of dragging-and-dropping files in WinSCP, stay tuned!)