In this tutorial, you will log onto a computing cluster and get comfortable with some basic UNIX commands. This post is about 2 years old at this point! It was originally written by Jon Herman and edited by Joe Kasprzyk, most recently on 9/27/2013.
One comment before we get started. This post was originally written to help people get started on the Penn State cluster. Now, folks in our research groups may be using computers at Cornell, University of Colorado, through the NSF XSEDE system, or in other places! But generally the steps are about the same.
What is a cluster anyway?
When you use Excel or Matlab on your own laptop, all the calculations are being done right on your computer’s processor. On the internet, though, we’re used to having calculations done remotely “in the cloud” on a server somewhere. For example, when you upload a video to YouTube, the conversion from your video format to Flash isn’t done on your laptop; it’s done somewhere in Iowa.
Using a computing cluster is the same idea. It may be fine to run a single MOEA run on your own laptop, but what happens when you want to run 50 random trials? Or the function evaluation time is really long? Plus, your laptop may not be that powerful and you may want to turn it off and go home, or someone might spill something on it, etc.
With a cluster, all the calculations are performed somewhere else: on the cluster! The idea is that you upload your files to a server, interact with that computer remotely to submit computing jobs, and then download the results. For example, you can compile your code on the cluster (on the initial computer that you connect to, called the login node) and then submit a remote job that gets performed on the compute nodes.
You’ll need to interact with the cluster in two ways.
- Enter commands on the command line. Use this to submit jobs, run programs, process files, etc. There are several software packages available to do this. If you’re on a Mac or Linux machine, you should just be able to use the terminal. On Windows there are several options. The first is SSH Secure Shell, which can be downloaded from the Penn State ITS center (if you’re at Penn State). The second, which a lot of members in the group use, is Cygwin. Cygwin installs many UNIX-like programs integrated within the Windows environment. The third is any of several standalone terminal programs, such as PuTTY. On a Mac, I’ve seen people use a program called Fugu. But the workflow is similar across most of the programs:
Each of these options uses SSH to connect to the cluster. SSH stands for “secure shell” and provides remote access to the command line interface on the clusters. You will first need to define a connection—if you use the “Quick Connect” option, you will need to re-enter the connection information every time. To simplify future access, use the Profiles -> Add Profile option, then use Profiles -> Edit Profile to define the profile you just created.
A remote connection requires a host name, a user name, and a password. The host name will depend on which cluster you want to access.
Penn State: Right now, the largest and most powerful cluster at Penn State is Cyberstar (hostname: cyberstar.psu.edu). You can also access smaller clusters which may be less crowded, such as Lion-XO (lionxo.aset.psu.edu). A detailed list of available systems and their specifications is available here.
University of Colorado: We have access to a computer called Janus. For more information, click here. Researchers in Joe Kasprzyk’s group have access to several computing allocations. To learn more about them, email joseph.kasprzyk “at” colorado.edu.
Cornell: The Reed group cluster, “TheCube”, is currently coming online. There may be an additional future post about this once it’s operational. In the meantime, contact Jon Herman at jdh366 “at” cornell.edu for more information.
When you connect, you will be prompted for your password (this is the same as your university logon). If you have already been approved for access to your chosen system, your login should be almost immediate.
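If you are working from a Mac, Linux, or Cygwin terminal rather than a graphical client, the connection above comes down to a single ssh command, and an entry in OpenSSH's config file plays the role of a saved profile. Here is a minimal sketch of both. The user ID abc123 and the alias cyberstar are placeholders I made up, and the profile is written to a throwaway file instead of your real ~/.ssh/config so that nothing on your machine changes.

```shell
# Placeholder user ID (an assumption); the host name is Cyberstar's from above.
CLUSTER_USER="abc123"
CLUSTER_HOST="cyberstar.psu.edu"

# The one-off connection, which prompts for your password each time:
#   ssh abc123@cyberstar.psu.edu
echo "ssh ${CLUSTER_USER}@${CLUSTER_HOST}"

# The saved-profile version: a Host entry in an OpenSSH config file.
# It is written to a temporary file here; the real location is ~/.ssh/config.
DEMO_CONFIG="$(mktemp)"
cat > "$DEMO_CONFIG" <<EOF
Host cyberstar
    HostName ${CLUSTER_HOST}
    User ${CLUSTER_USER}
EOF
cat "$DEMO_CONFIG"

# With that entry in ~/.ssh/config, connecting shortens to:
#   ssh cyberstar
```

The actual ssh lines are left as comments because they need a real account and a network connection. Swap in your own user ID and whichever host name applies to you.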
Congratulations, you are now on the cluster! You should see a prompt like [username@hostname ~]$ with a blinking cursor to the right. The ~ symbol means that you are currently in your home directory. This is the UNIX command line interface—Windows also has a command line, but we rarely use it because its graphical interface is so convenient.
Let’s try out some basic commands. Commands allow you to move around your file system and move, copy, edit, and run your files. Some good ones to know starting out:
- ls (List contents of current directory)
- pwd (Print working directory)
- cd newFolder (Change directory to newFolder)
- cp filename newLocation (Copy a file to a new location)
- mv filename newLocation (Move a file to a new location)
- rm filename (Delete a file. This is permanent, use with care.)
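- tar -cvf zippedFolder.tar oldFolder (Bundle a directory into a single archive file. Tar is the UNIX counterpart of zip, but -cvf only packs the files together without compressing them. Add the z flag, as in tar -czvf zippedFolder.tar.gz oldFolder, to compress with gzip as well.)
- tar -xvf zippedFolder.tar (Unpack a tar archive into the current directory, recreating the original oldFolder/ directory)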
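To get a feel for these commands without risking real files, you can try them in a throwaway directory. The sketch below is one possible practice session. The file and folder names are made up for illustration, and it uses two commands that are not in the list above: mkdir (create a directory) and rmdir (remove an empty directory).

```shell
# Work in a throwaway directory so nothing real is touched.
PRACTICE_DIR="$(mktemp -d)"
cd "$PRACTICE_DIR"

mkdir oldFolder backup                  # create two directories
echo "trial 1 results" > oldFolder/results.txt

cp oldFolder/results.txt backup/        # copy: the original stays in oldFolder/
mv backup/results.txt backup/run1.txt   # mv to a new name in the same folder renames the file
ls backup                               # lists: run1.txt

tar -cvf zippedFolder.tar oldFolder     # bundle oldFolder/ into one archive file
rm oldFolder/results.txt                # delete the file (permanent!)
rmdir oldFolder                         # remove the now-empty directory

tar -xvf zippedFolder.tar               # unpacking recreates oldFolder/ right here
ls oldFolder                            # lists: results.txt
```

Notice that the extracted directory takes its name from what was archived (oldFolder), not from the name of the .tar file.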
When moving around, remember that UNIX uses the forward slash (‘/’) to denote directories rather than the Windows backslash (‘\’). The current directory is denoted as a dot (‘.’) and the parent directory is denoted by two dots (‘..’).
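Here is a short sketch of the dot and double-dot shortcuts in action. The project and src folder names are hypothetical, and everything happens inside a temporary directory.

```shell
# Build a small practice tree: <temp dir>/project/src
START_DIR="$(mktemp -d)"
mkdir -p "$START_DIR/project/src"

cd "$START_DIR/project/src"
pwd                             # ends in /project/src

echo "notes" > ../notes.txt     # ".." refers to the parent, so this creates project/notes.txt
cp ../notes.txt .               # "." is the current directory: copy the file into src/

cd ..                           # move up one level
pwd                             # ends in /project
cd ./src                        # same as typing: cd src
cd ../..                        # move up two levels in one step
AFTER_TWO_UP="$(pwd)"           # now back where the tree was created
echo "$AFTER_TWO_UP"
```

Paths that start with a dot are relative to wherever you currently are, which is why the same command can behave differently depending on your working directory. When in doubt, run pwd first.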
From your home directory (~), the two main directories are your work and scratch folders. These are both associated with your username. Your work folder is where you will store and run most of your programs. The scratch folder offers unlimited file storage and is sometimes useful for holding large result files.
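If you want to check how much space each of these folders is using, something like the sketch below may help. It assumes the folders appear as work and scratch directly under your home directory. The exact paths differ from cluster to cluster, so treat these as placeholders. The helper name report_dir is made up for this example.

```shell
# Print the total size of a directory, or a note if it does not exist.
# report_dir is a hypothetical helper name used only in this sketch.
report_dir() {
    if [ -d "$1" ]; then
        du -sh "$1"    # -s: one summary line, -h: human-readable units
    else
        echo "$1: not found on this system"
    fi
}

# Assumed locations; adjust these to match your cluster.
report_dir "$HOME/work"
report_dir "$HOME/scratch"
```

On a machine without these folders (your laptop, for instance), the script simply reports that they were not found.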
- Transferring files between your computer and the cluster. The first choice is the sftp command, covered in our post about using the cluster on a Mac. The second choice is using a program called WinSCP, which provides a graphical drag-and-drop interface for transferring files. The instructions to do so are below.
- Open WinSCP and connect to any of the clusters, similar to how you did with the SSH client. Note that your home directory is accessible from any of the clusters, so it doesn’t really matter which one you use with WinSCP.
- The transfer protocol on the first screen can remain at the default, “SFTP”, with the default port being 22. Then simply type your user id and the remote host like you did with the SSH client.
- If you are prompted to choose an interface style, use the “Commander” interface. This shows you both the local and remote directories at the same time.
- The right-hand window will show your file system on the cluster. Use WinSCP to drag and drop files between local folders and your cluster folders. You can also drag and drop to/from your regular Windows folders.
- WinSCP also has a simple (but useful) text editor. If you have a text file on the cluster, right-click it in WinSCP and select “Edit”. A window will open that allows you to edit the file directly on the remote machine.
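If you would rather use the sftp command mentioned earlier instead of WinSCP, a session looks roughly like the sketch below. The user ID abc123 and the file names are placeholders I invented, and the remote folder work is assumed to sit in your home directory. The commands themselves (cd, put, get, bye) are standard sftp commands. They are written to a temporary file here so you can see them in one place.

```shell
# Placeholder user ID (an assumption); the host name is Lion-XO's from earlier in this post.
CLUSTER_USER="abc123"
CLUSTER_HOST="lionxo.aset.psu.edu"

# The transfer steps, one sftp command per line. File names are made up.
SFTP_BATCH="$(mktemp)"
cat > "$SFTP_BATCH" <<'EOF'
cd work
put myModel.tar
get results.txt
bye
EOF
cat "$SFTP_BATCH"

# Interactive use: start a session, then type the commands above at the sftp> prompt.
#   sftp abc123@lionxo.aset.psu.edu
# Scripted use (needs key-based login, because batch mode cannot ask for a password):
#   sftp -b "$SFTP_BATCH" "${CLUSTER_USER}@${CLUSTER_HOST}"
echo "sftp ${CLUSTER_USER}@${CLUSTER_HOST}"
```

Here put uploads a file from your computer to the cluster, and get downloads one from the cluster to your computer. The sftp lines are left as comments because they need a real account.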
This should get you started with using the cluster. You will use the SSH client for compiling and running programs, and WinSCP to transfer files with your local machine.