I recently learned about a service called Globus Online that lets you easily transfer files to the cluster. It’s similar to WinSCP or SSH, but transfers happen in the background and are resumed if they are interrupted. It is supported by the University of Colorado (https://www.rc.colorado.edu/filetransfer) and the NSF XSEDE machines (https://www.xsede.org/globus-online). Also, courtesy of Jon Herman, a note about Blue Waters: there is an endpoint called ncsa#NearLine where you can push your data for long-term storage (to avoid scratch purges). However, NearLine has a per-user quota of 5 TB, so if you find yourself mysteriously unable to transfer any more files, you’ll know why.
To get started, first create a Globus Online account. Then you’ll need to create “endpoints” on your account. The obvious endpoint is the cluster; the University of Colorado, for example, has instructions on how to add their cluster to your account. Next, make your own computer an endpoint: click Manage Endpoints, then click “Add Globus Connect.” Give your computer a name, and the site will generate a unique key for you to enter in the service’s desktop application. Download the program for Mac, Unix, or Windows. The cool thing is that you can do this on all your computers. For example, I have one computer called MacBookAir that runs OS X and another called MyWindows8, or something like that, that runs Windows.
File transfers are then initiated as usual, only you’re using a web interface instead of a standalone program.
As usual, feel free to leave a comment below.
I’ve been working lately with some visiting students, people outside the research group, and new students within the group who need to install some software on a personal laptop to get started. Here is a guide to what you need to install. (Last Updated January 20, 2012).
- (Required, Available to ANGEL Group Members Only) If you haven’t already done so, contact Josh Kollat to become part of the AeroVis user group on Penn State’s ANGEL course management system, http://cms.psu.edu/. There, you can download AeroVis, which is the visualization software that we use. Also grab a copy of the sample data files and documentation which are really helpful.
- (Required, Available to Everyone) Cygwin is used to connect to the cluster and to display graphical programs, such as Matlab, that are running on the cluster. It can be found at http://www.cygwin.com/. Download setup.exe and save it somewhere you will be able to find it later (such as your Documents folder), since you may need to run it again. Double-click setup.exe to get started. When it asks which packages to install, you can either accept the “default” packages or select “all” to install everything. Either way, you may have to run setup.exe again and select additional packages if something doesn’t install correctly. The main package we’ll be using is “X-Window”; according to the Cygwin website, you install it through the same setup.exe process used for the rest of Cygwin. This is probably the hardest of the software packages to install, and it takes a long time. Let us know if you need any help, and leave a comment if you have tips for making the install process easier. Afterwards, edit your Windows environment variables to append ;C:\cygwin\bin to your Windows path. Note: Ryan has some additional ideas about the best way to access remote computers. Stay tuned for his post on the subject; he can also edit this post with more details.
- (Required, Available to Everyone) You’ll need a text editor, such as Notepad++ or PSPad. Both are free, so try them and see which one you like better. While you can do a lot of text editing in Notepad or WordPad, these programs are much more comfortable for working with data and programming.
- (Required, Available to Everyone) Use WinSCP for file transfer to the cluster. Ryan’s suggestions will probably pertain to this advice too. I know Matt and Ryan use different software packages, so I’d love their input here. See the comments on this post for additional discussion.
- (Required Only for Visiting Students who Need College of Engineering Wireless) A group member can contact Bob White to get access for you. A Penn State student must download software at the college’s site and load it on the visitor’s laptop.
- (Required for Most Students Publishing Papers and Writing Theses Internally in the Group) You’ll need a LaTeX system. LyX is an open source “document processor” that is compatible with LaTeX. Please add additional suggestions for LaTeX environments and editors, preferably ones with syntax highlighting and some graphical features (such as adding equations using symbols). We have a license for WinEdt, but it’s not free for personal use.
- (Optional, Available to Everyone) Open source tips: When in a Penn State computer lab or at a computer in our office, you can use Adobe Photoshop and Illustrator for figure editing, as well as Microsoft Office. If you want free alternatives on your personal computer, try Inkscape (for vector images), GIMP (for raster images and pictures), and OpenOffice (an alternative to Microsoft Office).
- (Optional, Available to Penn State Students Only) Some good software is available at http://downloads.its.psu.edu/. Secure Shell Client, under “File Transfer” at that site, is a file transfer/terminal program used to connect to the cluster. Different people have varying preferences for file transfer and cluster stuff. I personally recommend WinSCP and running terminal commands on Cygwin, so Secure Shell is not really required.
As for software that costs money, the computers in the office generally need Microsoft Office, Microsoft Visual Studio, Matlab, and the Adobe Suite. Students shouldn’t have to worry about installing those programs on their personal computers. If you get Cygwin working correctly, your cluster access will let you use Matlab, Mathematica, compilers, and other software, so even if you don’t have your own copy of Matlab, you can use it interactively on the cluster.
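For readers wondering what "using Matlab interactively on the cluster" looks like from a Cygwin terminal, here is a minimal sketch. It assumes the connection is made with ssh and X11 forwarding (the -X option), which this guide does not spell out, and user@host is a placeholder for your own account and cluster address. The script only checks your local setup and prints the command; it does not connect to anything.

```shell
#!/bin/sh
# Sketch only: CLUSTER is a hypothetical placeholder for your account and host.
CLUSTER="user@host"

# ssh must be on the PATH; if it is missing, rerun setup.exe and add the
# package that provides it.
if command -v ssh >/dev/null 2>&1; then
    echo "ssh found on PATH"
else
    echo "ssh not found on PATH"
fi

# X11 forwarding needs a local X server; DISPLAY is normally set when one is running.
if [ -n "$DISPLAY" ]; then
    echo "DISPLAY is set to $DISPLAY"
else
    echo "DISPLAY is not set; start the X server first"
fi

# -X asks ssh to forward X11, so graphical programs started on the cluster
# open their windows on your own screen.
SSH_COMMAND="ssh -X $CLUSTER"
echo "Connect with: $SSH_COMMAND"
echo "Then start a graphical program on the cluster, for example: matlab"
```

If the Matlab window never appears, the two checks at the top are usually the first things to look at.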
Let me know if you have any questions by emailing jrk301 at psu.edu.
It turns out there are some pros and cons to using a Mac for these activities. Here are some updates on how to work efficiently on a Mac:
- You don’t need Cygwin at all since X11/XWindow is included in the operating system already!
- You don’t need something like WinSCP, since you can use SFTP to transfer files from a local computer to the remote computer. Here’s how:
- Open a terminal window on your local computer.
- Use cd and ls to get to the local directory on your hard drive where you have files you want to send to the remote computer.
- Type sftp user@host to connect to the remote computer. Enter your password.
- Use cd and ls to get to the directory on the remote system where you want to put the files from your local system.
- Type put filename to transfer a file to the remote system. You can also use mput *abc to transfer multiple files (in this example, everything ending in abc). The asterisk is a wildcard; it matches any character, any number of times.
- If you want to transfer in the other direction, i.e. from the remote machine to the local machine, use the get and mget commands, which work just like put and mput.
- Summary of useful sftp commands:
- get filename Copy a file from the remote computer to the local computer.
- mget filenames Copy several files from the remote computer to the local computer. Can use wildcards.
- put filename Copy a file from the local computer to the remote computer. Use -r if you want to upload a whole directory. Note that the command can’t create the directory on the remote computer if it doesn’t already exist, so first use mkdir on the remote computer to make a new directory with the same name as the one you want to copy.
- mput filenames Copy several files from the local computer to the remote computer. Can use wildcards.
- cd path Change directories on the remote computer.
- ls List the files in the current directory on the remote computer.
- pwd Display the path of the current directory on the remote computer.
- lpwd Display the path of the current directory on the local computer.
- lcd Change directories on the local computer.
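The interactive steps above can also be collected into a batch file, which is handy when you repeat the same upload often. The following is a minimal sketch, assuming OpenSSH's sftp, whose -b option reads commands from a file. The names user@host, results, batch.txt, and the dummy files are all hypothetical placeholders. Because batch mode cannot prompt for a password, it generally needs key-based login, so the script only prints the final command instead of running it.

```shell
#!/bin/sh
# Sketch only: REMOTE, REMOTE_DIR, and the file names are hypothetical placeholders.
REMOTE="user@host"
REMOTE_DIR="results"

# Work in a scratch directory with a few dummy files so the example is harmless.
WORK_DIR=$(mktemp -d)
cd "$WORK_DIR" || exit 1
touch run1_abc run2_abc notes.txt

# Preview which local files the wildcard *abc matches before sending anything.
echo "Would upload:" *abc

# Write the commands you would otherwise type at the sftp> prompt.
# A leading "-" tells sftp to keep going if that command fails,
# for example when the remote directory already exists.
cat > batch.txt <<EOF
-mkdir $REMOTE_DIR
cd $REMOTE_DIR
put *abc
EOF

# Print the command rather than running it, so nothing is transferred by accident.
echo "To transfer, run: sftp -b $WORK_DIR/batch.txt $REMOTE"
```

Here notes.txt is left out of the preview because its name does not end in abc, which is the same wildcard matching that mput and put perform.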