Enhance your (Windows) remote terminal experience with MobaXterm

Jazmin and Julie recently introduced me to a helpful program for Windows called “MobaXterm” that has significantly sped up my workflow when running remotely on the Cube (our cluster here at Cornell). MobaXterm bills itself as an “all in one” toolbox for remote computing. The program’s interface includes a terminal window as well as a graphical SFTP browser. You can link the terminal to the SFTP browser so that as you move through folders on the terminal the browser follows you. The SFTP browser allows you to view and edit files using your text editor of choice on your Windows desktop, a feature that I find quite helpful for making quick edits to shell scripts or pieces of code as I go.


A screenshot of the MobaXterm interface. The graphical SFTP browser is on the left, while the terminal is on the right (note the checked box in the center of the left panel that links the browser to the terminal window).

 

You can set up a remote Cube session using MobaXterm with the following steps:

  1. Download MobaXterm using this link
  2. Follow the installation instructions
  3. Open MobaXterm and select the “Session” icon in the upper left corner.
  4. In the session popup window, select a new SSH session in the upper left, enter “thecube.cac.cornell.edu” as the name of the remote host and enter your username.
  5. When the session opens, check the box below the SFTP browser on the left to link the browser to your terminal
  6. Run your stuff!

Note that for a Linux system, you can simply link your file browser window to your terminal window and get the same functionality as MobaXterm. MobaXterm is not available for Mac, but Cyberduck and Filezilla are decent alternatives. An alternative graphical SFTP browser for Windows is WinSCP, though I prefer MobaXterm because of its linked terminal/SFTP interface.

For those new to remote computing, ssh or UNIX commands in general, I’d recommend checking out the following posts to get familiar with running on a remote cluster:

Porting GCAM onto a UNIX cluster

Hi All,

One of my recent research tasks has been to port GCAM (Global Change Assessment Model) onto the Cube, our group’s HPC cluster, and some of the TACC resources.  This turned out to be more challenging than I had anticipated, so I wanted to share my experiences in the hopes that it will save you all some time in the future.  This post might be a bit pedestrian for some readers, but I hope it’s helpful to folks that are new to C++ and Linux.

Before getting started, some background on GCAM.  GCAM was developed by researchers at the Joint Global Change Research Institute (JGCRI) at the Pacific Northwest National Lab (PNNL).  GCAM is an integrated assessment model (IAM), which means it pairs a climate model with a global economic model.  GCAM is unique among IAMs in that it has a fairly sophisticated representation of various sectors of the global economy modeled on a regional level…meaning the model is a beast compared to other IAMs (like DICE). I’ll be writing a later post on IAMs more generally.  GCAM is coded in C++ and makes extensive use of xml databases for both input and output.  The code is open source (available here), has a Wiki (here), and a community listserv where researchers can pose questions to their peers.

There are three flavors of the model available: one for Windows users, one for Mac users, and one for Unix users.  The Windows version comes compiled, with a nice user-interface for post-processing the results.  The Unix version comes as uncompiled code, and requires the installation of some third party C++ libraries.  For those who’d like to sniff around GCAM, the Windows or Mac versions are a good starting point, but if you’re going to be doing HPC heavy lifting, you’ll need to work with the Unix code.

The GCAM Wiki and the detailed user manual provide excellent documentation about running GCAM and the Model Interface tools, but are a bit limited when describing how to compile the Unix version of the code.  One helpful page on the Wiki can be found here.  GCAM comes with a version of the Boost C++ library in the standard Unix download (located in Main_User_Workspace/libs).  GCAM also uses the Berkeley DBXML library, which can be downloaded here.  You’ll want to download version 2.5.16 to be sure it runs well with GCAM.

Once you’ve downloaded DBXML, you’ll need to build the library.  As it turns out, this is fairly easy.  Once you’ve ported the DBXML library onto your Unix cluster, simply navigate into the main directory and run the script buildall.sh.  The buildall.sh script accepts flags that allow you to customize your build.  We’re going to use the -b flag to build the 64-bit version of DBXML:

sh buildall.sh -b 64

Once you’ve built the DBXML library, you are nearly ready to build GCAM, but must first export the paths to the Boost library and to the DBXML include and lib directories.  It seems that the full paths must be declared.  In other words, they should read something like:

export BOOST_INCLUDE=/home/fs02/pmr82_0001/jrl276/GCAM_Haewon_trans/libs/boost_1_43_0/
export DBXML_INCLUDE=/home/fs02/pmr82_0001/jrl276/dbxml-2.5.16/install/include/
export DBXML_LIB=/home/fs02/pmr82_0001/jrl276/dbxml-2.5.16/install/lib/
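A side note on these exports: if you’d rather not type out your full home directory path, `$HOME` is a safe substitute, since the shell always expands it, whereas `~` is only expanded in certain positions (which may be why the shortcut fails here). A minimal sketch, reusing the DBXML directory name from above:

```shell
# $HOME expands to your full home directory path even inside double quotes,
# so the exported variable holds a complete absolute path:
export DBXML_INCLUDE="$HOME/dbxml-2.5.16/install/include/"
echo "$DBXML_INCLUDE"
```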

One tricky point here is that the full path must be typed out instead of using ‘~’ as a shortcut (I’m not sure why). Once this is done, you are ready to compile GCAM.  Navigate into the ‘Main_User_Workspace’ directory and type the command:


make gcam

Compiling GCAM takes several minutes, but at the end there should be a gcam.exe executable file located in the ‘/exe/’ folder of the ‘Main_User_Workspace’.  It should be noted that there is a multi-threaded version of GCAM (more info here), which I’ll be compiling in the next few days.  If there are any tricks to that process, I’ll post again with some tips.

That’s it for now.

Useful Linux commands to handle text files and speed up work

Most of us, nice and normal human beings, tend to prefer programs with GUIs over typing commands on a command prompt because the former looks more “real” and is more visual than the latter. However, one thing we don’t know (or, sometimes, don’t want to know) is that learning a few terminal commands can dramatically increase productivity. These commands can save us a lot of time by sparing us from opening and closing programs, navigating through menus and windows, moving the mouse around, as well as moving the hand back and forth from the mouse to the keyboard.

This post will mention and briefly describe some useful “intermediate level” Linux commands (some basic commands are presented in this post by Jon Herman), which can be called from a Linux OS, Cygwin (mostly), or Mac. Among the numerous tedious tasks these commands can greatly simplify is the particularly interesting chore of handling text files, be they scripts or data files. Commands for other tasks are covered as well. Keep in mind that the symbol * is a wild card (a character that can stand for any string of characters when searching for files), which is really useful when the goal is to repeatedly apply one command to multiple files. In all the commands listed here, the leading “$” represents the prompt — don’t type it.
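As a quick illustration of the wildcard before diving in (the file names are invented for the demo), this creates two small files and applies a single command to both at once:

```shell
# create two throwaway files, then let * match both of them
printf 'a\nb\n' > sample1.txt
printf 'c\n' > sample2.txt
wc -l *.txt   # one line count per file, plus a "3 total" line
```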

DELIMITER SEPARATED FILES HANDLING

  • Remove columns 31 to 35 (numbering fields from 1, as cut does) from a comma separated file by keeping fields 1–30 and 36, and append the output to another file.
    $ cut -d',' -f1-30,36 input.file >> output.file

    (More information on the post by Joe Kasprzyk)

  • Print only columns 2 and 4 (starting from 1) of a comma separated file.
    $ awk -F "," '{print $2,$4}' input.file >> output.file
  • Count number of columns in a file separated either by spaces or commas:
    $ head -1 input.file | sed 's/[^, ]//g' | wc -c
    or:
    $ awk -F "[, ]" 'END{print NF}' input.file
  • Print lines of a comma separated file in which the value in the 2nd column is lower than 100 and the value in the 5th column is higher than 0.3:
    $ awk -F "," '$2<100 && $5>0.3' input.file >> output.file
  • Print lines between 10th and 20th lines (not inclusive) of a file:
    $ awk 'NR>10 && NR<20' input.file >> output.file
  • Add a string to the end of multiple files:
    $ echo "your string" | tee -a *.set
  • Add a string to the end of one file:
    $ echo "your string" >> file

FILE SEARCHING

  • Find all text files in a folder that contain a certain string:
    $ grep -rn './folder' -e your_string
  • Find files recursively (* is a wildcard):
    $ find . -type f -name "name_of*.file"

FILES INFO

  • See the contents of a zip/tar file without extracting it. Press q to quit.
    $ less file.tar
  • Count number of lines in a file:
    $ wc -l your.file
  • List all files with a certain extension in a directory:
    $ ls *.ext
  • Print files and folders in tree fashion:
    $ tree
  • Print the size of all subfolders and files in (sub)folders to a certain max depth in the folder hierarchy:
    $ du -h -a --max-depth=2

IMAGE HANDLING

  • Convert svg files to png (you need to have Inkscape installed):
    $ inkscape input.svg -d 300 -e output.png
  • Convert svg files to pdf-latex (you need to have Inkscape installed):
    $ inkscape input.svg --export-pdf output.pdf --export-latex
  • Rotate a picture:
    $ convert Fig6_prim.png -rotate 90 Fig6_prim_rotated.png

MISCELLANEOUS

  • See the history of commands you have typed:
    $ history
  • See a calendar (month and year optional):
    $ cal [month] [year]
  • Combine pdf files into one (you need to have pdftk installed):
    $ pdftk file1.pdf file2.pdf file3.pdf cat output newfile.pdf
    or, to merge all pdf files in a directory:
    $ pdftk *.pdf cat output newfile.pdf

    In order to see how to combine only certain pages of pdf files, as well as how to split all pages into separate pdf files, see this page.

  • See the manual of a command:
    $ man command

Another useful idea is that of piping the output of one command to another command. For example, if you want to print the number of files in a directory, you can pipe the output of the ls command (list all files in a directory) to the wc -l command (count the number of lines in a string). For this, use the “|” character:

$ ls | wc -l

However, you may want instead to check the number of lines in each of the 20 files in a directory at once, which can also be achieved by combining the ls and wc commands with the command xargs. The command then would look like:

$ ls | xargs wc -l

The command xargs breaks down the output from ls into one string per line and then calls wc -l for each line of the output of ls.
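To see that difference concretely (the directory and file names here are just for the demo):

```shell
# work in a scratch directory so ls sees only our demo files
mkdir -p xargs_demo && cd xargs_demo
printf 'x\n' > a.txt
printf 'y\nz\n' > b.txt

ls | wc -l        # prints 2: wc counts the file *names*
ls | xargs wc -l  # wc runs on the files themselves: per-file counts plus "3 total"
```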

Hope all this saves you some time!

Running jobs on the supercomputer: JANUS

The power of supercomputing is undeniable. However, there is often a hurdle in syntax to get jobs to run on them. What I’m including below are ways to submit jobs to run on the CU-Boulder supercomputer, JANUS, which I hope will be helpful.

To log on, open up a terminal window (e.g. Terminal on a Mac or Cygwin on a PC): ssh <username>@login.rc.colorado.edu

To copy items to JANUS from a shell, simply use the following:

scp <path and filename on local machine>   <username>@login.rc.colorado.edu:<destination path on JANUS>/

The purpose of the job script is to tell JANUS where to run the job. I will cover two types of job scripts, (1) to submit a job to an entire node, and (2) to submit to a single processor. Note, nodes on JANUS contain multiple processors, usually more than 12, so that if you have a memory intensive job you may wish to submit the former. Also, the jobs that occupy entire nodes offer the user a larger number of total processors to work with (several thousand cores versus several hundred). Nevertheless, here are the examples:

1. Example script to submit to a node is below: The body of text should be saved to a text file with a “.sh” suffix (i.e. shell script). Also notice that lines that begin with “#” are not read by the program, but rather are for comments/documentation. To submit the script, first be sure you’ve loaded the slurm module:

module load slurm

sbatch <path and filename of script>

#!/bin/bash
# Lines starting with #SBATCH are interpreted by slurm as arguments.

# Set the name of the job, e.g. MyJob
#SBATCH -J MyJob

# Set a walltime for the job. The time format is HH:MM:SS - in this case we run for 12 hours.
# **Important: this length should be commensurate with the type of node you're submitting to;
# debug is less than 1 hour, but others can be much longer. Check the online documentation for assistance.
#SBATCH --time=12:00:00

# Select one node
#SBATCH -N 1

# Select twelve tasks per node (one task per processor)
#SBATCH --ntasks-per-node 12

# Set output file name with job number
#SBATCH -o MyJob-%j.out

# Use the standard 'janus' queue. This is confusing as the online documentation is incorrect;
# use the below to get a simple 12-core node.
#SBATCH --qos janus

# The following commands will be executed when this script is run.
# **Important: to get 12 commands to run at the same time on your node, enclose each in
# parentheses "()" and follow it with an ampersand "&" so that all of them run in the
# background. Also be sure to include a "wait" command at the end, so that the job script
# waits to terminate until these jobs complete. Theoretically you could have more than
# 12 commands below.
# **Note: replace the XCMDX placeholders below with the full path to your executable, plus
# any command line options, exactly how you'd run them from the command line.

echo The job has begun

(XCMD1X) &
(XCMD2X) &
(XCMD3X) &
(XCMD4X) &
(XCMD5X) &
(XCMD6X) &
(XCMD7X) &
(XCMD8X) &
(XCMD9X) &
(XCMD10X) &
(XCMD11X) &
(XCMD12X) &

# wait ensures that the job doesn't exit until all background jobs have completed
wait

2. Example script to submit to a single processor is below. The process is almost identical to above, except for 4 things: (i) the queue that we’ll submit to is called ‘serial’, (ii) number of tasks per node is 1, (iii) the number of executable lines is 1, and (iv) we do not need the “wait” command.

#!/bin/bash
# Lines starting with #SBATCH are interpreted by slurm as arguments.

# Set the name of the job, e.g. MyJob
#SBATCH -J MyJob

# Set a walltime for the job. The time format is HH:MM:SS - in this case we run for 6 hours.
# **Important: this length should be commensurate with the type of node you're submitting to;
# debug is less than 1 hour, but others can be much longer. Check the online documentation for assistance.
#SBATCH --time=6:00:00

# Select one node
#SBATCH -N 1

# Select one task per node (similar to one processor per node)
#SBATCH --ntasks-per-node 1

# Set output file name with job number
#SBATCH -o MyJob-%j.out

# Use the standard 'serial' queue. This is confusing as the online documentation is incorrect;
# use the below to get a single processor.
#SBATCH --qos serial

# The following commands will be executed when this script is run.
# **Note: replace the XCMDX placeholder below with the full path to your executable, plus
# any command line options, exactly how you'd run it from the command line.

echo The job has begun

XCMDX

Compiling OpenSees on the Janus cluster

We said this blog was about “research” not just “water-related research” so here’s a completely random set of instructions for you!

I recently had a conversation with a structural engineer about combining some of our optimization techniques with an open source earthquake engineering performance simulation called OpenSees. Talk about long evaluation times… this simulation takes minutes! Maybe even 10 minutes! And it contains thousands of files! But here goes.

First, go to the website (the link is above), and sign up for an account. Once you have an account, the website is pretty easy to navigate. The download link allows you to get executables, but what we really want is the source code. If you click the source code link, you find a list of all the files… but what they want you to do is actually use Subversion to download the code to your own machine.

Here’s the trick! I found a helpful forum post linked here, that explains where to find the download instructions. When you click on the Source Code link, you’re going to see a bunch of folders. Navigate to trunk/MAKES/ and you should see a whole bunch of Makefiles for different systems. If you’re using one of the big computing clusters such as STAMPEDE, follow the instructions for their file. You’ll probably not need to do anything else, but sometimes you have to adjust things here and there for your own system.

So if you’re at the University of Colorado, you are going to be compiling on Janus, which uses the REDHAT operating system. So click on the Makefile.def.EC2-REDHAT-ENTERPRISE link. Here’s a link to it but I’m not sure if the link works properly because it’s a complicated URL.

See where it says “Following are commands to build OpenSees”? That’s what you want to follow! But, a lot of these commands aren’t needed on your own system because the packages like gcc and tcl are already installed. Ok so here goes.

1. Log on to Janus in the normal way. You need to load some modules for yourself that will help you. To load them, type (each line should be followed by an ‘enter’):

module load tcl/tcl-8.5.13
module load gcc
module load tcl/activetcl-8.5.13

2. When you are first logged in, you are typically in your home directory. Type ‘pwd’ to ensure where you are. It will look like /home/[your identikey]. Remember, you can always get back to your home directory by typing cd ~.
3. The first thing the commands want you to do is to create two special directories in your home folder called bin, where executable programs are stored, and lib, where libraries are stored. These commands are:

mkdir bin
mkdir lib

4. Now, use svn to obtain the code. Note, I am copying the commands here but please refer to the actual list of commands in the Makefile from the website, because those will be the latest commands!

svn co svn://opensees.berkeley.edu:/usr/local/svn/OpenSees/trunk OpenSees

This will take a few minutes. What it is doing, is downloading all these source code files to your account on Janus, to get ready for you to compile. If you need a refresher on what compiling is, I have some comments about this on a related post here, and I’ve also posted about makefiles too.
5. At this point, you’ve created the special directories in your home directory, and also downloaded all the data. Next, you’ll want to change into the new directory that you just created:

cd OpenSees

and of course, you can make sure you’re in the proper place by running the pwd command.
6. Now that you’re in the OpenSees directory, you want to basically pick out the proper Makefile for you to use. The commands show you how to do this:

cp ./MAKES/Makefile.def.EC2-REDHAT-ENTERPRISE ./Makefile.def

What this command does, is it says, “There are like, dozens of makefiles in the directory… which one am I supposed to use?” and what you do is copy the specific makefile into a generic makefile in your OpenSees directory called, simply, Makefile.def.
7. Now you’re going to want to use a text editor to edit the Makefile.def for your own purposes. I like to use emacs, but there are others. To open the file for editing, type:

emacs Makefile.def

This is a really nice makefile; the folks at Berkeley have done a great job. However, there are a few things you need to edit to make this work for the University of Colorado system, in my experience. They are listed in the following items.
8. If you scroll down to section 2, the value of the BASE variable is set to /usr/local. I got rid of this and just wrote BASE =
9. Take a look at section 3, which lists the libraries. The nice thing, all these libraries are going to be created for you and placed in the lib folder you created in your home directory. But watch for the tcl library! The path for this library is not correct. By my calculation, on the University of Colorado computer the path should be:

TCL_LIBRARY = /curc/tools/x_86_64/rh6/tcl/8.5.13/lib/libtcl8.5.so

10. Section 4 tells the makefile where to look for the compilers. This also needs to change on the Colorado computer! The default commands have /usr/bin in front of them but we need to change them to:

C++ = g++
CC = gcc
FC = gfortran

11. The last part of Section 4 sets flags that control how the compiler actually compiles and links the code. In the LINKFLAGS variable, there are two options: -rdynamic and -Wl. Remove the -Wl option.
12. To save the file, press Ctrl-x Ctrl-s. Then, to exit the program, press Ctrl-x Ctrl-c.
13. We are almost ready to compile, but first triple check you are in the correct directory. When you type ‘pwd’ you should see:

/home/[identikey]/OpenSees

If you don’t see that, type: cd ~/OpenSees.
14. Type:

make

and if everything went right, you should be all set! Note, this will take about 15 minutes.
15. To run the program from any directory, type ~/bin/OpenSees. In other words, the executable is in the /bin/ folder in your home directory.

Now don’t ask me the first thing about what this program does… I just helped you compile it. 🙂

Compiling MODFLOW2005 on the Janus computing cluster

Perhaps you are new to working with source code, and you want to know how to compile your first program on a super computer or cluster. Here’s some instructions on how to do so.

The USGS page (http://water.usgs.gov/ogw/modflow/MODFLOW.html#downloads) has a download link that says “MODFLOW-2005 source code and sample problems converted to the Unix file structure”.

I downloaded the folder, extracted it, and transferred it to Janus. If you don’t know how to transfer files, use Globus Connect or another service, as documented in our post about what software to install and our post on cluster commands.

The folder contains some subfolders, /doc/, /src/, /test-out/, and /test-run/. The source code is in /src/.

When you look at source code, different folks break it up in different ways. Sometimes the actual code is in /src/ and then the actual executable files are stored in a ‘binary’ folder called /bin/. Or something like that. But anyway, what you want to look for is a file called a makefile. Luckily, the friendly folks at USGS have put the makefile in the /src/ folder.

On Janus, navigate to the /src/ folder and type the command:

make

and this will compile the files. Of course, there is a problem (there’s always a problem!) This time it says:

f90: command not found

After looking at the webpage for compilers at research computing (https://www.rc.colorado.edu/support/userguide/compilersandmpi) it turns out that they have several packs of compilers we can load up. In Linux/Unix, the GNU compilers are standard. There is also the Intel family of compilers.

For our purposes, we will use the GNU fortran compiler. First, to load that family of compilers run the command:

module load gcc

The makefile for the project needs to know that you are using the gfortran compiler. So, edit the Makefile in the /src/ folder. Instead of modifying the actual line, it’s always a good idea to comment out the original and add the new one below it. So replace:

F90= f90

with:

#F90= f90
F90= gfortran

and save the file. Now, when you run:


make

you’ll see some magic happen. If you see a few warnings that’s ok.

Finally, to run the program, simply type the name of the executable:


./mf2005

I don’t know anything about MODFLOW, but this output indicates it’s working:

MODFLOW-2005
U.S. GEOLOGICAL SURVEY MODULAR FINITE-DIFFERENCE GROUND-WATER FLOW MODEL
Version 1.11.00 8/8/2013

Enter the name of the NAME FILE:

Voila! Now follow the readme and documentation to actually try some examples. But hopefully this will help someone out there that is new to compiling and using clusters.

Running tmux on the cluster

tmux is a terminal multiplexer — a program that lets you do more than one thing at a time in a terminal window. For example, tmux lets you switch between running an editor to modify a script and running the script itself without exiting the editor. As an added bonus, should you lose your ssh connection, your programs are still running inside tmux and you can bring them up again when you re-connect. If you’ve ever used GNU Screen, it’s the same idea.

Here’s what it looks like in the MinTTY terminal that comes with Cygwin. In this example I’ve split the window across the middle, with an editor open in the bottom and a command prompt at the top. (You can do vertical splits too.)

tmux terminal multiplexer showing a split window
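If you’ve never used tmux before, day-to-day use comes down to a handful of commands and key bindings. These are tmux’s defaults (every binding starts with the Ctrl-b prefix); the session name is just an example:

```shell
tmux new -s work      # start a new session named "work"
# Inside tmux:
#   Ctrl-b "   split the window top/bottom (as in the screenshot)
#   Ctrl-b %   split the window side by side
#   Ctrl-b o   move between panes
#   Ctrl-b d   detach, leaving everything running
tmux ls               # list sessions you can re-attach to
tmux attach -t work   # pick up where you left off, e.g. after a dropped ssh connection
```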

Building tmux

I recently built tmux again. It’s pretty easy to do, and it takes up less than 7 megabytes of disk space. Here’s how:

Make a directory for your own installs. Mine is called ~/local

mkdir ~/local

You’ll probably have to build libevent 2.0 first if, like me, it’s not already on your system.

Make a directory for building things, if you haven’t already got one.

mkdir build
cd build

Download libevent. Be sure to replace 2.0.21 with the latest stable version.
wget https://github.com/downloads/libevent/libevent/libevent-2.0.21-stable.tar.gz

Extract libevent

tar xzvf libevent-2.0.21-stable.tar.gz
cd libevent-2.0.21-stable

Build libevent. I used an absolute path with the prefix to be on the safe side.

./configure --prefix=/path/to/my/home/directory/local
make
make install

Download and extract tmux. Get the latest version, which is 1.8 at the time of this writing.

wget http://downloads.sourceforge.net/tmux/tmux-1.8.tar.gz
tar xzvf tmux-1.8.tar.gz
cd tmux-1.8

If you rolled your own libevent, you’ll need to set appropriate CFLAGS and LDFLAGS when you run configure. Otherwise you can skip export CFLAGS and export LDFLAGS.

export CFLAGS=-I/path/to/my/home/directory/local/include
export LDFLAGS="-Wl,-rpath=/path/to/my/home/directory/local/lib -L/path/to/my/home/directory/local/lib"
./configure --prefix=/path/to/my/home/directory/local
make
make install

Then add tmux to your path, and you’re done. Do this by putting the following lines in your .bashrc:

export MYPACKAGES=/path/to/my/home/directory/local/
export PATH=${MYPACKAGES}/bin:${PATH}
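Once those lines are in your .bashrc (and you’ve opened a new shell or run `source ~/.bashrc`), a quick sanity check that the new bin directory really made it onto your search path — here I assume the install prefix was ~/local, as in the mkdir step at the top:

```shell
export MYPACKAGES="$HOME/local"
export PATH="${MYPACKAGES}/bin:${PATH}"

# confirm the directory is on PATH before debugging anything else
case ":$PATH:" in
  *":${MYPACKAGES}/bin:"*) echo "local bin is on PATH" ;;
  *)                       echo "PATH not updated" ;;
esac
```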