Making Movies of Time-Evolving Global Maps with Python

Hi All,

These past few months I’ve been working with the Global Change Assessment Model (GCAM), an integrated assessment model (IAM) that couples models of the global climate and economic systems. I wrote an earlier post on compiling GCAM on a Unix cluster.  This post discusses some visualization tools I’ve developed for GCAM output.

GCAM models energy and agriculture systems at a regional level, where the world is composed of 32 regions.  We’re interested in tracking statistics (like the policy cost of stabilization) over time and across regions.  This required three things:

  1. The ability to draw a global map.
  2. The ability to shade individual political units on that map.
  3. The ability to animate this map.

Dr. Jon Herman has already posted a good example of how to do (1) in Python using matplotlib’s Basemap.  We’ll borrow from his example here.  Basemap has options to draw coastlines and boundaries, but these boundaries are not tied to shapes, meaning that you can’t assign different colors to individual countries (task (2) above).  To do that, we need a shapefile containing information about political boundaries.  You can find these for free from a number of sources online, but I like Natural Earth, which provides data at many different scales.  For this application I downloaded their coarsest data set.  To give each country a shade tied to data, we use matplotlib’s colormap tools.  The basic plan is to generate a colored map for each time step in our data, and then to animate the maps using the Linux convert command.

Now that we’ve described roughly how we’ll proceed, a word about the data we’re dealing with and how I’ve handled it.  GCAM has 32 geo-political regions, some of which are individual countries (like the USA or China), while others are groups of countries (like Australia & New Zealand). I stored this information as a list of lists (i.e., a 32-element list, where each element is a list of the countries in that region). I’ve creatively named this variable list_list in the code below. For each region, GCAM produces a time series of policy costs as a fraction of GDP every 5 years from 2020 to 2100. I’ve creatively named this variable data. We want to tie the color of a country at each time step to its policy cost relative to costs across countries and times.  To do this, I wrote the following (clumsy!) Python function, which I explain below.


def world_plot(data,idx,MN,MX):
 from mpl_toolkits.basemap import Basemap
 import matplotlib.pyplot as plt
 from matplotlib.patches import Polygon
 from matplotlib.collections import PatchCollection
 import matplotlib.cm as cm
 import matplotlib as mpl
 import numpy as np

 norm = mpl.colors.Normalize(vmin=MN, vmax=MX)
 cmap = cm.coolwarm
 colors=cm.ScalarMappable(norm=norm, cmap=cmap)
 colors.set_array(data)
 a = colors.to_rgba(data)  # RGBA color for each region's data value

 fig = plt.figure(figsize=(10,10))
 ax = fig.add_subplot(111)

 m = Basemap(projection='robin', lon_0=0,resolution='c')
 m.drawmapboundary(fill_color='white', zorder=-1)
 m.drawparallels(np.arange(-90.,91.,30.), labels=[1,0,0,1], dashes=[1,1], linewidth=0.25, color='0.5',fontsize=14)
 m.drawmeridians(np.arange(0., 360., 60.), labels=[1,0,0,1], dashes=[1,1], linewidth=0.25, color='0.5',fontsize=14)

 year = [1990,2005,2010,2015,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100]
 GCAM_32 = ['PRI','USA','VIR']
 GCAM_1 = ['BDI','COM','DJI','ERI','ETH','KEN','MDG','MUS','REU','RWA','SDS','SDN','SOM','UGA','SOL']
 GCAM_2 = ['DZA','EGY','ESH','LBY','MAR','TUN','SAH']
 GCAM_3 = ['AGO','BWA','LSO','MOZ','MWI','NAM','SWZ','TZA','ZMB','ZWE']
 GCAM_4 = ['BEN','BFA','CAF','CIV','CMR','COD','COG','CPV','GAB','GHA','GIN','GMB','GNB','GNQ','LBR','MLI','MRT','NER','NGA','SEN','SLE','STP','TCD','TGO']
 GCAM_6 = ['AUS','NZL']
 GCAM_7 = ['BRA']
 GCAM_8 = ['CAN']
 GCAM_9 = ['ABW','AIA','ANT','ATG','BHS','BLZ','BMU','BRB','CRI','CUB','CYM','DMA','DOM','GLP','GRD','GTM','HND','HTI','JAM','KNA','LCA','MSR','MTQ','NIC','PAN','SLV','TTO','VCT']
 GCAM_10 = ['ARM','AZE','GEO','KAZ','KGZ','MNG','TJK','TKM','UZB']
 GCAM_11 = ['CHN','HKG','MAC']
 GCAM_13 = ['BGR','CYP','CZE','EST','HUN','LTU','LVA','MLT','POL','ROM','SVK','SVN']
 GCAM_14 = ['AND','AUT','BEL','CHI','DEU','DNK','ESP','FIN','FLK','FRA','FRO','GBR','GIB','GRC','GRL','IMN','IRL','ITA','LUX','MCO','NLD','PRT','SHN','SMR','SPM','SWE','TCA','VAT','VGB','WLF']
 GCAM_15 = ['BLR','MDA','UKR']
 GCAM_16 = ['ALB','BIH','HRV','MKD','MNE','SCG','SRB','TUR','YUG']
 GCAM_17 = ['CHE','ISL','LIE','NOR','SJM']
 GCAM_18 = ['IND']
 GCAM_19 = ['IDN']
 GCAM_20 = ['JPN']
 GCAM_21 = ['MEX']
 GCAM_22 = ['ARE','BHR','IRN','IRQ','ISR','JOR','KWT','LBN','OMN','PSE','QAT','SAU','SYR','YEM']
 GCAM_23 = ['PAK']
 GCAM_24 = ['RUS']
 GCAM_25 = ['ZAF']
 GCAM_26 = ['GUF','GUY','SUR','VEN']
 GCAM_27 = ['BOL','CHL','ECU','PER','PRY','URY']
 GCAM_28 = ['AFG','ASM','BGD','BTN','LAO','LKA','MDV','NPL']
 GCAM_29 = ['KOR']
 GCAM_30 = ['BRN','CCK','COK','CXR','FJI','FSM','GUM','KHM','KIR','MHL','MMR','MNP','MYS','MYT','NCL','NFK','NIU','NRU','PCI','PCN','PHL','PLW','PNG','PRK','PYF','SGP','SLB','SYC','THA','TKL','TLS','TON','TUV','VNM','VUT','WSM']
 GCAM_31 = ['TWN']
 GCAM_5 = ['ARG']
 GCAM_12 = ['COL']

 list_list = [GCAM_1,GCAM_2,GCAM_3,GCAM_4,GCAM_5,GCAM_6,GCAM_7,GCAM_8,GCAM_9,GCAM_10,GCAM_11,GCAM_12,GCAM_13,GCAM_14,GCAM_15,GCAM_16,GCAM_17,GCAM_18,GCAM_19,GCAM_20,GCAM_21,GCAM_22,GCAM_23,GCAM_24,GCAM_25,GCAM_26,GCAM_27,GCAM_28,GCAM_29,GCAM_30,GCAM_31,GCAM_32]
 m.readshapefile('ne_110m_admin_0_countries_lakes/ne_110m_admin_0_countries_lakes','comarques')
 num = len(list_list)
 for info, shape in zip(m.comarques_info, m.comarques):
     for i in range(num):
         if info['adm0_a3'] in list_list[i]:
             patches1 = []
             patches1.append(Polygon(np.array(shape), True))
             ax.add_collection(PatchCollection(patches1, facecolor=a[i,:], edgecolor='k', linewidths=1., zorder=2))
 ax.set_title('Policy Cost', fontsize=25, y=1.01)
 plt.annotate('%s' % year[idx], xy=(0.1,0.2), xytext=(0.1,0.2), xycoords='axes fraction', fontsize=30)
 cb = m.colorbar(colors, 'right')
 cb.ax.tick_params(labelsize=14)
 filename = "out/map_%s.png" % (str(idx).rjust(3, "0"))
 fig.savefig(filename)
 plt.close(fig)  # close the figure so memory doesn't build up across time steps
 return

The function’s name is world_plot and its inputs are:

  1. The raw data for a specific time step.
  2. The index of the time step for the map we are working with (e.g. idx=4 corresponds to 2020 in the function’s year list).
  3. The minimum and maximum of the data across countries and time.

(1) is plotted, (2) is used to name the resulting PNG figure, and (3) is used to scale the colormap.  We first import the necessary Python packages, which in this case are pretty standard matplotlib packages and numpy.  We then use Normalize and ScalarMappable to generate a numpy array containing the RGBA color code for each of the data points in data, and create the pyplot figure object.

Next we create and format the Basemap object.  Note that I’ve selected the Robinson projection, but Basemap provides many options.

The long middle block is specific to this application, and certainly could have been handled more compactly if I wanted to invest the time.  year is a list of time steps for our GCAM experiment, and the GCAM_* variables are lists of three-letter ID codes for the countries in each GCAM region, which are assembled into a list of lists (creatively called list_list).

We then read the shapefile database which was downloaded from Natural Earth.  Looping through the info and shape attributes of the shapefile database, we determine which of the GCAM geo-political units each administrative unit in the database is associated with, and give the polygon for that administrative unit the correct color.

Finally, we do some formatting and labeling, give the file a unique name (tied to the time step plotted), and save the image to an output directory.

When we put this function into a loop over time, we generate a sequence of figures looking something like this:

[Example frame: the global policy cost map for one time step]
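The driver that generates these frames can be a simple loop.  Here is a minimal sketch; the data array below is an illustrative stand-in for the GCAM output, with one row per region and one column per entry in world_plot’s year list:

import numpy as np

data = np.random.rand(32, 21)    # hypothetical policy costs (32 regions x 21 periods)
MN = data.min()                  # fix the color scale across all countries and times
MX = data.max()
for idx in range(data.shape[1]):
    world_plot(data[:, idx], idx, MN, MX)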

To convert this sequence of PNGs to a gif file, we use the convert command (part of ImageMagick) in Linux (or, in my case, Cygwin).  So, we go to the command line, cd into the directory where we’ve saved our figures, and type:

convert -delay 45 -loop 0 *.png globe_Cost_Reduction_faster.gif

Here the -delay flag controls the time between frames (in hundredths of a second), the -loop flag controls whether the gif repeats (0 means it loops forever), next I’m using a wildcard to include all of the PNGs in the output directory, and the final argument is the name of the resulting gif. The final product:

[Animated GIF: the global policy cost map evolving over time]

 

Porting GCAM onto a UNIX cluster

Hi All,

One of my recent research tasks has been to port GCAM (Global Change Assessment Model) onto the Cube, our group’s HPC cluster, and onto some of the TACC resources.  This turned out to be more challenging than I had anticipated, so I wanted to share my experiences in the hopes that it will save you all some time in the future.  This post might be a bit pedestrian for some readers, but I hope it’s helpful to folks that are new to C++ and Linux.

Before getting started, some background on GCAM.  GCAM was developed by researchers at the Joint Global Change Research Institute (JGCRI) at the Pacific Northwest National Lab (PNNL).  GCAM is an integrated assessment model (IAM), which means it pairs a climate model with a global economic model.  GCAM is unusual among IAMs in that it has a fairly sophisticated representation of various sectors of the global economy, modeled at a regional level, meaning the model is a beast compared to other IAMs (like DICE). I’ll be writing a later post on IAMs more generally.  GCAM is coded in C++ and makes extensive use of XML databases for both input and output.  The code is open source (available here), has a Wiki (here), and a community listserv where researchers can pose questions to their peers.

There are three flavors of the model available: one for Windows users, one for Mac users, and one for Unix users.  The Windows version comes compiled, with a nice user interface for post-processing the results.  The Unix version comes as uncompiled code and requires the installation of some third-party C++ libraries.  For those who’d like to sniff around GCAM, the Windows or Mac versions are a good starting point, but if you’re going to be doing HPC heavy lifting, you’ll need to work with the Unix code.

The GCAM Wiki and the detailed user manual provide excellent documentation on running GCAM and the Model Interface tools, but are a bit limited when it comes to compiling the Unix version of the code.  One helpful page on the Wiki can be found here.  GCAM comes with a version of the Boost C++ library in the standard Unix download (located in Main_User_Workspace/libs).  GCAM also uses the Berkeley DBXML library, which can be downloaded here.  You’ll want to download version 2.5.16 to be sure it runs well with GCAM.

Once you’ve downloaded DBXML, you’ll need to build the library.  As it turns out, this is fairly easy.  Once you’ve copied the DBXML library onto your Unix cluster, simply navigate into the main directory and run the script buildall.sh, which accepts flags that allow you to customize your build.  We’re going to use the -b flag to build the 64-bit version of DBXML:

sh buildall.sh -b 64

Once you’ve built the DBXML library, you are nearly ready to build GCAM, but you must first export the paths to the Boost library and to the DBXML include and lib directories.  It seems that the full paths must be declared.  In other words, they should read something like:

export BOOST_INCLUDE=/home/fs02/pmr82_0001/jrl276/GCAM_Haewon_trans/libs/boost_1_43_0/
export DBXML_INCLUDE=/home/fs02/pmr82_0001/jrl276/dbxml-2.5.16/install/include/
export DBXML_LIB=/home/fs02/pmr82_0001/jrl276/dbxml-2.5.16/install/lib/

One tricky point here is that the full path must be typed out instead of using ‘~’ as a shortcut (I’m not sure why). Once this is done, you are ready to compile GCAM.  Navigate into the ‘Main_User_Workspace’ directory and type the command:


make gcam

Compiling GCAM takes several minutes, but at the end there should be a gcam.exe executable file located in the ‘exe’ folder of the ‘Main_User_Workspace’.  It should be noted that there is a multi-threaded version of GCAM (more info here), which I’ll be compiling in the next few days.  If there are any tricks to that process, I’ll post again with some tips.

That’s it for now.

Useful Linux commands to handle text files and speed up work

Most of us, nice and normal human beings, tend to prefer programs with GUIs over typing commands on a command prompt because the former looks more “real” and is more visual than the latter. However, one thing we don’t know (or, sometimes, don’t want to know) is that learning a few terminal commands can dramatically increase productivity. These commands can save us a lot of time by sparing us from opening and closing programs, navigating through menus and windows, moving the mouse around, as well as moving the hand back and forth from the mouse to the keyboard.

This post will mention and briefly describe some useful “intermediate level” Linux commands (some basic commands are presented in this post by Jon Herman), which can be called from a Linux OS, Cygwin (mostly), or a Mac. Among the numerous tedious tasks these commands can greatly simplify is the particularly interesting chore of handling text files, be they scripts or data files. Commands for other tasks are covered as well. Keep in mind that the symbol * is a wild card (a character that can stand for any string of characters when searching for files), which is really useful when the goal is to repeatedly apply one command to multiple files. For all the commands listed here, the leading “$” represents the prompt and should not be typed.

DELIMITER SEPARATED FILES HANDLING

  • Remove columns 31 to 35 of a comma separated file (cut numbers fields from 1, so this keeps fields 1-30 and 36) and export the output to another file.
    $ cut -d',' -f1-30,36 input.file >> output.file

    (More information on the post by Joe Kasprzyk)

  • Print only columns 2 and 4 (starting from 1) of a comma separated file.
    $ awk -F "," '{print $2,$4}' input.file >> output.file
  • Count number of columns in a file separated either by spaces or commas:
    $ head -1 input.file | sed 's/[^, ]//g' | wc -c
    or:
    $ awk -F "[, ]" 'END{print NF}' input.file
  • Print lines of a comma separated file in which the value in the 2nd column is lower than 100 and the value in the 5th column is higher than 0.3:
    $ awk -F "," '$2<100 && $5>0.3' input.file >> output.file
  • Print lines between 10th and 20th lines (not inclusive) of a file:
    $ awk 'NR>10 && NR<20' input.file >> output.file
  • Add a string to the end of multiple files:
    $ echo "your string" | tee -a *.set
  • Add a string to the end of one file:
    $ echo "your string" >> file

FILE SEARCHING

  • Find all text files in a folder that contain a certain string:
    $ grep -rn './folder' -e your_string
  • Find files recursively (* is a wildcard; quote the pattern so the shell doesn’t expand it first):
    $ find -type f -name 'name_of*.file'

FILES INFO

  • See the contents of a zip/tar file without extracting it. Press q to quit.
    $ less file.tar
  • Count number of lines in a file:
    $ wc -l your.file
  • List all files with a certain extension in a directory:
    $ ls *.ext
  • Print files and folders in tree fashion:
    $ tree
  • Print the size of all subfolders and files in (sub)folders to a certain max depth in the folder hierarchy:
    $ du -h -a --max-depth=2

IMAGE HANDLING

  • Convert svg files to png (you need to have Inkscape installed):
    $ inkscape input.svg -d 300 -e output.png
  • Convert svg files to pdf-latex (you need to have Inkscape installed):
    $ inkscape input.svg --export-pdf output.pdf --export-latex
  • Rotate a picture:
    $ convert Fig6_prim.png -rotate 90 Fig6_prim_rotated.png

MISCELLANEOUS

  • See the history of commands you have typed:
    $ history
  • See a calendar (month and year optional):
    $ cal [month] [year]
  • Combine pdf files into one (you need to have pdftk installed):
    $ pdftk file1.pdf file2.pdf file3.pdf cat output newfile.pdf
    or, to merge all pdf files in a directory:
    $ pdftk *.pdf cat output newfile.pdf

    In order to see how to combine only certain pages of pdf files, as well as how to split all pages into separate pdf files, see this page.

  • See the manual of a command:
    $ man command

Another useful idea is that of piping the output of one command to another command. For example, if you want to print the number of files in a directory, you can pipe the output of the ls command (list all files in a directory) to the wc -l command (count the number of lines in a string). For this, use the “|” character:

$ ls | wc -l

However, you may want instead to check the number of lines in each of the 20 files in a directory at once, which can also be achieved by combining the ls and wc commands with the command xargs. The command would then look like:

$ ls | xargs wc -l

The xargs command breaks the output of ls into individual file names and passes them as arguments to wc -l, which then reports a line count for each file.

Hope all this saves you some time!

Re-evaluating solutions using Python subprocesses

Our series of Python examples continues!  If you’ve missed our previous posts, we have some tutorials in parts one and two, tips on setting up Python and Eclipse, cluster submission guides, and so forth.

Matt has been helping me get up to speed using Python.  He always told me that some processes are a lot easier in Python than in C++, and I suppose I didn’t believe him until recently.  Here is my first shot at a Python program that’s all my own, and I wanted to share it with you here.  The code is below, then some comments follow.  The group has recently begun several GitHub code repositories!  You can link to this code sample at my repository.

import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE

LINE_BUFFERED = 1
        
def main():

    #Define the command that you would like to run through the pipe.  This will typically be your
    #executable set up to work with MOEAframework, and associated arguments.  Specifically here
    #we are working with the LRGV problem.
    cmd = ['./lrgvForMOEAFramework',  '-m', 'std-io', '-c', 'combined', '-b', 'AllDecAll']

    #Verify the command
    print "The command to run is: %s" % cmd

    #Use popen to open a child process.  The standard in, out, and error are sent through a pipe.
    child = Popen(cmd, cwd='//home//joka0958//re-evaluator_2013-04-05//', bufsize=LINE_BUFFERED, stdin=PIPE, stdout=PIPE, stderr=PIPE)

    #The current version of the model spits out some lines to the console when it initializes.
    #When using this python child process, we need to intentionally send and receive all output (i.e. it doesn't
    #automatically happen for us).  Here there are 3 initialization lines to catch:
    print "Reading initializer lines."
    for i in range(0, 3):
        line = child.stdout.readline()
        if line:
            print line
        else:
            raise Exception("Evaluator died!")

    #Now we want to step through an existing Borg output file, which already contains decision variables and objectives.
    #We are going to step through each line, read in the decision variables from the file, and then evaluate those decision
    #variables in our external program.
    myFilename = "AllDecAllExperimentData.txt"
    fp = open(myFilename, 'rb')
    for line in fp:
        if "#" in line:
            #This "if" statement is helpful if you want to preserve the generation separators and header lines that appeared
            #in the original file.
            print line
        else:
            #Read in all the variables on the line (this is both decisions and objectives)
            allVariables = [float(xx) for xx in re.split("[ ,\t]", line.strip())]

            #Only keep what you want
            variables = allVariables[0:8]

            #We want to send the decision variables, separated by a space, and terminated by a newline character
            decvarsAsString = '%f %f %f %f %f %f %f %f\n' % (variables[0], variables[1], variables[2], variables[3], variables[4], variables[5], variables[6], variables[7])

            #We send that string to the child process and catch the result.
            print "Sending to process"
            child.stdin.write(decvarsAsString)
            child.stdin.flush() #you flush, so that the program knows the line was sent

            #Now obtain the program's result
            print "Result:"
            outputLine = child.stdout.readline()
            print outputLine

            #Since this is in a loop, it will operate for every line in the input file.

if __name__ == "__main__":
    main()

Basically, your child process gets created at the beginning of this script. It hangs out, waiting for input. The cool thing about the way input and output are handled here is that you have complete control over what gets sent to the child program and what gets read in from it. Also, you can send multiple solutions to the program and read their output sequentially.
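One detail the script above leaves out is shutting the evaluator down when you’re finished. A minimal sketch, to be added after the file loop in main(), might be:

    #Signal end-of-input so the child can exit cleanly, then wait for it
    fp.close()
    child.stdin.close()
    child.wait()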

Python also gives you lots of great control over input and output formatting in a relatively simple fashion. Notice how we are moving back and forth from strings and floats effortlessly. I also like how the control sequences (for loops, if else statements) are very straightforward and easy to read.

Feel free to adapt this code for your own purposes, and provide comments below!

Using linux “split”

Today I’d like to quickly talk about the linux command “split”.  I like writing about new simple Linux commands as evidenced here and here.

I often write customized C++ programs to manipulate large data files.  There’s obviously a time and place for this, since you get ultimate control over every aspect of how your data looks going in and coming out.  We’ve written about this before, and I think string processing is an important skill no matter the language.  There’s a post about matlab (and another one here), some sample bash scripting, and a post about python among other things.  You should also see Matt’s series on python data analysis, since I’m doing some shameless plugging!

Anyway… little did I know that something very complicated in C++ can be easily done in linux/unix with “split”!

To split a large file into smaller files of, say, 100 lines each, you use: “split -l 100 myLargerFile.txt”  There are also options to change the filenames of the output files, and so forth; see the sketch below.
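For example, GNU split’s -d flag gives the output files numeric suffixes, and a final argument sets a filename prefix (the prefix “chunk_” here is just illustrative):

split -l 100 -d myLargerFile.txt chunk_

This produces files named chunk_00, chunk_01, and so on.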

Read the man page for split, and check out forum posts here and here to get on your way!


Some ideas for your Bash submission scripts

I’ve been playing around with some design options for PBS submission scripts that may help people doing cluster work.  Some things to look for in the source code:

  • You can use a list in bash that contains multiple text entries, and then access those text entries to create strings for your submissions.  Note that you can actually display the text first (see the ‘echo ${PBS}’) before you do anything; that way you aren’t requesting thousands of jobs that have a typo in them!
  • Using “read” allows the bash programmer to interact with the user.  Well, in reality you are usually both the programmer and the user.  But lots of times, I want to write a script and try it out first, before I submit hundreds of hours of time on the cluster.  The flags below can help with that process.
  • I added commands to compile the source code before actually submitting the jobs.  Plus, by using flags and pauses intelligently, you can bail out of the script if there’s a problem with compilation.
#!/bin/bash
NODES=32
WALLHOURS=5

PROBLEMS=("ProblemA" "ProblemB")
NSEEDS=10
SEEDS=$(seq 1 ${NSEEDS}) #note there are multiple ways to declare lists and sequences in bash

NFES=1000000
echo "NFEs is ${NFES}" #echo statements can improve usability of the script, especially if you're modifying it a lot for various trials

ASSUMEPERMISSIONFLAG=No #This is for pausing the submission script later

echo "Compile? Y or N."

read COMPILEFLAG
if [ "$COMPILEFLAG" = "Y" ]; then
    echo "Cleaning.."
    make clean -f MakefileParallel
    echo "Compiling.."
    make -f MakefileParallel
else
        echo "Not compiling."
fi

for PROBINDEX in ${!PROBLEMS[*]}
do
    PROBLEM=${PROBLEMS[$PROBINDEX]} #note the syntax to pull a list member out here
    echo "Problem is ${PROBLEM}"

    for SEED in ${SEEDS}
    do
        NAME=${PROBLEM}_${SEED} #Bash is really nice for manipulating strings like this
        echo "Submitting: ${NAME}"

        #Here is the actual PBS command, with bash variables used in place of different experimental parameters.  Note the use of getopt-style command line parsing to pass different arguments into the myProgram executable.  This implementation is also designed for parallel processing, but it can also be used for serial jobs too.

        PBS="#PBS -l nodes=32\n\
        #PBS -N ${NAME}\n\
        #PBS -l walltime=05:00:00\n\
        #PBS -j oe\n\
        #PBS -o ${NAME}.out\n\
        cd \$PBS_O_WORKDIR\n\
        module load openmpi/intel\n\
        mpirun ./myProgram -b ${PROBLEM} -c combined -f ${NFES} -s ${SEED}"

        #The first echo shows the user what is about to be passed to PBS.  The second echo then pipes it to the command qsub, and actually submits the job.

        echo ${PBS}

        if [ "$ASSUMEPERMISSIONFLAG" = "No" ]; then

            echo "Continue submitting? Y or N."

            read SUBMITFLAG

            #Here, the code is designed to just keep going after the user says Y once.  You can redesign this for your own purposes.  Also note that this code is fairly brittle in that the user MUST say Y, not y or yes.  You can build that functionality into the if statements if you'd like it.

            if [ "$SUBMITFLAG" = "Y" ]; then
                 ASSUMEPERMISSIONFLAG=Yes #this way, the user won't be asked again
                 echo -e ${PBS} | qsub
                 sleep 0.5
                 echo "done."
            fi
        else
            echo -e ${PBS} | qsub
            sleep 0.5
            echo "done."
         fi
    done
done

Using Linux input redirection in a C++ code test

I’m testing a function that does a numerical calculation, namely the Cholesky matrix factorization.  I found a reference to an algorithm that does this, but how do I test my code quickly, to input some data, perform the calculation, and then output some results?

It turns out Linux gives us a simple way to do this!  Using “input redirection”, we can quickly input some data from a text file, and process it as if we were inputting it directly through the keyboard.  First let’s look at some pseudocode.


#include <iostream>
#include <fstream>

using namespace std;

ofstream out; //for debugging

double myFun(double input1, double input2); //insert the function you want to test here

int main()
{
    //Let's say you need to enter an n by n matrix with a text file, and you know n in advance. Then:
    const int n = 52;
    double input[n][n]; //you can use a vector of vectors, or a 2d C-style array if needed

    int i = 0, j = 0;
    for (int k = 0; k < n * n; k++)
    {
        cin >> input[i][j];
        j++;
        if (j == n)
        {
            j = 0;
            i++;
        }
    }

    //call myFun to test.  Then, use cout or the output stream to output your results.
    //to open the output stream, simply write out.open("myOutput.txt") and then use out << to output whatever you need.

    return 0;
}

Now, if you have data in space- or tab-delimited format in a text file, all you have to do is compile your program:

g++ myFile.cpp -o myExecutable

And then run it with input redirection:

./myExecutable < myData.txt

where the < operator feeds the contents of myData.txt into the standard input of your program.  For more on input redirection, read here, and feel free to comment!

Using linux “grep”

Today I’d like to quickly talk about the linux command “grep”.  I thought I wrote about this before, but I apparently forgot.

grep allows you to find an expression in one or more files in a folder on Linux.  I find it useful for programming.  Say, for example, I want to look for the string “nrec” in a set of source code and header files.  Maybe “nrec” is a variable and I forgot where I declared it (if this sounds a little too specific to be merely an example, you’re right. This is what I’m having to do right this second!).  The grep command is:

grep -in "nrec" *.*

What this means is, search for the “nrec” expression in every file in the folder.  There are two useful flags set here as well.  “i” means that the search is case insensitive (that is, NREC and NrEc and nrec are each treated as equal).  “n” means that the program will show me the line number of each occurrence of my desired phrase.  There are other options that I’m not using, including “inverting” a search to find all occurrences of NOT that phrase, suppressing the file name or only showing the file name, etc.

If you were curious, here’s a sample of the output:

iras.h:144: int num_flow_datapoints; //originally: NRec
SimSysClass.cpp:806: flowrecs=sysstat(nrec)

(If you’re curious, the first instance is in a header file, on line 144.  I’m translating this code from one language to another, and originally the variable was called “nrec”. So in the header file I made a note that now my variable is called something else.  In the second instance, I had copied the original code into my file as a placeholder, so now I know that I need to use my new name in its place.  Also, the “i” flag in grep is helpful since fortran is not case-sensitive, and here you can see there were two different case styles for this variable even in our simple example.)

For more info, please consult some casual reference such as this excellent post about linux command line utilities,  a similar blog post about grep, and of course the Linux man page for the command. Also look at 15 grep tips.  As usual, remember that “man [insert command here]” gives you all the low-down on each command you’d like to learn.

Thanks for reading and please comment with additional tips or questions!

Using gdb, and notes from the book “Beginning Linux Programming”

I just started thumbing through Beginning Linux Programming by Matthew and Stones.  It covers in great detail a lot of the issues we often talk about on this blog: how to debug code, how to program using the Bash shell, file input/output, different development environments, and making makefiles.

Using gdb

One tool I haven’t talked about much is the debugger, gdb.  It is a text-based debugging tool, but it gives you some powerful ways to step through a program.  You can set a breakpoint, and then make rules for what variables are displayed at each stop.

Let’s say you have a file, myCode.c, that you compile into an executable, myCode.  Compile using the -g flag, and then start gdb on your code by entering “gdb ./myCode”.  If your code has command line arguments, you need to invoke gdb like this:

gdb --args ./myCode -a myArgument1 -b myArgument2

The important phrase here is “--args”, two dashes and the word args, that appears after gdb.  That lets gdb know that your ./myCode program itself has arguments.

You can also set a breakpoint inside gdb (you need to do that before you actually run the code).  To set one at line 10, simply type “break 10”; this will be breakpoint 1.  To display data automatically at each stop, use “display”; for example, “display array[0]@5” shows 5 values of an array each time the program stops.  You can also attach a list of commands to a breakpoint with “commands”: gdb asks what commands you’d like, then you type them (for example “cont” to continue) and finish with “end”.

After setting up your breakpoints, simply type “run” to run the code.
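Putting those pieces together, a typical session for the hypothetical myCode example above might look like this (gdb commands shown after the (gdb) prompt):

gdb --args ./myCode -a myArgument1 -b myArgument2
(gdb) break 10
(gdb) display array[0]@5
(gdb) run
(gdb) print myVariableName
(gdb) backtrace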

If your program has a segmentation fault, gdb will let you know what line the segmentation fault occurred at, and using “backtrace” you can see what functions called that line.

If you have a segfault and the program is halted, the nice thing is that all the memory is still valid and you can see the values of your variables.  To see the value of a variable, type “print myVariableName”.  It is quite informative.  For example, if a variable prints as “nan”, you know there may be something wrong with that variable that could cause an error somewhere else.

Here’s one example of a possible problem in pseudocode:

levelA = 0;
levelB = 0;
myLevel = 0.5;
myFrac = myLevel / (levelA + levelB);

The fourth line there looks innocuous enough, but it will cause a divide-by-zero error given these levelA and levelB values.  The program may halt on that line (gdb reports an arithmetic exception for integer division by zero, while floating-point division produces inf or nan), and a simple “print levelA” and “print levelB” will help you solve the problem.

Here’s a short link that explains the basics of gdb with more detail.

Other notes

Also interesting are several C preprocessor macros that can tell you the line, file, date, and time at which the code was compiled.  Predictably, these are __LINE__, __FILE__, __DATE__, and __TIME__ (that’s two underscores on each side).

I also like the bash scripting examples that are contained in the book.  They taught me about some Linux utilities like “cut” that are very helpful, and covered elsewhere on this blog.

Any additional tips and tricks are welcome in the comments!

Emacs in Cygwin

If you’re having trouble with certain commands not working in Emacs under Cygwin (e.g., C-x C-c doesn’t exit the program), then try adding this line to Cygwin.bat before the ‘bash --login -i’ line:

set CYGWIN=tty notitle glob

Cygwin.bat is located in the root directory of your Cygwin installation.  That should do it!