Python for automating cluster tasks: Part 2, More advanced commands

This is part 2 in a series about using Python for automating cluster tasks.  Part 1 is here. (For more on Python, check out: another tutorial part one and two, tips on setting up Python and Eclipse, and some specific examples including a cluster submission guide and a script that re-evaluates solutions using a different simulation model)

Edit: Added another example in the “copy” section below!

Welcome back!  Let’s continue our discussion of basic Python commands.  Let’s start by modifying our last code sample to facilitate random seed analysis.  Now, instead of writing one file we will write 50 new files.  This isn’t exactly how we’ll do the final product, but it will be helpful to introduce loops and some other string processing.

Loops and String Processing

import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE

def main():
    #the input filename and filestream are handled outside of the loop.
    #but the output filename and filestream have to occur inside the loop now.
    inFilename = "borg.c"
    inStream = open(inFilename, 'rb')

    for mySeed in range(1,51):
        outFilename = "borgNew.seed" + str(mySeed) + ".c"
        outStream = open(outFilename, 'w')

        print "Working on seed %s" % str(mySeed)

        for line in inStream:
            if "int mySeed" in line:
                newString = " int mySeed = " + str(mySeed) + ";\n"
                outStream.write(newString)
            else:
                outStream.write(line)
        outStream.close()
        inStream.seek(0) #reset the input file so you can read it again

if __name__ == "__main__":
    main()

Above, the range function allows us to iterate through a range of numbers. Note that the last member of the range is never included, so range(1,51) goes from 1 to 50. Also, now we have to be concerned with making sure our files are closed properly, and making sure that the input stream gets ‘reset’ every time. There may be a more efficient way to do this code, but sometimes it’s better to be more explicit to be sure that the code is doing exactly what you want it to. Also, if you had to rewrite multiple lines, it would be helpful to structure your loops the way I have them here.

By the way, after you run the sample program, you may want to do something like “rm BorgNew*” to remove all the files you just created.

Calling System Commands

Ok great, so now you can use Python to modify text files. What if you have to do something else in your workflow, such as copy files? Move them? Rename them? Call programs? Basically, you want your script to be able to do anything that you would do on the command line, or call system commands. For some background, check out this post on Stack Overflow, talking about the four or five different ways to call external commands in Python.

The code sample is below. Note that there’s two different ways to use the call command. Using “shell=True” allows you to have access to certain features of the shell such as the wildcard operator. But be careful with this! Accessing the shell directly can lead to problems as discussed here.

import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE
from subprocess import call

def main():

    print "Listing files..."
    call(["ls", "-l"])

    print "Showing the current working directory..."
    call(["pwd"])

    print "Now making ten copies of borg.c"
    for i in range(1,11):
        print "Working on file %s" % str(i)
        newFilename = "borgCopy." + str(i) + ".c"
        call(["cp", "borg.c", newFilename])
    print "All done copying!"

    print "Here's proof we did it.  Listing the directory..."
    call(["ls", "-l"])

    print "What a mess.  Let's clean up:"
    call("rm borgCopy*", shell=True)
    #the above is needed if you want to use a wildcard, see:
    #http://stackoverflow.com/questions/11025784/calling-rm-from-subprocess-using-wildcards-does-not-remove-the-files

    print "All done removing!"

if __name__ == "__main__":
    main()

You may also remember that there are multiple ways to call the system. You can use subprocess to, in a sense, open a shell and call your favorite Linux commands… or you can use Python’s os library to do some of the tasks directly. Here’s an example of how to create some directories and then copy the files into the directory. Thanks to Amy for helping to write some of this code:

import os
import shutil
import subprocess

print os.getcwd()
global src
src = "myFile.txt" #or whatever your file is called

for i in range(51, 53); #remember this will only do it for 51 and 52
   newFoldername = 'seed'+str(i)
   if not os.path.exists(newFoldername):
      os.mkdir(newFoldername)
   print "Listing files..."
   subprocess.call(["ls", "-l"])
   shutil.copy(src, newFoldername)
   #now, we should change to the new directory to see if the
   #copy worked correctly
   os.chdir(newFoldername)
   subprocess.call(["ls", "-l"])
   #make sure to change back
   os.chdir("..")

Conclusion

These two pieces of the puzzle should open up a lot of possibilities to you, as you’re setting up your jobs. Let us know if you want more by posting in the comments below!

Advertisements

One thought on “Python for automating cluster tasks: Part 2, More advanced commands

  1. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s