This is part 2 in a series about using Python for automating cluster tasks. Part 1 is here. (For more on Python, check out: another tutorial part one and two, tips on setting up Python and Eclipse, and some specific examples including a cluster submission guide and a script that re-evaluates solutions using a different simulation model)
Edit: Added another example in the “copy” section below!
Welcome back! Let’s continue our discussion of basic Python commands. Let’s start by modifying our last code sample to facilitate random seed analysis. Now, instead of writing one file we will write 50 new files. This isn’t exactly how we’ll do the final product, but it will be helpful to introduce loops and some other string processing.
Loops and String Processing
import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE

def main():
    #the input filename and filestream are handled outside of the loop,
    #but the output filename and filestream have to occur inside the loop now.
    inFilename = "borg.c"
    inStream = open(inFilename, 'r') #text mode, so each line can be compared with a string
    for mySeed in range(1,51):
        outFilename = "borgNew.seed" + str(mySeed) + ".c"
        outStream = open(outFilename, 'w')
        print("Working on seed %s" % str(mySeed))
        for line in inStream:
            if "int mySeed" in line:
                #replace the seed declaration with this loop's seed
                newString = " int mySeed = " + str(mySeed) + ";\n"
                outStream.write(newString)
            else:
                outStream.write(line)
        outStream.close()
        inStream.seek(0) #reset the input file so you can read it again
    inStream.close() #close the input file once every seed has been written

if __name__ == "__main__":
    main()
Above, the range function allows us to iterate through a range of numbers. Note that the last member of the range is never included, so range(1,51) goes from 1 to 50. Also, now we have to make sure that each output file is closed before we move on to the next seed, and that the input stream gets ‘reset’ with seek(0) every time; otherwise the second pass through the loop would find nothing left to read. There may be a more efficient way to write this code, but sometimes it’s better to be explicit so you can be sure that the code is doing exactly what you want it to. Also, if you had to rewrite multiple lines, it would be helpful to structure your loops the way I have them here, adding one more test inside the inner loop for each line you want to replace.
By the way, after you run the sample program, you may want to do something like “rm borgNew*” to remove all the files you just created.
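As one possible alternative to rewinding the input stream, the template can be read into memory once and the list of lines reused for every seed. This is only a sketch, written for Python 3; the function name write_seed_files and its out_dir parameter are made up for illustration and are not part of the original script.

```python
import os


def write_seed_files(template_path, out_dir, seeds):
    """Write one copy of the template per seed, replacing the seed line."""
    # Read the template once; the list of lines is reused for every seed,
    # so there is no stream to rewind.
    with open(template_path, "r") as in_stream:
        lines = in_stream.readlines()

    written = []
    for seed in seeds:
        out_path = os.path.join(out_dir, "borgNew.seed%d.c" % seed)
        # The with-block closes the output file even if an error occurs.
        with open(out_path, "w") as out_stream:
            for line in lines:
                if "int mySeed" in line:
                    out_stream.write("    int mySeed = %d;\n" % seed)
                else:
                    out_stream.write(line)
        written.append(out_path)
    return written
```

Calling write_seed_files("borg.c", ".", range(1, 51)) should produce the same 50 files as the loop above. The tradeoff is that the whole template sits in memory, which is fine for a source file of ordinary size.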
Calling System Commands
Ok great, so now you can use Python to modify text files. What if you have to do something else in your workflow, such as copy files? Move them? Rename them? Call programs? Basically, you want your script to be able to do anything that you would do on the command line; in other words, you want it to call system commands. For some background, check out this post on Stack Overflow, talking about the four or five different ways to call external commands in Python.
The code sample is below. Note that there are two different ways to use the call command. Passing a list such as ["ls", "-l"] runs the program directly, with each list item as one argument. Passing a single string with “shell=True” hands the whole line to the shell, which gives you access to certain shell features such as the wildcard operator. But be careful with this! Accessing the shell directly can lead to problems as discussed here.
import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE
from subprocess import call

def main():
    print("Listing files...")
    call(["ls", "-l"])
    print("Showing the current working directory...")
    call(["pwd"])
    print("Now making ten copies of borg.c")
    for i in range(1,11):
        print("Working on file %s" % str(i))
        newFilename = "borgCopy." + str(i) + ".c"
        call(["cp", "borg.c", newFilename])
    print("All done copying!")
    print("Here's proof we did it. Listing the directory...")
    call(["ls", "-l"])
    print("What a mess. Let's clean up:")
    call("rm borgCopy*", shell=True)
    #the above is needed if you want to use a wildcard, see:
    #http://stackoverflow.com/questions/11025784/calling-rm-from-subprocess-using-wildcards-does-not-remove-the-files
    print("All done removing!")

if __name__ == "__main__":
    main()
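If you would rather not hand a command string to the shell at all, one option is to expand the wildcard inside Python with the glob module and delete the matches with os.remove. The sketch below assumes Python 3; remove_matching is a hypothetical helper name, not something from the sample above.

```python
import glob
import os


def remove_matching(pattern):
    """Delete every file matching a wildcard pattern; return the names removed."""
    removed = []
    # glob.glob expands the wildcard inside Python, so no shell is involved.
    for path in glob.glob(pattern):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

Calling remove_matching("borgCopy*") should have the same effect as the “rm borgCopy*” line in the sample, and it returns the list of deleted files so you can print or log it.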
You may also remember that there are multiple ways to call the system. You can use subprocess to, in a sense, open a shell and call your favorite Linux commands… or you can use Python’s os library to do some of the tasks directly. Here’s an example of how to create some directories and then copy the files into the directory. Thanks to Amy for helping to write some of this code:
import os
import shutil
import subprocess

print(os.getcwd())
src = "myFile.txt" #or whatever your file is called
for i in range(51, 53): #remember this will only do it for 51 and 52
    newFoldername = 'seed'+str(i)
    if not os.path.exists(newFoldername):
        os.mkdir(newFoldername)
    print("Listing files...")
    subprocess.call(["ls", "-l"])
    shutil.copy(src, newFoldername)
    #now, we should change to the new directory to see if the
    #copy worked correctly
    os.chdir(newFoldername)
    subprocess.call(["ls", "-l"])
    #make sure to change back
    os.chdir("..")
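The same folder setup can also be done without changing directories, by building paths with os.path.join and checking the result with os.listdir. This is a minimal sketch assuming Python 3; the function name copy_into_seed_folders and the base_dir parameter are illustrative assumptions.

```python
import os
import shutil


def copy_into_seed_folders(src, seeds, base_dir="."):
    """Create seed<N> folders under base_dir and copy src into each one."""
    copied = []
    for seed in seeds:
        folder = os.path.join(base_dir, "seed" + str(seed))
        if not os.path.exists(folder):
            os.mkdir(folder)
        # shutil.copy returns the path of the newly created file.
        dest = shutil.copy(src, folder)
        # os.listdir shows the folder contents without leaving the
        # current working directory.
        print(folder, os.listdir(folder))
        copied.append(dest)
    return copied
```

For example, copy_into_seed_folders("myFile.txt", range(51, 53)) would create seed51 and seed52 next to the script. Because the working directory never changes, there is no os.chdir("..") to forget.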
These two pieces of the puzzle, rewriting text files and calling system commands, should open up a lot of possibilities for you as you’re setting up your jobs. Let us know if you want more by posting in the comments below!