Python for automating cluster tasks: Part 1, Getting started

This is another post in our ongoing discussion of Python.  (To see more, check out the tutorial, parts one and two; tips on setting up Python and Eclipse; and some specific examples, including a cluster submission guide and a script that re-evaluates solutions using a different simulation model.)

If you’re just getting into using MOEAs and simulation models together, you may have spent some time getting familiar with how to get the two technologies to “talk” to one another, and you may have solved a problem or two.  But now you may realize that there’s more to the process than running a MOEA once.  The following are some reasons why you may need to set up a “batch” submission of multiple MOEA runs:

1. Random Seed Analysis: A MOEA is affected by the random number generator used to generate initial random solutions, generate random or synthetic input data, and perform variation operations to develop new solutions. Typically, a random seed analysis is used to test how robust the algorithm is to different sets of random numbers.

2. Diagnostic Analysis of Algorithm Performance: MOEAs have parameters (population size, run duration, crossover probability, etc.).  We’ve discussed methods to evaluate how well the algorithm does across a wide array of different parameter values (see, for example, this post).

3. Running Multiple Problem Formulations: Perhaps you want to compare a 2 objective problem with a 10 objective problem.  Or, you use different simulation models (a screening level model, a metamodel, all the way to a physics-based process model).  All of these changes would require running the MOEA more than once.

4. Learning More about your Problem: Even if you’re not trying 1-3, you may just need to run the MOEA for a problem that you’re still developing.  You make an educated guess about the best set of objectives, decisions, and constraints, but by analyzing the results you could see that this needs to change.  We use the term a posteriori to describe the process, because you don’t specify preferences between objectives, etc. until after you generate alternatives.  So it’s an interactive process that continues after you start running the algorithm.

The manual approach

For this illustration, here are some assumptions on what you’re up to.  First, you are using the Borg MOEA or MOEAFramework, with a simulation model (either the source code, or you’ve written a wrapper to call it).  You’ve set up one trial of the simulation-optimization process, maybe using some of the resources on this blog!  You are using cluster computing and you have a terminal connection to the cluster all set up.  And, Python is installed on the cluster.

A typical workflow might look something like this.  (Thanks to my student Amy Piscopo for being the “guinea pig” here.)  Some of these steps are kind of complicated, and you may or may not need all of them.

1. Change the simulation or optimization source code.  Typically we set up programming so you don’t need to actually change the code and re-compile if you’re making a simple change.  For example, if there’s a really important parameter in your simulation code, you can pass it in as a command line argument.  But, sometimes you can’t avoid this, so step one is that you might have to go into a text editor and modify code.  (Example: on line 2268, change “int seed = 1” to “int seed = 2”.)

2. Compile the simulation or optimization source code. Any changes in source code must be compiled before you run the program.   (Example: Write a command that compiles the files directly, such as in the Borg README, or use a makefile and the command “make”)

3. Modify a submission script. You thought your experiment was going to take 8 hours, and it really takes 24.  Oops.  Typically, when you create a “submission script” for the cluster, you need to tell it a few things about your job: what queue you want, how long to run, how many processors to request, and then the specific commands you need to run.  Another thing to consider with multiple runs is that the command that you are specifying may actually change. (Example: Changing the submission script to say “borg.exe -s 2” instead of “borg.exe -s 1”)

4. Make multiple run folders, and copy all files into each folder. Ok, so you’ve made the necessary modifications for, say, “Seed 1” and “Seed 2”.  Or, “Problem 1” and “Problem 2”.  Now, you need to gather up all the files for your particular run and put them into their own folder.  I usually recommend that you use a different folder for different runs, even if there are only a few files in the folder.  It just helps keep things organized. (Example: Each seed has its own folder)

5. Submit the jobs! Whew.  Too much work.  After clicking and typing and hitting enter and clicking and dragging and dropping, you’re finally ready to hit go.

If you’re exhausted by that list, I don’t blame you; so am I!  This is a lot of manual steps, and the worst part is that the process hardly changes at all between seed 1 and seed 50.  Plus, if you do all this by hand, you are more likely to make a mistake (“Did I actually change seed 2 to seed 3?  Oh gosh I don’t know”).  The other annoying thing is that you may need to make yet another change in the process later on.  Instead of changing the seed, now you have to change some parameter!

Well, there’s another way…

Instead, use Python!

That’s the point of this whole post.  Let’s get started.

Learning Python syntax and starting your first script

How do you quickly get started learning this new language?  First, log into the cluster and type “python”.  You may need to load a module first; consult the documentation for your own cluster.  You’ll see a prompt that looks like this: “>>>”  This is the Python interpreter.  You can type any command you want, and it will get executed right there, kind of like Matlab.  So now you’re running Python!  Saying it’s too hard to get started is no excuse. 🙂
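For instance, here is a tiny sketch of the kind of thing you could type at the prompt, one line at a time.  The “borg.exe -s” command string is just the submission script example from earlier in this post, used here as plain text; nothing is actually run.

```python
# Type each line at the >>> prompt; the interpreter runs it immediately.
seed = 2
command = "borg.exe -s " + str(seed)  # build a command string from a number
print(command)
```

Variables stay defined until you quit the interpreter, so you can build things up a line at a time and inspect them as you go.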

Then, open the official Python tutorial, or a third party tutorial on the internet.  I noticed the third party one has a Python interpreter right in the browser.  Anyway, any time you see a new command, look it up in the tutorial!  After a while you’ll be a Python expert.

One more comment in the “getting started” section.  In a previous post the beginning of the script looked like this.  The import commands load packages that have commands that you need.  Any time you learn a new command, make sure you don’t need to also include a new import command at the beginning of the script.

import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE

And then, the main function is defined a little differently than you may be used to in Fortran or C.  Something like this:

def main():

    #a bunch of code goes here

if __name__ == "__main__":
    main()

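The last two lines (the “if __name__” check) are a convention in Python: they call main() only when the file is run as a script, not when it is imported by another file.  So just set up your script in a similar way and you can use it much like a program with a main function in C or another language.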
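As a rough sketch of how this layout plays with command line arguments, the script below reads an optional seed from the command line.  The helper name parse_seed and the default value of 1 are my own choices for illustration, not part of any earlier post.

```python
import sys

def parse_seed(argv, default=1):
    """Return the seed given as the first command line argument, if any."""
    # argv[0] is the script name; argv[1], when present, is the first argument.
    if len(argv) > 1 and argv[1].isdigit():
        return int(argv[1])
    return default

def main():
    seed = parse_seed(sys.argv)
    print("Using seed %d" % seed)

if __name__ == "__main__":
    main()
```

If this were saved as, say, seedDemo.py (a made-up name), “python seedDemo.py 7” would report seed 7 and “python seedDemo.py” would fall back to the default.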

Make sure to pay attention to what the tutorial says about code indenting.  Spaces/tabs/indents are very important in Python and they have to be done correctly in order for the code to work.
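To see why indentation matters so much, here is a small illustrative sketch (the function names and sample lines are invented for this example).  The two functions differ only in how far the “else” is indented, and they do different things: in the second one the else attaches to the for loop, so it runs once after the loop finishes instead of once per line.

```python
def label_lines(lines):
    """The else lines up with the if: it runs for every non-matching line."""
    labels = []
    for line in lines:
        if "seed" in line:
            labels.append("changed")
        else:
            labels.append("copied")
    return labels

def label_lines_misindented(lines):
    """The else lines up with the for: it runs once, after the loop ends."""
    labels = []
    for line in lines:
        if "seed" in line:
            labels.append("changed")
    else:
        labels.append("copied")
    return labels

lines = ["int a;", "int seed = 1;", "int b;"]
print(label_lines(lines))              # ['copied', 'changed', 'copied']
print(label_lines_misindented(lines))  # ['changed', 'copied']
```

Both versions run without any error message, which is exactly why this kind of mistake is easy to miss.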

Ok, now that you’re a verified Python expert, let’s talk about how to do some of the basic tasks that you need to know:

Modifying text files (or, changing one little number in 3000 lines of code)

Here’s your first Python program. It takes a file called borg.c, which contains, somewhere inside it, the line “int mySeed = 1”. It then changes the value to a number you specify in your Python program (in this case, 2). It does this by reading in every line of borg.c and writing it to a new file, but when it gets to the mySeed line, it rewrites it. Note! Sometimes the spacing comes out weird on the blog. Be sure to follow correct Python spacing convention when writing your own code.

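import re
import os
import sys
import time
from subprocess import Popen
from subprocess import PIPE

def main():
    inFilename = "borg.c"
    outFilename = "borgNew.c"
    seed = 2  # the new seed value to write into the file

    # open the input in text mode so each line is a string we can search
    inStream = open(inFilename, 'r')
    outStream = open(outFilename, 'w')
    for line in inStream:
        if "int mySeed" in line:
            # rewrite the seed line with the new value
            newString = " int mySeed = " + str(seed) + ";\n"
            outStream.write(newString)
        else:
            # copy every other line unchanged
            outStream.write(line)
    inStream.close()
    outStream.close()

if __name__ == "__main__":
    main()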

We’ve hit the ground running!  The above code sample shows how to read and write files, how to write a loop, and how to modify lines of text.  Remember also that ‘#’ marks the start of a comment.  The indentation tells Python whether you’re inside a loop, an if statement, or what have you, and it also makes the code easy to read!

If you save this code in a file (say, myPython.py), all you have to do to run the program is type “python myPython.py” at the command line.  No news is good news: if the program runs without printing an error, it worked!
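As a variation, the re module can do the same replacement while leaving the rest of the line (including its original indentation) alone.  This is only a sketch: it assumes the seed line looks like “int mySeed = <some digits>”, and the names SEED_PATTERN and replace_seed are made up for this example.

```python
import re

# Matches "int mySeed = <digits>" and captures everything before the number.
SEED_PATTERN = re.compile(r"(int\s+mySeed\s*=\s*)\d+")

def replace_seed(line, seed):
    """Return the line with the mySeed value swapped; other lines pass through."""
    return SEED_PATTERN.sub(lambda match: match.group(1) + str(seed), line)

print(replace_seed("    int mySeed = 1;\n", 2))
```

Inside the loop over the input file, you would then write replace_seed(line, seed) for every line, with no if/else needed.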

Hopefully this has given you a taste for what you can do easily, with scripting.  Yes, you could’ve opened a text editor and changed one line pretty easily.  But could you do it easily 1000 times?  Not without some effort.  One easy modification here is that you could assign the ‘seed’ variable in a loop, and change the file multiple times, or create multiple files.
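Here is a minimal sketch of that loop idea, assuming you want one modified copy of borg.c per seed.  The output naming pattern (borg_seed1.c, borg_seed2.c, ...) and the choice of 50 seeds are placeholders; adapt them to your own experiment.

```python
import os

def write_seeded_copy(in_filename, out_filename, seed):
    """Copy in_filename to out_filename, rewriting the 'int mySeed' line."""
    with open(in_filename, "r") as in_stream, open(out_filename, "w") as out_stream:
        for line in in_stream:
            if "int mySeed" in line:
                out_stream.write(" int mySeed = " + str(seed) + ";\n")
            else:
                out_stream.write(line)

def main():
    if not os.path.exists("borg.c"):
        print("borg.c not found in the current folder")
        return
    # Hypothetical naming scheme: one output file per seed.
    for seed in range(1, 51):
        write_seeded_copy("borg.c", "borg_seed%d.c" % seed, seed)

if __name__ == "__main__":
    main()
```

The “with” statement closes both files automatically at the end of the block, which is handy when you are opening fifty of them in a row.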

Next time… We’ll talk about how to call system commands within Python.  So, in addition to changing text, we’ll be able to copy files, delete files, even submit jobs to the cluster.  All within our script! Feel free to adapt any code samples here for your own purposes, and provide comments below!

3 thoughts on “Python for automating cluster tasks: Part 1, Getting started”

  1. Another way to do this is with sed:

    sed -e “/int mySeed/c\\\\tint mySeed = 2;” borg.c > borgNew.c

    (edit: need more backslashes because of the shell)
