Job dependency in PBS submission

In one of our current projects, we’re running a bunch of batch jobs simultaneously. When a single job enters the queue, everything is fine. But when 15-20 of our jobs enter the queue at the same time, the filesystem slows to a crawl, and the jobs wind up exceeding the walltime.

So what we’d like to do is ensure that only a certain number of these jobs can run simultaneously. The brute-force solution is to sit there and stare at the job queue, submitting new jobs by hand as old ones finish, but that’s not very appealing. Instead we can use PBS job dependencies. (This is very helpful, and I can’t believe I’m just learning about it now.)

Job dependency works like this:

qsub -N job_name -W depend=afterok:other_job_id my_job_script.sh

The -W argument to qsub passes additional job attributes; depend is one of them. There are a number of different dependency options you can give; see this discussion for a more complete list. Here I’m using afterok, which means “only run after this other job has completed without errors”, that is, with an exit status of zero. In addition to solving our main problem, this ensures that we won’t keep running the entire set of submissions if there’s an error in one of the jobs. Note that you can append multiple job ids to the afterok option, separated by colons, like this:

qsub -N job_name -W depend=afterok:job1:job2:job3 my_job_script.sh
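
Typing job ids by hand gets old quickly. qsub prints the id of the job it just submitted on standard output, so the ids can be captured in shell variables and joined into the dependency string. Here is a minimal sketch of that idea. The helper name afterok_list and the script names in the comments are my own placeholders, not part of PBS.

```shell
# Join job ids into an "afterok:id1:id2:..." dependency string.
# afterok_list is a hypothetical helper, not a PBS command.
afterok_list() {
    # With no ids there is nothing to depend on, so print nothing.
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    # "$*" joins the arguments with the first character of IFS.
    local IFS=:
    printf 'afterok:%s\n' "$*"
}

# Intended use on a PBS system (script names are placeholders):
#   first=$(qsub step_one.sh)
#   second=$(qsub step_two.sh)
#   qsub -N job_name -W depend="$(afterok_list "$first" "$second")" my_job_script.sh
```

Keeping the string-building separate from the submission makes it easy to check the dependency string with echo before anything is actually sent to the queue.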

Here is an example of submitting 200 separate jobs in a loop, where I only want to run a maximum of 7 at a time. The jobs are divided into “globs” (my non-technical term) where each glob will not start running until the previous glob has completed without errors.
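
The glob idea can be sketched roughly as follows. This is an illustration under assumptions rather than the exact script from our project: the function name submit_in_globs and the job_1, job_2, ... job names are made up for this example, and it assumes qsub prints the new job id on standard output.

```shell
# Submit TOTAL copies of SCRIPT in globs of GLOB_SIZE jobs.
# Every job in a glob waits (afterok) on all jobs of the previous glob.
# submit_in_globs and the job_<n> names are illustrative assumptions.
submit_in_globs() {
    local total="$1" glob_size="$2" script="$3"
    local prev_ids="" cur_ids="" i=1 id

    while [ "$i" -le "$total" ]; do
        if [ -n "$prev_ids" ]; then
            # Hold this job until the whole previous glob has finished cleanly.
            id=$(qsub -N "job_$i" -W "depend=afterok$prev_ids" "$script") || return 1
        else
            # First glob: nothing to wait for.
            id=$(qsub -N "job_$i" "$script") || return 1
        fi
        # Collect ids as ":id1:id2:..." so they can be appended to "afterok".
        cur_ids="$cur_ids:$id"

        # Glob is full: it becomes the dependency for the next glob.
        if [ $((i % glob_size)) -eq 0 ]; then
            prev_ids="$cur_ids"
            cur_ids=""
        fi
        i=$((i + 1))
    done
}

# On a PBS system:
#   submit_in_globs 200 7 my_job_script.sh
```

With 200 jobs and a glob size of 7, this gives 28 full globs plus a final glob of 4. Because each glob depends on every job in the one before it, no more than 7 of these jobs should be running at once, and a failure in any job keeps the later globs from starting.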

Caveat emptor: I am still in the process of testing this. But regardless of this particular implementation, it’s a cool idea that should be useful in the future.
