PBS job chaining

We often want to run a large job (or several smaller jobs) on an HPC system but cannot make one large submission because of limits on the number of cores we can use at a time. One way around this is to submit many smaller jobs at once and let the scheduler queue those that do not fit. However, the queue management system will start as many of our smaller jobs as the available cores allow, so we often end up using more cores than we can or should. This may keep other users off the system for a long time or push our usage above what is allowed.

To avoid this, one solution is to submit one smaller job at a time, waiting until the currently running job(s) have finished before submitting the next. This solution requires constant queue monitoring and is only as efficient as our monitoring of the queue.

A better solution is job chaining (qsub -W depend). All jobs are submitted up front, but each dependent job is held in the queue and only becomes eligible to run once the job it depends on has finished. Below is an example of how to use job chaining for three jobs with one submission script:

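# qsub prints the ID of the job it just submitted; capture it.
FIRST=$(qsub job1.sh)
echo "$FIRST"
# job2.sh is held until job $FIRST has terminated (with any status).
SECOND=$(qsub -W depend=afterany:$FIRST job2.sh)
echo "$SECOND"
# job3.sh is held until job $SECOND has terminated (with any status).
THIRD=$(qsub -W depend=afterany:$SECOND job3.sh)
echo "$THIRD"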

In the example above, the variables FIRST, SECOND and THIRD receive the job IDs printed by qsub, and the line qsub -W depend=afterany:$FIRST job2.sh can be read as: run job2.sh after the job with ID $FIRST has finished, with or without errors (afterany). It is also possible to make a job run only if the previous job ends with an ok status, or only if it ends with an error. The dependency types are:

Type          Rule
after         Execute current job after listed jobs have begun.
afterany      Execute current job after job has terminated with any status.
afterok       Execute current job after job has terminated without error.
afternotok    Execute current job after job has terminated with an error.
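
As an illustration of the afterok and afternotok rules, the sketch below submits a main job plus two followers: one that runs only if the main job succeeds and one that runs only if it fails. The function name submit_with_fallback and the script names are hypothetical and not part of PBS; the only assumptions about qsub are the ones already used above, namely that it prints the new job ID and accepts -W depend=TYPE:ID.

```shell
#!/bin/bash
# Sketch: submit a main job and two followers that depend on how it ends.
# submit_with_fallback and its arguments are hypothetical names.
submit_with_fallback() {
    local main_script=$1
    local ok_script=$2
    local fail_script=$3
    local main_id

    # qsub prints the new job's ID on stdout; keep it for the dependencies.
    main_id=$(qsub "$main_script") || return 1

    # Becomes eligible to run only if the main job terminates without error.
    qsub -W depend=afterok:"$main_id" "$ok_script" || return 1

    # Becomes eligible to run only if the main job terminates with an error.
    qsub -W depend=afternotok:"$main_id" "$fail_script" || return 1
}
```

It could be called as submit_with_fallback main.sh postprocess.sh cleanup.sh. What happens to the follower whose condition is never met (for example, whether it is deleted or stays held) depends on the scheduler and its configuration, so it is worth checking your site's documentation.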

With this, you should be able to avoid locking everyone else out of the HPC system without having to spend a week checking on the status of your runs every few hours.
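
For chains longer than three jobs, the pattern from the example can be wrapped in a loop instead of writing one variable per job. The following is a minimal sketch under the same assumptions about qsub as above; submit_chain is a hypothetical helper name, and the dependency type is passed as its first argument.

```shell
#!/bin/bash
# Sketch: submit scripts as a chain, each job depending on the one before it.
# Usage: submit_chain DEPENDENCY_TYPE script1.sh script2.sh ...
# submit_chain is a hypothetical helper, not a PBS command.
submit_chain() {
    local dep_type=$1
    shift
    local prev_id=""
    local job_id script

    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # The first job in the chain has no dependency.
            job_id=$(qsub "$script") || return 1
        else
            # Every later job is held until the previous one meets dep_type.
            job_id=$(qsub -W depend="${dep_type}:${prev_id}" "$script") || return 1
        fi
        echo "$script -> $job_id"
        prev_id=$job_id
    done
}
```

Calling submit_chain afterany job1.sh job2.sh job3.sh submits the same three-job chain as the script above, and the loop stops at the first qsub call that fails so that no job is chained to a missing ID.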

