Often we want to run a large job (or several smaller jobs) on an HPC system, but we cannot make a single large submission because of a limit on the maximum number of cores we can use at a time. One way around this is to submit smaller jobs, some of which will wait in the queue. However, this often leads to using more cores than we can or should at a time, because the queue management system packs as many of our smaller jobs as possible into the available cores, which may lock other users out of the system for a long time or push our usage above the allowed limit.
To avoid this, one solution is to submit one smaller job at a time, waiting until the currently running job(s) finish before submitting the next. This requires constant queue monitoring and is only as efficient as our monitoring is diligent.
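For illustration, a naive version of this manual approach might look like the sketch below. It assumes a PBS-style system where `qstat <jobid>` exits with a non-zero status once the job has left the queue; this holds on most PBS/Torque setups, but behavior can differ when job history is enabled.

```bash
#!/bin/bash
# Naive manual chaining: poll the queue until job1 disappears,
# then submit job2. Assumes qstat exits non-zero once the job is
# gone, which is typical for PBS/Torque but not guaranteed.
JOB=$(qsub job1.sh)
while qstat "$JOB" &> /dev/null; do
    sleep 300   # check every five minutes
done
qsub job2.sh
```

The obvious drawback is that this script (or our attention) has to stay alive for the entire run, which is exactly what job chaining removes.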
A better solution is job chaining (`qsub -W depend`), which automatically starts a new job once the one it depends on has finished. Below is an example of how to chain three jobs with a single submission script:
```bash
#!/bin/bash
FIRST=$(qsub job1.sh)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.sh)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.sh)
echo $THIRD
```
In the example above, the variables FIRST, SECOND, and THIRD receive the job IDs, and the line `qsub -W depend=afterany:$FIRST job2.sh` can be read as: submit job2.sh after the job with ID $FIRST has finished, with or without errors (`afterany`). It is also possible to submit a job only if the previous job ends successfully, or only if it ends with an error, using the dependency types below (see the sketch after the table).
| Dependency type | Meaning |
|---|---|
| `after` | Execute current job after listed jobs have begun. |
| `afterany` | Execute current job after listed job has terminated with any status. |
| `afterok` | Execute current job after listed job has terminated without error. |
| `afternotok` | Execute current job after listed job has terminated with an error. |
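For example, a hypothetical workflow (the script names here are placeholders, not part of the original example) could use `afterok` and `afternotok` to branch on the outcome of the main job:

```bash
#!/bin/bash
# Submit the main job and capture its ID.
MAIN=$(qsub main_job.sh)
# postprocess.sh runs only if main_job.sh terminates without error.
qsub -W depend=afterok:$MAIN postprocess.sh
# cleanup.sh runs only if main_job.sh terminates with an error.
qsub -W depend=afternotok:$MAIN cleanup.sh
```

Note that on most PBS implementations, a dependent job whose condition can never be satisfied (e.g., an `afterok` job whose parent failed) is typically removed from the queue rather than left waiting.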
With this, you should be able to avoid locking everyone else out of the HPC system without having to spend a week checking the status of your runs every few hours.