How to schedule massively parallel jobs on clusters – some basic ways

Massively (or embarrassingly) parallel are processes that are either completely separate or can easily be made to be. This can be cases where tasks don’t need to pass information from one to another (they don’t share memory) and can be executed independently of another on whatever resources are available, for example, large Monte Carlo runs, each representing different sets of model parameters.

There isn’t any guidance on how to do this on the blog, besides an older post on how to do it using PBS, but most of our current resources use SLURM. So I am going to show two ways: a) using SLURM job arrays; and b) using the GNU parallel module. Both methods allow for tasks to be distributed across multiple cores and across multiple nodes. In terms of how it affects your workflow, the main difference between the two is that GNU parallel allows you to automatically resume/rerun a task that has failed, whereas using SLURM job arrays you have to resubmit the failed tasks manually.

Your first step using either method is to configure the function representing each task to be able to receive as arguments a task id. For example, if I would like to run my model over 100 parameter combinations, I would have to create my model function as function_that_executes_model(sample=i, [other_arguments]), where the sample number i would correspond to one of my parameter combinations and the respective task to be submitted.

For python, this function needs to be contained within a .py script which will be executing this function when called. Your .py script could look like this, using argparse to parse the function arguments but there are alternatives:

import argparse
import ...

other_arg1= 1
other_arg2= 'model'

def function_that_executes_model(sample=i, other_arg1, other_arg2):
    #do stuff pertaining to sample i

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='This function executes the model with a sample number')
    parser.add_argument('i', type=int,
                        help='sample number')
    args = parser.parse_args()

To submit this script (say you saved it as using SLURM job arrays:

#SBATCH --partition=compute   # change to your own cluster partition
#SBATCH --cpus-per-task       # number of cores for each task
#SBATCH -t 0:45:00            # max wallclock time
#SBATCH --array=1-100         # array of tasks to execute

module load python
srun python3 $SLURM_ARRAY_TASK_ID

This will submit 100 1-core jobs to the cluster’s scheduler, and in your queue they will be listed as JOB_ID-TASK_ID.

Alternatively, you can use your cluster’s GNU parallel module to submit this like so:

#SBATCH --partition=compute
#SBATCH --ntasks=100
#SBATCH --time=00:45:00

module load parallel
module load python

# This specifies the options used to run srun. The "-N1 -n1" options are
# used to allocates a single core to each task.
srun="srun --export=all --exclusive -N1 -n1"

# This specifies the options used to run GNU parallel:
#   -j is the number of tasks run simultaneously.

parallel="parallel -j $SLURM_NTASKS"

$parallel "$srun python3" ::: {1..100}

This will instead submit 1 100-core job where each core executes one task. GNU parallel also allows for several additional options that I find useful, like the use of a log to track task execution (--joblog runtask.log) and --resume which will identify the last unfinished task and resume from there the next time you submit this script.

2 thoughts on “How to schedule massively parallel jobs on clusters – some basic ways

  1. Pingback: Easy batch parallelization of code in any language using mpi4py – Water Programming: A Collaborative Research Blog

  2. Pingback: Easy batch parallelization of code in any language using mpi4py – Hydrogen Water

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s