PBS is a handy cluster batch system, usually paired with a scheduler like MOAB. It’s useful in that you can supply options on the command line or in a batch script. Arguments placed on the command line when calling the qsub command take precedence over those in the script, so a general script can be built once and then tested or varied by changing the options on the command line. PSU has a pretty good guide to get you started using the PBS system, and it can be read here. However, there are some other options which are exceptionally useful for the moderate user. In particular, passing the current environment, setting up email notification, and redirecting output are handy things to be able to use and modify. An example script and header are presented below:
——————————————————————————————————————————–
#!/bin/csh
#Using the C-Shell (the comment sits on its own line so the interpreter line stays clean)
#PBS -N SimpleScript #Give the job this name.
#PBS -M youremailhere@psu.edu #A single user, send notification emails there.
#PBS -m a #Send notification of aborts <a>.
#PBS -V #Pass the current environment variables to the job.
#PBS -l nodes=1:ppn=1 #Request a single node, single core for this job.
#PBS -l walltime=96:00:00 #Request a maximum wall time of 96 hours [HH:MM:SS format].
#PBS -o output/$PBS_JOBNAME.out #Redirect STDOUT to ./output/$PBS_JOBNAME.out
#PBS -e error/$PBS_JOBNAME.err #Redirect STDERR to ./error/$PBS_JOBNAME.err
env #Echo the environment (variables)
cd $PBS_O_WORKDIR #PBS starts your job in your home directory, cd to the submit/work directory
echo -n CWD:; /bin/pwd #Echo the current working directory path
echo "$PBS_JOBNAME is live…" #Print to STDOUT (really, the file declared above) that the job is live…
sleep 30 #Sleep for 30 seconds, then exit.
——————————————————————————————————————————–
In this case, I’ve configured the job to be named “SimpleScript,” to email the user “youremailhere@psu.edu” if the job aborts, to use the same environment as the one the qsub command was issued from, to request 1 node and 1 processor on that node, to allow a maximum run time of 96 hours, and to redirect the output and error messages to separate directories under my working directory. Clearly this is a very simple example: it prints some basic info, pauses, and exits. If you were going to run a process or other program, you’d put your commands in place of the sleep command. Still, it provides a copy-and-paste example of commonly used options that you can include in your own batch scripts. In case you want to modify those options, there’s a brief review of the most commonly changed ones below. For a more complete list, head to the NCCS’ listing of common PBS options:
http://www.nccs.gov/computing-resources/phoenix/running-jobs/common-pbs-options/
Commonly Used Options:
These options can either be given on the command line, as in:
qsub -N SimpleScript -j oe <batchScriptFile>
Or included in the batch script file using the #PBS directive prefix, as in:
#PBS -N SimpleScript
Recall that you can mix and match options on the command line and in the batch script, but be aware that the command line options override those in the batch file.
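As a rough illustration of mixing the two, the sketch below assembles a qsub command whose -N and -l walltime options would override the matching #PBS lines in the script. The file name simple_script.pbs and the helper build_qsub_cmd are hypothetical, and the command is only printed (not submitted) so the snippet runs on a machine without PBS installed.

```shell
#!/bin/sh
# build_qsub_cmd is a hypothetical helper: it only assembles the command string.
# Usage: build_qsub_cmd <job name> <walltime HH:MM:SS> <batch script>
build_qsub_cmd() {
    printf 'qsub -N %s -l walltime=%s %s\n' "$1" "$2" "$3"
}

# Options given here take precedence over "#PBS -N" and "#PBS -l walltime=..."
# inside the script; every other #PBS line in the file still applies.
build_qsub_cmd QuickTest 00:10:00 simple_script.pbs
```

On a real cluster you would run the printed command directly, which keeps one general script and varies only the options you are testing.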
[-N] Name: Declares the name of the job. It may be up to 15 characters in length, and must consist of printable, non white space characters with the first character alphabetic.
[-o] Output File Path: Defines the path to, and name of, the file to which STDOUT will be redirected.
[-e] Error File Path: Defines the path to, and name of, the file to which STDERR will be redirected.
[-j] Join STD* streams: Declares if the standard error stream of the job will be merged with the standard output stream of the job.
An option argument value of oe directs that the two streams will be merged, intermixed, as standard output. The path and name of the file can then be specified with the -o option.
An option argument value of eo directs that the two streams will be merged, intermixed, as standard error. The path and name of the file can then be specified with the -e option.
If the join argument is n or the option is not specified, the two streams will be two separate files.
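The effect of joining the streams can be mimicked with ordinary shell redirection. The sketch below is only an analogy for what -j oe does to a job's output, not a description of how PBS implements it; the emit function is a hypothetical stand-in for a job that writes to both streams.

```shell
#!/bin/sh
# emit is a hypothetical stand-in for a job that writes to both streams.
emit() {
    echo "result line"          # goes to STDOUT
    echo "warning line" >&2     # goes to STDERR
}

# Like -j oe: STDERR is folded into STDOUT, so one capture holds both lines.
merged=$(emit 2>&1)

# Like the default (or -j n): the streams stay apart; only STDOUT is captured here.
stdout_only=$(emit 2>/dev/null)

printf 'merged:\n%s\n' "$merged"
printf 'stdout only:\n%s\n' "$stdout_only"
```

With -j oe in a batch script, a single -o path is then enough to collect everything the job prints.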
[-V] Pass Environment: This option declares that all environment variables in the qsub command’s environment are to be exported to the batch job.
[-m] Mail Options: Defines the set of conditions under which the execution server will send a mail message about the job. The mail_options argument is a string which consists of either the single character “n”, or one or more of the characters “a”, “b”, and “e”.
If the character “n” is specified, no mail will be sent.
For the letters “a”, “b”, and “e”:
- a mail is sent when the job is aborted by the batch system.
- b mail is sent when the job begins execution.
- e mail is sent when the job terminates.
If the -m option is not specified, mail will be sent if the job is aborted.
[-M] User List: A list of users to send email notifications to. The user_list argument is of the form:
user[@host][,user[@host],…]
If unset, the list defaults to the submitting user at the qsub host, i.e. the job owner.
[-l, ‘ell’] Resource List: Defines resources required by the job and establishes a limit on the amount of each resource that can be consumed. The list can be of the form:
resource_name[=[value]][,resource_name[=[value]],…]
Common arguments for this option are “walltime” and “nodes”. The walltime argument sets the wall clock limit for the job, and is of the format HH:MM:SS. Check with your sysadmin to see if there’s a maximum limit on this time. The nodes argument sets how many nodes to request, and its optional ppn suffix sets how many processors (cores) to request on each node, as in nodes=1:ppn=1 from the example script.
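If you think about run times in seconds or minutes, a small conversion helper saves some mental arithmetic when filling in walltime. This is a minimal sketch; walltime_from_seconds is a hypothetical helper written for this post and is not part of PBS.

```shell
#!/bin/sh
# walltime_from_seconds is a hypothetical helper, not part of PBS.
# It prints a number of seconds in the HH:MM:SS form used by -l walltime=...
walltime_from_seconds() {
    s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

walltime_from_seconds 345600   # four days  -> 96:00:00
walltime_from_seconds 5400     # 90 minutes -> 01:30:00
```

The result can be dropped straight into a directive such as #PBS -l walltime=01:30:00, or passed on the command line to override the value in the script.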
[-a] Date Time: Declares the time after which the job is eligible for execution. The date_time argument is in the form:
[[[[CC]YY]MM]DD]hhmm[.SS]
Where CC is the first two digits of the year (the century), YY is the last two digits of the year, MM is the two digits for the month, DD is the day of the month, hh is the hour, mm is the minute, and the optional SS is the seconds.
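Typing that format by hand is error-prone, so one option is to let the date utility build it. The sketch below assumes your date command accepts the -d option (GNU date does); pbs_start_time is a hypothetical helper name, and the timestamp is just an example.

```shell
#!/bin/sh
# pbs_start_time is a hypothetical helper. It assumes `date -d` is available
# (true for GNU date) and prints a timestamp in the CCYYMMDDhhmm form.
pbs_start_time() {
    date -d "$1" +%Y%m%d%H%M
}

pbs_start_time '2024-03-05 14:30'   # prints 202403051430
```

The printed value would then be used as qsub -a 202403051430 <batchScriptFile>, or in a #PBS -a line in the script.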
Environment Variables Available to Job:
You can use these variables in your scripts as though they already exist in your environment; PBS sets them up as soon as your job starts running.
PBS_O_WORKDIR – the absolute path of the current working directory of the qsub command. You must ‘cd’ to this directory if you want to work in the folder you submitted the job from.
PBS_JOBNAME – the job name supplied by the user.
PBS_O_HOST – the name of the host upon which the qsub command is running.
PBS_SERVER – the hostname of the pbs_server which qsub submits the job to.
PBS_O_QUEUE – the name of the original queue to which the job was submitted.
PBS_ARRAYID – each member of a job array is assigned a unique identifier (see -t)
PBS_ENVIRONMENT – set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job, see -I option.
PBS_JOBID – the job identifier assigned to the job by the batch system.
PBS_NODEFILE – the name of the file containing the list of nodes assigned to the job (for parallel and cluster systems). This file is particularly useful if you want to log in to remote machines for parallel debugging.
PBS_QUEUE – the name of the queue from which the job is executed.
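To show how these variables might be used inside a job, here is a minimal sketch that reports how many distinct nodes the job was given. It assumes the node file holds one host name per line, possibly repeated once per assigned processor; count_nodes is a hypothetical helper, and the fallback sample file exists only so the snippet also runs outside of PBS.

```shell
#!/bin/sh
# count_nodes is a hypothetical helper: it counts distinct hosts in a node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Outside PBS, fall back to a small sample file so the sketch still runs.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode01\nnode02\n' > "$PBS_NODEFILE"
fi

# PBS_JOBNAME and PBS_JOBID are set by PBS inside a job; defaults are used otherwise.
echo "Job ${PBS_JOBNAME:-unnamed} (${PBS_JOBID:-no id}) has $(count_nodes "$PBS_NODEFILE") node(s)"
```

The example script in this post uses csh, where the same variables are read as $PBS_NODEFILE, $PBS_JOBNAME, and so on; the sketch uses sh only to keep the fallback logic short.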