Debugging code by submitting batch jobs to a supercomputer is inefficient. The cycle goes something like this:
- Submit job and wait in queue
- Check for errors/change code
- (repeat endlessly until your code works)
Debugging in Real-Time:
There’s a better way to debug that doesn’t require waiting for the queue every time you want to check your code. On SLURM, you can debug in real-time like so:
- Request a debugging node and wait in queue
- Check for errors/change code continuously until code is fixed or node has timed out
Example (using Janus supercomputer at University of Colorado Boulder):
- Log into terminal (PuTTY, Cygwin, etc.)
- Navigate to the directory containing the file to be debugged using the ‘cd’ command
- Load SLURM
- $module load slurm
- Enter the ‘salloc’ command and choose your debugging QOS (quality of service). For Janus, this is called janus-debug. Enter the time of use (1 hour is the max time allowed for janus-debug). Choose one node and the desired tasks per node (12 is the max on Janus).
- $salloc --qos=janus-debug --time=01:00:00 -N 1 --ntasks-per-node=12
- Wait in line for permission to use the node (you will have a high priority with a debugging QOS, so it shouldn’t take long)
- Once you are granted permission, the node is yours! Now you can debug to your heart’s content (or until you run out of time).
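Once the allocation is granted, it can be easy to lose track of whether a command is running inside it. Here is a minimal sketch of a guard, assuming that salloc sets the SLURM_JOB_ID environment variable in the shell it starts and that srun launches commands on the allocated node (both are standard SLURM behavior, but your cluster's setup may differ). The function names in_allocation and run_on_node are hypothetical helpers, not part of SLURM.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when SLURM_JOB_ID is set,
# which is assumed to mean we are inside a salloc session.
in_allocation() {
  [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical helper: run a command on the allocated node with srun,
# or just report what would have been run when no allocation is active.
run_on_node() {
  if in_allocation && command -v srun >/dev/null 2>&1; then
    srun "$@"
  else
    echo "not inside a SLURM allocation; would run: srun $*"
  fi
}

# Example: trace the script from this post on the debug node.
run_on_node bash -x mybashscript.bash
```

When you are done before the time limit, typing exit in the salloc shell releases the node for other users.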
I’m usually debugging shell scripts on Unix. If you want advice on that topic check out this link. I prefer the ‘-x’ option (shown below), which prints each command as it is executed, but there are many options available.
Debugging shell scripts in Unix using the ‘-x’ option:
$bash -x mybashscript.bash
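To see what the trace looks like without touching a real job, here is a small self-contained sketch. The two-line script it writes to a temporary file is made up purely for illustration; substitute your own script in practice.

```shell
#!/usr/bin/env bash

# Write a tiny, made-up script to a temporary file so the trace can be shown end to end.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
name="world"
echo "hello $name"
EOF

# Run it with tracing on. Trace lines go to stderr and start with "+";
# the script's normal output still goes to stdout. Both are captured here.
trace_output="$(bash -x "$demo_script" 2>&1)"
printf '%s\n' "$trace_output"

# Clean up the temporary file.
rm -f "$demo_script"
```

Each traced line shows the command after variable expansion, so a wrong or empty variable is visible right away. To trace only part of a long script, you can put set -x before the suspect section and set +x after it instead of running the whole script with -x.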
Hopefully this was helpful! Please feel free to edit/comment/improve as you see fit.