Dave recently wrote the following instructions on how to debug MPI in an email, and I thought I’d post them here as a private post on the blog.
In case this isn’t already known, here are the instructions I came up with for running GDB and Valgrind on MPI programs:
Debugging MPI with GDB
1) Run an interactive PBS job:
qsub -I -l walltime=16:00:00 -l nodes=1:ppn=4
The interactive job will start you in your home folder. Change (cd) to your working directory.
2) Load the OpenMPI module with GNU GCC support:
module load openmpi/gnu
3) Compile your code with the -ggdb flag to include GDB debugging info in the executable.
4) Create the GDB script, gdbscript.txt, to run when GDB is launched.
This is needed because the program does not start running until the
GDB ‘run’ command is issued, and the processes on remote nodes must be
started automatically. The script also enables logging to gdb.txt.
set logging on
run
5) Run the MPI program with GDB:
mpirun gdb -x gdbscript.txt ./mpiprog.exe
6) When the program exits or an error is detected, you will be left in
GDB. You can now use any GDB commands, or quit by typing ‘quit’.
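Steps 3 and 4 could be scripted roughly as follows. This is only a sketch: it assumes the GDB script holds both ‘set logging on’ and ‘run’, as step 4 describes, and that the source file is called mpiprog.c (a hypothetical name). The -O0 flag is an addition of mine, not part of Dave's instructions; it just keeps the optimizer from reordering code while you step through it.

```shell
# Step 4: write the GDB command file. Logging is enabled before 'run'
# so that output produced while the program runs is captured in gdb.txt.
cat > gdbscript.txt <<'EOF'
set logging on
run
EOF

# Step 3: compile with GDB debugging info.
# mpiprog.c is a hypothetical source file name; -O0 is an optional addition.
if command -v mpicc >/dev/null 2>&1 && [ -f mpiprog.c ]; then
    mpicc -ggdb -O0 -o mpiprog.exe mpiprog.c
else
    echo "mpicc or mpiprog.c not found; load the module and check the path" >&2
fi
```

After this, step 5 can be run unchanged.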
Memory Checking MPI Programs
First, follow steps 1-3 above.
4) When the interactive PBS job starts, run the MPI program with Valgrind:
mpirun valgrind --tool=memcheck --log-file=valgrind_%p.txt ./mpiprog.exe
5) Look at the valgrind_NNNN.txt files that were created, one for each process
(NNNN is the process ID), to determine if any memory errors or leaks occurred.
Valgrind often reports uninitialized values inside the Open MPI library itself;
those reports can be ignored.
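With several processes it can be tedious to open every log by hand. A small helper along these lines might do a first pass. It is a sketch, not part of the original instructions: summarize_leaks is a hypothetical function name, and it assumes the logs follow the valgrind_<pid>.txt naming from step 4 and contain Memcheck's usual "definitely lost:" summary line when a leak is found.

```shell
# Print the "definitely lost" line from each per-process Valgrind log.
# Assumes logs are named valgrind_<pid>.txt, as produced by
# --log-file=valgrind_%p.txt. summarize_leaks is a hypothetical helper name.
summarize_leaks() {
    dir="${1:-.}"
    for log in "$dir"/valgrind_*.txt; do
        [ -e "$log" ] || continue   # pattern matched no files
        # -H prefixes each match with the file name, i.e. the process it came from.
        grep -H "definitely lost:" "$log" || echo "$log: no definite leaks reported"
    done
}

# Summarize the logs in the current directory.
summarize_leaks .
```

A log flagged here still needs to be read in full, since the summary line says nothing about where the leak occurred or about other errors such as invalid reads.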