Some of this blog’s readers and authors (most notably, Joe Kasprzik) read the title of this post and thought “wait, there already is a post about Valgrind in this blog.” And you are right, so in this blog post I will build on the legacy Joe has left us in his post about Valgrind and get into the details of how to use its basic functionalities to get your code right.
Common mistakes when coding in C++
Suppose we have the following code:
#include <stdio.h>

int main() {

    int *var = new int[5]; // you wouldn't do this if the size was always 5, but this makes the argument clear.
    int n = 5;
    int m;

    if (m > n) {
        printf("Got into if statement.\n");
        for (int i = 0; i < 6; ++i) {
            var[i] = i;
        }
    }

    printf("var[5] equals %d\n", var[n]);
}
Saving the code above in a file called test.cpp, compiling it with g++ to create an executable called "test," and running it with "./test" will return the following output:
bernardoct@DESKTOP-J6145HK ~ $ g++ test.cpp -o test
bernardoct@DESKTOP-J6145HK ~ $ ./test
Got into if statement.
var[5] equals 5
Great, it ran and did not crash (in such simple code, gcc's -Wall flag would have issued a warning saying m was not initialized, but in more complex code such a warning may not be issued). However, it would be great if this code had crashed, because that would make us look into it and figure out that it actually has 3 problems:
- We did not assign a value to variable m (it was declared but not initialized), so how did the code determine that m was greater than n to get into the code inside the if statement?
- The pointer array var was created with length 5, meaning its elements are numbered 0 to 4. If the for-loop runs from 0 to 5 but element 5 does not exist, how did the code fill it in with the value of variable i when i was 5 in the loop? From the printf statement that returned 5, we know var[5] equals 5.
- The pointer array var was not destroyed after the code no longer needed it. This is not necessarily a problem in this case, but if this were a function that is called over and over within a model, there is a chance the RAM would fill up with these seemingly inoffensive pointer arrays and the computer would freeze (or the node, if running on a cluster, would possibly crash and have to be rebooted).
Given that a C++ program will not necessarily crash even in the presence of such errors, one way of making sure your code is clean is to run it through Valgrind. Most people who have used Valgrind on a model with a few hundred or a few thousand lines of code have been discouraged by its possibly long and cryptic-looking output. However, do not let this intimidate you, because the output is actually fairly easy to read once you either learn what to look for or use Valkyrie, a graphical user interface for Valgrind.
Generating and interpreting Valgrind’s output
The first thing that needs to be done for Valgrind to give you meaningful output is to re-compile your code with the -O0 and -g flags: the former prevents the compiler from modifying your code to make it more efficient but unintelligible to Valgrind (or to debuggers), and the latter lets Valgrind (and debuggers) pinpoint the lines of code where issues happen and originate. Therefore, the code should be compiled as shown below:
bernardoct@DESKTOP-J6145HK ~ $ g++ -O0 -g test.cpp -o test
Now it is time to run your code with Valgrind to perform some memory checking. Valgrind takes flags that dictate the type of analysis to be performed. Here we are interested in checking for memory misuse (instead of profiling, checking for thread safety, etc.), so the first flag (not required, since Memcheck is the default tool, but good for keeping things clear for yourself) should be --tool=memcheck. Now that we have specified that we want Valgrind to run a memory check, we should specify that we want it to look in detail for memory leaks and tell us where the errors are happening and originating, which can be done by passing the flags --leak-check=full and --track-origins=yes. This way, the complete command to run Valgrind on our test program is:
bernardoct@DESKTOP-J6145HK ~ $ valgrind --tool=memcheck --leak-check=full --track-origins=yes ./test
Important: Beware that your code will take orders of magnitude longer to run with Valgrind than it would otherwise. This means that you should run something as small as possible but still representative. For example, instead of running your stochastic model with 1,000 realizations and a simulation time of 50 years, consider running 2 realizations simulating 2 years each, so that Valgrind analyzes a year-long simulation as well as the transitions between years and between realizations. Also, if running your code on a cluster, load the Valgrind module with module load valgrind-xyz in your submission script and replace the call to your model in the submission script with the valgrind call above. You can find the exact name of the Valgrind module by running module avail on the terminal. If running Valgrind on code that uses MPI, use mpirun valgrind ./mycode -flags.
When called, Valgrind will instrument our test executable and, based on the collected information, print the following on the screen:
==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test
==385==
==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)
==385==
Got into if statement.
==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
var[5] equals 5
==385==
==385== HEAP SUMMARY:
==385==     in use at exit: 20 bytes in 1 blocks
==385==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==385==
==385== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== LEAK SUMMARY:
==385==    definitely lost: 20 bytes in 1 blocks
==385==    indirectly lost: 0 bytes in 0 blocks
==385==      possibly lost: 0 bytes in 0 blocks
==385==    still reachable: 0 bytes in 0 blocks
==385==         suppressed: 0 bytes in 0 blocks
==385==
==385== For counts of detected and suppressed errors, rerun with: -v
==385== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Seeing Valgrind’s output being 5 times as long as the test code itself can be somewhat disheartening, but the information contained in the output is really useful. The first block of the output is the header — it will always be printed so that you know the version of Valgrind you have been using, the call for your own code it used, and so on. In our example, the header is:
==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test
After that, Valgrind reports the errors it found during the execution of your code. Each error is reported as a description in plain English, followed by where it happens in your code. Let’s look at the first error found by Valgrind:
==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)
This tells us that there is a conditional statement (here, an if statement) on line 9 of test.cpp whose logical test depends on at least one uninitialized variable. As pointed out by Valgrind, line 9 of test.cpp has our problematic if statement, which compares initialized variable n to uninitialized variable m, which will hold whatever was last stored at that memory address. The second half of the block, printed because we passed --track-origins=yes, adds that the uninitialized value lives on the stack of main, which starts on line 3 of test.cpp.
The second error block is the following:
==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
This means that your code is writing to a memory location that it did not allocate for its use. This block says that the illegal write, so to speak, happened on line 12 of test.cpp, right past the end of a 20-byte block (5 integers of 4 bytes each) allocated on line 5 of test.cpp using the new[] operator. These lines correspond to var[i] = i; and to int *var = new int[5];. With this, we learn that either var was created too short on line 5 of test.cpp or the for loop that assigns values to var goes one or more steps too far.
Similarly, the next block tells us that the printf statement used to print the value of var[5] on the screen has read past the memory that was allocated to var on line 5 of test.cpp, as shown below:
==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
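Valgrind catches these out-of-bounds accesses after the fact, but the code can also be made to fail loudly on its own. The sketch below is not part of the original test.cpp: it assumes the raw new[] array is swapped for a std::vector, whose at() member checks the index and throws std::out_of_range instead of silently touching memory past the end. The function names checked_fill and demo_checked_fill are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Tries to write `count` values into a vector holding `length` elements,
// using the bounds-checked at(). Returns true if every write was in range
// and false if at() rejected an index by throwing std::out_of_range.
bool checked_fill(std::size_t length, std::size_t count) {
    std::vector<int> var(length);
    try {
        for (std::size_t i = 0; i < count; ++i) {
            var.at(i) = static_cast<int>(i);  // throws when i >= var.size()
        }
    } catch (const std::out_of_range &) {
        return false;  // the out-of-bounds write never happened
    }
    return true;
}

// Mirrors the two situations discussed in the post.
void demo_checked_fill() {
    assert(checked_fill(5, 5));   // indices 0 to 4: all valid
    assert(!checked_fill(5, 6));  // index 5 does not exist and is rejected
}
```

The bounds check in at() costs a little on every access, so a common compromise is to use it while debugging and keep operator[] in performance-critical loops once Valgrind reports no invalid reads or writes.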
The last thing Valgrind will report is the information about memory leaks, which are accounted for when the program is done running. The output about memory leaks for our example is:
==409== HEAP SUMMARY:
==409==     in use at exit: 20 bytes in 1 blocks
==409==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==409==
==409== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==409==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==409==    by 0x400697: main (test.cpp:5)
==409==
==409== LEAK SUMMARY:
==409==    definitely lost: 20 bytes in 1 blocks
==409==    indirectly lost: 0 bytes in 0 blocks
==409==      possibly lost: 0 bytes in 0 blocks
==409==    still reachable: 0 bytes in 0 blocks
==409==         suppressed: 0 bytes in 0 blocks
The important points to take away from this last block are that:
- there were 20 bytes of memory leaks, meaning that if this were a function in your code, every call to it would leave 20 bytes of garbage sitting in the RAM. This may not sound like a big deal, but imagine that your code leaves 1 MB of garbage in the RAM each of the 100,000 times a function is called. There go 100 GB of RAM, along with everything else you were doing on your computer at the time, because the computer will likely freeze and need a hard reset.
- the memory you allocated and did not free was allocated on line 5 of test.cpp, when you used the new[] operator to allocate the integer pointer array.
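To make the "function called 100,000 times" scenario concrete, here is a minimal sketch of one way to avoid the leak altogether. It assumes the array only needs to live for the duration of the call, and it replaces new[] with a std::vector, which frees its own memory when it goes out of scope. The names sum_of_indices and demo_repeated_calls are hypothetical and do not come from test.cpp.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical function of the kind a model might call many times.
// `values` owns its memory and releases it automatically on return,
// so no delete[] is needed and nothing is left behind between calls.
int sum_of_indices(int length) {
    assert(length >= 0);  // a negative length makes no sense here
    std::vector<int> values(length);
    std::iota(values.begin(), values.end(), 0);  // 0, 1, ..., length - 1
    return std::accumulate(values.begin(), values.end(), 0);
}

// Calls the function as many times as in the example above.
void demo_repeated_calls() {
    long total = 0;
    for (int call = 0; call < 100000; ++call) {
        total += sum_of_indices(5);  // 0 + 1 + 2 + 3 + 4 = 10 per call
    }
    assert(total == 1000000L);
}
```

If the raw pointer has to stay, the equivalent fix is a delete[] on every path that leaves the function; the vector simply removes the chance of forgetting one.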
It is important to notice here that if we increase the amount of memory allocated by the new[] operator on line 5 to hold 6 integers instead of 5, the last two errors (invalid write and invalid read) would disappear. This means that if you run your code with Valgrind and see hundreds of errors, chances are that modifying a few lines of code will get rid of most of them.
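Putting the three fixes together, one possible corrected version of the test code is sketched below. It is written as a function rather than as main so that it can be called repeatedly, and the name corrected_test, the choice to pass m and n as arguments, and the decision to shorten the loop (rather than lengthen the array) are all assumptions made for this sketch, since the post does not say what the program was meant to compute.

```cpp
#include <cassert>
#include <cstdio>

// A possible corrected version of the logic in test.cpp.
// Returns the last element of the array so the result can be checked.
int corrected_test(int m, int n) {
    assert(n > 0);  // reading var[n - 1] below needs at least one element

    int *var = new int[n]();  // the trailing () zero-initializes every element

    if (m > n) {  // m now always has a value, supplied by the caller
        std::printf("Got into if statement.\n");
        for (int i = 0; i < n; ++i) {  // same bound as the allocation
            var[i] = i;
        }
    }

    const int last = var[n - 1];  // the last valid index is n - 1, not n
    std::printf("var[%d] equals %d\n", n - 1, last);

    delete[] var;  // release the array so repeated calls do not leak
    return last;
}
```

Called as corrected_test(10, 5), this prints "Got into if statement." followed by "var[4] equals 4". Recompiling with -O0 -g and rerunning Valgrind with the same flags is the way to check that the errors discussed above are actually gone.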
Valkyrie — a graphical user interface for Valgrind
Another way of going through Valgrind’s output is by using Valkyrie (now installed in the login node of Reed’s cluster, The Cube). If you are analyzing your code from your own computer with a Linux terminal (does not work with Cygwin, but you can install a native Ubuntu terminal on Windows 10 by following instructions posted here) and do not have Valkyrie installed yet, you can install it by running the following on your terminal:
bernardoct@DESKTOP-J6145HK ~ $ sudo apt-get install valkyrie
Valkyrie works by reading an xml file exported by Valgrind containing the information about the errors it found. To export this file, you need to pass the flags --xml=yes and --xml-file=valgrind_output.xml (or whatever name you want to give the file) to Valgrind, which would make the call to Valgrind become:
bernardoct@DESKTOP-J6145HK ~ $ valgrind --tool=memcheck --leak-check=full --track-origins=yes --xml=yes --xml-file=valgrind_output.xml ./test
Now, you should have a file called “valgrind_output.xml” in the directory you are calling Valgrind from. To open it with Valkyrie, first open Valkyrie by typing valkyrie on your terminal. If on Windows 10, you need to have Xming installed and running, which can be done by following the instructions at the end of this post. If on a cluster, besides having Xming open you also have to have ssh’ed into the cluster with the -X flag (e.g. by running ssh -X username@my.cluster.here) from either Cygwin or a native Linux terminal. After opening Valkyrie, click on the green folder button and select the xml file, as in the screenshot below.
After opening the xml file generated by Valgrind, Valkyrie should look like the screenshot below:
Now you can begin from a collapsed list of errors and unfold each error to see its details. Keep in mind that Valkyrie is not your only GUI option for Valgrind, as IDEs like JetBrains’ CLion and Qt Creator come integrated with Valgrind. Now go check your code!
PS: Thanks to folks on Reddit for the comments which helped improve this post.