Making Valgrind Easy

Some of this blog’s readers and authors (most notably, Joe Kasprzik) read the title of this post and though “wait, there already is a post about Valgrind in this blog.” And you are right, so in this blog post I will build on the legacy Joe has left us on his post about Valgrind and get into the details of how to use its basic functionalities to get your code right.

Common mistakes when coding in C++

Suppose we have the following code:

#include <stdio.h>

int main() {
    int *var = new int[5]; // you wouldn't do this if the size was always 5, but this makes the argument clear.
    int n = 5;
    int m;

    if (m > n) {
        printf("Got into if statement.\n");
        for (int i = 0; i < 6; ++i) {
            var[i] = i;
        }
    }

    printf("var[5] equals %d\n", var[n]);
}

Saving the code above in a file called test.cpp, compiling it with g++ to create an executable called "test," and running it with "./test" will return the following output:

bernardoct@DESKTOP-J6145HK ~
$ g++ test.cpp -o test

bernardoct@DESKTOP-J6145HK ~
$ ./test
Got into if statement.
var[5] equals 5

Great, it ran and did not crash (in such a simple code gcc's flag -Wall would have issued a warning saying m was not initialized, but in more complex code such warning may not be issued). However, it would be great if this code had crashed because this would make us look into it and figure out it actually has 3 problems:

  1. We did not assign a value to variable m (it was created but not initialized), so how did the code determine that m was greater than n to get into the code inside the if statement?
  2. The pointer array var was created as having length 5, meaning its elements are numbered 0 to 4. If the for-loop runs from 0 to 5 but element 5 does not exist, how did the code fill it in with the value of variable i when i was 5 in the loop? From the printf statement that returned 5 we know vars[5] equals 5.
  3. The pointer array var was not destroyed after the code did not need it any longer. This is not necessarily a problem in this case, but if this was a function that is supposed to be called over and over within a model there is a change the RAM would be filled with these seemingly inoffensive pointer arrays and the computer would freeze (or the node, if running on a cluster, would possibly crash and have to be rebooted).

Given C++ will not crash even in the presence of such errors, one way of making sure your code is clean is by running it through Valgrind. However, most people who has used Valgrind on a model that has a few hundreds or thousands of lines of code has gotten discouraged by its possibly long and cryptic-looking output. However, do not let this intimidate you because the output is actually fairly easy to read once you either learn what to look for or use Valkyrie, a graphical user interface for Valgrind.

Generating and interpreting Valgrind’s output

The first think that needs to be done for Valgrind to give you a meaningful output is to re-compile your code with the -O0 and -g flags, the former to prevent the compiler from modifying your code to make it more efficient but unintelligible to Valgrind (or to debuggers), and the latter for Valgrind (and debuggers) to be able to pinpoint the line of code where issues happen and are originated. Therefore, the code should be compiled as shown below:

bernardoct@DESKTOP-J6145HK ~
$ g++ -O0 -g test.cpp -o test

Now it is time to run your code with Valgrind to perform some memory checking. Valgrind itself will take flags that will dictate the type of analysis to be performed. Here we are interested in checking memory misuse (instead profiling, checking for thread safety, etc.), so the first flag (not required, but good to keep things for yourself) should be --tool=memcheck. Now that we specified that we want Valgrind to run a memory check, we should specify that we want it to look in detail for memory leaks and tell us where the erros are happening and originating, which can done by passing flags --leak-check=full and --track-origins-yes. This way, the complete function call to run Valgrind on our test program is:

bernardoct@DESKTOP-J6145HK ~
$ valgrind --tool=memcheck --leak-check=full --track-origins=yes ./test

Important: Beware that your code will take orders of magnitude longer to run with Valgrind than it would otherwise. This means that you should run something as small as possible but still representative — e.g. instead of running your stochastic model with 1,000 realizations and a simulation time of 50 years, consider running 2 realizations simulating 2 years, so that Valgrind analyzes the year-long simulation and the transition between realizations and years. Also, if running your code on a cluster, load the valgrind module with module load valgrind-xyz on your submission script and replace the call to your model on the submission script by the valgrind call above you can find the exact name of the Valgrind module by running module avail on the terminal. If running valgrind with a code that used MPI, use mpirun valgrind ./mycode -flags.

When called, valgrind will instrument our test.cpp and based on the collected information will print the following on the screen:

==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test
==385==
==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)
==385==
Got into if statement.
==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
var[5] equals 5
==385==
==385== HEAP SUMMARY:
==385==     in use at exit: 20 bytes in 1 blocks
==385==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==385==
==385== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)
==385==
==385== LEAK SUMMARY:
==385==    definitely lost: 20 bytes in 1 blocks
==385==    indirectly lost: 0 bytes in 0 blocks
==385==      possibly lost: 0 bytes in 0 blocks
==385==    still reachable: 0 bytes in 0 blocks
==385==         suppressed: 0 bytes in 0 blocks
==385==
==385== For counts of detected and suppressed errors, rerun with: -v
==385== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)

Seeing Valgrind’s output being 5 times as long as the test code itself can be somewhat disheartening, but the information contained in the output is really useful. The first block of the output is the header it will always be printed so that you know the version of Valgrind you have been using, the call for your own code it used, and so on. In our example, the header is:

==385== Memcheck, a memory error detector
==385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==385== Command: ./test

After that, Valgrind report the errors it found during the execution of your code. Errors are always reported as a description of the error in good old English, followed by where it happens in your code. Let’s look at the first error found by Valgrind:

==385== Conditional jump or move depends on uninitialised value(s)
==385==    at 0x4006A9: main (test.cpp:9)
==385==  Uninitialised value was created by a stack allocation
==385==    at 0x400686: main (test.cpp:3)

This tells us that there is an if statement (conditional statement) on line 9 of test.cpp in which at least one of the sides of the logical test has at least one uninitialized variable. As pointed out by Valgrind, line 9 of test.cpp has our problematic if statement which compares initialized variable n to uninitialized variable m, which will have whatever was put last in that memory address by the computer.

The second error block is the following:

==385== Invalid write of size 4
==385==    at 0x4006D9: main (test.cpp:12)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)

This means that your code is writing something in a location of memory that it did not allocated for its use. This block says that the illegal write, so to speak, happened in line 12 of test.cpp through a variable created in line 5 of test.cpp using the new[] operator. These lines correspond to var[i] = i; and to int *var = new int[5];. With this, we learned that either var was created too short on line 5 of test.cpp or that the for loop that assigns values to var goes one or more steps too far.

Similarly, the next block tells us that our printf statement used to print the value of var[5] on the screen has read past the amount of memory that was allocated to var in its declaration on line 5 of test.cpp, as shown below:

==385== Invalid read of size 4
==385==    at 0x4006F5: main (test.cpp:16)
==385==  Address 0x5ab4c94 is 0 bytes after a block of size 20 alloc'd
==385==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==385==    by 0x400697: main (test.cpp:5)

The last thing Valgrind will report is the information about memory leaks, which are accounted for when the program is done running. The output about memory leaks for our example is:

==409== HEAP SUMMARY:
==409==     in use at exit: 20 bytes in 1 blocks
==409==   total heap usage: 3 allocs, 2 frees, 73,236 bytes allocated
==409==
==409== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==409==    at 0x4C2E8BB: operator new[](unsigned long) (vg_replace_malloc.c:423)
==409==    by 0x400697: main (test.cpp:5)
==409==
==409== LEAK SUMMARY:
==409==    definitely lost: 20 bytes in 1 blocks
==409==    indirectly lost: 0 bytes in 0 blocks
==409==      possibly lost: 0 bytes in 0 blocks
==409==    still reachable: 0 bytes in 0 blocks
==409==         suppressed: 0 bytes in 0 blocks

The important points to take away from this last block are that:

  1. there were 20 bytes of memory leaks, meaning that if this were a function in your code every time it was run it would leave 20 bytes of garbage sitting in the RAM. This may not sound like a big deal but imagine if your code leaves 1 MB of garbage in the RAM for each of the 100,000 times a function is called. With this, there went 100 GB of RAM and everything else you were doing in your computer at that time because the computer will likely freeze and have to go through a hard-reset.
  2. the memory you allocated and did not free was allocated in line line 5 of test.cpp when you used the operator new[] to allocate the integer pointer array.

It is important to notice here that if we increase the amount of allocated memory by the new[] operator on line 5 to that corresponding to 6 instead of 5 integers, the last two errors (invalid read and invalid write) would disappear. This means that if you run your code with Valgrind and see hundreds of errors, chances are that it will take modifying a few lines of code to get rid of most of these errors.

Valkyrie — a graphical user interface for Valgrind

Another way of going through Valgrind’s output is by using Valkyrie (now installed in the login node of Reed’s cluster, The Cube). If you are analyzing your code from your own computer with a Linux terminal (does not work with Cygwin, but you can install a native Ubuntu terminal on Windows 10 by following instructions posted here) and do not have Valkyrie installed yet, you can install it by running the following on your terminal:

bernardoct@DESKTOP-J6145HK ~
$ sudo apt-get install valkyrie

Valkyrie works by reading an xml file exported by Valgrind containing the information about the errors it found. To export this file, you need to pass the flags --xml=yes and --xml-file=valgring_output.xml (or whatever name you want to give the file) to Valgrind, which would make the call to Valgrind become:

bernardoct@DESKTOP-J6145HK ~
$ valgrind --tool=memcheck --leak-check=full --track-origins=yes --xml=yes --xml-file=valgring_output.xml ./test

Now, you should have a file called “valgrind_output.xml” in the directory you are calling Valgrind from. To open it with Valkyrie, first open Valkyrie by typing valkyrie on your terminal if on Windows 10 you need to have Xming installed and running, which can be done by following the instructions in the end of this post. If on a cluster, besides having Xming open you also have to have ssh’ed into the cluster with the -X flag (e.g. by running ssh -X username@my.cluster.here) with either Cygwin or from a native Linux terminal. After opening Valkyrie, click on the green folder button and select the xml file, as in the screenshot below.

valkyrie_screenshot.png

After opening the xml file generated by Valgrind, Valkyrie should look like in the screenshot below:valkyrie_screenshot2

Now you can begin from a collapsed list of errors and unfold each error to see its details. Keep in mind that Valkyrie is not your only option of GUI for Valgrind, as IDEs like JetBrains’ CLion and QTCreator come integrated with Valgrind. Now go check your code!

PS: Thanks to folks on Redit for the comments which helped improve this post.

Advertisements

A completely non-exhaustive list of tutorial resources for scientific computing

This is a short blog post to put together a list of resources with tutorials (similar to what’s usually found on this blog) for various programming languages. It is by no means exhaustive, so please comment if you feel there’s an important one I left out.

Matlab:

https://www.arnevogel.com/

Kinda new blog with Matlab tutorials on numerical methods

https://blogs.mathworks.com/loren/

One of the MathWorks blogs, great tutorials.

https://www.mathworks.com/support/learn-with-matlab-tutorials.html

Matlab tutorials from MathWorks

http://undocumentedmatlab.com/

More Matlab tutorials, a lot of material on many topics

Python:

https://glowingpython.blogspot.com/

Not updated very frequently, but good data analysis and visualization tutorials are posted

https://pythonprogramming.net/

Updated regularly, some great data visualization and analysis tutorials. The tutorials come with videos. I really like this site.

http://treyhunner.com/

Updated regularly (about once a month) with python tutorials and general python coding insights. I really like the writing style of this author. 

https://docs.scipy.org/doc/scipy/reference/tutorial/

Tutorials on using SciPy

C and C++:

https://www.cprogramming.com/

C and C++ programming tutorials, tips and tricks

http://www.cplusplus.com/articles/

Not really updated anymore but some good basic tutorials are listed

https://blog.knatten.org/

Hadn’t been updated in a while, but it looks like it’s been picked up again. Good for general C++ programming and good practice.

http://www.bfilipek.com/

C++ tutorials

General:

https://towardsdatascience.com/

This is a great general resource not devoted to a particular language. They cover topics in data science, machine learning and programming in general. Some great tutorials for Python and R. 

https://projecteuler.net/

Mathematical programming problems to help with learning any language

https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md

Free programming books repository

Reddits (some of the bigger ones):

/r/matlab (General on matlab, community provides help with coding)

/r/programming (General on programming)

/r/learnprogramming (Community provides help with debugging questions)

/r/python (General on python, community provides help with coding)

/r/learnpython (Community provides help with python questions, smaller than /r/python)

/r/cpp (General on C++)

/r/cpp_questions (Community provides help with C++ questions)

I’ve also recently made /r/sci_comp which has very little activity for the moment, but the aim is to create a community with general resources on coding for scientific applications.

 

Simple tricks to make your C/C++ code run faster

Us, engineers with no formal computer science training, for a myriad of good reasons tend to favor friendly programming languages such as Python over evil C/C++. However, when our application is performance sensitive and we cannot wait for results sitting in our chairs for as long as a Python code would require us to, we are sometimes forced to us C/C++. Why then not making the most of it when it this is the case?

Here are some simple tips to make your C/C++ code run even faster, and how to get some advice about further performance improvements. The last idea (data locality) is transferable to Python and other languages.

Most important trick

Improve your algorithm. Thinking if there is a simpler way of doing what you coded may reduce your algorithm’s complexity (say, from say n3 to n*log(n)), which would:

  • yield great speed-up when you have lots of data or needs to run a computation several times in a row, and
  • make you look smarter.

Compiler flags

First and foremost, the those who know what they are doing — compiler developers — do the work for you by calling the compiler with the appropriate flags. There are an incredible amount of flags you can call for that, but I would say that the ones you should have on whenever possible are -O3 and –march=native.

The optimization flags (-O1 to -O3, the latter more aggressive than the former) will perform a series of modification on your code behind the scenes to speed it up, some times by more than an order of magnitude. The issue is that this modifications may eventually make your code behave differently than you expected, so it’s always good to do a few smaller runs with -O0 and -O3 and compare their results before getting into production mode.

The –march=native flag will make the compiler fine tune your code to the processor it is being compiled on (conversely, –march=haswell would fine tune it to haswell architectures, for example). This is great if you are only going to run your code on your own machine or in another machine known to have a similar architecture, but if you try to run the binary on a machine with a different architecture, specially if it is an older one, you may end up with illegal instruction errors.

Restricted pointer array

When declaring a pointer array which you are sure will not be subject to pointer aliasing — namely there will be no other pointer pointing to the same memory address –, you can declare that pointer as a restrict pointer, as below:

  • GCC: double* __restrict__ my_pointer_array
  • Intel compiler: double* restrict my_pointer_array

This will let the compiler know that it can change order of certain operations involving my_pointer_array to make your code faster without changing some read/write order that may change your results. If you want to use the restricted qualifier with the intel compiler the flag -restrict must be passed as an argument to the compiler.

Aligned pointer array

By aligning an array, you are making sure the data is going to lie in the ideal location in memory for the processor to fetch it and perform the calculations it needs based on that array. To help your compiler optimize your data alignment, you need to (1) align your array when it is declared by a specific number of bytes and (2) tell your the compiler the array is aligned when using it to perform calculations — the compiler has no idea whether or not arrays received as arguments in function are aligned. Below are examples in C, although the same ideas apply to C++ as well.

GCC

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double *__restrict__ a, double *__restrict__ b, double *__restrict__ c, int n) {
    a = (double*) __builtin_assume_aligned(a, ALIGN);
    b = (double*) __builtin_assume_aligned(b, ALIGN);
    c = (double*) __builtin_assume_aligned(c, ALIGN);
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main(void) {

    double a[SIZE_ARRAY] __attribute__((aligned(ALIGN )));
    double b[SIZE_ARRAY] __attribute__((aligned(ALIGN )));
    double c[SIZE_ARRAY] __attribute__((aligned(ALIGN )));

    for (int i = 0; i < SIZE_ARRAY; ++i) {
        a[i] = 5.;
        b[i] = 2.;
    }
    double start_time = omp_get_wtime();
    sum(a, b, c, SIZE_ARRAY);
    double time = omp_get_wtime() - start_time;
    printf("%0.6fs", time);
}

Intel compiler

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double* restrict a, double* restrict b, double* restrict c, int n) {
    __assume_aligned(a, ALIGN);
    __assume_aligned(b, ALIGN);
    __assume_aligned(c, ALIGN);
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main(void) {

    __declspec(align(ALIGN )) float a[SIZE_ARRAY];
    __declspec(align(ALIGN )) float b[SIZE_ARRAY];
    __declspec(align(ALIGN )) float c[SIZE_ARRAY];

    for (int i = 0; i < SIZE_ARRAY; ++i) {
        a[i] = 5.;
        b[i] = 2.;
    }
    double start_time = omp_get_wtime();
    sum(a, b, c, SIZE_ARRAY);
    double time = omp_get_wtime() - start_time;

    printf("%0.6fs", time);
}

Edit: In a comment to this post, Krister Walfridsson not only caught an issue with my GCC code, for which mention I thank him, but he also shows the differences in machine code generated with and without alignment.

Data Locality

Computers are physical things, which means that data is physically stored and needs to be physically moved around in memory and between cache and processor in order to be used in calculations. This means that, if your data is stored all over the place in memory — e.g. in multiple pointer arrays in different parts of memory –, the processor will have to reach out to several parts of memory to fetch all your data before performing any computations. By having the data intelligently laid out in memory you ensure all data required for each computation is stored close to each other and in cache at the same time, which becomes even more important if your code uses too much data to fit in the cache at once.

In order to making your processor’s life easier, it is a good idea to ensure that all data required for a calculation step is close together. For example, if a given computation required three arrays of fixed sizes, it is always a good idea to merge them into one long array, as in the example below for the Intel compiler.

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double* restrict a, double* restrict b, double* restrict c, int n) {
    __assume_aligned(a, ALIGN);
    __assume_aligned(b, ALIGN);
    __assume_aligned(c, ALIGN);
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main(void) {

    __declspec(align(ALIGN )) float abc[3 * SIZE_ARRAY];

    for (int i = 0; i < 2 * SIZE_ARRAY; ++i) {
        a[i] = 5.;
        b[i] = 2.;
    }
    double start_time = omp_get_wtime();
    sum(abc, abc + SIZE_ARRAY, abc + 2 * ARRAY, SIZE_ARRAY);
    double time = omp_get_wtime() - start_time;

    printf("%0.6fs", time);
}

or even, since c[i] depends only on b[i] and a[i], we can have the values of a, b and c intercalated to assure that all computations will be performed on data that is right next to each other in memory:

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64
#define STRIDE 3

void sum(double* restrict abc, int n, int stride) {
    __assume_aligned(abc, ALIGN);

    for (int i = 0; i < n; i += stride)
        abc[i+2] = abc[i] + abc[i+1];
}

int main(void) {
    __declspec(align(ALIGN )) double abc[3 * SIZE_ARRAY];

    for (int i = 0; i < 3 * SIZE_ARRAY; i += STRIDE) {
        abc[i] = 5.;
        abc[i+1] = 2.;
                                            }
    double start_time = omp_get_wtime();
    sum(abc, SIZE_ARRAY, STRIDE );
    double time = omp_get_wtime() - start_time;

    printf("%0.6fs", time);
}

Conclusion

According a class project in which we had to write C code to perform matrix multiplication, the improvements suggested should may improve the performance of your code by 5 or 10 times. Also, the idea of data locality can be transferred to other languages, such as Python.

Let your Makefile make your life easier

This semester I’m taking my first official CS class here at Cornell, CS 5220 Applications of Parallel Computers taught by Dave Bindel (for those of you in the Reed group or at Cornell, I would definitely recommend taking this class, which is offered every other year, once you have some experience coding in C or C++). In addition to the core material we’ve been learning in the class, I’ve been learning a lot by examining the structure and syntax of the code snippets written by the instructor and TA that are offered as starting points and examples for our class assignments. One element in particular  that stood out to me from our first assignment was the many function calls made through the makefile. This post will first take a closer look into the typical composition of a makefile and then examine how we can harness the structure of a makefile to help improve workflow on complicated projects.

Dissecting a typical makefile

On the most basic level, a makefile simply consists of series of rules that each have an associated set of actions. Makefiles are how you use the “make” utility, a software package installed on all linux systems. Make has its own syntax similar to bash but with some distinct idiosyncrasies. For example, make allows you to store a snippet of code in whats called a “macro” (these are pretty much analogous to variables in most other languages).  A macro to store the flags you would like to run with your compiler could be defined like this:

CFLAGS  = -g -Wall

To referenced the CFLAGS macro, use a dollar sign and brackets, like this:

 $(CFLAGS)

There are a series of “special” predifined macros that can be used in any makefile which are fairly common, you can find them here.

Now that we’ve discussed makefile syntax, lets take a look at how rules are structured within a makefile. A rule specified by a makefile has the following shape:

target: prerequisites
    recipe
    ...
    ...

The target is usually the name of the file that is generated by  a program, for example an executable or object file. A prerequisite is the specified input used to create the target (which can often depend on several files). The recipe is the action that make carries out for the intended target (note: this must be indented at every line).

For example, a rule to build an executable called myProg from a c file called myProg.c using the gcc compiler with flags defined in CFLAGS might look like this:

myProg: myProg.c
    gcc $(CFLAGS) $<

Make the makefile do the work

The most common rules within makefiles call the compiler to build code (hence the name “makefile”) and many basic makefiles are used for this sole purpose. However, a rule simply sends a series commands specified by its recipe to the command line and a rule can actually specify any action or series of actions that you want. A ubiquitous example of a rule that specifies an action is “clean”, which may be implemented like this:

clean:
    rm -rf *.o $(PROGRAM)

Where PROGRAM is a macro containing the names of the executable compiled by the makefile.

In this example, the rule’s target is an action rather than an output file. You can call “clean” by simply typing “make clean” into the command line and you will remove all .o files and the executable PROGRAM from your working directory.

Utilizing this application of rules, we can now have our makefile do a lot of our command line work for us. For example, we could create a rule called “run” which submits a series of pbs jobs into a cluster.

run:
    qsub job-$*.pbs

We can then enter “make run” into the command line to execute this rule, which will submit the .pbs jobs for us (note that this will not perform any of the other rules defined in the makefile). Using this rule may be handy if we have a large number of jobs to submit.

Alternatively we could  make a rule called “plot” which calls a plotting function from python:

plot: 
    python plotter.py $(PLOTFILES)

Where PLOTFILES is a macro containing the name of files to be plotted and plotter.py is a python function that takes the file names as input.

Those are just two examples (loosely based on a makefile given in CS 5220) of how you can use a makefile to do your command line work for you, but the possibilities are endless!! Ok, maybe that’s getting a bit carried away, but I do find this functionality to be a simple and elegant way to improve the efficiency of your workflow on complex projects.

For some background on makefiles, I’d recommend taking a look a Billy’s post from last year. I also found the GNU make user manual helpful as well as this tutorial from Swarthmore that has some nice example makefiles.

Compiling Code using Makefiles

For people new to coding or using supercomputers to submit jobs, compiling can be a new concept. In programming, a compiler takes source code (e.g. written in C/C++, Python, Fortran, etc.) and translates it into a lower-level programming language (e.g. assembly language or machine code). When a compiler runs successfully, the source code will be converted to an executable program which the computer understands how to run.   For more info on compilers, check out this video

To compile a program, we use the ‘make’ command. When we have multiple source files (which we often do when running complex water management models), a makefile file helps organize directions to give to the compiler. If you are not creating a model from scratch, you may already have an existing makefile which are conventionally named ‘makefile’ or ‘Makefile’. If that is the case, compiling is easy if all the files are in the proper directory. Simply type ‘make’ in the command line interface to compile your code.

If you would like to edit your makefile, create one from scratch, or just want to learn more about the ‘make’ command and makefiles, check out the resources below:

Introduction to ‘make’ and compiler options:

Introductory tutorials for makefiles:

Makefile naming:

Makefile macros:

Compiler flags:

For a list of compiler flags, see the manual or ‘man’ page for the compiler: e.g. $man <compiler_name>. The convention for naming makefile macros for compiler flags for gcc and g++ are CFLAGS and CPPFLAGS, respectively.

Example Makefile (C or C++):

The conventional file organization for this work is to create a src (or source) and bin (or binary) directory. The source code will go in /src while the makefile and any input files will go in /bin. Once the executable is created, it will be located in /bin as well. Below is a truncated version of a makefile I made for a water treatment plant model (written in C) based on a makefile I found for LRGV (written in C++). In general, there is little difference between a makefile for a program written in C or C++, so this template file can be used for either language by simply commenting or uncommenting the makefile macros for the language for which you want to compile (e.g. CC, CXX, CFLAGS, CPPFLAGS). One special difference between compiling C and C++ is that a common library, math.h, is automatically linked for C++ but must be explicitly linked for C. You’ll notice that I’ve explicitly linked it in the code below, so as long as the C code I’ve written is also valid C++, I should be able to use either compiler since C is a subset of C++. Enjoy!

MAIN_DIR=./.. #from within /bin go to main directory which contains /bin and /src directories

SOURCE_DIR = $(MAIN_DIR)/src #navigate to directory which contains source code

SOURCES = \
$(SOURCE_DIR)/basin.c    \
   $(SOURCE_DIR)/breakpt.c  \
   #I’ll leave out some files for the sake of space
   $(SOURCE_DIR)/unittype.c \
   $(SOURCE_DIR)/uptable.c  \
   $(SOURCE_DIR)/writewtp.c \

OBJECTS=$(SOURCES:.c=.o) #name object files based on .c files
CC=gcc #select the type of compiler: this once in for C
#CXX=g++
CFLAGS=-c -O3 -Wall -I. -I$(SOURCE_DIR) #set flags for gcc
#CPPFLAGS=-c -O3 -Wall -I. -I$(SOURCE_DIR) #set flags for g++
LIBS=-lm  # link libraries (e.g. math.h with -lm: goo.gl/Bgq75V)
EXECUTABLE=wtp_v2-2_borg-mp #name of the executable file

all: $(SOURCES) $(EXECUTABLE)
    rm -rf $(SOURCE_DIR)/*.o

$(EXECUTABLE): $(OBJECTS)
    $(CC) $(OBJECTS) -o $@ $(LIBS)

.c.o:
    $(CC) $(CFLAGS) $^ -o $@

clean: #’make clean’ will remove all compilation files
    rm -f $(SOURCE_DIR)/*.o $(EXECUTABLE)