Welcome to our blog!

Welcome to Water Programming! This blog is a collaborative effort by Pat Reed’s group at Cornell, Joe Kasprzyk’s group at CU Boulder, Jon Herman’s group at UC Davis, and others who use computer programs to solve problems — Multiobjective Evolutionary Algorithms (MOEAs), simulation models, visualization, and other techniques. Use the search feature and categories on the right panel to find topics of interest. Feel free to comment, and contact us if you want to contribute posts.

To find software:  Please consult the Pat Reed group website, MOEAFramework.org, and BorgMOEA.org.

The MOEAFramework Setup Guide: A detailed guide is now available. The focus of the document is connecting an optimization problem written in C/C++ to MOEAFramework, which is written in Java.

The Borg MOEA Guide: We are currently writing a tutorial on how to use the C version of the Borg MOEA, which is being released to researchers here. To gain access please email joseph.kasprzyk “at” colorado.edu.

Call for contributors: We want this to be a community resource to share tips and tricks. Are you interested in contributing? Please email joseph.kasprzyk “at” colorado.edu. You’ll need a WordPress.com account.

Reading CSV files in C++

If you are an engineer used to coding in Python or Matlab who is transitioning to C++, you will soon find out that even the most innocent task will now require several lines of code. A previous post has already shown how to export data to a CSV file. In order to facilitate your transition to C++, see below for an example of how to read your new CSV file.


#include <string>
#include <vector>
#include <sstream> //istringstream
#include <iostream> // cout
#include <fstream> // ifstream

using namespace std;

 * Reads csv file into table, exported as a vector of vector of doubles.
 * @param inputFileName input file name (full path).
 * @return data as vector of vector of doubles.
vector<vector<double>> parse2DCsvFile(string inputFileName) {

    vector<vector<double> > data;
    ifstream inputFile(inputFileName);
    int l = 0;

    while (inputFile) {
        string s;
        if (!getline(inputFile, s)) break;
        if (s[0] != '#') {
            istringstream ss(s);
            vector<double> record;

            while (ss) {
                string line;
                if (!getline(ss, line, ','))
                try {
                catch (const std::invalid_argument e) {
                    cout << "NaN found in file " << inputFileName << " line " << l
                         << endl;


    if (!inputFile.eof()) {
        cerr << "Could not read file " << inputFileName << "\n";
        __throw_invalid_argument("File not found.");

    return data;

int main()
    vector<vector<double>> data = parse2DCsvFile("test.csv");

    for (auto l : data) {
    	for (auto x : l)
    		cout << x << " ";
    	cout << endl;

    return 0;

Glossary of commonly used terms

I have recently started training with the group and coming from a slightly different research background I was unfamiliar with some a lot of the terminology. I thought it might be useful to put together a glossary of sorts containing the terms that someone new to this field of study might not intuitively understand. The idea is that when someone encounters an unfamiliar term while going through the training or reading through the material, they can come to the blog glossary and quickly Ctrl-F the term (yes, that is a keyboard shortcut used as a verb).

The definitions are not exhaustive, but links/references are provided so the reader can find additional material. The glossary is a work in progress, some definitions are still incomplete, but it will be regularly updated. I’m also probably the least qualified person in the group to provide the definitions, so any corrections/suggestions are more than welcome. If there’s any other term I’ve omitted and you feel should be included, please leave a comment so I can edit the post.


Adaptive metropolis algorithm

It is based on the classical random walk Metropolis algorithm and adapts continuously to the target distribution. 1

Akaike’s Information Criterion (AIC)

A measure of the relative quality of statistical models for a given set of data.2


MOEA that applies a multi-algorithm search that combines the NSGAII, particle swarm optimization, differential evolution, and adaptive metropolis.3

Approximation set

The set of solutions output by a multi-objective evolutionary algorithm approximating the Pareto front.4


A secondary population used by many multi-objective evolutionary algorithms to store non-dominated solutions found through the generational process.5


Attainment plot

Bayesian Information Criterion


Iterative search algorithm for multi-objective optimization. It combines adaptive operator selection with ε-dominance, adaptive population sizing and time continuation. 6

Classification and Regression Trees (CART)

Decision tree algorithms that can used for classification and regression analysis of predictive modeling problems.7

Closed vs. open loop control


See Convexity.


Restrictions imposed on the decision space by the particular characteristics of the system. They must be satisfied for a certain solution to be considered acceptable.5

Control map


Refers to whether the parameterization choices have significant impacts on the success or failure for an algorithm.


Progress towards the reference set.


Geometrically, a function is convex is a line segment drawn from any point along function f to any other point along f lies on or above the graph of f. For optimization problems, a problem is convex if the objective function and all constraints are convex functions if minimizing, and concave is maximizing.

(The) Cube

The Reed Group’s computing cluster run by the Center for Advanced Computing (CAC) at Cornell. The cluster has 32 nodes, each with Dual 8-core E5-2680 CPUs @ 2.7 GHz and 128 GB of RAM. For more information see: https://www.cac.cornell.edu/wiki/index.php?title=THECUBE_Cluster

Decision space

The set of all decision variables.

Decision variables

The numerical quantities (variables) that are manipulated during the optimization process and they represent how our actions are encoded within the problem.5


When elements of a solution set at a given time are dominated by a solution set the algorithm maintained some time before.5

Differential evolution

Evolutionary algorithm designed to optimize problems over continuous domains.5

Direct policy search

Dominance resistance

The inability of an algorithm to produce offspring that dominates poorly performing, non-dominated members of the population. (See also Pareto dominance).8

DTLZ problems

A suite of representative test problems for MOEAs that for which the analytical solutions have been found.  The acronym is a combination of the first letters of the creators’ last names (Deb, Thiele, Laumanns, Zitzler). DTLZ problems have been used for benchmarking and diagnostics when evaluating the performance of MOEAs.

Dynamic memory


Refers to a case where as evolution progresses, non-dominated solutions will not be lost in subsequent generations.

Epsilon (ε) dominance

When dominance is determined by use of a user-specified precision to simplify sorting. Pareto epsilon (ε) optimality and Pareto epsilon (ε) front are defined accordingly. (See also Pareto dominance).5

Epsilon (ε) dominance archive

Epsilon (ε)-box dominance archive


A steady-state MOEA, the first to incorporate ε-dominance archiving into its search process.

Epsilon (ε) indicator

Additive ε-indicator (ε+-indicator) (performance metric) 

The smallest distance ε that the approximation set must be translated by in order to completely dominate the reference set.4

Epsilon (ε) progress


Evolutionary algorithms

A class of search and optimization algorithms inspired by processes of natural evolution.5

Multi-objective evolutionary algorithms (MOEAs)

Evolutionary algorithms used for solving multi-objective problems.5

Evolutionary operators

They operate on the population of an evolutionary algorithm attempting to generate solutions with higher and higher fitness.5

Mutation evolutionary operators

They perturb the decision variables of a single solution to look for improvement in its vicinity.

Recombination evolutionary operators

They combine decision variables from two or more solutions to create new solutions.

Selection evolutionary operators

They determine which solutions are allowed to proceed to the next cycle.


A cross-language automation tool for running models. (See also Project Platypus).

Exploratory Modeling and Analysis (EMA)

A research methodology that uses computational experiments to analyze complex and uncertain systems.9

Data-driven exploratory modeling

Used to reveal implications of a data set by searching through an ensemble of models for instances that are consistent with the data.

Question-driven exploratory modeling

Searches over an ensemble of models believed to be plausible to answer a question of interest or illuminate policy choices.

Model-driven exploratory modeling

Investigates the properties of an ensemble of models without reference to a data set or policy question. It is rather a theoretical investigation into the properties of a class of models.

Feasible region

The set of all decision variables in the decision space that are feasible (i.e. satisfy all constraints).5


Generational algorithms

A class of MOEAs that replace the entire population during each full mating, mutation, and selection iteration of the algorithm.5 (See also Steady-state algorithms).

Generational distance (performance metric)

The average distance from every solution in the approximation set to the nearest solution in the reference set.4

Gini index

A generalization of the binomial variance used in Classification and Regression Trees (CART). (See also Classification and Regression Trees (CART)). 

High performance computing

Hypervolume (performance metric)

The volume of the space dominated by the approximation set.4

Inverted generational distance (performance metric)

The average distance from every solution in the reference set to the nearest solution in the approximation set.4


A free desktop application for producing and sharing high-dimensional, interactive scientific visualizations. (See also Project Platypus).

Kernel density estimation

Latin Hypercube Sampling (LHS)

Stratified technique used to generate samples of parameter values.

Markov chain

Method of moments

MOEA Framework

A free and open source Java library for developing and experimenting with multi-objective evolutionary algorithms and other general-purpose optimization algorithms.4

Monte Carlo

Morris method


The Non-dominated Sorting Genetic Algorithm-II. MOEA featuring elitism, efficient non-domination sorting, and parameter free diversity maintenance.10


A generational algorithm that uses e-dominance archiving, adaptive population sizing and time continuation.

Number of function evaluations (NFE)


The criteria used to compare solutions in an optimization problem.


A particle swarm optimization algorithm—the first to include e-dominance as a means to solve many-objective problems.11


The process of identifying the best solution (or a set of best solutions) among a set of alternatives.

Multi-objective optimization

Multi-objective optimization employs two or more criteria to identify the best solution(s) among a set of alternatives

Intertemporal optimization

Parallel computing

Parametric generator

Pareto optimality

The notion that a solution is superior or inferior to another solution only when it is superior in all objectives or inferior in all objectives respectively.

Pareto dominance

A dominating solution is superior to another in all objectives. A dominated solution is inferior to another in all objectives. A non-dominated solution is superior in one objective but inferior in another.  

Pareto front

Contains the objective values of all non-dominated solutions (in the objective function space).

Pareto optimal set

Contains the decision variables of all non-dominated solutions (in the decision variable space).

Particle swarm optimization

Population-based stochastic optimization technique where the potential solutions, called particles, fly through the problem space by following the current optimum particles.

Patient Rule Induction method (PRIM)

A rule induction algorithm.

Performance metrics

Procedures used to compare the performance of approximation sets.



The set of encoded solutions that are manipulated and evaluated during the application of an evolutionary algorithm.

Principle Component Analysis (PCA)

Project Platypus

A Free and Open Source Python Library for Multiobjective Optimization. For more information see: https://github.com/Project-Platypus

Radial basis function

Reference set

The set of globally optimal solutions in an optimization problem.


Python Library for Robust Decision Making and Exploratory Modelling. (See also Project Platypus).

Robust Decision Making (RDM)

An analytic framework that helps identify potential robust strategies for a particular problem, characterize the vulnerabilities of such strategies, and evaluate trade-offs among them.12

Multi-objective robust decision making (MORDM)

An extension of Robust Decision Making (RDM) to explicitly include the use of multi-objective optimization to discover robust strategies and explore the trade-offs among multiple competing performance objectives.13


An open source implementation of MORDM with the tools necessary to perform a complete MORDM analysis.14 For more information see: https://github.com/OpenMORDM

Safe operating space



Sobol sampling

Spacing (performance metric)

The uniformity of the spacing between the solutions in an approximation set.


MOEA that assigns a fitness value to each solution based on the number of competing solutions it dominates.

 State of the world

A fundamental concept in decision theory which refers to a feature of the world that the agent/decision maker has no control over and is the origin of the agent’s uncertainty about the world.

Steady-state algorithms

A class of MOEAs that only replace one solution in the population during each full mating, mutation, and selection iteration of the algorithm. (See also Generational algorithms).

Time continuation

The injection of new solutions in the population to reinvigorate search.


The set of candidate solutions selected randomly from a population.


Visual analytics

The rapid analysis of large datasets using interactive software that enables multiple connected views of planning problems.

More information on the concepts

  1. Haario, H., Saksman, E. & Tamminen, J. An adaptive Metropolis algorithm. Bernoulli 7, 223–242 (2001).
  2. Akaike, H. Akaike’s information criterion. in International Encyclopedia of Statistical Science 25–25 (Springer, 2011).
  3. Vrugt, J. A. & Robinson, B. A. Improved evolutionary optimization from genetically adaptive multimethod search. Proc. Natl. Acad. Sci. 104, 708–711 (2007).
  4. Hadka, D. Beginner’s Guide to the MOEA Framework. (CreateSpace Independent Publishing Platform, 2016).
  5. Coello, C. A. C., Lamont, G. B. & Van Veldhuizen, D. A. Evolutionary algorithms for solving multi-objective problems. 5, (Springer, 2007).
  6. Hadka, D. & Reed, P. Borg: An Auto-Adaptive Many-Objective Evolutionary Computing Framework. Evol. Comput. 21, 231–259 (2012).
  7. Breiman, L. Classification and Regression Trees. (Wadsworth International Group, 1984).
  8. Reed, P. M., Hadka, D., Herman, J. D., Kasprzyk, J. R. & Kollat, J. B. Evolutionary multiobjective optimization in water resources: The past, present, and future. Adv. Water Resour. 51, 438–456 (2013).
  9. Bankes, S. Exploratory Modeling for Policy Analysis. Oper. Res. 41, 435–449 (1993).
  10. Deb, K., Pratap, A., Agarwal, S. & Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. Evol. Comput. IEEE Trans. On 6, 182–197 (2002).
  11. Sierra, M. R. & Coello, C. C. Improving PSO-based multi-objective optimization using crowding, mutation and e-dominance. in Evolutionary multi-criterion optimization 3410, 505–519 (Springer, 2005).
  12. Lempert, R. J., Groves, D. G., Popper, S. W. & Bankes, S. C. A general, analytic method for generating robust strategies and narrative scenarios. Manag. Sci. 52, 514–528 (2006).
  13. Kasprzyk, J. R., Nataraj, S., Reed, P. M. & Lempert, R. J. Many objective robust decision making for complex environmental systems undergoing change. Environ. Model. Softw. (2013). doi:10.1016/j.envsoft.2012.12.007
  14. Hadka, D., Herman, J., Reed, P. & Keller, K. An open source framework for many-objective robust decision making. Environ. Model. Softw. 74, 114–129 (2015).


Enhance your (Windows) remote terminal experience with MobaXterm

Jazmin and Julie recently introduced me to a helpful program for Windows called “MobaXterm” that has significantly sped up my workflow when running remotely on the Cube (our cluster here at Cornell). MobaXterm bills itself as an “all in one” toolbox for remote computing. The program’s interface includes a terminal window as well as a graphical SFTP browser. You can link the terminal to the SFTP browser so that as you move through folders on the terminal the browser follows you. The SFTP browser allows you to view and edit files using your text editor of choice on your windows desktop, a feature that I find quite helpful for making quick edits to shell scripts or pieces of code as go.


A screenshot of the MobaXterm interface. The graphical SFTP browser is on the left, while the terminal is on the right (note the checked box in the center of the left panel that links the browser to the terminal window).


You can set up a remote Cube session using MobaXterm with the following steps:

  1. Download MobaXterm using this link
  2.  Follow the installation instructions
  3. Open MobaXterm and select the “Session” icon in the upper left corner.
  4. In the session popup window, select a new SSH session in the upper left, enter “thecube.cac@cornell.edu” as the name of the remote host and enter your username.
  5. When the session opens, check the box below the SFTP browser on the left to link the browser to your terminal
  6. Run your stuff!

Note that for a Linux system, you can simply link your file browser window to your terminal window and get the same functionality as MobaXterm. MobaXterm is not available for Mac, but Cyberduck and Filezilla are decent alternatives. An alternative graphical SFTP browser for Windows is WinSCP, though I prefer MobaXterm because of its linked terminal/SFTP interface.

For those new to remote computing, ssh or UNIX commands in general, I’d recommend checking out the following posts to get familiar with running on a remote cluster:




Developing parallelised code with MPI for dummies, in C (Part 2/2)

My last post introduced MPI and demonstrated a simple example for using it to parallelize a code across multiple nodes. In the previous example, we created an executable that could be run in parallel to complete the same task multiple times. But what if we want use MPI to on a code that has both parallel and serial sections, this is inevitable if we want everything to be self-contained.

As I tried to stress last time, MPI runs multiple versions of the same executable each with independent memory (please read this sentence three times, it is very different from how you learned to code). If you wish to share memory, you must explicitly send it. This allows no scope for a serial section!

We must, instead, imitate serial sections of code by designating a ‘root’ processor, conventionally the processor with rank = 0. We trap the ‘serial section’ inside an if-statement designating the root and send data to it from other processors when required.

Sending Data

I will build on the previous two examples by creating a loop that calculates the mean of a set of random numbers, we will parallelize the random number generation but leave the mean calculation in ‘serial’ (i.e. to be calculated by the root processor).

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(){
int size, rank,i;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double randSum = 0;
srand(rank + 1);
double myRand = (double)rand()/(double)RAND_MAX;
printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);
if (rank == 0){
   for (i=0;i<size;i++){
      if (i > 0){
         MPI_Recv(&myRand, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      randSum = randSum + myRand;
   printf("Mean random number = %f\n",randSum/size);
   MPI_Send(&myRand, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
return 0;

For contrast with a regular serial version:

#include <stdio.h>
#include <stdlib.h>
int main(){
int rank,size = 10;
double randSum = 0;
srand(rank + 1);
double myRand = (double)rand()/(double)RAND_MAX;
printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);
if (rank == 0){
   for (rank = 0; rank < size; ++rank){
      srand(rank + 1);
      randSum = randSum + (double)rand()/(double)RAND_MAX;
      printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);}
      printf("Mean random number = %f\n",randSum/size);
return 0;

We introduce here two new MPI functions:

MPI_Send(data address, size of data, MPI type of data, processor destination (by rank), tag, communicator) sends the random number to the root (rank 0).

MPI_Recv(data address, size of data, MPI type of data, processor source (by rank), tag, communicator, status suppression) tells a processor, in our case the root, to receive data from a processor source.

Both MPI_Send and MPI_Recv prevent code from progressing further until the send-> receive is resolved. i.e. when rank = 5 reaches send, it will wait until rank = 0 has received data from ranks 1:4 before sending the data and progressing further.

Broadcasting data

Sending data between processors in MPI is moderately expensive, so we want to call send/recv as few times as possible. This means that vectors should be sent in one, rather than in a loop. It also means that when sending data from one processor to all (most commonly from the root), it is more efficient to use the built in ‘broadcast’ rather than sending to each processor individually (the reason for this is explained in: http://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/).

Below we will introduce an example where the root broadcasts how many random numbers each processor should create, these vectors of random numbers are then sent back to the root for mean calculation.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(){
int size, rank,i,j;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double randSum = 0;
int numRands;
if (rank == 0){
   numRands = 5;
double *myRand = calloc(numRands,sizeof(double));
for (i =0;i<numRands;++i){
   myRand[i] = (double)rand()/(double)RAND_MAX;
if (rank == 0){
   for (i=0;i<size;i++){
      printf("root received from rank %d the vector: ",i);
      if (i > 0){
         MPI_Recv(myRand, numRands, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      for (j=0;j<numRands;j++){
         printf("%f ",myRand[j]);
         randSum = randSum + myRand[j];
   printf("Mean random number = %f\n",randSum/(size*numRands));
   MPI_Send(myRand, numRands, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
return 0;

We have new used the new MPI function:

MPI_Bcast(data address, size of data, MPI data type, processor source, communicator) broadcasts from the processor source (in our case the root) to all other processors, readers should note the common mistake of using MPI_Recv instead of MPI_Bcast to receive the data; MPI_Bcast is the function to both send and receive data.

Another simple but common mistake that readers should note is the passing of dynamically sized data; note how myRand is sent without the & address operator (because the variable itself is an address) while numRands is sent with the & operator.

Concluding remarks

This tutorial should set you up to use much of the MPI functionality you need to parallelise your code. Some natural questions that may have arisen while reading this tutorial that we did not cover:

MPI_Barrier – while MPI_Send/Recv/Bcast require processors to ‘catch up’, if you are writing and reading data to files (particularly if a processor must read data written by another processor) then you need to force the processors to catch up; MPI_Barrier achieves this.

tags – you can contain metadata that can be described by integers (e.g. vector length or MPI data type) in the ‘tag’ option for MPI_Send/Recv.

MPI_Status – this structure can contain details about the data received (rank, tag and length of the message), although much of the time this will be known in advance. Since receiving the status can be expensive, MPI_STATUS_IGNORE is used to supress the status structure.

All of the MPI functions described in this tutorial are only a subset of those available that I have found useful in parallelizing my current applications. An exhaustive list can be found at: http://www.mpich.org/static/docs/latest/. If you want to go beyond the functions described in this post (or you require further detail) I would recommend: http://mpitutorial.com/tutorials/.

Part (1/2): https://waterprogramming.wordpress.com/2017/07/21/developing-parallelised-code-with-mpi-for-dummies-in-c-part-12/

Debugging: Interactive techniques and print statements

Trying to fix a bug in a program can be a long and tedious process. Learning what techniques to use can save you from headaches and wasted time. Starting out, I figured that a single best tool for debugging must exist. But which was it? Were print statements the answer? Or was it interactive debugging (like that described in “Debugging in Python (using PyCharm) Parts 1, 2, and 3)?

Speaking to different people and reading forums, I could not find a consensus. Some people would refer to print statements as “a lost art”, while others criticize print statements as “poor substitutes [for interactive debugging] and, in fact, at times dangerous tools.” I’ve read many accounts of experienced programmers who swear by print statements but feel embarrassed to admit it. As if it were taboo to use such a simple technique even if it was effective at solving their problem.

There are strong opinions on either side, but based on my experiences, I believe the answer lies somewhere in the middle. Interactive debugging and print statements are two effective techniques which each have a time and a place. I think this post summed up my opinion on the matter well:

“Print statements and debugger are not mutually exclusive. They are just different tools available to you in order to locate/identify bugs. There are those who will claim how they never touch a debugger and there are those who do not have a single logging/print statement anywhere in the code that they write. My advice is that you do not want to be in either one of those groups.” 

Below, I’ve compiled some opinions which highlight the benefits and drawbacks of each technique. Hopefully, these can serve as a guide for your future debugging needs!

Interactive Debugging

  • Less recompiling: with interactive debugging you can change variable values while the program is running. This gives you the freedom to test new scenarios without the need to recompile.
  • Get custom notifications: set up watch variables which notify you when that variable changes
  • Control: step into functions of interest or skip over functions that are not important for debugging. You can also set conditional breakpoints which are only activated when a certain value is triggered.
  • Use an IDE or the command line: debugging can be performed in an IDE (interactive development environment) or from the command line. IDEs are generally preferred, but there are instances—such as when using a command line interface to access a supercomputer—when they cannot be used. In these circumstances, interactive debugging can still be performed from the command line with tools such as GDB. Furthermore, most of these tools can be run with a text user interface (e.g. $gdb –tui).
  • Travel back in time: view the call stack at any time and inspect values up the stack trace. This can be an especially helpful feature when determining why a program has crashed. For example, see “What to Do After a Crash” in this GDB post.
  • Scales well with large projects: Although I don’t have much experience in this area, this post claims that interactive debugging is better suited for large projects or those with parallel code.
  • No clean-up: unlike print statements which often must be cleaned up (deleted or commented out) after debugging is done, there is no trace left behind from interactive debugging.

Print Statements

  • Reproducibility: leaving print statements in your code can help you reproduce past debugging sessions or help collaborators that may need to debug the program in the future. And instead of commenting out or deleting these print statements when not in use, they can be placed within an if-statement with a debugging flag that can be turned on and off (e.g. if (debug == TRUE) print <value>).
  • Consistency between environments: interactive debugging can be done via the command line, but the experience does not generally compare to the same task in an IDE. Print statements, however, give the user a consistent experience across environments.
  • Permanent record-keeping: when an interactive debugging session is over, there is no record of the information that the user came across. On the other hand, print statements can be used to create an extensive diagnostic report of the program. The user can then analyze this report to gain more insight about the program at any time in the future.
  • Easy to use: print statement are simple to use and understand. Although interactive debugging has nice bells and whistles, there is a learning curve for new users.

Thanks for reading and please feel free to edit and add your own thoughts!




Developing parallelised code with MPI for dummies, in C (Part 1/2)

Parallel computing allows for faster implementation of a code by enabling the simultaneous execution of multiple tasks. Before we dive in to how parallelisation of a code is achieved, let’s briefly review the components that make up a high performance computing (HPC) cluster (it should be noted that you can parallelise code on your own computer, but this post will focus on parallelisation on clusters).  High performance computing clusters are usually comprised of a network of individual computers known as nodes that function together as a single computing resource as shown in Figure 1. Each node has some number of processors (the chip within a node that actually executes instructions) and modern processors may contain multiple cores, each of which can execute operations independently. Processors performing tasks on the same node have access to shared memory, meaning they can write and reference the same memory locations as they execute tasks. Memory is not shared between nodes however, so operations that run on multiple nodes use what’s known as distributed-memory programming. In order to properly manage tasks using distributed memory, nodes must have a way to pass information to each other.


Figure 1: One possible configuration of a HPC cluster, based on the Cornell CAC presentation linked in the following paragraph.

Parallelization is commonly performed using OpenMP or MPI.  OpenMP (which stands for Open Multi-Processing) parallelises operations by multithreading, running tasks on multiple cores/units within a single node using shared memory.  MPI (which stands for Message Passing Interface) parallelises tasks by distributing them over multiple nodes (also possible over multiple processors) within a network, utilizing the distributed memory of each node. This has two practical implications; MPI is needed to scale to more nodes but communication between tasks is harder. The two parallelisation methods are not mutually exclusive; you could use OpenMP to parallelise operations on individual network nodes and MPI to communicate between nodes on the network (example: http://www.slac.stanford.edu/comp/unix/farm/mpi_and_openmp.html). Both OpenMP and MPI support your favourite languages (C, C++, FORTRAN, Python, but not Java – perfect!). The remainder of this post will focus on implementing MPI in your code, for references on using OpenMP, see this presentation by the Cornell Center for Advanced Computing: https://www.cac.cornell.edu/education/training/ParallelMay2012/ProgOpenMP.pdf.

So how does MPI work?

MPI creates multiple instances of your executable and runs one on each processor you have specified for use. These processors can communicate with each other using specific MPI functions. I will explain a few of the more basic functions in this post.

What does MPI code look like?

A parallel loop (compiles with: mpicc -O3 -o exampleParallel.exe exampleParallel.c -lm)

int main(){ 
int size, rank; 
MPI_Comm_size(MPI_COMM_WORLD, &size); 
MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
printf("I evaluated rank = %d\n",rank); 
return 0;

The parallel loop can be distributed over 10 processors (ppn) in one node and submitted to the Cube as a Unix script:

#PBS -N exampleParallel
#PBS -l nodes=1:ppn=10
#PBS -l walltime=0:00:05
#PBS -j oe
#PBS -o output
mpirun ./exampleParallel.exe

This can be contrasted with a serial loop (compiles with: gcc -O3 -o exampleSerial.exe exampleSerial.c):

int main(){ 
int rank, size = 10; 
for (rank = 0; rank < size; rank++){ 
   printf("I evaluated rank = %d\n",rank);
return 0;}

Let’s have a look at what each line of the parallel code is doing;

MPI_Init is the first MPI function that must be called in a piece of code in order to initialize the MPI environment. NOTE! MPI_Init does not signify the start of a ‘parallel section’, as I said earlier, MPI does not have parallel sections, it runs multiple instances of the same executable in parallel.

MPI_Comm_size populates an integer address with the number of processors in a group, in our example, 10 (i.e. num nodes * processors per node).

MPI_Comm_rank populates an integer address with the processor number for the current instance of the executable. This ‘rank’ variable is the main way to differentiate between different instances of your executable, it is equivalent to a loop counter.

MPI_Finalize is the last MPI function that must be called in a piece of code, it terminates the MPI environment. As far as I can tell it is only recommended to have return functions after MPI_Finalize.

This simple example highlights the difference between MPI and serial code; that each executable is evaluated separately in parallel. While this makes MPI hard to code, and sharing data between parallel processes expensive, it also makes it much easier to distribute across processors.

Next week we will present examples demonstrating how to send data between nodes and introduce serial sections of code.

Part (2/2): https://waterprogramming.wordpress.com/2017/07/28/developing-parallelised-code-with-mpi-for-dummies-in-c-part-22/

References (each of these are a useful link if you would like to learn more about parallel computing and HPC):

Parallel Programming Concepts and High-Performance Computing, a module in the Cornell Virtual Workshop

CAC Glossary of HPC Terms: https://cvw.cac.cornell.edu/main/glossary

Dynamic memory allocation in C++

To have success creating C++ programs, it’s essential to have a good understanding of the workings and implementation of dynamic memory,  which is the “art” of performing manual memory management.  When developing code in C++, you may not always know how much memory you will need before your program runs.  For instance, the size of an array may be unknown until you execute the program.

Introduction to pointers

In order to understand dynamic memory allocation, we must first talk about pointers. A pointer is essentially a variable that stores the address in memory of another variable. The pointer data type is a long hexadecimal number representing the memory address.   This address can be accessed using an ampersand (&) exemplified in the following script:

//Accessing a pointer:
int age=43;
cout << &age << endl;
// The output would be something like: 0x7fff567ecb68

Since a pointer is also a variable, it requires declaration before being used .   The declaration consists on giving it a name, hopefully following the good code style convention and declaring the data type that it “points to” using an asterisk (*) like so:

//Declaring pointers of different data types:
int *integerPointer;
double *doublePointer;
float *floatPointer;
char *charPointer;

The pointer’s operators

In summary, the two pointer operators are: address-of operator(&) which returns the memory address and contents-of operator (*) which returns the value of the variable located at that address.

// Example of pointer operators:
float variable=25.6;
float *pointer;
pointer= &variable;
cout << variable << endl; //outputs 25.6, the variable’s value
cout << pointer << endl; //outputs 0x7fff5a774b68, the variable’s location in memory
cout << *pointer << endl; // outputs 25.6, value of the variable stored in that location

This last operator is also called deference operator which enables you to access directly the variable the pointer points to, which you can then use for regular operations:

float width = 5.0;
float length = 10.0;
float area;
float *pWidth = &width;
float *pLength = &length;

//Both of the following operations are equivalent
area = *pWidth * *pLength;
area = width * length;
//the output for both would be 50.

Deferencing the pointers *pWidth and *pLength represents exactly the same as the variables width and length, respectively.

Memory allocation in C++

Now that you have some basic understanding of pointers, we can talk about memory allocation in C++.  Memory in C++ is divided in tow parts: the stack and the heap.  All variables declared inside the function use memory from the stack whereas unused memory that can be used to allocate memory dynamically is stored in the heap.

You may not know in advance how much memory you need to store information in a variable. You can allocate memory within the heap of a variable of a determined data type using the new operator like so:

new float;

This operator allocates the memory necessary for storing a float on the heap and returns that address. This address can also be stored in a pointer, which can then be deferenced to access the variable:

float *pointer = new float; //requesting memory
*pointer = 12.0; //store value
cout << *pointer << endl; //use value
delete pointer;// free up memory
// this is now a dangling pointer
pointer= new float // reuse for new address

Here, the pointer is stored in the stack as a local variable, and holds the allocated address in the heap as its value. The value of 12.0 is then stored at the address in the heap. You can then use this value for other operations. Finally, the delete statement releases the memory when it’s no longer needed; however, it does not delete the pointer since it was stored in the stack. These type of pointers that point to non-existent memory are called dangling pointers and can be reused.

Dynamic memory and arrays

Perhaps the most common use of dynamic memory allocation are arrays. Here’s a brief example of the syntax:

int *pointer= NULL; // initialized pointer
pointer= new int[10] // request memory
delete[]pointer; //delete array pointed to by pointer

The NULL pointer has a value of zero, you can declare a null pointer when you do not have the address to be assigned.

Finally, to allocate and release memory for multi-dimensional arrays, you basically use an array of pointers to arrays, it sounds confusing but you can do this using the following  sample method:

int row = 3;
int col = 4;
double **p  = new double* [row]; // Allocate memory for rows

// Then allocate memory for columns
for(int i = 0; i < col; i++) {
    p[i] = new double[col];

//Release memory
for(int i = 0; i < row; i++) {
   delete[] p[i];
delete [] p;

I hope this quick overview provides a starting point on tackling your  C++ memory allocation challenges.