Parallel Programming Part 2: Demonstrating OpenMP constructs with a simple test case

In Parallel Programming Part 1, I introduced a number of basic OpenMP directives and constructs, as well as provided a quick guide to high-performance computing (HPC) terminology. In Part 2 (aka this post), we will apply some of the constructs previously learned to execute a few OpenMP examples. We will also view an example of incorrect use of OpenMP constructs and its implications.

TheCube Cluster Setup

Before we begin, there are a couple of steps to ensure that we will be able to run C++ code on the Cube. First, check the default version of the C++ compiler. This compiler essentially transforms your code into an executable file that can be easily understood and implemented by the Cube. You can do this by entering

g++ --version

into your command line. You should see that the Cube has G++ version 8.3.0 installed, as shown below.
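The output should look something along these lines (the exact packaging details will vary from system to system):

g++ (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.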

Now, create two files:

  • openmp-demo-print.cpp: this is the file where we will perform a print demo to show how tasks are distributed between threads
  • openmp-demo-access.cpp: this file will demonstrate the importance of synchronization and the implications of improper synchronization

Example 1: Hello from Threads

Open the openmp-demo-print.cpp file, and copy and paste the following code into it:

#include <iostream>
#include <omp.h>
#include <stdio.h>

int main() {

    int num_threads = omp_get_max_threads(); // get max number of threads available on the Cube
    omp_set_num_threads(num_threads);

    printf("The CUBE has %d threads.\n", num_threads);


    #pragma omp parallel 
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from thread num %d\n", thread_id);
    }

    printf("Thread numbers written to file successfully. \n");

    return 0;
}

Here is what the code is doing:

  • Lines 1 to 3 are the libraries that need to be included in this script to enable the use of OpenMP constructs and printing to the command line.
  • In main() on Lines 7 and 8, we first get the maximum number of threads that the Cube has available on one node and assign it to the num_threads variable. We then set the number of threads we want to use.
  • Next, we declare a parallel section using #pragma omp parallel beginning on Line 13. In this section, we are assigning the printf() function to each of the num_threads threads that we have set on Line 8. Here, each thread executes the printf() function independently of the others. Note that the threads are under no obligation to execute their tasks in ascending order.
  • Finally, we end the call to main() by returning 0.

Be sure to first save this file. Once you have done this, enter the following into your command line:

g++ openmp-demo-print.cpp -o openmp-demo-print -fopenmp

This tells the compiler to compile openmp-demo-print.cpp with OpenMP support (the -fopenmp flag) and create an executable of the same name. Next, in the command line, enter:

./openmp-demo-print

You should see the following (or some version of it) appear in your command line:
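For illustration, on a node with eight available threads the output might look something like this (the thread count and ordering will differ on your machine and between runs):

The CUBE has 8 threads.
Hello from thread num 2
Hello from thread num 0
Hello from thread num 5
Hello from thread num 3
Hello from thread num 7
Hello from thread num 1
Hello from thread num 6
Hello from thread num 4
Thread numbers printed successfully.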

Notice that the threads all print out of order. This demonstrates that each thread takes a different amount of time to complete its task. If you re-run the script, you will find that the order in which the threads print changes, showing that there is a degree of uncertainty associated with the amount of time each one requires. This has implications for the order in which threads read, write, and access data, which will be demonstrated in the next example.

Example 2: Whose (seg)Fault Is It?

In this example, we will execute a script in which each thread stores its thread ID number in a shared vector. See the implementation below:

#include <iostream>
#include <omp.h>
#include <vector>

using namespace std;

void parallel_access_demo() {
    vector<int> data;
    int num_threads = omp_get_max_threads();

    // Parallel region for appending elements to the vector
    #pragma omp parallel for
    for (int i = 0; i < num_threads; ++i) {
        // Append the thread ID to the vector
        data.push_back(omp_get_thread_num()); // Potential race condition here
    }
}

int main() {
    // Execute parallel access demo
    parallel_access_demo();
    printf("Code ran successfully. \n");

    return 0;
}

As per the usual, here’s what’s going down:

  • After including the necessary libraries, we add using namespace std on Line 5 so that we can more easily print text without having to type std:: every time we want to write a statement to the output
  • Starting on Line 7, we write a simple parallel_access_demo() function that does the following:
    • Initialize a vector of integers very creatively called data
    • Get the maximum number of threads available per node on the Cube
    • Declare a parallel for region where we iterate through all threads and push_back the thread ID number into the data vector
  • Finally, we execute this function in main() beginning on Line 19.

If you execute this using a similar process shown in Example 1, you should see the following output in your command line:
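For example, compiling and running the file the same way as before:

g++ openmp-demo-access.cpp -o openmp-demo-access -fopenmp
./openmp-demo-access

will most likely end with a message along these lines (the exact wording depends on your system):

Segmentation fault (core dumped)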

Congratulations – you just encountered your first segmentation fault! Not to worry – this is by design. Take a few minutes to review the code above to identify why this might be the case.

Notice Line 15, where each thread is attempting to insert a new integer into the data vector. Since push_back() is being called in a parallel region, multiple threads are attempting to modify data simultaneously. This results in a race condition, where the size of data changes unpredictably. There is a pretty simple fix: add a #pragma omp critical directive before Line 15.

As mentioned in Part 1, #pragma omp critical ensures that each thread executes its task one at a time, and not all at once. It essentially serializes a portion of the parallel for loop and prevents race conditions by ensuring that the size of data changes in a predictable and controlled manner. An alternative would be to declare #pragma omp single in place of #pragma omp parallel for on Line 12. This essentially delegates the following for-loop to a single thread, forcing the loop to be completed in serial. Both of these options will result in the blissful outcome of a clean run that ends with Code ran successfully.
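Here is a minimal sketch of the first fix; only the loop inside parallel_access_demo() changes:

    // Parallel region for appending elements to the vector
    #pragma omp parallel for
    for (int i = 0; i < num_threads; ++i) {
        // Only one thread at a time may modify the shared vector
        #pragma omp critical
        data.push_back(omp_get_thread_num()); // no race condition on the vector anymore
    }

With the critical directive in place, only one thread can grow data at any given moment, so the program terminates normally.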

Summary

In this post, I introduced two examples to demonstrate the use of OpenMP constructs. In Example 1, I demonstrated a parallel printing task to show that threads do not all complete their tasks at the same time. Example 2 showed how the lack of synchronization can lead to segmentation faults due to race conditions, and two methods to remedy this. In Part 3, I will introduce GPU programming using the Python PyTorch library and Google Colaboratory cloud infrastructure. Until then, happy coding!

Legacy Code Reborn: Wrapping C++ into MATLAB Simulink

Have you ever needed to include legacy code in your own projects without rewriting it in your new framework? In Matlab Simulink, there is a way to include C++ code within your system. This post will teach you how to fit legacy C++ code into your Simulink projects.

Let's assume you have been given C++ code in which neural networks are implemented and already trained (this is just an example; the guide extends to any generic class or function). Let's also assume that your goal is to include them within a larger control system. Recreating them from scratch and translating the original code into Matlab would be unwise, as implementation errors are always around the corner. The best solution is to include the neural networks directly as they are, wrapping them in a nice box. How do we proceed?

Requirements
To fully appreciate the following post, you should have a basic understanding of object-oriented programming, C++, and Matlab Simulink.

Disclaimer
The approach I will show you is not the fastest solution you could think of. Indeed, there are easier ways to wrap your C++ code if it satisfies certain limitations declared in the C Function Block documentation. Still, this solution is the most general one, and it always works.
Below, I recall the Matlab Simulink limitations from the official documentation.
The C Function block cannot be used to directly access all C++ classes. These types of classes are not supported:

  • Template classes
  •  Standard Template Library (STL) containers
  •  Classes with private constructors and destructors

If your code falls into one of the above cases, this post should interest you; otherwise, you can follow the more straightforward route.

1. Installing the C++ compiler
To execute C++ code, you need a C++ compiler. If you already have one, skip this section. Otherwise, you have two possibilities; in both cases, you will end up with a functioning C++ compiler on your device:

  1. Installing the standard open-source C++ compiler.
    In this case, I recommend MinGW (even if it is pretty old). You can install it simply by following the software installation guidelines.
  2. Installing MinGW as a Matlab add-on.
    Another (even easier) possibility is installing MinGW directly from Matlab as a standard Matlab toolbox.

To let Matlab know that you will need to compile C++ code, you must run (each time you execute the code) the following command in the Matlab terminal:

mex -setup C++

If the compiler has been correctly installed (in my case through Microsoft Visual Studio 2022), this should print:

MEX configured to use 'Microsoft Visual C++ 2022' for C++ language compilation.

or, if you installed it through the Matlab add-on:

MEX configured to use 'MinGW64 Compiler (C++)' for C++ language compilation. 

2. The C Function Block

Here is where things start to get more challenging.
The starting point is understanding which Simulink block we are going to use to wrap our compiled C++ code. In this case, it is the C Function block.

The block has the following structure:

  • Output code: here is where you will generate your output. The following is a candidate example.
    output = predict(myNN,input1,input2);
    Here we invoke the predict method, passing two inputs as arguments and returning the output (our prediction, hopefully). The special guest is myNN: this is the key to the whole solution, and it represents an instance of the class we are going to wrap.
  • Start code: here, you instantiate the myNN object. It is the equivalent of the constructor in the C++ class.
    myNN = createNN();
  • Terminate code: here is where you destroy your object (equivalent to the destructor).
    deleteNN(myNN);
  • Symbols: here, you declare your input, output, and myNN object. You can specify names (input1, input2, myNN) and data types (int32, double, …). Remark: it is fundamental that the type of the myNN object is VoidPointer, because we are building a wrapper.

To conclude the inspection of the C Function block, we must look at the settings. Here we must specify all the source files (.cpp) and header files (.h) inside the corresponding sections. The files must be in the same folder as the Simulink file, unless you specify the directory in the corresponding field.

An important remark: the settings are common to all the C Function blocks inside the project. Thus, you cannot structure your application assuming that each C Function block has separate and independent source files. For this reason, even if you need to instantiate multiple classes of different natures inside your scheme, in the end you will need one unique wrapper (with corresponding .cpp and .h files).

3. Writing your C++ Wrapper

After exploring the C Function block, we are ready to start writing our wrapper. Let’s assume you have legacy code that implements your neural network in source/header files. What you must do is: 

  1. Write your NN class, whose methods will be invoked by the wrapper. 
  2. Write the wrapper.

3a. Writing the class

We will call the files uniqueNN.cpp/uniqueNN.h.

#include "uniqueNN.h"
#include "ann.h"
// include here all additional dependences

myNN::myNN() {
}

myNN::~myNN() {
    delete ANN;
}

double myNN::predict(double input1, double input2) {
// copy-paste here your legacy code
}

In this class, you can see the constructor, the destructor, and the predict method. Inside that method, you would copy-paste your legacy code; in there, you can do whatever you want. Remark: you don't need to fulfill the limitations of the C Function Block because this class will be invoked through a "protective layer" represented by our wrapper!

Suppose your system has multiple components, each represented by a different network with a different structure. You can still use the very same class and implement a corresponding method for each of your original networks, producing each network's prediction.

Clearly, you must list in the source and header file sections all the files needed to execute your legacy code.

3b. Implementing the wrapper

#include "uniqueNN.h"
// include here all additional standard libraries 

// Neural Networks

void* createNN()
{
    return new myNN();
}

void deleteNN(void* obj)
{
    delete static_cast<myNN*>(obj);
}

double predict(void* obj, double input1, double input2)
{
    myNN* nn = static_cast<myNN*>(obj);
    return nn->predict(input1, input2);
}

Here we have three functions. These are the ones we saw in the C Function block! They will be invoked by Simulink.

As you can see, we again find the constructor, the destructor, and the predict method. For the sake of conciseness, I suggest using the same name for the method implemented in the class and the function implemented in the wrapper, but it is not mandatory. Notice that the wrapper's predict function must take the void pointer as an argument (the class object you are instantiating, that is, the one you declared in the C Function block!).

Remark: as you might have imagined, for each method implemented in the class (each of which, in our example, represents a different legacy neural network), you should create one corresponding function in the wrapper.

A final remark: the wrapper needs only the header of your class (for us, uniqueNN.h), not all the headers from the legacy code, so do not add them; that would be redundant.
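For reference, here is a minimal sketch of what uniqueNN.h could contain, assuming the class and wrapper signatures used above (the member holding the legacy network is left as a commented-out placeholder, since its actual type comes from your legacy headers):

#ifndef UNIQUENN_H
#define UNIQUENN_H

// Wrapped class whose methods contain the legacy code
class myNN {
public:
    myNN();
    ~myNN();
    double predict(double input1, double input2);
private:
    // Pointer to the legacy network object, e.g. the class declared in ann.h
    // LegacyANN* ANN;
};

// Wrapper functions invoked from the C Function block
void* createNN();
void deleteNN(void* obj);
double predict(void* obj, double input1, double input2);

#endif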

Below you can see the quality of the reconstruction of the legacy code through the C++ wrapper. The small differences that you might experience are due to internal data type differences between the legacy code simulator and the new one, and are absolutely negligible.

The figure above shows that the predictions from the neural network are nearly identical for both the wrapper-based Simulink implementation and the original C++ implementation.

That’s it! Thank you for reading and if you have further questions, please feel free to contact me, here is my LinkedIn profile!

MORDM Basics V: WaterPaths Tutorial

The MORDM framework poses four fundamental questions, each of which corresponds to a section within the framework shown in Figure 1:

  1. What actions can we take to address the problem?
  2. What worlds are we implementing those actions in?
  3. What are acceptable levels of performance for those actions in all worlds?
  4. What controls failures across those worlds?
Figure 1: The MORDM framework (Herman et al., 2013).

In the previous blog post, we used state-aware ROF triggers to implement drought mitigation and supply management actions for one hydroclimatic and demand scenario. In the first MORDM blog post, we generated multiple deeply-uncertain synthetic realizations of hydroclimatic and demand scenarios. The next logical question would be: how do these actions fare across all worlds and across time?

Setting up the system

To explore this question, we will expand the scope of our test case to include Cary’s two neighboring utilities – Durham and Raleigh – within the Research Triangle. The three utilities aim to form a cooperative water supply and management agreement in which they would like to optimize the following objectives, as stated in Trindade et al (2019):

  1. Maximize supply reliability, REL
  2. Minimize the frequency of implementing water use restrictions, RT
  3. Minimize the net present value of infrastructure investment, NPV
  4. Minimize the financial cost of drought mitigation and debt repayment, FC
  5. Minimize the worst first-percentile cost of the FC

In this example, each objective is a function of short-term risk of failure triggers (stROF) and long-term risk of failure triggers (ltROF). The stROFs trigger short-term action, typically on a weekly or monthly basis. The ltROFs trigger action on a multi-year, sometimes decadal, timescale. Recall from a previous post that ROF triggers are state-aware, probabilistic decision rules that dictate how a system responds to risk. Here, we optimize for a Pareto-approximate set of ROF triggers (or risk thresholds) that will result in a range of performance objective tradeoffs. An example of an stROF is the restriction ROF trigger we explored in the post prior to this one.

In addition, an example of an ltROF would be the infrastructure construction ltROF. When this ltROF threshold is crossed, an infrastructure project is ‘built’. Potential infrastructure projects are ordered in a development pathway (Zeff et al 2016), and the ltROF triggers the next infrastructure option in the sequence. The term ‘pathway’ is used as these infrastructure sequences are not fixed, but are state-dependent and can be shuffled to allow the optimization process to discover multiple potential pathways.

Over the next two blog posts, we will explore the interaction between the water-use restriction stROF and the infrastructure ltROF, and how this affects system performance. For now, we will simulate the Research Triangle test case and optimize for the ‘best’ set of ROF triggers using WaterPaths and Borg MOEA.

Using WaterPaths and Borg MOEA

We will be using the WaterPaths utility planning and management tool (Trindade et al., 2020) to simulate the performance of the Research Triangle test case. For clarification, the default simulation within WaterPaths is that of the Research Triangle. The folder that you will be downloading from GitHub already contains all the input, uncertainty, and decision variable files required. This tool will be paired with the Borg MOEA (Hadka and Reed, 2013) to optimize the performance objectives in each simulation to obtain a set of Pareto-optimal long- and short-term ROF triggers that result in a non-dominated set of tradeoffs. Jazmin has made a few posts that walk through compiling Borg and installing its different parallel and serial versions, which might be helpful to try out before attempting this exercise.

Once you have Borg downloaded and set up, begin by downloading the GitHub repository into a file location on your machine of choice:

git clone https://github.com/lbl59/WaterPaths.git

Once all the files are downloaded, compile WaterPaths:

make gcc

To optimize the WaterPaths simulation with Borg, first move the Borg files into the main WaterPaths/borg folder:

mv Borg WaterPaths/borg

This command will automatically create a folder called borg within the WaterPaths folder and move all the external Borg files into it.

Next, cd into the borg/ folder and run make in your terminal. This should generate a file called libborgms.a. Make a folder called lib within the WaterPaths folder, and copy this file into the WaterPaths/lib folder:
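Put together, the steps above might look like the following sketch (assuming you start from the main WaterPaths directory), finishing with the copy command below:

cd borg
make
mkdir ../lib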

cp libborgms.a ../lib/

Next, cd back into the main folder and use the Makefile to compile the WaterPaths executable with Borg:

make borg

Great! Now, notice that the /rof_tables_test_problem folder is empty. You will need to generate ROF tables within the WaterPaths environment. To do so, you will run the generate_rof_tables.sh script provided in the GitHub repository. The script should look like this:

#!/bin/bash
#SBATCH -n 16 -N 2 -p normal
#SBATCH --job-name=gen_rof_tables
#SBATCH --output=output/gen_rof_tables.out
#SBATCH --error=error/gen_rof_tables.err
#SBATCH --time=04:00:00
#SBATCH --mail-user=YOURUSERNAME@cornell.edu
#SBATCH --mail-type=all

export OMP_NUM_THREADS=16
cd $SLURM_SUBMIT_DIR
./waterpaths\
  -T ${OMP_NUM_THREADS}\
  -t 2344\
  -r 1000\
  -d /YOURCURRENTDIRECTORY/\
  -C 1\
  -m 0\
  -s sample_solutions.csv\
  -O rof_tables_test_problem/\
  -e 0\
  -U TestFiles/rdm_utilities_test_problem_opt.csv\
  -W TestFiles/rdm_water_sources_test_problem_opt.csv\
  -P TestFiles/rdm_dmp_test_problem_opt.csv\
  -p false

Replace all the ‘YOUR…’ parameters with your system-specific details. Make two new folders: output/ and error/. Then run the script above by entering

sbatch ./generate_rof_tables.sh

into your terminal. This should take approximately 50 minutes. Once the ROF tables have been generated, run the optimize_sedento_valley.sh script provided. It should look like this:

#!/bin/bash
#SBATCH -n 16 -N 3 -p normal
#SBATCH --job-name=sedento_valley_optimization
#SBATCH --output=output/sedento_valley_optimization.out
#SBATCH --error=error/sedento_valley_optimization.err
#SBATCH --time=04:00:00
#SBATCH --mail-user=YOURUSERNAME@cornell.edu
#SBATCH --mail-type=all

DATA_DIR=/YOURDIRECTORYPATH/
N_REALIZATIONS=1000
export OMP_NUM_THREADS=16
cd $SLURM_SUBMIT_DIR

mpirun ./waterpaths -T ${OMP_NUM_THREADS}\
  -t 2344 -r ${N_REALIZATIONS} -d ${DATA_DIR}\
  -C -1 -O rof_tables_test_problem/ -e 3\
  -U TestFiles/rdm_utilities_test_problem_opt.csv\
  -W TestFiles/rdm_water_sources_test_problem_opt.csv\
  -P TestFiles/rdm_dmp_test_problem_opt.csv\
  -b true -o 200 -n 1000

As usual, replace all the ‘YOUR…’ parameters with your system-specific details. Run this script by entering

sbatch ./optimize_sedento_valley.sh

into the terminal. This script runs the Borg MOEA optimization for 1,000 function evaluations, and will output a .set file every 200 function evaluations. At the end of the run, you should have two files within your main folder:

  1. NC_output_MS_S3_N1000.set contains the Pareto-optimal set of decision variables and the performance objective values for the individual utilities and the overall region.
  2. NC_runtime_MS_S3_N1000.runtime contains information on the time it took for 1000 simulations of the optimization of the Research Triangle to complete.

The process should take approximately 1 hour and 40 minutes.

Summary

Congratulations, you are officially the Dr Strange of the Research Triangle! You have successfully downloaded WaterPaths and Borg MOEA, as well as run a simulation-optimization of the North Carolina Research Triangle test case across 1000 possible futures, in which you were Pareto-optimal in more than one. You obtained the .set files containing the Pareto-optimal decision variables and their respective performance objective values. Now that we have optimized the performance of the Research Triangle system, we are ready to examine the performance objectives and the Pareto-optimal ROF trigger values that result in this optimal set of tradeoffs.

In the next blog post, we will process the output of the .set files to visualize the objective space, decision variable space, and the tradeoff space. We will also conduct robustness analysis on the Pareto-optimal set to delve further into the question of “What are acceptable levels of performance for those actions in all worlds?”. Finally, we will explore the temporal interactions between the water use restrictions stROF and the infrastructure construction ltROF, and how supply and demand are both impacted by – and have an effect on – these decision variables.

References

Hadka, D., & Reed, P. (2013). Borg: An auto-adaptive Many-objective evolutionary computing framework. Evolutionary Computation, 21(2), 231–259. https://doi.org/10.1162/evco_a_00075

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772. https://doi.org/10.1016/j.envsoft.2020.104772

Trindade, B. C., Reed, P. M., & Characklis, G. W. (2019). Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management. Advances in Water Resources, 134, 103442. https://doi.org/10.1016/j.advwatres.2019.103442

Zeff, H. B., Herman, J. D., Reed, P. M., & Characklis, G. W. (2016). Cooperative drought adaptation: Integrating infrastructure development, conservation, and water transfers into adaptive policy pathways. Water Resources Research, 52(9), 7327–7346. https://doi.org/10.1002/2016wr018771

Root finding in MATLAB, R, Python and C++

In dynamical systems, we are often interested in finding stable points, or equilibria. Some systems have multiple equilibria. As an example, take the lake problem, which is modeled by the equation below, where X_t is the lake P concentration, a_t are the anthropogenic P inputs, Y_t ~ LN(μ,σ²) are random natural P inputs, b is the P loss rate, and q is a shape parameter controlling the rate of P recycling from the sediment. The first three terms on the right hand side make up the "Inputs," while the last term represents the "Outputs." A lake is in equilibrium when the inputs are equal to the outputs and the lake P concentration therefore is not changing over time.

X_{t+1} - X_t = a_t + Y_t + \frac{X^q_t}{1+X^q_t} - bX_t

For irreversible lakes this occurs at three locations, even in the absence of anthropogenic and natural inputs: an oligotrophic equilibrium, an unstable equilibrium (called the critical P threshold) and a eutrophic equilibrium (see figure below).

PcritDiagram

The unstable equilibrium in this case is called the critical P threshold because once it is crossed, it is impossible to return to an oligotrophic equilibrium by reducing anthropogenic and natural P inputs alone. In irreversible lakes like this, we would therefore like to keep the lake P concentration below the critical P threshold. How do we find the critical P threshold? With a root finding algorithm!

As stated earlier, the system above will be in equilibrium when the inputs are equal to the outputs and the P concentration is not changing over time, i.e. when

X_{t+1} - X_t = \frac{X^q_t}{1+X^q_t} - bX_t = 0

Therefore we simply need to find the zero, or “root” of the above equation.  Most of the methods for this require either an initial estimate or upper and lower bounds on the location of the root. These are important, since an irreversible lake will have three roots. If we are only interested in the critical P threshold, we have to make sure that we provide an estimate which leads to the unstable equilibrium, not either of the stable equilibria. If possible, you should plot the function whose root you are finding to make sure you are giving a good initial estimate or bounds, and check afterward to ensure the root that was found is the one you want! Here are several examples of root-finding methods in different programming languages.

In MATLAB, roots can be found with the function fzero(fun,x0) where ‘fun’ is the function whose root you want to find, and x0 is an initial estimate. This function uses Brent’s method, which combines several root-finding methods: bisection, secant, and inverse quadratic interpolation. Below is an example using the lake problem.

myfun = @(x,b,q) x^q/(1+x^q)-b*x;
b = 0.42;
q = 2.0;
fun = @(x) myfun(x,b,q);
pcrit = fzero(fun,0.75);

This returns pcrit = 0.5445, which is correct. If we had provided an initial estimate of 0.25 instead of 0.75, we would get pcrit = 2.6617E-19, basically 0, which is the oligotrophic equilibrium in the absence of anthropogenic and natural P inputs. If we had used 1.5 as an initial estimate, we would get pcrit = 1.8364, the eutrophic equilibrium.

MatlabScreenShot

In R, roots can be found with the function uniroot, which also uses Brent’s method. Dave uses this on line 10 of the function lake.eval in his OpenMORDM example. Instead of taking in an initial estimate of the root, this function takes in a lower and upper bound. This is safer, as you at least know that the root estimate will lie within these bounds. Providing an initial estimate that is close to the true value should do well, but is less predictable; the root finding algorithm may head in the opposite direction from what is desired.

b <- 0.42
q <- 2.0
pcrit <- uniroot(function(x) x^q/(1+x^q) - b*x, c(0.01, 1.5))$root

This returns pcrit = 0.5445145. Good, we got the same answer as we did with MATLAB! If we had used bounds of c(0.75, 2.0) we would have gotten 1.836426, the eutrophic equilibrium.

What if we had given bounds that included both of these equilibria, say c(0.5, 2.0)? In that case, R returns an error: ‘f() values at end points not of opposite sign’. That is, if the value returned by f(x) is greater than 0 for the lower bound, it must be less than 0 for the upper bound and vice versa. In this case both f(0.5) and f(2.0) are less than 0, so the algorithm fails. What if we gave bounds for which one value of f is greater than 0 and the other less, but within which there are multiple roots, say c(-0.5,2.0)? Then R just reports the first one it finds, in this case pcrit = 1.836437, the eutrophic equilibrium. So it’s important to make sure you pick narrow enough bounds that include the root you want, but not roots you don’t!

RscreenShot

In Python, you can use either scipy.optimize.root or scipy.optimize.brentq, which is what Jon uses on line 14 here. scipy.optimize.root can be used with several different algorithms, but the default is Powell’s hybrid method, also called Powell’s dogleg method. This function only requires an initial estimate of the root.

from scipy.optimize import root
b = 0.42
q = 2.0
pcrit = root(lambda x: x**q/(1+x**q) - b*x, 0.75)

scipy.optimize.root returns an object with several attributes. The attribute of interest to us is the root, represented by x, so we want pcrit.x. In this case, we get the correct value of 0.54454. You can play around with initial estimates to see how pcrit.x changes.

PythonScreenShot1

Not surprisingly, scipy.optimize.brentq uses Brent’s method and requires bounds as an input.

from scipy.optimize import brentq as root
b = 0.42
q = 2.0
pcrit = root(lambda x: x**q/(1+x**q) - b*x, 0.01, 1.5)

This just returns the root itself, pcrit = 0.5445. Again, you can play around with the bounds to see how this estimate changes.

PythonScreenShot2

In C++, Dave again shows how this can be done in the file ‘main-lake.cpp’ provided in the Supplementary Material to OpenMORDM linked from this page under the “Publications” section. On lines 165-168 he uses the bisect tool to find the root of the function given on lines 112-114. I’ve copied the relevant sections of his code into the file ‘find_Pcrit.cpp’ below.


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <iostream>
#include <boost/math/tools/roots.hpp>

namespace tools = boost::math::tools;
using namespace std;

double b, q, pcrit;

double root_function(double x) {
  return pow(x,q)/(1+pow(x,q)) - b*x;
}

bool root_termination(double min, double max) {
  return fabs(max - min) <= 0.000001;
}

int main(int argc, char* argv[])
{
  b = 0.42;
  q = 2.0;

  std::pair<double, double> result = tools::bisect(root_function, 0.01, 1.0, root_termination);
  pcrit = (result.first + result.second)/2;
  cout << pcrit << endl;
}

This yields the desired root of pcrit = 0.54454, but of course, changing the bounds may result in different estimates. In case you missed it, the take home message is to be careful about your initial estimate and bounds ;).

CPPscreenShot

 

Getting started with C and C++

I’ve been learning C and C++ recently and I thought I’d share my experience learning these languages through a post. Prior to learning C and C++, I had experience in Python and Matlab, but this was my first foray into lower level languages. In my attempts to learn each language I made my way through several courses and books available online; some I found very helpful, others not so much. In this post I’ll detail my experiences with each resource and provide some tips that may help those planning on learning these languages.

Main takeaways

To learn the languages, I used four main resources: online courses from Lynda.com, a book titled Learn C the Hard Way, a book commonly known as K&R 2, and tutorials from cplusplus.com. For those who do not have the time or desire to read this whole post, I found the following resources to be the most useful:

For C:

Learning C the Hard Way

K&R 2

For C++:

Foundations of Programming: Object-Oriented Design (you need a lynda.com login to access)

Up and Running with C++ (again, you need a lynda.com login to access)

cplusplus.com

Everyone’s learning style is different, but I found that I learned the languages much faster by coding examples myself, rather than watching someone walk through a script. I also found that courses that taught straight from the command line were more effective than courses that taught through an IDE. When using an IDE, I often found myself spending more time working through glitches or nuances within the IDE than learning the languages themselves.

I’ll detail my experiences with each resource below. I’ll start with C resources, then discuss C++.

Resources for Learning C

C Essential Training – From Lynda.com

I started my training by taking a course on Lynda.com titled "C Essential Training". Lynda.com is an online educational website with thousands of videos, many of which focus on programming. The service is free to Cornell students and graduate students (though I checked and unfortunately neither PSU, UC Davis nor CU Boulder have agreements with the site). I found the course to be well structured and I felt that the instructor presented the material clearly and understandably. Despite this, I do not feel that the course did an effective job teaching me C. The main problem I had with the course was its reliance on the Eclipse IDE. Eclipse seems like a fine IDE, but I don't plan to use an IDE when writing code and I didn't want to spend the time taking a separate course to learn its intricacies (though Lynda.com does have a full course devoted to Eclipse). Throughout the course, I kept finding myself having small Eclipse problems (e.g. not being able to change the project I was working on or having compiler errors) that were not hard to solve, but were never discussed in the lectures. I was able to solve each problem by doing some research online, but each little problem took me time to resolve and was mentally taxing. After spending 30 minutes looking up an Eclipse fix, I was not in the mood to go troubleshooting interesting C questions. Another problem with using Eclipse is that the user is never forced to write their own makefiles, an omission that seems like it could really hurt someone who plans to run C programs through the command line. In summary, I would not recommend taking this course unless you are either thoroughly versed in Eclipse or plan to write all of your code through Eclipse.

Learning C the Hard Way

The next resource I used to learn C was a book that Jazmin pointed me to called Learning C the Hard Way by Zed A. Shaw (after some poking around I found this had been mentioned previously on this blog). The book is laid out as a tutorial, where each chapter teaches a new C concept (it's really a C course in book form). The author takes a slightly nontraditional teaching approach in that he makes you write the code first, then explains in detail what you just wrote. I found this hands-on teaching method extremely helpful. When I wrote the code myself, I was forced to focus on every detail of the code (something that is very important in a language like C). I was also able to learn which concepts were genuinely challenging for me and which concepts I needed more work on. When I watched the Lynda.com lectures, I'd often believe I understood a concept, only to find out later that I had misinterpreted the instructor's lesson.

The book does not use an IDE; rather, you write code in a text editor (I used Sublime Text) and run it on the Unix command line. The author provides a succinct introduction to makefiles and how to use them, which was refreshing after the Eclipse-based course that never mentioned makefiles or compilers.

Overall I found the teaching method employed by the book to be very effective, and I would highly recommend it to new C users. I should note, however, that there seems to be some controversy surrounding the book. If you google "Learning C the hard way" you'll find some very heated exchanges between the author and a blogger who criticized the book's teaching methodology. The blogger had two main criticisms of the book: first, that it oversimplified and inaccurately presented key C concepts, and second, that the author failed to teach accepted programming standards for the C language. Mr. Shaw's rebuttal was that the book's purpose is to get people comfortable with C and begin actually coding with it, then, once they are literate, have them go back and learn more about the language's nuances. I personally agree with Mr. Shaw on this point, though I don't have a background in computer science so my opinion is only that of a beginner. Many of the criticisms of the book seemed to come from the perspective of an advanced coder who is unable to see the language through the eyes of a beginner. Mr. Shaw's explanations might be oversimplified, but they do a good job demystifying many of the most foreign aspects of C. I think that use of this book should be supplemented with other sources, especially documents on accepted C coding standards, but if you're looking for a quick way to get on your feet with C and gain some confidence, then the book is a great resource.

I used a free beta version of the book which can be found here: http://c.learncodethehardway.org/book/ but you can also purchase the book from the author here: https://www.amazon.com/Learn-Hard-Way-Practical-Computational/dp/0321884922

I found the beta version to be just fine, but there were some minor errors and some sections were clearly under construction.

The blog post criticizing the book can be found here: http://hentenaar.com/dont-learn-c-the-wrong-way

K&R 2

A resource that I discovered through reading the exchanges between Shaw and his critics was "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie (commonly referred to as K&R 2, which is what I'll call it for the rest of the post). One of the authors of this book, Dennis Ritchie, actually created the C language, and this book is regarded as the go-to authority on all matters C. Mr. Shaw devoted a whole chapter of "Learning C the Hard Way" to bashing this book, but I found its layout and explanations quite accessible and useful. I did not find the tutorials as direct as those in "Learning C the Hard Way", but I found it to be a helpful supplement.

 

Resources for Learning C++

Foundations of Programming: Object-Oriented Design – From Lynda.com

A main difference between C and C++ is that C++ is an object oriented language. I had some very basic experience in using object oriented programming, but was looking for a refresher before learning C++. “Foundations of Programming: Object-Oriented Design” was an excellent course that taught me all I wanted to know about object-oriented programming and more. The course is purely conceptual and does not teach any actual code or contain any practice problems. It presents the concepts in a simple yet comprehensive manner that I found very helpful. I would highly recommend this course to anyone hoping to learn or brush up their knowledge of how object-oriented programming works.

Up and Running with C++ – From Lynda.com

This course was very similar in layout to the C course from Lynda.com, and I have the same criticisms. The entire course used Eclipse, and I kept having minor problems that were never addressed by the lectures but prevented me from executing my code. I did feel like I was able to learn the basic tools I needed from the lectures, but I would have gotten much more out of the class if it had been taught through the command line. I also felt that the course was sparse on exercises and heavy on lectures. I found that I got much less out of watching the instructor write code than being forced to write out the code myself (as Learning C the Hard Way forces you to do).

cplusplus.com

This resource is linked often in older posts on this blog, and I found it helpful in answering C++ questions I had after finishing the Lynda.com courses. I did not find that the tutorial had the most helpful narration of how one might learn C++ from scratch, but it has very succinct definitions of many C++ components and was helpful as a reference. I think this site is the one I will look to most when troubleshooting future C++ code.

Final thoughts

I'd like to note that I found WRASEMAN's post on makefiles a few weeks back to be quite helpful. From my limited experience, ensuring that your code compiles correctly can be one of the most challenging parts of using a lower-level language, and the post has some excellent resources that explain what makefiles are and how they can be used.

I know there are a lot of contributors and readers of this blog who are much more versed in C and C++ than I am, so if you’d like to chime in on useful resources, please do so in the comments.

 

Using HDF5/zlib Compression in NetCDF4

Not too long ago, I posted an entry on writing NetCDF files in C and loading them in R.  In that post, I mentioned that the latest and greatest version of NetCDF includes HDF5/zlib compression, but I didn’t say much more beyond that.  In this post, I’ll explain briefly how to use this compression feature in your NetCDF4 files.

Disclaimer: I’m not an expert in any sense on the details of compression algorithms.  For more details on how HDF5/zlib compression is integrated into NetCDF, check out the NetCDF Documentation.  Also, I’ll be assuming that the NetCDF4 library was compiled on your machine to enable HDF5/zlib compression.  Details on building and installing NetCDF from source code can be found in the documentation too.

I will be using code similar to what was in my previous post.  The code generates three variables (x, y, z) each with 3 dimensions.  I’ve increased the size of the dimensions by an order of magnitude to better accentuate the compression capabilities.

  // Loop control variables
  int i, j, k;
  
  // Define the dimension sizes for
  // the example data.
  int dim1_size = 100;
  int dim2_size = 50;
  int dim3_size = 200;
  
  // Define the number of dimensions
  int ndims = 3;
  
  // Allocate the 3D vectors of example data
  float x[dim1_size][dim2_size][dim3_size]; 
  float y[dim1_size][dim2_size][dim3_size];
  float z[dim1_size][dim2_size][dim3_size];
  
  // Generate some example data
  for(i = 0; i < dim1_size; i++) {
    for(j = 0; j < dim2_size; j++) {
      for(k = 0; k < dim3_size; k++) {
        x[i][j][k] = (i+j+k) * 0.2;
        y[i][j][k] = (i+j+k) * 1.7;
        z[i][j][k] = (i+j+k) * 2.4;
      }
    }
  }

Next is to setup the various IDs, create the NetCDF file, and apply the dimensions to the NetCDF file.  This has not changed since the last post.

  // Allocate space for netCDF dimension ids
  int dim1id, dim2id, dim3id;
  
  // Allocate space for the netcdf file id
  int ncid;
  
  // Allocate space for the data variable ids
  int xid, yid, zid;
  
  // Setup the netcdf file
  int retval;
  if((retval = nc_create(ncfile, NC_NETCDF4, &ncid))) { ncError(retval); }
  
  // Define the dimensions in the netcdf file
  if((retval = nc_def_dim(ncid, "dim1_size", dim1_size, &dim1id))) { ncError(retval); }
  if((retval = nc_def_dim(ncid, "dim2_size", dim2_size, &dim2id))) { ncError(retval); }
  if((retval = nc_def_dim(ncid, "dim3_size", dim3_size, &dim3id))) { ncError(retval); }
  
  // Gather the dimids into an array for defining variables in the netcdf file
  int dimids[ndims];
  dimids[0] = dim1id;
  dimids[1] = dim2id;
  dimids[2] = dim3id;

Here’s where the magic happens.  The next step is to define the variables in the NetCDF file.  The variables must be defined in the file before you tag it for compression.

  // Define the netcdf variables
  if((retval = nc_def_var(ncid, "x", NC_FLOAT, ndims, dimids, &xid))) { ncError(retval); }
  if((retval = nc_def_var(ncid, "y", NC_FLOAT, ndims, dimids, &yid))) { ncError(retval); }
  if((retval = nc_def_var(ncid, "z", NC_FLOAT, ndims, dimids, &zid))) { ncError(retval); }

Now that we’ve defined the variables in the NetCDF file, let’s tag them for compression.

  // OPTIONAL: Compress the variables
  int shuffle = 1;
  int deflate = 1;
  int deflate_level = 4;
  if((retval = nc_def_var_deflate(ncid, xid, shuffle, deflate, deflate_level))) { ncError(retval); }
  if((retval = nc_def_var_deflate(ncid, yid, shuffle, deflate, deflate_level))) { ncError(retval); }
  if((retval = nc_def_var_deflate(ncid, zid, shuffle, deflate, deflate_level))) { ncError(retval); }

The function nc_def_var_deflate() performs this.  It takes the following parameters:

  • int ncid – The NetCDF file ID returned from the nc_create() function
  • int varid – The variable ID associated with the variable you would like to compress.  This is returned from the nc_def_var() function
  • int shuffle – Enables the shuffle filter before compression.  Any non-zero integer enables the filter.  Zero disables the filter.  The shuffle filter rearranges the byte order in the data stream to enable more efficient compression. See this performance evaluation from the HDF group on integrating a shuffle filter into the HDF5 algorithm.
  • int deflate – Enable compression at the compression level indicated in the deflate_level parameter.  Any non-zero integer enables compression.
  • int deflate_level – The level to which the data should be compressed.  Levels are integers in the range [0-9].  Zero results in no compression whereas nine results in maximum compression.

The rest of the code doesn’t change from the previous post.

  // OPTIONAL: Give these variables units
  if((retval = nc_put_att_text(ncid, xid, "units", 2, "cm"))) { ncError(retval); }
  if((retval = nc_put_att_text(ncid, yid, "units", 4, "degC"))) { ncError(retval); }
  if((retval = nc_put_att_text(ncid, zid, "units", 1, "s"))) { ncError(retval); }
  
  // End "Metadata" mode
  if((retval = nc_enddef(ncid))) { ncError(retval); }
  
  // Write the data to the file
  if((retval = nc_put_var(ncid, xid, &x[0][0][0]))) { ncError(retval); }
  if((retval = nc_put_var(ncid, yid, &y[0][0][0]))) { ncError(retval); }
  if((retval = nc_put_var(ncid, zid, &z[0][0][0]))) { ncError(retval); }
  
  // Close the netcdf file
  if((retval = nc_close(ncid))) { ncError(retval); }

So the question now is whether or not it’s worth compressing your data.  I performed a simple experiment with the code presented here and the resulting NetCDF files:

  1. Generate the example NetCDF file from the code above using each of the available compression levels.
  2. Time how long the code takes to generate the file.
  3. Note the final file size of the NetCDF.
  4. Time how long it takes to load and extract data from the compressed NetCDF file.

Below is a figure illustrating the results of the experiment (points 1-3).

compress_plot

Before I say anything about these results, note that individual results may vary.  I used a highly stylized data set to produce the NetCDF file which likely benefits greatly from the shuffle filtering and compression.  These results show a compression of 97% – 99% of the original file size.  While the run time did increase, it barely made a difference until hitting the highest compression levels (8,9).  As for point 4, there was only a small difference in load/read times (0.2 seconds) between the uncompressed and any of the compressed files (using ncdump and the ncdf4 package in R).  There’s no noticeable difference among the load/read times for any of the compressed NetCDF files.  Again, this could be a result of the highly stylized data set used as an example in this post.

For something more practical, I can only offer anecdotal evidence about the compression performance.  I recently included compression in my current project due to the large possible number of multiobjective solutions and states-of-the-world (SOW).  The uncompressed file my code produced was on the order of 17.5 GB (for 300 time steps, 1000 SOW, and about 3000 solutions).  I enabled compression of all variables (11 variables – 5 with three dimensions and 6 with two dimensions – compression level 4).  The next run produced just over 7000 solutions, but the compressed file size was 9.3 GB.  The down side is that it took nearly 45 minutes to produce the compressed file, as opposed to 10 minutes with the previous run.  There are many things that can factor into these differences that I did not control for, but the results are promising…if you’ve got the computer time.

I hope you found this post useful in some fashion.  I’ve been told that compression performance can be increased if you also “chunk” your data properly.  I’m not too familiar with chunking data for writing in NetCDF files…perhaps someone more clever than I can write about this?

Acknowledgement:  I would like to acknowledge Jared Oyler for his insight and helpful advice on some of the more intricate aspects of the NetCDF library.

Setting up Eclipse for C/C++

IDEs are tools that make code development a lot easier, especially if your project has multiple files, classes, and functions. However, setting up the IDE can sometimes be as painful as developing complex codes without one. This post presents a short tutorial on how to install and configure Eclipse for C/C++ on Windows 7 in a (hopefully) fairly painless manner. This tutorial is sequenced as follows:

  1. Installation
    1. Downloading the Java Runtime Environment.
    2. Downloading the GCC compiler.
    3. Downloading Eclipse.
  2. First steps with Eclipse
    1. Setting up a template (optional)
    2. Creating a new project
    3. Including libraries in your project

INSTALLATION

Downloading the Java Runtime Environment

To check if you have the Java Runtime Environment installed, go to java.com with either Internet Explorer or Firefox (Chrome will block the plugin) and click on “Do I have Java?”. Accept running all the plugins and, if the website tells you that you do not have Java, you will have to download and install it from the link displayed on the website.

Downloading the GCC compiler

After the check is done, you will have to download the GCC compiler, which can be done from http://www.equation.com. On the side menu, there will be a link to Programming Tools, which, once expanded, shows a link to Fortran, C, C++. Click on this link and download the right GCC version for your system (32/64 bit), as shown in the following screenshot.

DownloadGCC

After downloading it, double-click on the executable, accept the license, and type “c:\MinGW” as the installation directory. This is important because this is the first folder where Eclipse will look for the compiler on your computer. Proceed with the installation.

Downloading Eclipse

Now it is time to download and install Eclipse. Go to the Eclipse download website and download Eclipse IDE for C/C++ Developers. Be sure to select the right option for your computer (Windows, 32-bit/64-bit), otherwise Eclipse may not install, and even if it does, it will not run once installed. If unsure about which version you should download, this information can be found at Control Panel -> System by looking at System type.

Download

After downloading it, extract the file contents to “C:\Program Files\eclipse” (“Program Files (x86)” if installing the 32-bit version) so that everything is organized. Note that for this you will need to start WinRAR or any other file compression program with administrative privileges. This can be done by right-clicking the name of the program on the start menu and clicking on Run as Administrator.

Now, go to C:\Program Files\eclipse and double click on eclipse.exe to open eclipse. In case you get an error message saying, among other things:

Java was started but returned exit code=13
...
...
-os win32
-ws win32
...

then delete the whole eclipse folder, go back to the Eclipse download page, download the 32-bit version of Eclipse, and extract it as previously described. You should not see the same error again when trying to run eclipse.exe.

Now that Eclipse is up and running, it is time to use it.

FIRST STEPS WITH ECLIPSE

The first thing eclipse will do is ask you to choose a workspace folder. This is the folder where all your code projects will be stored. It should not matter too much which folder you choose, so using the default is probably a good idea.

Setting up templates (optional)

It is helpful to create a code template in order to avoid retyping the same standard piece of code every time you create a new file or project. Many scientific codes have similar imports (such as math.h and stdio.h) and all of them must have a main function (as must any C++ program). If we create a code template with a few common imports and the int main function, we can just tell Eclipse when creating a new project to add these to a new .cpp file.

In order to create the mentioned template, go to Window -> Preferences. There, under C/C++ -> Code Style on the left panel, click on Code Templates. Under Configure generated code and comments, expand Files -> C++ Source File, and then click on New. Choose a meaningful name for your template (I chose “Cpp with main”) and type a short description. After that, copy and paste the template below under “Pattern”.

/*
File: ${file_name}

Author: ${user}
Date: ${date}
*/

#include <iostream>
#include <string>
#include <math.h>
#include <stdio.h>
#include <string.h>

using namespace std;

int main()
{
    // Your code here.

    return 0;
}

Note that ${file_name}, ${date}, and ${user} are variables, which means that they will be replaced by your file’s actual data. To see a list of the other variables that can be inserted in your template, click on Insert Variable…. Click Ok and Ok again and your template will be ready to be used!

Configuring_template

Creating a new project

Click on File -> New -> C++ Project. Under Project type choose Empty Project, then under Toolchains choose MinGW GCC, and, finally, type “project1” as your project name and click on Finish.

New_project

After your project is created, click on File -> New -> Source File. Type “say_something.cpp” (no quotes and do not forget the .cpp after the file name) as the name of your source file and choose the template you created as the template. The window should then look like this:

New_file

Click on Finish. If you used the template, replace the comment “// Your code here.” with “cout << “Yay, it worked!” << endl;”. Your code should look like the snippet below. If you have not created the template, just type the following code into your file.

/*
File: say_something.cpp

Author: bct52
Date: Jun 26, 2015
*/

#include <iostream>
#include <string>
#include <math.h>
#include <stdio.h>
#include <string.h>

using namespace std;

int main()
{
    cout << "Yay, it worked!" << endl;

    return 0;
}

Now, build the code by clicking on the small hammer above the code window and, after the project is built, click on the run button (green circle with a white play sign in the center). If everything went well, your window should look like the screenshot below, which means your code compiled and runs as expected.

Project1_run

Including libraries in your project

When developing code, oftentimes other people have already developed pieces of code that perform some of the intermediate steps we want our code to perform. These pieces of code are often publicly available in the form of libraries. Therefore, instead of reinventing the wheel, it may be better to simply use a library.

Some libraries are comprised of one or a few files only, and can be included in a project simply by dragging the file into the Eclipse project. Others, however, are more complex and should be installed on the computer and then called from the code. The procedure for the latter case will be described here, as it is the most general case. The process of installing and using the Boost library with MinGW (GCC) will be used here as a case study.

The first step is downloading the library. Download the Boost library from here and extract it anywhere on your computer, say in C:\Users\my_username\Downloads (it really doesn’t matter where, because these files will not be used after installation is complete).

Now it is time to install it. For this:

    1. Hold the Windows key and press R, type “cmd”, and press Enter.
    2. On the command prompt, type “cd C:\Users\bct52\Downloads\boost_1_58_0” (or the directory where you extracted Boost) and press Enter.
    3. There should be a file called bootstrap.bat in this folder. If that is the case, run the command:
      bootstrap.bat mingw
    4. In order to compile Boost to be used with MinGW, compile Boost with the gcc toolset. You will have to choose an installation directory for Boost, which WILL NOT be the same directory where you extracted the files earlier. In my case, I used C:\boost. For this, run the command:
      b2 install --prefix=C:\boost toolset=gcc

      Now go read a book or work on something else because this will take a while.
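
Assuming the same C:\boost prefix used in the b2 command above, the following commands at the command prompt should list the installed headers and compiled library files once b2 finishes:

dir C:\boost\include
dir C:\boost\lib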

Now, if the installation finished with only warnings, it is time to run a code example from Boost’s website that, of course, uses the Boost library. Create a new project called “reveillon” and add a source file to it called “days_between_new_years.cpp” following the steps from the “Creating a new project” section. There is no need to use the template this time.

You should now have a blank source file in front of you. If not, delete any text/comments/codes in the file so that the file is blank. Now, copy and paste the following code, from Boost’s example, into your file.

 /* Provides a simple example of using a date_generator, and simple
   * mathematical operations, to calculate the days since
   * New Years day of this year, and days until next New Years day.
   *
   * Expected results:
   * Adding together both durations will produce 366 (365 in a leap year).
   */
  #include <iostream>
  #include "boost/date_time/gregorian/gregorian.hpp"

  int
  main()
  {
    
    using namespace boost::gregorian;

    date today = day_clock::local_day();
    partial_date new_years_day(1,Jan);
    //Subtract two dates to get a duration
    days days_since_year_start = today - new_years_day.get_date(today.year());
    std::cout << "Days since Jan 1: " << days_since_year_start.days()
              << std::endl;
    
    days days_until_year_start = new_years_day.get_date(today.year()+1) - today;
    std::cout << "Days until next Jan 1: " << days_until_year_start.days()
              << std::endl;
    return 0;
  }

Note that line 9 (“#include "boost/date_time/gregorian/gregorian.hpp"”) is what tells the compiler which part of Boost your code uses. Line 15 (“using namespace boost::gregorian;”) saves you from having to type boost::gregorian:: every time you want to use one of its functions.
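
For example, without the using directive the first call inside main() would have to be fully qualified. A minimal sketch of what that looks like:

#include "boost/date_time/gregorian/gregorian.hpp"

int main()
{
    // Same call as in the example above, but with fully qualified names
    // instead of "using namespace boost::gregorian;"
    boost::gregorian::date today = boost::gregorian::day_clock::local_day();
    return 0;
}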

However, the project will still not compile in Eclipse because Eclipse still does not know where to look for the Boost library. This will require a couple of simple steps:

  1. Right click on the project (reveillon) under the Project Explorer side window, then click on Properties. Under C/C++ Build -> Settings, click on Includes under GCC C++ Compiler. On the right there should be two blank boxes, the top one called Include paths (-I) and the other called Include files (-include). Under Include paths (the top one), add the path “C:\boost\include\boost-1_58” (note that this path must reflect where you installed Boost as well as which version of Boost you have). This is where the compiler will look for the header file specified in the code with the #include statement.
  2. The compiled library files themselves must be included through the linker. This step is necessary only if you are using a compiled library. For this, on the same window, click on Libraries under MinGW C++ Linker. Add the path to the Boost libraries folder to the Library search path (-L) (bottom box); this path will be “C:\boost\lib” (again, if you installed Boost in a different folder your path will be slightly different). Now the actual compiled library must be added to the Libraries (-l) (top box). First, we need to figure out the name of the compiled library file used in the code. In this case, it is the file “libboost_date_time-mgw51-mt-d-1_58.a”. Therefore, add boost_date_time-mgw51-mt-d-1_58 (no lib prefix, no .a suffix, and be sure to match the name of your file) to Libraries (-l). Click Ok and Ok again. The command-line equivalent of these two settings is sketched right after this list.
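
For reference, the two settings above roughly correspond to the following g++ command, using the same paths and library name as above (adjust them to match your own installation):

g++ days_between_new_years.cpp -I"C:\boost\include\boost-1_58" -L"C:\boost\lib" -lboost_date_time-mgw51-mt-d-1_58 -o reveillon

Eclipse assembles a command much like this one behind the scenes from the project settings.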

Now compile the code by clicking on the hammer button and run the code by clicking on the play button. Below is a screenshot reflecting both steps above as well as the expected output after running the program.

[Screenshot: library configuration and expected program output]

That’s it. After your model is in good shape and it is time to run it with Borg (or another optimization algorithm), just change your “int main()” to a function with your model’s name and the arguments Borg expects, add the standard Borg main, and change the makefile accordingly. Details on how to do all this for Borg will be explained in a future post.
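
As a rough sketch of what that conversion means (my_model and its argument names here are placeholders, not the actual Borg interface, which will be covered in that future post), the idea is that main() becomes a function the optimization algorithm can call repeatedly:

// Hedged sketch only: my_model, vars, objs, and consts are placeholder names.
// The algorithm passes in a set of decision variables, and the function fills in
// the corresponding objective (and constraint) values.
void my_model(double* vars, double* objs, double* consts)
{
    // evaluate your model using vars and store the results in objs (and consts, if any)
}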

Using a local copy of Boost on your cluster account

Boost is a set of libraries for C++.  It increases the language’s functionality, allowing you to do all sorts of interesting things (for example it has lots of random number generators).  Boost may already be installed on your local research computing cluster.  But there are several reasons why it may be a good idea to have your own copy of Boost to use within your user account:

  • It may be difficult or impossible to find where the cluster’s copy of Boost is actually installed.
  • New Boost functions are introduced with each release.  So what if you want to use a function that came out in a later version (e.g., 1.56) than the one installed on the cluster?
  • Perhaps you want to be able to see the source code of the Boost functions within your own account, to better understand how they work.

If so, it’s easy enough to download Boost to your local computer and then upload the files to your user account.  Click “current release” on the main Boost website (see the link above) and download the files to your computer.  If you’re on a Windows machine, use a program like 7-zip to unpack all the files (or simply keep the tgz file and unpack it on the cluster; that’s probably faster anyway).  Then, upload the Boost files to the cluster.  I recommend placing the boost_1_56_0 folder inside the /lib/ folder in your home directory, so that all your libraries are in one place.
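
If you keep the tgz file and unpack it on the cluster, something like the following should work (the archive name here assumes Boost 1.56.0 and the ~/lib folder suggested above):

cd ~/lib
tar -xzf boost_1_56_0.tar.gz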

Here’s the important part: any time you use Boost you need to point to where the libraries are stored.  Because of that, you’ll need to know the path to Boost, that is, where the files “live” on your computer.  There is probably a flag in your makefile already that starts with -I.  All you have to do is add your new Boost path to that flag; on my system it looks something like:

CPPFLAGS=-I/home/myUsername/lib/boost_1_56_0
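
As a minimal illustration (main.cpp and the target name are placeholders for your own files), a makefile using this flag might look like the following; note that the g++ line must be indented with a tab:

CPPFLAGS=-I/home/myUsername/lib/boost_1_56_0

main: main.cpp
	g++ $(CPPFLAGS) main.cpp -o main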

That’s it!  Comments, questions, and concerns shall be given below.

Quick testing of code online

Hi there, happy holidays everyone! A really quick post with a helpful link, in case you need to test a small code idea quickly online. The website http://www.ideone.com will run multiple languages, and even lets you provide input and output using standard in and out. Codepad (http://codepad.org/) is another popular one, but it doesn’t support standard in/out. As usual, feel free to comment with more suggestions!

Training Video: External problems in MOEAFramework

In the next few weeks, we’ll be adding blog posts relating to our MOEAframework training.  They are a little bit out of order now, but we may rearrange them as things move forward.

Today’s post is a video overview of a simple “external” problem for MOEAframework.  External means that the objective function is coded in another language, compiled into an executable, and communicates with the MOEAframework algorithms via the command line.  I walk through the different components of a simple code file, rosenbrock.c, compile it, and show how to interact with the program via the command line.
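
If you want a feel for what such an external problem looks like before watching the video, here is a rough sketch of the idea. This is not the rosenbrock.c file from the video (which uses the helper functions that ship with MOEAframework); it only illustrates the pattern of an executable that reads decision variables from standard input and writes objective values back to standard output:

// Simplified sketch of an external problem: read two decision variables at a time
// from standard input, evaluate the Rosenbrock function, and write the objective
// value to standard output until the input ends.
#include <iostream>

int main()
{
    double x, y;
    while (std::cin >> x >> y) {
        double f = 100.0 * (y - x * x) * (y - x * x) + (1.0 - x) * (1.0 - x);
        std::cout << f << std::endl; // std::endl flushes so the caller sees each result immediately
    }
    return 0;
}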