# Directed search with the Exploratory Modeling workbench

This is the third blog in a series showcasing the functionality of the Exploratory Modeling workbench. Exploratory modeling entails investigating the way in which uncertainty and/or policy levers map to outcomes. To investigate these mappings, we can either use sampling based strategies (open exploration) or optimization based strategies (directed search) In the first blog, I gave a general overview of the workbench and showed briefly how both investigation strategies can be done. In the second blog, I demonstrated the use of the workbench for open exploration in substantial more detail. In this third blog, I will demonstrate in more detail how to use the workbench for directed search. Like in the previous two blog post, I will use the DPS version of the lake problem.

For optimization, the workbench relies on platypus. You can easily install the latest version of platypus from github using pip

pip install git+https://github.com/Project-Platypus/Platypus.git


By default, the workbench will use epsilon NSGA2, but all the other algorithms available within platypus can be used as well.

Within the workbench, optimization can be used in three ways:
* Search over decision levers for a reference scenario
* Robust search: search over decision levers for a set of scenarios
* worst case discovery: search over uncertainties for a reference policy

The search over decision levers or over uncertainties relies on the specification of the direction for each outcome of interest defined on the model. It is only possible to use ScalarOutcome objects for optimization.

## Search over levers

Directed search is most often used to search over the decision levers in order to find good candidate strategies. This is for example the first step in the Many Objective Robust Decision Making process. This is straightforward to do with the workbench using the optimize method.

from ema_workbench import MultiprocessingEvaluator, ema_logging

ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.optimize(nfe=10000, searchover='levers',
epsilons=[0.1,]*len(model.outcomes),
population_size=50)


the result from optimize is a DataFrame with the decision variables and outcomes of interest. The latest version of the workbench comes with a pure python implementation of parallel coordinates plot built on top of matplotlib. It has been designed with the matplotlib and seaborn api in mind. We can use this to quickly visualize the optimization results.

from ema_workbench.analysis import parcoords

paraxes = parcoords.ParallelAxes(parcoords.get_limits(results), rot=0)
paraxes.plot(results, color=sns.color_palette()[0])
paraxes.invert_axis('max_P')
plt.show()


Note how we can flip an axis using the invert_axis method. This eases interpretation of the figure because the ideal solution in this case would be a straight line for the four outcomes of interest at the top of the figure.

### Specifying constraints

In the previous example, we showed the most basic way for using the workbench to perform many-objective optimization. However, the workbench also offers support for constraints and tracking convergence. Constrains are an attribute of the optimization problem, rather than an attribute of the model as in Rhodium. Thus, we can pass a list of constraints to the optimize method. A constraint can be applied to the model input parameters (both uncertainties and levers), and/or outcomes. A constraint is essentially a function that should return the distance from the feasibility threshold. The distance should be 0 if the constraint is met.

As a quick demonstration, let’s add a constraint on the maximum pollution. This constraint applies to the max_P outcome. The example below specifies that the maximum pollution should be below 1.

from ema_workbench import MultiprocessingEvaluator, ema_logging, Constraint

ema_logging.log_to_stderr(ema_logging.INFO)

constraints = [Constraint("max pollution", outcome_names="max_P",
function=lambda x:max(0, x-1))]

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.optimize(nfe=1000, searchover='levers',
epsilons=[0.1,]*len(model.outcomes),
population_size=25, constraints=constraints)


### tracking convergence

To track convergence, we need to specify which metric(s) we want to use and pass these to the optimize method. At present the workbench comes with 3 options: Hyper volume, Epsilon progress, and a class that will write the archive at each iteration to a separate text file enabling later processing. If convergence metrics are specified, optimize will return both the results as well as the convergence information.

from ema_workbench import MultiprocessingEvaluator, ema_logging
from ema_workbench.em_framework.optimization import (HyperVolume,
EpsilonProgress, )
from ema_workbench.em_framework.outcomes import Constraint

ema_logging.log_to_stderr(ema_logging.INFO)

# because of the constraint on pollution, we can specify the
# maximum easily
convergence_metrics = [HyperVolume(minimum=[0,0,0,0], maximum=[1,1,1,1]),
EpsilonProgress()]
constraints = [Constraint("max pollution", outcome_names="max_P",
function=lambda x:max(0, x-1))]

with MultiprocessingEvaluator(model) as evaluator:
results_ref1, convergence1 = evaluator.optimize(nfe=25000, searchover='levers',
epsilons=[0.05,]*len(model.outcomes),
convergence=convergence_metrics,
constraints=constraints,
population_size=100)


We can visualize the results using parcoords as before, while the convergence information is in a DataFrame making it also easy to plot.

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=True)
ax1.plot(convergence1.epsilon_progress)
ax1.set_xlabel('nr. of generations')
ax1.set_ylabel('$\epsilon$ progress')
ax2.plot(convergence1.hypervolume)
ax2.set_ylabel('hypervolume')
sns.despine()
plt.show()


### Changing the reference scenario

Up till now, we have performed the optimization for an unspecified reference scenario. Since the lake model function takes default values for each of the deeply uncertain factors, these values have been implicitly assumed. It is however possible to explicitly pass a reference scenario that should be used instead. In this way, it is easy to apply the extended MORDM approach suggested by Watson and Kasprzyk (2017).

To see the effects of changing the reference scenario on the values for the decision levers found through the optimization, as well as ensuring a fair comparison with the previous results, we use the same convergence metrics and constraints from the previous optimization. Note that the constraints are in essence only a function, and don’t retain optimization specific state, we can simply reuse them. The convergence metrics, in contrast retain state and we thus need to re-instantiate them.

from ema_workbench import Scenario

reference = Scenario('reference', **dict(b=.43, q=3,mean=0.02,
stdev=0.004, delta=.94))
convergence_metrics = [HyperVolume(minimum=[0,0,0,0], maximum=[1,1,1,1]),
EpsilonProgress()]

with MultiprocessingEvaluator(model) as evaluator:
results_ref2, convergence2 = evaluator.optimize(nfe=25000, searchover='levers',
epsilons=[0.05,]*len(model.outcomes),
convergence=convergence_metrics,
constraints=constraints,
population_size=100, reference=reference)



### comparing results for different reference scenarios

To demonstrate the parcoords plotting functionality in some more detail, let’s combine the results from the optimizations for the two different reference scenarios and visualize them in the same plot. To do this, we need to first figure out the limits across both optimizations. Moreover, to get a better sense of which part of the decision space is being used, let’s set the limits for the decision levers on the basis of their specified ranges instead of inferring the limits from the optimization results.

columns = [lever.name for lever in model.levers]
columns += [outcome.name for outcome in model.outcomes]
limits = {lever.name: (lever.lower_bound, lever.upper_bound) for lever in
model.levers}
limits = dict(**limits, **{outcome.name:(0,1) for outcome in model.outcomes})
limits = pd.DataFrame.from_dict(limits)
# we resort the limits in the order produced by the optimization
limits = limits[columns]

paraxes = parcoords.ParallelAxes(limits, rot=0)
paraxes.plot(results_ref1, color=sns.color_palette()[0], label='ref1')
paraxes.plot(results_ref2, color=sns.color_palette()[1], label='ref2')
paraxes.legend()
paraxes.invert_axis('max_P')
plt.show()


## Robust Search

The workbench also comes with support for many objective robust optimization. In this case, each candidate solution is evaluated over a set of scenarios, and the robustness of the performance over this set is calculated. This requires specifying 2 new pieces of information:
* the robustness metrics
* the scenarios over which to evaluate the candidate solutions

The robustness metrics are simply a collection of ScalarOutcome objects. For each one, we have to specify which model outcome(s) it uses, as well as the actual robustness function. For demonstrative purposes, let’s assume we are use a robustness function using descriptive statistics: we want to maximize the 10th percentile performance for reliability, inertia, and utility, while minimizing the 90th percentile performance for max_P.

We can specify our scenarios in various ways. The simplest would be to pass the number of scenarios to the robust_optimize method. In this case for each generation a new set of scenarios is used. This can create noise and instability in the optimization. A better option is to explicitly generate the scenarios first, and pass these to the method. In this way, the same set of scenarios is used for each generation.

If we want to specify a constraint, this can easily be done. Note however, that in case of robust optimization, the constrains will apply to the robustness metrics instead of the model outcomes. They can of course still apply to the decision variables as well.

import functools
from ema_workbench import Constraint, MultiprocessingEvaluator
from ema_workbench import Constraint, ema_logging
from ema_workbench.em_framework.optimization import (HyperVolume,
EpsilonProgress)
from ema_workbench.em_framework.samplers import sample_uncertainties

ema_logging.log_to_stderr(ema_logging.INFO)

percentile10 = functools.partial(np.percentile, q=10)
percentile90 = functools.partial(np.percentile, q=90)

MAXIMIZE = ScalarOutcome.MAXIMIZE
MINIMIZE = ScalarOutcome.MINIMIZE
robustnes_functions = [ScalarOutcome('90th percentile max_p', kind=MINIMIZE,
variable_name='max_P', function=percentile90),
ScalarOutcome('10th percentile reliability', kind=MAXIMIZE,
variable_name='reliability', function=percentile10),
ScalarOutcome('10th percentile inertia', kind=MAXIMIZE,
variable_name='inertia', function=percentile10),
ScalarOutcome('10th percentile utility', kind=MAXIMIZE,
variable_name='utility', function=percentile10)]

def constraint(x):
return max(0, percentile90(x)-10)

constraints = [Constraint("max pollution",
outcome_names=['90th percentile max_p'],
function=constraint)]
convergence_metrics = [HyperVolume(minimum=[0,0,0,0], maximum=[10,1,1,1]),
EpsilonProgress()]
n_scenarios = 10
scenarios = sample_uncertainties(model, n_scenarios)

nfe = 10000

with MultiprocessingEvaluator(model) as evaluator:
robust_results, convergence = evaluator.robust_optimize(robustnes_functions,
scenarios, nfe=nfe, constraints=constraints,
epsilons=[0.05,]*len(robustnes_functions),
convergence=convergence_metrics,)

fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.plot(convergence.epsilon_progress.values)
ax1.set_xlabel('nr. of generations')
ax1.set_ylabel('$\epsilon$ progress')
ax2.plot(convergence.hypervolume)
ax2.set_ylabel('hypervolume')
sns.despine()
plt.show()


paraxes = parcoords.ParallelAxes(parcoords.get_limits(robust_results), rot=45)
paraxes.plot(robust_results)
paraxes.invert_axis('90th percentile max_p')
plt.show()


## Search over uncertainties: worst case discovery

Up till now, we have focused on optimizing over the decision levers. The workbench however can also be used for worst case discovery (Halim et al, 2016). In essence, the only change is to specify that we want to search over uncertainties instead of over levers. Constraints and convergence works just as in the previous examples.

Reusing the foregoing, however, we should change the direction of optimization of the outcomes. We are no longer interested in finding the best possible outcomes, but instead we want to find the worst possible outcomes.

# change outcomes so direction is undesirable
minimize = ScalarOutcome.MINIMIZE
maximize = ScalarOutcome.MAXIMIZE

for outcome in model.outcomes:
if outcome.kind == minimize:
outcome.kind = maximize
else:
outcome.kind = minimize


We can reuse the reference keyword argument to perform worst case discovery for one of the policies found before. So, below we select solution number 9 from the pareto approximate set. We can turn this into a dict and instantiate a Policy objecti.

from ema_workbench import Policy

policy = Policy('9', **{k:v for k, v in results_ref1.loc[9].items()
if k in model.levers})

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.optimize(nfe=1000, searchover='uncertainties',
epsilons=[0.1,]*len(model.outcomes),
reference=policy)


Visualizing the results is straightforward using parcoords.

paraxes = parcoords.ParallelAxes(parcoords.get_limits(results), rot=0)
paraxes.plot(results)
paraxes.invert_axis('max_P')
plt.show()


## Closing remarks

This blog showcased the functionality of the workbench for applying search based approaches to exploratory modelling. We specifically looked at the use of many-objective optimization for searching over the levers or uncertainties, as well as the use of many-objective robust optimization. This completes the overview of the functionality available in the workbench. In the next blog, I will put it all together to show how the workbench can be used to perform Many Objective Robust Decision Making.

# Open exploration with the Exploratory Modelling Workbench

In this blog, I will continue to showcase the functionality of the exploratory modelling workbench. In the previous blog, I have given a general introduction to the workbench, and showed how the Direct Policy Search example that comes with Rhodium can be adapted for use with the workbench. In this blog post, I will showcase how the workbench can be used for open exploration.

## first a short background

In exploratory modeling, we are interested in understanding how regions in the uncertainty space and/or the decision space map to the whole outcome space, or partitions thereof. There are two general approaches for investigating this mapping. The first one is through systematic sampling of the uncertainty or decision space. This is sometimes also known as open exploration. The second one is to search through the space in a directed manner using some type of optimization approach. This is sometimes also known as directed search.

The workbench support both open exploration and directed search. Both can be applied to investigate the mapping of the uncertainty space and/or the decision space to the outcome space. In most applications, search is used for finding promising mappings from the decision space to the outcome space, while exploration is used to stress test these mappings under a whole range of possible resolutions to the various uncertainties. This need not be the case however. Optimization can be used to discover the worst possible scenario, while sampling can be used to get insight into the sensitivity of outcomes to the various decision levers.

## open exploration

To showcase the open exploration functionality, let’s start with a basic example using the DPS lake problem also used in the previous blog post. We are going to simultaneously sample over uncertainties and decision levers. We are going to generate 1000 scenarios and 5 policies, and see how they jointly affect the outcomes. A scenario is understood as a point in the uncertainty space, while a policy is a point in the decision space. The combination of a scenario and a policy is called experiment. The uncertainty space is spanned by uncertainties, while the decision space is spanned by levers. Both uncertainties and levers are instances of RealParameter (a continuous range), IntegerParameter (a range of integers), or CategoricalParameter (an unorder set of things). By default, the workbench will use Latin Hypercube sampling for generating both the scenarios and the policies. Each policy will be always evaluated over all scenarios (i.e. a full factorial over scenarios and policies).

from ema_workbench import (RealParameter, ScalarOutcome, Constant,
ReplicatorModel)
model = ReplicatorModel('lakeproblem', function=lake_model)
model.replications = 150

#specify uncertainties
model.uncertainties = [RealParameter('b', 0.1, 0.45),
RealParameter('q', 2.0, 4.5),
RealParameter('mean', 0.01, 0.05),
RealParameter('stdev', 0.001, 0.005),
RealParameter('delta', 0.93, 0.99)]

# set levers
model.levers = [RealParameter(&quot;c1&quot;, -2, 2),
RealParameter(&quot;c2&quot;, -2, 2),
RealParameter(&quot;r1&quot;, 0, 2),
RealParameter(&quot;r2&quot;, 0, 2),
RealParameter(&quot;w1&quot;, 0, 1)]

def process_p(values):
values = np.asarray(values)
values = np.mean(values, axis=0)
return np.max(values)

#specify outcomes
model.outcomes = [ScalarOutcome('max_P', kind=ScalarOutcome.MINIMIZE,
function=process_p),
ScalarOutcome('utility', kind=ScalarOutcome.MAXIMIZE,
function=np.mean),
ScalarOutcome('inertia', kind=ScalarOutcome.MINIMIZE,
function=np.mean),
ScalarOutcome('reliability', kind=ScalarOutcome.MAXIMIZE,
function=np.mean)]

# override some of the defaults of the model
model.constants = [Constant('alpha', 0.41),
Constant('steps', 100)]



Next, we can perform experiments with this model.

from ema_workbench import (MultiprocessingEvaluator, ema_logging,
perform_experiments)
ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.perform_experiments(scenarios=1000, policies=5)



### Visual analysis

Having generated these results, the next step is to analyze them and see what we can learn from the results. The workbench comes with a variety of techniques for this analysis. A simple first step is to make a few quick visualizations of the results. The workbench has convenience functions for this, but it also possible to create your own visualizations using the scientific Python stack.

from ema_workbench.analysis import pairs_plotting
fig, axes = pairs_plotting.pairs_scatter(results, group_by='policy',
legend=False)
plt.show()



Writing your own visualizations requires a more in-depth understanding of how the results from the workbench are structured. perform_experiments returns a tuple. The first item is a numpy structured array where each row is a single experiment. The second item contains the outcomes, structured in a dict with the name of the outcome as key and a numpy array as value. Experiments and outcomes are aligned based on index.

import seaborn as sns

experiments, outcomes = results

df = pd.DataFrame.from_dict(outcomes)
df = df.assign(policy=experiments['policy'])

# rename the policies using numbers
df['policy'] = df['policy'].map({p:i for i, p in
enumerate(set(experiments['policy']))})

# use seaborn to plot the dataframe
grid = sns.pairplot(df, hue='policy', vars=outcomes.keys())
ax = plt.gca()
plt.show()



Often, it is convenient to separate the process of performing the experiments from the analysis. To make this possible, the workbench offers convenience functions for storing results to disc and loading them from disc. The workbench will store the results in a tarbal with .csv files and separate metadata files. This is a convenient format that has proven sufficient over the years.

from ema_workbench import save_results

save_results(results, '1000 scenarios 5 policies.tar.gz')

results = load_results('1000 scenarios 5 policies.tar.gz')



In addition to visual analysis, the workbench comes with a variety of techniques to perform a more in-depth analysis of the results. In addition, other analyses can simply be performed by utilizing the scientific python stack. The workbench comes with

• Scenario Discovery, a model driven approach to scenario development.
• Dimensional stacking, a quick visual approach drawing on feature scoring to enable scenario discovery. This approach has received limited attention in the literature (Suzuki et al., 2015). The implementation in the workbench replaces the rule mining approach with a feature scoring approach.
• Feature Scoring, a poor man’s alternative to global sensitivity analysis
• Regional sensitivity analysis

### Scenario Discovery

A detailed discussion on scenario discovery can be found in an earlier blogpost. For completeness, I provide a code snippet here. Compared to the previous blog post, there is one small change. The library mpld3 is currently not being maintained and broken on Python 3.5 and higher. To still utilize the interactive exploration of the trade offs within the notebook, use the interactive back-end as shown below.

from ema_workbench.analysis import prim

experiments, outcomes = results

x = experiments
y = outcomes['max_P'] &lt;0.8

prim_alg = prim.Prim(x, y, threshold=0.8)
box1 = prim_alg.find_box()


%matplotlib notebook

plt.show()



%matplotlib inline
# we go back to default not interactive

box1.inspect(43)
box1.inspect(43, style='graph')
plt.show()



### dimensional stacking

Dimensional stacking was suggested as a more visual approach to scenario discovery. It involves two steps: identifying the most important uncertainties that affect system behavior, and creating a pivot table using the most influential uncertainties. Creating the pivot table involves binning the uncertainties. More details can be found in Suzuki et al. (2015) or by looking through the code in the workbench. Compared to the original paper, I use feature scoring for determining the most influential uncertainties. The code is set up in a modular way so other approaches to global sensitivity analysis can easily be used as well if so desired.

from ema_workbench.analysis import dimensional_stacking

x = experiments
y = outcomes['max_P'] &lt;0.8

dimensional_stacking.create_pivot_plot(x,y, 2, nbins=3)
plt.show()



We can see from this visual that if B is low, while Q is high, we have a high concentration of cases where pollution stays below 0.8. The mean and delta have some limited additional influence. By playing around with an alternative number of bins, or different number of layers, patterns can be coarsened or refined.

### regional sensitivity analysis

A third approach for supporting scenario discovery is to perform a regional sensitivity analysis. The workbench implements a visual approach based on plotting the empirical CDF given a classification vector. Please look at section 3.4 in Pianosi et al (2016) for more details.

from ema_workbench.analysis import regional_sa
from numpy.lib import recfunctions as rf

x = rf.drop_fields(experiments, 'model', asrecarray=True)
y = outcomes['max_P'] &lt; 0.8

regional_sa.plot_cdfs(x,y)
plt.show()



### feature scoring

Feature scoring is a family of techniques often used in machine learning to identify the most relevant features to include in a model. This is similar to one of the use cases for global sensitivity analysis, namely factor prioritisation. In some of the work ongoing in Delft, we are comparing feature scoring with Sobol and Morris and the results are quite positive. The main advantage of feature scoring techniques is that they impose virtually no constraints on the experimental design, while they can handle real valued, integer valued, and categorical valued parameters. The workbench supports multiple techniques, the most useful of which generally is extra trees (Geurts et al. 2006).

For this example, we run feature scoring for each outcome of interest. We can also run it for a specific outcome if desired. Similarly, we can choose if we want to run in regression mode or classification mode. The later is applicable if the outcome is a categorical variable and the results should be interpreted similar to regional sensitivity analysis results. For more details, see the documentation.

from ema_workbench.analysis import feature_scoring

x = experiments
y = outcomes

fs = feature_scoring.get_feature_scores_all(x, y)
sns.heatmap(fs, cmap='viridis', annot=True)
plt.show()



From the results, we see that max_P is primarily influenced by b, while utility is driven by delta, for inertia and reliability the situation is a little bit less clear cut.

### linear regression

In addition to the prepackaged analyses that come with the workbench, it is also easy to rig up something quickly using the ever expanding scientific Python stack. Below is a quick example of performing a basic regression analysis on the results.

experiments, outcomes = results

for key, value in outcomes.items():
params = model.uncertainties #+ model.levers[:]

fig, axes = plt.subplots(ncols=len(params), sharey=True)

y = value

for i, param in enumerate(params):
ax = axes[i]
ax.set_xlabel(param.name)

pearson = sp.stats.pearsonr(experiments[param.name], y)

ax.annotate(&quot;r: {:6.3f}&quot;.format(pearson[0]), xy=(0.15, 0.85),
xycoords='axes fraction',fontsize=13)

x = experiments[param.name]
sns.regplot(x, y, ax=ax, ci=None, color='k',
scatter_kws={'alpha':0.2, 's':8, 'color':'gray'})

ax.set_xlim(param.lower_bound, param.upper_bound)

axes[0].set_ylabel(key)

plt.show()



The workbench can also be used for more advanced sampling techniques. To achieve this, it relies on SALib. On the workbench side, the only change is to specify the sampler we want to use. Next, we can use SALib directly to perform the analysis. To help with this, the workbench provides a convenience function for generating the problem dict which SALib provides. The example below focusses on performing SOBOL on the uncertainties, but we could do the exact same thing with the levers instead. The only changes required would be to set lever_sampling instead of uncertainty_sampling, and get the SALib problem dict based on the levers.

from SALib.analyze import sobol
from ema_workbench.em_framework.salib_samplers import get_SALib_problem

with MultiprocessingEvaluator(model) as evaluator:
sa_results = evaluator.perform_experiments(scenarios=1000,
uncertainty_sampling='sobol')

experiments, outcomes = sa_results
problem = get_SALib_problem(model.uncertainties)

Si = sobol.analyze(problem, outcomes['max_P'],
calc_second_order=True, print_to_console=False)

Si_filter = {k:Si[k] for k in ['ST','ST_conf','S1','S1_conf']}
Si_df = pd.DataFrame(Si_filter, index=problem['names'])


# Using the Exploratory Modelling Workbench

Over the last 7 years, I have been working on the development of an open source toolkit for supporting decision-making under deep uncertainty. This toolkit is known as the exploratory modeling workbench. The motivation for this name is that in my opinion all model-based deep uncertainty approaches are forms of exploratory modeling as first introduced by Bankes (1993). The design of the workbench has undergone various changes over time, but it has started to stabilize in the fall of 2016. This summer, I published a paper detailing the workbench (Kwakkel, 2017). There is an in depth example in the paper, but in a series of blogs I want to showcase the funtionality in some more detail.

The workbench is readily available through pip, but it requires ipyparallel and mpld3 (both available through conda), SALib (via pip), and optionality platypus (pip install directly from github repo).

## Adapting the DPS example from Rhodium

As a starting point, I will use the Direct Policy Search example that is available for Rhodium (Quinn et al 2017). I will adapt this code to work with the workbench. In this way, I can explain the workbench, as well as highlight some of the main differences between the workbench and Rhodium.

<br /># A function for evaluating our cubic DPS. This is based on equation (12)
# from [1].
def evaluateCubicDPS(policy, current_value):
value = 0

for i in range(policy["length"]):
rbf = policy["rbfs"][i]
value += rbf["weight"] * abs((current_value - rbf["center"]) / rbf["radius"])**3
value = min(max(value, 0.01), 0.1)
return value

# Construct the lake problem
def lake_problem(policy, # the DPS policy
b = 0.42, # decay rate for P in lake (0.42 = irreversible)
q = 2.0, # recycling exponent
mean = 0.02, # mean of natural inflows
stdev = 0.001, # standard deviation of natural inflows
alpha = 0.4, # utility from pollution
delta = 0.98, # future utility discount rate
nsamples = 100, # monte carlo sampling of natural inflows
steps = 100): # the number of time steps (e.g., days)
Pcrit = root(lambda x: x**q/(1+x**q) - b*x, 0.01, 1.5)
X = np.zeros((steps,))
decisions = np.zeros((steps,))
average_daily_P = np.zeros((steps,))
reliability = 0.0
utility = 0.0
inertia = 0.0

for _ in range(nsamples):
X[0] = 0.0

natural_inflows = np.random.lognormal(
math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
size=steps)

for t in range(1,steps):
decisions[t-1] = evaluateCubicDPS(policy, X[t-1])
X[t] = (1-b)*X[t-1] + X[t-1]**q/(1+X[t-1]**q) + decisions[t-1] + natural_inflows[t-1]
average_daily_P[t] += X[t]/float(nsamples)

reliability += np.sum(X < Pcrit)/float(steps)
utility += np.sum(alpha*decisions*np.power(delta,np.arange(steps)))
inertia += np.sum(np.diff(decisions) > -0.01)/float(steps-1)

max_P = np.max(average_daily_P)
reliability /= float(nsamples)
utility /= float(nsamples)
inertia /= float(nsamples)

return (max_P, utility, inertia, reliability)



The formulation of the decision rule assumes that policy is a dict, which is composed of a set of variables generated either through sampling or through optimization. This is relatively straightforward to do in Rhodium, but not so easy to do in the workbench. In the workbench, a policy is a composition of policy levers, where each policy lever is either a range of real values, a range of integers, or an unordered set of categories. To adapt the DPS version of the lake problem to work with the workbench, we have to first replace the policy dict with the different variables explicitly.

def get_antropogenic_release(xt, c1, c2, r1, r2, w1):
'''
Parameters
----------
xt : float
polution in lake at time t
c1 : float
center rbf 1
c2 : float
center rbf 2
r1 : float
r2 : float
w1 : float
weight of rbf 1

note:: w2 = 1 - w1

'''

rule = w1*(abs(xt-c1/r1))**3+(1-w1)*(abs(xt-c2/r2))**3
at = min(max(rule, 0.01), 0.1)
return at


Next, we need to adapt the lake_problem function itself to use this adapted version of the decision rule. This requires 2 changes: replace policy in the function signature of the lake_model function with the actual underlying parameters c1, c2, r1, r2, and w1, and use these when calculating the anthropological pollution rate.

def lake_model(b=0.42, q=2.0, mean=0.02, stdev=0.001, alpha=0.4, delta=0.98,
c1=0.25, c2=0.25, r1=0.5, r2=0.5, w1=0.5, nsamples=100,
steps=100):
Pcrit = root(lambda x: x**q/(1+x**q) - b*x, 0.01, 1.5)
X = np.zeros((steps,))
decisions = np.zeros((steps,))
average_daily_P = np.zeros((steps,))
reliability = 0.0
utility = 0.0
inertia = 0.0

for _ in range(nsamples):
X[0] = 0.0

natural_inflows = np.random.lognormal(
math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
size=steps)

for t in range(1,steps):
decisions[t-1] = get_antropogenic_release(X[t-1], c1, c2, r1, r2, w1)
X[t] = (1-b)*X[t-1] + X[t-1]**q/(1+X[t-1]**q) + decisions[t-1] + natural_inflows[t-1]
average_daily_P[t] += X[t]/float(nsamples)

reliability += np.sum(X < Pcrit)/float(steps)
utility += np.sum(alpha*decisions*np.power(delta,np.arange(steps)))
inertia += np.sum(np.diff(decisions) > -0.01)/float(steps-1)

max_P = np.max(average_daily_P)
reliability /= float(nsamples)
utility /= float(nsamples)
inertia /= float(nsamples)

return (max_P, utility, inertia, reliability)


This version of the code can be combined with the workbench already. However, we can clean it up a bit more if we want to. Note how there are 2 for loops in the lake model. The outer loop generates stochastic realizations of the natural inflow, while the inner loop calculates the the dynamics of the system given a stochastic realization. The workbench can be made responsible for this outer loop.

A quick note on terminology is in order here. I have a background in transport modeling. Here we often use discrete event simulation models. These are intrinsically stochastic models. It is standard practice to run these models several times and take descriptive statistics over the set of runs. In discrete event simulation, and also in the context of agent based modeling, this is known as running replications. The workbench adopts this terminology and draws a sharp distinction between designing experiments over a set of deeply uncertain factors, and performing replications of each experiment to cope with stochastic uncertainty.

Some other notes on the code:
* To aid in debugging functions, it is good practice to make a function deterministic. In this case we can quite easily achieve this by including an optional argument for setting the seed of the random number generation.
* I have slightly changed the formulation of inertia, which is closer to the mathematical formulation used in the various papers.
* I have changes the for loop over t to get rid of virtually all the t-1 formulations

from __future__ import division # python2
import math
import numpy as np
from scipy.optimize import brentq

def lake_model(b=0.42, q=2.0, mean=0.02, stdev=0.001, alpha=0.4,
delta=0.98, c1=0.25, c2=0.25, r1=0.5, r2=0.5,
w1=0.5, nsamples=100, steps=100, seed=None):
'''runs the lake model for 1 stochastic realisation using specified
random seed.

Parameters
----------
b : float
decay rate for P in lake (0.42 = irreversible)
q : float
recycling exponent
mean : float
mean of natural inflows
stdev : float
standard deviation of natural inflows
alpha : float
utility from pollution
delta : float
future utility discount rate
c1 : float
c2 : float
r1 : float
r2 : float
w1 : float
steps : int
the number of time steps (e.g., days)
seed : int, optional
seed for the random number generator
'''
np.random.seed(seed)

Pcrit = brentq(lambda x: x**q/(1+x**q) - b*x, 0.01, 1.5)
X = np.zeros((steps,))
decisions = np.zeros((steps,))

X[0] = 0.0

natural_inflows = np.random.lognormal(
math.log(mean**2 / math.sqrt(stdev**2 + mean**2)),
math.sqrt(math.log(1.0 + stdev**2 / mean**2)),
size=steps)

for t in range(steps-1):
decisions[t] = get_antropogenic_release(X[t], c1, c2, r1, r2, w1)
X[t+1] = (1-b)*X[t] + X[t]**q/(1+X[t]**q) + decisions[t] + natural_inflows[t]

reliability = np.sum(X < Pcrit)/steps
utility = np.sum(alpha*decisions*np.power(delta,np.arange(steps)))

# note that I have slightly changed this formulation to retain
# consistency with the equations in the papers
inertia = np.sum(np.abs(np.diff(decisions)) < 0.01)/(steps-1)
return X, utility, inertia, reliability


Now we are ready to connect this model to the workbench. This is fairly similar to how you would do it with Rhodium. We have to specify the uncertainties, the outcomes, and the policy levers. For the uncertainties and the levers, we can use real valued parameters, integer valued parameters, and categorical parameters. For outcomes, we can use either scalar, single valued outcomes or time series outcomes. For convenience, we can also explicitly control constants in case we want to have them set to a value different from their default value.

In this particular case, we are running the replications with the workbench. We still have to specify the descriptive statistics we would like to gather over the set of replications. For this, we can pass a function to an outcome. This function will be called with the results over the set of replications.

import numpy as np
from ema_workbench import (RealParameter, ScalarOutcome, Constant,
ReplicatorModel)

model = ReplicatorModel('lakeproblem', function=lake_model)
model.replications = 150

#specify uncertainties
model.uncertainties = [RealParameter('b', 0.1, 0.45),
RealParameter('q', 2.0, 4.5),
RealParameter('mean', 0.01, 0.05),
RealParameter('stdev', 0.001, 0.005),
RealParameter('delta', 0.93, 0.99)]

# set levers
model.levers = [RealParameter("c1", -2, 2),
RealParameter("c2", -2, 2),
RealParameter("r1", 0, 2),
RealParameter("r2", 0, 2),
RealParameter("w1", 0, 1)]

def process_p(values):
values = np.asarray(values)
values = np.mean(values, axis=0)
return np.max(values)

#specify outcomes
model.outcomes = [ScalarOutcome('max_P', kind=ScalarOutcome.MINIMIZE,
function=process_p),
ScalarOutcome('utility', kind=ScalarOutcome.MAXIMIZE,
function=np.mean),
ScalarOutcome('inertia', kind=ScalarOutcome.MINIMIZE,
function=np.mean),
ScalarOutcome('reliability', kind=ScalarOutcome.MAXIMIZE,
function=np.mean)]

# override some of the defaults of the model
model.constants = [Constant('alpha', 0.41),
Constant('steps', 100)]


## Open exploration

Now that we have specified the model with the workbench, we are ready to perform experiments on it. We can use evaluators to distribute these experiments either over multiple cores on a single machine, or over a cluster using ipyparallel. Using any parallelization is an advanced topic, in particular if you are on a windows machine. The code as presented here will run fine in parallel on a mac or Linux machine. If you are trying to run this in parallel using multiprocessing on a windows machine, from within a jupyter notebook, it won’t work. The solution is to move the lake_model and get_antropogenic_release to a separate python module and import the lake model function into the notebook.

Another common practice when working with the exploratory modeling workbench is to turn on the logging functionality that it provides. This will report on the progress of the experiments, as well as provide more insight into what is happening in particular in case of errors.

If we want to perform experiments on the model we have just defined, we can use the perform_experiments method on the evaluator, or the stand alone perform_experiments function. We can perform experiments over the uncertainties and/or over the levers. Any policy is evaluated over each of the scenarios. So if we want to use 100 scenarios and 10 policies, this means that we will end up performing 100 * 10 = 1000 experiments. By default, the workbench uses Latin hypercube sampling for both sampling over levers and sampling over uncertainties. However, the workbench also offers support for full factorial, partial factorial, and Monte Carlo sampling, as well as wrappers for the various sampling schemes provided by SALib.

from ema_workbench import (MultiprocessingEvaluator, ema_logging,
perform_experiments)
ema_logging.log_to_stderr(ema_logging.INFO)

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.perform_experiments(scenarios=10, policies=10)


## Directed Search

Similarly, we can easily use the workbench to search for a good candidate strategy. This requires that platypus is installed. If platypus is installed, we can simply use the optimize method. By default, the workbench will use $\epsilon$-NSGAII. The workbench can be used to search over the levers in order to find a good candidate strategy as is common in Many-Objective Robust Decision Making. The workbench can also be used to search over the uncertainties in order to find for example the worst possible outcomes and the conditions under which they appear. This is a form of worst case discovery. The optimize method takes an optional reference argument. This can be used to set the scenario for which you want to find good policies, or for setting the policy for which you want to find the worst possible outcomes. This makes implementing the approach suggested in Watson & Kasprzyk (2017) very easy.

with MultiprocessingEvaluator(model) as evaluator:
results = evaluator.optimize(nfe=1000, searchover='levers',
epsilons=[0.1,]*len(model.outcomes))


## Robust optimization

A third possibility is to perform robust optimization. In this case, the search will take place over the levers, but a given policy is than evaluated for a set of scenarios and the performance is defined over this set. To do this, we need to explicitly define robustness. For this, we can use the outcome object we have used before. In the example below we are defining robustness as the worst 10th percentile over the set of scenarios. We need to pass a variable_name argument to explicitly link outcomes of the model to the robustness metrics.

import functools

percentile10 = functools.partial(np.percentile, q=10)
percentile90 = functools.partial(np.percentile, q=90)

MAXIMIZE = ScalarOutcome.MAXIMIZE
MINIMIZE = ScalarOutcome.MINIMIZE
robustnes_functions = [ScalarOutcome('90th percentile max_p', kind=MINIMIZE,
variable_name='max_P', function=percentile90),
ScalarOutcome('10th percentile reliability', kind=MAXIMIZE,
variable_name='reliability', function=percentile10),
ScalarOutcome('10th percentile inertia', kind=MAXIMIZE,
variable_name='inertia', function=percentile10),
ScalarOutcome('10th percentile utility', kind=MAXIMIZE,
variable_name='utility', function=percentile10)]


Given the specification of the robustness function, the remainder is straightforward and analogous to normal optimization.

<br />n_scenarios = 200
scenarios = sample_uncertainties(lake_model, n_scenarios)
nfe = 100000

with MultiprocessingEvaluator(lake_model) as evaluator:
robust_results = evaluator.robust_optimize(robustnes_functions, scenarios,
nfe=nfe, epsilons=[0.05,]*len(robustnes_functions))


This blog has introduced the exploratory modeling workbench and has shown its basic functionality for sampling or searching over uncertainties and levers. In subsequent blogs, I will take a more in depth look at this funcitonality, as well as demonstrate how the workbench facilitates the entire Many-Objective Robust Decision Making process.

# Simple tricks to make your C/C++ code run faster

Us, engineers with no formal computer science training, for a myriad of good reasons tend to favor friendly programming languages such as Python over evil C/C++. However, when our application is performance sensitive and we cannot wait for results sitting in our chairs for as long as a Python code would require us to, we are sometimes forced to us C/C++. Why then not making the most of it when it this is the case?

Here are some simple tips to make your C/C++ code run even faster, and how to get some advice about further performance improvements. The last idea (data locality) is transferable to Python and other languages.

Most important trick

Improve your algorithm. Thinking if there is a simpler way of doing what you coded may reduce your algorithm’s complexity (say, from say n3 to n*log(n)), which would:

• yield great speed-up when you have lots of data or needs to run a computation several times in a row, and
• make you look smarter.

Compiler flags

First and foremost, the those who know what they are doing — compiler developers — do the work for you by calling the compiler with the appropriate flags. There are an incredible amount of flags you can call for that, but I would say that the ones you should have on whenever possible are -O3 and –march=native.

The optimization flags (-O1 to -O3, the latter more aggressive than the former) will perform a series of modification on your code behind the scenes to speed it up, some times by more than an order of magnitude. The issue is that this modifications may eventually make your code behave differently than you expected, so it’s always good to do a few smaller runs with -O0 and -O3 and compare their results before getting into production mode.

The –march=native flag will make the compiler fine tune your code to the processor it is being compiled on (conversely, –march=haswell would fine tune it to haswell architectures, for example). This is great if you are only going to run your code on your own machine or in another machine known to have a similar architecture, but if you try to run the binary on a machine with a different architecture, specially if it is an older one, you may end up with illegal instruction errors.

Restricted pointer array

When declaring a pointer array which you are sure will not be subject to pointer aliasing — namely there will be no other pointer pointing to the same memory address –, you can declare that pointer as a restrict pointer, as below:

• GCC: double* __restrict__ my_pointer_array
• Intel compiler: double* restrict my_pointer_array

This will let the compiler know that it can change order of certain operations involving my_pointer_array to make your code faster without changing some read/write order that may change your results. If you want to use the restricted qualifier with the intel compiler the flag -restrict must be passed as an argument to the compiler.

Aligned pointer array

By aligning an array, you are making sure the data is going to lie in the ideal location in memory for the processor to fetch it and perform the calculations it needs based on that array. To help your compiler optimize your data alignment, you need to (1) align your array when it is declared by a specific number of bytes and (2) tell your the compiler the array is aligned when using it to perform calculations — the compiler has no idea whether or not arrays received as arguments in function are aligned. Below are examples in C, although the same ideas apply to C++ as well.

GCC

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double *__restrict__ a, double *__restrict__ b, double *__restrict__ c, int n) {
a = (double*) __builtin_assume_aligned(a, ALIGN);
b = (double*) __builtin_assume_aligned(b, ALIGN);
c = (double*) __builtin_assume_aligned(c, ALIGN);
for (int i = 0; i < n; ++i)
c[i] = a[i] + b[i];
}

int main(void) {

double a[SIZE_ARRAY] __attribute__((aligned(ALIGN )));
double b[SIZE_ARRAY] __attribute__((aligned(ALIGN )));
double c[SIZE_ARRAY] __attribute__((aligned(ALIGN )));

for (int i = 0; i < SIZE_ARRAY; ++i) {
a[i] = 5.;
b[i] = 2.;
}
double start_time = omp_get_wtime();
sum(a, b, c, SIZE_ARRAY);
double time = omp_get_wtime() - start_time;
printf("%0.6fs", time);
}


Intel compiler

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double* restrict a, double* restrict b, double* restrict c, int n) {
__assume_aligned(a, ALIGN);
__assume_aligned(b, ALIGN);
__assume_aligned(c, ALIGN);
for (int i = 0; i < n; ++i)
c[i] = a[i] + b[i];
}

int main(void) {

__declspec(align(ALIGN )) float a[SIZE_ARRAY];
__declspec(align(ALIGN )) float b[SIZE_ARRAY];
__declspec(align(ALIGN )) float c[SIZE_ARRAY];

for (int i = 0; i < SIZE_ARRAY; ++i) {
a[i] = 5.;
b[i] = 2.;
}
double start_time = omp_get_wtime();
sum(a, b, c, SIZE_ARRAY);
double time = omp_get_wtime() - start_time;

printf("%0.6fs", time);
}


Edit: In a comment to this post, Krister Walfridsson not only caught an issue with my GCC code, for which mention I thank him, but he also shows the differences in machine code generated with and without alignment.

Data Locality

Computers are physical things, which means that data is physically stored and needs to be physically moved around in memory and between cache and processor in order to be used in calculations. This means that, if your data is stored all over the place in memory — e.g. in multiple pointer arrays in different parts of memory –, the processor will have to reach out to several parts of memory to fetch all your data before performing any computations. By having the data intelligently laid out in memory you ensure all data required for each computation is stored close to each other and in cache at the same time, which becomes even more important if your code uses too much data to fit in the cache at once.

In order to making your processor’s life easier, it is a good idea to ensure that all data required for a calculation step is close together. For example, if a given computation required three arrays of fixed sizes, it is always a good idea to merge them into one long array, as in the example below for the Intel compiler.

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64

void sum(double* restrict a, double* restrict b, double* restrict c, int n) {
__assume_aligned(a, ALIGN);
__assume_aligned(b, ALIGN);
__assume_aligned(c, ALIGN);
for (int i = 0; i < n; ++i)
c[i] = a[i] + b[i];
}

int main(void) {

__declspec(align(ALIGN )) float abc[3 * SIZE_ARRAY];

for (int i = 0; i < 2 * SIZE_ARRAY; ++i) {
a[i] = 5.;
b[i] = 2.;
}
double start_time = omp_get_wtime();
sum(abc, abc + SIZE_ARRAY, abc + 2 * ARRAY, SIZE_ARRAY);
double time = omp_get_wtime() - start_time;

printf("%0.6fs", time);
}


or even, since c[i] depends only on b[i] and a[i], we can have the values of a, b and c intercalated to assure that all computations will be performed on data that is right next to each other in memory:

#include <stdio.h>
#include <omp.h>

#define SIZE_ARRAY 100000
#define ALIGN 64
#define STRIDE 3

void sum(double* restrict abc, int n, int stride) {
__assume_aligned(abc, ALIGN);

for (int i = 0; i < n; i += stride)
abc[i+2] = abc[i] + abc[i+1];
}

int main(void) {
__declspec(align(ALIGN )) double abc[3 * SIZE_ARRAY];

for (int i = 0; i < 3 * SIZE_ARRAY; i += STRIDE) {
abc[i] = 5.;
abc[i+1] = 2.;
}
double start_time = omp_get_wtime();
sum(abc, SIZE_ARRAY, STRIDE );
double time = omp_get_wtime() - start_time;

printf("%0.6fs", time);
}


Conclusion

According a class project in which we had to write C code to perform matrix multiplication, the improvements suggested should may improve the performance of your code by 5 or 10 times. Also, the idea of data locality can be transferred to other languages, such as Python.

This semester I’m taking my first official CS class here at Cornell, CS 5220 Applications of Parallel Computers taught by Dave Bindel (for those of you in the Reed group or at Cornell, I would definitely recommend taking this class, which is offered every other year, once you have some experience coding in C or C++). In addition to the core material we’ve been learning in the class, I’ve been learning a lot by examining the structure and syntax of the code snippets written by the instructor and TA that are offered as starting points and examples for our class assignments. One element in particular  that stood out to me from our first assignment was the many function calls made through the makefile. This post will first take a closer look into the typical composition of a makefile and then examine how we can harness the structure of a makefile to help improve workflow on complicated projects.

## Dissecting a typical makefile

On the most basic level, a makefile simply consists of series of rules that each have an associated set of actions. Makefiles are how you use the “make” utility, a software package installed on all linux systems. Make has its own syntax similar to bash but with some distinct idiosyncrasies. For example, make allows you to store a snippet of code in whats called a “macro” (these are pretty much analogous to variables in most other languages).  A macro to store the flags you would like to run with your compiler could be defined like this:

CFLAGS  = -g -Wall

To referenced the CFLAGS macro, use a dollar sign and brackets, like this:

 $(CFLAGS) There are a series of “special” predifined macros that can be used in any makefile which are fairly common, you can find them here. Now that we’ve discussed makefile syntax, lets take a look at how rules are structured within a makefile. A rule specified by a makefile has the following shape: target: prerequisites recipe ... ... The target is usually the name of the file that is generated by a program, for example an executable or object file. A prerequisite is the specified input used to create the target (which can often depend on several files). The recipe is the action that make carries out for the intended target (note: this must be indented at every line). For example, a rule to build an executable called myProg from a c file called myProg.c using the gcc compiler with flags defined in CFLAGS might look like this: myProg: myProg.c gcc$(CFLAGS) $< ## Make the makefile do the work The most common rules within makefiles call the compiler to build code (hence the name “makefile”) and many basic makefiles are used for this sole purpose. However, a rule simply sends a series commands specified by its recipe to the command line and a rule can actually specify any action or series of actions that you want. A ubiquitous example of a rule that specifies an action is “clean”, which may be implemented like this: clean: rm -rf *.o$(PROGRAM)

Where PROGRAM is a macro containing the names of the executable compiled by the makefile.

In this example, the rule’s target is an action rather than an output file. You can call “clean” by simply typing “make clean” into the command line and you will remove all .o files and the executable PROGRAM from your working directory.

Utilizing this application of rules, we can now have our makefile do a lot of our command line work for us. For example, we could create a rule called “run” which submits a series of pbs jobs into a cluster.

run:
qsub job-$*.pbs We can then enter “make run” into the command line to execute this rule, which will submit the .pbs jobs for us (note that this will not perform any of the other rules defined in the makefile). Using this rule may be handy if we have a large number of jobs to submit. Alternatively we could make a rule called “plot” which calls a plotting function from python: plot: python plotter.py$(PLOTFILES)

Where PLOTFILES is a macro containing the name of files to be plotted and plotter.py is a python function that takes the file names as input.

Those are just two examples (loosely based on a makefile given in CS 5220) of how you can use a makefile to do your command line work for you, but the possibilities are endless!! Ok, maybe that’s getting a bit carried away, but I do find this functionality to be a simple and elegant way to improve the efficiency of your workflow on complex projects.

For some background on makefiles, I’d recommend taking a look a Billy’s post from last year. I also found the GNU make user manual helpful as well as this tutorial from Swarthmore that has some nice example makefiles.

# Developing parallelised code with MPI for dummies, in C (Part 2/2)

My last post introduced MPI and demonstrated a simple example for using it to parallelize a code across multiple nodes. In the previous example, we created an executable that could be run in parallel to complete the same task multiple times. But what if we want use MPI to on a code that has both parallel and serial sections, this is inevitable if we want everything to be self-contained.

As I tried to stress last time, MPI runs multiple versions of the same executable each with independent memory (please read this sentence three times, it is very different from how you learned to code). If you wish to share memory, you must explicitly send it. This allows no scope for a serial section!

We must, instead, imitate serial sections of code by designating a ‘root’ processor, conventionally the processor with rank = 0. We trap the ‘serial section’ inside an if-statement designating the root and send data to it from other processors when required.

Sending Data

I will build on the previous two examples by creating a loop that calculates the mean of a set of random numbers, we will parallelize the random number generation but leave the mean calculation in ‘serial’ (i.e. to be calculated by the root processor).

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(){
int size, rank,i;
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double randSum = 0;
srand(rank + 1);
double myRand = (double)rand()/(double)RAND_MAX;
printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);
if (rank == 0){
for (i=0;i<size;i++){
if (i > 0){
MPI_Recv(&myRand, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
randSum = randSum + myRand;
}
printf("Mean random number = %f\n",randSum/size);
}
else{
MPI_Send(&myRand, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}

For contrast with a regular serial version:

#include <stdio.h>
#include <stdlib.h>
int main(){
int rank,size = 10;
MPI_Init(NULL,NULL);
double randSum = 0;
srand(rank + 1);
double myRand = (double)rand()/(double)RAND_MAX;
printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);
if (rank == 0){
for (rank = 0; rank < size; ++rank){
srand(rank + 1);
randSum = randSum + (double)rand()/(double)RAND_MAX;
printf("I evaluated rank = %d, myRand = %f\n",rank,myRand);}
printf("Mean random number = %f\n",randSum/size);
}
}
return 0;
}

We introduce here two new MPI functions:

MPI_Send(data address, size of data, MPI type of data, processor destination (by rank), tag, communicator) sends the random number to the root (rank 0).

MPI_Recv(data address, size of data, MPI type of data, processor source (by rank), tag, communicator, status suppression) tells a processor, in our case the root, to receive data from a processor source.

Both MPI_Send and MPI_Recv prevent code from progressing further until the send-> receive is resolved. i.e. when rank = 5 reaches send, it will wait until rank = 0 has received data from ranks 1:4 before sending the data and progressing further.

Sending data between processors in MPI is moderately expensive, so we want to call send/recv as few times as possible. This means that vectors should be sent in one, rather than in a loop. It also means that when sending data from one processor to all (most commonly from the root), it is more efficient to use the built in ‘broadcast’ rather than sending to each processor individually (the reason for this is explained in: http://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/).

Below we will introduce an example where the root broadcasts how many random numbers each processor should create, these vectors of random numbers are then sent back to the root for mean calculation.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(){
int size, rank,i,j;
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double randSum = 0;
int numRands;
srand(rank+1);
if (rank == 0){
numRands = 5;
MPI_Bcast(&numRands,1,MPI_INT,0,MPI_COMM_WORLD);
}
else{
MPI_Bcast(&numRands,1,MPI_INT,0,MPI_COMM_WORLD);
}
double *myRand = calloc(numRands,sizeof(double));
for (i =0;i<numRands;++i){
myRand[i] = (double)rand()/(double)RAND_MAX;
}
if (rank == 0){
for (i=0;i<size;i++){
printf("root received from rank %d the vector: ",i);
if (i > 0){
MPI_Recv(myRand, numRands, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
for (j=0;j<numRands;j++){
printf("%f ",myRand[j]);
randSum = randSum + myRand[j];
}
printf("\n");
}
printf("Mean random number = %f\n",randSum/(size*numRands));
}
else{
MPI_Send(myRand, numRands, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
free(myRand);
MPI_Finalize();
return 0;
}

We have new used the new MPI function:

MPI_Bcast(data address, size of data, MPI data type, processor source, communicator) broadcasts from the processor source (in our case the root) to all other processors, readers should note the common mistake of using MPI_Recv instead of MPI_Bcast to receive the data; MPI_Bcast is the function to both send and receive data.

Another simple but common mistake that readers should note is the passing of dynamically sized data; note how myRand is sent without the & address operator (because the variable itself is an address) while numRands is sent with the & operator.

Concluding remarks

This tutorial should set you up to use much of the MPI functionality you need to parallelise your code. Some natural questions that may have arisen while reading this tutorial that we did not cover:

MPI_Barrier – while MPI_Send/Recv/Bcast require processors to ‘catch up’, if you are writing and reading data to files (particularly if a processor must read data written by another processor) then you need to force the processors to catch up; MPI_Barrier achieves this.

tags – you can contain metadata that can be described by integers (e.g. vector length or MPI data type) in the ‘tag’ option for MPI_Send/Recv.

MPI_Status – this structure can contain details about the data received (rank, tag and length of the message), although much of the time this will be known in advance. Since receiving the status can be expensive, MPI_STATUS_IGNORE is used to supress the status structure.

All of the MPI functions described in this tutorial are only a subset of those available that I have found useful in parallelizing my current applications. An exhaustive list can be found at: http://www.mpich.org/static/docs/latest/. If you want to go beyond the functions described in this post (or you require further detail) I would recommend: http://mpitutorial.com/tutorials/.

# Developing parallelised code with MPI for dummies, in C (Part 1/2)

Parallel computing allows for faster implementation of a code by enabling the simultaneous execution of multiple tasks. Before we dive in to how parallelisation of a code is achieved, let’s briefly review the components that make up a high performance computing (HPC) cluster (it should be noted that you can parallelise code on your own computer, but this post will focus on parallelisation on clusters).  High performance computing clusters are usually comprised of a network of individual computers known as nodes that function together as a single computing resource as shown in Figure 1. Each node has some number of processors (the chip within a node that actually executes instructions) and modern processors may contain multiple cores, each of which can execute operations independently. Processors performing tasks on the same node have access to shared memory, meaning they can write and reference the same memory locations as they execute tasks. Memory is not shared between nodes however, so operations that run on multiple nodes use what’s known as distributed-memory programming. In order to properly manage tasks using distributed memory, nodes must have a way to pass information to each other.

Figure 1: One possible configuration of a HPC cluster, based on the Cornell CAC presentation linked in the following paragraph.

Parallelization is commonly performed using OpenMP or MPI.  OpenMP (which stands for Open Multi-Processing) parallelises operations by multithreading, running tasks on multiple cores/units within a single node using shared memory.  MPI (which stands for Message Passing Interface) parallelises tasks by distributing them over multiple nodes (also possible over multiple processors) within a network, utilizing the distributed memory of each node. This has two practical implications; MPI is needed to scale to more nodes but communication between tasks is harder. The two parallelisation methods are not mutually exclusive; you could use OpenMP to parallelise operations on individual network nodes and MPI to communicate between nodes on the network (example: http://www.slac.stanford.edu/comp/unix/farm/mpi_and_openmp.html). Both OpenMP and MPI support your favourite languages (C, C++, FORTRAN, Python, but not Java – perfect!). The remainder of this post will focus on implementing MPI in your code, for references on using OpenMP, see this presentation by the Cornell Center for Advanced Computing: https://www.cac.cornell.edu/education/training/ParallelMay2012/ProgOpenMP.pdf.

So how does MPI work?

MPI creates multiple instances of your executable and runs one on each processor you have specified for use. These processors can communicate with each other using specific MPI functions. I will explain a few of the more basic functions in this post.

What does MPI code look like?

A parallel loop (compiles with: mpicc -O3 -o exampleParallel.exe exampleParallel.c -lm)

#include
#include
#include
int main(){
int size, rank;
MPI_Init(NULL,NULL);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("I evaluated rank = %d\n",rank);
MPI_Finalize();
return 0;
}

The parallel loop can be distributed over 10 processors (ppn) in one node and submitted to the Cube as a Unix script:

#!/bin/bash
#PBS -N exampleParallel
#PBS -l nodes=1:ppn=10
#PBS -l walltime=0:00:05
#PBS -j oe
#PBS -o output
cd \$PBS_O_WORKDIR
mpirun ./exampleParallel.exe

This can be contrasted with a serial loop (compiles with: gcc -O3 -o exampleSerial.exe exampleSerial.c):

#include
#include
int main(){
int rank, size = 10;
for (rank = 0; rank < size; rank++){
printf("I evaluated rank = %d\n",rank);
}
return 0;}

Let’s have a look at what each line of the parallel code is doing;

MPI_Init is the first MPI function that must be called in a piece of code in order to initialize the MPI environment. NOTE! MPI_Init does not signify the start of a ‘parallel section’, as I said earlier, MPI does not have parallel sections, it runs multiple instances of the same executable in parallel.

MPI_Comm_size populates an integer address with the number of processors in a group, in our example, 10 (i.e. num nodes * processors per node).

MPI_Comm_rank populates an integer address with the processor number for the current instance of the executable. This ‘rank’ variable is the main way to differentiate between different instances of your executable, it is equivalent to a loop counter.

MPI_Finalize is the last MPI function that must be called in a piece of code, it terminates the MPI environment. As far as I can tell it is only recommended to have return functions after MPI_Finalize.

This simple example highlights the difference between MPI and serial code; that each executable is evaluated separately in parallel. While this makes MPI hard to code, and sharing data between parallel processes expensive, it also makes it much easier to distribute across processors.

Next week we will present examples demonstrating how to send data between nodes and introduce serial sections of code.