Water Programming Blog Guide (Part I)

The Water Programming blog continues to expand collaboratively through contributors’ learning experiences and their willingness to share their knowledge in this blog.  It now covers a wide variety of topics ranging from quick programming tips to comprehensive literature reviews pertinent to water resources research and multi-objective optimization.  This post intends to provide guidance to new, and probably to current users by bringing to light what’s available in the blog and by developing a categorization of topics.

This first post will cover:

Software requirements

1.Programming languages and IDEs

2.Frameworks of optimization, sensitivity analysis and decision support

3.The Borg MOEA

4.Parallel computing

Part II)  will focus  on version control, spatial data and maps, conceptual posts and literature reviews.  And finally Part III)  will cover visualization and figure editing, LaTex and literature management, tutorials and miscellaneous research and programming tricks.

*Note to my fellow bloggers: Feel free to disagree and make suggestions on the categorization of your posts, also your thoughts on facilitating an easier navigation through the blog are very welcomed.  For current contributors,  its always suggested to tag and categorize your post, you can also follow the guidelines in Some WordPress Tips to enable a better organization of our blog.  Also, if you see a 💡, it means that a blog post idea has been identified.

Software Requirements

If you are new to the group and would like to know what kind of software you require to get started with research.  Joe summed it up pretty well in his  New Windows install? Here’s how to get everything set up post, where he points out all the installations that you should have on your desktop.  Additionally, you can find some guidance on:  Software to Install on Personal Computers and Software to Install on Personal Computers – For Mac Users!. You may also want to check out  What do you need to learn?  if you are entering the group.  These posts are subject to update 💡 but they are a good starting point.

1. Programming languages and Integrated Development Environments (IDEs)

Dave Hadka’s Programming Language Overview provides a summary of key differences between the C, C++, Python and Java.  The programming tips found in the blog cover Python, C, C++,  R and Matlab, there are also some specific instances were Java is used which will be discussed in section 2.  I’ll give some guidance on how to get started on each of these programming languages and point out some useful resources in the blog.

1.1. Python

Python is  a very popular programming language in our group so there’s sufficient guidance and resources available in the blog.  Download is available here, also some online tutorials that I really recommend to get you up to speed with Python are:  learn python the hard waypython for everybody and codeacademy.   Additionally,  stackoverflow is a useful resource for specific tasks.  The python resources available in our blog are outlined as follows:

Data analysis and organization

Data analysis libraries that you may want to check out are  scipy,   numpy   and pandas. Here are some related posts:

Importing, Exporting and Organizing Time Series Data in Python – Part 1 and  Part 2

Comparing Data Sets: Are Two Data Files the Same?

Using Python IDEs

The use of an integrated development environment (IDE) can enable code development and  make the debugging process easier.  Tom has done a good amount of development in PyCharm, so he has generated a  sequence of posts that provide guidance on how to take better advantage of PyCharm:

A Guide to Using Git in PyCharm – Part 1 , Part 2

Debugging in Python (using PyCharm) – Part 1 , Part 2 and Part 3

PyCharm as a Python IDE for Generating UML Diagrams

Josh also provides instructions to setup PyDev in eclipse in his Setting up Python and Eclipse post,  another Python IDE that you may want to check out is Spyder.

Plotting

The plotting library for python is matplotlib.  Some of the example found in the blog will provide some guidance on importing and using the library.  Matt put together a github repository with several  Matlab and Matplotlib Plotting Examples, you can also find guidance on generating more specialized plots:

Customizing color matrices in matplotlib

Easy vectorized parallel plots for multiple data sets

Interactive plotting basics in matplotlib

Python Data Analysis Part 1a: Borg Runtime Metrics Plots (Preparing the Data) , Part 1b: Setting up Matplotlib and Pandas , Part 2: Pandas / Matplotlib Live Demo.

Miscellaneous  Python tips and tricks

Other applications in Python that my fellow bloggers have found useful are related to machine learning:  Basic Machine Learning in Python with Scikit-learn,  Solving systems of equations: Root finding in MATLAB, R, Python and C++  and using  Python’s template class.

1.2. Matlab

Matlab with its powerful toolkit, easy-to-use IDE and high-level language can be used for quick development as long as you are not concerned about speed.  A major disadvantage of this software is that it is not free … fortunately I have a generous boss paying for it.  Here are examples of Matlab applications available in the blog:

A simple command for plotting autocorrelation functions in Matlab

Plotting Probability Ellipses for Bivariate Normal Distributions

Solving Analytical Algebra/Calculus Expressions with Matlab

Generating .gifs from Matlab Figures

Code Sample: Stacked Bars and Lines in Matlab

1.3. C++

I have heard C++ receive extremely unflattering nicknames lately, it is a low-level language which means that you need to worry about everything, even memory allocation, but the fact is that it is extremely fast and powerful and is widely used in the group for modeling, simulation and optimization purposes which would take forever in other  languages.

Getting started

If you are getting started with C++,there are some online  tutorials , and you may want to check out the following material available in the blog:

Setting up Eclipse for C/C++

Getting started with C and C++

Matt’s Thoughts on C++

Training

Here is some training material that Joe put together:

C++ Training: Libraries , C++ Training: Valgrind ,  C++ Training: Makefiles ,  C++ Training: Exercise 1C++ Training: Using gprofCompiling Code using Makefiles

Debugging

If you are developing code in C++ is probably a good idea to install an IDE,  I recently started using CLion, following Bernardo’s and Dave’s recommendation, and I am not displeased with it.  Here are other posts available within this topic:

Quick testing of code online

Debugging the NWS model: lessons learned

Sample code

If you are looking for sample code of commonly used processes in C++, such as defining vectors and arrays, generating output files and timing functions, here are some examples:

C++: Vectors vs. Arrays

A quick example code to write data to a csv file in C++

Code Sample: Timing Functions for C++

1.4. R

R is another free open source environment widely used for statistics.  Joe recommends a reading in his Programming language R is gaining prominence in the scientific community post.  Downloads are available here.  If you happen to use an R package for you research, here’s some guidance on How to cite packages in R.  R also supports a very nice graphics package and the following posts provide plotting examples:

Survival Function Plots in R

Easy labels for multi-panel plots in R

R plotting examples

Parallel plots in R

1.5. Command line/ Linux:

Getting familiar with the command line and linux environment is essential to perform many of the examples and tutorials available in the blog.  Please check out the   Terminal basics for the truly newbies if you want an introduction to the terminal basics and requirements, also take a look at Using gdb, and notes from the book “Beginning Linux Programming”.   Also check out some useful commands:

Using linux “cut” , Using linux “split” , Using linux “grep”

Useful Linux commands to handle text files and speed up work

Using Linux input redirection in a C++ code test

Emacs in Cygwin

2. Frameworks for optimization, sensitivity analysis, and decision support

We use a variety of free open source libraries to perform commonly used analysis in our research.  Most of the libraries that I outline here were developed by our very own contributors.

2.2. MOEAFramework

I have personally used this framework for most of my research.  It has great functionality and speed. It is an open source Java library that supports several multi-objective evolutionary algorithms (MOEAs) and provides  tools to statistically test their performance.  It has other powerful capabilities for sensitivity and data analysis.   Download and documentation material are available here.  In addition to the documentation and examples provided on the MOEAFramework site, other useful resources and guidance can be found in the following posts:

Setup guidance

MOEAframework on Windows

How to specify constraints in MOEAframework (External Problem)

Additional information on MOEAframework Setup Guide

Extracting data

Extracting Data from Borg Runtime Files

Runtime metrics for MOEAFramework algorithms, extracting metadata from Borg runtime, and handling infinities

Parameter Description Files for the MOEAFramework algorithms and for the Borg MOEA

Other uses

Running Sobol Sensitivity Analysis using MOEAFramework

Speeding up algorithm diagnosis by epsilon-sorting runtime files

2.2. Project Platypus

This is the newest python framework developed by Dave Hadka that support a collection of libraries for optimization, sensitivity analysis, data analysis and decision making.  It’s free to download in the Project Platypus github repository .  The repository comes with its own documentation and examples.  We are barely beginning to document our experiences with this platform 💡, but it is very intuitive and user friendly.  Here is the documentation available in the blog so far:

A simple allocation optimization problem in Platypus

Rhodium – Open Source Python Library for (MO)RDM

Using Rhodium for RDM Analysis of External Dataset

 

 

2.3. OpenMORDM

This is an open source library in R for Many Objective robust decision making (MORDM), for more details and documentation on both MORDM and the library use, check out the following post:

Introducing OpenMORDM

2.4. SALib

SALib is a python library developed by Jon Herman that supports commonly used methods to perform sensitivity analysis.  It is available here, aside from the documentation available in the github repository,  you can also find  guidance on some of the available methods in the following posts:

Method of Morris (Elementary Effects) using SALib

Extensions of SALib for more complex sensitivity analyses

Running Sobol Sensitivity Analysis using SALib

SALib v0.7.1: Group Sampling & Nonuniform Distributions

There’s also an R Package for sentitivity analysis:  Starting out with the R Sensitivity Package.  Since we are on the subject, Jan Kwakkel provides guidance on Scenario discovery in Python as well.

2.5. Pareto sorting function in python (pareto.py)

This is a non-dominated sorting function for multi-objective problems in python available in Matt’s github repository.  You can find more information about it in the following posts:

Announcing version 1.0 of pareto.py

Announcing pareto.py: a free, open-source nondominated sorting utility

3.  Borg MOEA

The Borg Multi-objective Evolutionary Algorithm (MOEA) developed by Dave Hadka and Pat Reed, is widely used in our group due to its ability to tackle complex many-objective problems.   We have plenty of documentation and references in our blog so you can get familiar with it.

3.1. Basic Implementation

You can find a brief introduction and basic use in Basic Borg MOEA use for the truly newbies (Part 1/2) and (Part 2/2).  If you want to link your own simulation model to the optimization algorithm, you may want to check: Compiling, running, and linking a simulation model to Borg: LRGV Example.  Here are other Borg-related posts in the blog:

Basic implementation of the parallel Borg MOEA

Simple Python script to create a command to call the Borg executable

Compiling shared libraries on Windows (32 bit and 64 bit systems)

Collecting Borg’s operator dynamics

3.2. Borg MOEA Wrappers

There are Borg MOEA wrappers available for a number of languages.   Currently the Python, Matlab and Perl wrappers are  documented in the blog.  I believe an updated version of the Borg Matlab wrapper for OSX documentation is required at the moment 💡.

Using Borg in Parallel and Serial with a Python Wrapper – Part 1

Using Borg in Parallel and Serial with a Python Wrapper – Part 2

Setting Borg parameters from the Matlab wrapper

Compiling the Borg Matlab Wrapper (Windows)

Compiling the Borg Matlab Wrapper (OSX/Linux)

Code Sample: Perl wrapper to run Borg with the Variable Infiltration Capacity (VIC) model

4. High performance computing (HPC)

With HPC we can handle and analyse massive amounts of data at high speed. Tasks that would normally take months can be done in days or even minutes and it can help us tackle very complex problems.  In addition, here are some Thoughts on using models with a long evaluation time within a Parallel MOEA framework from Joe.

In the group we have a healthy availability of HPC resources; however, there are some logistics involved when working with computing clusters.  Luckily, most of our contributors have experience using HPC and have documented it in the blog.   Also, I am currently using the  MobaXterm interface to facilitate file transfer between my local and remote directories, it also enables to easily navigate and edit files in your remote directory.  It is used by our collaborators in Politecnico di Milano who recommended it to Julie who then recommended it to me.   Moving on, here are some practical resources when working with remote clusters:

4.1. Getting started with clusters and key commands

Python for automating cluster tasks: Part 1, Getting started and Part 2, More advanced commands

The Cluster and Basic UNIX Commands

Using a local copy of Boost on your cluster account

4.2. Submission scripts in  Bash

Some ideas for your Bash submission scripts

Key bindings for Bash history-search

4.3. Making bash sessions more enjoyable

Speed up your Bash sessions with “alias”

Get more screens … with screen

Running tmux on the cluster

Making ssh better

4.4. Portable Batch System (PBS)

Job dependency in PBS submission

PBS Job Submission with Python

PBS job chaining

Common PBS Batch Options

4.5. Python parallelization and speedup

Introduction to mpi4py

Re-evaluating solutions using Python subprocesses

Speed up your Python code with basic shared memory parallelization

Connecting to an iPython HTML Notebook on the Cluster Using an SSH Tunnel

NumPy vectorized correlation coefficient

4.6. Debugging

Debugging: Interactive techniques and print statements

Debugging MPI By Dave Hadka

4.7. File transfer

Globus Connect for Transferring Files Between Clusters and Your Computer

 

Advertisements

4 thoughts on “Water Programming Blog Guide (Part I)

  1. Quick question that you might want to address is the blogpost (or perhaps other posts such as the fresh Windows install post): which version of software/languages are commonplace in the group?

    An example would be whether Python 3.6 or 2.7 is the standard.

    Thanks!

  2. Pingback: Water Programming Blog Guide (Part 2) – Water Programming: A Collaborative Research Blog

  3. Pingback: Water Programming Blog Guide (3) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s