The Water Programming blog continues to grow collaboratively through contributors’ learning experiences and their willingness to share their knowledge. It now covers a wide variety of topics, ranging from quick programming tips to comprehensive literature reviews pertinent to water resources research and multi-objective optimization. This post aims to guide new (and perhaps current) users by highlighting what is available in the blog and organizing it into categories.
This first post will cover:
Software requirements
1. Programming languages and IDEs
2. Frameworks for optimization, sensitivity analysis and decision support
3. The Borg MOEA
4. High performance computing (HPC)
Part II will focus on version control, spatial data and maps, conceptual posts and literature reviews. Finally, Part III will cover visualization and figure editing, LaTeX and literature management, tutorials, and miscellaneous research and programming tricks.
*Note to my fellow bloggers: feel free to disagree with the categorization of your posts and to suggest changes. Your thoughts on making the blog easier to navigate are also very welcome. Current contributors are encouraged to tag and categorize their posts, and can follow the guidelines in Some WordPress Tips to keep the blog well organized. Also, if you see a 💡, it means that a blog post idea has been identified.
Software Requirements
If you are new to the group and would like to know what software you need to get started with research, Joe summed it up pretty well in his New Windows install? Here’s how to get everything set up post, where he points out all the installations that you should have on your desktop. Additionally, you can find some guidance in Software to Install on Personal Computers and Software to Install on Personal Computers – For Mac Users!. You may also want to check out What do you need to learn? if you are joining the group. These posts may need updating 💡, but they are a good starting point.
1. Programming languages and Integrated Development Environments (IDEs)
Dave Hadka’s Programming Language Overview summarizes the key differences between C, C++, Python and Java. The programming tips in the blog cover Python, C, C++, R and Matlab. There are also some specific instances where Java is used, and these are discussed in Section 2. I’ll give some guidance on how to get started with each of these languages and point out some useful resources in the blog.
1.1. Python
Python is a very popular programming language in our group, so there are plenty of guidance and resources available in the blog. Python can be downloaded here. Some online tutorials that I really recommend to get you up to speed are Learn Python the Hard Way, Python for Everybody and Codecademy. Additionally, Stack Overflow is a useful resource for specific tasks. The Python resources available in our blog are outlined as follows:
Data analysis and organization
Data analysis libraries that you may want to check out are SciPy, NumPy and pandas. Here are some related posts:
Importing, Exporting and Organizing Time Series Data in Python – Part 1 and Part 2
Comparing Data Sets: Are Two Data Files the Same?
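A quick way to check whether two data files are byte-for-byte identical is to compare checksums. The sketch below is a minimal, stdlib-only illustration, not the approach from the post above. The helper names `file_sha256` and `same_contents` are hypothetical, and files that differ only in whitespace or number formatting will be reported as different.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if two files are byte-for-byte identical (hypothetical helper)."""
    # Files of different sizes cannot match, so skip hashing in that case
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False
    return file_sha256(path_a) == file_sha256(path_b)
```

For a one-off local check, the standard library's `filecmp.cmp(path_a, path_b, shallow=False)` also compares contents. Checksums become handy when the two files live on different machines, such as your laptop and a cluster, because you only need to compare the digests.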
Using Python IDEs
Using an integrated development environment (IDE) can speed up code development and make debugging easier. Tom has done a good amount of development in PyCharm, and he has written a series of posts on how to take better advantage of it:
A Guide to Using Git in PyCharm – Part 1, Part 2
Debugging in Python (using PyCharm) – Part 1, Part 2 and Part 3
PyCharm as a Python IDE for Generating UML Diagrams
Josh also provides instructions for setting up PyDev in Eclipse in his Setting up Python and Eclipse post. Another Python IDE that you may want to check out is Spyder.
Plotting
The most widely used plotting library for Python is matplotlib, and several of the examples in the blog show how to import and use it. Matt put together a GitHub repository with several Matlab and Matplotlib Plotting Examples. You can also find guidance on generating more specialized plots:
Customizing color matrices in matplotlib
Easy vectorized parallel plots for multiple data sets
Interactive plotting basics in matplotlib
Python Data Analysis Part 1a: Borg Runtime Metrics Plots (Preparing the Data), Part 1b: Setting up Matplotlib and Pandas, Part 2: Pandas / Matplotlib Live Demo.
Miscellaneous Python tips and tricks
Other Python applications that my fellow bloggers have found useful include machine learning (Basic Machine Learning in Python with Scikit-learn), root finding (Solving systems of equations: Root finding in MATLAB, R, Python and C++) and using Python’s template class.
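The root-finding post compares approaches across several languages. As a reminder of how the simplest bracketing method works, here is a stdlib-only bisection sketch in Python. The function name `bisection_root` and its default tolerances are illustrative choices, not taken from that post. For real work, SciPy's `scipy.optimize.brentq` (single equations) and `scipy.optimize.fsolve` (systems) are the usual tools.

```python
def bisection_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi]; f(lo) and f(hi) must have opposite signs."""
    f_lo, f_hi = f(lo), f(hi)
    # An endpoint may already be a root
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop once the half-width of the bracket is below tol, or mid is exactly a root
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep whichever half still brackets the sign change
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

For example, `bisection_root(lambda x: x**3 - x - 2, 1.0, 2.0)` brackets the single real root of that cubic. Bisection is slower than Newton-type methods, but it cannot diverge once a sign change is bracketed.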
1.2. Matlab
Matlab, with its powerful toolboxes, easy-to-use IDE and high-level language, can be used for quick development as long as you are not too concerned about speed. A major disadvantage of this software is that it is not free… fortunately, I have a generous boss paying for it. Here are examples of Matlab applications available in the blog:
A simple command for plotting autocorrelation functions in Matlab
Plotting Probability Ellipses for Bivariate Normal Distributions
Solving Analytical Algebra/Calculus Expressions with Matlab
Generating .gifs from Matlab Figures
Code Sample: Stacked Bars and Lines in Matlab
1.3. C++
C++ has received some extremely unflattering nicknames lately. It is a relatively low-level language, which means that you need to worry about almost everything, even memory allocation. However, it is extremely fast and powerful. It is widely used in the group for modeling, simulation and optimization tasks that would take forever in other languages.
Getting started
If you are getting started with C++, there are some online tutorials, and you may want to check out the following material available in the blog:
Setting up Eclipse for C/C++
Getting started with C and C++
Matt’s Thoughts on C++
Training
Here is some training material that Joe put together:
C++ Training: Libraries, C++ Training: Valgrind, C++ Training: Makefiles, C++ Training: Exercise 1, C++ Training: Using gprof, Compiling Code using Makefiles
Debugging
If you are developing code in C++, it is probably a good idea to install an IDE. I recently started using CLion, following Bernardo’s and Dave’s recommendation, and so far I have no complaints. Here are other posts on this topic:
Quick testing of code online
Debugging the NWS model: lessons learned
Sample code
If you are looking for sample code of commonly used processes in C++, such as defining vectors and arrays, generating output files and timing functions, here are some examples:
C++: Vectors vs. Arrays
A quick example code to write data to a csv file in C++
Code Sample: Timing Functions for C++
1.4. R
R is another free, open source environment, widely used for statistics. Joe recommends some reading in his Programming language R is gaining prominence in the scientific community post. Downloads are available here. If you use an R package in your research, here’s some guidance on How to cite packages in R. R also has very nice graphics capabilities, and the following posts provide plotting examples:
Survival Function Plots in R
Easy labels for multi-panel plots in R
R plotting examples
Parallel plots in R
1.5. Command line/Linux
Getting familiar with the command line and the Linux environment is essential for many of the examples and tutorials available in the blog. Check out Terminal basics for the truly newbies for an introduction to the terminal and its requirements. Also take a look at Using gdb, and notes from the book “Beginning Linux Programming”. Here are some posts on useful commands:
Using linux “cut”, Using linux “split”, Using linux “grep”
Useful Linux commands to handle text files and speed up work
Using Linux input redirection in a C++ code test
Emacs in Cygwin
2. Frameworks for optimization, sensitivity analysis, and decision support
We use a variety of free, open source libraries to perform commonly used analyses in our research. Most of the libraries that I outline here were developed by our very own contributors.
2.1. MOEAFramework
I have personally used this framework for most of my research. It is an open source Java library that supports several multi-objective evolutionary algorithms (MOEAs) and provides tools to statistically test their performance. It is fast and full-featured, with other powerful capabilities for sensitivity and data analysis. Downloads and documentation are available here. In addition to the documentation and examples provided on the MOEAFramework site, other useful resources and guidance can be found in the following posts:
Setup guidance
MOEAframework on Windows
How to specify constraints in MOEAframework (External Problem)
Additional information on MOEAframework Setup Guide
Extracting data
Extracting Data from Borg Runtime Files
Runtime metrics for MOEAFramework algorithms, extracting metadata from Borg runtime, and handling infinities
Parameter Description Files for the MOEAFramework algorithms and for the Borg MOEA
Other uses
Running Sobol Sensitivity Analysis using MOEAFramework
Speeding up algorithm diagnosis by epsilon-sorting runtime files
2.2. Project Platypus
Project Platypus is the newest Python framework developed by Dave Hadka. It supports a collection of libraries for optimization, sensitivity analysis, data analysis and decision making. It is free to download from the Project Platypus GitHub repository, which comes with its own documentation and examples. We have only just begun to document our experiences with this platform 💡, but it is very intuitive and user friendly. Here is what has been documented in the blog so far:
A simple allocation optimization problem in Platypus
Rhodium – Open Source Python Library for (MO)RDM
Using Rhodium for RDM Analysis of External Dataset
2.3. OpenMORDM
OpenMORDM is an open source R library for many-objective robust decision making (MORDM). For more details and documentation on both MORDM and the library, check out the following post:
Introducing OpenMORDM
2.4. SALib
SALib is a Python library developed by Jon Herman that implements commonly used sensitivity analysis methods. It is available here. Aside from the documentation in the GitHub repository, you can also find guidance on some of the available methods in the following posts:
Method of Morris (Elementary Effects) using SALib
Extensions of SALib for more complex sensitivity analyses
Running Sobol Sensitivity Analysis using SALib
SALib v0.7.1: Group Sampling & Nonuniform Distributions
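The idea behind the Method of Morris can be illustrated directly. An elementary effect is the change in model output when a single input is nudged by a step delta, scaled by that step. The mean absolute elementary effect (mu*) ranks input importance, and the standard deviation (sigma) hints at nonlinearity or interactions. The stdlib-only sketch below uses a simple radial one-at-a-time design rather than the trajectory design that SALib implements. The function `elementary_effects` and its defaults are hypothetical, so use SALib itself for real analyses.

```python
import random
import statistics


def elementary_effects(model, bounds, num_base_points=20, delta=0.1, seed=None):
    """Estimate Morris-style mu* and sigma with a radial one-at-a-time design.

    model  -- function taking a list of inputs and returning a scalar
    bounds -- list of (low, high) tuples, one per input
    delta  -- step size as a fraction of each input's range
    """
    rng = random.Random(seed)
    effects = [[] for _ in bounds]
    for _ in range(num_base_points):
        # Sample a base point, leaving room for the +delta step in every input
        base = [lo + rng.random() * (1 - delta) * (hi - lo) for lo, hi in bounds]
        y_base = model(base)
        for i, (lo, hi) in enumerate(bounds):
            perturbed = list(base)
            perturbed[i] += delta * (hi - lo)
            # Elementary effect: output change per normalized change in input i
            effects[i].append((model(perturbed) - y_base) / delta)
    mu_star = [statistics.mean(abs(e) for e in ee) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma
```

For a linear model such as `2 * x[0] + 0.5 * x[1]` on unit bounds, mu* recovers the coefficients and sigma is essentially zero. A nonlinear term shows up as a nonzero sigma.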
There is also an R package for sensitivity analysis: Starting out with the R Sensitivity Package. And since we are on the subject, Jan Kwakkel provides guidance on Scenario discovery in Python as well.
2.5. Pareto sorting function in python (pareto.py)
pareto.py is a nondominated sorting utility for multi-objective problems, written in Python and available in Matt’s GitHub repository. You can find more information about it in the following posts:
Announcing version 1.0 of pareto.py
Announcing pareto.py: a free, open-source nondominated sorting utility
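The core idea of nondominated sorting fits in a few lines of plain Python. This brute-force O(n²) version assumes that all objectives are minimized. It is only an illustration of the concept, not pareto.py's interface, so use pareto.py itself on real output files.

```python
def dominates(a, b):
    """True if a Pareto-dominates b: no worse in every objective, better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(solutions):
    """Return the solutions that no other solution dominates (all objectives minimized)."""
    # dominates(s, s) is always False, so comparing s with itself is harmless
    return [s for s in solutions if not any(dominates(other, s) for other in solutions)]
```

For example, among `(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)`, the last two are dominated by `(2, 2)` and are dropped. Identical points do not dominate each other, so duplicates are all kept.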
3. Borg MOEA
The Borg Multi-objective Evolutionary Algorithm (MOEA), developed by Dave Hadka and Pat Reed, is widely used in our group due to its ability to tackle complex many-objective problems. The blog has plenty of documentation and references to help you get familiar with it.
3.1. Basic Implementation
You can find a brief introduction to basic use in Basic Borg MOEA use for the truly newbies (Part 1/2) and (Part 2/2). If you want to link your own simulation model to the optimization algorithm, you may want to check out Compiling, running, and linking a simulation model to Borg: LRGV Example. Here are other Borg-related posts in the blog:
Basic implementation of the parallel Borg MOEA
Simple Python script to create a command to call the Borg executable
Compiling shared libraries on Windows (32 bit and 64 bit systems)
Collecting Borg’s operator dynamics
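Whichever interface you use, Borg needs a function that maps decision variables to objective values. As an illustration, here is the DTLZ2 benchmark, a standard many-objective test problem chosen here only as an example, written in plain Python. How such a function is registered with the Borg executable or a wrapper is covered in the posts above and is not shown. If you use the Python wrapper, check its documentation for the exact calling convention it expects, for example whether variables arrive as a list or as separate arguments.

```python
import math


def dtlz2(x, nobjs=2):
    """DTLZ2 benchmark: return nobjs objectives (to minimize) for variables x in [0, 1]."""
    if len(x) < nobjs:
        raise ValueError("DTLZ2 needs at least as many variables as objectives")
    # g uses the last k = len(x) - nobjs + 1 "distance" variables; g = 0 on the Pareto front
    g = sum((xi - 0.5) ** 2 for xi in x[nobjs - 1:])
    objs = []
    for m in range(nobjs):
        f = 1.0 + g
        # Product of cosines over the first nobjs - 1 - m "position" variables
        for xj in x[:nobjs - 1 - m]:
            f *= math.cos(0.5 * math.pi * xj)
        if m > 0:
            f *= math.sin(0.5 * math.pi * x[nobjs - 1 - m])
        objs.append(f)
    return objs
```

On the true Pareto front (all distance variables equal to 0.5), the objectives lie on the unit sphere, which makes DTLZ2 a convenient sanity check before you plug in your own simulation model.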
3.2. Borg MOEA Wrappers
Borg MOEA wrappers are available for a number of languages. Currently, the Python, Matlab and Perl wrappers are documented in the blog. I believe the documentation for the Borg Matlab wrapper on OSX needs to be updated 💡.
Using Borg in Parallel and Serial with a Python Wrapper – Part 1
Using Borg in Parallel and Serial with a Python Wrapper – Part 2
Setting Borg parameters from the Matlab wrapper
Compiling the Borg Matlab Wrapper (Windows)
Compiling the Borg Matlab Wrapper (OSX/Linux)
Code Sample: Perl wrapper to run Borg with the Variable Infiltration Capacity (VIC) model
4. High performance computing (HPC)
With HPC we can handle and analyse massive amounts of data at high speed. Tasks that would normally take months can be done in days or even minutes, which helps us tackle very complex problems. Joe also shares some Thoughts on using models with a long evaluation time within a Parallel MOEA framework.
Our group has access to plenty of HPC resources. However, working with computing clusters involves some logistics. Luckily, most of our contributors have experience using HPC and have documented it in the blog. I currently use the MobaXterm interface to transfer files between my local and remote directories. It also makes it easy to navigate and edit files in your remote directory. It is used by our collaborators at Politecnico di Milano, who recommended it to Julie, who then recommended it to me. Moving on, here are some practical resources for working with remote clusters:
4.1. Getting started with clusters and key commands
Python for automating cluster tasks: Part 1, Getting started and Part 2, More advanced commands
The Cluster and Basic UNIX Commands
Using a local copy of Boost on your cluster account
4.2. Submission scripts in Bash
Some ideas for your Bash submission scripts
Key bindings for Bash history-search
4.3. Making bash sessions more enjoyable
Speed up your Bash sessions with “alias”
Get more screens … with screen
Running tmux on the cluster
Making ssh better
4.4. Portable Batch System (PBS)
Job dependency in PBS submission
PBS Job Submission with Python
PBS job chaining
Common PBS Batch Options
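A common pattern in the posts above is to generate PBS submission scripts from Python instead of writing each one by hand. The sketch below builds a script as a string using standard PBS directives (`#PBS -N`, `-l walltime`, `-j oe`) and the Torque-style `nodes=...:ppn=...` resource syntax. The function names, default resources and the example command `./my_model` are hypothetical assumptions. Adjust them to your cluster's documentation before submitting with `qsub`.

```python
import subprocess


def make_pbs_script(job_name, command, nodes=1, ppn=16, walltime="01:00:00"):
    """Return the text of a PBS submission script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node (Torque syntax)
        f"#PBS -l walltime={walltime}",      # maximum run time, hh:mm:ss
        "#PBS -j oe",                        # merge stdout and stderr into one file
        "cd $PBS_O_WORKDIR",                 # run from the directory qsub was called in
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_text, path="job.pbs"):
    """Write the script to disk, submit it with qsub and return qsub's output (the job ID)."""
    with open(path, "w") as handle:
        handle.write(script_text)
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Preview one script per random seed; call submit(script) on the cluster to queue them
    for seed in range(1, 4):
        print(make_pbs_script(f"model_seed{seed}", f"./my_model {seed}"))
```

Keeping script generation separate from submission lets you inspect the scripts locally before anything reaches the queue.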
4.5. Python parallelization and speedup
Introduction to mpi4py
Re-evaluating solutions using Python subprocesses
Speed up your Python code with basic shared memory parallelization
Connecting to an iPython HTML Notebook on the Cluster Using an SSH Tunnel
NumPy vectorized correlation coefficient
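For embarrassingly parallel work on a single node, such as re-evaluating many solutions, Python's standard-library `multiprocessing.Pool` is often enough. Work that spans several nodes is where mpi4py comes in. In the illustrative sketch below, `evaluate` is a hypothetical stand-in for an expensive model run. The `if __name__ == "__main__":` guard is required on platforms that start worker processes by spawning rather than forking, such as Windows and macOS.

```python
import math
import multiprocessing


def evaluate(solution):
    """Hypothetical stand-in for an expensive model evaluation."""
    return sum(math.sin(x) ** 2 for x in solution)


def evaluate_all(solutions, processes=None):
    """Evaluate solutions in parallel; processes=None uses all available CPUs."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map preserves input order, so results line up with solutions
        return pool.map(evaluate, solutions)


if __name__ == "__main__":
    candidates = [[i * 0.1, i * 0.2, i * 0.3] for i in range(100)]
    results = evaluate_all(candidates, processes=4)
    print(len(results), max(results))
```

The worker function must be defined at module level so that it can be pickled and sent to the worker processes. This is also why a lambda cannot be passed to `pool.map`.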
4.6. Debugging
Debugging: Interactive techniques and print statements
Debugging MPI By Dave Hadka
4.7. File transfer
Globus Connect for Transferring Files Between Clusters and Your Computer