Water Programming Blog Guide (Part 2)

Water Programming Blog Guide (Part 1)

This second part of the blog guide will cover the following topics:

  1. Version control using git
  2. Generating maps and working with spatial data in python
  3. Reviews on synthetic streamflow and synthetic weather generation
  4. Conceptual posts

1. Version Control using git

If you are developing code it’s worth the time to gain familiarity with git to maintain reliable and stable development.  Git allows a group of people to work together developing large projects minimizing the chaos when multiple people are editing the same files.   It is also valuable for individual projects as it allows you to have multiple versions of a project, show the changes that you have made over time and undo those changes if necessary.  For a quick introduction to git terminology and functionality, check out  Getting Started: Git and GitHub. The Intro to git Part 1: Local version control and  Intro to git Part 2: Remote Repositories  posts will guide you through your first git project (local or remote) while providing a set of useful commands.  Other specialized tips can be found in: Git branch in bash prompt and GitHub Pages. And if you are wondering how to use git with pycharm, you’ll find these couple of posts useful: A Guide to Using Git in PyCharm – Part 1A Guide to Using Git in PyCharm – Part 2

2. Generating maps and working with spatial data in python

To learn more about python’s capabilities on this subject,  this  lecture  summarizes key python libraries relevant for spatial analysis.  Also,  Julie and the Jons have documented their efforts when working with spatial data and with python’s basemap, leaving us with some valuable examples:

Working with raster data

Python – Extract raster data value at a point

Python – Clip raster data with a shapefile

Using arcpy to calculate area-weighted averages of gridded spatial data over political units (Part 1) , (Part 2)

Generating maps

Making Watershed Maps in Python

Plotting geographic data from geojson files using Python

Generating map animations

Python makes the world go ’round

Making Movies of Time-Evolving Global Maps with Python

3. Reviews on synthetic streamflow and weather generation

We are lucky to have thorough reviews on synthetic weather and synthetic streamflow generation written by our experts Julie and Jon L.  The series on synthetic weather generation consists of five parts. Part I and Part II cover parametric and non-parametric methods, respectively. Part III covers multi-site generation.  Part IV discusses how to modify both parametric and non-parametric methods to simulate weather with climate change projections and finally Part V covers how to simulate weather with seasonal climate forecasts:

Synthetic Weather Generation: Part I , Part II , Part III , Part IV , Part V

The synthetic streamflow review provides a historical perspective while answering key questions on “Why do we care about synthetic streamflow generation?  “Why do we use it in water resources planning and management? and “What are the different methods available?

Synthetic streamflow generation

4.  Conceptual posts

Multi-objective evolutionary algorithms (MOEAs)

We frequently use multi-objective evolutionary algorithms due to their power and flexibility to solve multi-objective problems in water resources applications, so you’ll find sufficient documentation in the blog on basic concepts, applications and performance metrics:

MOEAs: Basic Concepts and Reading

You have a problem integrated into your MOEA, now what?

On constraints within MOEAs

MOEA Performance Metrics

Many Objective Robust Decision Making (MORDM) and Problem framing

The next post discusses the MORDM framework which combines many objective evolutionary optimization, robust decision making, and interactive visual analytics to frame and solve many objective problems under uncertainty.  This is a valuable reading along with the references within.  The second post listed provides a systematic way of thinking about problem formulation and defines the key components of a many-objective problem:

Many Objective Robust Decision Making (MORDM): Concepts and Methods

“The Problem” is the Problem Formulation! Definitions and Getting Started

Econometric analysis and handling multi-variate data

To close this second part of the blog guide, I leave you with a couple selected topics  from the Econometrics and Multivariate statistics courses at Cornell documented by Dave Gold:

A visual introduction to data compression through Principle Component Analysis

Dealing With Multicollinearity: A Brief Overview and Introduction to Tolerant Methods

 

Advertisements

Water Programming Blog Guide (Part I)

The Water Programming blog continues to expand collaboratively through contributors’ learning experiences and their willingness to share their knowledge in this blog.  It now covers a wide variety of topics ranging from quick programming tips to comprehensive literature reviews pertinent to water resources research and multi-objective optimization.  This post intends to provide guidance to new, and probably to current users by bringing to light what’s available in the blog and by developing a categorization of topics.

This first post will cover:

Software requirements

1.Programming languages and IDEs

2.Frameworks of optimization, sensitivity analysis and decision support

3.The Borg MOEA

4.Parallel computing

Part II)  will focus  on version control, spatial data and maps, conceptual posts and literature reviews.  And finally Part III)  will cover visualization and figure editing, LaTex and literature management, tutorials and miscellaneous research and programming tricks.

*Note to my fellow bloggers: Feel free to disagree and make suggestions on the categorization of your posts, also your thoughts on facilitating an easier navigation through the blog are very welcomed.  For current contributors,  its always suggested to tag and categorize your post, you can also follow the guidelines in Some WordPress Tips to enable a better organization of our blog.  Also, if you see a 💡, it means that a blog post idea has been identified.

Software Requirements

If you are new to the group and would like to know what kind of software you require to get started with research.  Joe summed it up pretty well in his  New Windows install? Here’s how to get everything set up post, where he points out all the installations that you should have on your desktop.  Additionally, you can find some guidance on:  Software to Install on Personal Computers and Software to Install on Personal Computers – For Mac Users!. You may also want to check out  What do you need to learn?  if you are entering the group.  These posts are subject to update 💡 but they are a good starting point.

1. Programming languages and Integrated Development Environments (IDEs)

Dave Hadka’s Programming Language Overview provides a summary of key differences between the C, C++, Python and Java.  The programming tips found in the blog cover Python, C, C++,  R and Matlab, there are also some specific instances were Java is used which will be discussed in section 2.  I’ll give some guidance on how to get started on each of these programming languages and point out some useful resources in the blog.

1.1. Python

Python is  a very popular programming language in our group so there’s sufficient guidance and resources available in the blog.  Download is available here, also some online tutorials that I really recommend to get you up to speed with Python are:  learn python the hard waypython for everybody and codeacademy.   Additionally,  stackoverflow is a useful resource for specific tasks.  The python resources available in our blog are outlined as follows:

Data analysis and organization

Data analysis libraries that you may want to check out are  scipy,   numpy   and pandas. Here are some related posts:

Importing, Exporting and Organizing Time Series Data in Python – Part 1 and  Part 2

Comparing Data Sets: Are Two Data Files the Same?

Using Python IDEs

The use of an integrated development environment (IDE) can enable code development and  make the debugging process easier.  Tom has done a good amount of development in PyCharm, so he has generated a  sequence of posts that provide guidance on how to take better advantage of PyCharm:

A Guide to Using Git in PyCharm – Part 1 , Part 2

Debugging in Python (using PyCharm) – Part 1 , Part 2 and Part 3

PyCharm as a Python IDE for Generating UML Diagrams

Josh also provides instructions to setup PyDev in eclipse in his Setting up Python and Eclipse post,  another Python IDE that you may want to check out is Spyder.

Plotting

The plotting library for python is matplotlib.  Some of the example found in the blog will provide some guidance on importing and using the library.  Matt put together a github repository with several  Matlab and Matplotlib Plotting Examples, you can also find guidance on generating more specialized plots:

Customizing color matrices in matplotlib

Easy vectorized parallel plots for multiple data sets

Interactive plotting basics in matplotlib

Python Data Analysis Part 1a: Borg Runtime Metrics Plots (Preparing the Data) , Part 1b: Setting up Matplotlib and Pandas , Part 2: Pandas / Matplotlib Live Demo.

Miscellaneous  Python tips and tricks

Other applications in Python that my fellow bloggers have found useful are related to machine learning:  Basic Machine Learning in Python with Scikit-learn,  Solving systems of equations: Root finding in MATLAB, R, Python and C++  and using  Python’s template class.

1.2. Matlab

Matlab with its powerful toolkit, easy-to-use IDE and high-level language can be used for quick development as long as you are not concerned about speed.  A major disadvantage of this software is that it is not free … fortunately I have a generous boss paying for it.  Here are examples of Matlab applications available in the blog:

A simple command for plotting autocorrelation functions in Matlab

Plotting Probability Ellipses for Bivariate Normal Distributions

Solving Analytical Algebra/Calculus Expressions with Matlab

Generating .gifs from Matlab Figures

Code Sample: Stacked Bars and Lines in Matlab

1.3. C++

I have heard C++ receive extremely unflattering nicknames lately, it is a low-level language which means that you need to worry about everything, even memory allocation, but the fact is that it is extremely fast and powerful and is widely used in the group for modeling, simulation and optimization purposes which would take forever in other  languages.

Getting started

If you are getting started with C++,there are some online  tutorials , and you may want to check out the following material available in the blog:

Setting up Eclipse for C/C++

Getting started with C and C++

Matt’s Thoughts on C++

Training

Here is some training material that Joe put together:

C++ Training: Libraries , C++ Training: Valgrind ,  C++ Training: Makefiles ,  C++ Training: Exercise 1C++ Training: Using gprofCompiling Code using Makefiles

Debugging

If you are developing code in C++ is probably a good idea to install an IDE,  I recently started using CLion, following Bernardo’s and Dave’s recommendation, and I am not displeased with it.  Here are other posts available within this topic:

Quick testing of code online

Debugging the NWS model: lessons learned

Sample code

If you are looking for sample code of commonly used processes in C++, such as defining vectors and arrays, generating output files and timing functions, here are some examples:

C++: Vectors vs. Arrays

A quick example code to write data to a csv file in C++

Code Sample: Timing Functions for C++

1.4. R

R is another free open source environment widely used for statistics.  Joe recommends a reading in his Programming language R is gaining prominence in the scientific community post.  Downloads are available here.  If you happen to use an R package for you research, here’s some guidance on How to cite packages in R.  R also supports a very nice graphics package and the following posts provide plotting examples:

Survival Function Plots in R

Easy labels for multi-panel plots in R

R plotting examples

Parallel plots in R

1.5. Command line/ Linux:

Getting familiar with the command line and linux environment is essential to perform many of the examples and tutorials available in the blog.  Please check out the   Terminal basics for the truly newbies if you want an introduction to the terminal basics and requirements, also take a look at Using gdb, and notes from the book “Beginning Linux Programming”.   Also check out some useful commands:

Using linux “cut” , Using linux “split” , Using linux “grep”

Useful Linux commands to handle text files and speed up work

Using Linux input redirection in a C++ code test

Emacs in Cygwin

2. Frameworks for optimization, sensitivity analysis, and decision support

We use a variety of free open source libraries to perform commonly used analysis in our research.  Most of the libraries that I outline here were developed by our very own contributors.

2.2. MOEAFramework

I have personally used this framework for most of my research.  It has great functionality and speed. It is an open source Java library that supports several multi-objective evolutionary algorithms (MOEAs) and provides  tools to statistically test their performance.  It has other powerful capabilities for sensitivity and data analysis.   Download and documentation material are available here.  In addition to the documentation and examples provided on the MOEAFramework site, other useful resources and guidance can be found in the following posts:

Setup guidance

MOEAframework on Windows

How to specify constraints in MOEAframework (External Problem)

Additional information on MOEAframework Setup Guide

Extracting data

Extracting Data from Borg Runtime Files

Runtime metrics for MOEAFramework algorithms, extracting metadata from Borg runtime, and handling infinities

Parameter Description Files for the MOEAFramework algorithms and for the Borg MOEA

Other uses

Running Sobol Sensitivity Analysis using MOEAFramework

Speeding up algorithm diagnosis by epsilon-sorting runtime files

2.2. Project Platypus

This is the newest python framework developed by Dave Hadka that support a collection of libraries for optimization, sensitivity analysis, data analysis and decision making.  It’s free to download in the Project Platypus github repository .  The repository comes with its own documentation and examples.  We are barely beginning to document our experiences with this platform 💡, but it is very intuitive and user friendly.  Here is the documentation available in the blog so far:

A simple allocation optimization problem in Platypus

Rhodium – Open Source Python Library for (MO)RDM

Using Rhodium for RDM Analysis of External Dataset

 

 

2.3. OpenMORDM

This is an open source library in R for Many Objective robust decision making (MORDM), for more details and documentation on both MORDM and the library use, check out the following post:

Introducing OpenMORDM

2.4. SALib

SALib is a python library developed by Jon Herman that supports commonly used methods to perform sensitivity analysis.  It is available here, aside from the documentation available in the github repository,  you can also find  guidance on some of the available methods in the following posts:

Method of Morris (Elementary Effects) using SALib

Extensions of SALib for more complex sensitivity analyses

Running Sobol Sensitivity Analysis using SALib

SALib v0.7.1: Group Sampling & Nonuniform Distributions

There’s also an R Package for sentitivity analysis:  Starting out with the R Sensitivity Package.  Since we are on the subject, Jan Kwakkel provides guidance on Scenario discovery in Python as well.

2.5. Pareto sorting function in python (pareto.py)

This is a non-dominated sorting function for multi-objective problems in python available in Matt’s github repository.  You can find more information about it in the following posts:

Announcing version 1.0 of pareto.py

Announcing pareto.py: a free, open-source nondominated sorting utility

3.  Borg MOEA

The Borg Multi-objective Evolutionary Algorithm (MOEA) developed by Dave Hadka and Pat Reed, is widely used in our group due to its ability to tackle complex many-objective problems.   We have plenty of documentation and references in our blog so you can get familiar with it.

3.1. Basic Implementation

You can find a brief introduction and basic use in Basic Borg MOEA use for the truly newbies (Part 1/2) and (Part 2/2).  If you want to link your own simulation model to the optimization algorithm, you may want to check: Compiling, running, and linking a simulation model to Borg: LRGV Example.  Here are other Borg-related posts in the blog:

Basic implementation of the parallel Borg MOEA

Simple Python script to create a command to call the Borg executable

Compiling shared libraries on Windows (32 bit and 64 bit systems)

Collecting Borg’s operator dynamics

3.2. Borg MOEA Wrappers

There are Borg MOEA wrappers available for a number of languages.   Currently the Python, Matlab and Perl wrappers are  documented in the blog.  I believe an updated version of the Borg Matlab wrapper for OSX documentation is required at the moment 💡.

Using Borg in Parallel and Serial with a Python Wrapper – Part 1

Using Borg in Parallel and Serial with a Python Wrapper – Part 2

Setting Borg parameters from the Matlab wrapper

Compiling the Borg Matlab Wrapper (Windows)

Compiling the Borg Matlab Wrapper (OSX/Linux)

Code Sample: Perl wrapper to run Borg with the Variable Infiltration Capacity (VIC) model

4. High performance computing (HPC)

With HPC we can handle and analyse massive amounts of data at high speed. Tasks that would normally take months can be done in days or even minutes and it can help us tackle very complex problems.  In addition, here are some Thoughts on using models with a long evaluation time within a Parallel MOEA framework from Joe.

In the group we have a healthy availability of HPC resources; however, there are some logistics involved when working with computing clusters.  Luckily, most of our contributors have experience using HPC and have documented it in the blog.   Also, I am currently using the  MobaXterm interface to facilitate file transfer between my local and remote directories, it also enables to easily navigate and edit files in your remote directory.  It is used by our collaborators in Politecnico di Milano who recommended it to Julie who then recommended it to me.   Moving on, here are some practical resources when working with remote clusters:

4.1. Getting started with clusters and key commands

Python for automating cluster tasks: Part 1, Getting started and Part 2, More advanced commands

The Cluster and Basic UNIX Commands

Using a local copy of Boost on your cluster account

4.2. Submission scripts in  Bash

Some ideas for your Bash submission scripts

Key bindings for Bash history-search

4.3. Making bash sessions more enjoyable

Speed up your Bash sessions with “alias”

Get more screens … with screen

Running tmux on the cluster

Making ssh better

4.4. Portable Batch System (PBS)

Job dependency in PBS submission

PBS Job Submission with Python

PBS job chaining

Common PBS Batch Options

4.5. Python parallelization and speedup

Introduction to mpi4py

Re-evaluating solutions using Python subprocesses

Speed up your Python code with basic shared memory parallelization

Connecting to an iPython HTML Notebook on the Cluster Using an SSH Tunnel

NumPy vectorized correlation coefficient

4.6. Debugging

Debugging: Interactive techniques and print statements

Debugging MPI By Dave Hadka

4.7. File transfer

Globus Connect for Transferring Files Between Clusters and Your Computer