Installing EPA SWMM on a Mac using WineBottler

A while ago, I posted a video tutorial on how to use WineBottler to install some engineering software on a Mac that often only works on Windows. I just realized I never posted it here, so the video is embedded below! YouTube also has a more general tutorial on WineBottler that is linked here.

We have had some success using this approach for HEC and EPA programs, but sometimes there are issues that prevent it from working. It is nice when it does work, though!

Python example: Hardy Cross method for pipe networks

I teach a class called Water Resources Engineering at the University of Colorado Boulder, and I have often added examples where students use Python to perform the calculations.  For those of you who are new to programming, you might get a kick out of this example, since it goes through an entire procedure for solving an engineering problem using Python and the numpy library.  The example is two YouTube videos, embedded below! The best way to watch them, though, is in full screen: click the name of the video to open it in a new YouTube tab in your browser.
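As a small taste of what the videos cover, here is a minimal Hardy Cross sketch for a single loop of two parallel pipes, written in plain Python (the head loss coefficients and total flow are made-up numbers for illustration):

```python
# Hardy Cross iteration for one loop: two parallel pipes share a total flow.
# Head loss in each pipe is modeled as h = K * Q**2 (K values are hypothetical).
K1, K2 = 1.0, 4.0   # head loss coefficients for pipes 1 and 2
Q_total = 3.0       # total flow entering the loop

Q1 = Q_total / 2    # initial guess for pipe 1; pipe 2 carries the remainder
for _ in range(50):
    Q2 = Q_total - Q1
    # Net head loss around the loop (pipe 2 is traversed against its flow)
    h_loop = K1 * Q1**2 - K2 * Q2**2
    # Hardy Cross correction: dQ = -sum(h) / sum(2 * K * |Q|)
    dQ = -h_loop / (2 * K1 * abs(Q1) + 2 * K2 * abs(Q2))
    Q1 += dQ
    if abs(h_loop) < 1e-10:
        break

print(Q1, Q_total - Q1)  # converges to Q1 = 2.0, Q2 = 1.0 for these values
```

At convergence the head losses in the two parallel pipes balance, which is exactly the loop condition the method enforces.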

Let us know if you have any questions in the comments section below. For more on Python from this blog, search “Python” in the search box.

Using a virtual machine to run 32-bit software on a modern PC

In this post, I’ll talk about how to set up a virtual machine on a PC, in order to run outdated software that may have been optimized for a different version of Windows. For example, a collaborator of mine uses the EPA Water Treatment Plant model which only seems to work under 32-bit versions of the operating system.

Continue reading

Simple Python script to create a command to call the Borg executable

Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within the ‘jrkasprzyk’ GitHub account.

It saves you having to type out a long Borg command when using the ‘command line interface’, especially when there are many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon value of an objective, and Decision contains the name, lower bound, and upper bound of a decision variable.

You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python, such as sampling of the decision space and different sorts of processing of MOEA output.
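To give a flavor of the idea, here is a hypothetical sketch of how such objects and the command assembly might fit together (the class layout and Borg flags are illustrative, not the actual repository code):

```python
# Hypothetical sketch: the class and flag names mirror the post's description,
# not the actual repository code or the exact Borg flags on your system.
class Objective:
    """Name and epsilon resolution of one objective."""
    def __init__(self, name, epsilon):
        self.name = name
        self.epsilon = epsilon

class Decision:
    """Name and bounds of one decision variable."""
    def __init__(self, name, lower, upper):
        self.name = name
        self.lower = lower
        self.upper = upper

def borg_command(executable, objectives, decisions):
    """Assemble the long command-line call from the problem definition."""
    eps = ",".join(str(o.epsilon) for o in objectives)
    lowers = ",".join(str(d.lower) for d in decisions)
    uppers = ",".join(str(d.upper) for d in decisions)
    return (f"{executable} -v {len(decisions)} -o {len(objectives)} "
            f"-e {eps} -l {lowers} -u {uppers}")

objs = [Objective("cost", 0.01), Objective("reliability", 0.001)]
decs = [Decision(f"x{i}", 0.0, 1.0) for i in range(3)]
print(borg_command("./borg.exe", objs, decs))
```

With many decision variables, generating the bounds strings programmatically is the whole point: you define the problem once, in one place, and the long command writes itself.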

On constraints within MOEAs

When thinking about setting up a problem formulation for an MOEA like Borg, it’s helpful to spend a minute (or 15!) talking about our friend the constraint.  I’ve written about problem formulations in the past, and we even have a video about problem formulations, but there’s a subtlety that I thought would be helpful to discuss again here!

A comment about constraints in classical mathematical programming formulations

Classical mathematical programming formulations are chock-full of constraints.  Hundreds of constraints, even!  This is because, when you are setting up a linear program (LP) or something similar, the constraint set is what gives the program some amount of realism.  The classical linear programming water allocation problem (for example, something you would find in the classic ReVelle et al. textbook) looks something like:

Decisions: water allocations to different actors in different periods of time

Objective: minimize the cost of the allocation plan, or maximize the net benefits of the plan

Constraints:

  1. Ensure that demands are met

  2. Ensure that, at each supply node, inflow equals outflow

  3. Ensure that the capacity of the conveyance system is not exceeded

  4. Ensure that all decision variables are greater than or equal to zero

  5. Keep the water quality values above or below some threshold

We have proposed five categories of constraints, and there may be even more.  But mathematically, this simple formulation above could have hundreds of constraints, once each equation is written out for every time period or spatial location.

This is perfectly fine!  If you are solving a LP, I encourage you to put more constraints in there because constraints are really the only way that you can make your solutions look more physically accurate.  LP theory will tell you that, if there is one or more optimal solutions, at least one of them will be at the intersection of two or more of the constraints.  So, the constraints are important (more on this later).
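To make this concrete, here is a toy instance of the allocation problem above, solved with scipy’s linprog (all of the costs, the demand, and the capacities are made-up numbers):

```python
from scipy.optimize import linprog

# Toy water allocation LP (hypothetical numbers):
#   minimize 2*x1 + 3*x2     (cost of the allocation plan)
#   s.t.     x1 + x2 >= 10   (demand is met)
#            0 <= x1 <= 6    (conveyance capacity, non-negativity)
#            0 <= x2 <= 8
res = linprog(c=[2, 3],
              A_ub=[[-1, -1]], b_ub=[-10],  # demand, flipped to <= form
              bounds=[(0, 6), (0, 8)])
print(res.x, res.fun)  # the cheap source is used to capacity: [6, 4], cost 24
```

Notice that the optimum sits exactly where the demand constraint meets the capacity bound on the cheap source, which is the corner-point behavior the LP theory above describes.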

How the simulation-optimization of MOEAs is different

The MOEA approach we talk about on this blog is a simulation-optimization approach.  What that means is that instead of having a set of a few hundred constraints to model your problem, you are actually using a simulation model of the system.  This is going to fundamentally change the way that your decisions and constraints are defined, and it may even change your objectives.  Let’s talk about each in turn, using our simple water allocation problem as, well, an example.

Decisions: It would probably make more sense to optimize rules about how the water is allocated, instead of actually treating each decision variable as a separate volume of water over time.  There are a number of reasons why this is.  In the formulation above, if each decision variable is an allocation of water in a particular period of time, once you start to plan over long planning horizons, the problem will get unwieldy, with thousands of variables.  Plus, if you say that decision variable x_83 is the allocation of water to user 5, at t = 3, then how are you to plan what happens if you don’t have accurate or deterministic data for what happens at t = 3?  But that’s really the purpose of another post.  We’re here to talk about:

Constraints: So, in the above formulation, constraints were designed to give physical reality to your system.  For example, if the decision variables are volumes of water, and you have a basic estimate of the concentration of some pollutant, you can create a constraint that says the number of kilograms of that pollutant should be less than a particular value.  In that fashion, you can think of hundreds of constraints, and we just talked about this, right?

In the simulation-optimization approach, though, the simulation model provides the physical reality of the system, not the constraint set.  So it is better to treat constraints as limits on acceptable performance.  For example, if you are calculating reliability, you can say that all solutions should have at least some lower threshold of reliability (say, 98%).  Or, don’t have constraints at all and instead just treat the value as an objective!  The point of doing tradeoff analysis with an MOEA is to find a diversity of solutions, after all.  If you find that the performance of the solutions is too poor, or that you have too many solutions, you can impose constraints next time.  Or, you can also change the epsilon resolution within Borg in order to change the number of points in the tradeoff set; see this paper by Kollat and Reed, where they showed the same tradeoff set at different resolutions, for example.

Constraints can also help with the representation of decisions within your formulation.  After all, the most basic constraint in any LP is that xi >= 0, right?  But think about creative ways to change the representation of decision variables such that all of the algorithm’s solutions are feasible.  This will greatly help the search process!

For example, consider a problem where you have to create drought restriction plans at different levels.  In other words, imagine that the value of restriction at level 2 needs to be greater than the value at level 1.  Mathematically, you could set this up as a constraint (x2 >= x1), but it will be very difficult to meet it, especially with the initially random population within an MOEA, and especially when you have, say, 5, 10, or 100 of these levels, each of which needs to be greater than the one below it.  So, instead, you can do something like this.  Let xi′ be the decision variable in the algorithm, and xi (without a prime) be the actual decision variable used in your model.  Then, let:

x1 = x1′

x2 = x1 + x2′

The algorithm actually searches on x1′ and x2′.  Now, if you set your bounds up correctly, the algorithm can never generate an infeasible solution, and you are spending all your computational resources within search actually finding good, feasible solutions instead of just trying to find a single set of feasible solutions with which to work.  Something similar to this was used in Zeff et al. (2014).
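A minimal sketch of this reparameterization, generalized to any number of restriction levels (the function name is mine, not from Zeff et al.):

```python
def to_restriction_levels(increments):
    """Map the algorithm's decision variables (non-negative increments, the
    primed x' values) to the model's monotone restriction levels x."""
    levels, total = [], 0.0
    for inc in increments:
        total += inc
        levels.append(total)
    return levels

# The MOEA searches on the increments; as long as their lower bounds are
# zero, any candidate it generates maps to a non-decreasing set of
# restriction levels, so no x2 >= x1 constraint is ever needed.
print(to_restriction_levels([1.0, 2.0, 4.0]))  # [1.0, 3.0, 7.0]
```

The design choice here is to make infeasibility unrepresentable: the algorithm cannot even express a solution that violates the ordering, so all of its function evaluations go toward finding good solutions.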

Some comments on how constraint handling works within Borg

The concepts of epsilon non-domination, and non-domination, are used to determine whether solutions survive within the search process.  The general concept of non-domination is, informally, that one solution is better than another in at least one objective, and at least as good in all other objectives.  But that’s not what we’re here to talk about!  We are here to talk about constraints, remember? 🙂

Well, remember that the definition of a feasible or infeasible solution is the same in LP-land as it is for us.  Feasible solutions meet all constraints; infeasible solutions violate one or more constraints.  An infeasible solution is defined to be unacceptable to the decision maker.  This is an extremely important point to remember, so it is worth repeating!  An infeasible solution is unacceptable.  Because of this, it’s up to you to set up the constraints to keep acceptable solutions and throw away unacceptable (infeasible) solutions.

However, there is an important distinction between, say, an LP and what we do.  Remember the simplex algorithm for solving LPs?  It intelligently finds a corner of the feasible space (defined by your constraint set), and then it walks from corner to corner, checking each one to determine whether it is optimal.  Notice what’s going on here, though.  Most of the time, at least for a simple problem, it is easy to find a feasible solution in the simplex method, and then you are off to the races.  If you can’t find a feasible solution, you’re in a lot of trouble!

Similarly, you also need to find one feasible solution to get started with MOEA search…preferably you’ll have many, many feasible solutions!  But what if it is difficult to find a feasible solution?  This is going to impact your ability to find solutions, and if your function evaluation count, or search duration, is not long enough, you can end up with a failed search.

So let’s get to the point of this section, which is explaining how constraint handling works within Borg.  As explained in Reed et al., most of the algorithms we’ve typically tested within this blogosphere use restricted tournament selection. There are three conditions under which a solution a can dominate a solution b:

  1. Both solutions are feasible, and based on objective function performance, a dominates b

  2. Solution a is feasible, and solution b is infeasible.

  3. Both a and b are infeasible, and the sum of a’s constraint violations is less than the sum of b’s constraint violations

So as you can see, item 3 is very interesting!  What item 3 means is that Borg and these other algorithms will actually keep some infeasible solutions in the population or archive while all solutions are infeasible.  This is important because it helps move the population toward eventually finding a feasible region of the decision space!  But, according to item 2, any feasible solution will beat an infeasible solution, which is consistent with the fact that infeasible solutions are unacceptable (and oh gosh, I already said that a few times, didn’t I!)
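The three rules can be sketched as a small comparison function (a sketch assuming minimized objectives and a pre-summed constraint violation per solution, not Borg’s actual implementation):

```python
def dominates(a_objs, a_viol, b_objs, b_viol):
    """Return True if solution a beats solution b under the three rules above
    (all objectives minimized; viol is the summed constraint violation)."""
    # Rule 2: a feasible solution always beats an infeasible one.
    if a_viol == 0 and b_viol > 0:
        return True
    if a_viol > 0 and b_viol == 0:
        return False
    # Rule 3: between two infeasible solutions, less total violation wins.
    if a_viol > 0 and b_viol > 0:
        return a_viol < b_viol
    # Rule 1: both feasible -- standard Pareto dominance on the objectives.
    at_least_as_good = all(x <= y for x, y in zip(a_objs, b_objs))
    strictly_better = any(x < y for x, y in zip(a_objs, b_objs))
    return at_least_as_good and strictly_better
```

Note how rule 3 only ever compares violation totals: the objective values of two infeasible solutions never enter the comparison at all, which is what justifies recommendation 3 below.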

By the way, you can also use penalty functions to handle constraints, instead of using this built-in restricted tournament selection feature.  An example of that is in the Kollat and Reed paper that I already linked to.  Plus there are other ways to do this too!  By no means is this an exhaustive discussion, but I hope what we’ve talked about is helpful.  Anyway:

Some informal recommendations on how to use constraints within Borg if you choose to use them

  1. If you don’t have to use constraints, don’t use them!  Optimize without them and see how you do (see above).

  2. Make the constraint violations that you’re reporting to Borg meaningful.  Notice that item 3 in the restricted tournament selection actually uses the values that you report from constraint handling.  I’ve seen some folks just say: if infeasible, c = 100000000.  But that won’t actually help the restricted tournament mechanism move the set toward meaningful, feasible solutions.  For example, let’s say that r gives you a value of reliability for a system, and reliability must be above some threshold R.  You could set your constraint violation to c = R – r (and c = 0 when feasible), which gives you the distance between your performance and what you are required to have.  If all of your solutions are infeasible initially, the algorithm will favor a solution that has, say, 1% reliability lower than R over a solution that has 20% reliability lower than R.

  3. If you have an expensive simulation, skip the objective function calculations if the solution is infeasible.  The objective function values for an infeasible solution are not really used for anything in restricted tournament selection.  So report the constraint violation, but don’t spend time calculating objective functions that will never be used.

Two quick (important) caveats here:  One, remember that infeasible solutions are unacceptable.  The reason why I keep bringing this up is that you can technically shoot yourself in the foot here and impose constraints that are too strict, where a real decision maker may actually want solutions that violate your imposed constraint.  So be careful when you define them!  Two, if you do skip the objective function calculation, be sure to still report a value for your objectives to Borg, even if that value is just a placeholder; this is important to make sure the Borg procedure keeps running.  If you have questions about these caveats, please leave them in the comment box below.
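Putting recommendations 2 and 3 together, an evaluation function might look something like this sketch (the threshold R, the simulate() call, and the objective calculations are all hypothetical):

```python
# Sketch of recommendations 2 and 3; the reliability threshold, the
# simulate() screening call, and expensive_objectives() are all hypothetical.
R = 0.98           # required reliability (hypothetical threshold)
PLACEHOLDER = 1e9  # objective value reported in place of real calculations

def expensive_objectives(decisions):
    # Stand-in for the full simulation's (costly) objective calculations.
    return [sum(decisions), max(decisions)]

def evaluate(decisions, simulate):
    """Report a meaningful constraint violation; skip the expensive
    objective calculations when the solution is infeasible."""
    reliability = simulate(decisions)      # cheap screening output
    violation = max(0.0, R - reliability)  # distance from feasibility, not a flag
    if violation > 0:
        # Infeasible: still report *some* objective values so the Borg
        # procedure keeps running, but don't compute the real ones.
        return [PLACEHOLDER, PLACEHOLDER], violation
    return expensive_objectives(decisions), 0.0

objs, viol = evaluate([1, 2], simulate=lambda d: 0.90)
print(objs, viol)  # an infeasible solution gets placeholders and its violation
```

The violation R – r is a distance, so the tournament can still rank two infeasible solutions sensibly, while the placeholder objectives satisfy the requirement that something is always reported.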

To conclude

This has been quite a long post so thanks for sticking with me til the end.  Hopefully you’ve learned something, I know I did!  We can continue the discussion on this blog (for the contributors) and on the comment boxes below (for everyone)!  Have a great day.

Programming language R is gaining prominence in the scientific community

Recently, R Balaji mentioned this piece in Nature, which talked about how many scientists are using the programming language R as an alternative to expensive, proprietary analysis packages.  It’s worth a read!

Also, we have covered R on this blog, with the posts below.

Parallel Plots in R

Citing Packages in R

R code examples on Jon Herman’s GitHub repository

As a quick reminder, the best way to navigate this site is to use the search box to the right of the text, as well as the categories and keywords listed below the box.  And if there’s something that we don’t cover and you would like to write about it, join us as a collaborator!  Email joseph.kasprzyk “at” for more information.

Thanks for a great 2014! Our year in review

The stats helper monkeys prepared a 2014 annual report for this blog.  We’d like to share it below!

Also, on behalf of all my collaborators I’d like to personally thank you for coming to this blog and contributing.  It wouldn’t be anything without you, the community!  I hope you had a great holiday season and I wish you a healthy, happy, and productive 2015.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 33,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.

Using a local copy of Boost on your cluster account

Boost is a set of libraries for C++.  It increases the language’s functionality, allowing you to do all sorts of interesting things (for example it has lots of random number generators).  Boost may already be installed on your local research computing cluster.  But there are several reasons why it may be a good idea to have your own copy of Boost to use within your user account:

  • It may be difficult or impossible to actually find the location of the cluster’s Boost libraries.
  • Boost functions are introduced in newer and newer versions of the software.  So what if you want to use a function that came out in a later version (e.g., 1.56.0) that is not in the version installed on your cluster?
  • Perhaps you want to be able to see the source code of the Boost functions within your own account, to better understand how they work.

If so, it’s easy enough to download Boost to your local computer, then upload the files to your user account.  Click “current release” on the main Boost website (see the link above), and download the files to your computer.  If you’re on a Windows machine, use a program like 7-Zip to unpack all the files (or simply keep the tgz file and unpack it on the cluster; that’s probably faster anyway).  Then, upload the Boost files to the cluster.  I recommend placing the boost_1_56_0 folder inside a lib/ folder in your home directory; that way all your libraries can be in one place.

Here’s the important part: any time you use Boost, you need to point to where the libraries are stored.  Because of that, you’ll need to know the path of Boost, that is, where the files “live” on the cluster.  There is probably a flag in your makefile already that starts with -I.  All you have to do is add your new Boost path to it; on my system it looks something like:
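For example, assuming you placed the library in a lib folder inside your home directory (as suggested above), the relevant makefile lines might look like:

```make
# Hypothetical makefile fragment: tell the compiler where your Boost copy lives
BOOST_INC = -I $(HOME)/lib/boost_1_56_0

main: main.cpp
	g++ $(BOOST_INC) main.cpp -o main
```

Header-only Boost libraries need nothing more than this include flag; the compiled Boost libraries would also need -L and -l linker flags, which is beyond this post.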


That’s it!  Comments, questions, and concerns can be posted below.

How to set up MinGW for the first time

I was quite surprised that no one on the internet has written a simple list of what to do to get code compiled with MinGW before.  Seems like you have to have 3 PhDs in computer science to read the directions (or maybe I’m just having a bad day).  Nonetheless, we’re here to help.

By the way, you may be asking “What is MinGW?”  It is a port of the GNU compilers to Windows (the GNU compilers being the ones that are used on things like Linux and the computing clusters).  Its website is here.  “Wait a sec, why would we want to use MinGW if we already have Cygwin?  Doesn’t it do the same thing?”  Yes.  But sometimes things won’t work on Cygwin.  Also, MinGW is a lot smaller and lighter-weight than Cygwin, so if you are limited in resources that’s another reason.

Anyway, here goes:

  1. Go to the MinGW website.

  2. Click Downloads, and select ‘mingw-get-setup.exe.’

  3. This is an installer that will grab all the proper packages for you.  Make sure to keep the installation directory the same as the default: ‘C:\MinGW\’.  Once you click through the set of initial prompts, you will see an install manager.  That install manager has a set of checkboxes that ask you if you want gfortran, g++, etc.  Right-click on the checkboxes and select ‘Mark for Installation.’

  4. In the program, under the Installation menu, select ‘Apply Changes’.

  5. Wait for everything to download and install, and follow the instructions.  When it is done, it won’t tell you anything; you should just trust that it has completed.

  6. Note that there is a utility called MSYS that is installed with MinGW.  It is kind of like ‘Cygwin Lite’: a small set of Linux commands and a shell that you can use to compile programs.  We aren’t going to cover MSYS here, but this is a set of instructions for how to use it.  What we want to do instead is compile programs in the native DOS command window of Windows.  So we need to do a few more things.

  7. Next we need to set the PATH of Windows to know where MinGW is.  Why?  Well, you want to be able to open a command window in any directory in Windows, type ‘g++’, and have the computer know what you’re talking about.  For Windows XP – Windows 7 users, check out the instructions here.  We are using Windows 8, so we can use the search functionality.  If you type ‘environment’ in the Windows search box, you’ll see a link for ‘Set System Environment Variables’.  Click it.

  8. It brings you to a window that is titled ‘System Properties’.  Click the button that says ‘Environment variables…’  In that window, in the second box, you can scroll down and see a variable called Path.  It should start with ‘C:\windows\system32;C:\windows;’  Highlight Path and click Edit…

  9. At the very end of the list, type a ; (without a space) and then ‘C:\MinGW\bin’.  So your path should look like: ‘C:\windows\system32;C:\windows;[otherstuff];C:\MinGW\bin’

  10. Click OK three times, to get out of all of the windows.

  11. This step is very important!  Restart your computer, otherwise the Path settings won’t take.

  12. Now, after you have restarted, get to the command line by typing ‘cmd’ in a search box.  A black, old-timey window will pop up.  If you type the command ‘g++’, it should tell you “g++: fatal error: no input files”.  This means g++ is installed, it is on the path correctly, and it (ostensibly) works.

  13. You will likely need an additional command instead of just the vanilla ‘g++’ or ‘gfortran’.  If you are using a Makefile, you need an additional utility.  On Linux this is called ‘make’, but here it’s called something different: mingw32-make.  Thanks to this forum post for this (and other helpful) hint(s)!

You should be good to go.  Now you have a minimalistic way to compile stuff on Windows that is supposed to work with the GNU compilers (on Unix and the clusters)!  As usual, comments, questions, and concerns should be posted below.