# Integrating a MOEA with a sample water management problem

On this blog, we’ve covered some basics of getting started using MOEAs — you can search the “getting started” tag below for some examples, such as the problem formulation post, basic concepts and reading, etc.  But I thought it might be helpful to run through a sort-of hypothetical example, where you would be trying to optimize a portfolio of water infrastructure.  With that, let’s get started!

The first step is defining your problem formulation.  This type of formulation looks a little different than a traditional formulation for optimization of a water system.  What I mean is, if you are using an optimization model of, say, a reservoir, you may want to maximize a single benefit function of the reservoir, and your decision variables will actually be volumes of water that are allocated in the system.  The constraints will ensure continuity in the system (no water is created or destroyed), and do other housekeeping such as making sure that all the decision variables are non-negative.  While this is a perfectly fine way to set things up, it has some limitations.  In the traditional formulation, your constraints are the only way to model the actual physics of the system, so you’re going to make some assumptions about how the system works.  Also, that single benefit function can’t always capture all of the important functionality of the system.  What if you want to maximize some cubic function for environmental flow, but your other benefits are calculated in another manner?

It seems I’ve gotten off on a tangent here.  The point is that our so-called many-objective approach has a different way of defining those terms in the problem formulation.  It’s a simulation-optimization approach.  And what that means is that the simulation model is responsible for taking care of the system physics.  So your constraints aren’t needed to ensure continuity; that’s handled inside the model.  What you need to be concerned with is three major categories, explained below.  I’ll talk about what each one of them means, and I’ll set up a hypothetical problem as an illustration.

The problem in a nutshell: Imagine you have a system with three major locations A, B, and C.  You are trying to decide whether or not to build a reservoir at B.  Once you have the reservoir at B, you could use several different types of release curves at the reservoir.  You have some water transfers and conveyances in the system.  One of the transfer schemes is set, but there is another one, between B and C, that you have to design.  Each of the transfers uses a generic threshold to determine how the water should be conveyed from one location to the other.

## Decision Variables

These are “levers” that the decision maker/designer/stakeholder can change in the system.  Some phrases to think about: “whether or not something should be designed”, “how big should it be?”, “what rule should we use to operate it?”, “what design parameter should we use?”, “what material?”, “what regulation should we set?”  You can be very creative in setting this up!  The big rule to remember here, though, is that you need to be able to code this into something that the MOEA can automatically change.  Decision variables to a MOEA are typically values that can be modified, which have a uniform distribution between a lower and an upper bound.  When I say distribution, I don’t necessarily mean a statistical distribution.  I just mean that it is plausible for the values to fall anywhere between the lower and upper bound.  Perhaps the optimal values are all toward the high end of the range!  Who knows.  The point is you just need to set an upper and a lower bound.  This will become clearer when we actually look at the variables for this problem:

| Variable | Lower Bound | Upper Bound | Description/Notes |
| --- | --- | --- | --- |
| Res | 0 | 1 | Boolean variable.  1 if the reservoir is built, 0 otherwise. |
| Rule | 1 | 3 | Discrete variable, which uses a lookup table.  You have 3 release curves, this variable shows which one you’re using. |
| Cap | 0 | 1000 | Real variable, the capacity of the reservoir. |
| Transf | 0 | 1 | Boolean variable.  1 if the transfer scheme is built, 0 otherwise. |
| Thresh | 0.0 | 3.0 | Real variable, the threshold used for the transfer scheme. |

Be aware that some of these variables are technically dependent on one another.  That is, if Res = 0, Rule and Cap don’t matter.  If Transf = 0, Thresh doesn’t matter.  Amazingly, the MOEA can operate under these conditions!
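
To make the bounds and the dependencies concrete, here is a minimal Python sketch of how the five variables might be described and decoded.  The `Variable` class, the `decode` function, and the choice to round real values into Boolean and discrete ones are all my own assumptions for illustration; your MOEA may handle variable types differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    lower: float
    upper: float
    kind: str  # "bool", "int", or "real" (hypothetical labels)


# Bounds copied from the table above.
VARIABLES = [
    Variable("Res", 0, 1, "bool"),
    Variable("Rule", 1, 3, "int"),
    Variable("Cap", 0, 1000, "real"),
    Variable("Transf", 0, 1, "bool"),
    Variable("Thresh", 0.0, 3.0, "real"),
]


def decode(raw):
    """Turn one raw value per variable into a typed design dictionary."""
    design = {}
    for var, value in zip(VARIABLES, raw):
        if not var.lower <= value <= var.upper:
            raise ValueError(f"{var.name}={value} is outside its bounds")
        if var.kind == "bool":
            design[var.name] = value >= 0.5       # assumed encoding
        elif var.kind == "int":
            design[var.name] = int(round(value))  # assumed encoding
        else:
            design[var.name] = float(value)
    # Dependent variables carry no meaning when their parent is switched off.
    if not design["Res"]:
        design["Rule"] = None
        design["Cap"] = None
    if not design["Transf"]:
        design["Thresh"] = None
    return design


print(decode([0.9, 2.6, 750.0, 0.2, 1.5]))
```

Blanking out the dependent values is optional.  The MOEA still carries them along internally; this just keeps them from being misread later.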

## Objectives

Multiple, quantitative metrics of system performance.  This isn’t much different than the classical definition of an objective function.  Just note that your traditional problem usually has the objective function as a direct function of the decision variables (which classically are volumes of water that are being allocated or routed).  Here, the objective function is sort of indirectly a function of the decision variables.  You’ll see this when we define them below:

| Objective | Description |
| --- | --- |
| Cost | Fixed and operating cost of the reservoirs and transfers.  Here, the fixed cost is obviously directly correlated to the Res decision variable.  But, the operating costs are a function of a number of things: the input data in the simulation, how conservative the release curve is (i.e., the Rule variable), the transfer rules (the Thresh variable).  Cost is minimized. |
| Reliability | The likelihood of meeting performance targets.  There’s lots of ways to define reliability (maybe I can cover them in a separate post).  I like to think of it as “storage reliability” (meeting storage targets) and “demand reliability” (meeting your demands).  In general this is a percentage or ratio, such as 92% or 95%, and it is maximized.  There’s a conflict here with cost: if you spend a lot and build the big reservoir, you’re going to improve reliability.  But minimizing cost will conflict with this.  Or, there could be other strategies that don’t cost a lot but can still maintain performance. |
| Environmental Flows | Some function of flows that indicates how well your reservoir releases meet the needs of the environment.  It would be easy to operate a reservoir to maximize water supply and flood control.  Simply release when you need flood storage, retain water when you need storage for supply.  But fish are used to variety in the flows, so meeting that variety while also meeting your other uses could be challenging. |
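
As a small illustration of an objective that is computed from simulation output rather than directly from the decision variables, here is a sketch of one possible “demand reliability” calculation: the fraction of time steps in which supply meets demand.  The function names and this particular definition are assumptions on my part.  The second function reflects the fact that many MOEA implementations minimize every objective, so a maximized objective is often passed in negated.

```python
def demand_reliability(supplied, demanded):
    """Fraction of time steps in which supply met or exceeded demand."""
    if not demanded or len(supplied) != len(demanded):
        raise ValueError("need two equal-length, non-empty series")
    periods_met = sum(1 for s, d in zip(supplied, demanded) if s >= d)
    return periods_met / len(demanded)


def as_minimized(cost, reliability):
    """Pack objectives so that smaller is better for both of them."""
    return (cost, -reliability)


# Toy series of simulated deliveries and demands (made-up numbers).
reliability = demand_reliability([10, 8, 12, 9], [10, 10, 10, 9])
print(reliability, as_minimized(170.0, reliability))
```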

## Constraints

Acceptable limits on performance, or preventing “infeasible” system designs.  This is a little different than the classical formulation, since the constraints are not needed to ensure the system runs properly.  But, similar to the classical formulation, there is the idea of “feasibility”.  I like to think of it as setting limits on what a plausible solution is.  If your regulatory requirements say you have to meet 90% reliability, this can be a constraint.  Solutions with lower than 90% reliability will be infeasible.  Similarly, if it’s physically impossible to have a high transfer threshold when the reservoir is in place, that could be considered a constraint as well.  Unlike classical optimization, I’d venture to say constraints are not necessary for a proper many-objective problem formulation.  But they can help ensure your tradeoff solutions have high performance and are meaningful.
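
The 90% reliability requirement could be coded roughly as follows.  This is only a sketch: the convention that a violation of zero means “feasible” is common, but check how your particular MOEA expects constraints to be reported.  Both function names are hypothetical.

```python
def constraint_violation(reliability, required=0.90):
    """Return 0.0 when the requirement is met, otherwise the shortfall."""
    return max(0.0, required - reliability)


def is_feasible(reliability, required=0.90):
    """A solution is feasible when it has no violation."""
    return constraint_violation(reliability, required) == 0.0


for value in (0.95, 0.90, 0.85):
    print(value, is_feasible(value), constraint_violation(value))
```

Reporting the size of the shortfall, rather than just a yes/no flag, lets the search tell a nearly feasible design apart from a badly infeasible one.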

## The Actual Solution Process

About 1000 words later we get to the “good stuff”!  It’s broken up into a few parts.  First, set up the information the algorithm needs to know.  Then, a brief overview on what the algorithm is doing.  Next, I talk about what happens to the decision variables at each iteration.  Finally, I discuss what information is needed within the simulation model.

### Setting Up the Algorithm

The algorithm needs the following information about the decision variables, set up in some type of table such as shown below.  You’ll notice we’ve covered some of this already:

| Variable | Lower Bound | Upper Bound | Tag |
| --- | --- | --- | --- |
| Res | 0 | 1 | {Res} |
| Cap | 0 | 1000 | {Cap} |
| Rule | 1 | 3 | {Rule} |
| Transf | 0 | 1 | {Transf} |
| Thresh | 0.0 | 3.0 | {Thresh} |

Here we need some tags.  Tags tell the algorithm where to replace the information already in the simulation model, with the specific value for a given solution.  More on that in a minute.

Regarding objectives, the algorithm needs to know how many you have, and what their names are.  You’ll also need “epsilon precision” for a lot of MOEAs, but that will be the topic of another post.  Basically, that’s the smallest difference in each objective that you consider significant: if your cost is in millions of dollars, for example, you might not care about cost differences of \$0.01, so you set the precision accordingly.
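
To give a rough feel for what an epsilon does, here is a sketch of one common interpretation: objective space is divided into boxes of width epsilon, and two solutions that land in the same box are treated as equivalent in that objective.  The function name and the numbers are illustrative assumptions, and a given MOEA may implement the details differently.

```python
import math


def epsilon_box(objectives, epsilons):
    """Index of the epsilon-wide box that each objective value falls into."""
    return tuple(math.floor(f / e) for f, e in zip(objectives, epsilons))


# Cost in millions of dollars with epsilon 0.5; reliability with epsilon 0.01.
epsilons = (0.5, 0.01)
a = epsilon_box((12.1, 0.923), epsilons)
b = epsilon_box((12.4, 0.923), epsilons)
c = epsilon_box((12.6, 0.923), epsilons)
print(a, b, c)
print(a == b)  # costs differ by less than epsilon and share a box
print(a == c)  # this cost difference crosses a box boundary
```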

### So what is the algorithm doing?

The algorithm’s goal is to ‘create’ new values of the decision variables, and manipulate them to find better and better values of your objective functions.  When I say ‘create’, I mean two things.  At first, the algorithm will start with random generation of the decision variables.  After that first generation, an iterative process of selection of good solutions and variation on those solutions (like mutation) is used to create values that have better and better objective performance.  Lower costs, higher reliabilities, all while finding the tradeoff between the objectives.  Tradeoff here means, what is the best reliability I can get at each level of cost?  The best environmental flow at every level of reliability and cost?
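
The idea of a tradeoff can be made precise with Pareto dominance: one solution dominates another if it is at least as good in every objective and strictly better in at least one.  Here is a minimal sketch, assuming every objective is written so that smaller is better (reliability is negated).  The objective values are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (cost, -reliability) pairs.
candidates = [(10, -0.95), (5, -0.90), (8, -0.85), (12, -0.95)]
print(nondominated(candidates))
```

The points that survive are the tradeoff: each one offers the best reliability available at its level of cost.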

### Decision Variables Inside the Algorithm

Let xi represent the ith decision variable.  So x1 is the Res variable, x2 is Cap, and so forth.  MOEAs use a population-based approach.  What that means is there are multiple solutions that are generated at each step.  So the algorithm is responsible for the ‘creation’ of new values.  (Like I said, ‘creation’ at first is completely random.  Later, as the search continues, it consists of selection and variation of good values.)

Create a population of solutions:

| Solution | Res | Cap | Rule | Transf | Thresh |
| --- | --- | --- | --- | --- | --- |
| Solution 1 | 1 | 1000 | 2 | 1 | 1.5 |
| Solution 2 | 0 | 500 | 2 | 1 | 2.0 |
| Solution 3 | 1 | 500 | 1 | 0 | 3.0 |

Solution 1 builds the reservoir at capacity 1000, operating the reservoir with rule set 2.  It also selects the transfer scheme, running it with a threshold of 1.5.

Solution 2 does not build the reservoir.  Recall that its values for the Cap and Rule variables are meaningless.  It does select the transfer scheme, though, and runs it with a threshold of 2.0.

Solution 3 builds the reservoir with capacity 500 and rule set 1.  It does not select the transfer.

Notice the only human input here is the ranges for the decision variables!  A human being did not create Solutions 1, 2, and 3.  However, later in the process a human will decide between solutions like 1 and 3 in the final tradeoff analysis.  But that, again, is the subject of another post.
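
For the very first, completely random generation, the ‘creation’ step might look something like this sketch.  The function name, the seed, and the population size of 3 are assumptions chosen to mirror the example; real MOEAs use larger populations and their own sampling routines.

```python
import random


def random_solution(rng):
    """Draw every decision variable uniformly between its bounds."""
    return {
        "Res": rng.randint(0, 1),          # Boolean: 0 or 1
        "Cap": rng.uniform(0, 1000),       # real-valued
        "Rule": rng.randint(1, 3),         # discrete: 1, 2, or 3
        "Transf": rng.randint(0, 1),       # Boolean: 0 or 1
        "Thresh": rng.uniform(0.0, 3.0),   # real-valued
    }


rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [random_solution(rng) for _ in range(3)]
for number, solution in enumerate(population, start=1):
    print(f"Solution {number}: {solution}")
```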

So now the algorithm is done with its business for the time being, and it needs to:

1. Send the decision variable values to the simulation model.
2. Wait around for the simulation model to calculate objective functions.
3. Receive objective function values from the simulation model.
4. Repeat for all solutions in the population.
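
In code, that hand-off between the algorithm and the model could be sketched as a simple loop.  Here `simulate` is whatever callable wraps your simulation model.  The `placeholder_model` below is NOT a real water model: its formulas are invented purely so the example runs.

```python
def evaluate_population(population, simulate):
    """Send each solution to the model and collect its objective values."""
    evaluated = []
    for decision_variables in population:
        objectives = simulate(decision_variables)  # wait for the model
        evaluated.append((decision_variables, objectives))
    return evaluated


def placeholder_model(dv):
    """Stand-in for a real simulation; the numbers are made up."""
    cost = 100.0 * dv["Res"] + 0.05 * dv["Cap"] * dv["Res"] + 20.0 * dv["Transf"]
    reliability = min(1.0, 0.70 + 0.20 * dv["Res"] + 0.05 * dv["Transf"])
    return {"Cost": cost, "Reliability": reliability}


# The three example solutions from the population table.
population = [
    {"Res": 1, "Cap": 1000, "Rule": 2, "Transf": 1, "Thresh": 1.5},
    {"Res": 0, "Cap": 500, "Rule": 2, "Transf": 1, "Thresh": 2.0},
    {"Res": 1, "Cap": 500, "Rule": 1, "Transf": 0, "Thresh": 3.0},
]
results = evaluate_population(population, placeholder_model)
for decision_variables, objectives in results:
    print(decision_variables, objectives)
```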

### Setup within the Simulation Model

If you have a lot of control over the simulation model, you can write a function that receives decision variable values, puts them where they need to go, and then spits out objective function values seamlessly.  Unfortunately this isn’t always possible.  So I’ll explain a slightly more complicated version of how to do this that may be helpful for your application.

Simulation Setup File. Imagine you have a file that looks something like this:

==Reservoirs==
Name — Capacity — Release Rule — Height
Res A — 2000 — Rule A — 20
{Name} — {Cap} — {Rule} — 30
Res C — 1000 — Rule B — 40

==Transfers==
Name — On/Off — Threshold
A to B — On — 1.2
B to C — {Transf} — {Thresh}

The setup file specifies a generic solution for your simulation.  Reservoir A exists, with a capacity of 2000, run with release rule set A, and a height of 20.  Reservoir B is the design!  Notice that its fields have tags in them, but no matter what, Reservoir B will be at height 30.  Reservoir C also exists.  You can see the same thing is going on with the transfers.  The simulation model needs to do the following procedure:

0. When the model is started up, input data should be read in, memory allocated, and everything set up and ready to go.  These routines will ideally only be called once.
1. When decision variables are received from the algorithm, a function should be written which goes in and replaces all the tags with the appropriate values.  A tag like {Cap} is easy because there’s a 1 to 1 relationship between the decision variable value and the actual value used by the simulation.  A tag like {Name} is tougher though.  You have to write a function that says, if the value of Res is 1, then {Name} is replaced with something like “Reservoir B”.  If Res is 0, replace {Name} with “Off” or “N/A” or something that signifies to the simulation model that the reservoir does not exist.  Alternatively you can also set the capacity of Reservoir B equal to zero, and pass all the water through.  Whatever works for your application.
2. Now, run this set of infrastructure in the model.  Get some results!
3. Translate the results into the objectives which are output to the algorithm.  There are varying levels of sophistication here.  One way of doing it is setting up a system of tags just like for the decision variables.
4. Write the values of the objectives back on your output stream, and send them back to the algorithm.
5. Make sure your simulation model is also set up to repeat this process for all solutions in the population, indeed for all solutions in the search procedure as it moves forward.
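
Step 1 of the procedure above, the tag replacement, can be sketched in a few lines of Python.  Everything here is an assumption for illustration: the `|` column separator, the `RULE_NAMES` lookup table (including the name “Rule C”), the label “Res B”, and the choice of “Off” with zero capacity for a reservoir that is not built.  Adapt all of it to whatever your simulation model actually reads.

```python
TEMPLATE = """\
==Reservoirs==
Name | Capacity | Release Rule | Height
Res A | 2000 | Rule A | 20
{Name} | {Cap} | {Rule} | 30
Res C | 1000 | Rule B | 40

==Transfers==
Name | On/Off | Threshold
A to B | On | 1.2
B to C | {Transf} | {Thresh}
"""

# Hypothetical lookup table from the Rule variable to a release curve name.
RULE_NAMES = {1: "Rule A", 2: "Rule B", 3: "Rule C"}


def fill_template(template, dv):
    """Replace every tag with the value implied by one solution."""
    built = dv["Res"] == 1
    values = {
        "Name": "Res B" if built else "Off",
        "Cap": f"{dv['Cap']:g}" if built else "0",
        "Rule": RULE_NAMES[dv["Rule"]],
        "Transf": "On" if dv["Transf"] == 1 else "Off",
        "Thresh": f"{dv['Thresh']:g}",
    }
    text = template
    for tag, value in values.items():
        text = text.replace("{" + tag + "}", value)
    return text


solution_1 = {"Res": 1, "Cap": 1000, "Rule": 2, "Transf": 1, "Thresh": 1.5}
print(fill_template(TEMPLATE, solution_1))
```

Plain `str.replace` is used instead of `str.format` so that any other braces in a real setup file are left alone.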

## Conclusion

Hopefully this has been a nice introduction to the “nuts and bolts” of how the whole process works.  Again, what you’re really doing is automatically generating a set of designs that balance conflicting objectives that you care about in your system.  The result is different infrastructure portfolios that you can visualize and analyze, ultimately leading to better or more sustainable water planning!