In this post, I’ll talk about how to set up a virtual machine on a PC, in order to run outdated software that may have been optimized for a different version of Windows. For example, a collaborator of mine uses the EPA Water Treatment Plant model which only seems to work under 32-bit versions of the operating system.
Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within ‘jrkasprzyk’ GitHub.
It saves you having to type out a long Borg command when using the ‘command line interface’, especially when there are many many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon of an objective, and Decision contains the name, lower bound, and upper bound of an objective.
You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python such as sampling of the decision space, and different sorts of processing of MOEA output.
When thinking about setting up a problem formulation for an MOEA like Borg, it’s helpful to spend a minute (or 15!) talking about our friend the constraint. I’ve written about problem formulations in the past, and we even have a video about problem formulations, but there’s a subtlety that I though it would be helpful to discuss again here!
A comment about constraints in classical mathematical programming formulations
Classical mathematical programming formulations are chock-full of constraints. Hundreds of constraints, even! This is due to the fact that when you are setting up a linear program (LP) or something similar, the set of constraints allows the program to have some amount of realism. The classical linear programming water allocation problem (for example, something you would find in the classic Revelle et al textbook) is something like:
Decisions: water allocations to different actors in different periods of time
Objective: minimize cost of the allocation plan, or maximize the net benefits of the plan
- Ensure that demands are met
Ensure that, at each supply node, inflow equals outflow
Ensure that the capacity of the conveyance system is not exceeded
Ensure that all decision variables are greater than or equal to zero
Keep the water quality values above or below some threshold
We have proposed 5 categories of constraints, and there may be even more. But mathematically, this simple formulation above could have hundreds of constraints, if you have to solve each equation for a particular point in time, or at a particular spatial location.
This is perfectly fine! If you are solving a LP, I encourage you to put more constraints in there because constraints are really the only way that you can make your solutions look more physically accurate. LP theory will tell you that, if there is one or more optimal solutions, at least one of them will be at the intersection of two or more of the constraints. So, the constraints are important (more on this later).
How the simulation-optimization of MOEAs is different
The MOEA approach we talk about on this blog is a simulation-optimization approach. What that means, is that instead of having a set of a few hundred constraints to model your problem, you are actually using a simulation model of the system to model it. This is going to fundamentally change the way that your decisions and constraints are defined, and it may even change your objectives. Let’s talk about each in turn, using our simple water allocation example as, well, as an example.
Decisions: It would probably make more sense to optimize rules about how the water is allocated, instead of actually treating each decision variable as a separate volume of water over time. There are a number of reasons why this is. In the formulation above, if each decision variable is an allocation of water in a particular period of time, once you start to plan over long planning horizons, the problem will get unwieldy, with thousands of variables. Plus, if you say that decision variable x_83 is the allocation of water to user 5, at t = 3, then how are you to plan what happens if you don’t have accurate or deterministic data for what happens at t = 3? But that’s really the purpose of another post. We’re here to talk about:
Constraints: So, in the above formulation, constraints were designed to create physical reality to your system. For example, if the decision variables are volumes of water, and you have a basic estimation of the concentration of some pollutant, you can create a constraint that says the number of kilograms of a pollutant should be less than a particular value. In that fashion, you can think of hundreds of constraints, and we just talked about this, right?
In the simulation-optimization approach, though the simulation model will provide physical reality to the system, not the constraint set. So it is better to treat constraints as limits on acceptable performance. For example if you are calculating reliability, you can say that all solutions should have at least some lower threshold of reliability (say, 98%, for example). Or, don’t have constraints at all and instead just handle the value as an objective! The point of doing tradeoff analysis with a MOEA is to find diversity of solutions, after all. If you find that the performance of the values is too poor, or that you have too many solutions, you can impose constraints next time. Or, you can also change the epsilon resolution within Borg in order to change the number of points in the tradeoff analysis; see this paper by Kollat and Reed where they showed the same tradeoff set with different resolutions, for example.
Constraints can also help with the representation of decisions within your formulation. After all, the most basic constraint in any LP is that xi >= 0, right? But think about creative ways to change the representation of decision variables such that all of the algorithm’s solutions are feasible. This will greatly help the search process!
For example, consider a problem where you have to create drought restriction plans at different levels. In other words, imagine that the value of restriction at level 2 needed to be greater than the value at level 1. Mathematically, you could set this up as a constraint (x2 >= x1), but it will be very difficult to meet this, especially with an initially random population within a MOEA. Especially when you have, say, 5 or 10 or 100 of these levels that each need to be greater as you go up. So, instead, you can do something like this. Let xi’ be the decision variable in the algorithm, and xi (without a prime) be the actual decision variable used in your model. Then, let:
x1 = x1′
x2 = x1 + x2′
The algorithm actually searches on x1′ and x2′. Now, if you set your bounds up correctly, the algorithm can never generate an infeasible solution, and you are spending all your computational resources within search actually finding good, feasible solutions instead of just trying to find a single set of feasible solutions with which to work. Something similar to this was used in Zeff et al. (2014).
Some comments on how constraint handling works within Borg
The concepts of epsilon non-domination, and non-domination, are used to determine whether solutions survive within the search process. The general concept of non-domination is, informally, that a solution’s objective function values are better in one objective and at least as good as another solution in all other objectives. But that’s not what we’re here to talk about! We are here to talk about constraints, remember? 🙂
Well remember that the definition of a feasible and infeasible solution is the same in LP-land versus for us. Feasible solutions meet all constraints, infeasible solutions violate one or more constraints. An infeasible solution is defined to not be acceptable to the decision maker. This is an extremely important point to remember, so it is worth repeating! An infeasible solution is unacceptable. Because of this, it’s up to you to set up the constraints to keep acceptable solutions, and throw away unacceptable (infeasible) solutions.
However, there is an important distinction between, say, an LP and what we do. Remember the simplex algorithm for solving LPs? What it does is intelligently finds a corner of the feasible space (defined by your constraint set), and then it basically walks through each of the corners, trying each one of them to determine if the corner is optimal. Notice what’s going on here, though. Most of the time, at least for a simple problem, it is easy to find a feasible solution in the simplex method, and then you are off to the races. If you can’t find a feasible solution, you’re in a lot of trouble!
Similarly, you also need to find one feasible solution to get started with MOEA search…preferably you’ll have many many feasible solutions! But what if it is difficult to find a feasible solution? This is going to impact your ability to find solutions, and if your function evaluation count, or search duration, is not long enough you can have a failed search.
So let’s get to the point of this section, which is explaining how constraint handling works within Borg. As explained in Reed et al., most of the algorithms we’ve typically tested within this blogosphere use restricted tournament selection. There are three conditions where a solution a can dominate b:
- Both solutions are feasible, and based on objective function performance, a dominates b
The solution a is feasible, and b is infeasible.
Both a and b are infeasible, and the sum of a’s constraint violations are less than the sum of b‘s constraint violations
So as you can see, item 3 is very interesting! What item 3 means, is that Borg or these other algorithms will actually keep some infeasible solutions in the population or archive while all solutions are infeasible. This is important because it helps move the population toward eventually finding a feasible region of the decision space! But, according to item 2, any solution that is feasible will beat an infeasible solution, which is consistent with the fact that infeasible solutions are unacceptable, and oh gosh I already said that a few times didn’t I!
By the way, you can also use penalty functions to handle constraints, instead of using this built in restricted tournament selection feature. An example of that is in the Kollat and Reed paper that I already linked to. Plus there are other ways to do this too! By no means is this an exhaustive discussion, but I hope what we’ve talked about is helpful. Anyway:
Some informal recommendations on how to use constraints within Borg if you choose to use them
- If you don’t have to use constraints, don’t use them! Optimize without them and see how you do (see above).
Make the constraint violations that you’re reporting to Borg meaningful. Notice item 3 in the restricted tournament selection actually does use the values that you report from constraint handling. I’ve seen some folks just say: if infeasible, c = 100000000. But that won’t actually help the restricted tournament mechanism move the set toward meaningful, feasible solutions. For example let’s say that r gives you a value of reliability for a system, and reliability must be above capital R. You could set your constraint violation to: c = R – r, which will give you the difference between your performance and what you are required to have. If all of your solutions are infeasible initially, the algorithm will favor a solution that has, say, 1% reliability lower than R over a solution that has 20% reliability lower than R.
If you have an expensive simulation, skip the objective function calculations if the solution is infeasible. The objective function values for an infeasible solution are not really used for anything, in restricted tournament selection. So you can report the constraint violation, but don’t spend time calculating objective functions that will never be used.
Two quick (important) caveats here: One, remember that infeasible solutions are unacceptable. And the reason why I keep bringing this up is the fact that you can technically shoot yourself in the foot here, and impose constraints that are too strict, where a real decision maker may actually want solutions that violate your imposed constraint. So be careful when you define them! Two, if you do skip the objective function calculation be sure to still report a value of your objectives to Borg, even if that value is just a placeholder. This is important, to make sure the Borg procedure keeps running. If you have questions about these caveats please leave them in the comment box below.
This has been quite a long post so thanks for sticking with me til the end. Hopefully you’ve learned something, I know I did! We can continue the discussion on this blog (for the contributors) and on the comment boxes below (for everyone)! Have a great day.
A like-minded blogger shares some useful tips about GitHub, RStudio, and creating a DOI for your data. Cool stuff!
Recently, R Balaji mentioned this piece in Nature, that talked about how many scientists are using the programming language R as an alternative to expensive, proprietary analysis packages. It’s worth a read!
Also, we have covered R on this blog, with the posts below.
As a quick reminder, the best way to navigate this site is to use the search box to the right of the text, as well as the categories and keywords listed below the box. And if there’s something that we don’t cover and you would like to write about it, join us as a collaborator! Email joseph.kasprzyk “at” colorado.edu for more information.
The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog. We’d like to share it below!
Also, on behalf of all my collaborators I’d like to personally thank you for coming to this blog and contributing. It wouldn’t be anything without you, the community! I hope you had a great holiday season and I wish you a healthy, happy, and productive 2015.
Here’s an excerpt:
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 33,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.
Boost is a set of libraries for C++. It increases the language’s functionality, allowing you to do all sorts of interesting things (for example it has lots of random number generators). Boost may already be installed on your local research computing cluster. But there are several reasons why it may be a good idea to have your own copy of Boost to use within your user account:
- It may be difficult or impossible to actually find the location of your computer’s Boost libraries.
- Boost functions are introduced with newer and newer versions of the software. So what if you want to use a function that came out in a later version (i.e., 1.5.6) that is not in the version installed on your computer?
- Perhaps you want to be able to see the source code of the Boost functions within your own account, to better understand how they work.
If so, it’s easy enough to download Boost to your local computer, then upload the files to your user account. Click “current release” on the main Boost website (see the link above). Then download the files to your computer. If you’re on a Windows machine, use a program like 7-zip to unpack all the files (or simply keep the tgz file and unpack them on the cluster, that’s probably faster anyway). Then, upload the Boost files to the cluster. I recommend placing the boost_1_56_0 folder inside the /lib/ folder on your home directory, that way all your libraries can be in one place.
Here’s the important part: any time you use Boost you need to point to where the libraries are stored. Because of that, you’ll need to know the path of Boost, that is, where the files “live” on your computer. There is probably a command in your makefile already that starts with -I. All you have to do is add your new Boost path to the command, on my system it looks something like:
That’s it! Comments questions and concerns shall be given below.