Recommended Software for the Kasprzyk Group

In this post, I provide a list of recommended software for multi-objective optimization research, with a bit of context about each item. It updates two posts that Joe made several years ago (one for Windows users and one for Mac users) and is intended for Windows users. Although this list is tailored to members of the Kasprzyk Group at the University of Colorado Boulder (CU), it should be relevant to most readers of this blog.

Please feel free to make comments and give additional suggestions!

Text Editors

Text editors are a great way to view data (e.g., csv or space-delimited data) or review some code. If you want to run code interactively, you’ll want an integrated development environment (IDE) that corresponds to the programming language you are working with. I’ll get into that more in the “Programming Languages” section. If you have a Windows machine, you are probably used to opening things with the default text editor, Notepad. Notepad is the worst.

Notepad++

Notepad++ is infinitely better! It has clean formatting, offers a bunch of plug-ins you can download, improves readability, and runs quickly. Notepad++ is my preferred text editor if you want to look at a couple of files. If you are looking to manage a larger project with several files and multiple directories, Atom is the way to go.

Atom

Atom is extremely powerful and customizable. It is made with software developers in mind and is basically the modern version of Notepad++ (which has been around for a while). It integrates easily with Git and GitHub (see “Version Control and Online Repositories” for an explanation of these tools), it has an extensive library of packages–similar to Notepad++ plug-ins–and it’s growing constantly. Atom also blurs the line between text editor and IDE, because the Atom-IDE packages give it IDE-like functionality.

Programming Languages

You never know what languages you might work with in your research, but the main languages we use are Python, R, and C/C++. If Joe asks your preference, tell him that Python >> R and C/C++ >> R. Although if you catch him at a moment of weakness, he might admit that R can do some stats things, I guess.

Python

If you are working in Python, whether you use 2.7 or 3.X (the X being whatever minor version is current) will likely depend on the project. A lot of scientific computing research still uses Python 2.7, but more people are transitioning to Python 3. In fact, the Python developers have decided to stop maintaining Python 2.7 on January 1, 2020. So remember to pour one out for your homie, Python 2.7, on New Year’s Day 2020. Although there are plenty of other Python IDEs (e.g., Spyder, Rodeo), we generally use PyCharm Community in the group. For ease of installing packages, download Anaconda, which installs Python and over 150 scientific packages automatically.

R

If you are working in R, almost everyone uses RStudio (download the Open Source License edition) as an IDE. You will also want to download Rtools, which some packages need in order to build. Generally, these packages rely on command line tools and on compiling code in languages other than R.

C/C++

Although most projects are focused on Python and R, we have some work in C and C++ as well. These closely related languages, unlike Python and R, need to be compiled before you can run the code. To do this, you’ll need to download a compiler. We generally recommend installing MinGW. It lets you build your code on a Windows machine with the same GCC compilers used on Linux, which helps keep it portable across platforms (e.g., Windows, Linux). Pro tip: if you have Rtools installed, it comes with MinGW so you don’t need to download it separately! If you need to build and run POSIX (Unix-style) applications on Windows, you’ll want Cygwin. If you have no idea what that means, that’s okay. It’s my belief that you shouldn’t download Cygwin unless you need it because it takes up a lot of space on your system. But if you don’t care about storage, then go ahead! For most cases I run into, MinGW does the job.

Why do we even care about compiling things across platforms? Well, you may be running code on a Windows machine, but if you are doing any work with the supercomputer, you will also need to run that code in a Linux environment. So you want to make sure your code will work on both platforms.
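To make the cross-platform point concrete, here is a minimal sketch of how one source file can adapt to whichever platform it is compiled on. It relies on the predefined compiler macros _WIN32 (set by Windows compilers, including MinGW) and __linux__ (set by GCC on Linux). The function names platform_name, path_separator, and join_path are made up for this illustration and are not part of any library.

```cpp
#include <string>

// Report which platform this code was compiled for.
// _WIN32 is predefined by Windows compilers (including MinGW);
// __linux__ is predefined by GCC on Linux.
std::string platform_name()
{
#if defined(_WIN32)
    return "Windows";
#elif defined(__linux__)
    return "Linux";
#else
    return "other";
#endif
}

// Hypothetical helper: the directory separator for the current platform.
char path_separator()
{
#if defined(_WIN32)
    return '\\';
#else
    return '/';
#endif
}

// Hypothetical helper: join a directory and a file name portably.
std::string join_path(const std::string& dir, const std::string& file)
{
    return dir + path_separator() + file;
}
```

Calling join_path("results", "run1.csv") gives a backslash-separated path when built with MinGW on Windows and a forward-slash path when built on the supercomputer, so the rest of the code does not need to care where it runs.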

Although it isn’t perfect, CodeLite is a nice (and free) IDE for C/C++ work.

Version Control and Online Repositories

Version control is the best way to manage your code. It allows you to track your changes, make notes, and even revert to older versions of your code.

Git

Git is the most popular way to implement version control, but other methods do exist. Learning Git takes a bit of time, but it is an essential skill that will pay dividends in the future. Most IDEs (integrated development environments) interface with Git, and so do some text editors (I’m looking at you, Atom).

GitHub

GitHub is where people post their code and data (bundled into so-called repositories) to share with the world. It is an amazing collaborative environment, and since we do computational research, it is best practice to create a GitHub repository for code related to papers that you publish. Soon it will be a requirement in most journals!

Both Git and GitHub have stellar documentation and tutorials about how to get started, so you will have lots of support when you start learning the ropes.

Command Line Interfaces, Supercomputing, and File Transfer

If you haven’t used a command line interface (CLI) before, you will definitely learn in this line of work. CLIs can be useful for interacting with Git (I prefer it for most version control tasks), installing open source software, and tasks like copying/moving files or creating/moving directories on your local machine. My apologies in advance if I butcher the language in this section related to Linux, shell, bash, etc!

Git Bash

For these tasks, you can use the Command Prompt, which is a default program in Windows. But more likely, you’ll want to use a CLI which accepts Linux commands (that way you can use the same commands for your local CLI–which runs on Windows–and for interacting with the supercomputer–which runs Linux). I like using Git Bash since it gets installed automatically when you install Git. You can also use Cygwin, but like I said before, it is generally overkill if all you want is to type Linux commands and you already have Git Bash or something similar installed. I’m sure there are many other CLIs to choose from that I’m not aware of.

While it’s a nifty skill to know how to use the command line on your local computer, it is absolutely essential when working on the supercomputer (i.e., cloud or remote computing). Although I’ve seen some graphical interfaces and interactive environments for using cloud computing resources, it is far more common to perform tasks from the command line. You can even connect to the supercomputer from a CLI using the ‘ssh’ command.

MobaXterm

To make your life a bit easier when connecting to remote computing resources, though, you can download MobaXterm, or PuTTY and WinSCP. As David explains in his post, MobaXterm is probably the best way to go. It does the job of both PuTTY (connecting to remote computing) and WinSCP (moving files between your local computer and a remote resource).

If you’re doing any work on the supercomputer at CU, check out Research Computing for tutorials and other details about our supercomputers.

Cloud Storage

University of Colorado has unlimited Google Drive storage which is linked to your CU Gmail account; therefore, it is the cloud storage of choice for the group. Google Drive File Stream allows you to access files stored on your Drive from your local computer without having to download your whole Drive. This means it won’t take up a ton of disk space, but it will ‘feel’ like the files have been downloaded onto your computer (as long as you are connected to the internet). If you know you won’t be connected to the internet, you can easily download certain files/folders or even your whole Drive if you would like.

The supercomputer can serve as cloud storage; however, it is best to keep those files backed up locally, if possible. I’ve heard too many horror stories about people storing important data on the supercomputer and it getting erased! Although this can be avoided by storing things in the right place, you might sleep better if you’ve got another copy.

Multi-objective Optimization and Visualization

Borg

You’ve probably heard of the Borg Multi-objective Evolutionary Algorithm (MOEA). If not, you will soon! There’s no direct download link for Borg, but you can fill out a form on its website to request the source code.

DiscoveryDV

Once you’ve performed an optimization, you will want to visualize the results. You can do this in your favorite programming language, but it is often difficult to interact with the data that way. For an interactive visualization experience, we generally use DiscoveryDV. Just like Borg, DiscoveryDV is not available for download directly, but you can request it on their website.

Open Source Projects

Additionally, here are a few open source projects related to multi-objective optimization, robust decision making, and visualization that may be useful to be familiar with: Project Platypus, OpenMORDM, and Exploratory modeling workbench.

Reference Managers

Reference managers are amazing things. Finding the right one will save you a lot of time and effort in the future. As Jazmin mentions in her post on research workflows, there are a ton to choose from. Check out this comparison table of reference management software, if you want to go down that rabbit hole.

Zotero

Joe’s group uses one called Zotero (download both the standalone and Chrome connector) which is free, easy to use, and integrates well with Microsoft Word.

Graphic Design

There are many ways to create custom figures for papers–PowerPoint is an easy choice because you likely already have Microsoft Office on your computer. However, Adobe Illustrator is much more powerful. Since Illustrator requires a license, ask Joe for more details if you need the software.

Happy downloading!

Using a virtual machine to run 32-bit software on a modern PC

In this post, I’ll talk about how to set up a virtual machine on a PC, in order to run outdated software that may have been optimized for a different version of Windows. For example, a collaborator of mine uses the EPA Water Treatment Plant model which only seems to work under 32-bit versions of the operating system.


Setting up Eclipse for C/C++

IDEs are tools that make code development a lot easier, especially if your project has multiple files, classes, and functions. However, setting up the IDE can sometimes be as painful as developing complex code without an IDE. This post presents a short tutorial on how to install and configure Eclipse for C/C++ on Windows 7 in a (hopefully) fairly painless manner. This tutorial is sequenced as follows:

  1. Installation
    1. Downloading the Java Runtime Environment.
    2. Downloading the GCC compiler.
    3. Downloading Eclipse.
  2. First steps with Eclipse
    1. Setting up a template (optional)
    2. Creating a new project
    3. Including libraries in your project

INSTALLATION

Downloading the Java Runtime Environment

To check if you have the Java Runtime Environment installed, go to java.com with either Internet Explorer or Firefox (Chrome will block the plugin) and click on “Do I have Java?”. Allow all the plugins to run and, if the website tells you that you do not have Java, download and install it from the link displayed on the website.

Downloading the GCC compiler

After the check is done, you will have to download the GCC compiler, which can be done from http://www.equation.com. On the side menu, there will be a link to Programming Tools, which, once expanded, shows a link to Fortran, C, C++. Click on this link and download the right GCC version for your system (32/64 bit), as shown in the following screenshot.

[Screenshot: DownloadGCC]

After downloading it, double click on the executable, accept the licence, and type “c:\MinGW” as the installation directory. This is important because this is the first folder where Eclipse will look for the compiler in your computer. Proceed with the installation.

Downloading Eclipse

Now it is time to download and install Eclipse. Go to the Eclipse download website and download Eclipse IDE for C/C++ Developers. Be sure to select the right option for your computer (Windows, 32-bit/64-bit); otherwise, Eclipse may not install, and even if it does, it will not run. If you are unsure which version to download, look at System type under Control Panel -> System.

[Screenshot: Download]

After downloading it, extract the file contents to “C:\Program Files\eclipse” (“Program Files (x86)” if installing the 32-bit version) so that everything is organized. Note that for this you will need to start WinRAR or any other file compression program with administrative privileges. This can be done by right clicking the name of the program on the start menu and clicking on Run as Administrator.

Now, go to C:\Program Files\eclipse and double click on eclipse.exe to open Eclipse. In case you get an error message saying, among other things:

Java was started but returned exit code=13
...
...
-os win32
-ws win32
...

then delete the whole eclipse folder, go back to the Eclipse download page, download the 32-bit version of Eclipse, and extract it as previously described. You should not see the same error again when trying to run eclipse.exe.

Now that Eclipse is up and running, it is time to use it.

FIRST STEPS WITH ECLIPSE

The first thing eclipse will do is ask you to choose a workspace folder. This is the folder where all your code projects will be stored. It should not matter too much which folder you choose, so using the default is probably a good idea.

Setting up templates (optional)

It is helpful to create a code template in order to avoid retyping the same standard piece of code every time you create a new file or project. Many scientific codes have similar includes (such as math.h and stdio.h) and all of them must have a main function (as any C++ program does). If we create a code template with a few common includes and the int main function, we can just tell Eclipse when creating a new project to add these to a new .cpp file.

In order to create the mentioned template, go to Window -> Preferences. There, under C/C++ -> Code Style on the left panel, click on Code Templates. Under Configure generated code and comments, expand Files -> C++ Source File, and then click on New. Choose a meaningful name for your template (I chose “Cpp with main”) and type a short description. After that, copy and paste the template below under “Pattern”.

/*
File: ${file_name}

Author: ${user}
Date: ${date}
*/

#include <iostream>
#include <string>
#include <math.h>
#include <stdio.h>
#include <string.h>

using namespace std;

int main()
{
    // Your code here.

    return 0;
}

Note that ${file_name}, ${date}, and ${user} are variables, which means that they will be replaced by your file’s actual data. To see a list of the other variables that can be inserted in your template, click on Insert Variable…. Click Ok and Ok again and your template will be ready to be used!

[Screenshot: Configuring_template]

Creating a new project

Click on File -> New -> C++ Project. Under Project type choose Empty Project, then under Toolchains choose MinGW GCC, and, finally, type “project1” as your project name and click on Finish.

[Screenshot: New_project]

After your project is created, click on File -> New -> Source File. Type “say_something.cpp” (no quotes and do not forget the .cpp after the file name) as the name of your source file and choose the template you created as the template. The window should then look like this:

[Screenshot: New_file]

Click on Finish. If you used the template, replace the comment “// Your code here.” with the line cout << "Yay, it worked!" << endl; so that your code looks like the snippet below. If you have not created the template, just type the following code into your file.

/*
File: say_something.cpp

Author: bct52
Date: Jun 26, 2015
*/

#include <iostream>
#include <string>
#include <math.h>
#include <stdio.h>
#include <string.h>

using namespace std;

int main()
{
    cout << "Yay, it worked!" << endl;

    return 0;
}

Now, build the code by clicking on the small hammer above the code window and, after the project is built, click on the run button (green circle with white play sign in the center). If everything went well, your window should look like the screenshot below, which means your code compiled and runs as expected.

[Screenshot: Project1_run]

Including libraries in your project

When developing code, often times other people have had to develop pieces of code to perform some of the intermediate steps we want our code to perform. These pieces of code are often publicly available in the form of libraries. Therefore, instead of reinventing the wheel, it may be better to simply use a library.

Some libraries consist of only one or a few files, and can be included in a project simply by dragging the files into the Eclipse project. Others, however, are more complex and should be installed on the computer and then called from the code. The procedure for the latter case will be described here, as it is the most general case. The process of installing and using the Boost library with MinGW (GCC) will serve as a case study.

The first step is downloading the library. Download the Boost library from the Boost website (boost.org) and extract it anywhere on your computer, say in C:\Users\my_username\Downloads (it really doesn’t matter where because these files will not be used after installation is complete).

Now it is time to install it. For this:

    1. Hold the Windows keyboard button and press R, type “cmd”, and press enter.
    2. On the command prompt, type “cd C:\Users\bct52\Downloads\boost_1_58_0” (or the directory where you extracted boost to) and press enter.
    3. There should be a file called bootstrap.bat in this folder. If that is the case, run the command:
      bootstrap.bat mingw
    4. In order to compile Boost to be used with MinGW, compile Boost with the gcc toolset. You will have to choose an installation directory for Boost, which WILL NOT be the same directory where you extracted the files earlier. In my case, I used C:\boost. For this, run the command:
      b2 install --prefix=C:\boost toolset=gcc

      Now go read a book or work on something else because this will take a while.

Now, if the installation finished with nothing worse than warnings, it is time to run a code example from Boost’s website that, of course, uses the Boost library. Create a new project called “reveillon” and add a source file to it called “days_between_new_years.cpp” following the steps from the “Creating a new project” section. There is no need to use the template this time.

You should now have a blank source file in front of you. If not, delete any text, comments, or code in the file so that the file is blank. Now, copy and paste the following code, from Boost’s example, into your file.

 /* Provides a simple example of using a date_generator, and simple
   * mathematical operations, to calculate the days since
   * New Years day of this year, and days until next New Years day.
   *
   * Expected results:
   * Adding together both durations will produce 365 (366 in a leap year).
   */
  #include <iostream>
  #include "boost/date_time/gregorian/gregorian.hpp"

  int
  main()
  {
    
    using namespace boost::gregorian;

    date today = day_clock::local_day();
    partial_date new_years_day(1,Jan);
    //Subtract two dates to get a duration
    days days_since_year_start = today - new_years_day.get_date(today.year());
    std::cout << "Days since Jan 1: " << days_since_year_start.days()
              << std::endl;
    
    days days_until_year_start = new_years_day.get_date(today.year()+1) - today;
    std::cout << "Days until next Jan 1: " << days_until_year_start.days()
              << std::endl;
    return 0;
  }

Note that line 9 (#include "boost/date_time/gregorian/gregorian.hpp") tells the compiler which part of Boost your code uses. Line 15 (using namespace boost::gregorian;) saves you from having to type boost::gregorian every time you want to use one of its functions.
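The two durations printed by the Boost example should always add up to the number of days in the current year. As a sanity check that does not depend on Boost, here is a minimal sketch of the same arithmetic using plain integers. The function names (is_leap_year, days_in_year, days_since_new_year, days_until_new_year) are my own for this illustration and are not part of Boost.

```cpp
#include <cassert>

// True when `year` is a leap year in the Gregorian calendar.
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Number of days in `year`.
int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}

// Days elapsed since January 1 of `year` (0 on January 1 itself).
int days_since_new_year(int year, int month, int day)
{
    assert(month >= 1 && month <= 12);
    // Days before the first of each month in a non-leap year.
    static const int days_before_month[12] =
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
    int elapsed = days_before_month[month - 1] + (day - 1);
    if (month > 2 && is_leap_year(year)) {
        elapsed += 1;  // account for February 29
    }
    return elapsed;
}

// Days remaining until January 1 of the following year.
int days_until_new_year(int year, int month, int day)
{
    return days_in_year(year) - days_since_new_year(year, month, day);
}
```

For June 26, 2015, days_since_new_year gives 176 and days_until_new_year gives 189, which add up to 365. If the Boost program prints two numbers that do not sum to 365 or 366, something is off with the build.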

However, the project will still not compile in Eclipse because Eclipse still does not know where to look for the Boost library. This will require a couple of simple steps:

  1. Right click on the project (reveillon), under the Project Explorer side window, then click on Properties. Under C/C++ Build->Settings, click on Includes under GCC C++ Compiler. On the right there should be two blank boxes, the top one called Include paths (-I) and the other called Include files (-include). Under Include paths (top one), add the path “C:\boost\include\boost-1_58” (note that this path must reflect the path where you installed Boost as well as which version of Boost you have). This is where the compiler will look for the header file specified in the code with the #include statement.
  2. The compiled library files themselves must be included through the linker. This step is necessary only if you are using a compiled library. For this, on the same window, click on Libraries under MinGW C++ Linker. Add the path to the Boost libraries folder to the Library search path (-L) (bottom box). This path will be “C:\boost\lib” (again, if you installed Boost in a different folder your path will be slightly different). Now the actual compiled library must be added to the Libraries (-l) (top box). First, we need to figure out the name of the compiled library file used in the code. In this case, it is the file “libboost_date_time-mgw51-mt-d-1_58.a”. Therefore, add boost_date_time-mgw51-mt-d-1_58 (no lib prefix, no .a suffix, and be sure to match the name of your file) to Libraries (-l). Click Ok and Ok again.
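The naming rule in step 2 (drop the lib prefix and the .a extension) is easy to get wrong by hand, so here is a small sketch of it as code. The helper linker_name is hypothetical and written only for this illustration. It assumes the GCC/MinGW convention that a static library file called libNAME.a is passed to the linker as NAME.

```cpp
#include <string>

// Hypothetical helper: turn a MinGW static-library file name such as
// "libfoo.a" into the name expected by the linker's -l option ("foo").
// Returns an empty string if the file name does not follow that pattern.
std::string linker_name(const std::string& file_name)
{
    const std::string prefix = "lib";
    const std::string suffix = ".a";

    // The name must be long enough to hold the prefix, suffix, and a body.
    if (file_name.size() <= prefix.size() + suffix.size()) return "";
    // It must start with "lib" ...
    if (file_name.compare(0, prefix.size(), prefix) != 0) return "";
    // ... and end with ".a".
    if (file_name.compare(file_name.size() - suffix.size(),
                          suffix.size(), suffix) != 0) return "";

    // Keep only the part between the prefix and the suffix.
    return file_name.substr(prefix.size(),
                            file_name.size() - prefix.size() - suffix.size());
}
```

For the file used in this post, linker_name("libboost_date_time-mgw51-mt-d-1_58.a") returns boost_date_time-mgw51-mt-d-1_58, which is exactly the text to paste into the Libraries box.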

Now compile the code by clicking on the hammer button and run the code by clicking on the play button. Below is a screenshot reflecting both steps above as well as the expected output after running the program.

[Screenshot: configuring_library]

That’s it. Once your model is in good shape and it is time to run it with Borg (or another optimization algorithm), just change your “int main()” to a function with your model’s name and the arguments Borg expects, add the standard Borg main, and change the makefile accordingly. Details on how to do all this for Borg will be explained in a future post.
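To give a rough idea of what “a function with your model’s name” might look like, here is a minimal sketch. It assumes the optimizer calls the model with three arrays of doubles (decision variables in, objectives and constraints out); check the Borg documentation for the exact signature it expects. The name my_model, the constants, and the toy two-objective problem are all made up for this illustration.

```cpp
#include <cassert>

// Assumed problem dimensions for this illustration.
const int NUM_VARS = 1;
const int NUM_OBJS = 2;

// Hypothetical model function in the (vars, objs, constrs) form.
// It evaluates a simple two-objective test problem:
//   minimize f1(x) = x^2  and  f2(x) = (x - 2)^2
// The constrs array is unused because this problem has no constraints.
void my_model(double* vars, double* objs, double* constrs)
{
    assert(vars != 0 && objs != 0);
    (void)constrs;  // silence the unused-parameter warning

    const double x = vars[0];
    objs[0] = x * x;
    objs[1] = (x - 2.0) * (x - 2.0);
}
```

Keeping the model in a plain function like this also makes it easy to test on its own: fill a vars array by hand, call my_model, and inspect objs before involving the optimizer at all.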

Setting up Python and Eclipse

According to its website, Python is:

…an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, http://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.

This post covers how to set up Python and the Eclipse development environment.  We also provide a collection of posts on how to use Python for data analysis.

————————————————————————————————————————————————————–

PYTHON:

————————————————————————————————————————————————————–

The first step is to download Python and its various packages that will likely be useful to you at some point.

Python itself is available at: http://www.python.org/download/

I would recommend downloading and installing version 2.7.2, the latest production release under the 2.X series.  Also, stick with the 32-bit version as almost all packages will be available for this version.  Avoid Python 3.X for now, as it is not as widely supported among the various Python packages that you might find useful.  Keep in mind that there are also some syntax differences between versions 2.X and 3.X that would need to be addressed whenever it does come time to update.

Just use the default settings during installation.

NOTE: If you have Cygwin installed on your system, it too has likely installed a version of Python.  Whenever you run Python from the command line, you should be careful to ensure that you are using the version that you expect (i.e., the default Cygwin installed Python versus the one that you installed).  Just be aware of this.  In general, it is easy to identify the version being picked up from the path name.  Also, it is generally best to use the version that you have installed.  It will usually be located in C:\Python27 whereas the Cygwin version will be located in C:\Cygwin\bin.

Now, install the various packages that may be useful. You should always be careful to install a version of the package that matches your version of Python (i.e., 2.7 if you are following my instructions).  Sometimes, if a package is not available for the version you are using (i) you may still be able to use it, or (ii) you may need to make minor tweaks to the package source to get things running. Also, always download the package installers, not the source.  Here are the common ones that you should definitely install:

  • NumPy and SciPy available at http://numpy.scipy.org/.  These packages are useful for performing scientific computing within Python.  Download the “win32 superpacks” for each of these packages for the version of Python that you have installed.
  • PIL – the Python Imaging Library available at http://www.pythonware.com/products/pil/.  This package is useful for manipulating image files.
  • matplotlib – a 2D plotting library with Matlab-like syntax available at http://matplotlib.sourceforge.net/.  This package is very good for creating publication quality figures.  If you start using it, you will probably notice that the appearance of the figures, even on-screen, is much improved over what Matlab can produce.

The following are some optional packages based on your particular needs:

  • Py2exe – a package for bundling Python scripts into MS Windows executable programs available at http://www.py2exe.org/.  This is what I use to bundle all of the libraries and source code required by AeroVis into a self contained package that can be installed on any Windows system without the need to build or install Python, VTK, Qt, etc.
  • wxPython – GUI package for Python available at http://wxpython.org/.  Note, this is for developing graphical user interfaces (GUIs) for your Python scripts, it is not a GUI for Python.
  • PyQt – another GUI package for Python available at http://www.riverbankcomputing.co.uk/software/pyqt/intro.  PyQt is a set of bindings for Nokia’s Qt application framework – a very rich and full featured graphical interface development framework.  AeroVis uses PyQt for its graphical interface.

————————————————————————————————————————————————————–

ECLIPSE:

————————————————————————————————————————————————————–

Now that you have Python and all of your needed packages installed, you can now move on to Eclipse. Eclipse is available from http://www.eclipse.org/downloads/packages/release/indigo/r.  The latest release (and probably the version you should be using) is Indigo.  Since we primarily use Visual Studio for C/C++ development, I would recommend downloading the IDE for Java as this will serve to provide you with a Java environment should you choose to explore this down the road.  I think you should be able to install either the 32-bit or 64-bit versions without issue.  Just make sure you are running a 64-bit OS if you choose to install that version.  When you go to download, Penn State actually has a mirror so choose this.  BTW, don’t choose the BitTorrent option – not a good idea on PSU networks.

Once you have downloaded the zip file containing Eclipse, you just unzip it wherever you want it to be installed.  This includes portable drives etc.  The beauty of Eclipse is that unlike many Windows programs, it is completely self contained and as such, can be run from any location.  Once unzipped, create a shortcut to the Eclipse executable and start it up.

————————————————————————————————————————————————————–

PYDEV:

————————————————————————————————————————————————————–

Now that Eclipse is installed, we can add a Python development environment inside Eclipse that will provide a very nice Python IDE with debugging capabilities, etc.

The install for packages inside Eclipse proceeds a little differently than what you may be used to.

The best option for installing PyDev is probably to install Aptana Studio which includes a variety of development tools.  Go to this site for instructions http://www.aptana.com/downloads/start or read on.

1) In the Eclipse Help menu, select Install New Software
2) Paste this URL into the Work With box: http://download.aptana.com/studio3/plugin/install
3) Check the box for Aptana Studio and click Next
4) Accept the license, etc., and restart Eclipse

Another option is to install only PyDev from within Eclipse. To do so, carefully follow the instructions available at: http://pydev.org/manual_101_install.html.  There’s no need for me to rehash all of these instructions here as the ones at the PyDev site are quite good.

Once PyDev is installed, you should be ready to go.

————————————————————————————————————————————————————–

Let me know if you run into any problems by leaving a comment.

————————————————————————————————————————————————————–

Up Next Time…

Developing and debugging Python scripts and projects in Eclipse

An Alternative File Manager

Here’s another useful piece of software for those of you who are sick of the built-in options for file management, especially in Windows.  Q-Dir, located at http://www.softwareok.com/?seite=Freeware/Q-Dir, will allow you to view up to four locations at once and do everything that you would in a normal file manager.  Additionally, it is simply an executable, which means that you can just place it on a portable drive and create a shortcut to it on your desktop.

I have been using this for several months now and have been quite happy with it.


Software for Comparing Files and Directories

I recently needed to compare a bunch of directories with one another in order to ensure that I had the most up-to-date files archived on my system.  For those of you involved in the AWR Comparison Study, it was for synchronizing all of the re-run data.  I found the software available at http://winmerge.org/ quite useful for doing this.

WinMerge will help you to compare the contents of two directories to ensure that you don’t end up losing some files when you go to clean up your directories following a study.  Additionally, you can just drag both of the folders you want to compare into the main application window rather than having to browse to the directories.  It is very fast and efficient.
