How to set up MinGW for the first time

I was quite surprised that no one on the internet seems to have written a simple list of what to do to get code compiled with MinGW.  It seems like you need three PhDs in computer science to read the directions (or maybe I’m just having a bad day).  Nonetheless, we’re here to help.

By the way, you may be asking “What is MinGW?” It is a port of the GNU compilers to Windows (the GNU compilers being the ones used on things like Linux and the computing clusters).  Its website is here.  “Wait a sec, why would we want to use MinGW if we already have Cygwin?  Doesn’t it do the same thing?”  Yes.  But sometimes things won’t work on Cygwin.  Also, I think MinGW is a lot smaller and lighter-weight than Cygwin, so if you are limited in resources, that’s another reason.

Anyway, here goes:

  1. Go to the MinGW website: http://www.mingw.org/.

  2. Click Downloads, and select ‘mingw-get-setup.exe.’

  3. This is an installer that will grab all the proper packages for you.  Make sure to keep the installation directory the same as default: ‘C:\MinGW\’.  Once you click through the set of initial prompts, you will see an install manager.  That install manager has a set of checkboxes that ask whether you want gfortran, g++, etc.  Right-click on the checkboxes for the packages you want and select ‘Mark for Installation.’

  4. In the program, under the Installation menu, select ‘Apply Changes’

  5. Wait for everything to download and install, and follow the instructions.  When it is done, it won’t tell you anything; you should just trust that it has completed.

  6. Note that there is a utility called MSYS that is installed with MinGW.  It is kind of like ‘Cygwin Lite’: a small set of Linux commands and a shell that you can use to compile programs.  We aren’t going to cover MSYS here, but this is a set of instructions for how to use it.  What we want to do instead is compile programs in the native DOS command window of Windows, so we need to do a few more things.

  7. Next we need to set the PATH of windows to know where MinGW is.  Why?  Well you want to open a command window in any directory in Windows, and type ‘g++’ and have the computer know what it is you’re talking about. For Windows XP – Windows 7 users, check out instructions here.  We are using Windows 8, so we can use the search functionality.  If you type ‘environment’ in the Windows search box, you’ll see a link for ‘Set System Environment Variables’. Click it.

  8. It brings you to a window that is titled ‘System Properties’.  Click the button that says ‘Environment variables…’  In that window, in the second box, you can scroll down and see a variable called Path.  It should start with ‘C:\windows\system32;C:\windows;’  Highlight Path and click Edit…

  9. At the very end of the list, type a ; (without a space) followed by ‘C:\MinGW\bin’.  So your path should look like: ‘C:\windows\system32;C:\windows;[otherstuff];C:\MinGW\bin’

  10. Click OK three times, to get out of all of the windows.

  11. This step is very important!  Command windows that were already open will not see the new Path, so close them.  Restarting your computer is the surest way to make the Path settings take.

  12. Now, after you have restarted, get to the command line by typing ‘cmd’ in a search box.  A black, old-timey window will pop up.  If you type the command ‘g++’, it should tell you “g++: fatal error: no input files”.  This means g++ is installed, it is on the path correctly, and it (ostensibly) works.

  13. You will likely need one more command in addition to the vanilla ‘g++’ and ‘gfortran’.  If you are using a Makefile, you need a utility to run it.  On Linux this is called ‘make’, but here it’s called something different: mingw32-make.  Thanks to this forum post for this (and other helpful) hint(s)!

You should be good to go.  Now you have a minimalistic way to compile, on Windows, stuff that is supposed to work using GNU compilers (on Unix and the clusters)!  As usual, comments, questions, and concerns should be posted below.
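If you would rather check the result of the PATH steps from a script than by typing each command, here is a minimal Python sketch. It assumes Python is installed (which the steps above do not require), and the list of tool names is simply the ones mentioned in this post; it only reports whether each name can be found on the PATH, not whether the compiler actually works.

```python
import shutil

# Tool names taken from the steps above; edit the list if you installed other packages.
TOOLS = ["g++", "gfortran", "mingw32-make"]


def find_tools(names=TOOLS):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If a tool shows up as not found, the usual suspects are a typo in the Path entry or a command window that was opened before the Path was changed.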

PDFExtract: Get a list of BibTeX references from a scholarly PDF

So you’ve found a review article with a great list of references that you’d like to include in your own paper/thesis/etc. You could look them up, one-by-one, on Google Scholar, and export the citation format of your choice. (You could also retype them all by hand, but let’s assume you’re savvy enough to use some kind of citation manager).

This is not a great use of your time.

Check out PDFExtract, a Ruby library written by folks at CrossRef. Its goal is to read text from a PDF, identify which sections are “references”, and return this list to the user. As of recently, it has the ability to return a list of references in BibTeX format after resolving the DOIs over the web. When the references in the PDF are identified correctly (about 80-90% of the time in my experience), you’ll now have all the references from that paper to do with as you please—to cite in LaTeX, or import to Zotero, etc.

How to use it

You will need a recent version of Ruby and its gem package manager. Search around for how to do this on your particular OS. As usual, this will be a lot easier on *nix, but I have it working in Cygwin too so don’t despair.

The latest version of PDFExtract (with BibTeX output) is not on the central gem repository yet, but for now you can build and install from source:

git clone https://github.com/CrossRef/pdfextract
cd pdfextract
gem build pdf-extract.gemspec
gem install pdf-extract-0.1.1.gem  # check version number

You should now have a program called pdf-extract available from the command line. Navigate to a directory with a PDF whose references you’d like to extract, and run the following:

pdf-extract extract-bib --resolved_references MyFile.pdf

It will take a minute to start running, and then it will begin listing the references it finds, along with their resolved DOIs from CrossRef’s web API, like so:

Found DOI from Text: 10.1080/00949659708811825 (Score: 5.590546)
Found DOI from Text: 10.1016/j.ress.2011.10.017 (Score: 4.6864557)
Found DOI from Text: 10.1016/j.ssci.2008.05.005 (Score: 0.5093678)
Found DOI from Text: 10.1201/9780203859759.ch246 (Score: 0.6951939)
Found DOI from Text: 10.1016/s0377-2217(96)00156-7 (Score: 5.2922735)
...

Note that not all resolutions are perfect. The score reflects the degree of confidence that the reference extracted from the PDF matches the indicated DOI. Scores below 1.0 will not be included in the final output, as they are probably incorrect.
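To see which references are going to be dropped before the run finishes, you can apply the same 1.0 cutoff to the log yourself. The following is a sketch in Python, not part of PDFExtract: it assumes you have captured the console output as text, that lines look exactly like the ones above, and that a score of exactly 1.0 is kept (the post only says "below 1.0" is excluded).

```python
import re

# Matches log lines of the form shown above:
#   Found DOI from Text: <doi> (Score: <number>)
LINE_RE = re.compile(r"Found DOI from Text: (\S+) \(Score: ([0-9.]+)\)")


def split_by_score(log_text, threshold=1.0):
    """Return (kept, dropped) lists of (doi, score) pairs.

    Assumption: a score exactly equal to the threshold is kept.
    """
    kept, dropped = [], []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a DOI line (for example the trailing "...")
        doi, score = match.group(1), float(match.group(2))
        (kept if score >= threshold else dropped).append((doi, score))
    return kept, dropped


# The five log lines from the example output above.
SAMPLE_LOG = """\
Found DOI from Text: 10.1080/00949659708811825 (Score: 5.590546)
Found DOI from Text: 10.1016/j.ress.2011.10.017 (Score: 4.6864557)
Found DOI from Text: 10.1016/j.ssci.2008.05.005 (Score: 0.5093678)
Found DOI from Text: 10.1201/9780203859759.ch246 (Score: 0.6951939)
Found DOI from Text: 10.1016/s0377-2217(96)00156-7 (Score: 5.2922735)
"""

kept, dropped = split_by_score(SAMPLE_LOG)
print("kept:", [doi for doi, _ in kept])
print("dropped:", [doi for doi, _ in dropped])
```

The dropped list is a handy to-do list of references you may want to look up by hand.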

Go make yourself a coffee while it searches for the rest of the DOIs. Eventually it will move to the second phase of this process, which is to use the DOI to obtain a full BibTeX entry from the web API. Again, this will not be done for DOIs with scores below 1.0.

Found BibTeX from DOI: 10.1080/00949659708811825
Found BibTeX from DOI: 10.1016/j.ress.2011.10.017
Found BibTeX from DOI: 10.1016/s0377-2217(96)00156-7
Found BibTeX from DOI: 10.1016/j.ress.2006.04.015
Found BibTeX from DOI: 10.1111/j.1539-6924.2010.01519.x
Found BibTeX from DOI: 10.1002/9780470316788.fmatter
...

Finish your coffee, check your email, and chuckle at the poor saps out there gathering their references by hand. When the program finishes, look for a file called MyFile.bib—the same filename as the original PDF—in the same directory from which you invoked the pdf-extract command. Open it up in a text editor or reference manager and take a look. Here’s the output from my example:

@article{Archer_1997,
doi = {10.1080/00949659708811825},
url = {http://dx.doi.org/10.1080/00949659708811825},
year = 1997,
month = {May},
publisher = {Informa UK Limited},
volume = {58},
number = {2},
pages = {99-120},
author = {G. E. B. Archer and A. Saltelli and I. M. Sobol},
title = {Sensitivity measures,anova-like Techniques and the use of bootstrap},
journal = {Journal of Statistical Computation and Simulation}
}
@article{Auder_2012,
doi = {10.1016/j.ress.2011.10.017},
url = {http://dx.doi.org/10.1016/j.ress.2011.10.017},
year = 2012,
month = {Nov},
publisher = {Elsevier BV},
volume = {107},
pages = {122-131},
author = {Benjamin Auder and Agn\`es De Crecy and Bertrand Iooss and Michel Marqu\`es},
title = {Screening and metamodeling of computer experiments with functional outputs. Application to thermal$\textendash$hydraulic computations},
journal = {Reliability Engineering \& System Safety}
}

... (and many more!)

A few extra-nice things: (1) it includes all DOIs, which journals sometimes require and are pesky to track down, and (2) it attempts to escape all BibTeX special characters by default. Merge this with your existing library, and be happy! (You could even use this to recover or develop a reference library from your own papers!)
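Before merging, it can help to know which extracted entries are already in your library. Since every extracted entry carries a DOI, one rough approach is to compare DOIs. The Python sketch below is an illustration only: the function names are made up for this example, the regular expressions are crude and assume entries formatted like the output above (one `doi = {...}` field per entry), and a real BibTeX parser or your reference manager's duplicate detection will be more robust.

```python
import re

# Start of an entry, e.g. "@article{Archer_1997,"
ENTRY_RE = re.compile(r"@(\w+)\{([^,\s]+),")
# A doi field, e.g. "doi = {10.1080/00949659708811825}"
DOI_RE = re.compile(r"\bdoi\s*=\s*\{([^}]*)\}", re.IGNORECASE)


def index_by_doi(bib_text):
    """Map lower-cased DOI -> citation key for every entry that has a doi field."""
    index = {}
    starts = list(ENTRY_RE.finditer(bib_text))
    for i, entry in enumerate(starts):
        # An entry's text runs from its "@" up to the next entry (or end of file).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        doi = DOI_RE.search(bib_text[entry.start():end])
        if doi:
            index[doi.group(1).strip().lower()] = entry.group(2)
    return index


def new_entries(extracted_text, library_text):
    """Return {doi: key} for extracted entries whose DOI is not in the library."""
    have = index_by_doi(library_text)
    found = index_by_doi(extracted_text)
    return {doi: key for doi, key in found.items() if doi not in have}
```

You would read MyFile.bib and your existing library into strings and pass them to `new_entries`; entries without a DOI in your library are invisible to this check.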

Caveats

  • This works a lot better on journal articles than on longer documents like theses and textbooks. It assumes that the “Reference” section is toward the end, so a chapter-based or footnote-based reference format will cause it to choke.

  • It will not work on non-digital articles—for example, older articles which were scanned and uploaded to a journal archive.

  • Careful with character encoding when you are importing/exporting BibTeX with other applications (like Zotero), or even managing the file yourself. You may want to look for settings in all of your applications that allow you to change the character encoding to UTF-8.

  • Lots of perfectly good references do not have DOIs and thus will not be resolved by the web API. This includes many government agency reports, for example. In general do not expect to magically BibTeXify things other than journal articles and the occasional textbook.

  • Reading a PDF is tricky business, and there are some journal formats that just won’t work. You will notice failures as (1) consistently bad DOI resolution scores, (2) complete failure with an error message from the PDF reader (these are very hard to trace), or (3) bizarre entries at the end of your BibTeX file. I’ve accidentally “extracted” references about ornithology, for example. Just delete these and move on.

RSS Feeds for Water Resources Journals

If you want to keep up with new journal publications, but don’t want to receive dozens of robo-email alerts every week, this post is for you. Go find an RSS reader of your choice and set up an account (some popular choices are The Old Reader, Feedly, and NewsBlur). Personally I use The Old Reader, but your mileage may vary. Feedly has a nice mobile app if that’s a requirement for you.

Once you set up your account, you can add subscriptions using the following list of links. These are sometimes hard to find depending on the publisher. If you want to add a journal that’s not listed here, searching Google for “<journal name> rss feed” is usually your best bet. Happy productivity!

Default parameters for MOEAFramework algorithms and Borg

I recently ran into an interesting case when performing runtime dynamics and control map diagnostics on my 2-objective formulation of the lake problem with 2 constraints.  Although Borg yielded a much better reference set for the control map with a small LHS sample for each algorithm, the other algorithms did better on runtime when limited to 25,000 NFE.  In part this resulted from SBX being a dominant operator, which let the algorithms that use it charge forward while Borg’s adaptive operators took longer to choose it, but it still made me wonder what the default parameters were.

Borg’s were easily found in Table 3 of the following paper:
Hadka, David, Patrick M. Reed, and Timothy W. Simpson. “Diagnostic assessment of the Borg MOEA for many-objective product family design problems.” Evolutionary Computation (CEC), 2012 IEEE Congress on. IEEE, 2012.
Available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6256466&tag=1.

Ranges of Borg parameters and default parameters. L is the number of decision variables

You can find the default parameters for the MOEAFramework algorithms by inspecting the source code.
Below is a table showing the default parameters that one would specify when control mapping for the MOEAFramework native algorithms, followed by where to look in the source code for those, the JMetal, and the PISA algorithms, and for operator default parameters, should you want more information. Please note that this table does not give the final value of every parameter, as the framework seems to adjust some values based on properties of the problem. These are merely the user input values.

Default parameters for native MOEAFramework algorithms.  L is the number of decision variables.

To find the default parameters for the algorithms native to the framework (eNSGAII, NSGAII, eMOEA, MOEA/D, and GDE3), you can check the following Java class in the source code:

org.moeaframework.algorithm.StandardAlgorithm

For JMetal algorithms, see:

org.moeaframework.algorithm.jmetal.JMetalAlgorithms

and for PISA algorithms, see:

org.moeaframework.algorithm.pisa.PISAAlgorithms

Finally, if you want the default parameters for the operators, you can look in this class:

org.moeaframework.core.spi.OperatorFactory

Thanks to Dave for pointing out where to look in the source code!

Parallel plots in R

Today I’d like to share some simple code we worked up with the help of CU graduate student Rebecca Smith for creating parallel coordinate plots in R.  The code plots a Pareto approximate set in light gray, and several highlighted solutions are placed on top of it.  Showing 3 highlighted solutions really gives a nice view of the tradeoffs in many dimensions simultaneously.  You can follow the colored lines and see how objectives and decisions interact, and having the colored solutions above the full tradeoff gives you a sense of how individual solutions compare to the full tradeoff set (i.e., is a solution the lowest in the set?  The highest?).  I would show you an example, but, well, that is left to the reader 🙂

I’ll post the code here and then explain below.

###
#R template to make Parallel Coordinate plots with the full set in gray and highlighted solutions in color
#
#by Rebecca Smith and Joe Kasprzyk
###

#
#User defined parameters
#

#the excel file containing the data, with headers and rows
myFilename="an-excel-file.xlsx"

#the file folder you're working in
myWorkingDirectory="D:/Data/Folder/"

#a vector of the indices you want to plot, in order
objIndices=c(16, 2, 11, 5, 9, 17)

#highlighted solutions.  The default is 3; to use more or fewer you have to change the code below

sol1.row = 125
sol1.color = "blue"

sol2.row = 109   #the red one
sol2.color = "red"

sol3.row = 179   #the green one
sol3.color = "green"

#
#The code is below.  You should have to make minimal changes below this point!
#
setwd(myWorkingDirectory)

options(java.parameters = "-Xmx4g")
require(XLConnect)
library(MASS)

wb1 = loadWorkbook(myFilename)
archive=as.matrix(readWorksheet(wb1,sheet="Sheet1",header = TRUE))

N=length(archive[,1])

#save the specific data you want to plot
par.plot.arch=archive[,objIndices]

#Concatenate a matrix that starts with the rows of all the solutions,
#then appends the specific solutions you want.
par.plot.arch.sol.1.2.3=rbind(par.plot.arch,par.plot.arch[sol1.row,],par.plot.arch[sol2.row,],par.plot.arch[sol3.row,])

#parcoord takes the following arguments:
#the data
#col is a vector of the colors you want to plot
#lwd is a vector of values that will be assigned to line weight
#lty is a vector of values that will be assigned to line type

#note, below that rep.int repeats the integer the specified number of times.  Basically we are telling R to make the first N values
#gray, and then the rest are assigned a specific colour.

#including linetype...
#parcoord(par.plot.arch.sol.1.2.3, col=c(rep.int("grey",N),sol1.color,sol2.color,sol3.color), lwd=c(rep.int(1,N),6,6,6),lty=c(rep.int(1,N),2,2,2),var.label=TRUE)

#without linetype
parcoord(par.plot.arch.sol.1.2.3, col=c(rep.int("grey",N),sol1.color,sol2.color,sol3.color), lwd=c(rep.int(1,N),6,6,6),var.label=TRUE)

Most of my explanation is in the code comments above, but here are a few quick thoughts.

First, the code is separated into stuff the user can change, and stuff the user shouldn’t have to play around with too much. One big thing is that the number of ‘highlighted’ solutions is hardcoded to 3. It is pretty simple to add more; you just have to make sure that you add the new entries everywhere they belong (i.e., in concatenating the data, and also in the col and lwd vectors used for plotting).

Second, play around with the ordering of the columns for the best effect. R’s c() function lets you build a vector of column indices in any order.
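The bookkeeping for an arbitrary number of highlighted solutions is the same every time: the stacked matrix is all N rows followed by the highlighted rows again, and the color and line-width vectors are N background values followed by one value per highlight. Here is that logic as a small Python sketch (the function names are made up for illustration; in R you would build the same vectors with c() and rep.int as in the template above).

```python
def row_order(n_background, highlight_rows):
    """1-based row indices for the stacked matrix: every row, then the highlights again."""
    return list(range(1, n_background + 1)) + list(highlight_rows)


def style_vectors(n_background, highlight_colors,
                  background_color="grey", background_width=1, highlight_width=6):
    """Build the color and line-width vectors, one value per plotted line."""
    n_highlight = len(highlight_colors)
    colors = [background_color] * n_background + list(highlight_colors)
    widths = [background_width] * n_background + [highlight_width] * n_highlight
    return colors, widths


# Four highlighted solutions instead of three; the fourth row and color are made up.
rows = row_order(200, [125, 109, 179, 42])
colors, widths = style_vectors(200, ["blue", "red", "green", "orange"])
print(len(rows), len(colors), len(widths))
```

The point of the check at the end is that all three vectors must have the same length; a mismatch is the most likely mistake when adding a highlight by hand.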

Finally, remember you can also use the col option in parcoord to plot a real-valued color spectrum. We have found, though, this is kind of difficult to do if there isn’t a clear monotonic relationship between the colored variable and the others.

That’s all for now but feel free to comment below!

Runtime metrics for MOEAFramework algorithms, extracting metadata from Borg runtime, and handling infinities

I have been working with runtime metrics for a variety of algorithms on the “Lake Problem” for max NFE of only 25,000 at an output frequency of 1,000. Here are a few things I learned along the way for which the group does not seem to have consolidated resources.

First, most of our recent runtime resources have been focused on Borg, so although it is extremely easy to get runtime metrics for MOEAFramework algorithms (in fact it’s one of the site’s examples), I was initially at a loss for where to look. What I ended up doing was copying the code from the 3rd example found here: http://moeaframework.org/Example3.java and making the following changes.

At the top of the code, I added the following line

import java.io.File;

just below the line that read:

import java.io.IOException;

Next, change the name in .withProblem to the name of the problem you defined (see examples 4 and 5 and the manual for more info). You can specify your desired output frequency in .withFrequency and include runtime info beyond that shown in the example, which only collects Elapsed Time and Generational Distance.  Because my problem was not built into the framework, I had to specify a reference set I had already made.  That is why I had to include the import java.io.File line; until I added it, I received compilation errors.

My section for setting up the instrumenter looks like this:

// setup the instrumenter to record metrics
Instrumenter instrumenter = new Instrumenter()
.withReferenceSet(new File("./5obj_2const_stoch/myLake5Obj2ConstStoch.reference"))
.withFrequency(1000)
.attachElapsedTimeCollector()
.attachGenerationalDistanceCollector()
.attachHypervolumeCollector()
.attachAdditiveEpsilonIndicatorCollector();

You will also need to specify the desired problem, algorithm, and max NFE for the Executor. Further, if you are using an epsilon-dominance algorithm, you can specify the epsilons with the following line:

.withEpsilon(0.01,0.01)

I had a two objective problem with epsilons of 0.01 for both objectives.

Finally, I became frustrated with the runtime printing format included in the example as it didn’t have a consistent field separator when I ran it. This may not have actually been important, but below is what I used:

System.out.println("NFE"+"\t"+"Elapsed Time"+"\t"+
"Generational Distance"+"\t"+"Hypervolume"+"\t"+
"Additive Epsilon Indicator");

for(int ii=0; ii<accumulator.size("NFE"); ii++){
System.out.println(accumulator.get("NFE",ii)+"\t"+
accumulator.get("Elapsed Time",ii)+"\t"+
accumulator.get("GenerationalDistance",ii)+"\t"+
accumulator.get("Hypervolume",ii)+"\t"+
accumulator.get("AdditiveEpsilonIndicator",ii));
}

You could also use the code included in Matt’s blog post here: https://waterprogramming.wordpress.com/2013/02/05/matplotlib-part-i-borg-runtime-metrics-plots/ although I haven’t actually tried it yet.

Once this was done, I was excited to plot the results, but that is where I ran into another glitch.  At the earliest output intervals, the values for Additive Epsilon Indicator and Generational Distance were Infinity, as the algorithms had yet to find any feasible solutions.  This was easily solved by the following command.  Thank you Jon for recommending it!

sed 's/Infinity/-9999.0/g' all_of_my_files.txt >> write_to_file.txt

It goes through the first file specified and replaces all of the “Infinity” values with -9999.0, then appends the output to the file named after the >> (use a single > instead if you want to overwrite that file).  This made it much easier to import the data to Matlab, where I set the -9999.0 values to NaNs.  Since I had 300 files, I put this in a bash script that looped through my runtime files for all algorithms and seeds.

Finally, I wanted to plot Borg operator probabilities, but now that Borg is not in the framework, its initial output files are not very pretty.  There is a way to extract the data if one consults section 10.2.2 of the MOEAFramework manual.  Then, you get nice files that look kind of like the ones in the blog post by Matt that I referenced earlier.  This made it much easier to plot the data.  It also let me repeat the Python exercise in that post with my own files, which was fun.
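If you end up post-processing in Python rather than Matlab, the sentinel value is not needed, because Python's float() already understands the string "Infinity". Below is a minimal sketch that reads the tab-separated output produced by the print statements above and turns infinities straight into NaN. The function name is made up, and it assumes the first non-blank line is the header and every other field is numeric.

```python
import math


def parse_runtime_table(text):
    """Parse tab-separated runtime output into (header, rows).

    Infinite values (printed by Java as "Infinity") become NaN.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        values = [float(field) for field in line.split("\t")]
        rows.append([math.nan if math.isinf(v) else v for v in values])
    return header, rows
```

Looping over all runtime files is then a matter of reading each file's text and calling this function on it.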

March 27, 2015 edit: I just found this video, which provides an overview of calculating runtime metrics with the MOEAFramework.  It also includes an example of reading a seed from the command line at the beginning of the java code.