Beginner’s LaTeX Guide

What are TeX and LaTeX?

TeX is a low-level markup and programming language for typesetting documents, created by Donald Knuth. TeX is a powerful typesetting tool, but it can be difficult to use because of the time it takes to write custom text-formatting macros. To get around this difficulty, there are macro packages built on TeX, like LaTeX, that come with pre-built macros. LaTeX is more user-friendly, at the cost of some of the flexibility of plain TeX.

Installing a TeX System: MiKTeX

MiKTeX is a distribution of Knuth’s TeX system. You’ll need a TeX system on your computer so that your machine recognizes LaTeX commands. I use MiKTeX because it is compatible with the WinEdt software we use in the Reed group. You can also use TeX Live as your TeX system, but I have no experience with that software. Once you have a TeX system installed, you can compile LaTeX documents from the command line using plain text files saved with the .tex extension. Most people find this difficult, which is why many people use a TeX editor.

Installing a TeX Editor: WinEdt

The software that we have a license for in the Reed group is WinEdt. There are also free options, such as TeXnicCenter, and many others. For a full discussion of the pros and cons of different editors, see Wikipedia’s article comparing different TeX editors. Once you’ve installed WinEdt, go to Documents -> Current Work (Samples) within the program and compile one of the included sample documents to confirm that your software is properly installed and configured.

(Reed Members: Talk to Josh for license information. I believe the Reed license is only valid for WinEdt 5.5, which is not the latest version.)

Learning some Basic Commands

Luckily, there are many sources for learning LaTeX commands. A good place to start is Wikipedia’s LaTeX Wikibook. The section under the tab “Absolute Beginners” walks you through very simple document creation. Another good place to start is Andre Heck’s short course in LaTeX called Learning LaTeX by Doing. The course has 24 exercises designed to get you familiar with commands and with typing your own LaTeX documents. If you just want to try these exercises without installing software, you can use Latexlabs.org to compile your LaTeX documents online. Once you are familiar with the commands, a good first document of your own is a LaTeX resume/CV. This will get you familiar with simple document elements such as tables and lists.

Some Resources

Winston Chang has written a comprehensive document that compresses most of the major LaTeX commands to two pages: http://www.stdout.org/~winston/latex/latexsheet.pdf.

If you’re interested in using LaTeX to write a Penn State thesis/dissertation, Gary L. Gray and Francesco Costanzo have written a thesis template to use: http://www.esm.psu.edu/psuthesis/

There’s even a LaTeX template that makes your documents look like MS Word!

Getting the most out of academic papers

This post puts together some informal thoughts on how to get the most out of an academic paper.  I’m grateful to discussions with Pat Reed, Thorsten Wagener, and Klaus Keller through the years that have given me some of these ideas.

How to Find Good Papers

  • Use Web of Knowledge and/or Google Scholar to search for the most relevant citations. You can even start with a general topic, such as “water supply planning”. There will be thousands of citations, of course. But “sort by times cited” and you will likely find the most important benchmark papers that everyone has read. Download these (at least) and read them (preferably). You’ll be expected to know these references!
  • Make good use of review articles.  Did you know that Nicklow et al. (2010) reviewed applications of evolutionary algorithms in water resources? (find it here: http://link.aip.org/link/doi/10.1061/(ASCE)WR.1943-5452.0000053)  Review articles like this are great resources for learning a lot about a field.  There are similar reviews for hydro-economic modeling (Harou et al., http://dx.doi.org/10.1016/j.jhydrol.2009.06.037) and multi-reservoir operations (Labadie, http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93)).  I’m sure there are good examples for your field too.
  • Look at group websites to collect more than one paper from the same author.  Our group website is a great example of course.
  • Use literature reviews in other papers and theses.  A lot of times, other papers, dissertations, and theses do some of the work for you by reviewing the literature in a particular field.  Use these resources and download the papers cited by these other authors.  Of course, do not plagiarize their words.  If you’re borrowing ideas from a list of literature from Smith (2012), you can even cite Smith (2012) by saying “As reviewed by Smith (2012)…”

How to Read a Paper

You’ve found some good papers to read.  So you get yourself a cup of tea, print out a paper, and start out at page 1.  That’s not really the best way to go about reading the paper!  What if this paper isn’t one that you actually need to read?  Let’s face it, you will probably have to cite 100 papers in your thesis and it is difficult to read every single one, especially in one sitting.  What if the important info doesn’t start until page 15?  The human attention span is not very long, and you could get yourself lost.

Instead, try this approach:

  • Read the abstract. A good abstract will tell you what the paper aims to achieve, what methods the authors used to achieve those aims, and the implications of the results. A great abstract will also discuss the limitations of prior work in the field, and how the presented work could be expanded to other studies or other fields.
  • Does the abstract seem relevant and interesting? Great. Now look at the figures. What types of analysis are the authors presenting? Do the figures make sense, and do the captions explain what you’re supposed to look for? When you’re reading the full text later, you’ll want to use the figures as a roadmap. It’s helpful to know what’s coming so that you’ve seen it before you get there.
  • Is the paper still keeping your interest? Wonderful.  Time to read the conclusion.  The conclusion should give the authors’ insight on what it is that they actually did.  This should give you the take-home message that you should, well, take home when you read the work.
  • Now you can start at the beginning and read the paper. Pay particular attention to the methodology. If the paper talks about a basin in Malaysia, it probably uses a model or analysis technique that you could apply to your own basin. It’s not a good enough excuse to say “Oh, well the authors aren’t working on a problem that’s exactly like mine.” You should try to be familiar with papers from all sorts of different fields.

Remember that you can get a lot out of the first few steps of the process. If you look at the abstract, the figures, and the conclusion, you may get enough out of the paper to save it for a more careful treatment later. In my opinion, it’s better to be familiar with a whole lot of references from many different authors and groups than to get tunnel vision on one paper, especially since you will get more out of a paper if you revisit it later, after you’ve learned more about the field.

Some Tasks to Try

A lot of people need to “do” something when reading to make sure they get the gist of the paper effectively. Here are some suggestions:

  • Highlighting. This is pretty self-explanatory, but try the features in Adobe Reader or the free FoxIt reader (see http://www.foxitsoftware.com/). Also, in the margins of the paper, question things that you don’t understand or don’t agree with (e.g., “What were they thinking?”). This really helps when you revisit the paper later.
  • Write a one-sentence summary. This is harder than it would seem at first. How do you distill a 20-page paper down to a single sentence? This is a good habit to get into for every paper you read, especially since you will probably need to do it when you’re writing the literature review in your thesis. Most papers will put a sentence like this right in the abstract, so adapt it from there.
  • Write a 500 word summary.  Again, it’s harder than it initially seems.  This page gives some helpful hints on writing summaries.  Always do this without plagiarizing the original material.  Writing a succinct summary of something can be a valuable skill, especially when adapting your own work in different venues.

An Exercise

Download Reed and Minsker (2004) “Striking the Balance: Long-Term Groundwater Monitoring Design for Conflicting Objectives” here: http://link.aip.org/link/doi/10.1061/(ASCE)0733-9496(2004)130:2(140).  It’s a foundational paper for our field, since it’s one of the first applications of a many-objective (4 or more) optimization problem in water.  Fulfill the following tasks:

  1. Write a one-sentence summary of the paper.
  2. Write a 500-word summary of the paper, making sure you hit the most important results presented there.
  3. Provide a brief critique, including one thing the paper did well and one thing it did poorly or you want to see expanded.
  4. List the most important 3 references cited in the paper and discuss their relevance to the study.  Does the current study expand or improve on these references?

As always feel free to add comments or questions below!

How to cite packages in R

R is a nice statistical language to use because it is free and provides many useful packages for data analysis. I just found out about a neat way to have R generate a BibTeX citation for a specific package: calling citation("packagename") at the R prompt prints the reference for that package along with a BibTeX entry. It’s explained here:

http://astrostatistics.psu.edu/su07/R/html/utils/html/citation.html

Do you have tips on using R? If so, edit this post or provide a comment below.

Virtual Machines for Remote Code Development

Setting up a Virtual Machine [VM] for Remote Code Development

Many times you’ll be asked to develop applications on remote machines. Generally these machines are running some flavor of Linux or Unix (*nix systems). This can be quite complicated for those who are unfamiliar with the command line or the “vi” editor. This guide will get you started using a virtual machine to run a Linux operating system on your Windows PC, and will help alleviate some of the headache associated with remote development.

I’m suggesting the use of a VM for remote development as opposed to separate SSH and X-Server forwarding software such as Cygwin because the VM gives you access to a lot of the software and features of the remote machines on your local machine.  Even things like LaTeX become readily available.  I’m suggesting setting up a local development area because once you’ve “cut your chops” on remote development, you’ll appreciate being able to rapidly develop code locally and then push updates to the remote machines for ‘production’ runs.

This guide will help you get up and running with the VirtualBox VM and a version of CentOS (a popular flavor of Linux). CentOS comes with software packages aimed at software development, such as Eclipse, and contains features available on large cluster systems. You’ll even be able to install OpenMPI to program and test parallel applications, if you so choose.

Installing VirtualBox

VirtualBox is a pretty well-supported, open-source piece of software. The homepage is located at www.virtualbox.org, and contains links to installers as well as user guides and documentation. You’ll need to download the Host software as well as a Guest OS. The Host software (VirtualBox) runs the virtual machine (the Guest). Just like a standalone computer, the Guest needs an OS to run. Here’s a quick link to their download page:

https://www.virtualbox.org/wiki/Downloads

You’ll want to head over there and download the latest version of VirtualBox for Windows Hosts (x86/amd64).  This is the VM software that will manage your virtual machines.  At the time of this writing, VirtualBox is at version 4.1.8.  They have a pretty solid install guide in their manual, located here:

https://www.virtualbox.org/manual/UserManual.html

Go ahead and install VirtualBox, and configure a new virtual machine.  A good video guide (though slightly dated) is provided below.  When they get to the step where they determine the size of the hard drive, they use dynamically-sized storage.  You’ll want to change it to fixed-size; around 20 gigabytes or more if you can handle it.  Our settings will be for Red Hat (the core Linux within CentOS).  We’ll get to installing an OS on the virtual machine shortly.

Installing an OS to the VM

Congratulations! You now have a computer running within a computer. This is where things may become complicated if you’ve never installed an OS before. It’s become much easier than in the past, and there’s a pretty good video posted on YouTube with an example walkthrough. You’ll want to make sure your computer is connected to the internet before you start the install. VirtualBox will automatically connect to the net, and the OS installation will grab some packages from the net if you tell it to. Around 3:15 in the video, you’ll see an example of the package selection screen. You’ll want to select the “Software Development Workstation” option.

Installing Additional Software: VirtualBox Guest Additions

Now that the OS is up and running, you want to be able to fully use all the fancy features and graphics packages of VirtualBox to maximize its performance and ease of use. This is done through the VirtualBox Guest Additions. They allow you to “fullscreen” the VM, and enable some other nifty tricks. Technically this step is optional, but it really should be required. A walkthrough is provided by VirtualBox, and can be found here:

http://www.virtualbox.org/manual/ch04.html

Simply follow the provided instructions for Guest Additions for Linux. Since we’ve installed CentOS, we’re using a variant of Red Hat Enterprise Linux (RHEL), so scroll to the specific instructions for CentOS, Red Hat Enterprise Linux and Oracle Enterprise Linux.  A video walkthrough is provided below:

Configuring Remote Display

This is where things get easy.  Since you’re running Linux in the VM, you’re basically already set up to push GUIs and windows back to your local machine from the remote one.  All it takes is the command:

ssh -Y <username>@<remote-system>

The -Y option for ssh allows the remote system to forward X11 data (GUIs, windows) back to your machine.  Pretty easy, right?  Log in to a machine and type xterm to push a remote terminal window back through your ssh connection to test it.
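If the xterm window does not appear, one quick thing to check is whether the remote session has a DISPLAY variable set, since ssh sets it when X11 forwarding is active (typically to something like localhost:10.0). Here is a minimal sketch in Python, meant to be run on the remote machine after logging in. The function name x11_display is just an illustrative choice, not part of any tool mentioned in this guide.

```python
import os


def x11_display(environ=None):
    """Return the X11 display string, or None if DISPLAY is unset or empty."""
    if environ is None:
        environ = os.environ
    return environ.get("DISPLAY") or None


if __name__ == "__main__":
    display = x11_display()
    if display is None:
        print("DISPLAY is not set; X11 forwarding does not appear to be active.")
    else:
        print("DISPLAY is %s; remote GUIs should open on your local screen." % display)
```

If DISPLAY is missing, double-check that you logged in with the -Y option.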

Discussion

First, I’d like to note that a virtual machine is useful for running multiple operating systems for various reasons; however, you take a performance hit due to the virtualization layer. Most modern computers are multi-core, which allows your primary OS to offload the virtualization to one core and run calculations with the other. This alleviates some of the performance hit, but does not remove it completely. Some newer Intel systems can use specialized hardware to improve performance and even run the 64-bit (x86_64) versions of guest operating systems. Such optimizations are beyond the scope of this guide, but if there’s enough demand I’ll write a more in-depth version that covers them.

Installing Additional (Optional) Software: Netbeans

While the Software Development Package comes with an Integrated Development Environment (IDE) called Eclipse, my personal IDE-du-jour is Netbeans.  Eclipse is more versatile when it comes to languages, but if you’re a C/C++ developer, Netbeans is also pretty convenient.  It is a little simpler to configure for remote development, and is easier to switch between local and remote.  If you wish to familiarize yourself with Eclipse, a good resource can be found here and at the Eclipse website, here.

From inside your remote machine, head over to netbeans.org and download the install script for the “All” version.  Remember where you download it, as you’ll have to navigate to the script on the command line. Installation instructions can be found here, but remember to perform these commands as “su” using the sudo command:

sudo chmod +x <installer file name>
sudo ./<installer file name>

During the installation, agree to the licenses, and if you’d like feel free to install the Glassfish server and Apache Tomcat packages.  If you don’t know what they are, you probably don’t need them, unless you’re deep into open-source development and web environments for Java.

Once installed, you’ll be able to find the software in the Applications->Programming menu.

Installing Additional (Optional) Software: OpenMPI

You can do this in one of two ways. Through the Add Software GUI, just search for openmpi and it’ll come up in the list of packages.

Or, on the command line, run:

sudo yum install openmpi openmpi-devel

C++ Training: Libraries

Libraries are a powerful way to combine codes from different sources and access functions that do cool stuff.  Since we often do most stuff on the unix cluster, I thought this link was a great way to learn:

http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html

Feel free to add any tips to this post, or for the “general public” add remarks using comments!  Happy archiving 🙂

Using YAML in C++

YAML stands for “YAML Ain’t Markup Language”. It is a “human friendly data serialization standard for all programming languages”. What this means is that a human can read the files you write in YAML, and there are libraries and packages in almost every language that can also parse these files. It is a slightly more formal way to handle parameter files and input files for models, since all the reading and error catching is provided by a library instead of by lines and lines of tedious code that you write.

I’m just playing around with this right now so I’ll share my notes here as I get it working.

The C++ libraries are available here.

  1. Follow the instructions on the website to download the zip file.
  2. The next instructions will either work on your Linux desktop or on the cluster.  They will probably work in Windows too, but I haven’t tried that yet.  I successfully ran the trial on my home computer running Ubuntu 11.10, but now I will replicate the process on the cluster.  Unzip the contents of the file on your computer of choice.
  3. Follow the website instructions to create the build directory in your folder and navigate to it.
  4. On the cluster, make sure to enable the program cmake by typing “module load cmake”.  Then, once you are in the build directory, you want to run cmake on the files in the outer directory, so type “cmake ..”
  5. When cmake runs successfully, it generates a custom Makefile just for you and your system.  To run the makefile, simply type “make”.  You should see some colorful commands that show that the program has compiled successfully.
  6. At the end, you’ll have a library called libyaml-cpp.a in your build directory.  Success!

Now we have a brand-new yaml-cpp library that contains all the functions you’ll need to parse yaml in your own program.  How do we test it out?  I’m glad you asked.

  1. Create a new folder that’s outside of the yaml-cpp folder. You can call it “program1” or some other name. Into that folder, copy libyaml-cpp.a from your yaml-cpp/build/ folder. Also navigate into the /include/ folder inside yaml-cpp, and you’ll find another folder called yaml-cpp. This folder contains the headers for all the functions inside the library. In your project folder, you can either copy it over as /include/yaml-cpp, or just as /yaml-cpp. In my project, I just copied it as yaml-cpp, in order to not have too many folders lying around.
  2. On the yaml-cpp site, try the monsters example at this page.  You’ll need a file called monsters.yaml, and the main cpp file, which I called test.cpp.  Here’s an important tip that it took me about a day (and help from the internet) to figure out: Only use spaces when indenting your blocks in the yaml file, not tabs!
  3. Now compile your program. You can use a command like this: “g++ -Wall -I. -g test.cpp -lyaml-cpp -L. -o monsterstest” which tells the compiler to look for headers (-I.) and libraries (-L.) in the working folder (referred to with a dot), to link against libyaml-cpp.a (-lyaml-cpp), and to name the executable “monsterstest”.
  4. Run the program using “./monsterstest”. Did it work? If so, great!
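Since the tabs-versus-spaces problem from step 2 is so easy to miss by eye, here is a minimal sketch of a checker for it. It is written in Python rather than C++ only because the check is a few lines there. The function name find_tab_indented_lines and the sample text are made up for illustration; they are not part of yaml-cpp or the monsters example.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), 1):
        # The indent is everything that lstrip() would remove from the front.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


if __name__ == "__main__":
    # Made-up YAML snippet: the third line is indented with a tab.
    sample = "- name: Ogre\n  position: [0, 5, 0]\n\tpowers: []\n"
    for number in find_tab_indented_lines(sample):
        print("line %d is indented with a tab" % number)
```

To check a real file, read its contents into a string and pass that to the function. Note that whitespace-only lines containing a tab are flagged as well.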

In a later post, I’ll give some example code that could be used to read objective titles, epsilons, constraints, model parameters, and so forth from a yaml file.  My idea is to have a master yaml file that contains all the parameters for a run.  The yaml can then be read by script programs that write input text files, java classes, or anything else you’d like.  The yaml will also be accessible to the C++ wrapper that interfaces with MOEAframework, and it can even be used directly by your simulation model.  This will give the user a lot of control, in a format that is flexible and fairly easy to use.  But more on that later!

Setting up Python and Eclipse

According to its website, Python is:

…an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, http://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.

This post covers how to set up Python and the Eclipse development environment.  We also provide a collection of posts on how to use Python for data analysis, starting here.

————————————————————————————————————————————————————–

PYTHON:

————————————————————————————————————————————————————–

The first step is to download Python and its various packages that will likely be useful to you at some point.

Python itself is available at: http://www.python.org/download/

I would recommend downloading and installing version 2.7.2, the latest production release in the 2.X series. Also, stick with the 32-bit version, as almost all packages are available for it. Avoid Python 3.X for now: it is not as widely supported among the various Python packages that you might find useful. Keep in mind that there are also some syntax differences between versions 2.X and 3.X that will need to be addressed whenever it does come time to update.
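To give a feel for the 2.X versus 3.X differences, here is a small sketch written so that it behaves the same under both. It sidesteps two well-known differences: integer division with "/" (which truncates in 2.X) and print (a statement in 2.X, a function in 3.X). The function names mean and half_rounded_down are illustrative only.

```python
import sys


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # float() forces true division on Python 2, where 7 / 2 would give 3.
    return float(sum(values)) / len(values)


def half_rounded_down(n):
    """Return n divided by 2, rounded down."""
    # "//" is floor division on both Python 2 and Python 3.
    return n // 2


if __name__ == "__main__":
    # Calling print with a single parenthesized string works on both versions.
    print("Running Python %d.%d" % sys.version_info[:2])
    print("mean of 3 and 4: %s" % mean([3, 4]))
    print("7 halved, rounded down: %d" % half_rounded_down(7))
```

Writing in this style from the start should make an eventual move to 3.X less painful.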

Just use the default settings during installation.

NOTE: If you have Cygwin installed on your system, it too has likely installed a version of Python.  Whenever you run Python from the command line, you should be careful to ensure that you are using the version that you expect (i.e., the default Cygwin installed Python versus the one that you installed).  Just be aware of this.  In general, it is easy to identify the version being picked up from the path name.  Also, it is generally best to use the version that you have installed.  It will usually be located in C:\Python27 whereas the Cygwin version will be located in C:\Cygwin\bin.
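A quick way to see which installation you are actually running is to ask the interpreter itself. The sketch below uses only the standard sys module; the function name describe_interpreter is an illustrative choice.

```python
import sys


def describe_interpreter():
    """Return (path, version) for the Python interpreter running this script."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return sys.executable, version


if __name__ == "__main__":
    path, version = describe_interpreter()
    print("Interpreter: %s" % path)
    print("Version:     %s" % version)
```

If the reported path is under C:\Cygwin\bin rather than C:\Python27, the Cygwin version is the one being picked up.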

Now, install the various packages that may be useful. You should always be careful to install a version of the package that matches your version of Python (i.e., 2.7 if you are following my instructions). Sometimes, if a package is not available for the version you are using, (i) you may still be able to use it, or (ii) you may need to make minor tweaks to the package source to get things running. Also, always download the package installers, not the source. Here are the common ones that you should definitely install:

  • NumPy and SciPy available at http://numpy.scipy.org/.  These packages are useful for performing scientific computing within Python.  Download the “win32 superpacks” for each of these packages for the version of Python that you have installed.
  • PIL – the Python Imaging Library, available at http://www.pythonware.com/products/pil/. This package is useful for manipulating image files.
  • matplotlib – a 2D plotting library with Matlab-like syntax, available at http://matplotlib.sourceforge.net/. This package is very good for creating publication-quality figures. If you start using it, you will probably notice that the appearance of the figures, even on-screen, is much improved over what Matlab can produce.
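Once the installers have run, it is worth confirming that the interpreter you are using can actually import each package. The sketch below is one way to do that with the standard importlib module. The function name installed_version is illustrative, and the import names in the list (numpy, scipy, PIL, matplotlib) are my assumption of how these packages are usually imported. Packages that do not define a __version__ attribute are reported as "unknown".

```python
import importlib


def installed_version(module_name):
    """Return a module's version string, "unknown" if it defines none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Assumed import names for the packages listed above.
    for name in ["numpy", "scipy", "PIL", "matplotlib"]:
        version = installed_version(name)
        status = version if version is not None else "NOT INSTALLED"
        print("%-12s %s" % (name, status))
```

A "NOT INSTALLED" line usually means the package was installed for a different Python than the one running the script.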

The following are some optional packages based on your particular needs:

  • Py2exe – a package for bundling Python scripts into MS Windows executable programs available at http://www.py2exe.org/.  This is what I use to bundle all of the libraries and source code required by AeroVis into a self contained package that can be installed on any Windows system without the need to build or install Python, VTK, Qt, etc.
  • wxPython – GUI package for Python available at http://wxpython.org/. Note that this is for developing graphical user interfaces (GUIs) for your Python scripts; it is not a GUI for Python itself.
  • PyQt – another GUI package for Python available at http://www.riverbankcomputing.co.uk/software/pyqt/intro. PyQt is a set of bindings for Nokia’s Qt application framework, a very rich and full-featured graphical interface development framework. AeroVis uses PyQt for its graphical interface.

————————————————————————————————————————————————————–

ECLIPSE:

————————————————————————————————————————————————————–

Now that you have Python and all of your needed packages installed, you can now move on to Eclipse. Eclipse is available from http://www.eclipse.org/downloads/packages/release/indigo/r.  The latest release (and probably the version you should be using) is Indigo.  Since we primarily use Visual Studio for C/C++ development, I would recommend downloading the IDE for Java as this will serve to provide you with a Java environment should you choose to explore this down the road.  I think you should be able to install either the 32-bit or 64-bit versions without issue.  Just make sure you are running a 64-bit OS if you choose to install that version.  When you go to download, Penn State actually has a mirror so choose this.  BTW, don’t choose the BitTorrent option – not a good idea on PSU networks.

Once you have downloaded the zip file containing Eclipse, you just unzip it wherever you want it to be installed.  This includes portable drives etc.  The beauty of Eclipse is that unlike many Windows programs, it is completely self contained and as such, can be run from any location.  Once unzipped, create a shortcut to the Eclipse executable and start it up.

————————————————————————————————————————————————————–

PYDEV:

————————————————————————————————————————————————————–

Now that Eclipse is installed, we can add a Python development environment inside Eclipse that will provide a very nice Python IDE with debugging capabilities, etc.

The install for packages inside Eclipse proceeds a little differently than what you may be used to.

The best option for installing PyDev is probably to install Aptana Studio which includes a variety of development tools.  Go to this site for instructions http://www.aptana.com/downloads/start or read on.

1) In the Eclipse Help menu, select Install New Software
2) Paste this URL into the Work With box: http://download.aptana.com/studio3/plugin/install
3) Check the box for Aptana Studio and click Next
4) Accept the license, etc., and restart Eclipse

Another option is to install only PyDev from within Eclipse. To do so, carefully follow the instructions available at: http://pydev.org/manual_101_install.html. There’s no need for me to rehash all of these instructions here, as the ones at the PyDev site are quite good.

Once PyDev is installed, you should be ready to go.

————————————————————————————————————————————————————–

Let me know if you run into any problems by leaving a comment.

————————————————————————————————————————————————————–

Up Next Time…

Developing and debugging Python scripts and projects in Eclipse

Programming Language Overview

This post details the programming languages commonly used in our group.  For each language, we have included installation instructions, suggested reading materials, and other notes.

C

Description: Procedural
API Reference: http://www.cplusplus.com/reference/clibrary/
Windows Installation: MinGW (Blog post about installing this is here), Cygwin
Linux Installation: sudo apt-get install gcc
Notes:
  • Be cautious with string functions; many standard C string functions do no bounds checking and are unsafe
  • GNU GCC allows mixing Fortran and C/C++ object files (i.e., call Fortran method from C/C++)

C++

Description: Object Oriented
Tutorial: http://www.cplusplus.com/doc/tutorial/
API Reference: http://www.cplusplus.com/reference/
Windows Installation: MinGW (Blog post about installing this is here), Cygwin
Linux Installation: sudo apt-get install g++
Notes:
  • The Boost libraries contain many reusable components

Java

Description: Object Oriented, Managed Memory
Tutorial: http://docs.oracle.com/javase/tutorial/
API Reference: http://docs.oracle.com/javase/6/docs/api/
Books: Effective Java by Joshua Bloch
Windows Installation: JDK 6
Linux Installation: sudo apt-get install openjdk-6-jdk
Notes:
  • Java is a verbose language, but the verbosity allows stronger type safety
  • Consider developing in Eclipse or NetBeans
  • Oracle recently released version 7, which is backwards-compatible with earlier versions

Python

Description: Object Oriented, Functional, Managed Memory
Tutorial: http://docs.python.org/tutorial/index.html
API Reference: http://docs.python.org/
Books: Dive into Python by Mark Pilgrim
Windows Installation: Python 2.7.2
Linux Installation: sudo apt-get install python
Notes:
  • There are two major versions of Python, 2.7 and 3.2.  They share similarities, but are not compatible
  • Use easy_install to quickly install packages
  • Use matplotlib for Matlab-like plotting