Announcing version 1.0 of pareto.py

Jon and I are pleased to announce version 1.0 of pareto.py, the free, open-source ε-nondominated sorting utility. Find the ε-nondominated solutions in your data today! ε-nondominated sorting lets you specify the resolution of your objectives to obtain a set of meaningfully distinct efficient points in your data set. (Confused? See this post for more context.)

ε-nondominated sorting

ε-nondominated sorting: Red ε-boxes are dominated, yellow solutions are ε-nondominated.

  • available on GitHub
  • licensed under LGPL
  • pure Python implementation with minimal dependencies (only standard library, does not require numpy)
  • ε-nondominated sorting
  • “importable”: use pareto.pyfrom other Python scripts
  • “pipelineable”: include pareto.py in a Unix pipeline
  • tag-along variables: columns (even non-numeric ones) other than those to be sorted may be retained in the output
  • verbatim transcription of input: nondominated rows in the output appear exactly the same as they did in the input
  • flexible column specification: e.g. 0-5 6 10-7
  • mix minimization and maximization objectives
  • skip header rows, commented rows, and blank lines
  • annotate each row in the output with the file it came from, including stdin
  • add line numbers to annotations
  • index columns from the end of the row, allowing rows of varying lengths to be sorted together

(source code for the impatient)

(previous announcement: version 0.2)

Advertisements

Globus Connect for Transferring Files Between Clusters and Your Computer

I recently learned about a service called Globus Online that allows you to easily transfer files to the cluster.  It’s similar to WinSCP or SSH, but the transfers can happen in the background and get resumed if they are interrupted.  It is supported by the University of Colorado: https://www.rc.colorado.edu/filetransfer and the NSF XSEDE machines: https://www.xsede.org/globus-online. Also, courtesy of Jon Herman, a note about Blue Waters: There is an endpoint called ncsa#NearLine where you can push your data for long-term storage (to avoid scratch purges). However on NearLine there is a per-user quota of 5TB. So if you find yourself mysteriously unable to transfer any more files, you’ll know why.

To get started, first create a Globus Online account.  Then, you’ll need to create “endpoints” on your account.  The obvious endpoint is, say, the cluster.  The University of Colorado for example has instructions on how to add their cluster to your account.  Then, you need to make your own computers an endpoint!  To do this, click Manage Endpoints then click “Add Globus Connect.”  Give your computer a name, and then it will generate a unique key that you can then use on the desktop application for the service.  Download the program for Mac, Unix, or Windows.  The cool thing is you can do this on all your computers.  For example I have a computer called MacBookAir, using OSX, and another one called MyWindows8 or something like that, that uses Windows.

File transfers are then initiated as usual, only you’re using a web interface instead of a standalone program.

As usual feel free to comment in the comments below.

Running tmux on the cluster

tmux is a terminal multiplexer — a program that lets you do more than one thing at a time in a terminal window. For example, tmux lets you switch between running an editor to modify a script and running the script itself without exiting the editor. As an added bonus, should you lose your ssh connection, your programs are still running inside tmux and you can bring them up again when you re-connect. If you’ve ever used GNU Screen, it’s the same idea.

Here’s what it looks like in the MinTTY terminal that comes with Cygwin.   In this example I’ve split the window across the middle, with an editor open in the bottom and a command prompt at the top.  (You can do vertical splits too.)

tmux terminal multiplexer showing a split window

tmux terminal multiplexer showing a split window

Building tmux

I recently built tmux again. It’s pretty easy to do, and it takes up less than 7 megabytes of disk space. Here’s how:

Make a directory for your own installs. Mine is called ~/local

mkdir ~/local

You’ll probably have to build libevent 2.0 first if, like me, it’s not already on your system.

Make a directory for building things, if you haven’t already got one.

mkdir build
cd build

Download libevent. Be sure to replace 2.0.21 with the latest stable version.
wget https://github.com/downloads/libevent/libevent/libevent-2.0.21-stable.tar.gz

Extract libevent

tar xzvf libevent-2.0.21-stable.tar.gz
cd libevent-2.0.21-stable

Build libevent. I used an absolute path with the prefix to be on the safe side.

./configure --prefix=/path/to/my/home/directory/local
make
make install

Download and extract tmux. Get the latest version, which is 1.8 at the time of this writing.

wget http://downloads.sourceforge.net/tmux/tmux-1.8.tar.gz
tar xzvf tmux-1.8.tar.gz
cd tmux-1.8

If you rolled your own libevent, you’ll need to set appropriate CFLAGS and LDFLAGS when you run configure. Otherwise you can skip export CFLAGS and export LDFLAGS.

export CFLAGS=-I/path/to/my/home/directory/local/include
export LDFLAGS="-Wl,-rpath=/path/to/my/home/directory/local/lib -L/path/to/my/home/directory/local/lib"
./configure --prefix=/path/to/my/home/directory/local
make
make install

Then add tmux to your path, and you’re done. Do this by putting the following lines in your .bashrc:

export MYPACKAGES=/path/to/my/home/directory/local/
export PATH=${MYPACKAGES}/bin:${PATH}