Installing Matplotlib and Pandas
So it turns out my two-part series of blog posts is going to be three parts, at least. This one is about getting and installing Python, Matplotlib, and Pandas. Skip down to the bottom for the best news of all: we now get these for free on Penn State’s HPC systems!
Get Python from the official download site. Pick the 32-bit installer.
Make sure your environment variables are set up correctly, too. Open Advanced System Settings and click on Environment Variables. Your PATH should include the directory where you installed Python, as well as its scripts subdirectory. On one of the machines I use, I put Python in d:\python27_32, so I added that directory and its scripts subdirectory to my PATH.
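To double-check the result, here's a rough sketch that reports which of the two directories are missing from a PATH string. The function name and the example PATH value are made up for illustration, and I'm assuming the subdirectory is called Scripts, as it is in a standard Windows install of Python. It uses ntpath so the comparison follows Windows rules (case-insensitive, either slash) no matter where you run it.

```python
import ntpath  # Windows path rules, usable on any OS


def missing_path_entries(path_value, install_dir):
    """Return the Python directories that are absent from a PATH string."""
    def norm(p):
        # Normalise case and slashes, and drop any trailing backslash
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    present = set(norm(p) for p in path_value.split(";") if p.strip())
    # Assumption: the scripts subdirectory is named "Scripts"
    wanted = [install_dir, ntpath.join(install_dir, "Scripts")]
    return [d for d in wanted if norm(d) not in present]


# Made-up PATH value; on a real machine pass os.environ["PATH"] instead
example_path = r"C:\Windows\system32;d:\python27_32"
print(missing_path_entries(example_path, r"d:\python27_32"))
```

With the made-up value above, the Scripts directory is the one reported as missing. An empty list means both entries are there.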
Pandas depends on NumPy, and the official NumPy installer for Windows is 32-bit only. (I think you have to compile BLAS yourself if you want 64-bit support. That builds character, but it’s way outside the scope of this tutorial.) So even if you have a 64-bit machine, which you probably do because it’s 2013, and a 64-bit version of Windows, which you might not because it’s 2013 and 32-bit Windows XP is still installed on everything, you need all your Python things to be 32-bit. That includes your Python interpreter, so make sure you have the right one installed. You should see something like this when you type python at the command prompt:
C:\>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
You don’t have to be using Python version 2.7.3, but my code examples are based on Python 2 and not Python 3, so make sure you’re on a relatively modern release of Python 2.
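If you'd rather not squint at the banner, here's a minimal sketch that asks the interpreter directly. It relies on the size of a C pointer (4 bytes on a 32-bit build, 8 on a 64-bit build); the function name is just something I picked for the example.

```python
import struct
import sys


def interpreter_summary():
    """Return (major, minor, bits) for the running interpreter."""
    # A pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build
    bits = struct.calcsize("P") * 8
    return sys.version_info[0], sys.version_info[1], bits


major, minor, bits = interpreter_summary()
print("Python %d.%d, %d-bit" % (major, minor, bits))
if bits != 32:
    print("Not a 32-bit interpreter; the win32 installers won't match it.")
```

Note that this reports the bitness of the Python build, not of the operating system, which is the thing that matters for the installers.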
Download Matplotlib here. Remember to get the 32-bit version.
Go to the NumPy downloads page and get the latest win32 superpack. As of this writing, it’s numpy-1.6.2-win32-superpack-python2.7.exe.
Get the latest Pandas Win32 installer from the official download page. As of this writing, it’s pandas-0.10.1.win32-py2.7.exe.
If you have problems with this version, let me know. I’ve been using 0.10.0 and haven’t upgraded yet. Pandas is still below version 1.0, so Wes McKinney is under no obligation to keep things from breaking between versions.
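Once the installers have run, a quick way to see what you actually ended up with is to import each package and print its version. This is only a sketch: the function name is mine, and a package that fails to import is simply reported as missing.

```python
import importlib


def installed_versions(names=("numpy", "matplotlib", "pandas")):
    """Map each package name to its version string, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Fall back to "unknown" for modules without a __version__ attribute
            found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in sorted(installed_versions().items()):
    print("%-12s %s" % (name, version if version else "not installed"))
```

If you do report a problem, the versions this prints are the first thing worth including.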
Most major Linux distributions have Python, Matplotlib, NumPy, and Pandas neatly packaged up for you. Packages are usually named python, python-matplotlib, python-numpy, and python-pandas, or something like that. You may need a bit of googling to find the right names for your distribution’s package repositories.
I ran into an issue on one of my laptops at home running a Debian derivative (Crunchbang) where the supplied NumPy was too old. If you have a stern constitution, get the source from GitHub and build your own.
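To find out whether your distribution's NumPy is too old before Pandas complains, you can compare version strings yourself. Here's a rough sketch; the helper names are hypothetical, and the 1.6.1 threshold is my assumption, so check the Pandas documentation for the release you're installing.

```python
import re

MINIMUM_NUMPY = "1.6.1"  # assumed threshold, not taken from the Pandas docs


def version_tuple(version):
    """Turn '1.6.2' or '1.7.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first piece with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def is_too_old(installed, minimum=MINIMUM_NUMPY):
    """True if the installed version sorts before the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


for candidate in ("1.5.1", "1.6.2"):
    print("%s too old? %s" % (candidate, is_too_old(candidate)))
```

On a real machine you'd pass numpy.__version__ instead of the sample strings. Comparing tuples of integers avoids the trap where "1.10" sorts before "1.6" as plain text.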
On the Cluster
I was getting ready to write a whole mini-tutorial on getting source tarballs and building everything from scratch, because that’s what I had to do six months ago to get these packages on the cluster. However, I just discovered that some intelligent, farsighted individual has added them to the default Python 2.7 configuration. Here’s all you have to do to get going on the cluster:
module load python/2.7.3
Stick it in your .bashrc, and you’re ready to go!
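To confirm the module really gives you everything, here's a small smoke test you could run after loading it. It's a sketch under a couple of assumptions: the function name and output file are mine, and I'm picking the Agg backend because cluster nodes usually have no display, so the plot goes straight to a file.

```python
import os
import tempfile


def smoke_test(output_path):
    """Draw a tiny plot from a DataFrame; return True if a file was written."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # render to a file, no display needed
        import matplotlib.pyplot as plt
        import pandas as pd
    except ImportError:
        return False

    frame = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Pass plain arrays so this works across old and new library versions
    ax.plot(frame["x"].values, frame["y"].values)
    fig.savefig(output_path)
    plt.close(fig)
    return os.path.exists(output_path)


target = os.path.join(tempfile.gettempdir(), "smoke_test.png")
print("plot written" if smoke_test(target) else "smoke test failed")
```

If it reports a failure, the most likely culprit is that the module isn't loaded in the shell you're running from.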