Python Data Analysis Part 1b: Setting up Matplotlib and Pandas

Installing Matplotlib and Pandas

So it turns out my two-part series of blog posts is going to be three parts, at least. This one is about getting and installing Python, Matplotlib, and Pandas. Skip down to the bottom for the best news of all: we now get these for free on Penn State’s HPC systems!

Windows

Python

Get Python from the official download site. Pick the 32-bit installer.

Make sure your environment variables are set up correctly, too. Find your Advanced System Settings and click on Environment Variables. Your PATH should include the directory where you installed Python, as well as the scripts subdirectory. On one of the machines I use, I put Python in d:\python27_32, so this is what I added to my PATH:

d:\python27_32;d:\python27_32\scripts
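If you want to double-check the PATH from inside Python, here is a minimal sketch. The helper name missing_from_path is made up for this example, and the two directories are just the ones from my machine above, so swap in your own install location. It is written to run on both Python 2.7 and Python 3.

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=os.pathsep):
    """Return the entries of required_dirs that are not listed on PATH.

    missing_from_path is a hypothetical helper for this post, not part of
    any library. path_value defaults to the current PATH variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Normalize case and trailing separators so that, on Windows,
        # D:\Python27_32\ compares equal to d:\python27_32.
        return os.path.normcase(os.path.normpath(p.strip()))

    on_path = set(norm(p) for p in path_value.split(sep) if p.strip())
    return [d for d in required_dirs if norm(d) not in on_path]


# Example directories from this post; replace with your own install location.
wanted = [r"d:\python27_32", r"d:\python27_32\scripts"]
for directory in missing_from_path(wanted):
    print("Not on PATH: " + directory)
```

If nothing is printed, both directories are already on your PATH. Remember that a command prompt opened before you changed the environment variables will still see the old PATH.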

Why 32-bit?

Pandas depends on NumPy, and the official NumPy installer for Windows only comes in a 32-bit version. (I think you have to compile BLAS yourself if you want 64-bit support. That builds character, but it’s way outside the scope of this tutorial.) So even if you have a 64-bit machine, which you probably do because it’s 2013, and a 64-bit version of Windows, which you might not because it’s 2013 and 32-bit Windows XP is still installed on everything, you need all your Python things to be 32-bit. That includes your Python interpreter, so make sure you have the right one installed. You should see something like this when you type python at the command prompt:

C:\>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

You don’t have to be using Python version 2.7.3, but my code examples are based on Python 2 and not Python 3, so make sure you’re on a relatively modern release of Python 2.
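If the banner is hard to read, you can also ask the interpreter directly. This is a small sketch that uses only the standard library. The function names are my own, and it runs on both Python 2.7 and Python 3.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *),
    # which is 4 on a 32-bit build and 8 on a 64-bit build.
    return struct.calcsize("P") * 8


def describe_interpreter():
    """Summarize the interpreter's version and bitness in one line."""
    version = ".".join(str(n) for n in sys.version_info[:3])
    return "Python %s, %d-bit" % (version, interpreter_bits())


print(describe_interpreter())
```

This reports the bitness of the Python build itself, not of your operating system, which is the thing that has to match the 32-bit installers.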

Matplotlib

Download Matplotlib here. Remember to get the 32-bit version.

NumPy

Go to the NumPy downloads page and get the latest win32 superpack. As of this writing, it’s numpy-1.6.2-win32-superpack-python2.7.exe.

Pandas

Get the latest Win32 installer from the official download page. As of this writing, it’s pandas-0.10.1.win32-py2.7.exe.

If you have problems with this version, let me know. I’ve been using 0.10.0 and haven’t upgraded yet. Pandas is still below version 1.0, so Wes McKinney is under no obligation to keep things from breaking between versions.
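Once everything is installed, a quick way to see what you actually ended up with is to import each package and print its version. This is only a sketch: installed_version is a name I made up, and it relies on the convention that these packages expose a __version__ attribute.

```python
import importlib


def installed_version(module_name):
    """Return a module's version string, or None if it cannot be imported.

    Returns "unknown" when the module imports but has no __version__
    attribute. installed_version is a hypothetical helper for this post.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "matplotlib", "pandas"):
    version = installed_version(name)
    print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Any line that says NOT INSTALLED points at the installer to rerun, or at a PATH that is picking up a different Python than the one you installed into.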

Linux

Most major distributions have Python, Matplotlib, NumPy, and Pandas neatly packaged up for you. Packages are usually named python, python-matplotlib, python-numpy, and python-pandas, or something like that. You may need a bit of googling to find the right names for your distribution’s package repositories.

I ran into an issue on one of my laptops at home running a Debian derivative (Crunchbang) where the supplied NumPy was too old. If you have a stern constitution, get it from GitHub and build your own.
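To find out whether your distribution's NumPy is too old before Pandas complains, you can compare version numbers yourself. The sketch below is an illustration only: version_tuple and is_at_least are made-up helpers, and the MINIMUM_NUMPY value is a placeholder assumption, not a documented requirement. Check the Pandas release notes for the real minimum for your version.

```python
import re


def version_tuple(version_string):
    """Turn '1.6.2' into (1, 6, 2).

    Uses the leading digits of each dot-separated piece and stops at the
    first piece that does not start with a digit (for example 'dev').
    """
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Return True if the installed version is not older than minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Placeholder assumption: look up the real requirement for your Pandas.
MINIMUM_NUMPY = "1.6.1"

try:
    import numpy
except ImportError:
    print("NumPy is not installed")
else:
    ok = is_at_least(numpy.__version__, MINIMUM_NUMPY)
    print("NumPy %s: %s" % (numpy.__version__, "new enough" if ok else "too old"))
```

Comparing tuples of integers rather than raw strings matters here, because as strings "1.10.0" sorts before "1.6.1".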

On the Cluster

I was getting ready to write a whole mini-tutorial on getting source tarballs and building everything from scratch, because that’s what I had to do six months ago to get these packages on the cluster. However, I just discovered that some intelligent, farsighted individual has added them to the default Python 2.7 configuration. Here’s all you have to do to get going on the cluster:

module load python/2.7.3

Stick it in your .bashrc, and you’re ready to go!

10 thoughts on “Python Data Analysis Part 1b: Setting up Matplotlib and Pandas”

  1. How about installing this software on Cygwin? Will the Windows version work there or do we need to do something special? (Perhaps I’m pre-empting Jon who may have the same question!)

    • I’ve had all kinds of problems with Python (+ libraries) in Cygwin. It gets confused between the Windows installation and its own installation, and it’s also very difficult to install libraries in Cygwin. If you’re on Windows, I think it’s better to stick with the Windows Python. But I’m planning to do this either on Hammer or locally on Ubuntu.

      • Thanks that’s good to know. I guess doing something remotely using ssh on Cygwin is basically the same anyway, it’s just that the computations aren’t being done locally.

  2. Python under cygwin is a little rough. The Python available in the cygwin repos is a couple of versions old — 2.5 or 2.6 if I recall correctly. You can use Windows Python 2.7 under cygwin, but you can’t use the interactive shell, and some installed scripts don’t work right.

    I do use Windows Python under cygwin at work where I have to use Windows. Just make sure that the Python you want to use is first on your PATH. And keep a Windows CMD window open for when you need to use the interactive shell.
