Tips for Creating Animated Graphics (GIFs)

Tips for Creating Animated Graphics (GIFs)

Creating a silent movie to tell a narrative is a powerful yet challenging endeavor. Perhaps with inspirations to create an eye-catching graphic for your social media followers or to better visualize intertemporal and interspatial changes, one can utilize GIFs to best meet their goals. However, with great (computational functionality and) power comes great responsibility. Creating a clean, minimalistic, and standalone graphic that quickly conveys information is key, so knowing what tips to follow along with a few linked tutorials will be a good starting point for creating your own effective animated visual.

I recently utilized a GIF in a presentation to show changes in perennial crop coverage in Kern County, CA alongside changes in drought intensity. While the GIF (shown at the end) wasn’t perfect, general tips for animation creation along with a quick critique to point out specific issues in the generated animation can help you when you plan out the creation of your own animation.

Considerations and Tips for Image Quality Control

Making an effective static graphic that easily conveys information without being overwhelming is tough, but creating a GIF can lead to disaster even faster than the blink of an eye. The most straightforward way to create your animation is to generate a incremental series of individual images you will be combining to create a short movie.

The main considerations for creating an effective movie visual is ensuring it is readable, engaging, and informative. The first question you should ask is whether the story you want to tell can be fully and effectively explained with a static image. If it can be, stop and make it before wasting your time on creating an animation.

Assuming you’ve passed the first step, it’s time to storyboard your visual. Draw out what you’re wanting to show so you have a template to follow. (If you find that what you’re wanting to animate can be effectively told with a static image, please revisit the previous paragraph.)

Much like a paper, there should be an introduction or orienting feature that allows the consumer to easily understand the thesis idea of the graphic at a glance in a standalone setting. With this key point in mind, driving the central theme of your graphic (likely) allows for the visual to appear randomly in public without fear of it being without a caption to fully tell a story.

Here is an example of a well-made, informative, and engaging GIF that shows how to make effective tables:ZY8dKpA


It is clear and to the point, stepping the audience through a range of steps in a clear and entertaining manner. It is well labeled, showing the author’s website, and is very concise in its messaging. Furthermore, it has been very memorable for me after initially seeing it on Twitter years ago.

This website has plenty of examples for effective animations that you could use for inspiration. Additionally, for those Pinterest users out there, DataGIFs has plenty of good examples. On Reddit, plenty of beautiful (and ugly) animated and static graphics alike appear on /r/dataisbeautiful.  If you have a narrative to tell, there are likely similar animations that exist that you can use as a base to convey your tale.

With the big picture stuff out of the way, the next few points to consider can make or break your animation. While not every tip applies to every situation, covering them helps ensure your visual doesn’t have any glaring issues.

  • Looping and Timing
    • Make sure the timing of the gif isn’t too long or short.
      • 20 seconds is generally the maximum recommended length for audience engagement
      • Make sure there is enough time on each component of the GIF to be understood within two to three loops
    • Ensure the individual frames can be read in a timely fashion
    • Don’t have a jarring transition from the end of the GIF back to the start
      • Make sure the first and last frames match up in style
      • You can pull the user away from the original frame, but bring them back in since they’ll likely repeat the loop two to three times
      • Consider starting with a base slide
      • Individually add features so individuals are not overwhelmed
      • Much like a slide presentation, if the reader is overwhelmed with too much information from the start, they are likely not going to understand the progression of the story being conveyed
    • Slow is boring and may cause people to prematurely end their engagement with the animation
  • Content Layout
    • Consider the story you want to portray
    • Much like any figure, ensuring the animation can stand alone and be posted online while out of context without killing its meaning is essential
    • Select a “master frame” that is descriptive enough to be the “title slide” of the GIF—this is what will be seen when your slide deck is printed
  • Image Manipulation
  • Consistent sizing and coloring
    • Ensure labels are readable and are in the same location through all of the images. Having labels twitching between locations is distracting
    • Framing of the edges of objects should not drift. If making a transition between different sizing (e.g. zooming out), ensure it is incremental and not shocking
    • Customize components of a graphic to avoid bland experiences
  • Effective colors
    • Unless you’re artistically talents (i.e. not me), stick to generated color palates. ColorBrewer is a good resource
    • Consider what your image might look like when printed out—try to make sure it converts well to grayscale. Note that generally the first image of a GIF’s sequence is what will be printed
    • Watch out for colorblind individuals. They are people too and are worth catering to (use this website to ensure your images are compatible)
    • The fewer, the better: selection of individual colors to make certain messages “pop” is worthwhile
  • Fonts and Labels
    • Give credit to yourself within the graphic
    • Make sure that all words can be read at a glance
    • If you plan to post the gif online, make sure individuals can get a decent idea of what’s going on before they blow it up
    • Assume your image will be posted on twitter: consider making a dummy (private) account to make sure everything important is readable from the preview along with the actual gif being readable from a regular smartphone
    • If distributing online, ensure the animation can be linked to a central website for citations/narratives/methods
  • Resolution and Size
    • Bigger is rarely better. Try to limit your size to around 1 MB max
    • Size your GIF based on your platform. The size is not nearly as important if you’re using it in a presentation you keep on a jump drive compared to if you’re going to upload it online.
    • Test your visual on multiple platforms (i.e. PowerPoint on another computer after moving the .ppt file) prior to use.
    • If posting something that will be shared on Twitter, ensure the narrative of the GIF can be transferred when consumed on a mobile device. (~80% of all Twitter interactions come from a mobile devices)
  • Creating the GIF
    • Make sure your images aren’t too high of a resolution going into GIMP. It turns out that bigger isn’t always better
    • I have run into issues with Python-based libraries creating GIFs with artifacts.
      • imageio is commonly considered to be the best option when creating GIFs using a Python due to its compatibility with PIL/PILLOW (source)
    • I am a firm believer in spending a bit of extra time to follow this tutorial using GIMP to create your GIF
      • Photoshop is the standard (tutorial), but it is not open source
    • This post does a good job on showing how to make animated GIFs in R 

Crop-Coverage Animation Critique

As mentioned above, I created a GIF to show changes in perennial crop coverage in Kern County, CA alongside changes in drought intensity. Each frame of the map was created in ArcGIS (which will eventually be migrated over to Python using Julie’s tutorial) and exported as a PNG while the graph along the bottom is a screenshot from the U.S. Drought Monitor. These individual images were combined, a black rectangle overlaying the graph was added, and then saved as a single image for each given year. I then used GIMP to combine, order, and create a GIF (tutorial link).

2000_to_2017_export2.gifThe Good:

  1. The timing of the picture sequence (~650 ms between frames) makes for easy reading
  2. The box highlighting the given year’s drought conditions along the bottom graphic is effective at pointing out current conditions
  3. Coupling the moving box with a clear title indicating the current year shows a natural progression through time
  4. It is relatively straightforward to see where in California Kern County is located
    • The map includes a scale and north arrow to assist with orientation
    • The pop-out box for California isn’t very intrusive
  5. The title is large and easy to read, with the given year being clearly indicated
  6. The data for the drought monitor is labeled as required by the source

The Bad:

  1. The title shifts a few pixels in a couple of frames
  2. The graph does not stretch to match the size of the entire GIF, making the graphic feel unbalanced
  3. There is too much white space in random locations
    • The map itself could have been zoomed in a bit closer to leave out a lot of the white space in the eastern part of the county
    • Having too much white space on the right side of the map makes the graphic feel even further imbalanced
  4. There is not a website on the GIF to indicate the author/creator (me)

The Ugly:

  1. The choice in color schemes on the map jump out in a bad way
    • The districts drawn on the background map cannot be easily differentiated
    • Using red as a plotted color can be a bit jarring and is not colorblind friendly
  2. The labels in the legend hop around and are not consistent for three (3) of the years
    • The “master” or initial blank slide only has three categories while the rest of the slides have four
  3. The labeling for the graph is not clear (title, axes labels)
  4. The axis labels are blurry and small

Code showing how to make the drought monitor graphic graphic (with updates) is posted at this GitHub link. Note that the images were compiled into a GIF using GIMP.  The revised graphic is shown below.


High definition images can be found on imgur here.

Multivariate Distances: Mahalanobis vs. Euclidean

Some supervised and unsupervised learning algorithms, such as k-nearest neighbors and k-means clustering, depend on distance calculations. In this post, I will discuss why the Mahalanobis distance is almost always better to use than the Euclidean distance for the multivariate case. There is overlap between the ideas here and David’s post on multicollinearity, so give that one a read too!

Why you should care about multivariate distance

Synthetic time series generation methods are of interest to many of us in systems optimization and the topic been covered extensively on this blog. For an overview on that subject, check out Jon’s blog post on synthetic streamflow. It’s a great read and will give you many more resources to explore.

Among synthetic time series generation methods the k-nearest neighbors (k-NN) bootstrap resampling algorithm, developed by Lall and Sharma (1996), is a popular method for generating synthetic time series data of streamflow and weather variables. The k-NN algorithm resamples the observed data in a way that attempts to preserve the statistics of that data (e.g., mean and standard deviation at each timestep, lag-1 autocorrelation, etc.) but creates new and interesting synthetic records for the user to explore. As the name implies, this algorithm relies on finding k (generally set to be ⌊√(N)⌉, where N is the number of years of observed data) “nearest neighbors” to do its magic.

Determining the nearest neighbors

Finding those neighbors is straightforward in the univariate case (when there is only a single variable you want to simulate)—you just calculate the Euclidean distance. The shorter the distance, the “nearer” the neighbor. Well, it gets a bit more complicated in the multivariate case. There, you’ve got different units involved and correlation among variables which throws a wrench in the whole Euclidean distance thing. So, in most cases the Mahalanobis distance is preferred. Let me explain…

Example: how multivariate distance can help buy a car

Say we want to buy a four-wheel drive (4wd) car that will get us up into the mountains. We’ve got our eye set on a dream car, a 4wd Jeep, but we know we should shop around. So, let’s look at other 4wd cars on the market and compare their highway gas mileage and displacement (the total volume of all the cylinders in your engine) to find other cars we might be interested in. In other words, we are looking for the dream car’s nearest neighbors, with respect to those two measures.


Figure 1. Comparing highway gas mileage with displacement for our dream car and the others available.

Euclidean Distance

By glancing at the plot above, the distance calculation might appear trivial. In fact, you can probably roughly rank which points lie closest to the dream car just by eyeballing it. But when you try to do the calculation for Euclidean distance (equation 1), it will be skewed based on the units for gas mileage and displacement.

d(\overrightarrow{x},\overrightarrow{y})=\sqrt{(\overrightarrow{x}-\overrightarrow{y})^T(\overrightarrow{x}-\overrightarrow{y})}                                          (1)

Where: \overrightarrow{x} represents the attributes of our car and \overrightarrow{y} represents the attributes of another car.

For example, what if instead of miles per gallon, gas mileage was reported in feet per gallon? By changing those units, gas mileage would have multiple orders of magnitude more weight in the distance calculation than displacement. In that case, gas mileage would basically be the only thing that matters, which isn’t fair to poor old displacement. Therefore, when using the Euclidean distance to compare multiple variables we need to standardize the data which eliminates units and weights both measures equally. To do so, we can calculate the z-score (equation 2) for each observation:

z = \frac{x - \mu}{\sigma}                                      (2)

Where: z is the z-score (standardized variable), x is an observation, \mu and \sigma are the mean and standard deviation of the observation variable, respectively.

Visually, this is just like looking at our plot from before with no units at all.


Figure 2. Scale removed from Figure 1 to show that we need to remove the influence of units on the Euclidean distance calculation.

Now we can calculate the Euclidean distance and find the nearest neighbors!


Figure 3. The Euclidean distance and rank assigned to each car, where rank 0 is our “dream car”. If we were interested in a k-nearest neighbor algorithm with k=4, the points in the orange box would be selected as the neighbors.

Take note of the k-nearest neighbors in the orange box. Let’s see whether or not we get the same neighbors with the Mahalanobis distance.

Mahalanobis Distance

The Mahalanobis distance calculation (equation 3) differs only slightly from Euclidean distance (equation 1).

d(\overrightarrow{x},\overrightarrow{y})=\sqrt{(\overrightarrow{x}-\overrightarrow{y})^TS^{-1}(\overrightarrow{x}-\overrightarrow{y})}                                          (3)

Where: \overrightarrow{x} represents the attributes of our car, \overrightarrow{y} represents the attributes of another car, and S^{-1} is the covariance matrix of \overrightarrow{x} and \overrightarrow{y}

Unlike the Euclidean distance though, the Mahalanobis distance accounts for how correlated the variables are to one another. For example, you might have noticed that gas mileage and displacement are highly correlated. Because of this, there is a lot of redundant information in that Euclidean distance calculation. By considering the covariance between the points in the distance calculation, we remove that redundancy.


Figure 4. The Mahalanobis distance and rank assigned to each car, where rank 0 is our “dream car”.

And look! By comparing the ranks in the orange boxes in Figures 3 and 4, we can see that  although the ranks are similar between the two distance metrics, they do in fact yield different nearest neighbors. So which points get more weight when using the Mahalnobis distance vs. using the Euclidean distance?

To answer that question, I’ve standardized the distance calculations so we can compare them to one another and plotted each on a 1-to-1 line. If the distance metrics were exactly the same, all the points would end up on that line and they would each have a Mahalanobis to Euclidean ratio of 0. However, we see that certain points get more weight (i.e., a larger distance calculated) depending on the distance metric used.


Figure 5. Mahalanobis to Euclidean distances plotted for each car in the dataset. The points are colored based on the Mahalnobis to Euclidean ratio, where zero means that the distance metrics have equal weight. Purple means the Mahalanobis distance has greater weight than Euclidean and orange means the opposite.

Let’s map the Mahalanonbis to Euclidean ratio onto our gas mileage v. displacement plot.


Figure 6. The gas mileage vs. displacement of the cars as color-coded by the Mahalanobis to Euclidean ratio from Figure 5.

Notice that many of the points at the top left and bottom right part of the screen are orange, meaning that the Euclidean distance calculation would give them more weight. And then there’s that point at the bottom center of plot. That one gets far more weight when using Mahalanobis distance. To understand this let’s look at the axes of greatest variability in the data, these are also known as principle components. For a primer on that subject, check out David’s post and Ronhini’s post on principle component analysis!


When using Mahalanobis, the ellipse shown on the plot is squeezed towards circle. Along the first principle component axis, there is a lot of work to get it there! The points in the top right and bottom right corners move quite a bit to get towards a nice neat circle. Along the second principle component axis, there is not much squishing to do. The difference between these distance calculations are due to this “squishification” (a term used by the great 3blue1brown so it must be real). The Mahalnobis distance can be thought of calculating the Euclidean distance after performing this “squishification”. In fact, when the variables are completely uncorrelated, no squishing can happen, thus these two calculations are identical (i.e., S^{-1}=1).

Why you should use Mahalanobis distance (in general)

Which one should I use and when? When in doubt, Mahalanobis it out. When using the Mahalanobis distance, we don’t have to standardize the data like we did for the Euclidean distance. The covariance matrix calculation takes care of this. Also, it removes redundant information from correlated variables. Even if your variables aren’t very correlated it can’t hurt to use Mahalanobis distance, it will just be quite similar to the results you’ll get from Euclidean. You’ll notice that most recent k-NN resampling literature uses the Mahalanobis distance: Yates et al. (2003) and Sharif and Burn (2007).

One issue with the Mahalanobis distance is that it depends on taking the inverse of the covariance matrix. If this matrix is not invertible, no need to fear, you can calculate the pseudo-inverse instead to calculate the Mahalanobis distance (thanks to Philip Trettner for pointing that out!).

Code Availability

For anyone interested in the code used to create the figures in this post, I’ve created a GitHub gist.


Lall, Upmanu, and Ashish Sharma. “A Nearest Neighbor Bootstrap For Resampling Hydrologic Time Series.” Water Resources Research 32, no. 3 (March 1, 1996): 679–93.
Sharif, Mohammed, and Donald Burn. “Improved K -Nearest Neighbor Weather Generating Model.” Journal of Hydrologic Engineering 12, no. 1 (January 1, 2007): 42–51.
Yates, David, Subhrendu Gangopadhyay, Balaji Rajagopalan, and Kenneth Strzepek. “A Technique for Generating Regional Climate Scenarios Using a Nearest-Neighbor Algorithm.” Water Resources Research 39, no. 7 (2003).

Update on setting up the Borg Matlab Wrapper on Windows and tips for its use

It has been a long time since the last post about the use of Borg in Matlab, so this post pretends to be an update of Compiling the Borg Matlab Wrapper (Windows), and in an easier way.

The following steps were tested in Windows 10 and MATLAB R2017a, but should be valid for different versions too.

Step 0: Get Borg source code from Bitbucket repository

Check step 2 in the post Basic Borg MOEA use for the truly newbies (Part 1/2) (also check step 1 for a simplified explanation about Borg).

You will need to have the following files in the same directory:

  • borg.c
  • borg.h
  • mt19937ar.c
  • mt19937ar.h
  • nativeborg.cpp
  • borg.m
  • DTLZ2.m (test problem code)

You can find the first four files in the main directory you downloaded from Bitbucket, and to get the last three you have to navigate to the “plugins\Matlab” directory.

Step 1: Check compiler in Matlab

At Matlab command line run:

mex -setup

to check if there is a compiler already installed. If there is not, install the one that Matlab suggests (depends on the version).

In 2017a and previous versions there could be a bug when trying to download the suggested compiler (e.g. MIN GW in 2017a). Follow Matlab instructions. They will take you through some steps, which basically are: downloading a folder and pasting it in the installation directory of Matlab, replacing an existing one.

Alternatively (outside Matlab), you could download and install Visual Studio (Community version for free). Once installed, Matlab should recognize it immediately (check with mex -setup command anyway).

Step 2:  Compile Borg in Matlab

Make sure you are in the directory with the required files from Step 0 and run (again, at Matlab command line):

mex nativeborg.cpp borg.c mt19937ar.c

The file nativeborg.mexw64 should be created (the extension should vary according to your system). Once you do that you are done, and you are ready to run the test problem.

Step 3: Run a test problem (DTLZ2)

Inside Matlab, make sure you are in the directory that contains:

  • borg.m
  • nativeborg.mexw64
  • DTLZ2.m

The file DTLZ2.m is an example of how an objective function should be formatted for use with the Borg Matlab Wrapper. It should take a vector of decision variables as input, and return a vector of objectives (and optionally, constraints) as output.

From the command line, you can optimize the DTLZ2 function as follows:

[vars, objs, runtime] = borg(11, 2, 0, @DTLZ2, 100000, 0.01*ones(1,2), zeros(1,11), ones(1,11));

which returns the decision variables and objectives of the Pareto optimal solutions (i.e. the Pareto-approximate set), and runtime statistics. The first three inputs to the borg function are: the number of decision variables, number of objectives, and number of constraints. After that comes the handle for the objective function, the number of function evaluations to run in the optimization, the epsilon precision values for the objectives, and finally the lower and upper bounds of the decision variables. For a more detailed description of inputs and outputs you can run:

help borg

Finally, to see the resulting Pareto frontier run:

scatter(objs(:,1), objs(:,2))

it should be a very good approximation of the first quadrant of the unit circle.

Now you just need to replace DTLZ2 with an objective function or model of your choice, and you’ll be ready to go!

Note: to successfully use the Borg Matlab Wrapper in the future make sure to have borg.m and nativeborg.mexw64 in the same directory you are working on, or in a directory whose path is saved in Matlab (you can manage your paths with addpath and savepath functions).

Some tips:

Tip 1:

When calling borg(…), one of the most important inputs is the objective function (the “@DTLZ2” function handle in the test problem) since it contains the structure of the problem you are solving (i.e. the simulation model of your problem).

As mentioned before, this function should take the decision variables as input and return the value of the objectives as output. However, in some cases your function could need to take as inputs also other information in addition to the decision variables, for example:

% objective function example 
obj_values = simModel(decisions, params)

where params could be some data read from a text file or you had calculated before (e.g. streamflow scenarios) and remain unchanged during the optimization procedure. If that is your case, you can first define params and then define a function handle where params is fixed:

params = whatever_your_function_needs;

% handle of simModel function that has the variable “params” fixed, so its input is just “decisions”
simModelFixed = @(decisions) simModel(decisions, params);

% call Borg
[vars, objs, runtime] = borg(..., simModelFixed, ...);

Maybe you could be thinking on reading or generating your additional parameters inside your simulation model function, but doing that, the process will be repeated in every function evaluation when running Borg, which could be highly inefficient.

Tip 2:

Be careful with NaN and Inf values in your objective function when using Borg as they will cause Matlab to crash and close with no error message. These values could arise accidentally if you forgot to condition some particular cases in your calculations (e.g. division by zero).

Tip 3:

When you run help borg you can observe that an optional input to the function is parameters. This input lets you modify internal parameters of the algorithm. For details on how to do that you can refer to the post Setting Borg parameters from the Matlab wrapper, but make sure to understand how Borg works before thinking of tuning it.

To learn about Borg features and full details please refer to “Hadka, D., and Reed, P.M. “Borg: An Auto-Adaptive Many-Objective Evolutionary Computing Framework.” Evolutionary Computation, 21(2):231-259, 2013.” and references therein.

Guide to Your First Year in the Reed Research Group

I’m finishing up my first year as a MS/PhD student in Reed Research Group and I would like to use this blog post to formally list resources within the blog that I found especially useful and relevant to my  first year of training. We are also at the point where many of the senior students in the group are moving on to new positions, so I would also like to use this blog post to consolidate tips and tricks that I learned from them that will hopefully be helpful to future students as well.

Blog Posts

There are 315 blog posts on this Water Programming Blog. Chances are, if you have a question, it has already been answered in one of these posts. However, when I first joined the group, it was sometimes hard for me to know what I was even supposed to be searching for. Here are some blog posts that I found particularly useful when I started out or ones that I continue to regularly refer to.

Getting Oriented with the Cube

What even is a cluster? I had no idea when I first arrived but this post brought me up to speed.

Understanding the Terminal

  1. Using MobaXterm as a terminal is incredibly intuitive, especially for someone like me who had rarely touched a terminal in undergrad. MobaXterm allows you to drag and drop files from your computer directly into your directory on the Cube. Furthermore, with the MobaXterm graphical SFTP browser you can navigate through your directories similarly to a Windows environment. I found that it was easier to use other terminal environments like Cygwin after I had gotten used to the terminal through MobaXterm. See Dave’s post here.
  2. Once you are oriented with how the terminal works, the best thing to do is practice navigating using Linux commands. Linux commands can also be very helpful for file manipulation and processing. When I first started training, I was much more comfortable opening text files, for example, in Excel, and making the necessary changes. However, very quickly, I was confronted with manipulating hundreds of text files or set files at a time, which forced me to learn Linux commands. After I learned how to properly used these commands, I wished I had started using them a long time ago. You will work much more efficiently if start practicing the Linux commands listed in Bernardo’s blog post.

Using Borg and the MOEA Framework

Most of my second semester was spent reproducing Julie Quinn’s Lake Problem paper, which is when I first started to understand how to use Borg. It took me entirely too long to realize that the commands in Jazmin’s tutorials here and here are completely generalizable for any application requiring the MOEA framework or Borg. Since these tutorials are done so early in training, it is very easy to forget that they may be useful later and applied to problems other than DTLZ. I found myself referring back many times to these posts to remember the commands needed to generate a reference set from multiple seeds and how to execute Borg using the correct flags.

Using GitHub, Bitbucket, and Git commands

I had heard GitHub tossed around by CS majors in undergrad but it never occurred to me that I would be using it one day. Now, I have realized what a great tool it is for code version control. If used correctly, it makes sharing code with collaborators so much more clean and organized. However, before you can “clone” the contents of anyone’s repository to your own computer, you need an SSH key, which was not obvious to me as newbie to both Github and Bitbucket. You also need a different SSH key for every computer that you use. To generate an SSH key, refer to 2) of this post. Then you can add the generated keys in your profile settings on your Github and Bitbucket accounts.

Once you have keys, you can start cloning directories and pushing changes from your local version to the repository that you cloned from using Git commands outlined in this blog post.

Pro Tips

A consolidation of notes that I wrote down from interactions with senior students in the group that have proven to be useful:

  1.  If you can’t get your set files to merge, make sure there is a # sign at the end of each set file.
  2. If a file is too big to view, use the head or tail command to see the first few lines or last lines of a file to get an idea of what the contents of the file look like.
  3. Every time you submit a job, a file with the name of the job script and job number will appear in your directory. If your code crashes and you aren’t sure where to start, this file is a good place to see what might be going on. I was using Borg and couldn’t figure out why it was crashing after just 10 minutes of running because no errors were being returned. When I looked at this file, hundreds of outputs had been printed that I had forgotten to comment out. This had overloaded the system and caused it to crash.
  4. If you want to compile a file or series of files, use the command make. If you have multiple make files in one folder, then you’ll need to use the command make -f . If you get odd errors when using the make command, try make clean first and then recompile.
  5. Most useful Cube commands:qsub to submit a job

    qdel job number if you want to delete a job on the cube

    qsub -I to start an interactive node. If you start an interactive node, you have one node all to yourself. If you want to run something that might take a while but not necessarily warrant submitting a job, then use an interactive node (don’t run anything large on the command line). However, be aware that you won’t be able to use your terminal until your job is done. If you exit out of your terminal, then you will be kicked out of your interactive node.

In retrospect, I see just how much I have learned in just one year of being in the research group. When you start, it can seem like a daunting task. However, it is important to realize that all of the other students in the group were in your position at one point. By making use of all the resources available to you and with time and a lot of practice, you’ll get the hang of it!

Fitting Hidden Markov Models Part II: Sample Python Script

This is the second part of a two-part blog series on fitting hidden Markov models (HMMs). In Part I, I explained what HMMs are, why we might want to use them to model hydro-climatological data, and the methods traditionally used to fit them. Here I will show how to apply these methods using the Python package hmmlearn using annual streamflows in the Colorado River basin at the Colorado/Utah state line (USGS gage 09163500). First, note that to use hmmlearn on a Windows machine, I had to install it on Cygwin as a Python 2.7 library.

For this example, we will assume the state each year is either wet or dry, and the distribution of annual streamflows under each state is modeled by a Gaussian distribution. More states can be considered, as well as other distributions, but we will use a two-state, Gaussian HMM here for simplicity. Since streamflow is strictly positive, it might make sense to first log-transform the annual flows at the state line so that the Gaussian models won’t generate negative streamflows, so that’s what we do here.

After installing hmmlearn, the first step is to load the Gaussian hidden Markov model class with from hmmlearn.hmm import GaussianHMM. The fit function of this class requires as inputs the number of states (n_components, here 2 for wet and dry), the number of iterations to run of the Baum-Welch algorithm described in Part I (n_iter; I chose 1000), and the time series to which the model is fit (here a column vector, Q, of the annual or log-transformed annual flows). You can also set initial parameter estimates before fitting the model and only state those which need to be initialized with the init_params argument. This is a string of characters where ‘s’ stands for startprob (the probability of being in each state at the start), ‘t’ for transmat (the probability transition matrix), ‘m’ for means (mean vector) and ‘c’ for covars (covariance matrix). As discussed in Part I it is good to test several different initial parameter estimates to prevent convergence to a local optimum. For simplicity, here I simply use default estimates, but this tutorial shows how to pass your own. I call the model I fit on line 5 model.

Among other attributes and methods, model will have associated with it the means (means_) and covariances (covars_) of the Gaussian distributions fit to each state, the state probability transition matrix (transmat_), the log-likelihood function of the model (score) and methods for simulating from the HMM (sample) and predicting the states of observed values with the Viterbi algorithm described in Part I (predict). The score attribute could be used to compare the performance of models fit with different initial parameter estimates.

It is important to note that which state (wet or dry) is assigned a 0 and which state is assigned a 1 is arbitrary and different assignments may be made with different runs of the algorithm. To avoid confusion, I choose to reorganize the vectors of means and variances and the transition probability matrix so that state 0 is always the dry state, and state 1 is always the wet state. This is done on lines 22-26 if the mean of state 0 is greater than the mean of state 1.

from hmmlearn.hmm import GaussianHMM

def fitHMM(Q, nSamples):
    # fit Gaussian HMM to Q
    model = GaussianHMM(n_components=2, n_iter=1000).fit(np.reshape(Q,[len(Q),1]))
    # classify each observation as state 0 or 1
    hidden_states = model.predict(np.reshape(Q,[len(Q),1]))

    # find parameters of Gaussian HMM
    mus = np.array(model.means_)
    sigmas = np.array(np.sqrt(np.array([np.diag(model.covars_[0]),np.diag(model.covars_[1])])))
    P = np.array(model.transmat_)

    # find log-likelihood of Gaussian HMM
    logProb = model.score(np.reshape(Q,[len(Q),1]))

    # generate nSamples from Gaussian HMM
    samples = model.sample(nSamples)

    # re-organize mus, sigmas and P so that first row is lower mean (if not already)
    if mus[0] > mus[1]:
        mus = np.flipud(mus)
        sigmas = np.flipud(sigmas)
        P = np.fliplr(np.flipud(P))
        hidden_states = 1 - hidden_states

    return hidden_states, mus, sigmas, P, logProb, samples

# load annual flow data for the Colorado River near the Colorado/Utah state line
AnnualQ = np.loadtxt('AnnualQ.txt')

# log transform the data and fit the HMM
logQ = np.log(AnnualQ)
hidden_states, mus, sigmas, P, logProb, samples = fitHMM(logQ, 100)

Okay great, we’ve fit an HMM! What does the model look like? Let’s plot the time series of hidden states. Since we made the lower mean always represented by state 0, we know that hidden_states == 0 corresponds to the dry state and hidden_states == 1 to the wet state.

from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

def plotTimeSeries(Q, hidden_states, ylabel, filename):

    fig = plt.figure()
    ax = fig.add_subplot(111)

    xs = np.arange(len(Q))+1909
    masks = hidden_states == 0
    ax.scatter(xs[masks], Q[masks], c='r', label='Dry State')
    masks = hidden_states == 1
    ax.scatter(xs[masks], Q[masks], c='b', label='Wet State')
    ax.plot(xs, Q, c='k')
    handles, labels = plt.gca().get_legend_handles_labels()
    fig.legend(handles, labels, loc='lower center', ncol=2, frameon=True)

    return None

plt.switch_backend('agg') # turn off display when running with Cygwin
plotTimeSeries(logQ, hidden_states, 'log(Flow at State Line)', 'StateTseries_Log.png')

Wow, looks like there’s some persistence! What are the transition probabilities?


Running that we get the following:

[[ 0.6794469   0.3205531 ]
[ 0.34904974  0.65095026]]

When in a dry state, there is a 68% chance of transitioning to a dry state again in the next year, while in a wet state there is a 65% chance of transitioning to a wet state again in the next year.

What does the distribution of flows look like in the wet and dry states, and how do these compare with the overall distribution? Since the probability distribution of the wet and dry states are Gaussian in log-space, and each state has some probability of being observed, the overall probability distribution is a mixed, or weighted, Gaussian distribution in which the weight of each of the two Gaussian models is the unconditional probability of being in their respective state. These probabilities make up the stationary distribution, π, which is the vector solving the equation π = πP, where P is the probability transition matrix. As briefly mentioned in Part I, this can be found using the method described here: π = (1/ Σi[ei])e in which e is the eigenvector of PT corresponding to an eigenvalue of 1, and ei is the ith element of e. The overall distribution for our observations is then Y ~ π0N(μ0,σ02) + π1*N(μ1,σ12). We plot this distribution and the component distributions on top of a histogram of the log-space annual flows below.

from scipy import stats as ss

def plotDistribution(Q, mus, sigmas, P, filename):

    # calculate stationary distribution
    eigenvals, eigenvecs = np.linalg.eig(np.transpose(P))
    one_eigval = np.argmin(np.abs(eigenvals-1))
    pi = eigenvecs[:,one_eigval] / np.sum(eigenvecs[:,one_eigval])

    x_0 = np.linspace(mus[0]-4*sigmas[0], mus[0]+4*sigmas[0], 10000)
    fx_0 = pi[0]*ss.norm.pdf(x_0,mus[0],sigmas[0])

    x_1 = np.linspace(mus[1]-4*sigmas[1], mus[1]+4*sigmas[1], 10000)
    fx_1 = pi[1]*ss.norm.pdf(x_1,mus[1],sigmas[1])

    x = np.linspace(mus[0]-4*sigmas[0], mus[1]+4*sigmas[1], 10000)
    fx = pi[0]*ss.norm.pdf(x,mus[0],sigmas[0]) + \

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.hist(Q, color='k', alpha=0.5, density=True)
    l1, = ax.plot(x_0, fx_0, c='r', linewidth=2, label='Dry State Distn')
    l2, = ax.plot(x_1, fx_1, c='b', linewidth=2, label='Wet State Distn')
    l3, = ax.plot(x, fx, c='k', linewidth=2, label='Combined State Distn')

    handles, labels = plt.gca().get_legend_handles_labels()
    fig.legend(handles, labels, loc='lower center', ncol=3, frameon=True)

    return None

plotDistribution(logQ, mus, sigmas, P, 'MixedGaussianFit_Log.png')

Looks like a pretty good fit – seems like a Gaussian HMM is a decent model of log-transformed annual flows in the Colorado River at the Colorado/Utah state line. Hopefully you can find relevant applications for your work too. If so, I’d recommend reading through this hmmlearn tutorial, from which I learned how to do everything I’ve shown here.

Fitting Hidden Markov Models Part I: Background and Methods

Hydro-climatological variables often exhibit long-term persistence caused by regime-shifting behavior in the climate, such as the El Niño-Southern Oscillations (ENSO). One popular way of modeling this long-term persistence is with hidden Markov models (HMMs) [Thyer and Kuczera, 2000; Akintug and Rasmussen, 2005; Bracken et al., 2014]. What is an HMM? Recall from my five blog posts on weather generators, that the occurrence of precipitation is often modeled by a (first order) Markov model in which the probability of rain on a given day depends only on whether or not it rained on the previous day. A (first order) hidden Markov model is similar in that the climate “state” (e.g., wet or dry) at a particular time step depends only on the state from the previous time step, but the state in this case is “hidden,” i.e. not observable. Instead, we only observe a random variable (discrete or continuous) that was generated under a particular state, but we don’t know what that state was.

For example, imagine you are a doctor trying to diagnose when an individual has the flu. On any given day, this person is in one of two states: sick or healthy. These states are likely to exhibit great persistence; when the person gets the flu, he/she will likely have it for several days or weeks, and when he/she is heathy, he/she will likely stay healthy for months. However, suppose you don’t have the ability to test the individual for the flu virus and can only observe his/her temperature. Different (overlapping) distributions of body temperatures may be observed depending on whether this person is sick or healthy, but the state itself is not observed. In this case, the person’s temperature can be modeled by an HMM.

So why are HMMs useful for describing hydro-climatological variables? Let’s go back to the example of ENSO. Maybe El Niño years in a particular basin tend to be wetter than La Niña years. Normally we can observe whether or not it is an El Niño year based on SST anomalies in the tropical Pacific, but suppose we only have paleodata of tree ring widths. We can infer from the tree ring data (with some error) what the total precipitation might have been in each year of the tree’s life, but we may not know what the SST anomalies were those years. Or even if we do know the SST anomalies, maybe there is another more predictive regime-shifting teleconnection we haven’t yet discovered. In either case, we can model the total annual precipitation with an HMM.

What is the benefit of modeling precipitation in these cases with an HMM as opposed to say, an autoregressive model? Well often the year to year correlation of annual precipitation may not actually be that high, but several consecutive wet or consecutive dry years are observed [Bracken et al., 2014]. Furthermore, paleodata suggests that greater persistence (e.g. megadroughts) in precipitation is often observed than would be predicted by autoregressive models [Ault et al., 2013; Ault et al., 2014]. This is where HMMs may come in handy.

Here I will explain how to fit HMMs generally, and in Part II I will show how to apply these methods using the Python package hmmlearn. To understand how to fit HMMs, we first need to define some notation. Let Yt be the observed variable at time t (e.g., annual streamflow). The distribution of Yt depends on the state at time t, Xt (e.g., wet or dry). Let’s assume for simplicity that our observations can be modeled by Gaussian distributions. Then f(Yt | Xt = i) ~ N(μi,σi 2) and f(Yt | Xt = j) ~ N(μj,σj 2) for a two-state HMM. The state at time t, Xt, depends on the state at the previous time step, Xt-1. Let P be the state transition matrix, where each element pi,j represents the probability of transitioning from state i at time t to state j at time t+1, i.e. pij = P(Xt+1 = j | Xt = i). P is a n x n matrix where n is the number of states (e.g. 2 for wet and dry). In all Markov models (hidden or not), the unconditional probability of being in each state, π can be modeled by the equation π = πP, where π is a 1 x n vector in which each element πi represents the unconditional probability of being in state i, i.e. πi = P(Xt = i). π is also called the stationary distribution and can be calculated from P as described here. Since we have no prior information on which to condition the first set of observations, we assume the initial probability of being in each state is the stationary distribution.

In fitting a two-state Gaussian HMM, we therefore need to estimate the following vector of parameters: θ = [μ0, σ0, μ1, σ1, p00, p11]. Note p01 = 1 – p00 and p10 = 1 – p11. The most common approach to estimating these parameters is through the Baum-Welch algorithm, an application of Expectation-Maximization built off of the forward-backward algorithm. The first step of this process is to set initial estimates for each of the parameters. These estimates can be random or based on an informed prior. We then begin with the forward step, which computes the joint probability of observing the first t observations and ending up in state i at time t, given the initial parameter estimates: P(Xt = i, Y1 = y1, Y2 = y2, …, Yt = yt | θ). This is computed for all t ϵ {1, …, T}. Then in the backward step, the conditional probability of observing the remaining observations after time t given the state observed at time t is computed: P(Yt+1 = yt+1, …, YT = yT | Xt=i, θ). Using Bayes’ theorem, it can shown that the product of the forward and backward probabilities is proportional to the probability of ending up in state i at time t given all of the observations, i.e. P(Xt = i | Y1 = y1,…, YT = yT, θ). This is derived below:

1) P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(Y_1=y_1, ..., Y_T=y_T \vert X_t=i, \theta) P(X_t=i \vert \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}

2) P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(Y_1=y_1, ..., Y_t=y_t \vert X_t=i, \theta) P(Y_{t+1} = y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta) P(X_t=i \vert \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}

3) P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) = \frac{P(X_t=i, Y_1=y_1, ..., Y_t=y_t \vert \theta) P(Y_{t+1} = y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta)}{P(Y_1=y_1, ..., Y_T=y_t \vert \theta)}

4) P(X_t=i \vert Y_1=y_1,..., Y_T=y_T, \theta) \propto P(X_t=i, Y_1=y_1, ..., Y_t=y_t \vert \theta) P(Y_{t+1}=y_{t+1}, ..., Y_T=y_T \vert X_t=i, \theta)

The first equation is Bayes’ Theorem. The second equation is derived by the conditional independence of the observations up to time t (Y1, Y2, …, Yt) and the observations after time t (Yt+1, Yt+2, …, YT), given the state at time t (Xt). The third equation is derived from the definition of conditional probability, and the fourth recognizes the denominator as a normalizing constant.

Why do we care about the probability of ending up in state i at time t given all of the observations (the left hand side of the above equations)? In fitting a HMM, our goal is to find a set of parameters, θ, that maximize this probability, i.e. the likelihood function of the state trajectories given our observations. This is therefore equivalent to maximizing the product of the forward and backward probabilities. We can maximize this product using Expectation-Maximization. Expectation-Maximization is a two-step process for maximum likelihood estimation when the likelihood function cannot be computed directly, for example, because its observations are hidden as in an HMM. The first step is to calculate the expected value of the log likelihood function with respect to the conditional distribution of X given Y and θ (the left hand side of the above equations, or proportionally, the right hand side of equation 4). The second step is to find the parameters that maximize this function. These parameter estimates are then used to re-implement the forward-backward algorithm and the process repeats iteratively until convergence or some specified number of iterations. It is important to note that the maximization step is a local optimization around the current best estimate of θ. Hence, the Baum-Welch algorithm should be run multiple times with different initial parameter estimates to increase the chances of finding the global optimum.

Another interesting question beyond fitting HMMs to observations is diagnosing which states the observations were likely to have come from given the estimated parameters. This is often performed using the Viterbi algorithm, which employs dynamic programming (DP) to find the most likely state trajectory. In this case, the “decision variables” of the DP problem are the states at each time step, Xt, and the “future value function” being optimized is the probability of observing the true trajectory, (Y1, …,YT), given those alternative possible state trajectories. For example, let the probability that the first state was k be V1,k. Then V1,k = P(X1 = k) = P(Y1 = y1 | X1 = k)πk. For future time steps, Vt,k = P(Yt = yt | Xt = k)pik*Vt-1,i where i is the state in the previous time step. Thus, the Viterbi algorithm finds the state trajectory (X1, …, XT) maximizing VT,k.

Now that you know how HMMs are fit using the Baum-Welch algorithm and decoded using the Viterbi algorithm, read Part II to see how to perform these steps in practice in Python!