# Plotting trajectories and direction fields for a system of ODEs in Python

The aim of this post is to guide the reader through plotting trajectories and direction fields for a system of equations in Python. This is useful when investigating the equilibria and stability of the system, and to facilitate in understanding the general behavior of a system under study. I will use a system of predator-prey equations, that my most devoted online readers are already familiar with from my previous posts on identifying equilibria and stability, and on nondimensionalization. Specifically, I’ll be using the Lotka-Volterra set of equations with Holling’s Type II functional response:

$\frac{\mathrm{d} x}{\mathrm{d} t}=bx\left ( 1-\frac{x}{K} \right )-\frac{axy}{1+ahx}$

$\frac{\mathrm{d} y}{\mathrm{d} t}=\frac{caxy}{1+ahx}-dy$

where:

x: prey abundance

y: predator abundance

b: prey growth rate

d: predator death rate

c: rate with which consumed prey is converted to predator

a: rate with which prey is killed by a predator per unit of time

K: prey carrying capacity given the prey’s environmental conditions

h: handling time

This system has 3 equilibria: when both species are dead (0,0), when predators are dead and the prey grows to its carrying capacity (K,0) and a non-trivial equilibrium where both species coexist and is generally more interesting, given by:

$y^*=\frac{b}{a}(1+ahx^*)\left(1-\frac{x^*}{K} \right)$

$x^*=\frac{d}{a(c-dh)}$

The following code should produce both trajectories and direction fields for this system of ODEs (python virtuosos please excuse the extensive commenting, I try to comment as much as possible for people new to python):

import numpy as np
from matplotlib import pyplot as plt
from scipy import integrate

# I'm using this style for a pretier plot, but it's not actually necessary
plt.style.use('ggplot')

"""
This is to ignore RuntimeWarning: invalid value encountered in true_divide
I know that when my populations are zero there's some division by zero and
the resulting error terminates my function, which I want to avoid in this case.
"""
np.seterr(divide='ignore', invalid='ignore')

# These are the parameter values we'll be using
a = 0.005
b = 0.5
c = 0.5
d = 0.1
h = 0.1
K = 2000

# Define the system of ODEs
# P[0] is prey, P[1] is predator
def fish(P, t=0):
return ([b*P[0]*(1-P[0]/K) - (a*P[0]*P[1])/(1+a*h*P[0]),
c*(a*P[0]*P[1])/(1+a*h*P[0]) - d*P[1] ])

# Define equilibrium point
EQ = ([d/(a*(c-d*h)),b*(1+a*h*(d/(a*(c-d*h))))*(1-(d/(a*(c-d*h)))/K)/a])

"""
I need to define the possible values my initial points will take as they
relate to the equilibrium point. In this case I chose to plot 10 trajectories
ranging from 0.1 to 5
"""
values = np.linspace(0.1, 5, 10)
# I want each trajectory to have a different color
vcolors = plt.cm.autumn_r(np.linspace(0.1, 1, len(values)))

# Open figure
f = plt.figure()
"""
I need to define a range of time over which to integrate the system of ODEs
The values don't really matter in this case because our system doesn't have t
on the right hand side of dx/dt and dy/dt, but it is a necessary input for
integrate.odeint.
"""
t = np.linspace(0, 150, 1000)

# Plot trajectories by looping through the possible values
for v, col in zip(values, vcolors):
# Starting point of each trajectory
P0 = [E*v for E in EQ]
# Integrate system of ODEs to get x and y values
P = integrate.odeint(fish, P0, t)
# Plot each trajectory
plt.plot( P[:,0], P[:,1],
# Different line width for different trajectories (optional)
lw=0.5*v,
# Different color for each trajectory
color=col,
# Assign starting point to trajectory label
label='P0=(%.f, %.f)' % ( P0[0], P0[1]) )
"""
To plot the direction fields we first need to define a grid in order to
compute the direction at each point
"""
# Get limits of trajectory plot
ymax = plt.ylim(ymin=0)[1]
xmax = plt.xlim(xmin=0)[1]
# Define number of points
nb_points = 20
# Define x and y ranges
x = np.linspace(0, xmax, nb_points)
y = np.linspace(0, ymax, nb_points)
# Create meshgrid
X1 , Y1 = np.meshgrid(x,y)
# Calculate growth rate at each grid point
DX1, DY1 = fish([X1, Y1])
# Direction at each grid point is the hypotenuse of the prey direction and the
# predator direction.
M = (np.hypot(DX1, DY1))
# This is to avoid any divisions when normalizing
M[ M == 0] = 1.
# Normalize the length of each arrow (optional)
DX1 /= M
DY1 /= M

plt.title('Trajectories and direction fields')
"""
This is using the quiver function to plot the field of arrows using DX1 and
DY1 for direction and M for speed
"""
Q = plt.quiver(X1, Y1, DX1, DY1, M, pivot='mid', cmap=plt.cm.plasma)
plt.xlabel('Prey abundance')
plt.ylabel('Predator abundance')
plt.legend(bbox_to_anchor=(1.05, 1.0))
plt.grid()
plt.xlim(0, xmax)
plt.ylim(0, ymax)
plt.show()



This should produce the following plot. All P0s are the initial conditions we defined.

We can also see that this parameter combination produces limit cycles in our system. If we change the parameter values to:

a = 0.005
b = 0.5
c = 0.5
d = 0.1
h = 0.1
K = 200


i.e. reduce the available resources to the prey, our trajectories look like this:

The equilibrium becomes stable, attracting the trajectories to it.

The same can be seen if we increase the predator death rate:

a = 0.005
b = 0.5
c = 0.5
d = 1.5
h = 0.1
K = 2000


The implication of this observation is that an initially stable system, can become unstable given more resources for the prey or less efficient predators. This has been referred to as the Paradox of Enrichment and other predator-prey models have tried to address it (more on this in future posts).

P.S: I would also like to link to this scipy tutorial, that I found very helpful and that contains more plotting tips.

# Animations (2/2)

In the second part of this two-part post we’ll learn to use different tools and techniques to visualize and save animations (first part here). All the code discussed here is available on a GitHub repository, here.

This part focuses on the moviepy Python library, and all the neat things one can do with it. There actually are some nice tutorials for when we have a continuous function t -> f(t) to work with (see here). Instead, we are often working with data structures that are indexed on time in a discrete way.

Moviepy could be used from any data source dependent on time, including netCDF data such as the one manipulated by VisIt in the first part of this post. But in this second part, we are instead going to focus on how to draw time -dependent trajectories to make sense of nonlinear dynamical systems, then animate them in GIF. I will use the well-known shallow lake problem, and go through a first example with detailed explanation of the code. Then I’ll finish with a second example showing trajectories.

## Part I: using state trajectories to understand the concept of stable equilibria

The shallow lake problem is a classic problem in the management of coupled human and natural system. Some human (e.g. agriculture) produce phosphorus that eventually end up in water bodies such as lakes. Too much phosphorus in lake causes a processus called eutrophication which usually destroys lakes’ diverse ecosystems (no more fish) and lower water quality. A major problem with that is that eutrophication is difficult or even sometimes impossible to reverse: lowering phosphorus inputs to what they were pre-eutrophication simply won’t work. Simple nonlinear dynamics, first proposed by Carpenter et al. in 1999 (see here) describe the relationship between phosphorus inputs (L) and concentration (P). The first part of the code (uploaded to GitHub as movie1.py) reads:

 import attractors
import numpy as np
from moviepy.video.io.bindings import mplfig_to_npimage
from moviepy.video.VideoClip import DataVideoClip
import matplotlib.pyplot as plt
import matplotlib.lines as mlines

# Lake parameters
b = 0.65
q = 4

# One step dynamic (P increment rate)
# arguments are current state x and lake parameters b,q and input l
def Dynamics(x, b, q, l):
dp = (x ** q) / (1 + x ** q) - b * x + l
return dp 

Where the first 6 lines contain the usual library imports. Note that I am importing an auxiliary Python function “attractors” to enable me to plot the attractors (see attractors.pyon the GitHub repository). The function “Dynamics” correspond to the evolution of P given L and lake parameters b and q, also given in this bit of code. Then we introduce the time parameters:

# Time parameters
dt = 0.01 # time step
T = 40 # final horizon
nt = int(T/dt+1E-6) # number of time steps

To illustrate that lake phosphorus dynamics depend not only on the phosphorus inputs L but also on initial phosphorus levels, we are going to plot P trajectories for different constant values of L, and three cases regarding the initial P. We first introduce these initial phosphorus levels, and the different input levels, then declare the arrays in which we’ll store the different trajectories

# Initial phosphorus levels
pmin = 0
pmed = 1
pmax = 2.5

# Inputs levels
l = np.arange(0.001,0.401,0.005)

# Store trajectories
low_p = np.zeros([len(l),nt+1]) # Correspond to pmin
med_p = np.zeros([len(l),nt+1]) # Correspond to pmed
high_p = np.zeros([len(l),nt+1]) # Correspond to pmax 

Once that is done, we can use the attractor import to plot the equilibria of the lake problem. This is a bit of code that is the GitHub repository associated to this post, but that I am not going to comment on further here.

After that we can generate the trajectories for P with constant L, and store them to the appropriate arrays:

# Generating the data: trajectories
def trajectory(b,q,p0,l,dt,T):
# Declare outputs
time = np.arange(0,T+dt,dt)
traj = np.zeros(len(time))
# Initialize traj
traj[0] = p0
# Fill traj with values
for i in range(1,len(traj)):
traj[i] = traj[i-1] + dt * Dynamics(traj[i-1],b,q,l)
return traj
# Get them!
for i in range(len(l)):
low_p[i,:] = trajectory(b,q,pmin,l[i],dt,T)
med_p[i, :] = trajectory(b, q, pmed, l[i], dt, T)
high_p[i,:] = trajectory(b,q,pmax,l[i],dt,T)


Now we are getting to the interesting part of making the plots for the animation. We need to declare a figure that all the frames in our animation will use (we don’t want the axes to wobble around). For that we use matplotlib / pyplot libraries:

# Draw animated figure
fig, ax = plt.subplots(1)
ax.set_xlabel('Phosphorus inputs L')
ax.set_ylabel('Phosphorus concentration P')
ax.set_xlim(0,l[-1])
ax.set_ylim(0,pmax)
line_low, = ax.plot(l,low_p[:,0],'.', label='State, P(0)=0')
line_med, = ax.plot(l,med_p[:,0],'.', label='State, P(0)=1')
line_high, = ax.plot(l,high_p[:, 0], '.', label='State, P(0)=2.5')


Once that is done, the last things we need to do before calling the core moviepy functions are to 1) define the parameters that manage time, and 2) have a function that makes frames for the instant that is being called.

For 1), we need to be careful because we are juggling with different notions of time, a) time in the dynamics, b) the index of each instant in the dynamics (i.e., in the data, the arrays where we stored the trajectories), and c) time in the animation. We may also want to have a pause at the beginning or at the end of the GIF, rather than watch with tired eyes as the animation is ruthlessly starting again before we realized what the hell happened. So here is how I declared all of this:

# Parameters of the animation
initial_delay = 0.5 # in seconds, delay where image is fixed before the animation
final_delay = 0.5 # in seconds, time interval where image is fixed at end of animation
time_interval = 0.25 # interval of time between two snapshots in the dynamics (time unit or non-dimensional)
fps = 20 # number of frames per second on the GIF
# Translation in the data structure
data_interval = int(time_interval/dt) # interval between two snapshots in the data structure
t_initial = -initial_delay*fps*data_interval
t_final = final_delay*fps*data_interval
time = np.arange(t_initial,low_p.shape[1]+t_final,data_interval) # time in the data structure

Now for 2), the function to make the frames resets the parts of the plot that change for different time indexes(“t” below is the index in the data). If we don’t do that, the plot will keep the previous plotted elements, and will grow messier at the animation goes on.

# Making frames
def make_frame(t):
t = int(t)
if t<0:
return make_frame(0)
elif t>nt:
return make_frame(nt)
else:
line_low.set_ydata(low_p[:,t])
line_med.set_ydata(med_p[:,t])
line_high.set_ydata(high_p[:, t])
ax.set_title(' Lake attractors, and dynamics at t=' + str(int(t*dt)), loc='left', x=0.2)
if t > 0.25*nt:
alpha = (t-0.25*nt) / (1.5*nt)
lakeAttBase(eqList, 0.001, alpha=alpha)
plt.legend(handles=[stable, unstable], loc=2)
return mplfig_to_npimage(fig) 

In the above mplfig_to_npimage(fig) is a moviepy function that turns a figure into a frame of our GIF. Now we just have to call the function to do frames using the data, and to turn it into a GIF:

# Animating
animation = DataVideoClip(time,make_frame,fps=fps)
animation.write_gif("lake_attractors.gif",fps=fps)

Where the moviepy function DataVideoClip takes as arguments the sequences of indexes defined by the vector “time” defined in the parameters of the animation, the “make_frame” routine we defined, and the number of frame per second we want to output. The last lines integrates each frame to the GIF that is plotted below:

Each point on the plot represent a different world (different constant input level, different initial phosphorus concentration), and the animation shows how these states converge towards an stable equilibriun point. The nonlinear lake dynamics make the initial concentration important towards knowing if the final concentration is low (lower set of stable equilibria), or if the lake is in a eutrophic state (upper set of stable equilibria).

## Part II: plotting trajectories in the 2-D plane

Many trajectories can be plotted at the same time to understand the behavior of attractors, and visualize system dynamics for fixed human-controlled parameters (here, the phosphorus inputs L). Alternatively, if one changes the policy, trajectories evolve depending on both L and P. This redefines how trajectories are defined.

I did a similar bit of code to show how one could plot trajectories in the 2D plane. It is also uploaded on the GitHub repository (under movie2.py), and is similar in its structure to the code above. The definition of the trajectories and where to store them change. We define trajectories where inputs are lowered at a constant rate, with a minimum input of 0.08. For three different initial states, that gives us the following animation that illustrates how the system’s nonlinearity leads to very different trajectories even though the starting positions are close and the management policy, identical:

This could easily be extended to trajectories in higher dimensional planes, with and without sets of equilibria to guide our eyes.

# Animations (1/2)

In the first part of this two-part post we’ll learn to use different tools and techniques to visualize and save animations (second part here). This first part focuses on VisIt, an open-source visualization software developed by the Lawrence Livermore Laboratory – and many others (…that’s an advantage of being open-source). There are a lot of cool things about VisIt, and you can read about that here or here, and probably in future posts. But for today, we’ll just learn how to use VisIt to play and save animations. VisIt supports over a hundred file formats, and this post focuses on reading spatially distributed and time-varying data stored in the netCDF format (more about netCDF here).

Let us assume we have VisIt installed and the VisIt path has been added to our bash profile. Then we just need to cd into the directory where our data are stored, and type visit in the command line to launch the program. VisIt comes with two windows, one for manipulating files and the data stored in them, and another where plots are drawn (it is possible to draw more plots in additional windows, but in this tutorial we’ll stick to one window). Both windows are visible in the screenshot below.

It may be difficult to see, but the left-hand window is the space for managing files and plots. The active “Plot” is also displayed. Here the active file is “air.2m.2000.nc” where nc is th extension for netCDF files. The plot is a “Pseudocolor” of that data, the equivalent of “contourf” in Matlab or Python – Matplotlib. It is plotted in the right-hand window. It is the worldwide distribution of daily air temperature (in Kelvin) with a 1 degree resolution for January 1, 2000 on the left pane (Data from NCEP). Note the toolbar above the plot: it contains “play / stop” buttons that can allow us to play the 365 other days of year 2000 (the cursor is on “stop” in the screenshot). We are going to save the animation of the whole year with the VisIt movie wizard.

But first, let us change the specifications of our plot. In the left-hand window, we go to PlotAtts – > Pseudocolor (recall that is the plot type that we have). We can change the colorbar, set minima and maxima so that the colorbar does not change for each day of the year when we save the movie, and even make the scale skewed so we see more easily the differences in temperatures in non-polar regions where people live (on the image above the temperature in the middle of the colorbar is 254.9 K, which corresponds to -18.1C or roughly -1F). The screenshot below shows our preferences. We can click “Apply” for the changes to take effect, and exit the window.

Now we can save the movie. We go to File – > Save Movie and follow the VisIt Movie Wizard. You will find it very intuitive and easy. We save a “New Simple Movie” in DVIX (other formarts are available), with 20 frames (i.e. 20 days) per second. The resulting movie is produced within 2 minutes, but since WordPress does not support DIVX unless I am willing to pay for premium access, I had to convert the result as a GIF. To be honest, this is a huge letdown as the quality suffers compared to a video (unless I want to upload a 1GB monster). But the point of this post is that VisIt is really easy to use, not that you should convert its results into GIF.

In this animation it is easy to see the outline of land masses as they get either warmer or cooler than the surrounding oceans! Videos would look neater: you can try them for yourself! (at home, wherever you want).

It is actually possible to command VisIt using Python scripts, but I haven’t mastered that yet so it will be a tale for another post.

# Generating Interactive Visuals in R

Visuals vs. Visual Analytics

How do visuals differ from visual analytics? In a scientific sense, a visual is a broad term for any picture, illustration, or graph that can be used to convey an idea. However, visual analytics is more than just generating a graph of complex data and handing it to a decision maker. Visual analytic tools help create graphs that allow the user to interact with the data, whether that involves manipulating a graph in three-dimensional space or allowing users to filter or brush for solutions that match certain criteria. Ultimately, visual analytics seeks to help in making decisions as fast as possible and to “enable learning through continual [problem] reformulation” (Woodruff et al., 2013) by presenting large data sets in an organized way so that the user can better recognize patterns and make inferences.

My goal with this blog post is to introduce two R libraries that are particularly useful to develop interactive graphs that will allow for better exploration of a three-dimensional space. I have found that documentation on these libraries and potential errors was sparse, so this post will consolidate my hours of Stack Overflow searching into a step-by-step process to produce beautiful graphs!

R Libraries

Use rgl to create a GIF of a 3D graph

Spinning graphs can be especially useful to visualize a 3D Pareto front and a nice visualization for a Power Point presentation. I will be using an example three-objective Pareto set from Julie’s work on the Red River Basin for this tutorial. The script has been broken down and explained in the following sections.


#Set working directory
setwd("C:/Users/Me/Folder/Blog_Post_1")

#Read in in csv of pareto set

#Create three vectors for the three objectives
hydropower=data$WcAvgHydro deficit=data$WcAvgDeficit
flood=data$WcAvgFlood  In this first block of code, the working directory is set, the data set is imported from a CSV file, and each column of the data frame is saved as a vector that is conveniently named. Now we will generate the plot.  #call the rgl library library(rgl) #Adjust the size of the window par3d(windowRect=c(0,0,500,500))  If the rgl package isn’t installed on your computer yet, simply type install.packages(“rgl”) into the console. Otherwise, use the library function in line 2 to call the rgl package. The next line of code is used to adjust the window that the graph will pop up in. The default window is very small and as such, the movie will have a small resolution if the window is not adjusted!  #Plot the set in 3D space plot3d(hydropower,deficit,flood,col=brewer.pal(8,"Blues"), size=2, type='s', alpha=0.75)  Let’s plot these data in 3D space. The first three components of the plot3d function are the x,y, and z vectors respectively. The rest of the parameters are subject to your personal preference. I used the Color Brewer (install package “RColorBrewer”) to color the data points in different blue gradients. The first value is the number of colors that you want, and the second value is the color set. Color Brewer sets can be found here: http://www.datavis.ca/sasmac/brewerpal.html. My choice of colors is random, so I opted not to create a color scale. Creating a color scale is more involved in rgl. One option is to split your data into classes and to use legend3d and the cut function to cut your legend into color levels. Unfortunately, there simply isn’t an easy way to create a color scale in rgl. Finally, I wanted my data points to be spheres, of size 2, that were 50% transparent, which is specified with type, size, and alpha respectively. Plot3d will open a window with your graph. You can use your mouse to rotate it. Now, let’s make a movie of the graph. The movie3d function requires that you install ImageMagick, a software that allows you to create a GIF from stitching together multiple pictures. ImageMagick also has cool functionalities like editing, resizing, and layering pictures. It can be installed into your computer through R using the first two lines of code below. Make sure not to re-run these lines once ImageMagick is installed. Note that ImageMagick doesn’t have to be installed in your directory, just on your computer.  require(installr) install.ImageMagick() #Create a spinning movie of your plot movie3d(spin3d(axis = c(0, 0, 1)), duration = 20, dir = getwd())  Finally, the last line of code is used to generate the movie. I have specified that I want the plot to spin about the z axis, specified a duration (you can play around with the number to see what suits your data), and that I want the movie to be saved in my current working directory. The resulting GIF is below. If the GIF has stopped running, reload the page and scroll down to this section again. I have found that creating the movie can be a bit finicky and the last step is where errors usually occur. When you execute your code, make sure that you keep the plot window open while ImageMagick stitches together the snapshots otherwise you will get an error. If you have errors, please feel free to share because I most likely had them at one point and was able to ultimately fix them. Overall, I found this package to be useful for a quick overview of the 3D space, but I wasn’t pleased with the way the axes values and titles overlap sometimes when the graph spins. The way to work around this is to set the labels and title to NULL and insert your own non-moving labels and title when you add the GIF to a PowerPoint presentation. Use plotly to create an interactive scatter I much prefer the plotly package to rgl for the aesthetic value, ease of creating a color scale, and the ability to mouse-over points to obtain coordinate values in a scatter plot. Plotly is an open source JavaScript graphing library but has an R API. The first step is to create a Plotly account at: https://plot.ly/. Once you have confirmed your email address, head to https://plot.ly/settings/api to get an API key. Click the “regenerate key” button and you’ll get a 20 character key that will be used to create a shareable link to your chart. Perfect, now we’re ready to get started! setwd("C:/Users/Me/Folder/Blog_Post_1") library(plotly) library(ggplot2) #Set environment variables Sys.setenv("plotly_username"="rg727") Sys.setenv("plotly_api_key"="insert key here") #Read in pareto set data pareto=read.csv("ieee_synthetic_thinned.csv")  Set the working directory, install the relevant libraries, set the environment variables and load in the data set. Be sure to insert your API key. You will need to regenerate a new key every time you make a new graph. Also, note that your data must be in the form of a data frame for plotly to work. #Plot your data plot= plot_ly(pareto, x = ~WcAvgHydro, y = ~WcAvgDeficit, z = ~WcAvgFlood, color = ~WcAvgFlood, colors = c('#00d6f7', '#ebfc2f')) add_markers() #Add axes titles layout(title="Pareto Set", scene = list(xaxis = list(title = 'Hydropower'),yaxis = list(title = 'Deficit'), zaxis = list(title = 'Flood'))) #call the name of your plot to appear in the viewer plot  To correctly use the plotly command, the first input needed is the data frame, followed by the column names of the x,y, and z columns in the data frame. Precede each column name with a “~”. I decided that I wanted the colors to scale with the value of the z variable. The colors were defined using color codes available at http://www.color-hex.com/. Use the layout function to add a main title and axis labels. Finally call the name of your plot and you will see it appear in the viewer at the lower right of your screen.If your viewer shows up blank with only the color scale, click on the viewer or click “zoom”. Depending on how large the data set is, it may take some time for the graph to load.  #Create a link to your chart and call it to launch the window chart_link = api_create(plot, filename = "public-graph") chart_link  Finally create the chart link using the first line of code above and the next line will launch the graph in Plotly. Copy and save the URL and anyone with it can access your graph, even if they don’t have a Plotly account. Play around with the cool capabilities of my Plotly graph, like mousing over points, rotating, and zooming! https://plot.ly/~rg727/1/ Sources: http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization https://plot.ly/r/3d-scatter-plots/ Woodruff, M.J., Reed, P.M. & Simpson, T.W. Struct Multidisc Optim (2013) 48: 201. https://doi.org/10.1007/s00158-013-0891-z James J. Thomas and Kristin A. Cook (Ed.) (2005). Illuminating the Path: The R&D Agenda for Visual Analytics National Visualization and Analytics Center. # Map making in Matlab Greetings, This weeks post will cover the basics of generating maps in Matlab. Julie’s recent post showed how to do some of this in Python, but, Matlab is also widely used by the community. You can get a lot done with Matlab, but in this post we’ll just cover a few of the basics. We’ll start off by plotting a map of the continental United States, with the states. We used three this with three commands: usamap, shaperead, and geoshow. usamap creates an empty map axes having the Lambert Projection covering the area of the US, or any state or collection of states. shaperead reads shapefiles (duh) and returns a Matlab geographic data structure, composed of both geographic data and attributes. This Matlab data structure then interfaces really well with various Matlab functions (duh). Finally, geoshow plots geographic data, in our case on the map axes we defined. Here’s some code putting it all together. hold on figure1 = figure; ax = usamap('conus'); set(ax, 'Visible', 'off') latlim = getm(ax, 'MapLatLimit'); lonlim = getm(ax, 'MapLonLimit'); states = shaperead('usastatehi',... 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']); geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5]) tightmap hold off  Note that ‘usastatehi’ is a shapefile containing the US states (duh) that’s distributed with Matlab. The above code generates this figure: Now, suppose we wanted to plot some data, say a precipitation forecast, on our CONUS map. Let’s assume our forecast is being made at many points (lat,long). To interpolate between the points for plotting we’ll use Matlab’s griddata function. Once we’ve done this, we use the Matlab’s contourm command. This works exactly like the normal contour function, but the ‘m’ indicates it plots map data. xi = min(x):0.5:max(x); yi = min(y):0.5:max(y); [XI, YI] = meshgrid(xi,yi); ZI = griddata(x,y,V,XI,YI); hold on figure2 = figure; ax = usamap('conus'); set(ax, 'Visible', 'off') latlim = getm(ax, 'MapLatLimit'); lonlim = getm(ax, 'MapLonLimit'); states = shaperead('usastatehi',... 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']); geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5]) contourm(YI,-1*XI,ZI) tightmap hold off  Here x, y, and V are vectors of long, lat, and foretasted precipitation respectively. This code generates the following figure: Wow! Louisiana is really getting hammered! Let’s take a closer look. We can do this by changing the entry to usamap to indicate we want to consider only Louisiana. Note, usamap accepts US postal code abbreviations. ax = usamap('LA');  Making that change results in this figure: Neat! We can also look at two states and add annotations. Suppose, for no reason in particular, you’re interested in the location of Tufts University relative to Cornell. We can make a map to look at this with the textm and scatterm functions. As before, the ‘m’ indicates the functions plot on a map axes. hold on figure4 = figure; ax = usamap({'MA','NY'}); set(ax, 'Visible', 'off') latlim = getm(ax, 'MapLatLimit'); lonlim = getm(ax, 'MapLonLimit'); states = shaperead('usastatehi',... 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']); geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5]) scatterm(42.4075,-71.1190,100,'k','filled') textm(42.4075+0.2,-71.1190+0.2,'Tufts','FontSize',30) scatterm(42.4491,-76.4842,100,'k','filled') textm(42.4491+0.2,-76.4842+0.2,'Cornell','FontSize',30) tightmap hold off  This code generates the following figure. Cool! Now back to forecasts. NOAA distributes short term Quantitative Precipitation Forecasts (QPFs) for different durations every six hours. You can download these forecasts in the form of shapefiles from a NOAA server. Here’s an example of a 24-hour rainfall forecast made at 8:22 AM UTC on April 29. Wow, that’s a lot of rain! Can we plot our own version of this map using Matlab! You bet! Again we’ll use usamap, shaperead, and geoshow. The for loop, (0,1) scaling, and log transform are simply to make the color map more visually appealing for the post. There’s probably a cleaner way to do this, but this got the job done! figure5 = figure; ax = usamap('conus'); S=shaperead('94q2912','UseGeoCoords',true); set(ax, 'Visible', 'off') latlim = getm(ax, 'MapLatLimit'); lonlim = getm(ax, 'MapLonLimit'); states = shaperead('usastatehi',... 'UseGeoCoords', true, 'BoundingBox', [lonlim', latlim']); geoshow(ax, states, 'FaceColor', [0.5 0.5 0.5]) p = colormap(jet); N = max(size(S)); d = zeros(N,1); for i = 1:N d(i) = log(S(i).QPF); end y=floor(((d-min(d))/range(d))*63)+1; col = p(y,:); for i = 1:N geoshow(S(i),'FaceColor',col(i,:),'FaceAlpha',0.5)%,'SymbolSpec', faceColors) end  This code generates the following figure: If you are not plotting in the US, Matlab also has a worldmap command. This works exactly the same as usamap, but now for the world (duh). Matlab is distibuted with a shapefile ‘landareas.shp’ which contains all of the land areas in the world (duh). Generating a global map is then trivial: figure6 = figure; worldmap('World') land = shaperead('landareas.shp', 'UseGeoCoords', true); geoshow(land, 'FaceColor', [0.15 0.5 0.15])  Which generates this figure. Matlab also comes with a number of other included that might be of interest. For instance, shapefiles detailing the locations of major world cities, lakes, and rivers. We can plot those with the following code: figure7 = figure; worldmap('World') land = shaperead('landareas.shp', 'UseGeoCoords', true); geoshow(land, 'FaceColor', [0.15 0.5 0.15]) lakes = shaperead('worldlakes', 'UseGeoCoords', true); geoshow(lakes, 'FaceColor', 'blue') rivers = shaperead('worldrivers', 'UseGeoCoords', true); geoshow(rivers, 'Color', 'blue') cities = shaperead('worldcities', 'UseGeoCoords', true); geoshow(cities, 'Marker', '.', 'Color', 'red')  Which generates the figure: But suppose we’re interested in one country or a group of countries. worldmap works in the same usamap does. Also, you can plot continents, for instance Europe. worldmap('Europe')  Those are the basics, but there are many other capabilities, including 3-D projections. I can cover this in a later post if there is interest. That’s it for now! # A visual introduction to data compression through Principle Component Analysis Principle Component Analysis (PCA) is a powerful tool that can be used to create parsimonious representations of a multivariate data set. In this post I’ll code up an example from Dan Wilks’ book Statistical Methods in the Atmospheric Sciences to visually illustrate the PCA process. All code can be found at the bottom of this post. As with many of the examples in Dr. Wilks’ excellent textbook, we’ll be looking at minimum temperature data from Ithaca and Canandaigua, New York (if anyone is interested, here is the distance between the two towns). Figure 1 is a scatter plot of the minimum temperature anomalies at each location for the month of January 1987. Figure 1: Minimum temperature anomalies in Ithaca and Canandaigua, New York in January 1987 As you can observe from Figure 1, the two data sets are highly correlated, in fact, they have a Pearson correlation coefficient of 0.924. Through PCA, we can identify the primary mode of variability within this data set (its largest “principle component”) and use it to create a single variable which describes the majority of variation in both locations. Let x define the matrix of our minimum temperature anomalies in both locations. The eigenvectors (E) of the covariance matrix of x describe the primary modes variability within the data set. PCA uses these eigenvectors to create a new matrix, u, whose columns contain the principle components of the variability in x. $u = xE$ Each element in u is a linear combination of the original data, with eigenvectors in E serving as a kind of weighting for each data point. The first column of u corresponds to the eigenvector associated with the largest eigenvalue of the covariance matrix. Each successive column of u represents a different level of variability within the data set, with u1 describing the direction of highest variability, u2 describing the direction of the second highest variability and so on and so forth. The transformation resulting from PCA can be visualized as a rotation of the coordinate system (or change of basis) for the data set, this rotation is shown in Figure 2. Figure 2: Geometric interpretation of PCA As can be observed in Figure 2, each data point can now be described by its location along the newly rotated axes which correspond to its corresponding value in the newly created matrix u. The point (16, 17.8), highlighted in Figure 2, can now be described as (23, 6.6) meaning that it is 23 units away from the origin in the direction of highest variability and 6.6 in the direction of second highest variability. As shown in Figure 2, the question of “how different from the mean” each data point is can mostly be answered by looking at its corresponding u1 value. Once transformed, the original data can be recovered through a process known as synthesis. Synthesis can be thought of as PCA in reverse. The elements in the original data set x, can be approximated using the eigenvalues of the covariance matrix and the first principle component, u1. $\tilde{x} = \tilde{u}\tilde{E}^T$ Where: $\tilde{x}\hspace{.1cm} is\hspace{.1cm} the\hspace{.1cm} reconstructed\hspace{.1cm} data\hspace{.1cm} set$ $\tilde{u}\hspace{.1cm} is\hspace{.1cm} the\hspace{.1cm} PCs\hspace{.1cm} used \hspace{.1cm} for \hspace{.1cm} reconstruction\hspace{.1cm} (in\hspace{.1cm} our\hspace{.1cm} case\hspace{.1cm} the\hspace{.1cm} first\hspace{.1cm} column)$ $\tilde{E}\hspace{.1cm} is \hspace{.1cm} the \hspace{.1cm} eigenvector\hspace{.1cm} of \hspace{.1cm} the \hspace{.1cm} PCs \hspace{.1cm} used$ For our data set, these reconstructions seem to work quite well, as can be observed in Figure 3. Data compression through PCA can be a useful alternative tolerant methods for dealing with multicollinearity, which I discussed in my previous post. Rather than running a constrained regression, one can simply compress the data set to eliminate sources of multicollinearity. PCA can also be a helpful tool for identifying patterns within your data set or simply creating more parsimonious representations of a complex set of data. Matlab code used to create the above plots can be found below. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Ithaca_Canandagua_PCA % By: D. Gold % Created: 3/20/17 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This script will perform Principle Component analysis on minimum % temperature data from Ithaca and Canadaigua in January, 1987 provided in % Appendix A of Wilks (2011). It will then estimate minimum temperature % values of both locations using the first principle component. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% create data sets clear all % data from appendix Wilks (2011) Appendix A.1 Ith = [19, 25, 22, -1, 4, 14, 21, 22, 23, 27, 29, 25, 29, 15, 29, 24, 0,... 2, 26, 17, 19, 9, 20, -6, -13, -13, -11, -4, -4, 11, 23]'; Can = [28, 28, 26, 19, 16, 24, 26, 24, 24, 29, 29, 27, 31, 26, 38, 23,... 13, 14, 28, 19, 19, 17, 22, 2, 4, 5, 7, 8, 14, 14, 23]'; %% center the data, plot temperature anomalies, calculate covariance and eigs % center the data x(:,1) = Ith - mean(Ith); x(:,2) = Can - mean(Can); % plot anomalies figure scatter(x(:,1),x(:,2),'Filled') xlabel('Ithaca min temp anomaly ({\circ}F)') ylabel('Canandagua min temp anomaly ({\circ}F)') % calculate covariance matrix and it's corresponding Eigenvalues & vectors S = cov(x(:,1),x(:,2)); [E, Lambda]=eigs(S); % Identify maximum eigenvalue, it's column will be the first eigenvector max_lambda = find(max(Lambda)); % index of maximum eigenvalue in Lambda idx = max_lambda(2); % column of max eigenvalue %% PCA U = x*E(:,idx); %% synthesis x_syn = E(:,idx)*U'; % reconstructed values of x % plot the reconstructed values against the original data figure subplot(2,1,1) plot(1:31,x(:,1) ,1:31, x_syn(1,:),'--') xlim([1 31]) xlabel('Time (days)') ylabel('Ithaca min. temp. anomalies ({\circ}F)') legend('Original', 'Reconstruction') subplot(2,1,2) plot(1:31, x(:,2),1:31, x_syn(2,:)','--') xlim([1 31]) xlabel('Time (days)') ylabel('Canadaigua min. temp. anomalies ({\circ}F)') legend('Original', 'Reconstruction') Sources: Wilks, D. S. (2011). Statistical methods in the atmospheric sciences. Amsterdam: Elsevier Academic Press. # Alluvial Plots We all love parallel coordinates plots and use them all the time to display our high dimensional data and tell our audience a good story. But sometimes we may have large amounts of data points whose tradeoffs’ existence or lack thereof cannot be clearly verified, or the data to be plotted is categorical and therefore awkwardly displayed in a parallel coordinates plot. One possible solution to both issues is the use of alluvial plots. Alluvial plots work similarly to parallel coordinates plots, but instead of having ranges of values in the axes, it contains bins whose sizes in an axis depends on how many data points belong to that bin. Data points that fall within the same categories in all axes are grouped into alluvia (stripes), whose thicknesses reflect the number of data points in each alluvium. Next are two examples of alluvial plots, the fist displaying categorical data and the second displaying continuous data that would normally be plotted in a parallel coordinates plot. After the examples, there is code available to generate alluvial plots in R (I know, I don’t like using R, but creating alluvial plots in R is easier than you think). Categorical data The first example (Figure 1) comes from the cran page for the alluvial plots package page. It uses alluvial plots to display data about all Titanic’s passengers/crew and group them into categories according to class, sex, age, and survival status. Figure 1 – Titanic passenger/crew data. Yellow alluvia correspond to survivors and gray correspond to deceased. The size of each bin represents how many data points (people) belong to that category in a given axis, while the thickness of each alluvium represent how many people fall within the same categories in all axes. Source: https://cran.r-project.org/web/packages/alluvial/vignettes/alluvial.html. Figure 1 shows that most of the passengers were male and adults, that the crew represented a substantial amount of the total amount of people in the Titanic, and that, unfortunately, there were more deceased than survivors. We can also see that a substantial amount of the people in the boat were male adult crew members who did not survive, which can be inferred by the thickness of the grey alluvium that goes through all these categories — it can also be seen by the lack of an alluvia hitting the Crew and Child bins, that (obviously) there were no children crew members. It can be also seen that 1st class female passengers was the group with the greatest survival rate (100%, according to the plot), while 3rd class males had the lowest (ballpark 15%, comparing the yellow and gray alluvia for 3rd class males). Continuous data The following example shows the results of policy modeling for a fictitious water utility using three different policy formulations. Each data point represents the modeled performance of a given candidate policy in six objectives, one in each axis. Given the uncertainties associated with the models used to generate this data, the client utility company is more concerned about whether or not a candidate policy would meet certain performance criteria according to the model (Reliability > 99%, Restriction Frequency < 20%, and Financial Risk < 10%) than about the actual objective values. The utility also wants to have a general idea of the tradeoffs between objectives. Figure 2 was created to present the modeling results to the client water utility. The colored alluvia represent candidate policies that meet the utility’s criteria, and grey lines represent otherwise. The continuous raw data used to generate this plot was categorized following ranges whose values are meaningful to the client utility, with the best performing bin always put in the bottom of the plot. It is important to notice that the height of the bins represent the number of policies that belong to that bin, meaning that the position of the gap between two stacked bins does not represent a value in an axis, but the fraction of the policies that belong to each bin. It can be noticed from Figure 2 that it is relatively difficult for any of the formulations to meet the Reliability > 99% criteria established by the utility. It is also striking that a remarkably small number of policies from the first two formulations and none of the policies from the third formulation meet the criteria established by the utilities. It can also be easily seen by following the right alluvia that the vast majority of the solutions with smaller net present costs of infrastructure investment obtained with all three formulations perform poorly in the reliability and restriction frequency objectives, which denotes a strong tradeoff. The fact that such tradeoffs could be seen when the former axis is on the opposite side of the plot to the latter two is a remarkable feature of alluvial plots. Figure 2 – Alluvial plot displaying modeled performance of candidate long-term planning policies. The different subplots show different formulations (1 in the top, 3 in the bottom). The parallel coordinates plots in Figure 3 displays the same information as the alluvial plot in Figure 2. It can be readily seen that the analysis performed above, especially when it comes to the tradeoffs, would be more easily done with Figure 2 than with Figure 3. However, if the actual objective values were important for the analysis, Figure 3 would be needed either by itself or in addition to Figure 2, the latter being used likely as a pre-screening or for a higher level analysis of the results. Figure 3 – Parallel coordinates plot displaying modeled performance of candidate long-term planning policies. The different subplots show different formulations (1 in the top, 3 in the bottom). The R code used to create Figure 1 can be found here. The code below was used to create Figure 2 — The packages “alluvia”l and “dplyr” need to be installed before attempting to use the provided code, for example using the R command install.packages(package_name). Also, the user needs to convert its continuous data into categorical data, so that each row corresponds to a possible combination of bins in all axis (one column per axis) plus a column (freqs) representing the frequencies with which each combination of bins is seen in the data. # Example datafile: snippet of file "infra_tradeoffs_strong_freqs.csv" Reliability, Net Present Cost of Inf. Investment, Peak Financial Costs, Financial Risk, Restriction Frequency, Jordan Lake Allocation, freqs 2<99,0<60,0<25,0<10,2>20,0<70,229 0>99,2>60,0<25,0<10,2>20,0<70,0 2<99,2>60,0<25,0<10,2>20,0<70,168 0>99,0<60,2>25,0<10,2>20,0<70,0 2<99,0<60,2>25,0<10,2>20,0<70,3 0>99,2>60,2>25,0<10,2>20,0<70,2 2<99,2>60,2>25,0<10,2>20,0<70,45 0>99,0<60,0<25,2>10,2>20,0<70,0 2<99,0<60,0<25,2>10,2>20,0<70,317 0>99,2>60,0<25,2>10,2>20,0<70,0 2<99,2>60,0<25,2>10,2>20,0<70,114 # load packages and prepare data library(alluvial) library(dplyr) itss <- read.csv('infra_tradeoffs_strong_freqs.csv') itsw <- read.csv('infra_tradeoffs_weak_freqs.csv') itsn <- read.csv('infra_tradeoffs_no_freqs.csv') # preprocess the data (convert do dataframe) itss %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>% summarise(n = sum(freqs)) -> its_strong itsw %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>% summarise(n = sum(freqs)) -> its_weak itsn %>% group_by(Reliability, Restriction.Frequency, Financial.Risk, Peak.Financial.Costs, Net.Present.Cost.of.Inf..Investment, Jordan.Lake.Allocation) %>% summarise(n = sum(freqs)) -> its_no # setup output file svg(filename="tradeoffs_3_formulations.svg", width=8, height=8, pointsize=18) p <- par(mfrow=c(3,1)) par(bg = 'white') # create the plots alluvial( its_strong[,1:6], freq=its_strong$n,
col = ifelse(its_strong$Reliability == "0>99" & its_strong$Restriction.Frequency == "0<20" &
its_strong$Financial.Risk == "0<10", "blue", "grey"), border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" & its_strong$Financial.Risk == "0<10", "blue", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_strong$n < 1 ) alluvial( its_weak[,1:6], freq=its_weak$n,
col = ifelse(its_strong$Reliability == "0>99" & its_strong$Restriction.Frequency == "0<20" &
its_weak$Financial.Risk == "0<10", "chartreuse2", "grey"), border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" & its_weak$Financial.Risk == "0<10", "chartreuse2", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_weak$n < 1 ) alluvial( its_no[,1:6], freq=its_no$n,
col = ifelse(its_strong$Reliability == "0>99" & its_strong$Restriction.Frequency == "0<20" &
its_no$Financial.Risk == "0<10", "red", "grey"), border = ifelse(its_strong$Reliability == "0>99" &
its_strong$Restriction.Frequency == "0<20" & its_no$Financial.Risk == "0<10", "red", "grey"),
# border = "grey",
alpha = 0.5,
hide=its_no\$n < 1
)
dev.off()