Introduction to Wavelets

This post is a brief introduction to and overview of wavelets. Wavelets are a powerful tool for time series analysis, de-noising, and data compression, and their use has recently exploded in fields such as hydrology and geophysics. We are often interested in discovering periodic phenomena or underlying frequencies that characterize a signal of interest, such as low-frequency teleconnections. We may also want to better understand rapidly changing signals, such as seismic signals during an earthquake or ocean wave dispersion. Wavelets can help us answer many questions in these applications!

Basic Background

The typical approach to understanding which frequencies are present in a signal is to transform the time-domain signal into the frequency domain by means of the Fourier Transform. A well-known deficiency of the Fourier Transform is that it provides information about which frequencies are present, but all information about when these frequencies occur is lost. Thus, signals that are non-stationary, i.e., that exhibit changing frequencies over time, cannot be fully characterized by the Fourier Transform (Deshpande, 2015). One solution proposed to address this limitation is the Short-time Fourier Transform (STFT). In a very basic sense, the STFT applies a window function to segment the signal in time and then performs a Fourier transform on each segment. The spectrum of each segment can then be plotted to show how the frequency content changes over time (Smith, 2011). A considerable amount of research between the 1940s and 1970s was spent on determining appropriate window functions. However, limitations remained. The STFT uses the same window across the whole time series, which may not be appropriate for all of the different frequencies that characterize a signal. When a signal has very high frequency content, a narrow window is preferable, but this results in poor frequency resolution. When a signal has lower frequency content, a wider window is preferable, but this results in poor time resolution. This tradeoff is often described as an example of the Heisenberg Uncertainty Principle (Marković et al., 2012).


The Short-time Fourier Transform applies the same window, and therefore the same time-frequency resolution, across the whole time series. Wavelet bases of different scales (frequency bands) can be influential at select times in the series.

In order to properly analyze signals that are non-stationary or transient (which includes most signals of interest) and to appropriately address both large- and small-scale features in frequency space, we can make use of wavelets! Wavelet transforms are especially useful for analyzing signals that contain low frequencies over long durations and high frequencies over short durations. Furthermore, the basis functions are not restricted to sinusoids, which can make it easier to approximate functions with sharp peaks or corners.

In a continuous wavelet transform, expressed below in Equation (1), the basis function or window is termed the “mother” wavelet, which is designated by Ψ.
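For reference, the continuous wavelet transform is commonly written in the following standard form, where x(t) is the signal, s is the scale, τ is the translation, and * denotes the complex conjugate. Normalization conventions differ between texts and software packages, so treat the 1/√|s| factor as one common choice rather than the one used by every library:

$$W(s,\tau) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \Psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \qquad (1)$$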


This mother wavelet can be scaled and translated with the s and τ  parameters, respectively, to produce “daughter” or “child” wavelets, which are simply variations of the mother wavelet. The wavelets are not only translated across the signal but can be dilated or contracted to capture longer or shorter term events.

By definition, a continuous wavelet transform is a convolution of the input signal with the daughter wavelets, which is inherently a measure of similarity. The wavelet is “slid” over the signal of interest and similarities at each time point are measured. Therefore, the wavelet will be correlated with the parts of the series that contain a similar frequency.
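To make the “sliding similarity” idea concrete, here is a minimal NumPy sketch (not WaveletComp's implementation) that correlates a signal with a single Morlet daughter wavelet. The helper name `morlet_coefficients`, the central frequency w0 = 6, and the truncation of the wavelet at ±4 scales are illustrative assumptions. The synthetic signal contains a 1 Hz burst only between t = 20 s and t = 40 s, so the coefficient magnitude at the matching scale should be large inside the burst and zero well away from it.

```python
import numpy as np

def morlet_coefficients(x, dt, scale, w0=6.0):
    """Coefficients W(scale, tau) of x at one scale, for every tau.

    Morlet mother wavelet psi(eta) = pi**-0.25 * exp(1j*w0*eta) * exp(-eta**2/2),
    truncated at +/- 4 scales; discrete version of
    W(s, tau) = 1/sqrt(s) * integral x(t) * conj(psi((t - tau) / s)) dt.
    """
    half = int(np.ceil(4 * scale / dt))
    eta = np.arange(-half, half + 1) * dt / scale
    psi = np.pi ** -0.25 * np.exp(1j * w0 * eta) * np.exp(-eta ** 2 / 2)
    # np.correlate conjugates its second argument, so this slides conj(psi)
    # along x and returns one complex similarity value per time step.
    return np.correlate(x, psi, mode="same") * dt / np.sqrt(scale)

dt = 0.05                                  # 20 samples per second
t = np.arange(0, 60, dt)                   # 60 s of synthetic data
freq = 1.0                                 # 1 Hz burst
x = np.where((t >= 20) & (t < 40), np.sin(2 * np.pi * freq * t), 0.0)

scale = 6.0 / (2 * np.pi * freq)           # Morlet scale whose centre frequency is 1 Hz
power = np.abs(morlet_coefficients(x, dt, scale)) ** 2

inside = power[(t > 25) & (t < 35)].mean()            # well inside the burst
outside = power[(t < 15) | (t > 45)].mean()           # far from the burst
print(f"mean power inside burst: {inside:.3f}, outside: {outside:.2e}")
```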

Convolution with a set of wavelets can produce redundant information when the basis functions are not orthogonal. Families of orthogonal wavelet basis functions have been developed to address this limitation. The simplest orthonormal wavelet is the Haar wavelet, a discrete, square-shaped wavelet. The Haar wavelet is neither continuous nor differentiable; however, it is particularly useful for approximating the response of systems that may experience a sudden transition. A Morlet wavelet is a complex sinusoid enclosed in a Gaussian envelope and may be more useful in applications such as music, hearing/vision, and the analysis of electrocardiograms.


Common wavelets: a) Haar, b) Gaussian, c) Daubechies, d) Morlet (Baker, 2007)

In a discrete wavelet transform (DWT), the scale and translation parameters, s and τ, are discretized so that each successive wavelet is twice as wide as the one before, covering all but the very lowest frequencies. This is termed a dyadic filter bank. Each daughter wavelet can therefore be thought of as a bandpass filter that represents a specific frequency band, or scale, of interest.


Dyadic filter bank frequency response magnitudes (Smith, 2011)
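The Haar wavelet gives the simplest example of the low-pass/high-pass split behind a dyadic filter bank: pairwise averages (approximation) and pairwise differences (detail), each scaled by 1/√2 so that the transform is orthonormal. Repeating the split on the approximation halves the number of coefficients and doubles the effective wavelet width at every level. The sketch below is a minimal NumPy illustration with hypothetical helper names (`haar_step`, `haar_dwt`, `haar_idwt`) and assumes the signal length is a power of two; a library such as PyWavelets handles general lengths and other wavelet families.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar DWT: approximation and detail."""
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # low-pass: scaled local average
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # high-pass: scaled local difference
    return approx, detail

def haar_dwt(x, levels):
    """Dyadic decomposition: each level halves the length of the approximation."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)                               # finest scale first
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt by undoing each level, coarsest first."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, details = haar_dwt(x, levels=3)
print([d.size for d in details], approx.size)           # detail sizes per level, then approximation size
```

Because the Haar basis is orthonormal, the squared coefficients add up to the energy of the signal (446 for this x), and `haar_idwt` recovers x exactly.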

The correlation between the signal and each daughter wavelet across time can be plotted in a scalogram, as shown below.


Mapping the wavelet scalogram (Shoeb & Clifford, 2006)

The wavelet coefficients indicate the correlation between the wavelet and the signal of interest at a specific time and scale. The squared amplitude of a wavelet coefficient, $|W_i|^2$, defines the wavelet power and can be used to create a Wavelet Power Spectrum. A larger power corresponds to a higher correlation; therefore, regions of high power in the spectrum correspond to areas of interest [5].
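The wavelet power, and the idea of reading a dominant period off it, can also be sketched in NumPy. This is a simplified illustration rather than WaveletComp's algorithm: it uses a Morlet wavelet with w0 = 6, converts scales to Fourier periods with the factor 4π/(w0 + √(2 + w0²)) from Torrence and Compo (1998), and averages $|W|^2$ over time (a “global” wavelet spectrum) for a synthetic quarterly series containing a 4-year cycle plus noise. The helper name `morlet_row` and all data are hypothetical.

```python
import numpy as np

W0 = 6.0
# Morlet scale -> Fourier period conversion factor (Torrence & Compo, 1998)
FOURIER_FACTOR = 4 * np.pi / (W0 + np.sqrt(2 + W0 ** 2))

def morlet_row(x, dt, scale):
    """Complex Morlet coefficients of x at one scale, plus the kernel half-width."""
    half = int(np.ceil(4 * scale / dt))
    eta = np.arange(-half, half + 1) * dt / scale
    psi = np.pi ** -0.25 * np.exp(1j * W0 * eta - eta ** 2 / 2)
    return np.correlate(x, psi, mode="same") * dt / np.sqrt(scale), half

dt = 0.25                                     # quarterly data, in years
t = np.arange(0, 120, dt)                     # 120 synthetic years
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * t / 4.0) + 0.5 * rng.standard_normal(t.size)  # 4-year cycle + noise

periods = 1.0 * 2 ** (np.arange(0, 25) / 8)   # 1 to 8 years, 8 sub-octaves per octave
global_power = []
for period in periods:
    coeffs, half = morlet_row(x, dt, period / FOURIER_FACTOR)
    power = np.abs(coeffs) ** 2               # wavelet power |W|^2 at every time step
    global_power.append(power[half:-half].mean())  # skip edge-affected samples

peak_period = periods[int(np.argmax(global_power))]
print(f"period with the highest average power: {peak_period:.2f} years")
```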


We will use the R package WaveletComp to demonstrate how to compute the Wavelet Power Spectrum. Many wavelet libraries in Python and MATLAB work equivalently and just as simply, but I like WaveletComp because of the extensive guide in the package repository, which contains many examples of how to use its functions. For this post, I took quarterly El Niño 3 Region (NINO3) sea surface temperature (SST) anomalies from NOAA recorded over 1871–1996 and applied a single function from the package to create the spectrum.

library(WaveletComp)

my.w = analyze.wavelet(mydata, "Index",
loess.span = 0,
dt = 0.25, dj = 1/250,
lowerPeriod = 0.25,
upperPeriod = 32,
make.pval = TRUE, n.sim = 30)
wt.image(my.w, n.levels = 250,
legend.params = list(lab = "wavelet power levels") )

Here we specify the name of the data frame that contains the data (mydata), the column that contains the index (“Index”), the time step between observations (dt = 0.25 years, due to the quarterly resolution), and the lower and upper periods that bound the search area. The range of periods is represented as a series of octaves of $2^j$, and each octave is divided into 1/dj = 250 sub-octaves. The “make.pval = TRUE” argument computes p-values (here from n.sim = 30 simulations) so that the areas deemed significant are outlined with a white line. Finally, we plot the wavelet object using wt.image.
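The relationship between lowerPeriod, upperPeriod, and dj can be checked with a little arithmetic. The sketch below assumes the analyzed periods are spaced as lowerPeriod · 2^(j·dj), which follows the octave description above; WaveletComp's exact internal grid may differ slightly, so treat the counts as approximate.

```python
import numpy as np

lower_period, upper_period = 0.25, 32.0   # years, as in the analyze.wavelet call
suboctaves = 250                          # 1 / dj with dj = 1/250

n_octaves = np.log2(upper_period / lower_period)       # log2(128) = 7 octaves
n_steps = int(round(n_octaves * suboctaves))           # 7 * 250 = 1750 steps
periods = lower_period * 2 ** (np.arange(n_steps + 1) / suboctaves)

print(n_octaves, periods.size, periods[0], periods[-1])
```

A finer dj therefore multiplies the number of periods (and the computation time) without changing the range that is searched.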


Wavelet Power Spectrum

As seen in the power spectrum, the highest power across the time series is recorded within the 2–7 year period bands, which matches the period at which El Niño events tend to occur. Interestingly, the El Niño signal seems strongest at the earliest and latest parts of the century and is notably less prominent between 1920 and 1960, which has been observed in other work (Torrence & Webster, 1999). The wavelet spectrum allows us to see how the periodicity of the El Niño signal has changed over time, with longer periods observed around 1920 and shorter periods closer to the beginning and end of the century.

Because wavelets are shifted across each time point in the convolution operation, the coefficients near both edges (within half of the wavelet duration at each frequency) are not accurate. Higher frequencies use shorter wavelets, so this edge-affected region is narrow at short periods and widens at long periods, creating a “cone of influence” that is shown by the white shading on the edges of the plot. The cone of influence designates the areas of the plot that should be disregarded.
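The shape of the cone can be sketched for a Morlet wavelet (w0 = 6) using the e-folding convention of Torrence and Compo (1998): a coefficient at scale s is treated as edge-affected when the nearest edge is closer than √2·s. This convention is an assumption here; WaveletComp may draw its cone slightly differently. The function name `cone_of_influence` is hypothetical, and the time axis mirrors the quarterly 1871–1996 NINO3 record.

```python
import numpy as np

W0 = 6.0
FOURIER_FACTOR = 4 * np.pi / (W0 + np.sqrt(2 + W0 ** 2))   # Morlet scale -> period

def cone_of_influence(t):
    """Longest period not affected by edge effects at each time t (Morlet, w0 = 6)."""
    t = np.asarray(t, dtype=float)
    distance_to_edge = np.minimum(t - t[0], t[-1] - t)
    # scale s is reliable if sqrt(2) * s <= distance; convert that scale to a period
    return FOURIER_FACTOR * distance_to_edge / np.sqrt(2)

t = np.arange(1871, 1997, 0.25)              # quarterly time axis, 1871-1996
coi = cone_of_influence(t)

# e.g. the 16-year band is only trustworthy where the cone allows periods >= 16
reliable = t[coi >= 16]
print(f"16-year band usable from {reliable[0]:.2f} to {reliable[-1]:.2f}")
```

Longer periods need a longer stretch of data on both sides, which is why the usable window shrinks toward the middle of the record as the period grows.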

Coherence is a measure of the common oscillations that two signals share. Cross-correlation is generally used to assess similarity in the time domain, and coherence in the frequency domain. However, if the two signals being compared are non-stationary, their relationship can change over time, so the coherence must be represented in a way that shows changes across both frequency and time. We can create cross-wavelet and coherence plots in WaveletComp as well. I have extracted some data supplied by MathWorks that contains a monthly Niño3 index along with an average All-India Rainfall Index [6]. To assess how these two time series are linked, we use the analyze.coherency function in WaveletComp.

my.wc = analyze.coherency(data,
my.pair = c("Nino_Index", "Rainfall"),
loess.span = 0,
dt = 1/12, dj = 1/50,
lowerPeriod = 0.25, upperPeriod = 32,
make.pval = TRUE, n.sim = 10)

wc.image(my.wc, n.levels = 250,
legend.params = list(lab = "cross-wavelet power levels"),
color.key = "interval")


Cross-Wavelet Power Spectrum

The regions of highest cross-wavelet power fall within the 2–7 year bands observed in the plot above. The orientation of the arrows in the figure signifies the phase difference (delay) between the two signals at that period. Arrows ranging from vertical to horizontal denote a ¼ – ½ cycle delay within the significant areas, or roughly ½–3.5 years between the El Niño SSTs observed off the coast of South America and their influence on rainfall in the Indian subcontinent. Pretty cool!
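The link between arrow angle and delay can be sketched with a cross-wavelet transform, $W_{xy} = W_x \cdot W_y^{*}$, whose complex angle is the phase difference between the two series at a given period. In the NumPy sketch below (hypothetical helper names and synthetic data, not WaveletComp's code), y is x delayed by one year at a 4-year period, i.e. a ¼ cycle. Under this sketch's conventions a positive angle means x leads y; WaveletComp's arrow directions may follow a different convention, so check its documentation before reading real plots. Coherence additionally smooths and normalizes the cross-spectrum, a step omitted here.

```python
import numpy as np

W0 = 6.0
FOURIER_FACTOR = 4 * np.pi / (W0 + np.sqrt(2 + W0 ** 2))   # Morlet scale -> period

def morlet_row(x, dt, scale):
    """Complex Morlet coefficients of x at one scale, plus the kernel half-width."""
    half = int(np.ceil(4 * scale / dt))
    eta = np.arange(-half, half + 1) * dt / scale
    psi = np.pi ** -0.25 * np.exp(1j * W0 * eta - eta ** 2 / 2)
    return np.correlate(x, psi, mode="same") * dt / np.sqrt(scale), half

def lag_from_phase(phase_rad, period):
    """Convert a phase difference (radians) at a given period into a time lag."""
    return phase_rad / (2 * np.pi) * period

dt = 1 / 12                                   # monthly data, in years
t = np.arange(0, 60, dt)
period, true_lag = 4.0, 1.0                   # y is x delayed by 1 year (1/4 cycle)
x = np.cos(2 * np.pi * t / period)
y = np.cos(2 * np.pi * (t - true_lag) / period)

wx, half = morlet_row(x, dt, period / FOURIER_FACTOR)
wy, _ = morlet_row(y, dt, period / FOURIER_FACTOR)
cross = wx * np.conj(wy)                      # cross-wavelet transform at this period
phase = np.angle(cross[half:-half].mean())    # ignore edge-affected samples

print(f"phase = {phase:.3f} rad, lag = {lag_from_phase(phase, period):.3f} years")
# The bounds quoted in the text: 1/4 cycle at 2 years and 1/2 cycle at 7 years
print(lag_from_phase(np.pi / 2, 2.0), lag_from_phase(np.pi, 7.0))   # 0.5 3.5
```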

There is quite a bit of solid math and proofs that one should go through to truly understand the theory behind wavelets, but hopefully this post serves as a good introduction to show how wavelets can be useful for your respective analysis. When I first learned about wavelets and tried out WaveletComp, I immediately began conducting a wavelet analysis on every time series that I had on hand to look for hidden frequencies! I hope that this tutorial motivates you to explore wavelets for analyzing your non-stationary signals.


Baker, J. W. “Quantitative Classification of Near-Fault Ground Motions Using Wavelet Analysis.” Bulletin of the Seismological Society of America, vol. 97, no. 5, 2007, pp. 1486–1501, doi:10.1785/0120060255.

Deshpande, Jaidev. “Limitations of the Fourier Transform: Need For a Data Driven Approach.” Pyhht 0.0.1 Documentation, 2015.

Marković, Dejan, et al. “Time-Frequency Analysis: FFT and Wavelets.” DSP Architecture Design Essentials, 2012, pp. 145–170, doi:10.1007/978-1-4419-9660-2_8.

Smith, Julius O. Spectral Audio Signal Processing. W3K, 2011.

Torrence, Christopher, and Peter J. Webster. “Interdecadal Changes in the ENSO–Monsoon System.” Journal of Climate, vol. 12, no. 8, 1999, pp. 2679–2690, doi:10.1175/1520-0442(1999)012<2679:ICITEM>2.0.CO;2.

[5] “Wavelet Power Spectrum.” Wavelet Toolkit Architecture, Dartmouth, 16 June 2005.

[6] “Wcoherence.” MATLAB & Simulink Example, The MathWorks, Inc., 2020.

Everything you want to know about subplots in Python’s Matplotlib

Sometimes it's helpful to get back to basics. I recently created a summer course on data visualization with Python, and the experience made me realize that the workings of Python’s main visualization library, Matplotlib, are often left out of formal Python courses. Over the course of my graduate career I’ve taken several courses on programming and Python, and each time visualization or plotting was treated as an afterthought. I don’t think my experience is uncommon; creating basic visualizations in Python is fairly straightforward, and a quick Google search will yield tutorials on how to make most common plot types. While constructing the summer course, I realized how much time I could have saved if I had learned how Matplotlib worked earlier in my PhD. With this in mind, I’m writing this post to serve as a guide for those new to plotting with Matplotlib, and to help fill in some gaps for those who are already experienced Matplotlib users.

I’ve chosen to devote this post to making subplots, a task often necessary for scientific visualizations that can be surprisingly frustrating if you don’t understand Matplotlib’s structure. Matplotlib offers multiple ways of creating and working with subplots, so in this post I’ll outline a few ways that work for me, and provide some context about how things work behind the scenes in Matplotlib.

A brief introduction to Matplotlib’s object oriented syntax

Let’s start with some history. As its name suggests, Matplotlib was originally created as a means of replicating Matlab style plotting functionality in Python. As such, one way we can create visualizations using Matplotlib is through “Matlab style” syntax, which is contained within Matplotlib’s flagship module, Pyplot. For example, to create a simple line plot we can use the following code:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)
y = np.arange(0, 10)
plt.plot(x, y)  # Matlab style call: draws on the "current" axes
plt.show()


Which will generate this plot:

While the Matlab style syntax is easy to use, it is actually quite limiting in its ability to create custom visualizations. Luckily, the Matlab style syntax is simply hiding an object oriented code structure that is at the core of Matplotlib. By directly accessing and working with this object oriented structure we can create highly customized visualizations.

For the purposes of this blog, there are two Matplotlib objects that are important: Figure objects and Axes objects. Figure objects represent the containers that hold a visualization; I think of a Figure as the blank canvas that the plots will be generated on. A Figure object can contain one or multiple Axes objects, each representing an individual chart. (This can be a big source of confusion when learning Matplotlib: the term “axes object” is inherited from Matlab and refers to an entire chart rather than an x or y axis.) To create the plot above using Matplotlib’s object oriented syntax, we need to make a few small modifications to the original code. First, we’ll generate an empty Figure object using Pyplot. Next, we’ll call the Figure object’s “gca” method, which stands for “get current axes” and returns the figure’s current Axes object, creating one if the figure doesn’t have any yet. Finally, we’ll use the Axes object to generate the plot:

fig = plt.figure()
ax = fig.gca()    # the figure has no Axes yet, so gca creates one
ax.plot(x, y)
plt.show()

This will make the same plot as above:

Note that we can use the Axes object to create any kind of plot that can be created with the Matlab style syntax. Examples include ax.plot (line plot), ax.scatter (scatter plot), ax.bar (bar plot), ax.imshow (heatmaps and images), etc.
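To see the Figure/Axes relationship directly, the sketch below draws a line, a scatter, and a bar plot on the same Axes and then checks that the Figure lists the Axes and that the Axes knows its parent Figure. It selects the non-interactive Agg backend (an assumption for running without a display; drop that line in a notebook or desktop session).

```python
import matplotlib
matplotlib.use("Agg")                     # non-interactive backend, runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)
y = np.arange(0, 10)

fig = plt.figure()
ax = fig.gca()                            # creates an Axes because the figure has none yet

ax.plot(x, y, label="line")               # line plot
ax.scatter(x, y[::-1], label="scatter")   # scatter plot
ax.bar(x, y / 2, alpha=0.3, label="bar")  # bar plot
ax.legend()

print(len(fig.axes), ax.figure is fig)    # one Axes, and it points back to its Figure
```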

Creating Basic subplots

Now that we have some insight into the object oriented structure of Matplotlib, we can start making some subplots. There are two main ways to make subplots with Matplotlib. The first is to use Pyplot’s “subplots” function, which allows us to specify the number of rows and columns of plots in our figure. This function returns both a Figure object and a NumPy array of Axes objects. Note that the array of Axes objects is arranged in the same way as the subplots are within the figure (i.e. entry [0,0] is the top left subplot):

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')

Which will make the following set of subplots:

Instead of naming each Axes object individually, we can alternatively store them in a single array and access them via indices like this:

fig, axes = plt.subplots(nrows=2, ncols=2)
axes[0,0].text(0.5, 0.5, "This is axes object 0", ha='center')
axes[0,1].text(0.5, 0.5, "This is axes object 1", ha='center')
axes[1,0].text(0.5, 0.5, "This is axes object 2", ha='center')
axes[1,1].text(0.5, 0.5, "This is axes object 3", ha='center')
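Because the returned array is a regular NumPy array, we can also loop over it instead of indexing each entry by hand. A minimal sketch using the array’s `.flat` iterator, which walks the subplots row by row (the Agg backend line is only there so the example runs headless):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2)
print(type(axes).__name__, axes.shape)    # a 2 x 2 NumPy array of Axes

# .flat visits [0,0], [0,1], [1,0], [1,1] in that order
for i, ax in enumerate(axes.flat):
    ax.text(0.5, 0.5, f"This is axes object {i}", ha="center")
```

This scales to larger grids without adding a line of code per subplot.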

An alternative way to create subplots is to first make the Figure object on its own, and then add subplots one at a time using the Figure object’s “add_subplot” method. In the form used here, this method takes one argument, a three-digit number. The first digit represents the number of rows, the second represents the number of columns, and the third represents the placement of the individual subplot, counted from left to right and top to bottom, with the first placement being the top left and the last being the lower right. (Oddly, this placement index starts at 1, unlike Python’s usual zero-based indexing.)

fig = plt.figure()
ax0 = fig.add_subplot(221)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax1 = fig.add_subplot(222)
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax2 = fig.add_subplot(223)
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax3 = fig.add_subplot(224)
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')

This script will make the same set of subplots as shown above:
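add_subplot also accepts the rows, columns, and position as three separate integers, which is the form to use once a grid has ten or more subplots and the three-digit shorthand no longer works. The sketch below builds the same 2×2 layout both ways and checks that the resulting Axes occupy identical positions in the figure.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig_a = plt.figure()
axes_a = [fig_a.add_subplot(221 + i) for i in range(4)]       # three-digit form: 221..224

fig_b = plt.figure()
axes_b = [fig_b.add_subplot(2, 2, i) for i in range(1, 5)]     # three-argument form, 1-indexed

# get_position() returns each Axes' box in figure coordinates
same = all(np.allclose(a.get_position().bounds, b.get_position().bounds)
           for a, b in zip(axes_a, axes_b))
print(same)
```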

Giving your plots some room to breathe

You may have noticed that the plots above are very cramped. By default, there is very little room between adjacent subplots. One remedy is to use Pyplot’s tight_layout function, which automatically adjusts the spacing so that your subplots fit within the figure. Just add this one line after creating your subplots. For demonstration, I’ll also add a title to each subplot:

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax0.set_title('Title 0')
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax1.set_title('Title 1')
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax2.set_title('Title 2')
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')
ax3.set_title('Title 3')
plt.tight_layout()

This will create the following set of plots:

If we want to add some extra padding between subplots, we can pass arguments that override the default tight_layout parameters. The argument “pad” sets the padding between the subplots and the figure borders, while “h_pad” and “w_pad” set the height and width padding between subplots. These padding values are expressed as fractions of the font size (e.g. pad=1.5 means 1.5 times the font size).

plt.tight_layout(pad=1.5, h_pad=2.5, w_pad=2)

Note that to add the necessary padding, tight_layout will make the subplots themselves smaller. To fix this, we can increase the size of the figure using the “figsize” argument when we create the Figure object. The units of this argument are inches:

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2, figsize=(8,6))

Before moving on, I should note that the function plt.subplots_adjust() has very similar functionality to tight_layout and allows you to adjust the left, right, bottom, and top margins, as well as the spacing between subplots, individually. For the sake of brevity, I’m omitting it here.
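If you do want that finer control, here is a minimal sketch. Margins are given as fractions of the figure width or height, while wspace and hspace are gaps between subplots as fractions of the average Axes width and height; the specific values below are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
# Figure-method equivalent of plt.subplots_adjust(), which acts on the current figure
fig.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.9, wspace=0.4, hspace=0.5)
print(fig.subplotpars.left, fig.subplotpars.hspace)
```

Unlike tight_layout, these values are applied exactly as given, so long titles or tick labels can still overlap if the gaps are too small.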

Getting fancy: creating subplots of different sizes

So far we’ve been creating subplots of uniform size. In practice, however, it can be helpful to generate plots of varying sizes. We can do this using the GridSpec class. Gridspec will create a grid of subplot locations within a Figure object. When generating subplots, we can assign each a location and size in Gridspec coordinates. Importantly, a single subplot can span multiple rows or columns of Gridspec coordinates. Below, I’ll make a 2×3 Gridspec and use it to create different sized subplots.

fig = plt.figure(figsize=(8,6))
gspec = fig.add_gridspec(nrows=2, ncols=3)

# the first subplot will span one row and two columns
# it will start at the top left
ax0 = fig.add_subplot(gspec[0,:2])
ax0.text(0.5, 0.5, "This is axes object 0, \ngspec coordinates [0,:2]", ha='center')

# the second subplot will span one row and one column
# it will start at the bottom left
ax1 = fig.add_subplot(gspec[1,0])
ax1.text(0.5, 0.5, "This is axes object 1, \ngspec coordinates [1,0]", ha='center')

# the third subplot will span one row and one column
# it will start at the bottom middle
ax2 = fig.add_subplot(gspec[1,1])
ax2.text(0.5, 0.5, "This is axes object 2, \ngspec coordinates [1,1]", ha='center')

# the fourth subplot will span two rows and one column
# it will start at the top right
ax3 = fig.add_subplot(gspec[:,2])
ax3.text(0.5, 0.5, "This is axes object 3, \ngspec coordinates [:,2]", ha='center')


Which will make this plot:

We can also customize the height and width of each Gridspec coordinate using the arguments “height_ratios” and “width_ratios” respectively when creating the Gridspec object.

fig = plt.figure(figsize=(8,6))
# parameters to specify the width and height ratios between rows and columns
widths= [1, 1.5, 2]
heights = [1, .5]

gspec = fig.add_gridspec(ncols=3, nrows=2, width_ratios = widths, height_ratios = heights)

# the first subplot will span one row and two columns
# it will start at the top left
ax0 = fig.add_subplot(gspec[0,:2])
ax0.text(0.5, 0.5, "This is axes object 0, \ngspec coordinates [0,:2]", ha='center')

# the second subplot will span one row and one column
# it will start at the bottom left
ax1 = fig.add_subplot(gspec[1,0])
ax1.text(0.5, 0.5, "This is axes object 1, \ngspec coordinates [1,0]", ha='center')

# the third subplot will span one row and one column
# it will start at the bottom middle
ax2 = fig.add_subplot(gspec[1,1])
ax2.text(0.5, 0.5, "This is axes object 2, \ngspec coordinates [1,1]", ha='center')

# the fourth subplot will span two rows and one column
# it will start at the top right
ax3 = fig.add_subplot(gspec[:,2])
ax3.text(0.5, 0.5, "This is axes object 3, \ngspec coordinates [:,2]", ha='center')


Final thoughts

There are many more ways we can customize subplots in Matplotlib, but the material in this post suits my needs 99% of the time. For further reading, check out the Matplotlib documentation and examples:

Spatial statistics (Part 2): Spatial Regression Models

Regression is one of the main techniques of data analysis. A regression model that can incorporate spatial dependency in the dependent variable is called a spatial regression model. It can be used as a simple surrogate model for prediction at locations where data are not available, or for understanding the factors behind spatial patterns. In this blog post, I am going to create a simple regression model for crop yield, check the residuals for signs of relationships with nearby areas, and try to remove the potential spatial dependencies in the residuals by applying a spatial regression model. Autocorrelation in the residuals is a sign that the underlying process being studied varies systematically across the study area. In this situation, an ordinary regression violates its independence assumption, so its coefficient estimates and their standard errors can be unreliable. Spatial regression models have applications in fields such as agriculture (e.g., farm management, policy issues), the natural sciences (e.g., species patterns), public health (e.g., air pollution), and the social sciences (e.g., population forecasting).

The datasets that I am going to use (ww.* and WW_ave_hist.txt) are available here. The .txt file includes historical winter wheat yield for a set of locations (4 × 4 km grid cells) with distinct IDs, and the “ww” shapefile (which consists of 6 files) contains additional information for each location, keyed by the same ID. We will merge these two files, fit a linear regression, and check whether explanatory variables from our data (predictors) can explain the variation in yield (the dependent variable) across the region. We assume that yield can be predicted from annual potential evapotranspiration, precipitation, and the water available in the soil profile.

library(rgdal)       # readOGR()
library(spdep)       # poly2nb(), nb2listw(), lm.morantest(), lm.LMtests(), moran.mc()
library(spatialreg)  # errorsarlm(), lagsarlm()
library(dplyr)       # %>%
library(ggplot2)

setwd("---your path--- ")
Annual_var <- readOGR(".", "ww") # SpatialPolygonsDataFrame: the polygon geometries together with their attribute data
yield_ww <- read.table("---your path---/ww_ave_hist.txt", header = TRUE)
yield <- merge(Annual_var, yield_ww, by = "ID")
names(yield)
# fit the linear model
lm_yield <- lm(ww_ave_yield ~ ET_pot + precipitat + soil_water, data = yield)
summary(lm_yield)
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) 5058.6326   291.9022   17.33   <2e-16 ***
# ET_pot        -5.0868     0.2147  -23.70   <2e-16 ***
# precipitat    12.8215     0.1626   78.86   <2e-16 ***
# soil_water    -5.9440     0.5295  -11.23   <2e-16 ***
# Adjusted R-squared:  0.9035

After fitting a regression and checking the coefficients to ensure that all the variables are statistically significant, one should check the residuals to make sure they are independent. If they are correlated, the regression’s coefficient estimates and their standard errors may be misleading, and the relationships cannot be assumed constant across the study area. In that case, the fitted model is not a good representation of the dependent variable. From our fitted model, we can extract the residuals:

yield$residuals_lm <- residuals(lm_yield)

Here, we look at the spatial patterning in the distribution of the residuals:

yield %>% as.data.frame() %>%   # ggplot needs a plain data frame of the polygon attributes
  ggplot(aes(LON, LAT)) + geom_tile(aes(fill = residuals_lm), alpha = 3/4) +
  ggtitle("") + coord_equal() + theme_bw() +
  scale_fill_gradientn(colours = c("black", "yellow", "red"), breaks = seq(-1660, 1360, 400))

Besides visual inspection of the residuals, a more formal test is required to decide whether spatial autocorrelation is present. In the first part of this blog post series, we used linkages based on physical distance to examine spatial autocorrelation. As before, we are going to create a list of neighbors using the Queen contiguity criterion (polygons that share at least one boundary point are neighbors). In my previous blog post, we calculated the local Moran’s I test statistic for the actual data. Here, we want to apply it to the residuals, so we need to use another function that takes into account that the variable under consideration is the residual of a regression. Also, in this example we look at the global rather than the local Moran’s I, which is based on both feature locations and feature values simultaneously.

neighbor <- poly2nb(yield)
lw <- nb2listw(neighbor)
lm.morantest(lm_yield, lw) 
# 0.7908136752
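The global Moran’s I reported above can be sketched in a few lines of NumPy on a toy grid. With row-standardized weights (the default “W” style of nb2listw), the statistic is I = (n / S0) · z′Wz / z′z, where z holds the deviations from the mean (for regression residuals with an intercept, z is simply the residuals) and Wz is exactly the “mean of the neighbors” used in the next step of this post. The grid, the helper names `grid_weights` and `morans_i`, and the values are hypothetical; spdep works on real polygon neighbors and adds the inference for regression residuals.

```python
import numpy as np

def grid_weights(nrows, ncols, queen=True):
    """Row-standardized contiguity weights for cells on an nrows x ncols grid."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # skip the cell itself; rook contiguity also skips diagonals
                    if (dr, dc) == (0, 0) or (not queen and dr != 0 and dc != 0):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        w[r * ncols + c, rr * ncols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)   # each row sums to 1 ("W" style)

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

checkerboard = np.indices((4, 4)).sum(axis=0) % 2      # alternating 0/1 pattern
gradient = np.tile(np.arange(4.0), 4)                    # values increase west to east

print(morans_i(checkerboard.ravel(), grid_weights(4, 4, queen=False)))  # close to -1
print(morans_i(gradient, grid_weights(4, 4, queen=True)))              # positive
```

Values near +1 indicate that similar residuals cluster together, as in this post, while values near −1 indicate a checkerboard-like alternation.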

The result shows a statistically significant Moran’s I. We can also use the “neighbor” object to compute the average residual of the neighbors of each location (grid cell), and then look at the correlation between these averages and the residuals, or create a scatter plot for visual inspection.

mean_function <- sapply(neighbor, function(x) mean(yield$residuals_lm[x]))
cor(yield$residuals_lm, mean_function) 
# 0.8847948
plot(yield$residuals_lm, mean_function, xlab='Residuals', ylab='Mean adjacent residuals')

We clearly see that the spatial dependence in the residuals is significant. This means that, if we use this model, the predicted values will be systematically underestimated or overestimated in parts of the study area. Therefore, we need to use spatial autoregressive models that account for spatial dependence. Two models are commonly used as spatial regression models: the spatially lagged model and the spatial error model. The first model adds a spatial lag variable, the weighted average of the dependent variable at neighboring locations (defined by the weights matrix), as a predictor. In the second model, spatial dependence is handled through the error term rather than through the systematic component of the model. To decide whether to fit the spatial error or the spatial lag model, Lagrange Multiplier (LM) tests are used. The R function lm.LMtests() performs these tests using the following statistics: the LM test for error dependence (LMerr), the LM test for a spatially lagged dependent variable (LMlag), and their robust forms (RLMerr and RLMlag; e.g., RLMerr tests for spatially autocorrelated errors in the possible presence of an omitted lagged dependent variable).
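In the usual matrix notation (a standard textbook formulation, not taken from this post’s code), with y the vector of yields, X the matrix of predictors, β the coefficients, W the row-standardized weights matrix (lw in the R code), and ε independent errors, the two models can be written as:

$$\text{Spatial lag model:}\quad y = \rho W y + X\beta + \varepsilon$$

$$\text{Spatial error model:}\quad y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$

Here ρ measures how strongly yield depends on neighboring yields, while λ (reported as Lambda by errorsarlm) measures the spatial dependence remaining in the errors.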

 lm.LMtests(lm_yield, lw, test = c("LMerr","LMlag","RLMerr","RLMlag"))
#LMerr = 2142.1, df = 1, p-value < 2.2e-16
#LMlag = 1025, df = 1, p-value < 2.2e-16
#RLMerr = 1121.8, df = 1, p-value < 2.2e-16
#RLMlag = 4.6952, df = 1, p-value = 0.03025

Since both LMerr and LMlag have significant p-values, we compare the robust forms, RLMerr and RLMlag. Both are significant at the 5% level, but RLMerr is far stronger (a statistic of 1121.8 with p < 2.2e-16, versus 4.70 with p = 0.03 for RLMlag). Therefore, the LM tests suggest that we should run a spatial error model.

fit_err <- errorsarlm(ww_ave_yield ~ ET_pot + precipitat + soil_water, data = yield, listw = lw)
summary(fit_err)
# lagsarlm() fits the corresponding spatial lag model
#Lambda: 0.93674, LR test value: 1912.7, p-value: < 2.22e-16
#AIC: 15166, (AIC for lm: 17077)

The Likelihood Ratio (LR) test is highly significant (p < 2.22e-16), which provides further evidence that the spatial error model fits better than the original linear model. In addition, the Akaike Information Criterion (AIC, an estimate of out-of-sample prediction error and therefore of the relative quality of statistical models for a given dataset) of the new model is 15166, lower and therefore better than that of the original linear model without spatial error dependencies (AIC of 17077).
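The two numbers are consistent with each other. Since AIC = −2 log L + 2k and the spatial error model adds one parameter (λ), the AIC improvement should equal the LR statistic minus 2: 17077 − 15166 = 1911, and 1912.7 − 2 = 1910.7, which agree up to the rounding of the printed AIC values. The snippet simply restates that arithmetic with the values reported above.

```python
# Values reported in the errorsarlm output above
aic_lm, aic_err = 17077, 15166
lr_stat = 1912.7
extra_params = 1          # the spatial error model adds one parameter, lambda

aic_gain = aic_lm - aic_err                  # improvement in AIC
expected_gain = lr_stat - 2 * extra_params   # LR = 2 (log L_err - log L_lm)
print(aic_gain, expected_gain)
```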

We can now check the residuals of this new model and look at the Moran’s I statistic one more time. Note that we previously used a Moran’s I test designed for the residuals of an estimated linear model. Since we no longer have a linear model, we use a permutation test for Moran’s I instead: the function moran.mc() randomly permutes the values across locations for the given spatial weighting scheme and compares the observed statistic with the resulting distribution. The residuals graph and Moran’s I statistic both show that there is no correlation left in the residuals:

yield$residuals_error_model <- residuals(fit_err)
mean_function_error_model <- sapply(neighbor, function(x) mean(yield$residuals_error_model[x]))
cor(yield$residuals_error_model, mean_function_error_model)
plot(yield$residuals_error_model, mean_function_error_model, xlab='Residuals', ylab='Mean adjacent residuals')
moran.mc(yield$residuals_error_model, lw, nsim = 1000)
# 1000 is a number of permutations
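The logic of the permutation test can be sketched in NumPy: shuffle the values across locations many times, recompute Moran’s I for each shuffle, and ask how often a shuffled arrangement is at least as clustered as the observed one. The toy 5×5 grid, the rook weights, the seed, and the helper names are hypothetical, and the pseudo p-value formula below is a common convention; spdep’s exact calculation may differ in details.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(values, w):
    """Global Moran's I with z = deviations from the mean."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

def moran_permutation_test(values, w, n_perm=999, seed=0):
    """One-sided test: is the observed Moran's I unusually large?"""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    simulated = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    # pseudo p-value that counts the observed arrangement as one of the possibilities
    p_value = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, p_value

w = rook_weights(5, 5)
clustered = np.tile(np.arange(5.0), 5)        # residual-like values increasing west to east
observed, p = moran_permutation_test(clustered, w)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p:.3f}")
```

Because the test only shuffles values, it does not rely on the linear-model assumptions behind lm.morantest, which is why it suits the residuals of the spatial error model.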

If we had used the spatial lag model instead, its residuals would still show some correlation:



Using Rhodium for exploratory modeling

Rhodium is a powerful, simple, open source Python library for multiobjective robust decision making. As part of Project Platypus, Rhodium is compatible with Platypus (a MOEA optimization library) and PRIM (the Patient Rule Induction Method for Python), making it a valuable tool for bridging optimization and analysis.

In the Rhodium documentation, a simple example of optimization and analysis uses the Lake Problem (DPS formulation). The actual optimization is performed in the line:

optimize(model, "NSGAII", 10000)

This optimize function uses the Platypus library directly for optimization; here the NSGAII algorithm is used for 10,000 function evaluations on the defined Lake Problem (model). This optimization call is concise and simple, but there are a few reasons why it may not be ideal.

  1. Speed. Python, an interpreted language, is inherently slower than compiled languages (Java, C/C++, etc.). The Platypus library is built entirely in Python, making optimization slow.
  2. Scalability. Platypus has support for parallelizing optimization, but this method is not ideal for large-scale computational experiments on computing clusters. 
  3. MOEA Suite. State-of-the-art MOEAs such as the Borg MOEA are not implemented in Platypus for licensing reasons, so they cannot be used directly through Rhodium.

Thus, external optimization is necessary for computationally demanding Borg runs. Luckily, Rhodium works easily with external data files, so using Rhodium to analyze the results of independent optimization runs is simple. In this post, I’ll use a sample dataset obtained from a parallel Borg run of the Lake Problem, using the Borg wrapper.

The code and data used in this post can be found in this GitHub repository. lakeset.csv contains a Pareto approximate Lake Problem set. Each line is a solution, where the first six values are the decision variables and the last four are the corresponding objective values.

We’ll use Pandas for data manipulation. The script below reads the sample .csv file with Pandas, converts it to a list of Python dictionaries, and creates a Rhodium DataSet. There are a few important elements to note. First, the Pandas to_dict function takes an optional argument, orient (here ‘records’), that specifies the format of the output. The ‘records’ format creates a list of Python dictionaries, where each element of the list is an individual solution (i.e. a line from the .csv file), with dictionary keys corresponding to the decision variable and objective names and dictionary values holding that line’s data. This is the format necessary for making a Rhodium DataSet, which we create by calling the constructor with this list as input.

import pandas as pd
from rhodium import *

# use pandas to read the csv file
frame = pd.read_csv("lakeset.csv")

# convert the pandas data frame to a Python dict in record format
dictionary = frame.to_dict('records')

# create a Rhodium DataSet instance from the Python dictionary
dataset = DataSet(dictionary)
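To see what the ‘records’ orientation produces without needing lakeset.csv or Rhodium, the pandas-only sketch below converts a hypothetical one-row frame (values taken from the DataSet entry printed below) and contrasts it with the default orientation, which groups values by column instead of by solution.

```python
import pandas as pd

# Hypothetical one-row frame using the column names of lakeset.csv
frame = pd.DataFrame([{
    "c1": 0.286373779, "r1": 0.126801547, "w1": 0.626542813,
    "c2": -0.133307575, "r2": 1.358442543, "w2": 0.109875466,
    "benefit": -0.412053431, "concentration": 0.359441661,
    "inertia": -0.98979798, "reliability": -0.9563,
}])

# orient="records": a list with one dict per row (i.e., per solution)
records = frame.to_dict("records")
print(type(records).__name__, len(records))  # list 1
print(records[0]["concentration"])           # 0.359441661

# Default orient="dict": one dict per column, keyed by row index,
# which is not the per-solution layout a Rhodium DataSet expects
by_column = frame.to_dict()
print(sorted(by_column))
```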

Printing the Rhodium DataSet with print(dataset) lists every solution in the set; one entry of the output looks like this:

Index 204:
   c1: 0.286373779
   r1: 0.126801547
   w1: 0.6265428129999999
   c2: -0.133307575
   r2: 1.3584425430000002
   w2: 0.10987546599999999
   benefit: -0.412053431
   concentration: 0.359441661
   inertia: -0.98979798
   reliability: -0.9563

Once we have a Rhodium DataSet instantiated, we can access many of the library’s features without performing the optimization through Platypus. For example, to find the policy with the lowest phosphorus concentration (stored in the ‘concentration’ field), we can use find_min and print the result:

policy = dataset.find_min('concentration')
print(policy)
{'c1': 0.44744488600000004, 'r1': 0.9600368159999999, 'w1': 0.260339899, 'c2': 0.283860122, 'r2': 1.246763577, 'w2': 0.5300663529999999, 'benefit': -0.213267399, 'concentration': 0.149320863, 'inertia': -1.0, 'reliability': -1.0}
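For intuition, the same lookup can be reproduced in plain Python on a list of records like the one returned by to_dict('records'). This is a sketch that assumes find_min simply returns the entry with the smallest value for the given key; it is not Rhodium’s actual implementation, and the two records are a hypothetical stand-in built from values shown in this post.

```python
# Hypothetical stand-in for the records built from lakeset.csv
# (objective values taken from the outputs shown in this post)
records = [
    {"benefit": -0.412053431, "concentration": 0.359441661,
     "inertia": -0.98979798, "reliability": -0.9563},
    {"benefit": -0.213267399, "concentration": 0.149320863,
     "inertia": -1.0, "reliability": -1.0},
]


def find_min(entries, key):
    """Return the entry with the smallest value for `key` (plain-Python sketch)."""
    return min(entries, key=lambda entry: entry[key])


best = find_min(records, "concentration")
print(best["concentration"])  # 0.149320863
```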

Rhodium also offers powerful plotting functionality. For example, we can easily create a parallel axis plot of our data to visualize the trade-offs between objectives. The following script uses Rhodium’s parallel_coordinates function on our external dataset. Because parallel_coordinates takes a Rhodium model as input, we can either 1) define the external optimization problem as a Rhodium model, or 2) define a ‘dummy’ model that provides just enough information to create plots. For simplicity, we will use the latter, but the first option is easy to set up if a Python translation of your problem/model exists. Note that to access Rhodium’s scenario discovery and sensitivity analysis functionality, it is necessary to create a real Rhodium Model.

# define a trivial "dummy" model in Rhodium with an arbitrary function
model = Model(lambda x: x)

# set up the model's objective responses to match the keys in your dataset
# here, all objectives are minimized
# this is the only information needed to create a parallel coordinate plot
model.responses = [Response("benefit", Response.MINIMIZE),
                   Response("concentration", Response.MINIMIZE),
                   Response("inertia", Response.MINIMIZE),
                   Response("reliability", Response.MINIMIZE)]

# create the parallel coordinate plot from the results of our external optimization
fig = parallel_coordinates(model, dataset, target="bottom",
                           brush=[Brush("reliability < -0.95"), Brush("reliability >= -0.95")])
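The two brushes split the solutions into two color groups at a reliability of -0.95. Reliability appears to be stored negated in this dataset (so that every objective can be minimized), which would mean reliability < -0.95 selects solutions that are more than 95% reliable. As a rough pandas analogue, assuming the brush strings behave like boolean filters, DataFrame.query can show how solutions fall into each group; the two rows are a hypothetical stand-in built from values shown in this post.

```python
import pandas as pd

# Hypothetical stand-in for the objective columns of lakeset.csv
frame = pd.DataFrame({
    "benefit": [-0.412053431, -0.213267399],
    "concentration": [0.359441661, 0.149320863],
    "inertia": [-0.98979798, -1.0],
    "reliability": [-0.9563, -1.0],
})

# The same expressions used for the two brushes
high = frame.query("reliability < -0.95")
low = frame.query("reliability >= -0.95")

# Together the two filters cover every solution exactly once
print(len(high), len(low))               # 2 0

# Undo the (assumed) negation to read reliability as a fraction
print((-high["reliability"]).tolist())   # [0.9563, 1.0]
```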

How to make horizon plots in Python

Horizon plots were invented about a decade ago to facilitate visual comparison between two time series. They are not intuitive to read at first, but they are great for comparing and presenting many time series together. They take advantage of a minimal design, avoiding titles and ticks on every axis and packing the plots close together to convey a bigger picture. The example below shows percent changes in the price of various food items over 25 years.

Horizon plots are produced (and read) by dividing the values along the y-axis into bands of fixed value ranges. The color of each band is given by a diverging colormap. By collapsing the bands onto the zero axis and layering the higher bands on top, one can create a time-varying heatmap of sorts.
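The banding step can be sketched with NumPy and Matplotlib. The helpers below (hypothetical names, not taken from the script discussed next) clip a series into equal-height bands, shift each band down to the zero axis, and draw them with fill_between so that deeper bands are layered on top in darker shades. Negative values are mirrored into the same strip in a second hue, which is one common horizon-plot convention; treat this as a minimal illustration of the technique, not the repository’s implementation.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to display interactively
import matplotlib.pyplot as plt
import numpy as np


def horizon_bands(values, n_bands, band_height):
    """Split a series into positive and negative bands collapsed onto [0, band_height].

    Band i holds the part of |value| lying between i*band_height and
    (i+1)*band_height, shifted down so every band starts at zero.
    """
    values = np.asarray(values, dtype=float)
    pos = np.clip(values, 0, None)   # magnitude above zero
    neg = np.clip(-values, 0, None)  # magnitude below zero, mirrored upward
    bands = []
    for i in range(n_bands):
        lower = i * band_height
        bands.append((np.clip(pos - lower, 0, band_height),
                      np.clip(neg - lower, 0, band_height)))
    return bands


def horizon_plot(ax, t, values, n_bands=3):
    """Draw a horizon plot: blues above zero, reds below, darker for larger magnitudes."""
    band_height = np.max(np.abs(values)) / n_bands
    blues = matplotlib.colormaps["Blues"](np.linspace(0.4, 0.9, n_bands))
    reds = matplotlib.colormaps["Reds"](np.linspace(0.4, 0.9, n_bands))
    for i, (pos_band, neg_band) in enumerate(horizon_bands(values, n_bands, band_height)):
        # Later (deeper) bands are drawn last, so they sit on top
        ax.fill_between(t, 0, pos_band, color=tuple(blues[i]), linewidth=0)
        ax.fill_between(t, 0, neg_band, color=tuple(reds[i]), linewidth=0)
    ax.set_ylim(0, band_height)
    ax.set_yticks([])
    return band_height


t = np.linspace(0, 4 * np.pi, 200)
fig, ax = plt.subplots(figsize=(8, 1))
horizon_plot(ax, t, np.sin(t) * np.linspace(0.2, 1.5, t.size))
fig.savefig("horizon_example.png", dpi=100)
```

Because every band is clipped to the same height, stacking many of these strips vertically keeps them directly comparable.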


I wasn’t able to find a script that could produce this in Python, apart from some code in this GitHub repository, which is about a decade old and does not run in Python 3. I cleaned it up and updated the scripts with some additional features. I also added example data comparing USGS streamflow observations with model simulations at the same locations over 38 years. The code can be found here and can be used with any two datasets one would like to compare, with as many points of comparison as needed (I used eight below, but the script accepts csv files with more or fewer comparison points, which are detected automatically). The script handles the transformation of the data into uniform bands and produces the following figure, in which each subplot compares model output with observations at one of the eight gauges, i.e., shows the model prediction error. When the model overpredicts, the area is colored blue; when it underpredicts, the area is colored red. Darker shades indicate greater divergence from the zero axis. By default, the script uses three bands each for positive and negative divergence, but more can be added as long as the user defines the additional colors to use.
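The transformation described above starts from the prediction error at each gauge. A minimal pandas sketch, assuming observed and simulated flows are stored as two tables with one column per gauge and matching dates (the gauge names, dates, and values below are made up for illustration and are not the USGS example data), computes model minus observed so that positive errors (overprediction) map to blue and negative errors (underprediction) map to red, then derives three equal-width bands per sign. Whether the actual script uses one band width for all gauges or one per gauge is an assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical example tables: one column per gauge, one row per date
dates = pd.date_range("2000-01-01", periods=4, freq="D")
observed = pd.DataFrame({"gauge_a": [10.0, 12.0, 9.0, 11.0],
                         "gauge_b": [5.0, 6.0, 7.0, 6.5]}, index=dates)
simulated = pd.DataFrame({"gauge_a": [11.5, 12.0, 6.0, 11.0],
                          "gauge_b": [5.0, 7.5, 7.0, 5.0]}, index=dates)

# Model prediction error: positive = overprediction, negative = underprediction
error = simulated - observed

# Gauges are read from the columns, so any number of them works
n_bands = 3
band_height = error.abs().to_numpy().max() / n_bands  # one band width for every gauge (assumption)

# Darkest band each value reaches (0 = lightest) and its side of the zero axis
band_index = (error.abs() // band_height).astype(int).clip(upper=n_bands - 1)
side = np.sign(error)  # +1 -> blue shades, -1 -> red shades, 0 -> no fill

print(error)
print(band_height)
print(band_index)
```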

Using this type of visualization for these data allows time-varying comparisons across multiple locations in the same basin. Its benefit is greatest when many subplots combine to make up a bigger picture.

Future extensions to this repository will include support for file types other than csv, more flexibility in how the data are presented, and options to select different colormaps at run time.