Causal Inference Using Information-Theoretic Approaches

In my previous blog post, I explained information theory. I also talked about the application of information theory in sensitivity analysis. In this blog post, I briefly explain how information theory can be used in causal inference of time-series data.

First, I’ll offer a little perspective on causal analysis. We’re often interested to know what factors can cause a specific phenomenon. For example, let’s say that we want to understand what environmental conditions can cause a significant flood event in a specific region. Also, let’s assume that we have a time series of flood, precipitation, soil moisture, snow cover in upstream mountains, and temperature. There are many ways that this problem can be addressed. We can use regression analysis and correlation, principle component analysis, and variance-based analysis approaches to find out which situation best explains the flood events. However, these methods can potentially overlook many of the relationships that might exist in complex systems. Although the methods may give you a statistically meaningful equation, they cannot provide any causal insights. For example, there could be a strong relationship between the flood and precipitation two weeks prior to the flood event, or there might even be a relationship between snowfalls in winter and flood. Information-theoretic causal analysis can provide an alternative approach to exploring questions like these. Such methods have been used in many research areas, including environmental science (Rouge et al., 2019), neuroscience (Tononi et al., 2016), and economics (Yao and Li 2020).

Readers of this blog post can refer to these papers (Rouge et al., 2019, Goodwell et al., 2020, and Schreiber 2000) for further details about the basics and applications of information-theoretic causal analysis in different areas of science.

There are two schools of thought in these causal analyses (Goodwell et al., 2020): (1) Pearl causality and (2) Granger causality. Pearl causality was introduced by Judea Pearl (here) and is based on the idea that interventions in a complex system provide information that can potentially lead to a causal inference. However, Granger causality (here) focuses on exploring the transfer of information from the past to current states of the system. This blog post concentrates on Granger causality.

Many methods have been developed and used to study how factors can Granger-cause other variables. For example, the simplest way, which I discussed in the previous blog post, is mutual information. It is a pairwise causal-analysis method and basically tells us which lag times of variable X provide information to the current state of variable Y. In other words, we gain information about which time lag of variable X can improve the prediction of variable Y. However, mutual information alone is not informative enough for problems that deal with multiple variables and variables that depend on their own conditions at previous time-steps—conditions that usually exist in the real world. Partial information decomposition provides a way to incorporate unique information that each variable provides as well as mutual and redundant information that each variable can provide. Transfer entropy (Schreiber 2000), on the other hand, is for a pairwise analysis that also takes into account the information transfer from the previous time-step of that variable. In mathematical language,

Where, Y and X are time series, t denotes time-step, and τ is the time lag.


In this example, I am using an R package called “TransferEntropy” that calculates transfer entropy. There is a Python package called “TIGRAMITE” that has been developed for causal analysis of time-series data. The following will install and load the R library.


Then, you can use the following to activate an R dataset that includes information about the European stock market. This is the dataset that we are using to see if data about past states of two stock indicators—FTSE and SMI—were systematically related.


The following code can be used to calculate the transfer entropy:, header=T)
TE<-transfer_entropy(x=data_TE$SMI, y=data_TE$FTSE, 
                 lx = 1, ly = 1, q = 0.1, 
                 entropy = c('Shannon', 'Renyi'), shuffles = 100, 
                 type = c('quantiles', 'bins', 'limits'),
                 quantiles = c(5, 95), bins = NULL, limits = NULL,
                 nboot = 300, burn = 50, quiet = FALSE, seed = NULL)

And here is the output that I got:

The results show that there is a significant relationship between the two variables (in both directions), as the following figure also indicates.

Finally, studies have also discussed potential issues that causal inference using these methods could create. For example, James et al. (2016) reported a poor level of reliability of these methods and suggested network science–oriented analysis as an alternative for causal inference.

Getting started with API requests in Python

This is an introductory blogpost on how to work with Application Programming Interfaces (APIs) in Python. Seeing as this is our blog’s first post on this topic I’ll spend some time explaining some basic information I’ve had to learn in the process of doing this and why it might be useful in your research. There are several blogs and websites with tutorials online and there’s no point repeating, but I’ll try to explain this for an audience like me, i.e., (a) has no formal training in computer science/web servers/HTTP but competent with scripting, and (b) interested in using this for research purposes.

What is an API?

Many sites (e.g., Facebook, Twitter, many many more) make their data available through what’s called Application Programming Interfaces or APIs. APIs basically allow software to interact, and send and receive data from servers so as to typically provide additional services to businesses, mobile apps and the like, or allow for additional analysis of the data for research or commercial purposes (e.g., collecting all trending Twitter hashtags per location to analyze how news and information propagates).

I am a civil/environmental engineer, what can APIs do for me?

APIs are particularly useful for data that changes often or that involves repeated computation (e.g., daily precipitation measurements used to forecast lake levels). There’s a lot of this kind of data relevant for water resources systems analysts, easily accessible through APIs:

I’m interested, how do I do it?

I’ll demonstrate some basic information scripts using Python and the Requests library in the section below. There are many different API requests one can perform, but the most common one is GET, which is a request to retrieve data (not to modify it in any way). To retrieve data from an API you need two things: a base URL and an endpoint. The base URL is basically the static address to the API you’re interested in and an endpoint is appended to the end of it as the server route used to collect a specific set of data from the API. Collecting different kinds of data from an API is basically a process of manipulating that endpoint to retrieve exactly what is needed for a specific computation. I’ll demo this using the USGS’s API, but be aware that it varies for each one, so you’d need to figure out how to construct your URLs for the specific API you’re trying to access. They most often come with a documentation page and example URL generators that help you figure out how to construct them.

The base URL for the USGS API is

I am, for instance, interested in collecting streamflow data from a gage near my house for last May. In Python I would set up the specific URL for these data like so:

response = requests.get("¶meterCd=00060&siteType=ST&siteStatus=all")

For some interpretation of how this was constructed, every parameter option is separated by &, and they can be read like so: data in json format; indented so I can read them more easily; site number is the USGS gage number; start and end dates; a USGS parameter code to indicate streamflow; site type ‘stream’; any status (active or inactive). This is obviously the format that works for this API, for a different API you’d have to figure out how to structure these arguments for the data you need. If you paste the URL in your browser you’ll see the same data I’m using Python to retrieve.

Object response now contains several attributes, the most useful of which are response.content and response.status_code. content contains the content of the URL, i.e. your data and other stuff, as a bytes object. status_code contains the status of your request, with regards to whether the server couldn’t find what you asked for or you weren’t authenticated to access that data, etc. You can find what the codes mean here.

To access the data contained in your request (most usually in json format) you can use the json function contained in the library to return a dictionary object:

data = response.json()

The Python json library is also very useful here to help you manipulate the data you retrieve.

How can I use this for multiple datasets with different arguments?

So obviously, the utility of APIs comes when one needs multiple datasets of something. Using a Python script we can iterate through the arguments needed and generate the respective endpoints to retrieve our data. An easier way of doing this without manipulating and appending strings is to set up a dictionary with the parameter values and pass that to the get function:

parameters = {"format": 'json', "indent": 'on',
              "sites": '04234000', "startDT": '2020-05-01',
              "endDT": '2020-05-31', "parameterCd":'00060',
              "siteType": 'ST', "siteStatus":'all'}
response = requests.get("", 

This produces the same data, but we now have an easier way to manipulate the arguments in the dictionary. Using simple loops and lists to iterate through one can retrieve and store multiple different datasets from this API.

Other things to be aware of

Multiple other people use these APIs and some of them carry datasets that are very large so querying them takes time. If the server needs to process something before producing it for you, that takes time too. Some APIs limit the number of requests you can make as a result and it is general good practice to try to be as efficient as possible with your requests. This means not requesting large datasets you only need parts of, but being specific to what you need. Depending on the database, this might mean adding several filters to your request URL to be as specific as possible to only the dates and attributes needed, or combining several attributes (e.g., multiple gages in the above example) in a single request.

The New and Improved Borg R Wrapper

This semester, I have had the opportunity to build a multi-objective optimization model in R that uses the Borg evolutionary algorithm framework for the optimization component to discover tradeoffs of the modeled system. The R wrapper historically has been the most under-developed of the wrappers offered for Borg, which has primarily been limiting for visiting students or groups on our campus in atmospheric or biological  science that tend to use R. While fuller/faster Borg capabilities may be realized by transitioning to C++ or Python, I spent some time fixing up the R wrapper for serial Borg in an effort to help broaden our target users.

Acquiring Access to Borg

To request access to the Borg MOEA, complete step 2 of Jazmin’s introduction to Borg, found here.  Once you receive access, within the Borg folder, navigate to /plugins/R. You will find borg.R (the borg R wrapper), DTLZ2.R (a test function), and a script called test.R that calls the wrapper and performs an optimization.

Setting up the Borg R Wrapper

The R wrapper pulls out relevant files, libraries, and functions from borg.c, borg.dll, and, so these files must be in the same directory as the above files in order for the R wrapper to work. The R wrapper requires an R package called rdyncall, which allows R to call C libraries that Borg uses. This package is unfortunately not offered within the current CRAN repository, so you can find an archived version of the package here and download it manually. In order to install a tar.gz package, you can do so using the following R command: install.packages(“directory/to/package”, type=”source”). This command has other variants that work specifically for Windows vs. Mac vs. Unix users and has given other students trouble in the past, so feel free to reach out if you can’t get this step to work. Once this is done, you can run test.R and get your optimization results. The way the wrapper is set up, it will return the decision variables and objective function values in a dataframe called “result”.

Running Multiple Seeds

The R wrapper is set up to only run with Borg’s default parameterization, so you have to go into the wrapper and adjust Borg’s random state in order to facilitate using a different random initialization of Borg’s population or operator dynamics. Borg is a “seeded” algorithm, so the initialization of the algorithm is determined by a  number shown in “rngstate”. This allows the runs to be reproducible, if the same seed is specific. In order to facilitate new seeds, you can just change this number. If we want to run 50 seeds, we wouldn’t want to change this number manually, so instead I subtract a variable “s” that will run from 1-50 depending on the seed.


We can feed this number in as an argument to specify seeds. Assuming you want to run this on the cluster, a bash script can be written to loop through the seeds and fed in as inputs into the test.R script as shown below:

bash scrip

The variable, “s” must be defined in your test.R script at the top, in the correct argument number. This number happens to be the 6th value that is found when calling commandArgs( ). This may differ across computers, so it’s important to check that you’re pulling in the correct argument. Provided you are running the seeds in parallel, you’ll also want to save the result in a file for its respective seed. The arguments into the Borg call are (number of variables, number of objectives, number of constraints, the name of the function that you are trying to optimize in DTLZ2.R which is named “DTLZ2”, NFE, and epsilons for each objective).


As shown at the top of DTLZ.R, you need to actually define nvars and nobjs and nconstrs if you have them. Objective functions must be listed as obj[1]=… etc. and the function must return(objs, constrs) if necessary.


Creating Runtime Files

Another important tool that has been missing is the creation of runtime files which allow you to track how the Pareto front is evolving and to get a sense of convergence. In order to implement your own runtime files, you need to navigate to the part of the wrapper where the Borg MOEA is actually running (Lines 314-318). Inside this while loop, you need to implement the following statement block. First initialize the runtime file and write the problem characteristics at the time. Then, create a runtime vector that outlines the frequency that you want your output. Here I’m running 1000 FE and I want the runtime every 100 FE. Because Borg can be a little fickle and not print at the specific FE that you want, I specify a range of +2 around each value, so that if something doesn’t print quite at 100 FE, you’ll likely get output at 101 or 102 FE. When running jobs on the Cube, it’s important to have your output file stored somewhere that any external node can access; therefore I create my file in my scratch directory instead of my home directory.


When you land on a FE value in your output frequency range, you have to use the appropriate getter functions to first get the archive and get the variables and objectives out of that result. The NFE is printed along with the archive and then a # is added at the end to separate out solutions. An example runtime file is shown below.


This style of file might not be the easiest to pull data from, but it’s the structure that is necessary to be recognized by MOEAFramework to create a reference set and runtime metrics. Happy coding!

Visualizing the diversity of a Pareto front approximation using RadVis

During many-objective search we seek to find approximate Pareto fronts that are convergent, meaning they are close to the “true” Pareto front, and diverse, meaning they contain solutions that they uniformly cover the full range of the “true” front. This post will focus on examining the latter criteria for high dimensional problems using Radial Visualization (RadVis; Hoffman et al., 1997), a visualization technique that transforms a solution set into radial coordinates. I’ll discuss the theory behind RadVis, provide some example code, and discuss potential applications, limitations and extensions.


When performing runtime diagnostics (a subject of two of my recent posts, which you can find here and here), we often asses MOEA performance using hypervolume because it captures both the convergence and diversity of the approximate Pareto front. Though tracking hypervolume provides us with an aggregate measure of diversity, it does not provide information on how the solutions are spread across the objective space. For low dimensional problems (two or three objectives), the diversity of the approximation set can easily be discerned through visualization (such as 2-D or 3-D scatter plots). For many-objective problems however, this becomes challenging. Tools such as parallel coordinate plots, bubble plots and glyph plots can allow us to visualize the solutions within an approximation set in their full dimensionality, but they are limited in their ability to convey information on diversity in high dimensional space. An alternative strategy for examining diversity within a solution set is to transform the objective space into a form coordinate system that highlights diversity. One common transformation is to view the set using radial coordinates.


RadVis uses an analogy from physics to plot many dimensional data. Each of n-objectives are plotted uniformly around the circumference of a unit circle while, each solution is represented as a point that is attached to all objectives by a set of n hypothetical springs (one spring is attached to each objective-solution pair). A solution’s objective value for the ith objective defines the spring constant exerted by the ith spring. A point’s location thus represents the equilibrium point between all springs (Hoffman et al., 1997). For example, a point that has equal objective values for all n-objectives will lie directly in the center of the plot, while a point that maximizes one objective and minimizes all others will lie on the circumference of the circle at the location specified for the maximized objective.

Below, I’ve plotted RadVis plots for a 5 objective instance of DTLZ2 at runtime snapshots from 1,000 function evaluations (NFE), 5,000 NFE, 10,000 NFE and 50,000 NFE. Notice how the approximate front for 1,000 NFE is sparse and skewed toward one side, while the front for 50,000 NFE is fairly uniform and centered. DTLZ2 is a test problem with a known true Pareto front which is continuous and uniformly distributed, using this information, we can see that the algorithm has likely improved its approximation set over the course of the run.

RadVis representations of the approximate Pareto set of a 5 objective instance of the DTLZ2 test problem. Each subplot represents a different runtime snapshot.

I coded this example using the Yellowbrick machine learning visualization toolkit from sklearn. Code for this example has been added to my runtime diagnostics Github repository, which can be found here.

Caution! RadVis provides no information on convergence

It’s important to note that RadVis provides no information about solution convergence since the location of each point is dependent on equilibrium between all objectives. A set of solutions that performs poorly across all objectives will look similar to a set of solutions that performs well across all objectives. For this reason, RadVis should never be used alone to assess the quality of a Pareto front approximation. To include convergence information in RadVis, Ibrahim et al,. (2016) and Ibrahim et al., 2018 have proposed 3d-RadVis and 3D-RadVis antenna, extensions of RadVis methodology which augment Radvis by providing convergence information to decision makers. These methodologies may be subject to future posts.


Hoffman, P., Grinstein, G., Marx, K., Grosse, I., & Stanley, E. (1997, October). DNA visual and analytic data mining. In Proceedings. Visualization’97 (Cat. No. 97CB36155) (pp. 437-441). IEEE.

Ibrahim, A., Rahnamayan, S., Martin, M. V., & Deb, K. (2016, July). 3D-RadVis: Visualization of Pareto front in many-objective optimization. In 2016 IEEE Congress on Evolutionary Computation (CEC) (pp. 736-745). IEEE.

Ibrahim, A., Rahnamayan, S., Martin, M. V., & Deb, K. (2018). 3D-RadVis Antenna: Visualization and performance measure for many-objective optimization. Swarm and evolutionary computation, 39, 157-176.

Displaying Interactions with chordDiagram in R

Sensitivity analysis is a powerful tool to find out which parameters in a model have the largest effect on the results. Besides the impact of the main parameters, exploring the interactions between parameters is also important because complex systems usually have many active interactions. Learning from them will improve our understanding of how a model works at different underlying conditions. This could potentially lead to improving our inferences of the model’s results and model evaluation. Previously, Antonio showed how to create a Radial convergence plot for visualizing Sobol indices in Python. In this blog post, I am going to use a library in R to create a Chord diagram for Sobol interaction indices for annual baseflow. Then, I will create an animated GIF file that shows the interactions for a few years. The dataset (download from here) that I am using came from the simulation of an agro-biophysical model that has 33 parameters.

baseflow_S2<- read.csv(paste("...(your path).../sobol_ Baseflow_S2_1989.csv",sep=""))
baseflow_S2$X<- NULL

To work with this library, the format of the data should be a matrix or list.

baseflow_S2<- as.matrix(baseflow_S2)

As I mentioned above, the interactions are between 33 parameters, so we have 33 rows and columns in this dataset. I am going to assign a name to each row and column based on the actual order in the dataset:

rownames(baseflow_S2) = c("H_1","H_2","H_3","H_4","B_1","B_2","B_3","B_4","B_5","W_4","W_5","T_4","W_1","W_2","W_6","W_7","Phy_1","Phy_2","T_1","T_2","W_3","Phy_3","Phy_4","T_3","Phy_5","Phy_6","H_5","Ph_1","Ph_2","Ph_3","Ph_4","Ph_5","Ph_6")                   
colnames(baseflow_S2)= c("H_1","H_2","H_3","H_4","B_1","B_2","B_3","B_4","B_5","W_4","W_5","T_4","W_1","W_2","W_6","W_7","Phy_1","Phy_2","T_1","T_2","W_3","Phy_3","Phy_4","T_3","Phy_5","Phy_6","H_5","Ph_1","Ph_2","Ph_3","Ph_4","Ph_5","Ph_6") 	       

Now, we are going to assign a color for each parameter. For better visualization and interpretability of the results, these parameters are classified into six main groups (H, Ph, B, T, W, and Phy) based on how they are related to the model. I am going to assign the same color to all of the parameters within each group.


Now we can look at the plot:

interactions_plot<- chordDiagram(baseflow_S2,grid.col = grid.col )

As we see, there are many small interactions between parameters. Therefore, you may want to set a limit to show some specific interactions that are large, or you may just want to clean up some of the interactions that are less than specific values. To do this task, we can use the result of the above function without any more adjustments in order to get the final format of a dataset that is created within the function. Then, we can modify it based on what we want to show. The “interactions_plot” is actually a plot, but you can also look at its data frame. The “col” column has the color code that is assigned to each interaction. We can extract that column.

 col2<- interactions_plot$col

Then get the index of values that are less than for example 0.005:

idx <- which(interactions_plot$value2 < 0.005, arr.ind = TRUE)

And we can replace the actual color in “col2” with “transparent” for those rows for which their index was selected above.

col2[idx] <- 'transparent'

Now, we can create a final graph with some more adjustments: with “order,” we can manually sort a grid (sectors). “link.sort” and “link.decreasing” are used to sort the links based on the value of the interaction (which is shown by the width of the sector). This might help in detecting interesting interactions: “annotationTrack” set to “grid”, print the sectors, “preAllocateTracks”, specified the track, “circos.track” creates plotting regions for a track: track.index is the index for the track, which is going to be updated. “” is used to add graphics and customize sector labels (x and y correspond to the data points in each cell).

chordDiagram(baseflow_S2,order = c("H_1","H_2","H_3","H_4","H_5","Ph_1","Ph_2","Ph_3","Ph_4","Ph_5","Ph_6","B_1","B_2","B_3","B_4","B_5","T_1","T_2","T_3","T_4","W_1","W_2","W_3","W_4","W_5","W_6","W_7","Phy_1","Phy_2","Phy_3","Phy_4","Phy_5","Phy_6"),grid.col = grid.col ,col=col2, link.sort = TRUE, link.decreasing = TRUE,annotationTrack = "grid", preAllocateTracks = list(1))
    circos.track (track.index = 1, = function(x, y) {
      xlim ="xlim")
      ylim ="ylim") ="sector.index")
      circos.text(mean(xlim), ylim[1]+.1,, facing = "clockwise", niceFacing = TRUE, adj = c(-0.5,0.5),cex = 0.6)
      circos.axis(h = "top", labels.cex = 0.6, major.tick.percentage = 0.2, track.index = 2)}, bg.border = NA) 

We may want to add another layer to highlight a specific sector and add a group name. In this case, we can use “highlight.sector” to specify the color, label, and components of this new layer. With “padding,” we can also change the width of this layer.

highlight.sector(rownames(baseflow_S2[c("H_1","H_2","H_3","H_4","H_5"),]), track.index = 1, col = "deepskyblue4",text = "H", cex = 0.8, text.col = "black", niceFacing = TRUE,padding = c(-0.9, 0, 0.27, 0),font=2) 

highlight.sector(rownames(baseflow_S2[c("Ph_1","Ph_2","Ph_3","Ph_4","Ph_5","Ph_6"),]), track.index = 1, col = "darkorange3", text = "PH", cex = 0.8, text.col = "black", niceFacing = TRUE,padding = c(-0.9, 0, 0.27, 0),font=2)

highlight.sector(rownames(baseflow_S2[c("T_1","T_2","T_3","T_4"),]), track.index = 1, col = "coral2", text = "T", cex = 0.8, text.col = "black", niceFacing = TRUE,padding = c(-0.9, 0, 0.27, 0),font=2)   

highlight.sector(rownames(baseflow_S2[c("Phy_1","Phy_2","Phy_3","Phy_4","Phy_5","Phy_6"),]), track.index = 1, col = "darkgreen",text = "PHY", cex = 0.8, text.col = "black", niceFacing = TRUE,padding = c(-0.9, 0, 0.27, 0),font=2)

highlight.sector(rownames(baseflow_S2[c("B_1","B_2","B_3","B_4","B_5"),]), track.index = 1, col = "brown4", text = "B", cex = 0.8, text.col = "black", niceFacing = TRUE,padding =c(-0.9, 0, 0.27, 0),font=2) 

highlight.sector(rownames(baseflow_S2[c("W_1","W_2","W_3",	"W_4","W_5","W_6","W_7"),]), track.index = 1, col = "cyan3",text = "W", cex = 0.8, text.col = "black", niceFacing = TRUE,padding = c(-0.9, 0, 0.27, 0),font=2)

And here is the final plot:

To save the graph use this line of code before calling the chordDiagram():

png(file = paste("…(your path)…./sobol_Baseflow_S2_1989.png",sep="") ,height = 7, width = 7, units = "in", res = 500) 

And at the end run Now, I am going to run this code for all of the other years and save them to make a GIF file that shows the interactions for all years. To make an animation from these plots, first we need to install “magick package.”

This simple code reads the first image and then adds the rest of the images to that in a loop. You can also add a message for each image. (Here, I added the year.) The “delay” option is used to adjust the delay after each frame.   

img <- image_read(path = paste0("...(your path).../sobol_Baseflow_S2_1989.png"))

for(Y in 1:nrow(years)) {
img0 <- image_read(path = paste0("...(your path).../sobol_Baseflow_S2_",years[Y,],".png"))
img <- c(img, img0) 
img1 <- image_scale(image = img, geometry = "720x720")
ani0 <- image_animate(image = img1, delay = 200)
image_write(image = ani0, path = paste0("...(your path).../sobol_Baseflow_S2.gif"))