Causal Inference Using Information-Theoretic Approaches

In my previous blog post, I explained information theory. I also talked about the application of information theory in sensitivity analysis. In this blog post, I briefly explain how information theory can be used in causal inference of time-series data.

First, I’ll offer a little perspective on causal analysis. We’re often interested in knowing what factors can cause a specific phenomenon. For example, let’s say that we want to understand what environmental conditions can cause a significant flood event in a specific region. Also, let’s assume that we have time series of floods, precipitation, soil moisture, snow cover in upstream mountains, and temperature. There are many ways to address this problem. We can use regression analysis and correlation, principal component analysis, and variance-based analysis approaches to find out which conditions best explain the flood events. However, these methods can overlook many of the relationships that might exist in complex systems. Although they may give you a statistically meaningful equation, they cannot by themselves provide causal insights. For example, there could be a strong relationship between the flood and the precipitation two weeks prior to the flood event, or there might even be a relationship between snowfall in winter and flooding. Information-theoretic causal analysis can provide an alternative approach to exploring questions like these. Such methods have been used in many research areas, including environmental science (Rouge et al., 2019), neuroscience (Tononi et al., 2016), and economics (Yao and Li, 2020).

Readers can refer to Rouge et al. (2019), Goodwell et al. (2020), and Schreiber (2000) for further details about the basics and applications of information-theoretic causal analysis in different areas of science.

There are two schools of thought in these causal analyses (Goodwell et al., 2020): (1) Pearl causality and (2) Granger causality. Pearl causality was introduced by Judea Pearl and is based on the idea that interventions in a complex system provide information that can potentially lead to a causal inference. Granger causality, in contrast, focuses on the transfer of information from past states of the system to its current state. This blog post concentrates on Granger causality.

Many methods have been developed and used to study how factors can Granger-cause other variables. The simplest, which I discussed in the previous blog post, is mutual information. It is a pairwise method that tells us which lag times of variable X provide information about the current state of variable Y. In other words, we learn which time lags of variable X can improve the prediction of variable Y. However, mutual information alone is not informative enough for problems that involve multiple variables, or variables that depend on their own conditions at previous time-steps, and both situations are common in the real world. Partial information decomposition provides a way to separate the unique information that each variable provides from the redundant and synergistic information that the variables provide together. Transfer entropy (Schreiber, 2000), on the other hand, is a pairwise measure that also accounts for the information that the target variable carries from its own previous time-step. In mathematical language,

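$$T_{X \to Y}(\tau) = \sum_{y_t,\, y_{t-1},\, x_{t-\tau}} p(y_t, y_{t-1}, x_{t-\tau}) \, \log_2 \frac{p(y_t \mid y_{t-1}, x_{t-\tau})}{p(y_t \mid y_{t-1})}$$

where Y and X are time series, t denotes the time-step, τ is the time lag, and p(·) is a probability estimated from the data.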
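To make the transfer entropy described above more concrete, here is a minimal base-R sketch that estimates lagged mutual information and transfer entropy from binned data. Everything in it is an illustrative assumption rather than part of any package: the helper names (`discretize`, `joint_entropy`, `lagged_mi`, `transfer_entropy_sketch`), the choice of three quantile bins, and the synthetic series in which X drives Y with a lag of two time-steps.

```r
# Bin a numeric series into n_bins quantile classes (integer codes).
discretize <- function(v, n_bins = 3) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = n_bins + 1))
  cut(v, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Shannon entropy (bits) of the joint distribution of one or more
# discrete vectors, estimated from observed frequencies.
joint_entropy <- function(...) {
  counts <- table(...)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Mutual information between X at time t - lag and Y at time t.
lagged_mi <- function(x, y, lag = 1) {
  n <- length(y)
  y_now <- y[(lag + 1):n]
  x_past <- x[1:(n - lag)]
  joint_entropy(x_past) + joint_entropy(y_now) - joint_entropy(x_past, y_now)
}

# Transfer entropy from X to Y at a given lag (lag >= 1), written as the
# conditional mutual information I(Y_t ; X_{t-lag} | Y_{t-1}).
transfer_entropy_sketch <- function(x, y, lag = 1) {
  idx <- (lag + 1):length(y)
  y_now <- y[idx]
  y_prev <- y[idx - 1]
  x_past <- x[idx - lag]
  joint_entropy(y_now, y_prev) + joint_entropy(x_past, y_prev) -
    joint_entropy(y_prev) - joint_entropy(y_now, x_past, y_prev)
}

# Synthetic example: Y depends on X two time-steps earlier.
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- numeric(n)
for (i in 3:n) y[i] <- 0.8 * x[i - 2] + 0.3 * rnorm(1)

xd <- discretize(x)
yd <- discretize(y)

lags <- 1:4
mi_xy <- sapply(lags, function(l) lagged_mi(xd, yd, lag = l))
te_xy <- sapply(lags, function(l) transfer_entropy_sketch(xd, yd, lag = l))
te_yx <- sapply(lags, function(l) transfer_entropy_sketch(yd, xd, lag = l))

print(round(mi_xy, 3))  # lagged mutual information, X -> Y
print(round(te_xy, 3))  # transfer entropy, X -> Y
print(round(te_yx, 3))  # transfer entropy, Y -> X
```

In this toy setup the X to Y estimate should stand out at a lag of two, while the Y to X estimates stay close to zero. Plug-in estimates like these are biased upward for short series, which is one reason dedicated packages add shuffling and bootstrapping.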


In this example, I am using an R package called “RTransferEntropy” that calculates transfer entropy. There is also a Python package called “TIGRAMITE” that has been developed for causal analysis of time-series data. The following will install and load the R library.
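The original install-and-load snippet is not shown here, so the following is a sketch of what it might look like. It assumes the package in question is the CRAN package RTransferEntropy, which exports the `transfer_entropy()` function used later in this post.

```r
# Assumption: the package is RTransferEntropy from CRAN.
# Install it only if it is not already available, then load it.
if (!requireNamespace("RTransferEntropy", quietly = TRUE)) {
  install.packages("RTransferEntropy")
}
library(RTransferEntropy)
```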


Then, you can use the following to load an R dataset that includes information about the European stock market. We use this dataset to see whether the past states of two stock indices, FTSE and SMI, are systematically related to each other's current states.
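The original data-loading code is not shown here. One way to get a comparable data frame, assuming the dataset meant is base R's built-in `EuStockMarkets` (daily closing prices of the DAX, SMI, CAC, and FTSE indices), is sketched below. The name `data_TE` matches the object used in the transfer entropy call; the rest is an assumption.

```r
# Assumption: the European stock market data is base R's EuStockMarkets.
data(EuStockMarkets)

# EuStockMarkets is a multivariate time series; convert it to a data
# frame so that columns can be accessed as data_TE$SMI and data_TE$FTSE.
data_TE <- as.data.frame(EuStockMarkets)

head(data_TE[, c("SMI", "FTSE")])
```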


The following code can be used to calculate the transfer entropy:

# Transfer entropy between SMI (x) and FTSE (y); the result covers both directions
TE <- transfer_entropy(x = data_TE$SMI, y = data_TE$FTSE,
                       lx = 1, ly = 1,            # lag (Markov) order of x and y
                       q = 0.1,                   # weight used only for Renyi entropy
                       entropy = 'Shannon', shuffles = 100,
                       type = 'quantiles', quantiles = c(5, 95),
                       bins = NULL, limits = NULL,
                       nboot = 300, burn = 50, quiet = FALSE, seed = NULL)

And here is the output that I got:

The results show that there is a significant relationship between the two variables (in both directions), as the following figure also indicates.

Finally, studies have also discussed potential problems with causal inference based on these methods. For example, James et al. (2016) reported that these methods can be unreliable and suggested network science–oriented analysis as an alternative for causal inference.
