Causality for Dummies

The name of this post speaks for itself – this will be a soft introduction to causality, and is intended to increase familiarity with the ideas, methods, applications and limitations of causal theory. Specifically, we will walk through a brief introduction of causality and compare it with correlation. We will also distinguish causal discovery from causal inference, describe the types of datasets commonly encountered, and provide a (very) brief overview of the methods that can be applied towards both. This post also includes a list of commonly-encountered terms within the literature, as well as prior posts written on this topic.

Proceed if you’d like to be less of a causality dummy!

Introducing Causality

Note: The terms “event” and “variable” will be used interchangeably in this post.

Causality has roots and applications in a diverse set of fields: philosophy, economics, mathematics, neuroscience – just to name a few. The earliest formalization of causality can be attributed to Aristotle, while modern causality as we understand it stems from David Hume, an 18th-century philosopher (Kleinberg, 2012). Hume argued that causal relationships can be inferred from observations and are conditional upon the observer’s beliefs and perceptions. Formally, however, causation can be defined as the contribution of an event to the production of other events. Causal links between events can be uni- or bi-directional – that is:

  1. Event X causes event Y where the reverse is false, or
  2. Event X causes event Y where the reverse is true.

Such causal links have to be first detected and then quantified to confirm the existence of a causal relationship between two events or variables. The strength of these relationships can be measured using some form of score-based strength measure (where the score has to exceed a specific threshold for the link to be “counted” as causal), or using statistical models to test for conditional independence.

Before we get into the details of the methods that enable the aforementioned quantification, let’s confront the elephant in the room:

Correlation vs Causation

The distinction between correlation and causation is often muddled. Fortunately, there exists a plethora of analogies as to why correlation is not causation. My favorite so far is the global warming and pirates analogy, where global mean temperature was found to be negatively correlated with the number of pirates (as temperatures rose, pirate numbers consistently fell). To demonstrate my point:

This plot is trying to tell you something that will save the Earth. (source: Baker, 2020)

…so does this mean that I should halt my PhD and join the dwindling ranks of pirates to save planet Earth?

Well, no (alas). This example demonstrates why relationships between such highly-correlated variables can be purely coincidental, with no causal links. Instead, there may be external factors that actually caused the number of pirates to decrease and, independently, the global mean temperature to rise. Dave’s, Rohini’s and Andrew’s blog posts, as well as this website, provide more examples that further demonstrate this point. Conversely, two variables that are causally linked may not be correlated. This can happen when there is a lack of change in the variables being measured, sometimes caused by insufficient or manipulated sampling (Penn, 2018).

Correlation is limited by its requirement that the relationship between two variables be linear. It also does not factor in the time-ordering and time-lags of the variables’ time series. As a result, depending on correlation to assess a system may cause you to overlook many of the relationships that exist within it. Correlation is therefore useful in making predictions, where trends in one variable can be used to predict the trends of another. Causation, on the other hand, can be used in making decisions, as it helps develop a better understanding of the cause and effect of changes made to system variables.
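To make this concrete, here is a small sketch in Python (using a made-up linear trend as a stand-in for an external common driver) showing how two variables with no causal link can be strongly correlated, and how the correlation largely disappears once the shared driver is accounted for:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical common driver Z (e.g. a long-term trend) that independently
# pushes X up and Y down -- neither variable causes the other.
n = 200
z = np.linspace(0, 1, n)                 # shared trend
x = 2.0 * z + rng.normal(0, 0.1, n)      # "global mean temperature"
y = -3.0 * z + rng.normal(0, 0.1, n)     # "number of pirates"

r = np.corrcoef(x, y)[0, 1]
print(f"correlation(X, Y)     = {r:.2f}")          # strongly negative

# Regressing the common driver out of both variables removes the association
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
r_partial = np.corrcoef(x_resid, y_resid)[0, 1]
print(f"correlation(X, Y | Z) = {r_partial:.2f}")  # near zero
```

Correlation alone would flag a strong X–Y relationship here; it is only by controlling for the common driver that the link is revealed as spurious.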

Now that we’ve distinguished causation from its close (but troublesome) cousin, let’s begin to get into the meat of causality with some common terms you might encounter as you begin exploring the literature.

A quick glossary

The information from this section is largely drawn from Alber (2022), Nogueira et al. (2022), Runge et al. (2019), and Weinberger (2018). It should not be considered exhaustive, by any measure, and should only be used to get your bearings as you begin to explore causality.

Causal faithfulness (“faithfulness”): The assumption that causally-connected events are probabilistically dependent on each other.

Causal Markov condition: The assumption that, in a graphical model, Event Y is independent of every other event, conditional on Y’s direct causes.

Causal precedence: The assumption that Event A can cause Event B only if A happens before B. That is, events in the present cannot have caused events in the past.

Causal sufficiency: The assumption that all possible direct common causes of an event, or changes to a variable, have been observed.

Conditional independence: Two events or variables X and Y are conditionally independent given an additional event or variable Z if, once Z is known, information about X provides no further information about Y (and vice versa).

Confounders: “Interfering” variables that influence both the dependent and independent variables, therefore making it more challenging, or confounding, to verify the presence of a causal relationship.

Contemporaneous links: A causal relationship that exists between variables in the same time step – “con” (with, at the same time as) and “temporary” (of time). The existence of such links is one instance in which the causal precedence assumption is broken.

Directed acyclic graphs (DAGs): A graphical representation of the causal relationships between a set of variables or events. These relationships are known to be, or assumed to be, true.

Edges: The graphical representation of the links connecting two variables or events in a causal network or graph. These edges may or may not represent causal links.

Granger causality: Event X Granger-causes Event Y if predicting Y based on its own past observations and past observations of X performs better than predicting Y solely on its own past observations.

Markov equivalence class: A set of graphs that represent the same patterns of conditional independence between variables.

Nodes: The graphical representation of events (or variables) in a causal network or graph.

Causality: Discovery, Inference, Data and Metrics

Causality and its related methods are typically used for two purposes: causal discovery and causal inference. Explanations for both are provided below.

Causal Discovery

Also known as causal structural learning (Tibau et al., 2022), the goal of causal discovery is to obtain causal information directly from observed or historical data. Methods used for causal discovery do not assume implicit causal links between the variables within a dataset. Instead, they begin with the assumption of a “clean slate” and attempt to generate (then analyze) models that illustrate the inter-variable links inherent to the dataset, thus preserving them. The end goal of causal discovery is to approximate a graph that represents the presence or absence of causal relationships between a set of two or more nodes.

Causal Inference

Causal inference uses (and does not generate) causal graphs, focusing on rigorously testing the truth of the causal relationship between two variables. Unlike causal discovery, it assumes that a causal relationship already exists between two variables. Following this assumption, it tests and quantifies the actual relationships between variables in the available data. It is useful for assessing the impact of one event, or a change in one variable, on another, and can be applied towards studying the possible effects of altering a given system. Here, causal inference should not be confused with sensitivity analysis. The intended use of sensitivity analysis is to map changes in model output to changes in its underlying assumptions, parameterizations, and biases. Causal inference is focused on assessing the cause-and-effect relationships between variables or events in a system or model.

Data used in causal discovery and causal inference

There are two forms of data typically encountered in the literature that uses causal methods:

Comparing cross-sectional data and longitudinal data (source: Scribbr).

Cross-sectional data

Data in this form is non-temporal. Put simply, all variables available in cross-sectional data represent a single point in time. It may contain observations of multiple individuals, where each observation represents one individual and each variable contains information on a different aspect of the individual. The assumption of causal precedence does not hold for such datasets, which therefore require additional processing to develop causal links between variables. Such datasets are handled using methods that measure causality using advanced variations of conditional independence tests (more on this later). An example of a cross-sectional dataset is the census data collected by the U.S. Census Bureau once every decade.

Longitudinal data

This form of data is a general category that includes time-series data, which consists of a series of observations of a single (usually) subject across some time period. Such datasets are relatively easy to handle compared to cross-sectional data, as the causal precedence assumption is (often) met. Therefore, most causality methods can be used on longitudinal data; some common methods include the basic forms of Granger causality, convergent cross mapping (CCM), and fast causal inference (FCI). An example of longitudinal data would be historical rainfall gauge records of precipitation over time.

Causality Methods

The information from this section largely originates from Runge et al. (2019), Nogueira et al. (2022), and Ombadi et al. (2020).

There are several methods that can be used to discover or infer causality from the aforementioned datasets. In general, these methods are used to identify and extract causal interactions from observed data. The outcomes of these methods facilitate the filtering of relevant drivers (variables that cause observed outcomes) from the larger set of potential ones, and clarify inter-variable relationships that are often muddied by correlations. These methods measure causality in one of two ways:

  1. Score-based methods: Such methods assign a “relevance score” to rank each proposed causal graph based on the likelihood that it accurately represents the conditional (in)dependencies between variables in a dataset. Without additional phases to refine their algorithms, these methods are computationally expensive, as all potential graph configurations have to be ranked.
  2. Constraint-based methods: These methods employ a number of statistical tests to identify “necessary” causal graph edges and their corresponding directions. While less computationally expensive than basic score-based methods, constraint-based causality methods are limited to evaluating causal links for one node (representing one variable) at a time. They therefore cannot evaluate potential multivariate causal relationships. Their computational expense is also proportional to the number of variables in the dataset.

Now that we have a general idea of what causality methods can help us do, let’s dive into a few general classes of causality methods.

Granger Causality

Often cited as one of the earliest mathematical representations of causality, Granger causality is a statistical hypothesis test for determining whether one time series is useful in forecasting another. It was introduced by Clive Granger in the 1960s and has been widely used in economics since, more recently finding applications in neuroscience and climate science (see Runge et al., 2018, 2019). Granger causality can be used to characterize predictive causal relationships and to measure the influence of system or model drivers on variables’ time series. These relationships and influences can be uni-directional (where X Granger-causes Y, but Y does not Granger-cause X) or bi-directional (X Granger-causes Y, and Y Granger-causes X).

A time series demonstration on how variable X G.C. Y (source: Google).

Because it measures predictive causality, Granger causality is limited to systems with independent driving variables. It cannot be used to assess multivariate systems or systems in which conditional dependencies exist, both of which are characteristic of real-world stochastic and nonlinear processes. It also requires separability, where the causal variable (the cause) has to be independent of the influenced variable (the effect), and assumes causal precedence. It is nonetheless a useful preliminary tool for causal discovery.
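The comparison at the heart of Granger causality (does adding lags of X improve a prediction of Y beyond Y’s own lags?) can be sketched with ordinary least squares. This is a minimal illustration on synthetic data, not a formal F-test or the implementation of any particular package:

```python
import numpy as np

rng = np.random.default_rng(0)

def granger_rss(y, x, p=2):
    """Residual sum of squares for predicting y[t] from p lags of y alone
    (restricted model) versus p lags of both y and x (full model). A much
    smaller full-model RSS is evidence that x Granger-causes y."""
    n = len(y)
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    rss = []
    for design in (np.hstack([ones, lags_y]),
                   np.hstack([ones, lags_y, lags_x])):
        beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
        resid = Y - design @ beta
        rss.append(resid @ resid)
    return rss[0], rss[1]

# synthetic system in which X drives Y, but Y does not drive X
n = 500
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

rss_r, rss_f = granger_rss(y, x)    # adding X lags helps predict Y
rss_r2, rss_f2 = granger_rss(x, y)  # adding Y lags barely helps predict X
print(f"X -> Y: RSS {rss_r:.0f} -> {rss_f:.0f}")
print(f"Y -> X: RSS {rss_r2:.0f} -> {rss_f2:.0f}")
```

A proper Granger test would turn this RSS comparison into an F-statistic; the asymmetry in RSS reduction is the core idea.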

Nonlinear state-space methods

An alternative to Granger causality are nonlinear state-space methods, which include methods such as convergent cross mapping (CCM). Such methods assume that variable interactions occur in an underlying deterministic dynamical system, and attempt to uncover causal relationships based on Takens’ theorem (see Dave’s blog post here for an example) by reconstructing the nonlinear state space. The key idea is this: if event X can be predicted using time-delayed information from event Y, then X had a causal effect on Y.

Visualization on how nonlinear state-space methods reconstruct the dynamic system and identify causal relationships (source: Time Series Analysis Handbook, Chapter 6)
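The reconstruction step underlying these methods, building a shadow manifold from lagged copies of a single time series, can be sketched as follows; the signal and embedding parameters here are arbitrary choices for illustration:

```python
import numpy as np

def delay_embed(series, E, tau):
    """Build an E-dimensional shadow manifold from a single time series
    using lags of size tau (Takens-style delay embedding). Row t is
    [x(t), x(t - tau), ..., x(t - (E - 1) * tau)]."""
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau
    return np.column_stack([series[start - k * tau:len(series) - k * tau]
                            for k in range(E)])

x = np.sin(np.linspace(0, 20, 200))  # an arbitrary toy signal
M_x = delay_embed(x, E=3, tau=5)
print(M_x.shape)  # (190, 3): one 3-D point per usable timestep
```

Each row of the embedded matrix is one point on the shadow manifold; CCM then compares neighborhoods of points across the shadow manifolds of different variables.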

Convergent Cross Mapping (CCM)

Convergent Cross Mapping (Sugihara et al., 2012; Ye et al., 2015) tests the reliability of a variable Y as an estimate or predictor of variable X, revealing weak non-linear interactions between time series that might otherwise be missed (Delforge et al., 2022). The CCM algorithm involves generating the system manifold M, the X-variable shadow manifold MX, and the Y-variable shadow manifold MY. The algorithm then samples an arbitrary set of nearest-neighbor (NN) points from MX and determines whether they correspond to neighboring points in MY. If X and Y are causally linked, they should share M as a common “attractor” manifold (a set of states toward which the system tends to evolve). Variable X can therefore be said to inform variable Y, but not vice versa (a unidirectional link). CCM can also be used to detect causality due to external forcings, where X and Y do not interact but may be driven by a common external variable Z. It is therefore best suited for causal discovery.

While CCM does not rely on conditional independence, it assumes the existence of a deterministic (but unknown) underlying system (e.g. a computer program, a physics-based model) which can be represented using M. Therefore, it does not work well for stochastic time series (e.g. streamflow, coin tosses, bacteria population growth). The predictive ability of CCM is also vulnerable to noise in the data. Furthermore, it requires a long time series to derive a reliable measure of causality between two variables, as a longer series decreases the NN distances on each manifold, thus improving the ability of CCM to predict causality.

Causal network learning algorithms

In contrast to CCM, causal network learning algorithms assume that the underlying system in which the variables arise is purely stochastic. This class of algorithms adds or removes causal graph edges using criteria based on conditional or mutual independence, and assumes that both the causal Markov and faithfulness conditions hold true for all proposed graph structures. They are therefore best used for causal inference, and can be applied to cross-sectional data, as well as linear and nonlinear time series.

These algorithms result in a “best estimate” graph whose edges reflect the conditional independencies observed in the data. They employ two stages: the skeleton discovery phase, where non-essential links are eliminated, and the orientation phase, where the directionality of the causal links is finalized. Because of this, these algorithms can be used to reconstruct large-scale, high-dimensional systems with interacting variables. Some of the algorithms in this class are also capable of identifying the direction of contemporaneous links, and are thus not beholden to the causal precedence assumption. However, such methods can only estimate graphs up to a Markov equivalence class, and require longer time series to provide better predictions of causal relationships between variables.

General framework of all causal network learning algorithms (source: Runge et al., 2019).

Let’s go over a few examples of causal network learning algorithms:

The PC (Peter-Clark) algorithm

The PC algorithm begins with a fully connected graph in its skeleton phase and iteratively removes edges where conditional independence exists. It then orients the remaining edges in its orientation phase. In its earliest form, the PC algorithm was limited by its assumption of causal sufficiency and its lack of contemporaneous dependency handling. It also did not scale well to high-dimensional data.

Later variations attempted to overcome these limitations. For example, the momentary conditional independence PC (PCMCI) and PCMCI+ algorithms added a further step to determine causal links between variables in different timesteps and to find lagged and contemporaneous relationships separately, therefore handling contemporaneity. The PC-select variation introduced the ability to apply conditional independence tests on target variables, allowing it to process high-dimensional data. These variations can also eliminate spurious causal links. However, the PC algorithm and its variants still depend on the causal Markov, faithfulness, and sufficiency assumptions. The causal links it detects are also relative to the feature space (Uereyen et al., 2022). This means that the directionality (or existence) of these links may change if new information is introduced to the system.
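As an illustration of the skeleton phase, here is a deliberately minimal sketch on three synthetic variables, using partial correlation with a fixed threshold as a stand-in for a proper conditional independence test (real PC implementations iterate over conditioning sets of increasing size and use calibrated tests):

```python
import numpy as np
from itertools import combinations

def partial_corr(data, i, j, k):
    """Correlation between variables i and j after linearly regressing
    variable k out of both (a stand-in for a proper conditional
    independence test)."""
    ri = data[:, i] - np.polyval(np.polyfit(data[:, k], data[:, i], 1), data[:, k])
    rj = data[:, j] - np.polyval(np.polyfit(data[:, k], data[:, j], 1), data[:, k])
    return np.corrcoef(ri, rj)[0, 1]

# Hypothetical system: Z drives both X and Y; X and Y share no direct link.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = 0.9 * z + rng.normal(0, 0.5, n)
y = 0.9 * z + rng.normal(0, 0.5, n)
data = np.column_stack([x, y, z])  # columns: 0 = X, 1 = Y, 2 = Z

# Skeleton phase: start fully connected, drop an edge whenever the pair is
# (approximately) independent conditional on some third variable.
edges = set(combinations(range(3), 2))
for (i, j) in list(edges):
    for k in set(range(3)) - {i, j}:
        if abs(partial_corr(data, i, j, k)) < 0.1:
            edges.discard((i, j))

print(sorted(edges))  # [(0, 2), (1, 2)]: the spurious X-Y edge is removed
```

The X–Y edge, strongly supported by raw correlation, disappears once the test conditions on Z; the orientation phase would then assign directions to the surviving edges.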

Fast causal inference (FCI)

Unlike the PC-based algorithms, FCI does not require that causal sufficiency be met, although it, too, is based on iterative conditional independence tests and begins with a complete graph. Another differentiating feature between PC and FCI is the latter’s lack of assumptions about the directionality of causal links. Instead of the uni- or bi-directional orientation that the PC algorithm eventually assigns to its causal graph edges, FCI has four edge implementations to account for the possibility of spurious links. Given variables X and Y, FCI’s edge implementations are as follows:

  1. X causes Y
  2. X causes Y or Y causes X
  3. X causes Y or there are unmeasured confounding variables
  4. X causes Y, Y causes X, there are unmeasured confounding variables, or some combination of both

There are also several FCI variations that allow improved handling of large datasets, high dimensionality, and contemporaneous variables. For example, the Anytime and Adaptive Anytime FCI variations restrict the maximum number of variables to be considered as drivers, and the time series FCI (tsFCI) uses sliding windows to transform the original, long time series into a set of “independent” subsamples that can be treated as cross-sectional. To effectively use FCI, however, the data should be carefully prepared using Joint Causal Inference (JCI) to allow the generated graph to include both variable information and system information, accounting for background knowledge about the system (Mooij et al., 2020).

Structural causal models (SCMs)

Similar to causal network learning algorithms, SCMs assume a purely stochastic underlying system and use DAGs to model the flow of information. They, too, can only detect causal graphs to within a Markov equivalence class. Unlike causal network learning algorithms, SCMs structure DAGs as a set of endogenous (Y) and exogenous (X) variables connected by a set of functions (F) that determine the values of Y based on the values of X. Within this context, a node represents a variable x or y in X or Y, while an edge represents a function f in F. By doing so, SCMs enable the discovery of causal directions in cases where the causal direction cannot be inferred with conditional independence-based methods. SCMs can also handle a wide range of systems (linear, nonlinear, various noise probability distributions). This flexibility also underlies one of the limitations of SCMs: they require that some information on the underlying structure of the system be known a priori (e.g. the system is assumed to be linear with at least one of the noise terms drawn from a Gaussian distribution). SCMs are best used for causal inference, as causal links between variables have to be assumed during their generation.
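A toy example may help fix ideas. The sketch below defines a hypothetical two-variable linear SCM (the coefficients and noise scales are arbitrary) and shows how its structural functions support an intervention, something a joint distribution alone cannot express:

```python
import numpy as np

rng = np.random.default_rng(7)

# A hypothetical two-variable linear SCM. The exogenous noise terms (U) and
# the structural functions (F) below are arbitrary choices for illustration.
n = 1000
u_x = rng.normal(size=n)
u_y = rng.normal(size=n)

x = u_x                   # X := U_x              (exogenous input)
y = 2.0 * x + 0.5 * u_y   # Y := f(X, U_y) = 2X + 0.5 U_y

# An intervention do(X = 1) replaces X's structural assignment while holding
# the remaining functions fixed -- a manipulation the structural form
# supports but a joint probability distribution alone cannot express.
x_do = np.ones(n)
y_do = 2.0 * x_do + 0.5 * u_y
print(round(y_do.mean(), 2))  # close to 2.0, i.e. E[Y | do(X = 1)]
```

The asymmetry is the point: intervening on X moves Y through f, whereas intervening on Y would leave X untouched, which is exactly the directional information an SCM encodes.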

Information-theoretic algorithms

Finally, we have information-theoretic (IT) algorithms, which are considered an extension of Granger causality methods and allow the verification of nonlinear relationships between system variables; they are therefore best used for causal inference. IT algorithms measure transfer entropy (TE), which is defined as the amount of shared information between variables X and Y when both are conditioned on an external variable Z. The magnitude of TE reflects the Shannon entropy reduction in Y when, given Z, information on X is added to the system. For further information on IT and TE, Andrew’s blog post and Keyvan’s May 2020 and June 2020 posts further expand on the theory and application of both concepts.

There are a couple of assumptions that come with the use of IT algorithms. First, like both SCMs and causal network learning algorithms, they assume that the underlying system is purely stochastic. They are also bound to causal precedence, and assume that the causal variable X provides all useful information for the prediction of the effect Y, given Z. In addition, IT algorithms benefit from longer time series, which improve their predictions of causal links between variables. On the other hand, they make no assumptions about the underlying structure of the data and can detect both linear and nonlinear causal relationships.
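For intuition, transfer entropy for a pair of binary series can be estimated directly from empirical frequencies. The sketch below uses a simple plug-in estimator that conditions only on Y’s own past rather than an external Z (a simplification of the definition above), applied to a toy system where Y copies X with a one-step lag:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy X -> Y (in bits) for binary series: the
    reduction in uncertainty about y[t+1] from knowing x[t] in addition
    to y[t]. Conditions on Y's own past rather than an external Z."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_all = c / n
        p_y0x0 = sum(v for (a, b, d), v in triples.items() if b == y0 and d == x0) / n
        p_y1y0 = sum(v for (a, b, d), v in triples.items() if a == y1 and b == y0) / n
        p_y0 = sum(v for (a, b, d), v in triples.items() if b == y0) / n
        te += p_all * np.log2((p_all / p_y0x0) / (p_y1y0 / p_y0))
    return te

# toy system: Y copies X with a one-step lag, so X -> Y but not Y -> X
rng = np.random.default_rng(3)
x = rng.integers(0, 2, 5000)
y = np.roll(x, 1)  # y[t] = x[t-1]

print(f"TE(X -> Y) = {transfer_entropy(x, y):.2f} bits")  # close to 1
print(f"TE(Y -> X) = {transfer_entropy(y, x):.2f} bits")  # close to 0
```

Since x[t] fully determines y[t+1], knowing it removes the full one bit of uncertainty; in the reverse direction, y’s past adds nothing to predicting the i.i.d. series x, so TE is near zero.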

Prior WaterProgramming blog posts

That was a lot! But if you would like a more detailed dive into causality and/or explore some toy problems, there are a number of solid blog posts written that focus on the underlying math and concepts central to the approaches used in causal discovery and/or inference:

  1. Introduction to Granger Causality
  2. Introduction to Convergent Cross Mapping
  3. Detecting Causality using Convergent Cross Mapping: A Python Demo using the Fisheries Game
  4. Causal Inference Using Information-Theoretic Approaches
  5. Information Theory and the Moment-Independent Sensitivity Indices
  6. Milton Friedman’s thermostat and sensitivity analysis of control policies

Summary and key challenges

In this blog post, we introduced causality and compared it to correlation. We provided a glossary of commonly-used terms in the causality literature, and distinguished causal discovery from causal inference. Next, we explored a number of commonly-used causality methods: Granger causality, CCM, conditional independence-based causal network learning algorithms, SCMs, and information-theoretic algorithms.

From this overview, it can be concluded that methods to discover and infer causal relationships are powerful tools that enable us to identify cause-and-effect links between seemingly unrelated system variables. Improvements to these methods are pivotal to improving climate models, increasing AI explainability, and aiding better, more transparent decision-making. Nevertheless, these methods face challenges (Tibau et al., 2022) that include, but are not limited to:

  1. Handling gridded or spatio-temporally aggregated data
  2. Representing nonlinear processes that may interact across time scales
  3. Handling non-Gaussian variable distributions and data non-stationarity
  4. Handling partial observability where only a subset of system variables is observed, thus challenging the causal sufficiency assumption
  5. Uncertainty: Non-stationarity, noise, internal variability
  6. Dealing with mixed data types (discrete vs continuous)
  7. Lack of benchmarking approaches due to lack of ground truth data

This brings us to the end of the post – do take a look at the References for a list of key literature and online articles that will be helpful as you begin learning about causality. Thank you for sticking with me and happy exploring!

References

Alber, S. (2022, February 9). Directed Acyclic Graphs (DAGs) and Regression for Causal Inference. UC Davis Health. Davis, California. Retrieved March 14, 2023, from https://health.ucdavis.edu/ctsc/area/Resource-library/documents/directed-acyclic-graphs20220209.pdf

Baker, L. (2020, July 9). Hilarious graphs (and pirates) prove that correlation is not causation. Medium. Retrieved March 14, 2023, from https://towardsdatascience.com/hilarious-graphs-and-pirates-prove-that-correlation-is-not-causation-667838af4159

Delforge, D., de Viron, O., Vanclooster, M., Van Camp, M., & Watlet, A. (2022). Detecting hydrological connectivity using causal inference from time series: Synthetic and real Karstic case studies. Hydrology and Earth System Sciences, 26(8), 2181–2199. https://doi.org/10.5194/hess-26-2181-2022

Gonçalves, B. (2020, September 9). Causal inference - part IV - structural causal models. Medium. Retrieved March 13, 2023, from https://medium.data4sci.com/causal-inference-part-iv-structural-causal-models-df10a83be580

Kleinberg, S. (2012). A Brief History of Causality (Chapter 2) – Causality, Probability, and Time. Cambridge Core. Retrieved March 14, 2023, from https://www.cambridge.org/core/books/abs/causality-probability-and-time/brief-history-of-causality/C87F30B5A6F4F63F0C28C3156B809B9E

Mooij, J. M., Magliacane, S., & Claassen, T. (2020). Joint Causal Inference from Multiple Contexts. Journal of Machine Learning Research, 21(99), 1–108. https://doi.org/10.48550/arXiv.1611.10351

Nogueira, A. R., Pugnana, A., Ruggieri, S., Pedreschi, D., & Gama, J. (2022). Methods and tools for causal discovery and causal inference. WIREs Data Mining and Knowledge Discovery, 12(2). https://doi.org/10.1002/widm.1449

Ombadi, M., Nguyen, P., Sorooshian, S., & Hsu, K. (2020). Evaluation of methods for causal discovery in hydrometeorological systems. Water Resources Research, 56(7). https://doi.org/10.1029/2020wr027251

Penn, C. S. (2018, August 25). Can causation exist without correlation? Yes! Christopher S. Penn – Marketing Data Science Keynote Speaker. Retrieved March 14, 2023, from https://www.christopherspenn.com/2018/08/can-causation-exist-without-correlation/

Runge, J. (2018). Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7), 075310. https://doi.org/10.1063/1.5025050

Runge, J., Bathiany, S., Bollt, E., Camps-Valls, G., Coumou, D., Deyle, E., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., van Nes, E. H., Peters, J., Quax, R., Reichstein, M., Scheffer, M., Schölkopf, B., Spirtes, P., Sugihara, G., Sun, J., Zscheischler, J. (2019). Inferring causation from time series in Earth System Sciences. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-10105-3

Sugihara, G., May, R., Ye, H., Hsieh, C.-hao, Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338(6106), 496–500. https://doi.org/10.1126/science.1227079

Tibau, X.-A., Reimers, C., Gerhardus, A., Denzler, J., Eyring, V., & Runge, J. (2022). A spatiotemporal stochastic climate model for benchmarking causal discovery methods for teleconnections. Environmental Data Science, 1. https://doi.org/10.1017/eds.2022.11

Uereyen, S., Bachofer, F., & Kuenzer, C. (2022). A framework for multivariate analysis of land surface dynamics and driving variables—a case study for Indo-Gangetic River basins. Remote Sensing, 14(1), 197. https://doi.org/10.3390/rs14010197

Weinberger, N. (2018). Faithfulness, coordination and causal coincidences. Erkenntnis, 83(2), 113–133. https://doi.org/10.1007/s10670-017-9882-6

Ye, H., Deyle, E. R., Gilarranz, L. J., & Sugihara, G. (2015). Distinguishing time-delayed causal interactions using convergent cross mapping. Scientific Reports, 5(1). https://doi.org/10.1038/srep14750

Detecting Causality using Convergent Cross Mapping: A Python Demo using the Fisheries Game

This post demonstrates the use of Convergent Cross Mapping (CCM) on the Fisheries Game, a dynamic predator-prey system. To conduct CCM, we’ll use the causal_ccm Python package by Prince Joseph Erneszer Javier, which you can find here. This demo follows the basic procedure used in the tutorial for the causal_ccm package, which you can find here.

All code in this blog post can be found in a Jupyter notebook in this Github repo.

Convergent Cross Mapping

CCM is a technique for understanding causality in dynamical systems where the effects of causal variables cannot be separated or uncoupled from the variables they influence (Sugihara et al., 2012). Rohini wrote a great blog post about CCM in 2021, so I’ll just provide a quick summary here and refer readers to Rohini’s post for more details.

CCM harnesses the idea that the dynamics of a system can be represented by an underlying manifold, which can be approximated using lagged information from the time series of each variable of interest. If variable X has a causal effect on variable Y, then information about X should be encoded in variable Y, and we can “recover” historical states of X from the historical time series of Y. Using this concept, CCM develops “shadow manifolds” for system variables and examines the relationships between shadow manifolds using cross mapping, which involves sampling nearest-neighbor points in one manifold and determining whether they correspond to neighboring points in the other. For a more detailed explanation of CCM, I highly recommend reading Sugihara et al., 2012. I also found the series of videos created by the authors to be extremely helpful. You can find them here.

Simulating the fisheries predator-prey system

We’ll start by simulating the predator-prey system used in Hadjimichael et al., (2020). This system models the population dynamics of two species, a predator and a prey. The simulation uses two differential equations to model the dynamics of the predator-prey system:

\frac{dx}{dt} = bx(1-\frac{x}{K}) - \frac{\alpha x y}{y^{m} + \alpha h x} - zx

\frac{dy}{dt} = \frac{c \alpha x y}{y^{m} + \alpha h x} - dy

where x and y represent the prey and predator population densities, t represents the time in years, a (α in the equations above) is the prey availability, b is the prey growth rate, c is the rate at which prey is converted to new predators, d is the predator death rate, h is the time needed to consume prey (called the handling time), K is the carrying capacity for prey, m is the level of predator interaction, and z is the harvesting rate. In the cell below, we’ll simulate this system for a given set of environmental parameters (a, b, etc.) and plot the dynamics over 120 time periods. For more details on the Fisheries system, see the training blog posts by Lillian and Trevor (part 0, part 1, part 2). There is also an interactive Jupyter notebook in our recently published eBook on Uncertainty Characterization for Multisector Dynamics Research.

import numpy as np
from matplotlib import pyplot as plt

# assign default parameters
tsteps = 120 # number of timesteps to simulate
a = 0.01 # prey availability
b = 0.25 # prey growth rate
c = 0.30 # rate that prey is converted to predator
d = 0.1 # predator death rate
h = 0.61 # handling time
K = 1900 # prey carrying capacity
m = .20 # predator interference rate

# create arrays to store predator and prey populations
prey = np.zeros(tsteps+1)
pred = np.zeros(tsteps+1)

# initial population levels
prey[0] = 28
pred[0] = 28

# harvesting, which we will keep at 0 for this example
z = np.zeros(len(prey))

# simulate the system
for t in range(tsteps):
    if prey[t] > 0 and pred[t] > 0:
        prey[t + 1] = (prey[t] + b * prey[t] * (1 - prey[t] / K) - (a * prey[t] * pred[t]) / (np.power(pred[t], m) +
                                                            a * h * prey[t]) - z[t] * prey[t])   # Prey growth equation
        pred[t + 1] = (pred[t] + c * a * prey[t] * pred[t] / (np.power(pred[t], m) + a * h *
                                                            prey[t]) - d * pred[t]) # Predator growth equation

# plot the population dynamics
fig = plt.figure(figsize=(6,6))
plt.plot(np.arange(tsteps+1), prey, label = 'Prey')
plt.plot(np.arange(tsteps+1), pred, label = 'Predator')
plt.legend(prop={'size': 12})
plt.xlabel('Timestep', size=12)
plt.ylabel('Population Size', size=12)
plt.title('Population Dynamics', size=15)
Figure 1: predator and prey population dynamics

With these parameters, we can visualize the trajectory and direction field of the system of equations, which is shown below (I’m not including plotting code here for brevity, but see this tutorial if you’re interested in making these plots). In CCM terms, this is a visualization of the underlying manifold of the dynamical system.

Figure 2: Trajectories and direction fields of the predator-prey system
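If you’d like to reproduce a rough version of the direction field yourself, a minimal sketch using plt.quiver is below. This is a simplified stand-in for the omitted plotting code, reusing the parameter values from the simulation above; the grid ranges are my own guesses, not values from the original figure:

```python
import numpy as np
from matplotlib import pyplot as plt

# same parameter values as the simulation above
a, b, c, d, h, K, m, z = 0.01, 0.25, 0.30, 0.1, 0.61, 1900, 0.20, 0.0

# evaluate dx/dt and dy/dt on a grid of (prey, predator) states
# (grid extents chosen for illustration)
X, Y = np.meshgrid(np.linspace(1, 2000, 25), np.linspace(1, 500, 25))
dX = b * X * (1 - X / K) - a * X * Y / (np.power(Y, m) + a * h * X) - z * X
dY = c * a * X * Y / (np.power(Y, m) + a * h * X) - d * Y

# arrows point in the direction the system moves from each state
plt.figure(figsize=(6, 6))
plt.quiver(X, Y, dX, dY, angles='xy')
plt.xlabel('Prey', size=12)
plt.ylabel('Predator', size=12)
plt.title('Direction field', size=15)
```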

Causal detection with CCM

From the system of equations above, we know there is a causal relationship between the predator and prey, and we have visualized the common manifold. Below, we’ll test whether CCM can detect this relationship. If the algorithm works as it should, we will see a clear indication that the two populations are causally linked.

To conduct CCM, we need to specify the number of dimensions to use for shadow manifolds, E, the lag time, tau, and the library size, L.

The CCM algorithm should converge to a stable approximation of causality as the library size increases. Below we’ll test library sizes from 10 to 100 to see if we achieve convergence for the Fisheries system. We’ll start by assuming shadow manifolds have two dimensions (E=2), and a lag time of one time step (tau=1). To test convergence, we’ll plot the correlation (rho) between the shadow manifold predictions and the historical states of the two variables.
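For intuition about what the package computes, here is a from-scratch sketch of the cross-mapping step (my own simplified version, not the causal_ccm implementation): embed Y, find the E+1 nearest neighbors of each point on Y’s shadow manifold, estimate X as an exponentially weighted average of its values at the neighbors’ time indices, and report rho, the correlation between the estimates and the true X.

```python
import numpy as np

def cross_map_rho(X, Y, E=2, tau=1, L=None):
    """Estimate X from the shadow manifold of Y; return the cross-map skill rho."""
    L = L or len(Y)
    x, y = np.asarray(X[:L], float), np.asarray(Y[:L], float)
    t0 = (E - 1) * tau
    # shadow manifold of y: rows are [y(t), y(t-tau), ..., y(t-(E-1)*tau)]
    M = np.column_stack([y[t0 - i * tau : L - i * tau] for i in range(E)])
    preds = []
    for j in range(len(M)):
        d = np.linalg.norm(M - M[j], axis=1)
        d[j] = np.inf                       # exclude the point itself
        nn = np.argsort(d)[:E + 1]          # E+1 nearest neighbors on M_y
        w = np.exp(-d[nn] / max(d[nn].min(), 1e-12))
        w /= w.sum()
        preds.append(np.dot(w, x[t0:][nn])) # weighted average of X at neighbor times
    return np.corrcoef(x[t0:], preds)[0, 1]

# toy pair of coupled series: y is a lagged, noisy copy of x
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20 * np.pi, 400))
y = np.roll(x, 3) + 0.05 * rng.standard_normal(400)
print(round(cross_map_rho(x, y, E=2, tau=1, L=200), 2))
```

Because information about x is encoded in y, the cross-map estimate of x from M_y yields a high rho on this toy pair; the package automates this calculation and sweeps it over library sizes.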

from causal_ccm.causal_ccm import ccm
E = 2 # dimensions of the shadow manifold
tau = 1 # lag time
L_range = range(10, 100, 5) # test a range of library sizes
preyhat_Mpred, predhat_Mprey = [], [] # correlation list
for L in L_range:
    ccm_prey_pred = ccm(prey, pred, tau, E, L) # define new ccm object # Testing for X -> Y
    ccm_pred_prey = ccm(pred, prey, tau, E, L) # define new ccm object # Testing for Y -> X
    preyhat_Mpred.append(ccm_prey_pred.causality()[0])
    predhat_Mprey.append(ccm_pred_prey.causality()[0])

# Plot Cross Mapping Convergence
plt.figure(figsize=(6,6))
plt.plot(L_range, preyhat_Mpred, label=r'$\hat{Prey}(t)|M_{Pred}$')
plt.plot(L_range, predhat_Mprey, label=r'$\hat{Pred}(t)|M_{Prey}$')
plt.ylim([0,1])
plt.xlabel('Library Size', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 3: Convergence in cross mapping estimates

Figure 3 shows that CCM does seem to detect a causal relationship between the predator and prey (rho is far above 0), and the estimated strength of this relationship starts to stabilize (converge) with a library size of around 60.

Next, we’ll examine how our lag time (tau) used to construct the shadow manifolds impacts our findings.

from causal_ccm.causal_ccm import ccm
E = 2 # dimensions of the shadow manifold
L = 60 # length of library (we'll return to this later)

# first, test different lags for construction of the shadow manifolds
# we'll test lags from 1 to 19 time steps
preyhat_Mpred, predhat_Mprey = [], []  # lists to store correlation
for tau in range(1, 20):
    ccm_prey_pred = ccm(prey, pred, tau, E, L)  # define new ccm object # Testing for prey -> pred
    ccm_pred_prey = ccm(pred, prey, tau, E, L)  # define new ccm object # Testing for pred -> prey
    preyhat_Mpred.append(ccm_prey_pred.causality()[0]) # stores prey -> pred
    predhat_Mprey.append(ccm_pred_prey.causality()[0]) # stores pred -> prey

# plot the correlation for different lag times
plt.figure(figsize=(6,6))
plt.plot(np.arange(1,20), preyhat_Mpred, label=r'$\hat{Prey}(t)|M_{Pred}$')
plt.plot(np.arange(1,20), predhat_Mprey, label=r'$\hat{Pred}(t)|M_{Prey}$')
plt.ylim([0,1.01])
plt.xlim([0,20])
plt.xticks(np.arange(20))
plt.xlabel('Lag', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 4: The impact of lag time, tau, on CCM estimates

The results in Figure 4 indicate that a lag time of 1 (which we initially assumed) does not adequately capture the causal relationship between the two variables. With lag times between 5 and 10, CCM shows a much stronger relationship between the two variables. Using this information, we can again test the convergence across different library sizes.

E = 2 # dimensions of the shadow manifold
tau = 5 # lag time, informed by Figure 4
L_range = range(10, 100, 5)
preyhat_Mpred, predhat_Mprey = [], [] # correlation list
for L in L_range:
    ccm_prey_pred = ccm(prey, pred, tau, E, L) # define new ccm object # Testing for X -> Y
    ccm_pred_prey = ccm(pred, prey, tau, E, L) # define new ccm object # Testing for Y -> X
    preyhat_Mpred.append(ccm_prey_pred.causality()[0])
    predhat_Mprey.append(ccm_pred_prey.causality()[0])

# Plot Cross Mapping Convergence
plt.figure(figsize=(6,6))
plt.plot(L_range, preyhat_Mpred, label=r'$\hat{Prey}(t)|M_{Pred}$')
plt.plot(L_range, predhat_Mprey, label=r'$\hat{Pred}(t)|M_{Prey}$')
plt.ylim([0,1])
plt.xlabel('Library Size', size=12)
plt.ylabel(r'$\rho$', size=12)
plt.legend(prop={'size': 12})
Figure 5: With a lag of 5, CCM converges much faster, and detects a much stronger relationship between predator and prey

In Figure 5, we observe that CCM with a lag of 5 converges much faster and generates a stronger correlation than our original estimate using tau = 1. In fact, CCM can reconstruct the historical system states almost perfectly. To see why, we can visualize the underlying shadow manifolds and the cross mapping conducted for this analysis (this is conveniently available in the causal_ccm package with the visualize_cross_mapping function).

# set lag (tau) to 5 and examine results
tau = 5

# prey -> predator
ccm1 = ccm(prey,pred, tau, E, L) # prey -> predator
print("prey -> predator: " + str(ccm1.causality()[0]))

# visualize the cross mapping from the two shadow manifolds
ccm1.visualize_cross_mapping()
Figure 6: Shadow manifolds and cross mapping between prey (shown as X) and predator (shown as Y).

Figure 6 shows the two shadow manifolds for prey (X in these plots) and predator (Y in these plots). We observe that the shapes of the shadow manifolds preserve the general characteristics of the original manifold, as shown by the trajectories plotted in Figure 2. Figure 6 also shows the nearest-neighbor points sampled in each manifold (blue boxes) and their mapping to the other variable’s shadow manifold (red stars). We can see how similar points on one manifold correspond to similar points on the other in both directions.

We can also use the causal_ccm package to visualize the correlation between the prey and the CCM prey estimates, with very impressive results (Figure 7).

Figure 7: The relationship between observed populations and CCM estimates.

Concluding thoughts

This example demonstrates that CCM can indeed detect the causal relationship between predator and prey in this system, and in fact provides extremely accurate reconstructions of both population series. This shouldn’t come as a surprise, since we knew from the start that a strong causal relationship exists within this system. Still, I find it almost unnerving how good a job CCM does here. For more information on CCM and coding CCM, see the links below:

A great tutorial of CCM by the author of the causal_ccm Python package

Rohini’s blog post (with a demo of CCM in R)

Video links teaching CCM core concepts

References:

Hadjimichael, A., Reed, P., & Quinn, J. (2020). Navigating Deeply Uncertain Tradeoffs in Harvested Predator-Prey Systems. Complexity, 2020, 1-18. https://doi.org/10.1155/2020/4170453

Sugihara, G., May, R., Ye, H., Hsieh, C. H., Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338(6106), 496-500.