Exploring time-evolving vulnerability with the newly published interactive tutorial in the Addressing Uncertainty in MultiSector Dynamics Research eBook

We recently published two new Jupyter Notebook tutorials as technical appendices to our eBook on Addressing Uncertainty in MultiSector Dynamics Research developed as part of the Integrated MultiSector, Multiscale Modeling (IM3) project, supported by the Department of Energy Office of Science’s MultiSector Dynamics Program. The eBook provides an overview of diagnostic modeling, perspectives on model evaluation, and a framework for basic methods and concepts used in sensitivity analysis. The technical appendices demonstrate key concepts introduced in the text and provide example Python code to use as a starting point for future analysis. In this post, I’ll discuss the concepts introduced in one of the new Python tutorials, Time-evolving scenario discovery for infrastructure pathways. This post will focus on the notebook’s connections to topics in the main text of the eBook rather than detailing the code demonstrated in the notebook. For details on the test case and code used in the tutorial, see the tutorial I posted last year.

In this post, I’ll first give a brief overview of the example water supply planning test case used in the tutorial, then discuss the methodological steps used to explore uncertainty and vulnerability in the system. The main topics discussed in this post are the design of experiments, factor mapping, and factor prioritization.

The Bedford-Greene water supply test case

The Bedford-Greene infrastructure investment planning problem (Figure 1) is a stylized water resources test case designed to reflect the challenges faced when evaluating infrastructure systems that evolve over time and are subject to uncertain system inputs. The Bedford-Greene system contains two water utilities developing an infrastructure investment and management strategy to confront growing water demands and climate change. The test case was chosen for an eBook tutorial because it contains complex and evolving dynamics driven by strongly coupled human-natural system interactions. To explore these dynamics, the tutorial walks through an exploratory modeling experiment that evaluates how a large ensemble of uncertainties influences system performance and how these influences evolve over time.

In the Bedford-Greene system, modelers do not know or cannot agree upon the probability distributions of key system inputs, a condition referred to as “deep uncertainty” (Kwakkel et al., 2016). In the face of deep uncertainties, we perform an exploratory experiment to understand how a large ensemble of future scenarios may generate vulnerability for the water supply system.

Figure 1: The Bedford-Greene Water Resources Test Case

Setting up the computational experiment

The first step in the exploratory experiment is to define which factors in the mathematical model of the system are considered uncertain and how to sample the uncertainty space. The specific uncertain factors and their variability can be elicited through expert opinion, historical observation, values in literature, or physical meaning. In the Bedford-Greene system, we define 13 uncertain factors drawn from literature on real-world water supply systems (Gorelick et al., 2022). The uncertain factors used in this tutorial can be found in Table 1.

Factor Name | Plausible Range | Description
Near-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the first 15 years of the planning period
Mid-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the second 15 years of the planning period
Long-Term Demand Growth Rate Mult. | -0.25 to 2.0 | A scaling factor on projected demand growth over the third 15 years of the planning period
Bond Term | 0.8 to 1.2 | A scaling factor on the number of years over which infrastructure capital costs are repaid as debt service
Bond Interest Rate | 0.6 to 1.2 | A scaling factor that adjusts the fixed interest rate on bonds for infrastructure
Discount Rate | 0.8 to 1.2 | The rate at which infrastructure investment costs are discounted over time
Restriction Efficacy | 0.8 to 1.2 | A scaling factor on how effective water use restrictions are at reducing demands
Infrastructure Permitting Period | 0.75 to 1.5 | A scaling factor on estimated permitting periods for infrastructure projects
Infrastructure Construction Time | 1 to 1.2 | A scaling factor on estimated construction times for infrastructure projects
Inflow Amplitude | 0.8 to 1.2 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Frequency | 0.2 to 0.5 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Inflow Phase | -pi/2 to pi/2 | A sinusoidal scaling factor to apply non-stationary conditions to reservoir inflows
Table 1: Deep Uncertainties sampled for the Bedford-Greene System

After the relevant uncertainties and their plausible ranges have been identified, the next step is to define a sampling strategy. A sampling strategy is often referred to as a design of experiments, a term that dates back to the work of Ronald Fisher in the context of laboratory or field-based experiments (Fisher, 1936). The design of experiments is a methodological choice that should be carefully considered before starting computational experiments. The design of experiments should be chosen to balance the computational cost of the exploratory experiment with the amount of information needed to accurately characterize system vulnerability. An effective design of experiments allows us to explore complex relationships within the model and evaluate the interactions of system inputs. Five commonly used designs of experiments are overviewed in Chapter 3.3 of the main text of the eBook.

In the Bedford-Greene test case, we employ a Latin Hypercube Sampling (LHS) strategy, shown in Figure 2d. With this sampling technique for the 13 factors shown in Table 1, a 13-dimensional hypercube is generated, with each factor divided into an equal number of levels to obtain 2,000 different samples of future scenarios. A sample size of 2,000 was chosen here based on testing from similar water supply test cases (Trindade et al., 2020), but the sample size must be determined on a case-by-case basis. The LHS design guarantees sampling from every level of the uncertainty space without overlaps and will generate a diverse coverage of the entire space. When the number of samples is much greater than the number of uncertain factors, LHS effectively approximates the more computationally expensive full factorial sampling scheme shown in Figure 2a without needing to constrain samples to discrete levels for each factor, as done in fractional factorial sampling, shown in Figure 2c. For more details on each sampling scheme and general information on design of experiments, see Chapter 3.3 of the eBook.
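As a concrete illustration, the LHS design described above can be sketched with SciPy's quasi-Monte Carlo module. This is not the tutorial's actual code, just a minimal sketch: the factor names and ranges follow Table 1 (a subset of the 13 factors is used for brevity), and the sample size of 2,000 matches the text.

```python
import numpy as np
from scipy.stats import qmc

# Plausible ranges for a subset of the uncertain factors in Table 1
factor_ranges = {
    "near_term_demand_mult": (-0.25, 2.0),
    "mid_term_demand_mult": (-0.25, 2.0),
    "long_term_demand_mult": (-0.25, 2.0),
    "bond_term_mult": (0.8, 1.2),
    "restriction_efficacy_mult": (0.8, 1.2),
}

sampler = qmc.LatinHypercube(d=len(factor_ranges), seed=42)
unit_samples = sampler.random(n=2000)                # samples in the unit hypercube

lower = np.array([r[0] for r in factor_ranges.values()])
upper = np.array([r[1] for r in factor_ranges.values()])
lhs_samples = qmc.scale(unit_samples, lower, upper)  # rescale to plausible ranges

print(lhs_samples.shape)  # (2000, 5)
```

The `qmc.scale` call maps the unit-hypercube design onto each factor's plausible range; each factor is stratified into 2,000 equal levels with exactly one sample per level, which is the defining property of LHS.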

Figure 2: Alternative designs of experiments reproduced from Figure 3.3 of the eBook main text. a) full factorial design sampling of three factors at four levels with a total of 64 samples; b) the exponential growth of a necessary number of samples when applying full factorial design at four levels; c) fractional factorial design of three factors at four levels at a total of 32 samples; d) Latin Hypercube sample of three factors with uniform distributions for a total of 32 samples.

A final step in our experimental setup is to determine which model outputs are relevant to model users. In the Bedford-Greene test case, we specify five performance criteria along with performance thresholds that the water utilities would like their infrastructure investment policy to meet under all future conditions. The performance criteria and thresholds are shown in Table 2. These values are based on water supply literature, and relevant criteria and thresholds should be determined on a case-by-case basis.

Performance criteria | Threshold
Reliability | < 99%
Restriction frequency | > 20%
Worst-case cost | > 10% annual revenue
Peak financial cost | > 80% annual revenue
Stranded assets | > $5/kgal unit cost of expansion
Table 2: Performance criteria and thresholds
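One way to operationalize Table 2 is a simple threshold check that labels each simulated scenario as consequential (failing) or not. The helper below is hypothetical, not part of the tutorial; percentage criteria are expressed as fractions for convenience.

```python
# Hypothetical threshold check mirroring Table 2: a sampled scenario "fails"
# if any performance criterion crosses its threshold.
def classify_scenario(reliability, restriction_freq, worst_case_cost,
                      peak_financial_cost, stranded_assets):
    """Return 1 if the scenario is consequential (a failure), else 0."""
    failure = (
        reliability < 0.99             # reliability drops below 99%
        or restriction_freq > 0.20     # restrictions imposed too often
        or worst_case_cost > 0.10      # worst-case cost >10% of annual revenue
        or peak_financial_cost > 0.80  # peak cost >80% of annual revenue
        or stranded_assets > 5.0       # >$5/kgal unit cost of expansion
    )
    return int(failure)

print(classify_scenario(0.995, 0.10, 0.05, 0.50, 2.0))  # meets all criteria -> 0
print(classify_scenario(0.95, 0.10, 0.05, 0.50, 2.0))   # reliability failure -> 1
```

These binary labels are exactly what a scenario-discovery classifier is later trained to predict from the sampled uncertainties.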

Discovering consequential scenarios

To explore uncertainty in the Bedford-Greene system, we run the ensemble of uncertainties developed by our design of experiments through a water resources systems model and examine the outputs of each sampled combination of uncertainty. In this system, we’re interested in understanding 1) which uncertainties have the most impact on system vulnerability, 2) which combinations of uncertainties lead to consequential outcomes for the water supply system, and 3) how vulnerability evolves over time. Chapter 3.2 of the eBook introduces diagnostic approaches that can help us answer these questions. In this tutorial, we utilize gradient-boosted trees, a machine-learning algorithm that uses an ensemble of shallow trees to generate an accurate classifier (for more on boosting, see Bernardo’s post). Gradient-boosted trees are particularly well suited to infrastructure investment problems because they are able to capture nonlinear and non-differentiable boundaries in the uncertainty space, which often occur as a result of discrete capacity expansions. Gradient-boosted trees are also resistant to overfitting, easy to interpret, and provide a simple means of ranking the importance of uncertainties. For more background on gradient-boosted trees for scenario discovery, see this post from last year.
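To make the approach concrete, here is a minimal scenario-discovery sketch using scikit-learn's gradient-boosted trees on synthetic data. The "failure rule" and column indices are invented for illustration and stand in for the water resources model's outputs; this is not the tutorial's actual code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_samples, n_factors = 2000, 13
X = rng.uniform(0.0, 1.0, size=(n_samples, n_factors))  # sampled uncertainties

# Synthetic "vulnerability" rule: failure when demand growth is high and
# restriction efficacy is low (columns 0 and 6, chosen for illustration)
y = ((X[:, 0] > 0.7) & (X[:, 6] < 0.4)).astype(int)

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
clf.fit(X, y)

# Factor ranking from impurity-based feature importance
ranking = np.argsort(clf.feature_importances_)[::-1]
print("Most important factors:", ranking[:2])
```

Refitting the classifier on outputs from different time periods, as the tutorial does, yields a time-evolving factor ranking like the one in Figure 3.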

Gradient-boosted trees provide a helpful measure of feature importance, the percentage decrease in the impurity of the ensemble of trees associated with each factor. We can use this measure to examine how each uncertainty contributes to the performance of the region’s infrastructure investment and management policy. Infrastructure investments fundamentally alter the water utilities’ storage-to-capacity ratios and levels of debt burden, which will impact their vulnerability over time. To account for these changes, we examine feature importance over three different time periods. The results of our exploratory modeling process are shown in Figure 3. We observe that the importance of various uncertainties evolves over time for both water utilities. For example, while near-term demand growth is a key factor for both utilities in all three time periods, restriction effectiveness is a key uncertainty for Bedford in the near- and mid-term but not in the long-term, likely indicating that infrastructure investment reduces the utility’s need to rely on water use restrictions. Greene is not sensitive to restriction effectiveness in the near-term or long-term, but is very sensitive in the mid-term. This likely indicates that the utility uses restrictions as a bridge to manage high demands before infrastructure investments have been fully constructed.

Figure 3: factor importance for the two utilities. Darker colors indicate that uncertainties have higher predictive value for discovering consequential scenarios.

To learn more about how vulnerability for the two water utilities evolves, we use factor mapping (eBook Chapter 3.2) to delineate regions of the uncertainty space that lead to consequential model outputs. The factor maps in Figures 4 and 5 complement the factor ranking in Figure 3 by providing additional information about which combinations of uncertainties generate vulnerability for the two utilities. While near-term demand growth and restriction effectiveness appear to generate vulnerability for Bedford in the near-term, Figure 4 reveals that the vast majority of sampled future states of the world meet the performance criteria. When evaluated using a 22-year planning horizon, however, failures emerge as a consequence of high demand and low restriction effectiveness. When evaluated across a 45-year planning horizon, the utility appears extremely vulnerable to high demand, indicating that the infrastructure investment policy is likely insufficient to maintain water supply reliability.

Figure 4: Factor maps for Bedford
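To sketch how a factor map like those above is produced, the snippet below (again with synthetic data and an invented failure rule, not the tutorial's code) evaluates a fitted boosted-tree classifier over a grid of the two most important factors while holding the others at their midpoints; contouring the resulting failure probabilities delineates the vulnerable region.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(2000, 13))
y = ((X[:, 0] > 0.7) & (X[:, 6] < 0.4)).astype(int)  # synthetic failure rule

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 random_state=0).fit(X, y)

# Grid over factor 0 ("demand growth") and factor 6 ("restriction efficacy")
g0, g6 = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.full((g0.size, 13), 0.5)           # hold remaining factors at midpoint
grid[:, 0], grid[:, 6] = g0.ravel(), g6.ravel()

failure_prob = clf.predict_proba(grid)[:, 1].reshape(g0.shape)
# Contouring failure_prob at 0.5 (e.g., with matplotlib) yields the factor map
print(failure_prob.max() > 0.9, failure_prob.min() < 0.1)
```

Repeating this over different planning horizons is what reveals the time-evolving failure regions discussed in the text.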

Greene’s factor maps tell a different story. In the near-term, the utility is vulnerable to high-demand scenarios. In the mid-term, the vulnerable regions have transformed, and two failure modes are apparent. First, the utility is vulnerable to a combination of high near-term demand and low restriction effectiveness, indicating the potential for water supply reliability failures. Second, the utility is vulnerable to low-demand scenarios, highlighting a potential financial failure from over-investment in infrastructure. When analyzed across the 45-year planning horizon, the utility is vulnerable to only low-demand futures, indicating a severe financial risk from over-investment. These factor maps provide important context to the factor priorities shown in Figure 3. While the factor prioritization does highlight the importance of demand growth for Greene, it does not indicate which ranges of uncertainty generate vulnerability. Evaluating the system across time reveals that though the utility is always sensitive to demand growth, the consequences of demand growth and the range that generates vulnerability completely transform over the planning period.

Figure 5: Factor maps for Greene

Concluding thoughts

The purpose of this post was to provide additional context to the eBook tutorial on time-evolving scenario discovery. The Bedford-Greene test case was chosen because it represents a tightly coupled human-natural system with complex and nonlinear dynamics. The infrastructure investments made by the two water utilities fundamentally alter the system’s state dynamics over time, necessitating an approach that can capture how vulnerability evolves. Through a carefully designed computational experiment and scenario discovery using gradient-boosted trees, we discover multiple failure modes for both water utilities, which can help regional decision-makers monitor policy performance and adapt to changing conditions. While each application will be different, the code in this tutorial can be used as a starting point for applying this methodology to other human-natural systems. As with all tutorials in the eBook, the Jupyter notebook ends with a section on how to apply this methodology to your problem.

References

Kwakkel, J. H., Walker, W. E., & Haasnoot, M. (2016). Coping with the wickedness of public policy problems: approaches for decision making under deep uncertainty. Journal of Water Resources Planning and Management, 142(3), 01816001.

Fisher, R. A. (1936). The design of experiments. British Medical Journal, 1(3923), 554.

Trindade, B. C., Gold, D. F., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2020). Water pathways: An open source stochastic simulation system for integrated water supply portfolio management and infrastructure investment planning. Environmental Modelling & Software, 132, 104772.

Causality for Dummies

The name of this post speaks for itself – this will be a soft introduction to causality, and is intended to increase familiarity with the ideas, methods, applications and limitations of causal theory. Specifically, we will walk through a brief introduction of causality and compare it with correlation. We will also distinguish causal discovery from causal inference, describe the types of datasets commonly encountered, and provide a (very) brief overview of the methods that can be applied to both. This post also includes a list of commonly-encountered terms within the literature, as well as prior posts written on this topic.

Proceed if you’d like to be less of a causality dummy!

Introducing Causality

Note: The terms “event” and “variable” will be used interchangeably in this post.

Causality has roots and applications in a diverse set of fields: philosophy, economics, mathematics, neuroscience – just to name a few. The earliest formalization of causality can be attributed to Aristotle, while modern causality as we understand it stems from David Hume, an 18th-century philosopher (Kleinberg, 2012). Hume argued that causal relationships are inferred from observations and are conditional upon the observer’s beliefs and perceptions. Formally, however, causation can be defined as the contribution of an event to the production of other events. Causal links between events can be uni- or bi-directional – that is,

  1. Event X causes event Y, but not vice versa (uni-directional), or
  2. Event X causes event Y, and event Y causes event X (bi-directional).

Such causal links have to be first detected and then quantified to confirm the existence of a causal relationship between two events or variables. The strength of these relationships is typically measured using some form of score-based strength measure (where the score has to exceed a specific threshold for it to be “counted” as a causal link), or using statistical models to test for conditional independence.

Before we get into the details of the methods that enable this quantification, let’s confront the elephant in the room:

Correlation vs Causation

The distinction between correlation and causation is often muddled. Fortunately, there exists a plethora of analogies as to why correlation is not causation. My favorite so far is the global warming and pirates analogy, in which rising global mean temperature turns out to be strongly (negatively) correlated with the steadily declining number of pirates. To demonstrate my point:

This plot is trying to tell you something that will save the Earth. (source: Baker, 2020)

…so does this mean that I should halt my PhD and join the dwindling ranks of pirates to save planet Earth?

Well, no (alas). This example demonstrates how relationships between highly-correlated variables can be purely coincidental, with no causal links. Instead, there are external factors that may have actually caused the number of pirates to decrease and, independently, the global mean temperature to rise. Dave’s, Rohini’s and Andrew’s blog posts, as well as this website, provide more examples that further demonstrate this point. Conversely, two variables that are causally linked may not be correlated. This can happen when there is a lack of change in the variables being measured, sometimes caused by insufficient or manipulated sampling (Penn, 2018).

Correlation is limited in that it only captures linear relationships between two variables. It also does not account for the time-ordering or time-lags of the variables’ time series. As a result, depending on correlation alone to assess a system may cause you to overlook many of the relationships that exist within it. Correlation is therefore a useful tool for making predictions, where trends in one variable can be used to predict the trends of another. Causation, on the other hand, can be used for making decisions, as it helps to develop a better understanding of the cause and effect of changes made to system variables.
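A small numerical example of the linearity limitation: below, Y is completely determined by X, yet the Pearson correlation between them is essentially zero.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 1001)
y = x ** 2                       # y depends on x perfectly, but nonlinearly

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {r:.6f}")  # essentially zero
```

A correlation-only analysis would conclude these variables are unrelated, which is exactly the kind of relationship causal methods (and nonlinear dependence measures) aim to recover.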

Now that we’ve distinguished causation from its close (but troublesome) cousin, let’s begin to get into the meat of causality with some common terms you might encounter as you begin exploring the literature.

A quick glossary

The information from this section is largely drawn from Alber (2022), Nogueira et al. (2022), Runge et al. (2019), and Weinberger (2018). It should not be considered exhaustive, by any measure, and should only be used to get your bearings as you begin to explore causality.

Causal faithfulness (“faithfulness”): The assumption that causally-connected events are probabilistically dependent on each other.

Causal Markov condition: The assumption that, in a graphical model, Event Y is independent of every other event, conditional on Y’s direct causes.

Causal precedence: The assumption that Event A can cause Event B only if A happens before B. That is, events in the present cannot have caused events in the past.

Causal sufficiency: The assumption that all possible direct common causes of an event, or changes to a variable, have been observed.

Conditional independence: Two events or variables X and Y are conditionally independent given an additional event or variable Z if, once Z is known, X provides no further information about Y (and vice versa).

Confounders: “Interfering” variables that influence both the dependent and independent variables, therefore making it more challenging, or confounding, to verify the presence of a causal relationship.

Contemporaneous links: A causal relationship that exists between variables in the same time step – hence “con” (together) + “temporaneous” (in time). The existence of such links is one instance in which the causal precedence assumption is broken.

Directed acyclic graphs (DAGs): A graphical representation of the causal relationships between a set of variables or events. These relationships are known to be, or assumed to be, true.

Edges: The graphical representation of the links connecting two variables or events in a causal network or graph. These edges may or may not represent causal links.

Granger causality: Event X Granger-causes Event Y if predicting Y based on its own past observations and past observations of X performs better than predicting Y solely on its own past observations.

Markov equivalence class: A set of graphs that represent the same patterns of conditional independence between variables.

Nodes: The graphical representation of events (or variables) in a causal network or graph.

Causality: Discovery, Inference, Data and Metrics

Causality and its related methods are typically used for two purposes: causal discovery and causal inference. Explanations for both are provided below.

Causal Discovery

Also known as causal structure learning (Tibau et al., 2022), the goal of causal discovery is to obtain causal information directly from observed or historical data. Methods used for causal discovery do not assume implicit causal links between the variables within a dataset. Instead, they begin with a “clean slate” and attempt to generate (then analyze) models that illustrate the inter-variable links inherent to the dataset, preserving them in the process. The end goal of causal discovery is to approximate a graph that represents the presence or absence of causal relationships between a set of two or more nodes.

Causal Inference

Causal inference uses (rather than generates) causal graphs, focusing on thoroughly testing the truth of the causal relationship between two variables. Unlike causal discovery, it assumes that a causal relationship already exists between two variables. Following this assumption, it tests and quantifies the actual relationships between variables in the available data. It is useful for assessing the impact of one event, or a change in one variable, on another, and can be applied to studying the possible effects of altering a given system. Here, causal inference should not be confused with sensitivity analysis: the intended use of sensitivity analysis is to map changes in model output to changes in the model’s underlying assumptions, parameterizations, and biases, while causal inference is focused on assessing the cause-and-effect relationships between variables or events in a system or model.

Data used in causal discovery and causal inference

There are two forms of data that are typically encountered in literature that use causal methods:

Comparing cross-sectional data and longitudinal data (source: Scribbr).

Cross-sectional data

Data in this form is non-temporal. Put simply, all variables in a cross-sectional dataset represent a single point in time. The dataset may contain observations of multiple individuals, where each observation represents one individual and each variable contains information on a different aspect of that individual. The assumption of causal precedence does not hold for such datasets, which therefore require additional processing to develop causal links between variables. Such datasets are handled using methods that measure causality with advanced variations of conditional independence tests (more on this later). An example of a cross-sectional dataset is the census data collected by the U.S. Census Bureau once every decade.

Longitudinal data

This form of data is a general category that includes time-series data, which consists of a series of observations of a (usually) single subject across some time period. Such datasets are relatively easy to handle compared to cross-sectional data, as the causal precedence assumption is (often) met. Most causality methods can therefore be used on longitudinal data; some common methods include the basic forms of Granger causality, convergent cross mapping (CCM), and fast causal inference (FCI). An example of longitudinal data would be historical rainfall gauge records of precipitation over time.

Causality Methods

The information from this section largely originates from Runge et al. (2019), Noguiera et al. (2022), and Ombadi et al. (2020).

There are several methods that can be used to discover or infer causality from the aforementioned datasets. In general, these methods identify and extract causal interactions from observed data. Their outcomes make it possible to filter the relevant drivers (variables that cause observed outcomes) from the larger set of potential ones, and to clarify inter-variable relationships that are often muddied by correlations. These methods measure causality in one of two ways:

  1. Score-based methods: Such methods assign a “relevance score” to rank each proposed causal graph based on the likelihood that it accurately represents the conditional (in)dependencies between variables in a dataset. Without additional phases to refine their algorithms, these methods are computationally expensive, as all potential graph configurations have to be ranked.
  2. Constraint-based methods: These methods employ a number of statistical tests to identify “necessary” causal graph edges and their corresponding directions. While less computationally expensive than basic score-based methods, constraint-based causality methods are limited to evaluating causal links for one node (representing one variable) at a time. They therefore cannot evaluate multiple variables and potential multivariate causal relationships at once, and their computational expense grows with the number of variables in the dataset.
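To make the constraint-based idea concrete, here is a toy conditional independence test based on partial correlation (valid under linear-Gaussian assumptions; real constraint-based methods use more general tests). In this synthetic setup, Z is a common driver of X and Y, so X and Y are correlated but conditionally independent given Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)              # common driver (confounder)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z from both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)   # residual of x given z
    ry = y - np.polyval(np.polyfit(z, y, 1), z)   # residual of y given z
    return np.corrcoef(rx, ry)[0, 1]

print(round(np.corrcoef(x, y)[0, 1], 2))   # strongly (negatively) correlated
print(round(partial_corr(x, y, z), 2))     # near zero: independent given z
```

A constraint-based algorithm would use such a test to conclude there is no direct X–Y edge once Z is accounted for.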

Now that we have a general idea of what causality methods can help us do, let’s dive into a few general classes of causality methods.

Granger Causality

Often cited as one of the earliest mathematical formalizations of causality, Granger causality is a statistical hypothesis test of whether one time series is useful in forecasting another, and is thus based on prediction. It was introduced by Clive Granger in the 1960s, has been widely used in economics since, and more recently has found applications in neuroscience and climate science (see Runge et al. 2018, 2019). Granger causality can be used to characterize predictive causal relationships and to measure the influence of system or model drivers on variables’ time series. These relationships and influences can be uni-directional (where X Granger-causes Y, but not Y G.C. X) or bi-directional (X G.C. Y, and Y G.C. X).

A time series demonstration on how variable X G.C. Y (source: Google).

Because it measures predictive causality, Granger causality is limited to systems with independent driving variables. It cannot be used to assess multivariate systems or systems in which conditional dependencies exist, both of which are characteristic of real-world stochastic and nonlinear processes. It also requires separability, where the causal variable (the cause) has to be independent of the influenced variable (the effect), and it assumes causal precedence. It is nonetheless a useful preliminary tool for causal discovery.
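A minimal sketch of the idea behind Granger causality (not a full F-test): compare the forecast error of Y using only its own past against the error when lagged X is added. The synthetic data below, with X driving Y at a lag of one step, is an invented illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # X drives Y with a one-step lag, plus persistence and noise
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

def sse(features, target):
    """Sum of squared residuals of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(features, target, rcond=None)
    return np.sum((target - features @ beta) ** 2)

ones = np.ones(n - 1)
restricted = np.column_stack([ones, y[:-1]])            # Y's own past only
unrestricted = np.column_stack([ones, y[:-1], x[:-1]])  # ...plus X's past

improvement = 1.0 - sse(unrestricted, y[1:]) / sse(restricted, y[1:])
print(f"Error reduced by adding lagged X: {improvement:.1%}")
```

A large reduction in forecast error suggests X Granger-causes Y; a formal test (e.g., `statsmodels`' `grangercausalitytests`) wraps this comparison in an F-statistic with more lags.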

Nonlinear state-space methods

An alternative to Granger causality is the family of nonlinear state-space methods, which includes convergent cross mapping (CCM). Such methods assume that variable interactions occur in an underlying deterministic dynamical system, and then attempt to uncover causal relationships by reconstructing the nonlinear state-space, based on Takens’ theorem (see Dave’s blog post here for an example). The key idea here is this: if event X can be predicted using time-delayed information from event Y, then X had a causal effect on Y.

Visualization on how nonlinear state-space methods reconstruct the dynamic system and identify causal relationships (source: Time Series Analysis Handbook, Chapter 6)

Convergent Cross Mapping (CCM)

Convergent Cross Mapping (Sugihara et al., 2012; Ye et al., 2015) tests the reliability of a variable Y as an estimate or predictor of variable X, revealing weak nonlinear interactions between time series which might otherwise have been missed (Delforge et al., 2022). The CCM algorithm involves generating the system manifold M, the X-variable shadow manifold MX, and the Y-variable shadow manifold MY. The algorithm then samples an arbitrary set of nearest-neighbor (NN) points from MX and determines whether they correspond to neighboring points in MY. If X and Y are causally linked, they should share M as a common “attractor” manifold (a system state that can be sustained in the short-term). Variable X can therefore be said to inform variable Y, but not vice versa (a unidirectional link). CCM can also be used to detect causality due to external forcing, where X and Y do not interact but may be driven by a common external variable Z. It is therefore best suited for causal discovery.

While CCM does not rely on conditional independence, it assumes the existence of a deterministic (but unknown) underlying system (e.g., a computer program or physics-based model) which can be represented using M. It therefore does not work well for stochastic time series (e.g., streamflow, coin tosses, bacteria population growth). The predictive ability of CCM is also vulnerable to noise in the data. Furthermore, it requires a long time series to be a reliable measure of causality between two variables, as a longer series decreases the NN-distance on each manifold, improving the ability of CCM to predict causality.
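The time-delay embedding at the heart of CCM can be sketched in a few lines: a shadow manifold such as MX is just a matrix of lagged copies of X. The embedding dimension E and lag tau below are arbitrary choices for illustration, and the signal is a toy periodic series rather than real data.

```python
import numpy as np

def shadow_manifold(series, E=3, tau=1):
    """Embed a 1-D series into E-dimensional delay vectors with lag tau."""
    n = len(series) - (E - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(E)])

x = np.sin(np.linspace(0, 20 * np.pi, 1000))  # toy periodic signal
MX = shadow_manifold(x, E=3, tau=5)
print(MX.shape)  # (990, 3)
```

The full CCM algorithm then finds nearest neighbors among these delay vectors on MX and checks whether their time indices also pick out neighboring points on MY, with prediction skill that converges as the series lengthens.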

Causal network learning algorithms

In contrast to CCM, causal network learning algorithms assume that the underlying system in which the variables arise is purely stochastic. This class of algorithms adds or removes causal graph edges using criteria based on conditional or mutual independence, and assumes that both the causal Markov and faithfulness conditions hold for all proposed graph structures. These algorithms are therefore best used for causal inference, and can be applied to cross-sectional data as well as linear and nonlinear time series.

These algorithms result in a “best estimate” graph in which all edges have the associated optimal conditional independences that best reflect the observed data. They employ two stages: the skeleton discovery phase, where non-essential links are eliminated, and the orientation phase, where the directionality of the causal links is finalized. Because of this, these algorithms can be used to reconstruct large-scale, high-dimensional systems with interacting variables. Some of the algorithms in this class are also capable of identifying the direction of contemporaneous links, and are thus not beholden to the causal precedence assumption. However, such methods can only estimate graphs up to a Markov equivalence class, and require a longer time series to provide a better prediction of causal relationships between variables.

General framework of all causal network learning algorithms (source: Runge et al., 2019).

Let’s go over a few examples of causal network learning algorithms:

The PC (Peter-Clark) algorithm

The PC algorithm begins with a fully connected graph in its skeleton phase and iteratively removes edges where conditional independences exist. It then orients the remaining edges in its orientation phase. In its earliest form, the PC algorithm was limited by its assumption of causal sufficiency and its lack of contemporaneous dependency handling. It also did not scale well to high-dimensional data.

Later variations attempted to overcome these limitations. For example, the momentary conditional independence PC (PCMCI) and PCMCI+ algorithms added a further step to determine causal links between variables at different timesteps and to find lagged and contemporaneous relationships separately, thereby handling contemporaneity. The PC-select variation introduced the ability to apply conditional independence tests on target variables, allowing it to process high-dimensional data. These variations can also eliminate spurious causal links. However, the PC algorithm and its variants still depend on the causal Markov, faithfulness, and sufficiency assumptions. The causal links that it detects are also relative to the feature space (Uereyen et al., 2022). This means that the directionality (or existence) of these links may change if new information is introduced to the system.

Fast causal inference (FCI)

Unlike the PC-based algorithms, FCI does not require that causal sufficiency be met, although it, too, is based on iterative conditional independence tests and begins with a complete graph. Another differentiating feature between PC and FCI is that the latter does not assume the directionality of causal links. Instead of the uni- or bi-directional orientation that the PC algorithm eventually assigns to its causal graph edges, FCI has four edge interpretations to account for the possibility of spurious links. Given variables X and Y, FCI’s edge interpretations are as follows:

  1. X causes Y
  2. X causes Y or Y causes X
  3. X causes Y or there are unmeasured confounding variables
  4. X causes Y, Y causes X, there are unmeasured confounding variables, or some combination of these

There are also several FCI variations that allow improved handling of large datasets, high dimensionality, and contemporaneous variables. For example, the Anytime and Adaptive Anytime FCI restrict the maximum number of variables considered as drivers, and the time series FCI (TsFCI) uses sliding windows to transform the original, long time series into a set of “independent” subsamples that can be treated as cross-sectional. To effectively use FCI, however, the data should be carefully prepared using Joint Causal Inference (JCI), which allows the generated graph to include both variable and system information, accounting for background knowledge about the system (Mooij et al., 2020).

Structural causal models (SCMs)

Similar to causal network learning algorithms, SCMs assume a purely stochastic underlying system and use DAGs to model the flow of information. They can also detect causal graphs to within a Markov equivalence class. Unlike causal network learning algorithms, however, SCMs structure DAGs as a set of endogenous (Y) and exogenous (X) variables connected by a set of functions (F) that determine the values of Y based on the values of X. Within this context, a node represents a variable x or y in X or Y, while an edge represents a function f in F. By doing so, SCMs enable the discovery of causal directions in cases where the direction cannot be inferred with conditional independence-based methods. SCMs can also handle a wide range of systems (linear, nonlinear, various noise probability distributions). This flexibility comes at a cost, however: SCMs require that some information on the underlying structure of the system be known a priori (e.g. the system is assumed to be linear, with at least one of the noise terms drawn from a Gaussian distribution). SCMs are best used for causal inference, as causal links between variables have to be assumed during their generation.
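
To illustrate the exogenous/endogenous structure, here is a toy SCM sketch (the variable names, coefficients, and functions are invented for illustration). The exogenous noise terms are drawn first, and each endogenous variable is computed by its structural function; overriding a structural assignment with a fixed value simulates an intervention, which is what lets SCMs express causal direction:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_scm(n, do_x=None):
    """Toy linear SCM:  X := U_x,  Y := 2*X + U_y.
    U_x and U_y are the exogenous variables; X and Y are endogenous.
    Passing do_x overrides X's structural assignment, simulating do(X = do_x)."""
    u_x = rng.normal(size=n)                 # exogenous noise terms
    u_y = rng.normal(size=n)
    x = np.full(n, float(do_x)) if do_x is not None else u_x
    y = 2.0 * x + u_y                        # structural function f_Y in F
    return x, y

# Observationally, X and Y are strongly correlated...
x_obs, y_obs = sample_scm(5000)
# ...and under do(X = 1) the mean of Y shifts to about 2, while intervening on Y
# would leave X untouched: that asymmetry encodes the causal direction X -> Y.
x_int, y_int = sample_scm(5000, do_x=1.0)
```

Note that the causal structure (which variables appear in which function) is assumed up front, which is the a priori knowledge requirement discussed above.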

Information-theoretic algorithms

Finally, we have information-theoretic (IT) algorithms, which can be considered an extension of the GC methods. They allow the verification of nonlinear relationships between system variables and are therefore best used for causal inference. IT algorithms measure transfer entropy (TE), defined as the amount of information shared between variables X and Y when both are conditioned on an external variable Z. The magnitude of TE reflects the reduction in the Shannon entropy of Y when, given Z, information on X is added to the system. For further information on IT and TE, Andrew’s blog post and Keyvan’s May 2020 and June 2020 posts expand on the theory and application of both concepts.
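
For intuition, here is a minimal plug-in estimator of lag-1 transfer entropy for two binary series. It is a simplified sketch: it drops the external conditioning variable Z from the definition above and estimates the probabilities by simple counting:

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, sink):
    """Lag-1 transfer entropy TE(source -> sink) for binary series, in bits:
    the extra information source[t] gives about sink[t+1] beyond sink[t]."""
    triples = Counter(zip(sink[1:], sink[:-1], source[:-1]))
    pairs = Counter(zip(sink[1:], sink[:-1]))
    hist = Counter(zip(sink[:-1], source[:-1]))
    marg = Counter(sink[:-1])
    n = len(source) - 1
    te = 0.0
    for (s1, s0, x0), c in triples.items():
        p_joint = c / n
        p_cond_full = c / hist[(s0, x0)]          # p(s1 | s0, x0)
        p_cond_self = pairs[(s1, s0)] / marg[s0]  # p(s1 | s0)
        te += p_joint * np.log2(p_cond_full / p_cond_self)
    return float(te)

# y copies x with a one-step delay, so information flows x -> y only
rng = np.random.default_rng(1)
x = rng.integers(0, 2, 5000)
y = np.roll(x, 1)
y[0] = 0
te_xy = transfer_entropy(x, y)   # close to 1 bit
te_yx = transfer_entropy(y, x)   # close to 0 bits
```

Because y[t+1] is exactly x[t], knowing x removes all uncertainty about y's next value (about 1 bit here), while the reverse direction carries essentially no information, so the estimator recovers the direction of information flow.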

A few assumptions come along with the use of IT algorithms. First, like both SCMs and causal network learning algorithms, they assume that the underlying system is purely stochastic. They are also bound to causal precedence, and assume that the causal variable X provides all useful information for the prediction of the effect Y, given Z. In addition, IT algorithms benefit from longer time series, which improve predictions of causal links between variables. On the other hand, they make no assumptions about the underlying structure of the data and can detect both linear and nonlinear causal relationships.

Prior WaterProgramming blog posts

That was a lot! But if you would like a more detailed dive into causality, or want to explore some toy problems, there are a number of solid blog posts that focus on the underlying math and concepts central to the approaches used in causal discovery and inference:

  1. Introduction to Granger Causality
  2. Introduction to Convergent Cross Mapping
  3. Detecting Causality using Convergent Cross Mapping: A Python Demo using the Fisheries Game
  4. Causal Inference Using Information-Theoretic Approaches
  5. Information Theory and the Moment-Independent Sensitivity Indices
  6. Milton Friedman’s thermostat and sensitivity analysis of control policies

Summary and key challenges

In this blog post, we introduced causality and compared it to correlation. We provided a glossary of commonly-used terms in the causality literature, and distinguished causal discovery from causal inference. Next, we explored a number of commonly-used causality methods: Granger causality, CCM, conditional independence-based causal network learning algorithms, SCMs, and information-theoretic algorithms.

From this overview, it can be concluded that methods to discover and infer causal relationships are powerful tools that enable us to identify cause-and-effect links between seemingly unrelated system variables. Improvements to these methods are pivotal to improving climate models, increasing AI explainability, and aiding better, more transparent decision-making. Nevertheless, these methods face challenges (Tibau et al., 2022) that include, but are not limited to:

  1. Handling gridded or spatio-temporally aggregated data
  2. Representing nonlinear processes that may interact across time scales
  3. Handling non-Gaussian variable distributions and data non-stationarity
  4. Handling partial observability where only a subset of system variables is observed, thus challenging the causal sufficiency assumption
  5. Uncertainty: Non-stationarity, noise, internal variability
  6. Dealing with mixed data types (discrete vs. continuous)
  7. Lack of benchmarking approaches due to lack of ground truth data

This brings us to the end of the post – do take a look at the References for a list of key literature and online articles that will be helpful as you begin learning about causality. Thank you for sticking with me and happy exploring!

References

Alber, S. (2022, February 9). Directed Acyclic Graphs (DAGs) and Regression for Causal Inference. UC Davis Health. Davis, California. Retrieved March 14, 2023, from https://health.ucdavis.edu/ctsc/area/Resource-library/documents/directed-acyclic-graphs20220209.pdf

Baker, L. (2020, July 9). Hilarious graphs (and pirates) prove that correlation is not causation. Medium. Retrieved March 14, 2023, from https://towardsdatascience.com/hilarious-graphs-and-pirates-prove-that-correlation-is-not-causation-667838af4159

Delforge, D., de Viron, O., Vanclooster, M., Van Camp, M., & Watlet, A. (2022). Detecting hydrological connectivity using causal inference from time series: Synthetic and real Karstic case studies. Hydrology and Earth System Sciences, 26(8), 2181–2199. https://doi.org/10.5194/hess-26-2181-2022

Gonçalves, B. (2020, September 9). Causal inference - part IV - structural causal models. Medium. Retrieved March 13, 2023, from https://medium.data4sci.com/causal-inference-part-iv-structural-causal-models-df10a83be580

Kleinberg, S. (2012). A Brief History of Causality (Chapter 2) – Causality, Probability, and Time. Cambridge Core. Retrieved March 14, 2023, from https://www.cambridge.org/core/books/abs/causality-probability-and-time/brief-history-of-causality/C87F30B5A6F4F63F0C28C3156B809B9E

Mooij, J. M., Magliacane, S., & Claassen, T. (2020). Joint Causal Inference from Multiple Contexts. Journal of Machine Learning Research, 21. https://doi.org/10.48550/arXiv.1611.10351

Nogueira, A. R., Pugnana, A., Ruggieri, S., Pedreschi, D., & Gama, J. (2022). Methods and tools for causal discovery and causal inference. WIREs Data Mining and Knowledge Discovery, 12(2). https://doi.org/10.1002/widm.1449

Ombadi, M., Nguyen, P., Sorooshian, S., & Hsu, K. (2020). Evaluation of methods for causal discovery in hydrometeorological systems. Water Resources Research, 56(7). https://doi.org/10.1029/2020wr027251

Penn, C. S., (2020, August 25). Can causation exist without correlation? Yes! Christopher S. Penn – Marketing Data Science Keynote Speaker. Retrieved March 14, 2023, from https://www.christopherspenn.com/2018/08/can-causation-exist-without-correlation/

Runge, J. (2018). Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7), 075310. https://doi.org/10.1063/1.5025050

Runge, J., Bathiany, S., Bollt, E., Camps-Valls, G., Coumou, D., Deyle, E., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., van Nes, E. H., Peters, J., Quax, R., Reichstein, M., Scheffer, M., Schölkopf, B., Spirtes, P., Sugihara, G., Sun, J., Zscheischler, J. (2019). Inferring causation from time series in Earth System Sciences. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-10105-3

Sugihara, G., May, R., Ye, H., Hsieh, C.-hao, Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338(6106), 496–500. https://doi.org/10.1126/science.1227079

Tibau, X.-A., Reimers, C., Gerhardus, A., Denzler, J., Eyring, V., & Runge, J. (2022). A spatiotemporal stochastic climate model for benchmarking causal discovery methods for teleconnections. Environmental Data Science, 1. https://doi.org/10.1017/eds.2022.11

Uereyen, S., Bachofer, F., & Kuenzer, C. (2022). A framework for multivariate analysis of land surface dynamics and driving variables—a case study for Indo-Gangetic River basins. Remote Sensing, 14(1), 197. https://doi.org/10.3390/rs14010197

Weinberger, N. (2017). Faithfulness, coordination and causal coincidences. Erkenntnis, 83(2), 113–133. https://doi.org/10.1007/s10670-017-9882-6

Ye, H., Deyle, E. R., Gilarranz, L. J., & Sugihara, G. (2015). Distinguishing time-delayed causal interactions using convergent cross mapping. Scientific Reports, 5(1). https://doi.org/10.1038/srep14750

Introducing the GRRIEn Analysis Framework: Defining standards for reproducible and robust supervised learning of earth surface processes at large spatial scales

I’m very excited to write this blogpost to highlight the GRRIEn framework, developed in part by Dr. Elizabeth Carter at Syracuse University along with collaborators Carolynne Hultquist and Tao Wen. I first saw Liz present the GRRIEn framework at the Frontiers in Hydrology conference last year and knew it was a framework I wanted to eventually demo on the blog. The associated manuscript (now published here) introduces the GRRIEn (Generalizable, Reproducible, Robust, and Interpreted Environmental) framework: a cheat sheet of best management practices for extracting generalizable insight on environmental systems using supervised learning of global environmental observations (EOs; satellites and coupled earth systems models), including standards for reproducible data engineering, robust model training, and domain-specialist interpreted model diagnostics (Carter et al., 2023).

In short, this is a paper that I wish existed when I first started diving into data science and machine learning in graduate school. If you are a student/have a student who is new to statistics, data science, and machine learning for environmental applications, I highly recommend this paper as the single most useful one-stop shop read. Below, I will introduce a bit about Liz and her group, the steps of the framework, and include some of the awesome ways that Liz has incorporated GRRIEn into her classes.

The CHaRTS Lab

Dr. Carter is based at Syracuse University and runs the Climate Hazards Research Team Syracuse (CHaRTS). CHaRTS uses proxy observations of the hydrologic cycle from satellites, photographs, and earth systems models to support fair and stable management of water resources and hydroclimatic risk in the twenty-first century. I talked to Liz about creating the GRRIEn framework and the results that she has seen in the classroom.

Why did you decide to create GRRIEn?

“GRRIEn was created as a teaching tool for my graduate class, Environmental Data Science (EDS). EDS was intended to equip students with the computation tools they would need to use global earth observations for thesis/dissertation research. Some of my students have an interest in spatial statistics and machine learning, but most of them are domain specialists who want to use global earth observations to learn more about very specific environmental processes. The GRRIEn method was created for these students, the domain specialists. The goal is to reduce barriers to accessing insight from global earth observations associated with computational methods and advanced statistical techniques. For example, many students struggle to find and process spatial data. In EDS, they adopt a standard reproducible computational pipeline in the form of a GitHub repository for accessing raw data and creation of an analysis-ready dataset. For students who have limited statistical skills, I teach a few simple model-agnostic tools to diagnose and control for feature and observation dependence in multivariate observational spatiotemporal datasets. Most importantly, I teach methods in model interpretation. We need domain specialists to evaluate whether data-driven models are a physically plausible representation of environmental processes if we want to use global earth observations for knowledge discovery.”

What kind of results have you seen as you incorporate GRRIEn in the classroom?

“My favorite part of EDS is seeing students going from never having written a line of code to processing massive volumes of data. Because so much of the process has been constrained by the GRRIEn method, students can really spread their wings in terms of application. For the past three years, I’ve had several students take their EDS project to conferences like AGU, or submit it for publication.”

Overview of the GRRIEn Framework

As the volume of earth observations from satellites, global models, and the environmental IoT continues to grow, so does the potential of these observations to help scientists discover trends and patterns in environmental systems at large spatial scales. However, many emerging global datasets are too large to store and process on personal computers, and scale-dependent qualities of spatial processes can reduce the robustness of traditional statistical methods. Domain specialists therefore need targeted, accessible exposure to skills in reproducible scientific computing and spatial statistics in order to use global observations from satellites, earth systems models, and the environmental IoT to generalize insights from in-situ field observations across unsampled times and locations. The GRRIEn (Generalizable, Reproducible, Robust, and Interpreted Environmental) framework was developed for this purpose (Carter et al., 2023). Below are the four key components of the framework, but please refer to the manuscript for full descriptions and examples.

Generalizable: How well do your experimental results from a sample extend to the population as a whole?

Three common sources of global gridded earth observations (EOs) include active satellite remote sensing data, such as synthetic aperture radar imagery (left), passive satellite remote sensing data, including optical and passive microwave imagery (center), and gridded outputs from global coupled earth system models (right).

A primary motivation to generate global EOs is to allow scientists to translate insight from limited field measurements to unsampled times and locations, improving the generalizability of our earth systems theories. Because global EOs are (approximately) spatiotemporally continuous observations, the overall objective of GRRIEn analysis is to train a supervised learning algorithm that can predict an environmental process using globally available EOs as input data. This algorithm can then be used to generalize insights from direct observations of an environmental process, sampled experimentally or in-situ, across unlabeled times or locations where global EOs are available (i.e. through interpolation, extrapolation, and diagnostic modeling).

Robust: Do your statistics show good performance on data drawn from a wide range of probability and joint probability distributions?

In order for your model to generalize well, you must quantify and account for scale-dependent multicollinearity and spatial/temporal dependence in your data. Multicollinearity occurs when one or more predictor variables are linearly related to each other. It is a problem for prediction and inference because the accuracy of predictions depends on the exact structure of multicollinearity present in the training dataset also being present in the prediction dataset. Since multicollinearity is associated with model instability in most machine learning and some deep learning applications, it also impacts diagnostic modeling: the model is less likely to interpret the system in a physically plausible way, and small changes in training data can lead to big changes in model weights and parameters.
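
One simple, model-agnostic way to quantify multicollinearity before training is the variance inflation factor (VIF). The sketch below is a bare-bones illustration (not the framework's prescribed diagnostic, and the predictor names are made up): it computes each column's VIF by regressing it on the remaining columns.

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: VIF_j = 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on all other columns. Values above ~5-10 are a
    common rule-of-thumb flag for problematic multicollinearity."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        yj = X[:, j]
        A = np.column_stack([np.ones(len(yj)), np.delete(X, j, axis=1)])  # intercept + others
        beta, *_ = np.linalg.lstsq(A, yj, rcond=None)
        r2 = 1.0 - (yj - A @ beta).var() / yj.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two strongly anti-correlated "meteorological" predictors and one independent one
rng = np.random.default_rng(0)
n = 1000
temp = rng.normal(size=n)
precip = -0.9 * temp + 0.3 * rng.normal(size=n)
indep = rng.normal(size=n)
vifs = vif(np.column_stack([temp, precip, indep]))   # first two large, third near 1
```

A check like this flags predictor pairs whose covariance structure the model will lean on, which is exactly the structure that may not persist under climate change.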

Furthermore, correlation structure tends to be dynamic in space and time. The figure below shows an example of the correlation structure between summertime air temperature and precipitation observed from 1950-2000 (left) and projected for 2050-2100 (right). The correlation between air temperature and precipitation will change for many locations in the US under climate change, suggesting that statistical models trained on temperature and precipitation under 20th-century conditions will not make robust estimates in the 21st century.

Pearson’s correlation coefficient [scaled between -1 and 1, color bar (Benesty et al., 2009)] between bias-corrected, statistically downscaled Climate Model Intercomparison Project 5 ensemble mean monthly precipitation and daily maximum temperature. Historical observations for 1950-1999 (left) and a moderate emissions forecast (RCP 4.5) for 2050-2099 (right) both indicate spatiotemporally variable collinearity between summertime maximum temperature and precipitation. Covariance of meteorological variables is a signature of local climate. As local climates shift due to global warming, so will the local covariability of meteorological variables (right). This generates complexity for predicting environmental process response to meteorological variables under climate change (Taylor et al., 2012).

We can’t expect to collect a sample of future covariability structure because these conditions haven’t happened yet. So how do we address these issues? The manuscript summarizes a great checklist of steps to make sure your model development is robust to both dependent features and observations.  

Checklist for robust data engineering and model development with dependent
features (left) and observations (right) in spatiotemporal systems.

Reproducible: Can other scientists understand and replicate your analysis and yield the same results?

The best way to facilitate this is to create a clear repository structure which contains a README, a container image that has specified versions of all packages that are used, and code that facilitates both the download of larger geospatial datasets that can’t be stored in a repository and code to reproduce analyses and figures.

In Liz’s Environmental Data Science class at Syracuse (CEE 609), she has her students complete a semester long project where every piece of the project is documented in a GRRIEn repository structure (see here for a great example). By the end of the semester, she noted that her students have a written paper, fully documented workflow, and a populated repository that often has served as a basis for a journal publication for students in her research group. 

Interpretable: Do your model parameters reflect a physically plausible diagnosis of the system?

The most important step to ensuring that your model will make the right predictions in different contexts is to make sure those predictions are happening for the right reason: have your model weights and parameters diagnosed the importance of your predictors, and their relationship to the environmental process that serves as your predictand, in a physically plausible way? The authors suggest forming a set of interpretable hypotheses before modeling that denote the data you are using and the relationships you expect to find, and then utilizing local and global interpretation methods to confirm or reject the original hypotheses.
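
One widely used global interpretation method is permutation feature importance, sketched below (the linear "model" and all names here are stand-ins for whatever learner you have actually trained): shuffle one predictor at a time, record how much skill the model loses, and check whether the resulting ranking matches the hypotheses you formed before modeling.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Model-agnostic global interpretation: skill lost when each feature is
    shuffled. Features the model truly relies on show large drops."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])               # break the feature-target link
            drops.append(base - metric(y, predict(Xp)))
        imp[j] = np.mean(drops)                 # average skill lost without feature j
    return imp

# Synthetic check: only feature 0 (strongly) and feature 2 (weakly) matter
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * X[:, 2] + 0.1 * rng.normal(size=500)

# A hand-fit linear model standing in for any trained learner
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(500), X]), y, rcond=None)
predict = lambda M: beta[0] + M @ beta[1:]
r2 = lambda truth, pred: 1.0 - np.mean((truth - pred) ** 2) / np.var(truth)

imp = permutation_importance(predict, X, y, r2)   # imp[0] dominates; imp[1] is ~0
```

If a feature your domain knowledge says should matter scores near zero (or an implausible one dominates), that is a cue to reject the model as a physically plausible diagnosis of the system.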

Conclusion/Resources

I think the GRRIEn framework is a really great tool for young researchers embarking on data science and machine learning projects. If you are a newer faculty member interested in incorporating some of these concepts into your classes, Liz has made “Code Sprints” from her class available here. There are Jupyter Notebooks on working with Python, raster data, vector data, regressions, autocorrelation, and multicollinearity. Be sure to keep up with the work coming out of the CHaRTS lab on Twitter here!

Find the paper here: https://doi.org/10.1175/AIES-D-22-0065.1