Ensemble forecasting – application

In contrast to my theoretically oriented previous post on ensemble forecasting, I will attempt, in this post, to provide some practically oriented operational context for ensemble forecasting from the perspective of water management. This and the previous post will serve as a useful background for the planned third post in this series that will focus on the details of ensemble forecast verification. This post is largely a distillation of my experience in working with the formal Forecast Informed Reservoir Operations (FIRO) efforts that are ongoing, primarily in the western US. I use the term ‘formal’ here deliberately to differentiate from the more general notion of FIRO, which has been in the research literature since at least the 90’s. Importantly, ‘formal’ FIRO, as I choose to label it, is an effort to formalize the use of forecast information into the operations of our nation’s portfolio of federally owned or controlled dams, which contain the largest and most important reservoirs in CONUS. This is in stark contrast to the largely informal use of FIRO that has been applied somewhat ‘under the radar’ and scattershot in various ways and for various purposes by dam operators up to this point. While there is a very interesting discussion to be had about the institutional and operational complexities of this effort (perhaps a future post), I’ll leave this brief once-over-the-FIRO-world contextualization at that for now.

For the purposes of this post, I mention the operational context to narrow the window of forecast applications to a tractable space that is a) pertinent to water systems operations and b) a space that I am intimately familiar with. The world of hydrometeorological forecasting is large and growing, containing both advances in legacy dynamical forecasting techniques as well as emergent machine learning approaches. When it comes to federal agencies, tried/tested/verified is incredibly important, so I anticipate the uptake of the latter ML approaches to be some ways off. Baby steps…the US Army Corps of Engineers (USACE) is still trying to move past the ‘water on the ground’ legacy of water management.

Meteorological forecasting

In discussing the current state of ensemble forecasting, I will progress hierarchically from the meteorological to the hydrologic forecasting space. I will first discuss some aspects of dynamical meteorological forecasting that form the backbone of any hydrological forecasting effort. It is, after all, the weather gods casting their hydrometeors (rain/snow/ice) upon the earth that cause the majority of the hydrologic responses of interest to practical FIRO applications. This is an important point that I’ll note here and come back to later. Meteorological forcings dominate the hydrologic response and by consequence, dominate the hydrologic uncertainties when we start thinking in terms of future predictions (forecasts) where meteorologic uncertainties become large. This is wholly in contrast to efforts to understand hydrologic uncertainty with observational data as forcings (for instance, stochastic watershed modeling, see my post here), where the hydrologic model’s structural inadequacy is the primary driver of hydrologic predictive uncertainty. The observational (input) uncertainties are not inconsequential in these sorts of applications, but they form a less substantial portion of the overall uncertainty and can largely be considered aleatory.

Advances in meteorological forecasting skill

With some sense of the importance of meteorological uncertainty to the hydrologic forecasting problem, I want to turn now to a brief discussion of the advances in dynamical meteorological forecasting and areas for future research and improvement. Most folks have some practical sense of forecast skill (or lack thereof) from their daily lives; you check the smartphone weather before you head out for the day, put on shorts and a t-shirt, then get soaked walking to work due to some unforecast rain event. Despite events like this and the general challenge of dynamical uncertainty in the atmospheric system (see previous post), the reality is that forecast skill has steadily improved over time, to the tune of roughly one additional day of ‘useful’ skill per decade (see Figure 1). Although the numbers depend on the particular skill metric being used, it is common to consider skill above 60% as ‘useful’ and skill above 80% as a ‘high degree of accuracy’. By some visual extrapolation, one might surmise that we should be approaching ‘useful’ skill out to 10 days by now in the northern hemisphere (NH) for the atmospheric variable in question.

Figure 1. Improvements in forecast skill of 500 hPa geopotential height anomaly over time (Bauer et al., 2015)

Notably, Figure 1 shows the skill improvements for a large, synoptic scale variable at a moderate altitude in the atmosphere (500 hPa ~ 15,000’ above sea level). Skill varies greatly across atmospheric variables and is particularly challenging for variables that are highly dependent on sub-grid scale processes (i.e. processes that occur below the native resolution of the forecast model); one of these, as luck would have it, is the aforementioned hydrometeor (precipitation) generating process of such interest to hydrologic applications. To loosely paraphrase something I heard from F. Martin Ralph, the current director of the Center for Western Weather and Water Extremes (CW3E) at Scripps: ‘our skill in predicting upper atmospheric variables at large scales has increased enormously, whereas our skill in predicting local scale precipitation has barely budged’. This basic notion opens a Pandora’s box of sorts in terms of how we might use the information that is available in forecast models (i.e. the highly accurate synoptic scale variables) differently to more directly forecast key variables of interest at the local scale. Many of these emerging techniques rely on ML methods applied across wide spatial scales that can more precisely target key hydrometeorological attributes of interest to water management, like the probability of exceeding some rainfall threshold (Zhang et al., 2022), or that map directly to streamflow itself, skipping the whole precipitation part (Nearing et al., 2021). These are certainly highly promising avenues for research, but we’ll return to some practical advancements in dynamical forecasting for now.

Areas for improving meteorological forecasting

One could spend an entire blog post (or many posts really) on this subject. I want to touch on just a few key areas where much research effort is currently dedicated to advancing meteorological forecasting. The first of these is improving the spatiotemporal resolution of the forecast models. To take a quick step back, it is useful to compare a ‘forecast’ model to a Global Circulation/Climate Model (GCM) or Earth System Model (ESM). In a broad sense, these models are all constructed in the same basic way. They start with a dynamical representation of the atmosphere based on the Navier-Stokes fluid dynamics equations that are discretized and solved across ‘grids’ of some horizontal and vertical dimensions across the globe. In climate science speak, this part of the model is often referred to as the ‘dynamical core’ (or ‘dy-core’ if you want to sound cool). For the most part, things that happen below this native grid resolution are parameterized (e.g. cloud formation), which means that they are modeled based on fitted statistical relationships between grid scale atmospheric states and sub-grid scale process outcomes. These parameterizations are often referred to as the model’s ‘physics’. Intuitively, the smaller the scale at which we can represent atmospheric processes, the better we might be able to capture these local behaviors. Advances in computational power have enabled much higher resolutions for forecast models/GCM/ESM over time and will likely continue to do so.

Where these models primarily differ is in all the other stuff needed to run a simulation of earth’s atmosphere, namely the land/ocean/ice surfaces and their couplings with the atmosphere. Forecast models must be computationally tractable at operationally relevant timescales to be useful. In the space we are discussing, this means that the model must be able to reinitialize and produce a forecast at least daily. Moreover, the forecast temporal resolution must be much higher to capture the evolution of the climate system at operationally relevant scales, typically 6-hourly with current technology. To overcome this complexity tradeoff (see Figure 2), forecast models have traditionally relied on extremely simplified representations of the land/ocean/ice surface boundary conditions that are assumed to be constant across the forecast period. Conversely, the development of GCM/ESMs has continually increased the complexity of the land/ocean/ice modeling and their atmospheric coupling while retaining sufficient atmospheric spatiotemporal resolution for long term simulations. GCM/ESMs must also provide provable closure of things like energy and carbon fluxes across long time scales; this is not really a primary concern for forecast models, which can be ‘leaky’. Increasingly, however, computational advances have enabled forecast models to look more like GCM/ESMs, with state-of-the-science forecast models now modeling and coupling land/ocean interactions along with the atmospheric evolution.

Figure 2. Tradeoff between model complexity and resolution in modeling the earth system (Bauer et al., 2015)

The second key area that I’ll highlight is data assimilation. Data assimilation is the process of constraining forecast trajectories based on some intermediate assessment of the actual evolution of the climate system (i.e. an observation). In meteorological forecasting, this is typically done in a relatively short window after the forecast has been generated, for instance the 12-hour assimilation window in Figure 3 below. Importantly, for the issuance of forecasts, this assimilation period happens ‘under the hood’, so the user of the forecast would only see the forecast as being issued at 21:00, not the 09:00 time when the actual forecast model was initialized. While the details of data assimilation are complex and well outside the scope of this post, it is an incredibly important area of continued research for dynamical forecasting. To put this into context, another loose paraphrase I heard from Vijay Tallapragada, senior scientist for NOAA’s Environmental Modeling Center, is ‘if you want long-term job security in weather forecasting, get into data assimilation’.

Figure 3. Data assimilation example in ensemble forecasting (Bauer et al., 2015)
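The details of operational assimilation schemes (variational and hybrid ensemble methods) are well beyond this post, but the core idea of blending a forecast with a new observation can be sketched with a toy, scalar ensemble Kalman filter update. This is purely an illustration of the concept, not any operational center’s scheme, and all of the numbers below are made up.

import numpy as np

# Toy, scalar ensemble Kalman filter update: blend an ensemble of forecasts of a
# single state variable with one noisy observation. Operational data assimilation
# is vastly more complex; this only illustrates constraining a forecast with data.
rng = np.random.default_rng(42)

ensemble = rng.normal(loc=5.0, scale=2.0, size=30)   # prior forecast ensemble (hypothetical units)
obs = 7.5                                            # observation arriving in the assimilation window
obs_error_var = 1.0                                  # assumed observation error variance

# Kalman gain weighs forecast spread against observation uncertainty
forecast_var = np.var(ensemble, ddof=1)
gain = forecast_var / (forecast_var + obs_error_var)

# Perturb the observation for each member (stochastic EnKF) and update every member
perturbed_obs = obs + rng.normal(0.0, np.sqrt(obs_error_var), size=ensemble.size)
analysis = ensemble + gain * (perturbed_obs - ensemble)

print(f"Prior mean: {ensemble.mean():.2f}, analysis mean: {analysis.mean():.2f}")
print(f"Prior variance: {forecast_var:.2f}, analysis variance: {np.var(analysis, ddof=1):.2f}")

The analysis ensemble sits between the model forecast and the observation, with reduced spread; conceptually, this is what the assimilation window in Figure 3 accomplishes before the forecast is issued.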

Lastly, as alluded to in my previous post, the methods to sample initial condition uncertainty and propagate it through the dynamical forecast model are also a very important area of advancement in meteorological ensemble forecasting. The mathematical challenge of adequately sampling initial condition uncertainty at global scales across highly correlated initialization variables is huge, as is the computational burden of running multiple perturbed members of the forecasts to produce an ensemble. The latter challenge has particular significance to the hydrological forecasting enterprise and may, at least partially, explain certain design choices in the hydrologic forecast systems discussed in the following section.

Hydrologic ensemble forecasting

To turn the meteorologic outputs of precipitation and temperature into streamflow, we need a streamflow generating process model. Moreover, if we want an ensemble of streamflow predictions, we need a way to generate some uncertainty in the streamflows to reflect the dominant meteorological uncertainties in the forecasts. With this modeling chain of meteorologic forecast model → streamflow process model, one could envision many possible ways to attempt to capture this uncertainty. Perhaps the most intuitive would be to take each ensemble output from the meteorologic forecast model and run it through the streamflow process model; voila! ensemble streamflow forecasts!

The Hydrologic Ensemble Forecast Service (HEFS)

This is, however, not how it’s done in the Hydrologic Ensemble Forecast Service (HEFS), which is the current operational model used by NOAA/NWS River Forecast Centers (RFCs) and the model primarily being used for formal FIRO efforts (see Figure 4 below). Instead, HEFS has its own internal Meteorological Ensemble Forecast Processor (MEFP) that ingests an ensemble mean of temperature and precipitation inputs from the meteorological ensemble forecast (the Global Ensemble Forecast System, GEFS) and then creates its own internal ensemble of meteorologic forecasts. These internally generated traces of forecasted temperature/precipitation are each used to force a single, watershed-calibrated version of the SAC-SMA hydrologic model to produce an ensemble of streamflow forecasts (Demargne et al., 2014). Why use this implementation instead of the seemingly more straightforward approach with the meteorological ensemble? I don’t have a definitive answer for that, but my understanding is that, from a statistical verification perspective, the current HEFS approach is more reliable.


Figure 4. Conceptual diagram of HEFS model (Demargne et al., 2014)

Another possible reason stems from the computational complexity challenge of meteorological ensemble forecasting mentioned in the previous section, particularly as it relates to hindcasting. Hindcasts are forecasts that are generated with a modern-day forecast model (and its modern-day skill) against historical observations of the appropriate quality (post satellite era, generally speaking ~1979 – present). In meteorological spheres, hindcasts are important in documenting the performance of the forecast model across a much larger dataset. For the purposes of advancing operational water management with forecasts, hindcasts are more than important…they are indispensable. Hindcasts form the basis for the design and testing of forecast informed water management procedures; there is no workaround for this. To generate a meteorological hindcast, the NWS has to carve out time to dedicate its warehouse-sized forecast model computational platform to generating 20-30 years of forecasts and archiving the data. This time must be balanced against the operational necessity of producing real-time forecasts for things like not getting wet going to work or maybe, I don’t know, aviation? This challenge means that hindcasts must typically be produced at reduced resolution compared to the operational forecasts. For example, the current NOAA Global Ensemble Forecast System (GEFSv12) produces operational forecasts with 31 ensemble members, but only 5 ensemble members in the hindcasts (Guan et al., 2019).

This limitation is significant. 5 ensemble members is sufficient to produce a relatively good approximation of the ensemble mean, but it is grossly undersized for producing an adequate probabilistic streamflow ensemble. In this hindcast context, then, the construction of HEFS makes sense and enables more flexibility for generating historical forecast ensembles for the development of forecast informed water management policies.
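To make the ensemble-size intuition concrete, here is a quick synthetic check (a made-up gamma distribution standing in for a forecast variable, not GEFS output) of how well 5 versus 31 members estimate the mean compared to the upper tail:

import numpy as np

# Compare errors in estimating the mean vs. the 95th percentile from small ensembles
# drawn from the same hypothetical forecast distribution.
rng = np.random.default_rng(7)
true_mean = 2.0 * 10.0                                        # gamma(shape=2, scale=10) mean
true_p95 = np.percentile(rng.gamma(2.0, 10.0, 10**6), 95)     # reference 95th percentile

for n in (5, 31):
    samples = rng.gamma(2.0, 10.0, size=(10_000, n))          # many synthetic n-member ensembles
    mean_err = np.abs(samples.mean(axis=1) - true_mean).mean()
    p95_err = np.abs(np.percentile(samples, 95, axis=1) - true_p95).mean()
    print(f"{n:>2} members: avg |mean error| = {mean_err:5.2f}, avg |95th pct error| = {p95_err:5.2f}")

With only 5 members, the tail estimate is both noisy and systematically biased low, and it is the tails of the distribution that matter most for probabilistic streamflow forecasting.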

Practical HEFS limitations

The current implementations of HEFS typically generate ~40 ensemble members. The MEFP is conditioned on a historical period of reference using a Schaake shuffle type procedure, so this ensemble size is actually the length (in years) of that historical conditioning period. As with meteorological forecasting, the production of HEFS hindcasts is also challenging, in that it requires the same carving out of time on an operational computational architecture to produce the hindcasts. This limitation motivates one of the core interests of folks within the formal FIRO implementation efforts: synthetic forecasting methods that can expand the length and availability of simulated hindcasts for operational policy development. In general, HEFS is reliable, but it suffers from systemic ‘Type-II conditional bias’, which in layman’s terms is the tendency to underpredict extreme events. I will dig into some of these attributes of HEFS forecasts in a future post on ensemble verification.
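As a quick aside on the Schaake shuffle mentioned above: the procedure reorders ensemble members so that their rank structure across dimensions (sites, lead times, variables) matches that of a set of historical observations, which restores realistic space-time correlation to independently generated traces. The sketch below shows only that reordering step; it is not the MEFP implementation, and the array shapes and numbers are hypothetical.

import numpy as np

def schaake_shuffle(forecast_ens, historical_obs):
    """Reorder ensemble members so their rank structure across dimensions matches
    that of the historical observations (one historical date per member)."""
    n_members, n_dims = forecast_ens.shape
    shuffled = np.empty_like(forecast_ens)
    for d in range(n_dims):
        # Rank of each historical date within this dimension (0 = smallest)
        obs_ranks = historical_obs[:, d].argsort().argsort()
        # Assign sorted forecast values to members according to those ranks
        shuffled[:, d] = np.sort(forecast_ens[:, d])[obs_ranks]
    return shuffled

# Toy usage: 40 members and 5 forecast lead times (hypothetical values)
rng = np.random.default_rng(0)
ens = rng.gamma(2.0, 2.0, size=(40, 5))    # e.g., precipitation traces
hist = rng.gamma(2.0, 2.0, size=(40, 5))   # e.g., the same dates in 40 historical years
shuffled_ens = schaake_shuffle(ens, hist)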

Outside of questions of statistical reliability, there are other, more difficult-to-quantify concerns about the implementation of HEFS. The ensemble-mean forecasts from the NOAA GEFS model that force HEFS are bound to be more ‘skillful’ than, for instance, the ensemble control forecast. (Note: the control forecast is the forecast trajectory initialized from the ‘best guess’ at the initial condition state of the atmosphere.) This is a well-documented property of ensemble-mean predictions. However, ensemble-mean forecasts are not dynamically consistent representations of the atmosphere (Wilks, 2019). As I noted in my previous post, an ideal forecast ensemble would include all possible trajectories of the forecasts, which might diverge substantially at some future time. Where forecast trajectories have diverged, an average of those trajectories will produce a value that is not in the space of plausible dynamic trajectories. In practical terms, this leads to ensemble-mean forecasts collapsing toward climatology at longer leads and systematic reductions in the range of magnitudes of key variables as lead time increases. In short, this reliance on the meteorological ensemble-mean forecast does incur some risk (in my opinion) of missing high impact extremes due to, for example, even relatively small spatial errors in the forecasted landfall locations of atmospheric river storms.

Final thoughts

The intent of this post was to provide some current operational context for the types of forecasts being considered for formal implementation of forecast informed water management operations. The ensemble forecasting space is large and exciting from both research and operational perspectives, but it’s challenging to navigate and understand what is actually being considered for use, and what is simply novel and interesting. In my next post, I’ll discuss some of the primary verification techniques used for ensemble forecasts with a focus on the performance of HEFS in an example watershed and some associated practical implementation.

Reference

Bauer, P., Thorpe, A., & Brunet, G. (2015). The quiet revolution of numerical weather prediction. Nature, 525(7567), 47–55. https://doi.org/10.1038/nature14956

Demargne, J., Wu, L., Regonda, S. K., Brown, J. D., Lee, H., He, M., … Zhu, Y. (2014). The science of NOAA’s operational hydrologic ensemble forecast service. Bulletin of the American Meteorological Society, 95(1), 79–98. https://doi.org/10.1175/BAMS-D-12-00081.1

Guan, H., Zhu, Y., Sinsky, E., Fu, B., Zhou, X., Li, W., et al. (2019). The NCEP GEFS-v12 Reforecasts to Support Subseasonal and Hydrometeorological Applications. 44th NOAA Annual Climate Diagnostics and Prediction Workshop, (October), 78–81.

Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., … Gupta, H. V. (2021). What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resources Research, 57(3). https://doi.org/10.1029/2020WR028091

Wilks, D. S., (2019). Statistical Methods in the Atmospheric Sciences, 4th ed. Cambridge, MA: Elsevier.

Zhang, C., Brodeur, Z. P., Steinschneider, S., & Herman, J. D. (2022). Leveraging Spatial Patterns in Precipitation Forecasts Using Deep Learning to Support Regional Water Management. Water Resources Research, 58(9), 1–18. https://doi.org/10.1029/2021WR031910

Ensemble forecasting – theory

‘Does the flap of a butterfly’s wing in Brazil stir up a tornado in Texas?’ These words reflect the so-called ‘butterfly effect’ typically ascribed to Edward Lorenz (MIT), who was a pioneer in the field of numerical weather prediction (NWP). (Interestingly, Lorenz himself first referred to the ‘flap of a seagull’s wing’; the butterfly swap-out was the work of a creative conference session convener. Somewhat coincidentally, the shape of the ‘Lorenz attractor’ is also reminiscent of a butterfly, see Figure 1). Motivated by the very salient challenge in the post-WWII era to improve weather prediction capabilities for strategic purposes, he developed a theoretical framework for the predictability of non-linear dynamical systems in a seminal paper, ‘Deterministic nonperiodic flow’ (Lorenz, 1963), that would come to be known as ‘chaos theory’ or ‘chaotic dynamics’. This theory was foundational to the development of ensemble forecasting (and a whole host of other things), and still underpins our current theoretical understanding of the inherent predictability of complex dynamical systems like the atmosphere (Lorenz, 2006).


Figure 1. Book cover (left panel) and the characteristic ‘butterfly wing’ shape of the Lorenz attractor (right panel) from his seminal 1963 work. (Palmer, 1993)

In actuality, the whole ‘flap of a butterfly’s wing creating a tornado in Texas’ relates specifically to only one aspect of the theoretical framework developed by Lorenz. In this post, I will attempt to give a slightly fuller treatment of this theoretical framework and its development. My hope is that this post will be a theoretical primer for two follow-on practically oriented posts: 1) a discussion of the state-of-the-science of hydrometeorological ensemble forecasting and its application in emerging water resources management practices like forecast informed reservoir operations, and 2) a deeper dive into ensemble verification techniques. While the techniques for (2) are grounded in the theoretical statistical properties of stochastic-dynamic predictive ensembles, they have broad applications to any scenario where one needs to diagnose the performance of an ensemble.

The Lorenz model and ‘chaos’ theory

When most people hear the word ‘chaos’, they tend to think of the dictionary definition: ‘complete disorder and confusion’ (Oxford), which in a scientific context might aptly describe some sort of white noise process. As you will see, this is somewhat of a far cry from the ‘chaos’ described in Lorenz’s theory. The ‘chaos’ terminology was actually coined by a later paper building on Lorenz’s work (Li & Yorke, 1975) and as noted by Wilks (2019): ‘…this label is somewhat unfortunate in that it is not really descriptive of the sensitive-dependence phenomenon’. The sensitive-dependence phenomenon is one of a set of 4 key properties (to be discussed later) that characterize the behaviors of the sort of non-linear, non-periodic deterministic systems that Lorenz argued were most representative of the atmosphere. In contrast to ‘disorder’ and ‘confusion’, these properties, in fact, lend some sense of order to the evolution of these dynamical systems, although the nature of that order is highly dependent on the system state.

A deterministic, non-periodic systems model

First, let’s dive into a few details of Lorenz’s 1963 work, using some figures and descriptions from a later paper (Palmer, 1993) that are easier to follow. While the 1963 paper is quite dense, much of the mathematical logic is dedicated to justifying the use of a particular system of equations that forms the basis of the study. This system of 3 equations and 3 variables (X, Y, Z) describes a non-linear, dissipative model of convection in a fluid of uniform depth, where there is a temperature difference between the upper and lower surfaces. Lorenz derived the set of 3 equations shown in the upper left panel of Figure 2 from earlier work by Rayleigh (1916) on this particular problem. In short, X relates to the intensity of convective motion, Y relates to the temperature difference between ascending and descending currents, and Z relates to the distortion of the vertical temperature profile from linearity; the details of these variables are not actually all that important to the study. What is important is that this system of equations has no general solutions (aside from the steady state solution) and must be numerically integrated in the time dimension to determine the convective flow dynamics. The ‘trajectories’ of these integrations shown in the phase space diagrams in Figure 2 exhibit the sorts of unstable, non-periodic behaviors that Lorenz thought were the most useful analog to atmospheric dynamics, albeit in a much simpler system. (‘Much’ is an understatement here; modern NWP models have a phase space on the order of 10^9 dimensions, in contrast to the 3-dimensional phase space of this problem; Wilks, 2019.) Of note, the use of phase-space diagrams (i.e. plots where each axis corresponds to a dependent variable in the system of equations) preceded Lorenz, but his use of them is perhaps one of the best-known early instances of this kind of visualization. Other uses of phase space relationships can be found in Rohini’s post or Dave’s post.
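For reference, the three equations shown in the upper left panel of Figure 2 are the now-famous Lorenz system, where $\sigma$, $\rho$, and $\beta$ are parameters related to the Prandtl number, the Rayleigh number, and the geometry of the convective cell (Lorenz used $\sigma = 10$, $\rho = 28$, $\beta = 8/3$):

$$
\frac{dX}{dt} = \sigma (Y - X), \qquad
\frac{dY}{dt} = X(\rho - Z) - Y, \qquad
\frac{dZ}{dt} = XY - \beta Z
$$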

Figure 2. a) Lorenz equations and the XY phase space diagram of their integrations. b-c) X variable timeseries of two separate integrations of the Lorenz equations from very similar initial conditions. (Palmer, 1993)

Regime structure and multiple distinct timescales

What ‘behaviors’ then, are we talking about? Referring again to Figure 2a, we see the projection of a long series of model integration trajectories onto the XY-plane of the phase space diagram, where each dot is a timestep in the integration. What is immediately apparent in the form of these integrations is the two-lobed ‘butterfly’ shape of the diagram. Each lobe has a central attractor where the trajectories often reside for multiple revolutions, but can then transition to the other attractor when passing near the center of the diagram. These so-called ‘Lorenz attractors’ comprise one of the key properties of chaotic dynamics, namely regime structure, which is the tendency of the trajectories to reside around phase space attractors for some period of time. This residence time in a regime is generally quite a bit longer than the timescales of the trajectories’ individual revolutions around an attractor. This attribute is referred to as multiple distinct timescales and is evidenced in Figure 2b-c, where smaller amplitude sections of the timeseries show residence in one or the other regime and large amplitude sections describe transitions between regimes. Often, but not always, the residence in these regimes occurs for multiple revolutions, suggesting that there are shorter timescale evolutions of the system that take place within these regimes, while infrequent, random shifts to the other regime occur at longer timescales.

Sensitive-dependence and state-dependent variation in predictability

Figure 3. a-c) Different trajectories through the XY phase space dependent on the initial condition state-space (black circle). (Palmer, 1993)

Returning now to the ‘butterfly effect’; what, then, is this whole sensitive-dependence thing mentioned earlier? Figure 3a-c provides a nice graphical representation of this phenomenon. In each panel, a set of nearby initial states is chosen at a different location in the phase space diagram and then followed through its set of trajectories. In 3a, these trajectories neatly transition from one regime to the other, remaining relatively close to each other at the trajectories’ end. In 3b, a set of initial states not so far from 3a is chosen and, instead of neatly transitioning to the other regime, the trajectories diverge strongly near the center of the diagram, with some remaining in the original regime and some transitioning. However, for about half of the timespan, the trajectories remain very close to one another. Finally, an initial state chosen near the center of the diagram (3c) diverges quickly into both regimes, with trajectories ending up at nearly opposite ends of the phase space (black tails at upper right and left of diagram). Figures 3b-c, in particular, showcase the sensitive-dependence on initial conditions attribute of the system. In other words, from a set of very close-by initial states, the final trajectories in phase space can yield strongly divergent results. Importantly, this divergence in trajectories can occur right away (3c), at some intermediate time (3b), or not at all (3a).

This is the basic idea behind the last core property of these systems, state-dependent variation in predictability. Clearly, a forecast initialized from the starting point in 3a could be a little bit uncertain about the exact starting state and still end up in about the right spot for an accurate future prediction at the end of the forecast horizon. At medium ranges, this is also the case for 3b, but the longer range forecast is highly uncertain. For 3c, all forecast ranges are highly uncertain; in other words, the flap of a butterfly’s wing can mean the difference between one or the other trajectory! Importantly, one can imagine in this representation that an average value of 3c’s trajectories (i.e. the ensemble mean) would fall somewhere in the middle of the phase space and be representative of neither of the two physically plausible trajectories into the right or left regimes. This is an important point that we’ll return to at the end of this post.

From the Lorenz model to ensemble forecasting

The previous sections have highlighted this idealized dynamical system (the Lorenz model) that theoretically has properties of the actual atmosphere. Those 4 properties (the big 4!) were: 1) sensitive-dependence on initial conditions, 2) regime structure, 3) multiple distinct timescales, and 4) state-dependent variation in predictability. In this last section, I will tie these concepts into a theoretical discussion of ensemble forecasting. Notably, in the previous sections, I discussed ‘timescales’ without any reference to the real atmosphere. The dynamics of the Lorenz system prove to be quite illuminating about the actual atmosphere when timescales are attached to the system that are roughly on the order of the evolution of synoptic scale weather systems. If one assumes that a lap around the Lorenz attractor equates to a synoptic timescale of ~5 days, then the Lorenz model can be conceptualized in terms of relatively homogeneous synoptic weather progressions occurring within two distinct weather regimes that typically persist on the order of weeks to months (see Figure 3b-c). This theoretical foundation jibes nicely with the empirically derived weather-regime based approach (see Rohini’s post or Nasser’s post), where incidentally, 4 or more weather regimes are commonly found (the real atmosphere is quite a bit more complex than the Lorenz model after all). This discussion of the Lorenz model has hopefully given you some intuition that conceptualizing real world weather progressions outside the context of an appropriate regime structure could lead to some very unphysical representations of the climate system.

Ensemble forecasting as a discrete approximation of forecast uncertainty

In terms of weather prediction, though, the big 4 make things really tough. While there are certainly conditions in the atmosphere that can lead to excellent long range predictability (i.e. ‘forecasts of opportunity’, see Figure 3a), the ‘typical’ dynamics of the atmosphere yield the potential for multiple regimes and associated transitions within the short-to-medium range timeframe (1-15 days) where synoptic-scale hydro-meteorological forecasting is theoretically possible. (Note, by ‘synoptic-scale’ here, I am referring to the capability to predict actual weather events, not prevailing conditions. Current science puts the theoretical dynamical limit to predictability at ~2 weeks, with current NWP technology achieving ‘usable’ skill out to ~7 days, give or take a few days depending on the region.)

Early efforts to bring the Lorenz model into weather prediction sought to develop analytical methods to propagate initial condition uncertainty into a useful probabilistic approximation of the forecast uncertainty at various lead times. This can work for simplified representation of the atmosphere like the Lorenz model, but quickly becomes intractable as the complexity and scale of the governing equations and boundary conditions increases.

Figure 4. Conceptual depiction of ensemble forecasting including sampling of initial condition uncertainty, forecasts at different lead times, regime structure, and ensemble mean comparison to individual ensemble members. Solid arrow represents a ‘control’ member of the ensemble. Histograms represent an approximate empirical distribution of the ensemble forecast. (Wilks, 2019)

Thankfully, rapid advances in computing power led to an alternate approach, ensemble forecasting! Ensemble forecasting is a stochastic-dynamic approach that couples a discrete, probabilistic sampling of the initial condition uncertainty (Figure 4 ‘Initial time’) and propagates each of those initial condition state vectors through a dynamical NWP model. Each of these NWP integrations is a unique trajectory through state space of the dynamical model that yields a discrete approximation of the forecast uncertainty (Figure 4 ‘Intermediate/Final forecast lead time’). This discrete forecast uncertainty distribution theoretically encompasses the full space of potential hydro-meteorologic trajectories and allows a probabilistic representation of forecast outcomes through analysis of the empirical forecast ensemble distribution. These forecast trajectories highlight many of the big 4 properties discussed in previous sections, including regime structure and state-dependent predictability (Figure 4 trajectories are analogous to the Figure 3b trajectories for the Lorenz model). The ensemble mean prediction is an accurate and dynamically consistent prediction at the intermediate lead time, but at the final lead time where distinct regime partitioning has occurred, it is no longer dynamically consistent and delineates a region of low probability in the full ensemble distribution. I will explore properties of ensemble averaging, both good and bad, in a future post.
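As a toy illustration of this stochastic-dynamic idea (using the Lorenz system from the theory above rather than a real NWP model), one can perturb a ‘best guess’ initial state, integrate each perturbed member forward, and then inspect the spread of the members and the ensemble mean at the final lead time. The initial condition, perturbation size, and lead time below are arbitrary choices for demonstration only.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

rng = np.random.default_rng(1)
x0 = np.array([-6.1, -7.4, 22.0])     # arbitrary 'best guess' initial condition
n_members, lead_time = 20, 8.0        # lead time in model time units, not days

# Sample initial condition uncertainty with small Gaussian perturbations
members = x0 + rng.normal(0.0, 0.1, size=(n_members, 3))

# Integrate each perturbed member forward (a miniature 'ensemble forecast')
finals = np.array([
    solve_ivp(lorenz, (0.0, lead_time), m, rtol=1e-8, atol=1e-10).y[:, -1]
    for m in members
])

# At long leads the members often split between the two regimes (X < 0 vs. X > 0),
# and the ensemble-mean X can land near zero -- a state no single member occupies.
print("Final X across members:", np.round(finals[:, 0], 1))
print("Ensemble-mean final X: ", round(float(finals[:, 0].mean()), 1))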

Lastly, I will note that the ensemble forecasting approach is a type of Monte Carlo procedure. Like other Monte Carlo approaches with complex systems, the methodology for sampling the initial condition uncertainty has a profound effect on the uncertainty quantification contained within the final NWP ensemble output, especially when considering the high degree of spatiotemporal relationships within the observed climatic variables that form the initial state vector. This is a key area of continued research and improvement in ensemble forecasting models.

Final thoughts

I hope that you find this discussion of the theoretical underpinnings of chaotic dynamics and ensemble forecasting to be useful. I have always found these foundational concepts to be particularly fascinating. Moreover, the basic theory has strong connections outside of ensemble forecasting, including the ties to weather regime theory mentioned in this post. I also think these foundational concepts are necessary to understand how much actual ensemble forecasting techniques can diverge from the theoretical stochastic-dynamic framework. This will be the subject of some future posts that will delve into more practical and applied aspects of ensemble forecasting with a focus on water resources management.

Reference

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130–141.

Lorenz, E. N. (2006). Reflections on the Conception, Birth, and Childhood of Numerical Weather Prediction. Annual Review of Earth and Planetary Sciences, 34, 37–45. https://doi.org/10.1146/annurev.earth.34.083105.102317

Palmer, T. N. (1993). Extended-range atmospheric prediction and the Lorenz model. Bulletin – American Meteorological Society, 74(1), 49–65. https://doi.org/10.1175/1520-0477(1993)074<0049:ERAPAT>2.0.CO;2

Rayleigh, L. (1916). On convective currents in a horizontal layer when the higher temperature is on the underside. Phil. Mag., 32, 529-546.

Wilks, D. S., (2019). Statistical Methods in the Atmospheric Sciences, 4th ed. Cambridge, MA: Elsevier.

Rodionov’s Regime Shift Detection Algorithm and the “Epic Pluvial” in the Delaware Basin

In their 2013 publication, Pederson et al. try to answer the question posed by the title: Is an Epic Pluvial Masking the Water Insecurity of the Greater New York City Region?

To help answer this question, they first reconstruct 472 years of the Palmer Drought Severity Index (PDSI) for the region using tree-ring chronologies, which they use to study hydroclimate patterns over the historic record. The reconstructed PDSI, along with measured meteorological data dating back to 1895, is used to assess trends in the region’s hydroclimate.

Their analysis shows that the upper Delaware River Basin (source of NYC drinking water) has experienced a prolonged wet period since the early 1970s, and that the water availability experienced between 1970 until the time of writing may not be representative of what is “normal” in the context of the past 400 years.

This work supports the findings of Seager et al. (2012), who also identified the post-1960s “Epic Pluvial” and compared the observational data with temperature-forced global climate models to show that the wet regime is plausibly explained by internal atmospheric variability of the region. Lehner and Deser (2023) define internal variability as “fluctuations that arise intrinsically in a non-linear dynamical system, even when that system is closed.”

The first figure in Pederson et al. (2013; below) visualizes this shift toward a wetter regime, starting in 1970, along with a second wetter-regime shift again in 2003. The orange line in the figure shows the average precipitation of the wet-regime detected in 1970, which was detected using the sequential algorithm for testing climate regime shifts proposed by Rodionov (2004).

When I saw this figure, several of my own questions came to mind:

  • Has the wet-regime continued after 2011 (end of study period)?
  • This regime shift was detected in precipitation, will it also be detected in streamflow data?

Post Overview

In this post, I present Rodionov’s regime shift detection algorithm, and use it to assess streamflow regime shifts in the upper Delaware River Basin (DRB) from 1904 through the present.

A Python implementation of the algorithm, the streamflow data studied here, and a Jupyter Notebook used to run the following analysis is all available here: Rodionov_regime_shifts

Rodionov’s Algorithm

The mathematical details of the algorithm are best described by Rodionov (2004), and it would not be advantageous for me to replicate those here, but I encourage you to read the formal description in combination with the descriptions here.

Below, I provide both a verbal explanation of the process along with the Python implementation here: rodionov.py

Rodionov’s algorithm in words

Step 1: Set parameters (l & p)

In Rodionov’s sequential regime algorithm, the first step is to specify the expected minimum length of a regime (denoted as l) and a required statistical significance level (p) used to test regime differences.

Additionally, the average variance of all periods of length l in the data record is calculated, and each regime is assumed to have the average variance.

Step 2: Determine statistically significant deviation values

The algorithm defines the threshold for identifying potential new regimes based on how different a value needs to be from the mean of the current regime for it to be considered a potential regime shift.

This difference depends on a Student’s t-test value, the minimum regime length, average variance, and the user-specified significance level (p).
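Concretely, the threshold computed in the implementation further below is

$$ \mathrm{diff} = t_{p,\,2l-2} \, \sqrt{\frac{2\,\bar{\sigma}^{2}_{l}}{l}} $$

where $t_{p,\,2l-2}$ is the Student’s t quantile at probability $p$ with $2l-2$ degrees of freedom and $\bar{\sigma}^{2}_{l}$ is the average variance of all $l$-length windows from Step 1.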

Step 3: Calculate initial regime statistics

Assume that the initial regime comprises the first l values of the record (years, in this application), as specified in Step 1.

The mean of this initial regime is calculated, and upper and lower significance bounds are established using the difference value obtained in Step 2. Upper and lower bounds are equal to the mean plus/minus the significant difference value.

Step 4: Identifying potential regime shift points

One by one, test each subsequent value in the series to determine whether it exceeds the upper or lower bounds established for the current regime distribution.

If the value is inside the bounds of the current regime, then re-calculate the current regime mean including the new point, and move on to the next day.

If a value is outside the bounds of the current regime, then consider that point as a potential regime shift (Step 5).

Step 5: Testing a potential regime shift

Once a potential regime shift is detected, adopt a null hypothesis that the new regime has a mean equal to the statistically significant boundary value of the previous regime distribution that was exceeded.

The regime shift hypothesis is then assessed by calculating the Regime Shift Index (RSI), which is a cumulative measure of exceedances beyond the prior regime’s significance value. The cumulative sum of exceedances over the minimum regime length (l) defines the RSI for the potential regime shift point.
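In the implementation further below, for an upward shift candidate at index $j$ this cumulative sum takes the form (signs flip for a downward shift), where $R_2^{*}$ is the exceeded significance bound adopted as the hypothesized new-regime mean:

$$ \mathrm{RSI}_{j} = \sum_{k=j+1}^{j+l-1} \frac{x_{k} - R_2^{*}}{l\,\bar{\sigma}^{2}_{l}} $$

Only the sign of this running sum matters for accepting or rejecting the shift, so the particular normalization does not affect which shifts are detected.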

Step 6: Rejecting potential regime shifts

If RSI < 0 at any point within the l-periods after the initial shift period, then the new-regime hypothesis is rejected.

When the potential shift is rejected, the mean of the prior regime is modified to incorporate the potential shift point into the distribution, and the analysis continues by returning to Step 4 to search for additional potential regime shifts.

Step 7: Accepting regime shifts

If RSI > 0 after accumulating over the minimum regime length (l), then the new regime R2 is identified as significant.

Once the new regime is adopted, the mean of the new regime is calculated for the first l periods of R2, and new significance bounds are calculated using the significant difference value from Step 2.

The regime-shift-search is continued, starting on the first day of the new regime R2 (Return to Step 4.)

Notably, by resuming the search at the start of R2, regimes can be shorter than the specified minimum length l, and re-assessing the values in R2 lessens the impact of making assumptions about l.

Rodionov’s algorithm in code

The Python implementation, rodionov_regimes(), thus follows:

import numpy as np
from scipy.stats import t as t_test  # Student's t distribution

def rodionov_regimes(data, l, p):
    """Implements Rodionov's (2004) regime shift detection algorithm:
    Rodionov, S. N. (2004). A sequential algorithm for testing climate regime shifts.
    Geophysical Research Letters, 31(9).

    Args:
        data (array): Timeseries array of standardized values
        l (int): The assumed minimum regime length
        p (float): The significance probability to use when assessing shifts

    Returns:
        list, list: Two lists: The regime-shift indices, the RSI values
    """

    # Step 1: Set the cut-off length l of the regimes
    # l: Number of years for each regime
    # p: Probability level for significance
    n = len(data)
    regime_shift_indices = []
    rsi = np.zeros(n)

    # Step 2: Determine the difference diff for statistically significant mean values
    t_stat = np.abs(t_test.ppf(p, (2*l-2)))
    avg_var = np.mean([np.var(data[i:(i+l)]) for i in range(n-l)])
    diff = t_stat * np.sqrt(2 * avg_var / l)

    # Step 3: Calculate initial mean for regime R1
    r1 = np.mean(data[:l])
    r1_lower = r1 - diff
    r1_upper = r1 + diff

    i = l + 1
    while i < n:
        # Step 4: Check if the value exceeds the range of R1 ± diff
        if data[i] < r1_lower or data[i] > r1_upper:
            j = i

            # Estimate the mean of regime 2 as the exceeded bound of the R1 distribution
            test_r2 = r1_lower if data[i] < r1_lower else r1_upper

            # Step 5: Calculate regime shift index (RSI) across the next l-window
            for k in range(j + 1, min(j + l, n)):
                if data[j] > r1:
                    rsi[j] += (data[k] - test_r2)/(avg_var*l)
                elif data[j] < r1:
                    rsi[j] += (test_r2 - data[k])/(avg_var*l)

                # Step 6: Reject the potential regime shift at year j if RSI drops below zero
                if rsi[j] < 0:
                    rsi[j] = 0
                    break

            # Step 7: Confirm a significant regime shift at year j
            if rsi[j] > 0:
                regime_shift_indices.append(j)
                r2 = np.mean(data[j:min(j+l, n)])
                r1 = r2
                r1_lower = r1 - diff
                r1_upper = r1 + diff

            i = j + 1
        else:
            # Recalculate the running mean of R1 to include the new value
            r1 = ((l - 1) * r1 + data[i]) / l

        i += 1
    return regime_shift_indices, rsi

Usage in the DRB: Detecting streamflow regime shift

Now it is time to try and answer the questions posed earlier:

  • Has the wet-regime identified by Pederson et al. (2013) continued after 2011?
  • This regime shift was detected in precipitation, will it also be detected in streamflow data?

See the Rodionov_DRB_regime_demo.ipynb to see how the following results are generated.

Streamflow data

In choosing streamflow data for this study, I looked for a USGS gauge that (A) is in or downstream of the upper DRB region considered by Pederson et al. (2013), and (B) has a long record.

I settled on USGS-01434000, which is located on the Delaware River near the middle of the basin and has daily streamflow data dating back to 1904! I pulled all flow data from 1904 through July 2023 (the present).

Before using Rodionov’s algorithm, I standardized the flow data by removing seasonality (subtracting monthly means) and log-transforming the deviations from the seasonal means.

Lastly, I take the annual mean of standardized flow values.
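For completeness, a minimal sketch of this standardization is below. It assumes flow is a pandas Series of daily streamflow indexed by date; the exact order of operations (log-transform first, then removal of monthly means) and the monthly z-scoring are my interpretation of the description above, not a copy of the notebook’s code.

import numpy as np
import pandas as pd

def standardize_flow(flow: pd.Series) -> pd.Series:
    """Log-transform daily flows, remove monthly seasonality, return annual means."""
    log_q = np.log(flow.clip(lower=1e-6))                    # guard against zero flows
    monthly_mean = log_q.groupby(log_q.index.month).transform("mean")
    monthly_std = log_q.groupby(log_q.index.month).transform("std")
    z_daily = (log_q - monthly_mean) / monthly_std           # standardized daily values
    return z_daily.resample("YS").mean()                     # annual mean of standardized values

# Z = standardize_flow(flow)   # Z is then passed to rodionov_regimes() below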

Application of Rodionov’s algorithm

Running the rodionov_regimes() function is as easy as passing a 1D timeseries array, along with the l and p parameters.

shift_indices, rsi_values = rodionov_regimes(Z.values.flatten(), l=10, p=0.005)

The shift_indices contains a list of the array indices where a statistically-significant regime shift was detected. The rsi_values contains the RSI for each year in the timeseries, and will be 0 whenever there is no regime shift.

If the lists are empty, then there were no statistically significant regimes detected; consider lowering your statistical confidence requirements (increase p) and try again.

In this study, I start by searching for regimes with an assumed minimum length of 10 years (l= 10) and significance level of p= 0.005.

Streamflow regime shifts in the DRB

Running Rodionov’s algorithm on the standardized streamflow data reveals a significant regime shift toward wetter conditions in 2003:

Notably, the 1970s regime shift reported by Pederson et al. (2013) is not detected in the streamflow timeseries, under the initial parameterization. However, the authors also identified the 2003 regime shift toward wetter conditions, and the 2003 shift was the only shift they detected when analyzing summer precipitation (see lower timeseries in the Pederson figure earlier in this post).

The regime shifts found using the Rodionov algorithm are sensitive to the l and p parameterization, and it may be that Pederson et al. (2013) used a different parameterization (I didn’t see it detailed in the publication). Additionally, in hydrology it is often very difficult to decide on a definitive timescale for hydrologic processes that are highly unpredictable, like wet/dry regimes.

Rather than specifying a single assumed minimum regime length, I re-ran the Rodionov algorithm using every regime length between 5 and 35.

The figure below shows all of the different regimes determined to be significant across the range of different regime lengths. I also added a measure of Found Frequency (lower row) which indicates how often a specific year contained a significant regime shift across the 30 instances of the search.
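A sketch of this sensitivity analysis is below; it assumes Z is the annual standardized series from earlier, and the exact set of minimum regime lengths (5 through 34, giving 30 searches) is my reading of the description above.

import numpy as np

lengths = range(5, 35)                   # 30 assumed minimum regime lengths
found_count = np.zeros(len(Z))

for l in lengths:
    shift_idx, _ = rodionov_regimes(Z.values.flatten(), l=l, p=0.005)
    found_count[shift_idx] += 1          # tally the years flagged at this regime length

found_frequency = found_count / len(lengths)
for yr, freq in zip(Z.index.year, found_frequency):
    if freq > 0:
        print(f"{yr}: regime shift found in {freq:.0%} of searches")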

From the above analysis, we can see:

  1. Identified regimes vary in duration and magnitude, depending on the specified minimum regime length.
  2. Despite the variability, the 2003 shift toward a wetter regime is identified as being statistically significant in 29/30 of the searches!

Conclusions

To answer the questions that motivated this post:

  • Has the wet-regime identified by Pederson et al. (2013) continued after 2011?

Yes! Based on the Rodionov algorithm, it seems clear that the period 2011-2023 continues to be classified as a significantly wet period.

  • This regime shift was detected in precipitation, will it also be detected in streamflow data?

Yes, kinda! Extending the work of Pederson et al. (2013) to include streamflow does show that the wet regime is detected in the streamflow record; however, it is not clear that the regime started in 1970. The 2003 shift, on the other hand, is also reported by Pederson et al. (2013) and is clearly detected in the streamflow timeseries.

Implications for water resource systems

The persistence of a prolonged wet regime has implications for the New York City water supply, as NYC depends on the upper Delaware River Basin in the Catskills region as a major supply source, owning three major reservoirs in the region. Contextualizing the present hydroclimate, and understanding the potential variability and trends in water availability, will be important to making informed decisions that lead to water supply reliability in an uncertain and potentially less-epic climate.

If you’ve made it this far, thank you for reading!

References

Lehner, F., & Deser, C. (2023). Origin, importance, and predictive limits of internal climate variability. Environmental Research: Climate, 2(2), 023001.

Pederson, N., Bell, A. R., Cook, E. R., Lall, U., Devineni, N., Seager, R., … & Vranes, K. P. (2013). Is an epic pluvial masking the water insecurity of the greater New York City region?. Journal of Climate, 26(4), 1339-1354.

Rodionov, S. N. (2004). A sequential algorithm for testing climate regime shifts. Geophysical Research Letters, 31(9).

Seager, R., Pederson, N., Kushnir, Y., Nakamura, J., & Jurburg, S. (2012). The 1960s drought and the subsequent shift to a wetter climate in the Catskill Mountains region of the New York City watershed. Journal of Climate, 25(19), 6721-6742.

The Colorado River Basin: Exploring its Past, Present, and Future Management

I. Introduction

Figure 1. Aerial view of the Colorado River passing through the Grand Canyon in Arizona (McBride) [18]

The Colorado River Basin (CRB) covers a drainage area of 246,000 square miles across California, Colorado, Nevada, New Mexico, Utah, Wyoming and Arizona. Studies have estimated that the basin provides $1.4 trillion of economic revenue per year, 16 million jobs across 7 states, water to 40 million people, and irrigation to approximately 5.5 million acres of agricultural land. For states such as New Mexico and Nevada, it represents more than two thirds of their annual GDP (65% and 87%, respectively). [1]

On August 21, 2021, the U.S. federal government declared its first-ever water cuts in the basin, due to the ongoing megadrought. The basin’s reservoirs are estimated to be at 35% of their full capacity, and inflows have decreased by 20% over the last century. [2] Rising temperatures caused in part by climate change have dried up the water supply of the basin, with certain areas warming at more than double the global average rate. Hotter temperatures have had three notable impacts on the drying of the basin: accelerated evaporation and snowpack melting, drying of soil before runoff reaches reservoirs, and increased wildfires, which cause erosion of sediment into the basin.

The CRB finds itself at a critical juncture; as the population and demand in the area continues to grow, the water supply is only diminishing. Ideally, the basin should provide for municipal, agricultural, tribal, recreational and wildlife needs. Thus, appropriate water management policies will be foundational to the efficacy of water distribution in these critical times. 

II. Brief History

Representatives from the seven Colorado River states negotiated and signed the Colorado River Compact on November 24, 1922, and a century later, this compact still defines much of the management strategy for the CRB. The compact divided the basin into Upper and Lower sections and provided a framework for the distribution of water [3]. At the time of the signing, the annual flow of the basin was estimated at 16.4 million acre-feet per year (maf/y). Accordingly, each portion of the basin was apportioned the right to 7.5 maf/y of water. A more accurate estimate later adjusted the total flow to 13.5 maf/y, with fluctuations between 4.4 maf/y and 22 maf/y [3]. State-specific allocations were defined in 1928 as part of the Boulder Canyon Act for the Lower Basin, and in 1948 for the Upper Basin in the Upper Colorado River Basin Compact [4]. Figure 2 displays water distribution on a state basis in million acre-feet per year.

The system of water allocation, which is still in place today, dates back to 1876, when the Colorado Constitution put into place the Doctrine of Prior Appropriation. The core of the doctrine is that water rights can only be obtained for beneficial use. These rights can be bought, sold, inherited, and relocated. The doctrine gives owners of the land near the water equal rights of use. This system regulates the use of surface and tributary groundwater on the basis of priority. The highest priority goes to senior water rights holders, those who have held their rights the longest and who are typically best positioned in times of drought. Contrastingly, junior water rights holders are those who obtained their water rights more recently and tend to be those whose supply is curtailed first in times of drought. This system is often described as “first-in-time, first-in-right” [5]. However, a key criticism of this doctrine is that it fails to encompass beneficial uses of water, which are particularly important when water is scarce.

Figure 3 summarizes the key historic moments of the Colorado River Basin, from the Colorado Constitution in 1876 to the most recent 2023 negotiations. Some of the most notable events are the 1922 signing of the Colorado River Compact, which created the foundation for the division and apportionment of the basin’s water. Additionally, the construction of dams in the early to mid 20th century, such as the Hoover Dam, Parker Dam, Glen Canyon, Flaming Gorge, Navajo and Curecanti, has played an important role in determining water flow today. The formation of the Central Arizona Project in 1968 provided access to water for both agricultural lands and metropolitan areas such as Maricopa, Pinal and Pima counties [6]. More recently, the critically low water levels of Lake Powell and the Glen Canyon Dam have raised concerns about their ability to generate hydropower. In early 2023, the seven states that utilize the CRB met in hopes of reaching an agreement on how to deal with the current shortages, but the proposal failed.

III. Key Players

Figure 4 highlights prominent key players in the Colorado River Basin who are frequently involved in negotiations and policy making.

IV. Current Situation

In January 2023, the states within the CRB failed to come to an agreement on an updated water consumption plan aimed at curbing the impacts of the megadrought. The proposed plan would require California to cut roughly 1 maf/y from its 4.4 maf/y apportionment. The agreement, aimed at preventing Lakes Powell and Mead from falling below the critical levels for hydropower, proposed major water cuts for the Southwestern states of California and Arizona and incorporated losses from evaporation into the cutbacks [7]. Although six states agreed on a plan to move forward, California rejected the major water cuts due to concerns related to agriculture and legal water rights status. The proposed cuts would primarily impact regions such as Imperial County, which has senior rights to 3.1 maf/y and an agricultural revenue of $2.3 billion annually [8]. California proposed an alternative plan, which called for a 400,000 acre-foot reduction (for CA specifically), with additional reductions contingent upon the water level of Lake Mead in the future. While both plans are similar, the California plan is founded on waiting until the water level passes a critical threshold, while the plan from the other states is more preventative in nature [7].

Figure 5. Key Facts about the CRB

In more recent news, the Biden Administration just proposed a plan to evenly cut water allocations to California, Arizona and Nevada by approximately one-quarter. An intervention from the federal government on such a scale would be unprecedented in the history of the region. The options considered include: taking no action, making reductions based on senior water rights, or making equal reductions across the three states. Opting for the first option would likely result in a dead pool in Lakes Mead and Powell, a term used to describe when water levels fall too low to continue flowing downstream [12]. The second option would favor California, as it holds some of the most senior rights in the basin, particularly in the Coachella and Imperial Valleys of Southern CA [9]. Consequently, this decision would more severely impact Arizona and Nevada, which are important swing states for the President and have a significant tribal presence. The final decision is anticipated to be made in the summer of 2023.

In many parts of California and Colorado, this winter has brought particularly heavy rain and snow, making people hopeful that the river could be replenished. Scholars estimate that it would require 3 years of average snowpack with no water consumption to fully restore the Colorado River Basin reservoirs, and 7 years under current consumption [11]. Unfortunately, the basin’s soil is extremely dry, which means that much of the excess water from the snowpack is likely to be absorbed by the ground before it has a chance to reach rivers and streams. It is estimated that by the end of 2023 Lake Powell will be at 3,555 feet of elevation (roughly 32% of capacity). When Lake Powell reaches 3,490 feet, the Glen Canyon Dam will be unable to produce hydroelectric power, and at 3,370 feet a dead pool will be reached.

V. Paths to a Sustainable Future

As the water supply in the CRB continues to diminish, it has become increasingly crucial to find ways to minimize water losses from the system. One of the major contributors is evaporation, which accounts for approximately 1.5 maf/y of loss, more than Utah’s entire allocation from the Colorado River. To minimize evaporation loss, an ideal reservoir would be very deep with a small surface area. This is not the case for reservoirs like Lake Powell, which loses approximately 0.86 maf/y (more than 6% of CRB annual flow) [13]. This raises the important question of how best to allocate CRB water given that some states experience more evaporation loss than others. According to research from CU Boulder, possible solutions include covering the surface water with reflective materials, films of organic compounds, or lightweight shades. Additionally, relocating reservoir water to underground storage areas or aquifers could also reduce evaporation [14].

An alternative approach is cloud seeding. In early March of 2023, the federal government invested $2.4 million in cloud seeding for the CRB. Cloud seeding is a technique used to artificially induce more precipitation by injecting ice nuclei, silver iodide, or other small crystals into subfreezing clouds. This promotes condensation of water around the nuclei and the formation of raindrops, which is estimated to increase precipitation by 5-15% [15]. The grant will be used to fund new cloud seeding generators that can be operated remotely, as well as aircraft for silver iodide injections. While this is a significant investment, cloud seeding has been practiced for decades in the CRB. Indeed, it is estimated that Colorado, Utah, and Wyoming each spend over $1 million annually on cloud seeding, and Utah plans to increase its spending to $14 million next year [16]. The negative impacts of cloud seeding are still unclear, but some scholars believe the potentially harmful chemicals involved could cause silver toxicity, and wind can sometimes blow the seeded clouds to a different location [17]. Ultimately, cloud seeding does not solve the underlying problems of climate change or aridification in the region, but it may help alleviate some of the impact of the drought until a more sustainable alternative can be found.

Works Cited

[1] “Economic Importance of the Colorado River.” The Nature Conservancy. The Nature Conservancy. Accessed December 1, 2021. https://www.nature.org/en-us/about-us/where-we-work/priority-landscapes/colorado-river/economic-importance-of-the-colorado-river/.

[2] Kann, Drew. “Climate Change Is Drying up the Colorado River, Putting Millions at Risk of ‘Severe Water Shortages’.” CNN. Cable News Network, February 22, 2020. https://www.cnn.com/2020/02/21/weather/colorado-river-flow-dwindling-warming-temperatures-climate-change/index.html.

[3] Megdal, Sharon B. “Sharing Colorado River Water: History, Public Policy and the Colorado River Compact.” The University of Arizona: Water Resources Research Center, 1 Aug. 1997, https://wrrc.arizona.edu/publication/sharing-colorado-river-water-history-public-policy-and-colorado-river-compact.

[4] Water Education Foundation. (n.d.). Colorado River Compact. Water Education Foundation. Retrieved April 17, 2023, from https://www.watereducation.org/aquapedia-background/colorado-river-compact.

[5] Hockaday, S., and K. J. Ormerod. “Western Water Law: Understanding the Doctrine of Prior Appropriation.” Extension, University of Nevada, Reno, 2020, https://extension.unr.edu/publication.aspx?PubID=3750.

[6] State of the Rockies Project 2011-12 Research Team. “The Colorado River Basin: An Overview.” Colorado College, 2012. 

[7] Partlow, Joshua. “As the Colorado River Dries up, States Can’t Agree on Saving Water.” The Washington Post, The Washington Post, 4 Apr. 2023, https://www.washingtonpost.com/climate-environment/2023/01/31/colorado-river-states-water-cuts-agreement/.

[8] Bland, Alastair. “California, Other States Reach Impasse over Colorado River.” CalMatters, 1 Feb. 2023, https://calmatters.org/environment/2023/01/california-colorado-river-water-2/.

[9] Hager, Alex. “Six States Agree on a Proposal for Colorado River Cutbacks, California Has a Counter.” KUNC, NPR, 1 Feb. 2023, https://www.kunc.org/environment/2023-01-31/six-states-agree-on-a-proposal-for-colorado-river-cutbacks-california-has-a-counter.

[10] Flavelle, Christopher. “Biden Administration Proposes Evenly Cutting Water Allotments from Colorado River.” The New York Times, The New York Times, 11 Apr. 2023, https://www.nytimes.com/2023/04/11/climate/colorado-river-water-cuts-drought.html.

[11] Mullane, Shannon. “Colorado’s Healthy Snowpack Promises to Offer Some Relief for Strained Water Supplies.” The Colorado Sun, The Colorado Sun, 14 Mar. 2023, https://coloradosun.com/2023/03/14/colorado-snowpack-water-supply-relief/.

[12] Tavernise, Sabrina, host. “7 States, 1 River and an Agonizing Choice.” The Daily, The New York Times, 31 Jan. 2023, https://www.nytimes.com/2023/01/31/podcasts/the-daily/colorado-river-water-cuts.html?showTranscript=1.

[13] “Lake Powell Reservoir: A Failed Solution.” Glen Canyon Institute, Glen Canyon Institute , https://www.glencanyon.org/lake-powell-reservoir-a-failed-solution/.

[14] Blanken, Peter, et al. “Reservoir Evaporation a Big Challenge for Water Managers in West.” CU Boulder Today, 28 Dec. 2015, https://www.colorado.edu/today/2015/12/28/reservoir-evaporation-big-challenge-water-managers-west.

[15] McDonough, Frank. “What Is Cloud Seeding?” DRI, Desert Research Institute, 19 Sept. 2022, https://www.dri.edu/cloud-seeding-program/what-is-cloud-seeding/.

[16] Peterson, Brittany. “Feds Spend $2.4 Million on Cloud Seeding for Colorado River.” AP NEWS, Associated Press, 17 Mar. 2023, https://apnews.com/article/climate-change-cloud-seeding-colorado-river-f02c216532f698230d575d97a4a8ac7b.

[17] Rinkesh. “Various Pros and Cons of Cloud Seeding.” Conserve Energy Future, Conserve Energy Future, 28 July 2022, https://www.conserve-energy-future.com/pros-cons-of-cloud-seeding.php.

Works Cited for Photos

[18] McBride , P. (n.d.). The Colorado River winds through the Grand Canyon. photograph, Arizona.

[19] Kuhn, Eric, and John Fleck . “A Century Ago in Colorado River Compact Negotiations: Storage, Yes. but in the Compact?” Jfleck at Inkstain , 15 Nov. 2022, https://www.inkstain.net/2022/11/a-century-ago-in-colorado-river-compact-negotiations-storage-yes-but-in-the-compact/.

[20] “A Project for the Ages.” Bechtel Corporate, https://www.bechtel.com/projects/hoover-dam/.

[21] Lane, Taylor. “Lake Mead through the Decades – Lake Mead in 1950 Photograph from United States National Park Service Photography Collection .” Journal, Las Vegas Review-Journal, 17 Aug. 2022, https://www.reviewjournal.com/local/local-las-vegas/lake-mead-through-the-decades-photos-2602149/.

[22] “1944: U.S. Mexico Water Treaty.” Know Your Water News , Central Arizona Project , 25 Oct. 2022, https://knowyourwaternews.com/1944-u-s-mexico-water-treaty/.

[23] “Glen Canyon Unit – Aerial View of the Glen Canyon Dam and Lake Powell.” Bureau of Reclamation – Upper Colorado Region, Bureau of Reclamation, https://www.usbr.gov/uc/rm/crsp/gc/.

[24] “Reclamation and Arizona.” Reclamation, U.S Department of the Interior , https://www.usbr.gov/lc/phoenix/AZ100/1960/supreme_court_AZ_vs_CA.html.

[25] Ferris, Kathleen. “CAP: Tracking the Flow of Colorado River Water to Your City.” AMWUA, Arizona Municipal Water Users Association, 19 Oct. 2015, https://www.amwua.org/blog/cap-tracking-the-flow-of-colorado-river-water-to-your-city.

Causality for Dummies

The name of this post speaks for itself: this will be a soft introduction to causality, intended to increase familiarity with the ideas, methods, applications, and limitations of causal theory. Specifically, we will walk through a brief introduction to causality and compare it with correlation. We will also distinguish causal discovery from causal inference, describe the types of datasets commonly encountered, and provide a (very) brief overview of the methods that can be applied to both. This post also includes a list of commonly encountered terms within the literature, as well as links to prior posts written on this topic.

Proceed if you’d like to be less of a causality dummy!

Introducing Causality

Note: The terms “event” and “variable” will be used interchangeably in this post.

Causality has roots and applications in a diverse set of fields: philosophy, economics, mathematics, and neuroscience, just to name a few. The earliest formalization of causality can be attributed to Aristotle, while modern causality as we understand it stems from David Hume, an 18th-century philosopher (Kleinberg, 2012). Hume argued that causal relationships can be inferred from observations and are conditional upon the observer’s beliefs and perceptions. Formally, however, causation can be defined as the contribution of an event to the production of other events. Causal links between events can be uni- or bi-directional – that is:

  1. Event X causes event Y where the reverse is false, or
  2. Event X causes event Y where the reverse is true.

Such causal links have to be first detected and then quantified to confirm the existence of a causal relationship between two events or variables. The strength of these relationships is typically measured using some form of score-based strength measure (where the score has to exceed a specific threshold for a causal link to be “counted”), or using statistical models to test for conditional independence.

Before we get down into the details on the methods that enable aforementioned quantification, let’s confront the elephant in the room:

Correlation vs Causation

The distinction between correlation and causation is often muddled. Fortunately, there exists a plethora of analogies as to why correlation is not causation. My favorite so far is the global warming and pirates analogy, in which rising global mean temperature turns out to be strongly (negatively) correlated with the number of pirates, which has steadily declined. To demonstrate my point:

This plot is trying to tell you something that will save the Earth. (source: Baker, 2020)

…so does this mean that I should halt my PhD and join the dwindling ranks of pirates to save planet Earth?

Well, no (alas). This example demonstrates how relationships between highly correlated variables can be purely coincidental, with no causal links. Instead, there are external factors that may have actually caused the number of pirates to decrease and, independently, the global mean temperature to rise. Dave’s, Rohini’s and Andrew’s blog posts, as well as this website, provide more examples that further demonstrate this point. Conversely, two variables that are causally linked may not be correlated. This can happen when there is a lack of change in the variables being measured, sometimes caused by insufficient or manipulated sampling (Penn, 2018).

Correlation is also limited by the fact that it only measures linear relationships between two variables, and it does not factor in the time-ordering or time-lags of the variables’ time series. As a result, relying on correlation alone to assess a system may cause you to overlook many of the relationships that exist within it. Correlation is therefore useful for making predictions, where trends in one variable can be used to predict the trends of another. Causation, on the other hand, can be used for making decisions, as it helps develop a better understanding of the cause-and-effect consequences of changes made to system variables.
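
As a quick numerical sketch of these limitations (a toy example invented here, not drawn from any of the cited studies), a variable that is entirely caused by another but related to it nonlinearly can show near-zero Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                      # hypothetical driver, symmetric about zero
y = x**2 + rng.normal(scale=0.1, size=10_000)    # y is fully caused by x, but nonlinearly

# Pearson correlation is close to zero even though x causes y
print(np.corrcoef(x, y)[0, 1])
```

The reverse failure mode, strong correlation with no causal link, is exactly the pirates example above.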

Now that we’ve distinguished causation from its close (but troublesome) cousin, let’s begin to get into the meat of causality with some common terms you might encounter as you begin exploring the literature.

A quick glossary

The information from this section is largely drawn from Alber (2022), Nogueira et al. (2022), Runge et al. (2019), and Weinberger (2018). It should not be considered exhaustive, by any measure, and should only be used to get your bearings as you begin to explore causality.

Causal faithfulness (“faithfulness”): The assumption that causally connected events are probabilistically dependent on each other.

Causal Markov condition: The assumption that, in a graphical model, Event Y is independent of every other event, conditional on Y’s direct causes.

Causal precedence: The assumption that Event A can cause Event B only if A happens before B. That is, events in the present cannot have caused events in the past.

Causal sufficiency: The assumption that all possible direct common causes of an event, or changes to a variable, have been observed.

Conditional independence: Two events or variables X and Y are conditionally independent given a third event or variable Z if, once Z is known, information about X provides no additional information about Y (and vice versa).

Confounders: “Interfering” variables that influence both the dependent and independent variables, making it more challenging (confounding) to verify the presence of a causal relationship.

Contemporaneous links: Causal relationships that exist between variables within the same time step, hence “con” (with, at the same time as) and “temporaneous” (time). The existence of such links is one instance in which the causal precedence assumption is broken.

Directed acyclic graphs (DAGs): A graphical representation of the causal relationships between a set of variables or events. These relationships are known to be, or assumed to be, true.

Edges: The graphical representation of the links connecting two variables or events in a causal network or graph. These edges may or may not represent causal links.

Granger causality: Event X Granger-causes Event Y if predicting Y based on its own past observations and past observations of X performs better than predicting Y solely on its own past observations.

Markov equivalence class: A set of graphs that represent the same patterns of conditional independence between variables.

Nodes: The graphical representation of the events or variables in a causal network or graph.

Causality: Discovery, Inference, Data and Metrics

Causality and its related methods are typically used for two purposes: causal discovery and causal inference. Explanations for both are provided below.

Causal Discovery

Also known as causal structural learning (Tibau et al., 2022), the goal of causal discovery is to obtain causal information directly from observed or historical data. Methods used for causal discovery do not assume causal links between the variables within a dataset. Instead, they begin with a “clean slate” and attempt to generate (and then analyze) models that capture the inter-variable links inherent to the dataset. The end goal of causal discovery is to approximate a graph that represents the presence or absence of causal relationships between a set of two or more nodes.

Causal Inference

Causal inference uses (rather than generates) causal graphs, focusing on thoroughly testing the truth of a causal relationship between two variables. Unlike causal discovery, it assumes that a causal relationship already exists between two variables. Following this assumption, it tests and quantifies the actual relationships between variables in the available data. It is useful for assessing the impact of one event, or a change in one variable, on another, and can be applied to studying the possible effects of altering a given system. Causal inference should not be confused with sensitivity analysis: the intended use of sensitivity analysis is to map changes in model output to changes in the model’s underlying assumptions, parameterizations, and biases, while causal inference is focused on assessing the cause-and-effect relationships between variables or events in a system or model.

Data used in causal discovery and causal inference

There are two forms of data that are typically encountered in literature that use causal methods:

Comparing cross-sectional data and longitudinal data (source: Scribbr).

Cross-sectional data

Data in this form is non-temporal. Put simply, all variables in a cross-sectional dataset represent a single point in time. It may contain observations of multiple individuals, where each observation represents one individual and each variable contains information on a different aspect of that individual. The assumption of causal precedence does not hold for such datasets, which therefore require additional processing to develop causal links between variables. Such datasets are handled using methods that measure causality via advanced variations of conditional independence tests (more on this later). An example of a cross-sectional dataset is the census data collected by the U.S. Census Bureau once every decade.

Longitudinal data

This form of data is a general category that includes time-series data, which consists of a series of observations of a single (usually) subject across some time period. Such datasets are relatively easy to handle compared to cross-sectional data, as the causal precedence assumption is (often) met. Most causality methods can therefore be used on longitudinal data; common choices include the basic forms of Granger causality, convergent cross mapping (CCM), and Fast Causal Inference (FCI). An example of longitudinal data would be a historical rain gauge record of precipitation over time.

Causality Methods

The information from this section largely originates from Runge et al. (2019), Noguiera et al. (2022), and Ombadi et al. (2020).

There are several methods that can be used to discover or infer causality from the aforementioned datasets. In general, these methods identify and extract causal interactions among observed variables. Their outcomes facilitate filtering the relevant drivers (variables that cause observed outcomes) from the larger set of potential ones, and clarify inter-variable relationships that are often muddied by correlations. These methods measure causality in one of two ways:

  1. Score-based methods: Such methods assign a “relevance score” to rank each proposed causal graph based on the likelihood that it accurately represents the conditional (in)dependencies between variables in a dataset. Without additional phases to refine their algorithms, these methods are computationally expensive, as all potential graph configurations have to be ranked.
  2. Constraint-based methods: These methods employ a number of statistical tests to identify “necessary” causal graph edges and their corresponding directions. While less computationally expensive than basic score-based methods, constraint-based causality methods are limited to evaluating causal links for one node (representing a variable) at a time. They therefore cannot evaluate multiple variables and potential multivariate causal relationships simultaneously, and their computational expense grows with the number of variables in the dataset.

Now that we have a general idea of what causality methods can help us do, let’s dive into a few general classes of causality methods.

Granger Causality

Often cited as one of the earliest mathematical representations of causality, Granger causality is a prediction-based statistical hypothesis test of whether one time series is useful in forecasting another. It was introduced by Clive Granger in the 1960s, has been widely used in economics since, and has more recently found applications in neuroscience and climate science (see Runge et al. 2018, 2019). Granger causality can be used to characterize predictive causal relationships and to measure the influence of system or model drivers on variables’ time series. These relationships and influences can be uni-directional (X Granger-causes Y, but not Y G.C. X) or bi-directional (X G.C. Y, and Y G.C. X).

A time series demonstration on how variable X G.C. Y (source: Google).

Because it measures predictive causality, Granger causality is limited to systems with independent driving variables. It cannot be used to assess multivariate systems or systems in which conditional dependencies exist, both of which are characteristic of real-world stochastic and nonlinear processes. It also requires separability, where the causal variable (the cause) has to be independent of the influenced variable (the effect), and it assumes causal precedence. It is nonetheless a useful preliminary tool for causal discovery.
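
To make this concrete, here is a minimal sketch using the grangercausalitytests function from Python’s statsmodels package. The “precipitation” and “streamflow” series are synthetic stand-ins invented purely for illustration, not real data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# hypothetical example: does daily "precipitation" (x) Granger-cause "streamflow" (y)?
rng = np.random.default_rng(42)
n = 500
x = rng.gamma(shape=2.0, scale=1.0, size=n)          # synthetic precipitation-like series
y = np.zeros(n)
for t in range(2, n):
    # streamflow responds to precipitation with a 2-day lag, plus persistence and noise
    y[t] = 0.6 * y[t - 1] + 0.8 * x[t - 2] + rng.normal(scale=0.1)

df = pd.DataFrame({"y": y, "x": x})

# statsmodels tests whether the SECOND column Granger-causes the FIRST
results = grangercausalitytests(df[["y", "x"]], maxlag=4)

# F-test p-values at each lag; small p-values suggest x Granger-causes y
for lag, res in results.items():
    fstat, pval, _, _ = res[0]["ssr_ftest"]
    print(f"lag {lag}: F = {fstat:.2f}, p = {pval:.4f}")
```

In this toy setup the tests at lags of two or more should come out significant, consistent with x Granger-causing y; statsmodels reports several additional test statistics per lag that tell the same story.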

Nonlinear state-space methods

An alternative to Granger causality is the class of nonlinear state-space methods, which includes convergent cross mapping (CCM). Such methods assume that variable interactions occur in an underlying deterministic dynamical system, and then attempt to uncover causal relationships based on Takens’ theorem (see Dave’s blog post here for an example) by reconstructing the nonlinear state space. The key idea is this: if event X can be predicted using time-delayed information from event Y, then X had a causal effect on Y.

Visualization on how nonlinear state-space methods reconstruct the dynamic system and identify causal relationships (source: Time Series Analysis Handbook, Chapter 6)

Convergent Cross Mapping (CCM)

Convergent cross mapping (Sugihara et al., 2012; Ye et al., 2015) tests the reliability of a variable Y as an estimator (predictor) of variable X, revealing weak nonlinear interactions between time series that might otherwise be missed (Delforge et al., 2022). The CCM algorithm involves generating the system manifold M, the X-variable shadow manifold MX, and the Y-variable shadow manifold MY. The algorithm then samples an arbitrary set of nearest-neighbor (NN) points from MX and determines whether they correspond to neighboring points in MY. If X and Y are causally linked, they should share M as a common “attractor” manifold (a system state that can be sustained in the short term). Variable X can therefore be said to inform variable Y, but not vice versa (a unidirectional link). CCM can also be used to detect causality due to external forcings, where X and Y do not interact but may be driven by a common external variable Z. It is therefore best suited for causal discovery.

While CCM does not rely on conditional independence, it assumes the existence of a deterministic (but unknown) underlying system (e.g., a computer program or physics-based model) that can be represented using M. It therefore does not work well for stochastic time series (e.g., streamflow, coin tosses, bacteria population growth). The predictive ability of CCM is also vulnerable to noise in the data. Furthermore, it requires a long time series to be a reliable measure of causality between two variables, as a longer series decreases the NN distances on each manifold, improving the ability of the cross map to predict causality.
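
Below is a heavily simplified cross-mapping sketch written from scratch with NumPy/SciPy; the variable names, coupling strengths, and embedding parameters are all invented for illustration. It estimates X from the time-delay embedding of Y, which in the CCM framing is evidence that X influences Y:

```python
import numpy as np
from scipy.spatial import cKDTree

def ccm_skill(x, y, E=3, tau=1):
    """Crude cross-mapping sketch: estimate x from the shadow manifold of y.
    High skill suggests x causally influences y, because information about x
    is encoded in y's dynamics."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y) - (E - 1) * tau                          # number of embedded points
    My = np.column_stack([y[i * tau: i * tau + n] for i in range(E)])
    targets = x[(E - 1) * tau:]                         # x values aligned with My rows

    # E + 2 neighbors so each point's zero-distance match with itself can be dropped
    dist, idx = cKDTree(My).query(My, k=E + 2)
    dist, idx = dist[:, 1:], idx[:, 1:]

    # exponentially weighted average of x over the nearest neighbors in My
    w = np.exp(-dist / (dist[:, [0]] + 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    x_hat = (w * targets[idx]).sum(axis=1)

    return np.corrcoef(targets, x_hat)[0, 1]            # cross-map skill

# toy coupled system in which x drives y
rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 60, 2000)) + 0.05 * rng.normal(size=2000)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1]

print("skill of estimating x from y's shadow manifold:", ccm_skill(x, y))
```

A fuller implementation would repeat this estimate over increasing library sizes and check that the skill converges (the “convergent” part of CCM), and would compare the X-to-Y and Y-to-X directions.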

Causal network learning algorithms

On the flip side of CCM, causal network learning algorithms assume that the underlying system from which the variables arise is purely stochastic. This class of algorithms adds or removes causal graph edges using criteria based on conditional or mutual independence, and assumes that both the causal Markov and faithfulness conditions hold for all proposed graph structures. They are therefore best used for causal inference, and can be applied to cross-sectional data as well as linear and nonlinear time series.

These algorithms result in a “best estimate” graph whose edges reflect the conditional independencies that best match the observed data. They employ two stages: a skeleton discovery phase, where non-essential links are eliminated, and an orientation phase, where the directionality of the causal links is finalized. Because of this, these algorithms can be used to reconstruct large-scale, high-dimensional systems with interacting variables. Some algorithms in this class are also capable of identifying the direction of contemporaneous links, and are thus not beholden to the causal precedence assumption. However, such methods can only estimate graphs up to a Markov equivalence class, and they require longer time series to provide better estimates of the causal relationships between variables.

General framework of all causal network learning algorithms (source: Runge et al., 2019).

Let’s go over a few examples of causal network learning algorithms:

The PC (Peter-Clark) algorithm

The PC algorithm begins with a fully connected graph in its skeleton phase and iteratively removes edges where conditional independence exists. It then orients the remaining edges in its orientation phase. In its earliest form, the PC algorithm was limited by its assumption of causal sufficiency and its lack of handling of contemporaneous dependencies. It also did not scale well to high-dimensional data.

Later variations attempted to overcome these limitations. For example, the momentary conditional independence PC (PCMCI) and PCMCI+ algorithms added a further step to determine causal links between variables in different timesteps and to find lagged and contemporaneous relationships separately, thereby handling contemporaneity. The PC-select variation introduced the ability to apply conditional independence tests to target variables, allowing it to process high-dimensional data. These variations can also eliminate spurious causal links. However, the PC algorithm and its variants still depend on the causal Markov, faithfulness, and sufficiency assumptions. The causal links they detect are also relative to the feature space (Uereyen et al., 2022), meaning the directionality (or existence) of these links may change if new information is introduced to the system.
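
If you want to experiment with PCMCI, the Tigramite package (developed by Runge and colleagues) implements it. The sketch below assumes Tigramite is installed and uses randomly generated placeholder data; note that the import path for the independence tests has moved between package versions, so check the Tigramite documentation for your installed version:

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr  # older releases: tigramite.independence_tests

# placeholder data: 500 time steps of 3 variables with one simple lagged dependence
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
data[1:, 1] += 0.6 * data[:-1, 0]        # variable 0 drives variable 1 at lag 1

dataframe = pp.DataFrame(data, var_names=["X0", "X1", "X2"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())

# run PCMCI with partial-correlation CI tests up to a maximum lag of 2
results = pcmci.run_pcmci(tau_max=2, pc_alpha=0.05)
pcmci.print_significant_links(
    p_matrix=results["p_matrix"],
    val_matrix=results["val_matrix"],
    alpha_level=0.01,
)
```

In this placeholder setup, the only link that should be flagged is X0 driving X1 at a lag of one step.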

Fast Causal Inference (FCI)

Unlike the PC-based algorithms, FCI does not require that causal sufficiency be met, although it, too, is based on iterative conditional independence tests and begins with a complete graph. Another differentiating feature between PC and FCI is that the latter does not assume the directionality of causal links can be fully determined. Instead of the uni- or bi-directional orientations that the PC algorithm eventually assigns to its causal graph edges, FCI has four edge interpretations to account for the possibility of spurious links. Given variables X and Y, FCI’s edge interpretations are as follows:

  1. X causes Y
  2. X causes Y or Y causes X
  3. X causes Y or there are unmeasured confounding variables
  4. X causes Y, Y causes X, there are unmeasured confounding variables, or some combination of both

There are also several FCI variations that allow improved handling of large datasets, high dimensionality, and contemporaneous variables. For example, the Anytime and Adaptive Anytime FCI variants restrict the maximum number of variables that can be considered as drivers, and the time series FCI (tsFCI) uses sliding windows to transform the original long time series into a set of “independent” subsamples that can be treated as cross-sectional. To effectively use FCI, however, the data should be carefully prepared using Joint Causal Inference (JCI) so that the generated graph can include both variable information and system information, accounting for background knowledge of the system (Mooij et al., 2016).

Structural causal models (SCMs)

Similar to causal network learning algorithms, SCMs assume a purely stochastic underlying system and use DAGs to model the flow of information, and they likewise can only identify causal graphs up to a Markov equivalence class. Unlike causal network learning algorithms, SCMs structure DAGs as a set of endogenous (Y) and exogenous (X) variables connected by a set of functions (F) that determine the values of Y based on the values of X. Within this context, a node represents a variable x or y in X or Y, while an edge represents a function f in F. By doing so, SCMs enable the discovery of causal directions in cases where the causal direction cannot be inferred with conditional independence-based methods. SCMs can also handle a wide range of systems (linear, nonlinear, various noise probability distributions). This flexibility comes with one of the main limitations of SCMs: some information on the underlying structure of the system must be known a priori (e.g., the system is assumed to be linear, with assumptions placed on the noise distributions). SCMs are best used for causal inference, as causal links between variables have to be assumed during the generation of SCMs.
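
As a small illustration of the structural-equation view (a toy model with made-up functions and noise terms, not tied to any of the references above), the sketch below writes out the functions F that generate an endogenous variable from an exogenous one and shows how intervening differs from merely observing:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# exogenous noise terms
u_x = rng.normal(size=n)
u_y = rng.normal(scale=0.5, size=n)

# structural assignments (the functions F): X := U_X,  Y := f(X) + U_Y
x = u_x
y = 2.0 * x + x**2 + u_y          # hypothetical nonlinear mechanism

# the intervention do(X = 1) replaces the structural assignment for X entirely,
# which is not the same as conditioning on having observed X = 1
x_do = np.ones(n)
y_do = 2.0 * x_do + x_do**2 + rng.normal(scale=0.5, size=n)

print("observational  E[Y]          :", y.mean())
print("interventional E[Y | do(X=1)]:", y_do.mean())
```

In this toy model the two expectations differ (roughly 1 versus 3), which is exactly the kind of cause-and-effect question SCM-based causal inference is designed to answer.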

Information-theoretic algorithms

Finally, we have information-theoretic (IT) algorithms, which can be considered an extension of Granger causality that allows the verification of nonlinear relationships between system variables; they are therefore best used for causal inference. IT algorithms measure transfer entropy (TE), defined here as the amount of information shared between variables X and Y when both are conditioned on an external variable Z. The magnitude of TE reflects the reduction in the Shannon entropy of Y when, given Z, information on X is added to the system. For further information on IT and TE, Andrew’s blog post and Keyvan’s May 2020 and June 2020 posts expand on the theory and application of both concepts.

A couple of assumptions come along with the use of IT algorithms. First, like both SCMs and causal network learning algorithms, they assume that the underlying system is purely stochastic. They are also bound to causal precedence and assume that the causal variable X provides all useful information for the prediction of the effect Y, given Z. In addition, IT algorithms benefit from longer time series, which improve the estimates of causal links between variables. On the other hand, they make no assumptions about the underlying structure of the data and can detect both linear and nonlinear causal relationships.
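
For intuition, here is a rough histogram-based estimate of transfer entropy written from scratch for the bivariate case (no external conditioning variable Z). The binning, lag choice, and toy series are assumptions made purely for illustration; serious applications should use dedicated estimators with bias correction (for example, packages such as IDTxl or JIDT):

```python
import numpy as np

def transfer_entropy(x, y, bins=8):
    """Histogram-based estimate of TE(X -> Y), in bits, with one lag of history."""
    xb = np.digitize(x, np.histogram_bin_edges(x, bins))   # discretize the series
    yb = np.digitize(y, np.histogram_bin_edges(y, bins))
    y_next, y_now, x_now = yb[1:], yb[:-1], xb[:-1]

    def H(*series):
        """Joint Shannon entropy of the discretized series, in bits."""
        joint = np.stack(series, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    # TE(X->Y) = H(Y_next, Y_now) - H(Y_now) - H(Y_next, Y_now, X_now) + H(Y_now, X_now)
    return H(y_next, y_now) - H(y_now) - H(y_next, y_now, x_now) + H(y_now, x_now)

# toy example: x drives y with a one-step lag
rng = np.random.default_rng(3)
x = rng.normal(size=20_000)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.7 * x[t - 1] + 0.3 * rng.normal()

print("TE(X -> Y):", transfer_entropy(x, y))
print("TE(Y -> X):", transfer_entropy(y, x))
```

In this toy setup TE(X→Y) should come out clearly larger than TE(Y→X), consistent with x driving y with a one-step lag.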

Prior WaterProgramming blog posts

That was a lot! But if you would like a more detailed dive into causality and/or explore some toy problems, there are a number of solid blog posts written that focus on the underlying math and concepts central to the approaches used in causal discovery and/or inference:

  1. Introduction to Granger Causality
  2. Introduction to Convergent Cross Mapping
  3. Detecting Causality using Convergent Cross Mapping: A Python Demo using the Fisheries Game
  4. Causal Inference Using Information-Theoretic Approaches
  5. Information Theory and the Moment-Independent Sensitivity Indices
  6. Milton Friedman’s thermostat and sensitivity analysis of control policies

Summary and key challenges

In this blog post, we introduced causality and compared it to correlation. We listed a glossary of commonly used terms in the causality literature and distinguished causal discovery from causal inference. Next, we explored a number of commonly used causality methods: Granger causality, CCM, conditional independence-based causal network learning algorithms, SCMs, and information-theoretic algorithms.

From this overview, it can be concluded that methods to discover and infer causal relationships are powerful tools that enable us to identify cause-and-effect links between seemingly unrelated system variables. Improvements to these methods are pivotal to improving climate models, increasing AI explainability, and aiding better, more transparent decision-making. Nevertheless, these methods face challenges (Tibau et al., 2022) that include, but are not limited to:

  1. Handling gridded or spatio-temporally aggregated data
  2. Representing nonlinear processes that may interact across time scales
  3. Handling non-Gaussian variable distributions and data non-stationarity
  4. Handling partial observability where only a subset of system variables is observed, thus challenging the causal sufficiency assumption
  5. Uncertainty: Non-stationarity, noise, internal variability
  6. Dealing with mixed data types (discrete vs. continuous)
  7. Lack of benchmarking approaches due to lack of ground truth data

This brings us to the end of the post – do take a look at the References for a list of key literature and online articles that will be helpful as you begin learning about causality. Thank you for sticking with me and happy exploring!

References

Alber, S. (2022, February 9). Directed Acyclic Graphs (DAGs) and Regression for Causal Inference. UC Davis Health. Davis; California. Retrieved March 14, 2023, from https://health.ucdavis.edu/ctsc/area/Resource-library/documents/directed-acyclic-graphs20220209.pdf

Baker, L. (2020, July 9). Hilarious graphs (and pirates) prove that correlation is not causation. Medium. Retrieved March 14, 2023, from https://towardsdatascience.com/hilarious-graphs-and-pirates-prove-that-correlation-is-not-causation-667838af4159

Delforge, D., de Viron, O., Vanclooster, M., Van Camp, M., & Watlet, A. (2022). Detecting hydrological connectivity using causal inference from time series: Synthetic and real Karstic case studies. Hydrology and Earth System Sciences, 26(8), 2181–2199. https://doi.org/10.5194/hess-26-2181-2022

Gonçalves, B. (2020, September 9). Causal inference - part IV - structural causal models. Medium. Retrieved March 13, 2023, from https://medium.data4sci.com/causal-inference-part-iv-structural-causal-models-df10a83be580

Kleinberg, S. (2012). A Brief History of Causality (Chapter 2) – Causality, Probability, and Time. Cambridge Core. Retrieved March 14, 2023, from https://www.cambridge.org/core/books/abs/causality-probability-and-time/brief-history-of-causality/C87F30B5A6F4F63F0C28C3156B809B9E

Mooij, J. M., Magliacane, S., & Claassen, T. (2020). Joint causal inference from multiple contexts. Journal of Machine Learning Research, 21. https://doi.org/10.48550/arXiv.1611.10351

Nogueira, A. R., Pugnana, A., Ruggieri, S., Pedreschi, D., & Gama, J. (2022). Methods and tools for causal discovery and causal inference. WIREs Data Mining and Knowledge Discovery, 12(2). https://doi.org/10.1002/widm.1449

Ombadi, M., Nguyen, P., Sorooshian, S., & Hsu, K. (2020). Evaluation of methods for causal discovery in hydrometeorological systems. Water Resources Research, 56(7). https://doi.org/10.1029/2020wr027251

Penn, C. S. (2018, August 25). Can causation exist without correlation? Yes! Christopher S. Penn – Marketing Data Science Keynote Speaker. Retrieved March 14, 2023, from https://www.christopherspenn.com/2018/08/can-causation-exist-without-correlation/

Runge, J. (2018). Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7), 075310. https://doi.org/10.1063/1.5025050

Runge, J., Bathiany, S., Bollt, E., Camps-Valls, G., Coumou, D., Deyle, E., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., van Nes, E. H., Peters, J., Quax, R., Reichstein, M., Scheffer, M., Schölkopf, B., Spirtes, P., Sugihara, G., Sun, J., Zscheischler, J. (2019). Inferring causation from time series in Earth System Sciences. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-10105-3

Sugihara, G., May, R., Ye, H., Hsieh, C.-hao, Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338(6106), 496–500. https://doi.org/10.1126/science.1227079

Tibau, X.-A., Reimers, C., Gerhardus, A., Denzler, J., Eyring, V., & Runge, J. (2022). A spatiotemporal stochastic climate model for benchmarking causal discovery methods for teleconnections. Environmental Data Science, 1. https://doi.org/10.1017/eds.2022.11

Uereyen, S., Bachofer, F., & Kuenzer, C. (2022). A framework for multivariate analysis of land surface dynamics and driving variables—a case study for Indo-Gangetic River basins. Remote Sensing, 14(1), 197. https://doi.org/10.3390/rs14010197

Weinberger, N. (2017). Faithfulness, coordination and causal coincidences. Erkenntnis, 83(2), 113–133. https://doi.org/10.1007/s10670-017-9882-6

Ye, H., Deyle, E. R., Gilarranz, L. J., & Sugihara, G. (2015). Distinguishing time-delayed causal interactions using convergent cross mapping. Scientific Reports, 5(1). https://doi.org/10.1038/srep14750

Markdown-Based Scientific and Computational Note Taking with Obsidian

Motivation

Over the last year, being in the Reed Research Group, I have been exposed to new ideas more rapidly than I can manage. During this period, I was filling my laptop storage with countless .docx, .pdf, .txt, and .html files semi-sporadically scattered across different cloud and local storage locations.

Finding a good method for organizing my notes and documentation has been very beneficial for my productivity, processing of new ideas, and ultimately staying sane. For me, this has been done through Obsidian.

Obsidian is an open-source markdown (.md) based note taking and organization tool, with strengths being:

  • Connected and relational note organization
  • Rendering code and mathematical (LaTeX) syntax
  • Community-developed plugins
  • Light-weight
  • A nice aesthetic

The fact that all notes are simply .md files is very appealing to me. I don’t need to waste time searching through different local and cloud directories for my notes, and can avoid having OneNote, Notepad, and 6 Word windows open at the same time.

Also, I like having localized storage of the notes and documentation that I am accumulating. I store my Obsidian directory on GitHub, and push-pull copies between my laptop and desktop, giving me the security of having three copies of my precious notes at all times.

I’ve been using Obsidian as my primary notetaking app for a few months now, and it has become central to my workflow. In this post, I show how Obsidian extends the utility of Markdown features, helps to organize topical .md libraries, and generally makes the documentation process more enjoyable. I also try to explain why I believe Obsidian is so effective for researchers and code developers.

Note Taking in Markdown

Markdown (.md) is an efficient and effective format not just for documenting code (e.g., README.md files), but also for writing tutorials (e.g., in Jupyter Notebooks), taking notes, and even designing a website (as demonstrated in Andrew’s blog post last week).

In my opinion, the beauty is that only a few syntax tricks are necessary for producing a very clean .md document. This is particularly relevant as a programmer, when notes often require mathematical and code syntax.

I am not going to delve into Markdown syntax. For those who have not written in Markdown, I suggest you see the Markdown Documentation here. Instead, I focus on how Obsidian enables a pleasant environment in which to write .md documentation.

Obsidian Key Features

Below, I hit on what I view as the key strengths of Obsidian:

  • Connected and relational note organization
  • Rendering code and mathematical (LaTeX) syntax
  • Community-developed plugins

Relational Note Organization

If you take a quick look at any of the Obsidian forums, you will come across this word: Zettelkasten.

Zettelkasten, meaning "slip-box" in German and also referred to as a card file, is a note-taking strategy that encourages the use of many highly specific notes which are connected to one another via references.

Obsidian is designed to help facilitate this style of note taking, with the goal of facilitating what they refer to as a "second brain". The goal of this relational note taking is to store not just information about a single topic but rather a network of connected information.

To this end, Obsidian helps to easily navigate these connections and visualize relationships stored within a Vault (which is just a fancy word for one large folder directory). Below is a screen cap of my note network developed over the last ~4 months. On the left is a visualization of my entire note directory, with a zoomed-in view on the right.

Notice the scenario discovery node, which has direct connections to methodological notes on Logistic Regression, Boosted Trees, PRIM, and literature review notes on Bryant & Lempert 2010, a paper which was influential in advocating for participatory, computer-assisted scenario discovery.

Each of these nodes is then connected to the broader network of notes and documentation.

These relations are easily constructed by linking other pages within the directory, or subheadings within those pages. When writing the notes, you can then click through the links to flip between files in the directory.

Links to Internal Pages and Subheadings

Referencing other pages (.md files) in your library is done with a double square bracket on either side: [[Markdown cheatsheet]]

You can get down to a finer resolution and specifically reference various levels of sub-headings within a page by adding a hashtag to the internal link, such as: [[Markdown cheatsheet#Basics#Bold]]

Tags and Tag Searches

Another tool that helps facilitate relational note taking is the use of #tags. Attach a # to any word within any of your notes, and that word becomes connected to other instances of the word throughout the directory.

Once tags have been created, they can be searched. The following Obsidian syntax generates a live list of occurrences of that tag across your entire vault:

```query
tag: #scenarios
```

Which produces the window:

Rendering code and math syntax

Language-Specific Code Snippets

Obsidian will beautifully stylize code snippets using language-specific formatting, and if you don’t agree with the defaults you can change your style settings.

A block of code is specified for a specific language using the triple-backtick syntax:

```language
<Enter code here>
```

The classic HelloWorld() function can be stylistically rendered in Python or C++:
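
For example, a Python version of that block (contents made up here purely to show the rendering) would be written as:

```python
def hello_world():
    """Classic first function, used here only to show syntax highlighting."""
    print("Hello, world!")

hello_world()
```

and Obsidian renders it with Python-specific highlighting; swapping the language tag to cpp does the same for C++.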

LaTeX

As usual, the $ characters are used to render LaTeX equations. Single $ characters result in in-line equations ($<Enter LaTeX>$), while double $$ are used for centered equations.

Obsidian is not limited to short LaTeX equations, and has plugins designed to allow for inclusion other LaTeX packages or HTML syntaxes.

```latex
$$
\phi_{t,i} = \left\{
\begin{array}{ll}
\phi_{t-1,i} + 1 & \text{if } z_t < \text{limit} \\
0 & \text{otherwise.}
\end{array}
\right.
$$
```

will produce:

Plugins

Obsidian boasts an impressive 667 community-developed plugins with a wide variety of functionality. A glimpse at the webpage shows plugins that give more control over the visual interface, allow for alternative LaTeX environments, or allow for pages to be exported to various file formats.

Realistically, I have not spent a lot of time working with the plugins. But, if you are someone who likes the idea of a continuously evolving and modifiable documentation environment then you may want to check them out in greater depth.

Conclusion: Why did I write this?

This is not a sponsored post in any way; I just like the app.

When diving into a new project or starting a research program, it is critical to find a way of taking notes, documenting resources, and storing ideas that works for you. Obsidian is one tool which has proven to be effective for me. It may help you.

Best of luck with your learning.

So What’s the Rationale Behind the Water Programming Blog?

By Patrick M. Reed

In general, I’ve made an effort to keep this blog focused on the technical topics that have helped my students tackle various issues big and small. It helps with collaborative learning and maintaining our exploration of new ideas.

This post, however, represents a bit of a departure from our normal posts, in response to some requests for a suggested reading guide for my Fall 2019 AGU Paul A. Witherspoon Lecture entitled “Conflict, Coordination, and Control in Water Resources Systems Confronting Change” (available on YouTube should you have interest).

The intent is to take a step back and zoom out a bit to get the bigger picture behind what we’re doing as a research group and much of the original motivation for initiating the Water Programming Blog itself. So below, I’ll provide a summary of papers related to the topics covered in the lecture, sequenced by the talk’s focal points. I hope this provides some interesting reading for folks. My intent here is to keep this informal; the highlighted reading resources were helpful to me and are not meant to be a formal review of any form.

So let’s first highlight Paul Witherspoon himself, a truly exceptional scientist and leader (7 minute marker, slides 2-3).

  1. His biographical profile
  2. A summary of his legacy
  3. The LBL Memo Creating the Earth Sciences Division (an example of institutional change)

Next stop, do we understand coordination, control, and conflicting objectives in our institutionally complex river basins (10 minute marker, slides 6-8)? Some examples and a complex systems perspective.

  1. The NY Times Bomb Cyclone Example
  2. The ProPublica and The Texas Tribune interactive Boomtown, Flood Town (note this was written before Hurricane Harvey hit Houston)
  3. A Perspective on Interactions, Multiple Stressors, and Complex Systems (NCA4)

Does your scientific workflow define the scope of your hypotheses? Or do your hypotheses define how you need to advance your workflow (13 minute marker, slide 9)? How should we collaborate and network in our science?

  1. Dewey, J. (1958), Experience and Nature, Courier Corporation.
  2. Dewey, J. (1929), The quest for certainty: A study of the relation of knowledge and action.
  3. Hand, E. (2010), ‘Big science’ spurs collaborative trend: complicated projects mean that science is becoming ever more globalized–and Europe is leading the way, Nature, 463(7279), 282-283.
  4. Merali, Z. (2010), Error: Why Scientific Programming Does Not Compute, Nature, 467(October 14), 775-777.
  5. Cummings, J., and S. Kiesler (2007), Coordination costs and project outcomes in multi-university collaborations, Research Policy, 36, 1620-1634.
  6. National Research Council (2014), Convergence: facilitating transdisciplinary integration of life sciences, physical sciences, engineering, and beyond, National Academies Press.
  7. Wilkinson, M. D., M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, and P. E. Bourne (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 3.
  8. Cash, D. W., W. C. Clark, F. Alcock, N. M. Dickson, N. Eckley, D. H. Guston, J. Jäger, and R. B. Mitchell (2003), Knowledge systems for sustainable development, Proceedings of the National Academy of Sciences, 100(14), 8086-8091.

Perspectives and background on Artificial Intelligence (15 minute marker, slides 10-16)

  1. Simon, H. A. (2019), The sciences of the artificial, MIT press.
  2. AI Knowledge Map: How To Classify AI Technologies
  3. Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Company, Reading, MA
  4. Coello Coello, C., G. B. Lamont, and D. A. Van Veldhuizen (2007), Evolutionary Algorithms for Solving Multi-Objective Problems, 2 ed., Springer, New York, NY.
  5. Hadka, D. M. (2013), Robust, Adaptable Many-Objective Optimization: The Foundations, Parallelization and Application of the Borg MOEA.

The Wicked Problems Debate (~22 minute marker, slides 17-19) and the emergence of post normal science and decision making under deep uncertainty.

  1. Rittel, H., and M. Webber (1973), Dilemmas in a General Theory of Planning, Policy Sciences, 4, 155-169.
  2. Buchanan, R. (1992), Wicked problems in design thinking, Design Issues, 8(2), 5-21.
  3. Kwakkel, J. H., W. E. Walker, and M. Haasnoot (2016), Coping with the Wickedness of Public Policy Problems: Approaches for Decision Making under Deep Uncertainty, Journal of Water Resources Planning and Management, 01816001.
  4. Ravetz, J. R., and S. Funtowicz (1993), Science for the post-normal age, Futures, 25(7), 735-755.
  5. Turnpenny, J., M. Jones, and I. Lorenzoni (2011), Where now for post-normal science?: a critical review of its development, definitions, and uses, Science, Technology, & Human Values, 36(3), 287-306.
  6. Marchau, V. A., W. E. Walker, P. J. Bloemen, and S. W. Popper (2019), Decision making under deep uncertainty, Springer.
  7. Mitchell, M. (2009), Complexity: A guided tour, Oxford University Press.

Lastly, the Vietnam and North Carolina application examples.

  1. Quinn, J. D., P. M. Reed, M. Giuliani, and A. Castelletti (2019), What Is Controlling Our Control Rules? Opening the Black Box of Multireservoir Operating Policies Using Time-Varying Sensitivity Analysis, Water Resources Research, 55(7), 5962-5984.
  2. Quinn, J. D., P. M. Reed, M. Giuliani, A. Castelletti, J. W. Oyler, and R. E. Nicholas (2018), Exploring How Changing Monsoonal Dynamics and Human Pressures Challenge Multireservoir Management for Flood Protection, Hydropower Production, and Agricultural Water Supply, Water Resources Research, 54(7), 4638-4662.
  3. Trindade, B. C., P. M. Reed, and G. W. Characklis (2019), Deeply uncertain pathways: Integrated multi-city regional water supply infrastructure investment and portfolio management, Advances in Water Resources, 134, 103442.
  4. Gold, D., Reed, P. M., Trindade, B., and Characklis, G., “Identifying Actionable Compromises: Navigating Multi-City Robustness Conflicts to Discover Cooperative Safe Operating Spaces for Regional Water Supply Portfolios.“, Water Resources Research,  v55, no. 11,  DOI:10.1029/2019WR025462, 9024-9050, 2019.

Converting Latex to MS Word docx (almost perfectly)

Many students and academics write papers and reports in LaTeX because of the ease of formatting and of managing citations and bibliographies. However, collaborative LaTeX editing tools have not reached the level of versatility and convenience of Microsoft Word and Google Docs. We therefore often begin writing a paper in Word while major comments, additions, and changes are being made, and later manually translate it to LaTeX, which is a tedious task and a waste of time.

A way around this hurdle is to use Pandoc to automatically convert LaTeX files to Word documents. To install Pandoc on Linux, open a terminal and run sudo apt-get install pandoc pandoc-citeproc texlive; Windows executables and macOS instructions are available here. To run Pandoc on a LaTeX document:

  1. Open a terminal (on Windows, hold the Windows key and press “r,” then type “cmd” in the command bar)
  2. Use the “cd” command to navigate to the folder where your LaTeX document is.
  3. Type pandoc -s latex_document.tex --bibliography=bib_file.bib -o output_word_document.docx.

Now you should have a Word document with all your bitmap (png, jpeg, bmp, etc.) figures, equations in Word format and with a bibliography based on your citations and bib file. If the latest version of Pandoc does not work, try version 1.19.1.

There are some known limitations, though. Numbering and referencing equations, figures, and tables, and referencing sections, will not work, so you will have to number those by hand. This limitation seems to apply only when outputting Word documents, and you can circumvent it by outputting to other formats with Pandoc reference filters plus the appropriate Pandoc calls and specific LaTeX syntax (see the filters’ pages for details). Vectorized figures (svg, eps, pdf, etc.) will not be included in the output document. Bitmaps, on the other hand, will appear in the output document and will preserve their captions.

Hopefully some skilled programmer will feel like generously volunteering their time for the LaTeX academic cause and address these limitations soon. Until then, we will have to work around them by hand.

Glossary of commonly used terms

I have recently started training with the group and, coming from a slightly different research background, I was unfamiliar with a lot of the terminology. I thought it might be useful to put together a glossary of sorts containing the terms that someone new to this field of study might not intuitively understand. The idea is that when someone encounters an unfamiliar term while going through the training or reading through the material, they can come to the blog glossary and quickly Ctrl-F the term (yes, that is a keyboard shortcut used as a verb).

The definitions are not exhaustive, but links/references are provided so the reader can find additional material. The glossary is a work in progress; some definitions are still incomplete, but it will be regularly updated. I’m also probably the least qualified person in the group to provide the definitions, so any corrections/suggestions are more than welcome. If there’s any other term I’ve omitted that you feel should be included, please leave a comment so I can edit the post.

Glossary 

Adaptive metropolis algorithm

It is based on the classical random walk Metropolis algorithm and adapts continuously to the target distribution. 1

Akaike’s Information Criterion (AIC)

A measure of the relative quality of statistical models for a given set of data.2

AMALGAM

A MOEA that applies a multi-algorithm search combining NSGA-II, particle swarm optimization, differential evolution, and adaptive Metropolis sampling.3

Approximation set

The set of solutions output by a multi-objective evolutionary algorithm approximating the Pareto front.4

Archive

A secondary population used by many multi-objective evolutionary algorithms to store non-dominated solutions found through the generational process.5

Attainment

Attainment plot

Bayesian Information Criterion

Borg MOEA

Iterative search algorithm for multi-objective optimization. It combines adaptive operator selection with ε-dominance, adaptive population sizing and time continuation. 6

Classification and Regression Trees (CART)

Decision tree algorithms that can be used for classification and regression analysis in predictive modeling problems.7

Closed vs. open loop control

Concavity

See Convexity.

Constraints

Restrictions imposed on the decision space by the particular characteristics of the system. They must be satisfied for a certain solution to be considered acceptable.5

Control map

Controllability

Refers to whether parameterization choices have significant impacts on the success or failure of an algorithm.

Convergence

Progress towards the reference set.

Convexity

Geometrically, a function f is convex if a line segment drawn from any point on the graph of f to any other point on the graph lies on or above the graph. An optimization problem is convex if it minimizes a convex objective function (or maximizes a concave one) over a convex feasible region defined by its constraints.
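
In symbols (the standard textbook definition, added here for reference), f is convex if

```latex
f\big(\lambda x + (1-\lambda)\, y\big) \;\le\; \lambda f(x) + (1-\lambda) f(y)
\qquad \text{for all } x, y \in \operatorname{dom}(f),\ \lambda \in [0, 1],
```

and concavity flips the inequality.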

(The) Cube

The Reed Group’s computing cluster run by the Center for Advanced Computing (CAC) at Cornell. The cluster has 32 nodes, each with Dual 8-core E5-2680 CPUs @ 2.7 GHz and 128 GB of RAM. For more information see: https://www.cac.cornell.edu/wiki/index.php?title=THECUBE_Cluster

Decision space

The set of all decision variables.

Decision variables

The numerical quantities (variables) that are manipulated during the optimization process; they represent how our actions are encoded within the problem.5

Deterioration

When elements of a solution set at a given time are dominated by a solution set that the algorithm maintained at some earlier time.5

Differential evolution

Evolutionary algorithm designed to optimize problems over continuous domains.5

Direct policy search

Dominance resistance

The inability of an algorithm to produce offspring that dominate poorly performing, non-dominated members of the population. (See also Pareto dominance).8

DTLZ problems

A suite of representative test problems for MOEAs for which the analytical solutions are known. The acronym is a combination of the first letters of the creators’ last names (Deb, Thiele, Laumanns, Zitzler). DTLZ problems are commonly used for benchmarking and diagnostics when evaluating the performance of MOEAs.

Dynamic memory

Elitism

Refers to the property that, as evolution progresses, non-dominated solutions are not lost in subsequent generations.

Epsilon (ε) dominance

When dominance is determined by use of a user-specified precision to simplify sorting. Pareto epsilon (ε) optimality and Pareto epsilon (ε) front are defined accordingly. (See also Pareto dominance).5

Epsilon (ε) dominance archive

Epsilon (ε)-box dominance archive

ε-MOEA

A steady-state MOEA, the first to incorporate ε-dominance archiving into its search process.

Epsilon (ε) indicator

Additive ε-indicator (ε+-indicator) (performance metric) 

The smallest distance ε that the approximation set must be translated by in order to completely dominate the reference set.4

Epsilon (ε) progress

Equifinality

Evolutionary algorithms

A class of search and optimization algorithms inspired by processes of natural evolution.5

Multi-objective evolutionary algorithms (MOEAs)

Evolutionary algorithms used for solving multi-objective problems.5

Evolutionary operators

They operate on the population of an evolutionary algorithm attempting to generate solutions with higher and higher fitness.5

Mutation evolutionary operators

They perturb the decision variables of a single solution to look for improvement in its vicinity.

Recombination evolutionary operators

They combine decision variables from two or more solutions to create new solutions.

Selection evolutionary operators

They determine which solutions are allowed to proceed to the next cycle.

Executioner

A cross-language automation tool for running models. (See also Project Platypus).

Exploratory Modeling and Analysis (EMA)

A research methodology that uses computational experiments to analyze complex and uncertain systems.9

Data-driven exploratory modeling

Used to reveal implications of a data set by searching through an ensemble of models for instances that are consistent with the data.

Question-driven exploratory modeling

Searches over an ensemble of models believed to be plausible to answer a question of interest or illuminate policy choices.

Model-driven exploratory modeling

Investigates the properties of an ensemble of models without reference to a data set or policy question. It is rather a theoretical investigation into the properties of a class of models.

Feasible region

The set of all decision variables in the decision space that are feasible (i.e. satisfy all constraints).5

GDE3

Generational algorithms

A class of MOEAs that replace the entire population during each full mating, mutation, and selection iteration of the algorithm.5 (See also Steady-state algorithms).

Generational distance (performance metric)

The average distance from every solution in the approximation set to the nearest solution in the reference set.4

Gini index

A generalization of the binomial variance used in Classification and Regression Trees (CART). (See also Classification and Regression Trees (CART)). 

High performance computing

Hypervolume (performance metric)

The volume of the space dominated by the approximation set.4
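
As a rough sketch of the idea (assuming two minimization objectives and a mutually non-dominated approximation set; in practice the MOEA Framework or a dedicated library handles this), the 2-D hypervolume relative to a reference point can be accumulated slab by slab:

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Hypervolume of a mutually non-dominated 2-D set (minimization)
    relative to a reference point that is dominated by every solution."""
    pts = np.asarray(sorted(points))          # sort ascending by the first objective
    hv = 0.0
    for i, (f1, f2) in enumerate(pts):
        next_f1 = pts[i + 1, 0] if i + 1 < len(pts) else ref[0]
        hv += (next_f1 - f1) * (ref[1] - f2)  # area of the slab this point contributes
    return hv

print(hypervolume_2d([(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)], ref=(5.0, 5.0)))  # 12.0
```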

Inverted generational distance (performance metric)

The average distance from every solution in the reference set to the nearest solution in the approximation set.4
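
A minimal sketch of both of these distance metrics (assuming the approximation and reference sets are NumPy arrays of objective values; this simple averaged form follows the definitions above rather than any particular library’s exact implementation):

```python
import numpy as np

def generational_distance(approx, reference):
    """Mean distance from each approximation-set point to its nearest reference-set point."""
    dists = np.linalg.norm(approx[:, None, :] - reference[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def inverted_generational_distance(approx, reference):
    """Mean distance from each reference-set point to its nearest approximation-set point."""
    return generational_distance(reference, approx)

approx = np.array([[1.1, 4.0], [2.2, 2.1], [3.0, 1.2]])
reference = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 1.0]])
print(generational_distance(approx, reference), inverted_generational_distance(approx, reference))
```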

J3

A free desktop application for producing and sharing high-dimensional, interactive scientific visualizations. (See also Project Platypus).

Kernel density estimation

Latin Hypercube Sampling (LHS)

A stratified sampling technique used to generate near-random samples of parameter values that cover the full range of each parameter.
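
For example, a minimal sketch using the scipy.stats.qmc module (available in recent SciPy versions; SALib provides similar samplers):

```python
from scipy.stats import qmc

# Draw 10 Latin hypercube samples of 3 parameters, then scale to the parameter ranges.
sampler = qmc.LatinHypercube(d=3, seed=42)
unit_samples = sampler.random(n=10)                  # samples in the unit hypercube [0, 1)^3
samples = qmc.scale(unit_samples,
                    [0.0, 1.0, 10.0],                # lower bounds for each parameter
                    [1.0, 5.0, 50.0])                # upper bounds for each parameter
print(samples)
```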

Markov chain

Method of moments

MOEA Framework

A free and open source Java library for developing and experimenting with multi-objective evolutionary algorithms and other general-purpose optimization algorithms.4

Monte Carlo

Morris method

NSGA-II

The Non-dominated Sorting Genetic Algorithm-II. An MOEA featuring elitism, efficient non-dominated sorting, and parameter-free diversity maintenance.10

ε-NSGA-II 

A generational algorithm that uses ε-dominance archiving, adaptive population sizing, and time continuation.

Number of function evaluations (NFE)

Objectives

The criteria used to compare solutions in an optimization problem.

OMOPSO

A particle swarm optimization algorithm, the first to include ε-dominance as a means to solve many-objective problems.11

Optimization

The process of identifying the best solution (or a set of best solutions) among a set of alternatives.

Multi-objective optimization

Multi-objective optimization employs two or more criteria to identify the best solution(s) among a set of alternatives.

Intertemporal optimization

Parallel computing

Parametric generator

Pareto optimality

A solution is Pareto optimal if no other feasible solution dominates it, i.e., no other solution is at least as good in every objective and strictly better in at least one. (See also Pareto dominance).

Pareto dominance

A dominating solution is at least as good as another solution in every objective and strictly better in at least one; the other solution is then dominated. Two solutions are mutually non-dominated when each is superior to the other in at least one objective and inferior in at least one other.
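
In code, the dominance check for a minimization problem can be written as follows (an illustrative helper, not any particular library’s implementation):

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse than b in every objective and strictly better in at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

print(dominates([1.0, 2.0], [1.5, 2.0]))  # True: better in the first objective, equal in the second
print(dominates([1.0, 3.0], [1.5, 2.0]))  # False: these two solutions are mutually non-dominated
```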

Pareto front

Contains the objective values of all non-dominated solutions (in the objective function space).

Pareto optimal set

Contains the decision variables of all non-dominated solutions (in the decision variable space).

Particle swarm optimization

Population-based stochastic optimization technique where the potential solutions, called particles, fly through the problem space by following the current optimum particles.

Patient Rule Induction method (PRIM)

A rule induction algorithm.

Performance metrics

Procedures used to compare the performance of approximation sets.

Pointer

Population

The set of encoded solutions that are manipulated and evaluated during the application of an evolutionary algorithm.

Principal Component Analysis (PCA)

Project Platypus

A Free and Open Source Python Library for Multiobjective Optimization. For more information see: https://github.com/Project-Platypus
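
As a quick illustration of its use (this follows the example in the Project Platypus README; treat it as a sketch rather than a definitive recipe):

```python
# Solve the two-objective DTLZ2 test problem with NSGA-II using Platypus.
from platypus import NSGAII, DTLZ2

problem = DTLZ2()            # built-in benchmark problem (see DTLZ problems above)
algorithm = NSGAII(problem)
algorithm.run(10000)         # run for 10,000 function evaluations

for solution in algorithm.result:
    print(solution.objectives)
```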

Radial basis function

Reference set

The set of globally optimal (or best-known) solutions to an optimization problem, used as the benchmark when computing performance metrics.

Rhodium

Python Library for Robust Decision Making and Exploratory Modelling. (See also Project Platypus).

Robust Decision Making (RDM)

An analytic framework that helps identify potential robust strategies for a particular problem, characterize the vulnerabilities of such strategies, and evaluate trade-offs among them.12

Multi-objective robust decision making (MORDM)

An extension of Robust Decision Making (RDM) to explicitly include the use of multi-objective optimization to discover robust strategies and explore the trade-offs among multiple competing performance objectives.13

OpenMORDM

An open source implementation of MORDM with the tools necessary to perform a complete MORDM analysis.14 For more information see: https://github.com/OpenMORDM

Safe operating space

SALib

Seeding

Sobol sampling

Spacing (performance metric)

The uniformity of the spacing between the solutions in an approximation set.

SPEA2

MOEA that assigns a fitness value to each solution based on the number of competing solutions it dominates.

State of the world

A fundamental concept in decision theory which refers to a feature of the world that the agent/decision maker has no control over and is the origin of the agent’s uncertainty about the world.

Steady-state algorithms

A class of MOEAs that only replace one solution in the population during each full mating, mutation, and selection iteration of the algorithm. (See also Generational algorithms).

Time continuation

The injection of new solutions in the population to reinvigorate search.

Tournament

The set of candidate solutions selected randomly from a population.

Trace

Visual analytics

The rapid analysis of large datasets using interactive software that enables multiple connected views of planning problems.

More information on the concepts

  1. Haario, H., Saksman, E. & Tamminen, J. An adaptive Metropolis algorithm. Bernoulli 7, 223–242 (2001).
  2. Akaike, H. Akaike’s information criterion. in International Encyclopedia of Statistical Science 25–25 (Springer, 2011).
  3. Vrugt, J. A. & Robinson, B. A. Improved evolutionary optimization from genetically adaptive multimethod search. Proc. Natl. Acad. Sci. 104, 708–711 (2007).
  4. Hadka, D. Beginner’s Guide to the MOEA Framework. (CreateSpace Independent Publishing Platform, 2016).
  5. Coello, C. A. C., Lamont, G. B. & Van Veldhuizen, D. A. Evolutionary algorithms for solving multi-objective problems. 5, (Springer, 2007).
  6. Hadka, D. & Reed, P. Borg: An Auto-Adaptive Many-Objective Evolutionary Computing Framework. Evol. Comput. 21, 231–259 (2012).
  7. Breiman, L. Classification and Regression Trees. (Wadsworth International Group, 1984).
  8. Reed, P. M., Hadka, D., Herman, J. D., Kasprzyk, J. R. & Kollat, J. B. Evolutionary multiobjective optimization in water resources: The past, present, and future. Adv. Water Resour. 51, 438–456 (2013).
  9. Bankes, S. Exploratory Modeling for Policy Analysis. Oper. Res. 41, 435–449 (1993).
  10. Deb, K., Pratap, A., Agarwal, S. & Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. Evol. Comput. IEEE Trans. On 6, 182–197 (2002).
  11. Sierra, M. R. & Coello, C. C. Improving PSO-based multi-objective optimization using crowding, mutation and e-dominance. in Evolutionary multi-criterion optimization 3410, 505–519 (Springer, 2005).
  12. Lempert, R. J., Groves, D. G., Popper, S. W. & Bankes, S. C. A general, analytic method for generating robust strategies and narrative scenarios. Manag. Sci. 52, 514–528 (2006).
  13. Kasprzyk, J. R., Nataraj, S., Reed, P. M. & Lempert, R. J. Many objective robust decision making for complex environmental systems undergoing change. Environ. Model. Softw. (2013). doi:10.1016/j.envsoft.2012.12.007
  14. Hadka, D., Herman, J., Reed, P. & Keller, K. An open source framework for many-objective robust decision making. Environ. Model. Softw. 74, 114–129 (2015).

 

An Introduction To Econometrics: Part 1- Ordinary Least Squares Regression

I took a PhD-level econometrics course this semester in the Applied Economics and Management department here at Cornell and I thought I’d share some of what I learned. Overall, I enjoyed the course and learned a great deal. It was very math and theory heavy, but Professor Shanjun Li did a nice job keeping the class lively and interesting. I would recommend the class to future EWRS students who may be looking for some grounding in econometrics, provided they’ve taken some basic statistics and linear algebra courses.

So let’s start with the basics: what does the term “econometrics” even mean? Hansen (2010) defined econometrics as “the unified study of economic models, mathematical statistics and economic data”. After taking this introductory course, I’m inclined to add my own definition: econometrics is “a study of the problems with regression using Ordinary Least Squares (OLS) and how to solve them”. This is obviously a gross oversimplification of the field; however, regression through OLS was the primary tool used for finding insights and patterns within data, and we spent the vast majority of the course examining it. In this post I’ll briefly summarize OLS mechanics and the classical OLS assumptions. In my next post, I’ll detail methods for dealing with violations of OLS assumptions. My hope is that reading this may help you understand some key terminology and the reasoning behind why certain econometric tools are employed.

OLS mechanics

Our primary interest when creating an econometric model is to estimate some dependent variable, y, using observations from a set of independent variables, X. Usually y is a vector of length n, where n is the number of observations, and X is a matrix of size (n x k), where k is the number of explanatory variables (you can think of X as a table of observations, where each column contains a different variable and each row represents an observation of that variable). The goal of OLS regression is to estimate the coefficients, beta, for the model:

y = X\beta+\epsilon

Where beta is a k by 1 vector of coefficients on X and epsilon is an n by 1 vector of error terms.

OLS regression estimates beta by minimizing the sum of the square error term (hence the name “least squares”). Put in matrix notation, OLS estimates beta using the equation:

\hat{\beta} = \text{argmin}_{\beta} \, SSE_N(\beta), \quad \text{where} \quad SSE_N(\beta) = \epsilon'\epsilon

The optimal beta estimate can be found through the following equations:

\epsilon = y-X\hat{\beta}

\epsilon ' \epsilon =  (y-X\hat{\beta})'(y-X\hat{\beta})

Taking the derivative and setting it equal to zero:

-2X'y+2X'X\hat{\beta} = 0

Then solving for the beta estimate:

\hat{\beta} = (X'X)^{-1}X'y
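
As a quick numerical check of this formula (a sketch with made-up data; in practice you would typically use a statistics package that also reports standard errors and diagnostics):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # include an intercept column
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# beta_hat = (X'X)^{-1} X'y, computed via a linear solve for numerical stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to [1.0, 2.0, -0.5]
```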

 

Estimation of y using OLS regression can be visualized as the orthogonal projection of the vector y onto the column space of X. The estimated error (residual) vector, \hat{\epsilon}, is the orthogonal component connecting the projection X\hat{\beta} to the observed vector y. Figure 1 shows this projection for a y that is regressed on two explanatory variables, X1 and X2.

projection

Figure 1: OLS regression as an orthogonal projection of vector y onto the column space of matrix X. The error term, \hat{\epsilon}, is the orthogonal distance between y and X\hat{\beta}. (Image source: Wikipedia commons)

Assumptions and properties of OLS regression

The Gauss-Markov Theorem states that under a certain set of assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) for vector y.

To understand the full meaning of the Gauss-Markov theorem, it’s important to define two fundamental properties that can be used to describe estimators: consistency and efficiency. An estimator is consistent if its value converges to the true parameter value as the number of observations goes to infinity. An estimator is efficient if its asymptotic variance is no larger than the asymptotic variance of any other consistent estimator of the parameter. In light of these definitions, the Gauss-Markov theorem can be restated as: as long as the classical OLS assumptions hold, the OLS estimator is the most efficient among all linear unbiased estimators of beta. The remainder of this post describes the assumptions necessary for the OLS estimator to be the BLUE; fixes for when these assumptions are violated are covered in the next post.

The four classical assumptions for OLS to be the BLUE are:

  1. Linearity: The relationship between X and y is linear, following the functional form:

y = X\beta+\epsilon.

2. Strict exogeneity: The error terms, \epsilon, should have a conditional mean of zero given the explanatory variables, X (i.e., knowing X tells us nothing about the expected value of the errors). Put in equation form, this assumption requires:

E(\epsilon_i|X) = 0

E(\epsilon_i) =0

3. No perfect multicollinearity: no column of X should be an exact linear combination of the other columns, i.e., X must have full column rank (see my earlier post on dealing with multicollinearity for fixes for violations of this assumption).

4. Spherical errors: Error terms should be homoskedastic, meaning they have the same variance for every observation, and they should be uncorrelated across observations. Put in equation form:

E(\epsilon_i^2|X) =\sigma^2

Where \sigma^2 is a constant value.

E(\epsilon_i \epsilon_j|X)=0 \quad \text{for } i \neq j

Using assumption 4, we can define the variance of \hat{\beta} as:

var(\hat{\beta}_{OLS}) = \sigma^2(X'X)^{-1}
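
Continuing in the same spirit, a self-contained sketch of this calculation is below; note that \sigma^2 is not observed, so it is estimated here from the residuals with a degrees-of-freedom correction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - k)      # unbiased estimate of the error variance
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # var(beta_hat) = sigma^2 (X'X)^{-1}
std_errors = np.sqrt(np.diag(var_beta_hat))
print(std_errors)
```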

If assumptions 1-4 hold, then the OLS estimate for beta is the BLUE. If, however, any of the assumptions are broken, we must employ other methods for estimating our regression coefficients.

In my next post I’ll detail the methods econometricians use when these assumptions are violated.

References:

Hansen, Bruce. Econometrics. University of Wisconsin, 2010 (Econometrics2010.pdf).
