Ensemble forecasting – theory

‘Does the flap of a butterfly’s wing in Brazil stir up a tornado in Texas?’ These words reflect the so-called ‘butterfly effect’ typically ascribed to Edward Lorenz (MIT), who was a pioneer in the field of numerical weather prediction (NWP). (Interestingly, Lorenz himself first referred to the ‘flap of a seagulls’ wing’; the butterfly swap out was the work of a creative conference session convener. Somewhat coincidentally, the shape of the ‘Lorenz attractor’ is also reminiscent of a butterfly, see Figure 1). Motivated by the very salient challenge in the post-WWII era to improve weather prediction capabilities for strategic purposes, he developed a theoretical framework for the predictability of non-linear dynamical systems in a seminal paper, ‘Deterministic nonperiodic flow’ (Lorenz, 1963), that would come to be known as ‘chaos theory’ or ‘chaotic dynamics’. This theory was foundational to the development of ensemble forecasting (and a whole host of other things), and still underpins our current theoretical understanding of the inherent predictability of complex dynamical systems like the atmosphere (Lorenz, 2006).


Figure 1. Book cover (left panel) and the characteristic ‘butterfly wing’ shape of the Lorenz attractor (right panel) from his seminal 1963 work. (Palmer, 1993)

In actuality, the whole ‘flap of a butterfly’s wing creating a tornado in Texas’ relates specifically to only one aspect of the theoretical framework developed by Lorenz. In this post, I will attempt to give a slightly fuller treatment of this theoretical framework and its development. My hope is that this post will be a theoretical primer for two follow-on practically oriented posts: 1) a discussion of the state-of-the-science of hydrometeorological ensemble forecasting and its application in emerging water resources management practices like forecast informed reservoir operations, and 2) a deeper dive into ensemble verification techniques. While the techniques for (2) are grounded in the theoretical statistical properties of stochastic-dynamic predictive ensembles, they have broad applications to any scenario where one needs to diagnose the performance of an ensemble.

The Lorenz model and ‘chaos’ theory

When most people hear the word ‘chaos’, they tend to think of the dictionary definition: ‘complete disorder and confusion’ (Oxford), which in a scientific context might aptly describe some sort of white noise process. As you will see, this is somewhat of a far cry from the ‘chaos’ described in Lorenz’s theory. The ‘chaos’ terminology was actually coined by a later paper building on Lorenz’s work (Li & Yorke, 1975) and as noted by Wilks (2019): ‘…this label is somewhat unfortunate in that it is not really descriptive of the sensitive-dependence phenomenon’. The sensitive-dependence phenomenon is one of a set of 4 key properties (to be discussed later) that characterize the behaviors of the sort of non-linear, non-periodic deterministic systems that Lorenz argued were most representative of the atmosphere. In contrast to ‘disorder’ and ‘confusion’, these properties, in fact, lend some sense of order to the evolution of these dynamical systems, although the nature of that order is highly dependent on the system state.

A deterministic, non-periodic systems model

First, let’s dive into a few details of Lorenz’s 1963 work, using some figures and descriptions from a later paper (Palmer, 1993) that are easier to follow. While the 1963 paper is quite dense, much of the mathematical logic is dedicated to justifying the use of a particular system of equations that forms the basis of the study. This system of 3 equations and 3 variables (X, Y, Z) describes a non-linear, dissipative model of convection in a fluid of uniform depth, where there is a temperature difference between the upper and lower surfaces. Lorenz derived the set of 3 equations shown in the upper left panel of Figure 2 from earlier work by Rayleigh (1916) on this particular problem. In short, X relates to the intensity of convective motion, Y relates to the temperature difference between ascending and descending currents, and Z relates to the distortion of the vertical temperature profile from linearity; the details of these variables are not actually all that important to the study. What is important is that this system of equations has no general solutions (aside from the steady state solution) and must be numerically integrated in the time dimension to determine the convective flow dynamics. The ‘trajectories’ of these integrations shown in the phase space diagrams in Figure 2 exhibit the sorts of unstable, non-periodic behaviors that Lorenz thought were the most useful analog to atmospheric dynamics, albeit in a much simpler system. (‘Much’ is an understatement here; modern NWP models have a phase space on the order of 109 in contrast to the 3-dimensional phase space of this problem, Wilks, 2019. Of note, the use of phase-space diagrams (i.e. plots where each axis corresponds to a dependent variable in the system of equations) preceded Lorenz, but his use of them is perhaps one of the best-known early instances of this kind of visualization. Other uses of phase space relationships can be found in Rohini’s post or Dave’s post)

Figure 2. a) Lorenz equations and the XY phase space diagram of their integrations. b-c) X variable timeseries of two separate integrations of the Lorenz equations from very similar initial conditions. (Palmer, 1993)

Regime structure and multiple distinct timescales

What ‘behaviors’ then, are we talking about? Referring again to Figure 2a, we see the projection of a long series of model integration trajectories onto the XY-plane of the phase space diagram, where each dot is a timestep in the integration. What is immediately apparent in the form of these integrations is the two lobed ‘butterfly’ shape of the diagram. Each lobe has a central attractor where the trajectories often reside for multiple revolutions, but can then transition to the other attractor when passing near the center of the diagram. These so-called ‘Lorenz attractors’ comprise one of the key properties of chaotic dynamics, namely regime structure, which is tendency of the trajectories to reside around phase space attractors for some period of time. This residence time in a regime is generally quite a bit longer than the timescales of the trajectories’ individual revolutions around an attractor. This attribute is referred to as multiple distinct timescales and is evidenced in Figure 2b-c, where smaller amplitude sections of the timeseries show residence in one or the other regime and large amplitude sections describe transitions between regimes. Often, but not always, the residence in these regimes occurs for multiple revolutions, suggesting that there are shorter timescale evolutions of the system that take place within these regimes, while infrequent, random shifts to the other regimes occur at longer timescales.

Sensitive-dependence and state-dependent variation in predictability

Figure 3. a-c) Different trajectories through the XY phase space dependent on the initial condition state-space (black circle). (Palmer, 1993)

Returning now to the ‘butterfly effect’; what, then, is this whole sensitive-dependence thing mentioned earlier? Figure 3a-c provide a nice graphical representation of this phenomenon. In each panel, a set of nearby initial states are chosen at different location in the phase space diagram and then are followed through their set of trajectories. In 3a, these trajectories neatly transition from one regime to the other, remaining relatively close to each other at the trajectories’ end. In 3b, a set of initial states not so far from 3a are chosen and instead of neatly transitioning to the other regime, they diverge strongly near the center of the diagram, with some trajectories remaining in the original regime, and some transitioning. However, for about half of the timespan, the trajectories remain very close to one another. Finally, an initial state chosen near the center of the diagram (3c) diverges quickly into both regimes, with trajectories ending up at nearly opposite ends of the phase space (black tails at upper right and left of diagram). Figures b-c, in particular, showcase the sensitive-dependence on initial conditions attributes of the system. In other words, from a set of very close-by initial states, the final trajectories in phase space can yield strongly divergent results. Importantly, this divergence in trajectories over some period of time can occur right away (3c), at some intermediate time (3b), or not at all (3a).

This is the basic idea behind the last core property of these systems, state-dependent variation in predictability. Clearly, a forecast initialized from the starting point in 3a could be a little bit uncertain about the exact starting state and still end up in about the right spot for an accurate future prediction at the end of the forecast horizon. At medium ranges, this is also the case for 3b, but the longer range forecast is highly uncertain. For 3c, all forecast ranges are highly uncertain; in other words, the flap of a butterfly’s wing can mean the difference between one or the other trajectory! Importantly, one can imagine in this representation that an average value of 3c’s trajectories (i.e. the ensemble mean) would fall somewhere in the middle of the phase space and be representative of none of the two physically plausible trajectories into the right or left regimes. This is an important point that we’ll return to at the end of this post.

From the Lorenz model to ensemble forecasting

The previous sections have highlighted this idealized dynamical system (Lorenz model) that theoretically has properties of the actual atmosphere. Those 4 properties (the big 4!) were: 1) sensitive-dependence on initial conditions, 2) regime structure, 3) multiple distinct timescales, and 4) state-dependent variation in predictability. In this last section, I will tie these concepts into a theoretical discussion of ensemble forecasting. Notably, in the previous sections, I discussed ‘timescales’ without any reference to the real atmosphere. The dynamics of the Lorenz system prove to be quite illuminating about the actual atmosphere when timescales are attached to the system that are roughly on the order of the evolution of synoptic scale weather systems. If one assumes that a lap around the Lorenz attractor equates to a synoptic timescale of ~5 days, then the Lorenz model can be conceptualized in terms of relatively homogenous synoptic weather progressions occurring within two distinct weather regimes that typically persist on the order of weeks to months (see Figure 3b-c). This theoretical foundation jives nicely with the empirically derived weather-regime based approach (see Rohini’s post or Nasser’s post) where incidentally, 4 or more weather regimes are commonly found (The real atmosphere is quite a bit more complex than the Lorenz model after all). This discussion of the Lorenz model has hopefully given you some intuition that conceptualizing real world weather progressions outside the context of an appropriate regime structure could lead to some very unphysical representations of the climate system.

Ensemble forecasting as a discrete approximation of forecast uncertainty

In terms of weather prediction, though, the big 4 make things really tough. While there are certainly conditions in the atmosphere that can lead to excellent long range predictability (i.e. ‘forecasts of opportunity’, see Figure 3a), the ‘typical’ dynamics of the atmosphere yield the potential for multiple regimes and associated transitions within the short-to-medium range timeframe (1-15 days) where synoptic-scale hydro-meteorological forecasting is theoretically possible. (Note, by ‘synoptic-scale’ here, I am referring to the capability to predict actual weather events, not prevailing conditions. Current science puts the theoretical dynamical limit to predictability at ~2 weeks with current NWP technology achieving ‘usable’ skill out to ~7 days, give or take a few dependent on the region)

Early efforts to bring the Lorenz model into weather prediction sought to develop analytical methods to propagate initial condition uncertainty into a useful probabilistic approximation of the forecast uncertainty at various lead times. This can work for simplified representation of the atmosphere like the Lorenz model, but quickly becomes intractable as the complexity and scale of the governing equations and boundary conditions increases.

Figure 4. Conceptual depiction of ensemble forecasting including sampling of initial condition uncertainty, forecasts at different lead times, regime structure, and ensemble mean comparison to individual ensemble members. Solid arrow represents a ‘control’ member of the ensemble. Histograms represent an approximate empirical distribution of the ensemble forecast. (Wilks, 2019)

Thankfully, rapid advances in computing power led to an alternate approach, ensemble forecasting! Ensemble forecasting is a stochastic-dynamic approach that couples a discrete, probabilistic sampling of the initial condition uncertainty (Figure 4 ‘Initial time’) and propagates each of those initial condition state vectors through a dynamical NWP model. Each of these NWP integrations is a unique trajectory through state space of the dynamical model that yields a discrete approximation of the forecast uncertainty (Figure 4 ‘Intermediate/Final forecast lead time’). This discrete forecast uncertainty distribution theoretically encompasses the full space of potential hydro-meteorologic trajectories and allows a probabilistic representation of forecast outcomes through analysis of the empirical forecast ensemble distribution. These forecast trajectories highlight many of the big 4 properties discussed in previous sections, including regime structure and state-dependent predictability (Figure 4 trajectories are analogous to the Figure 3b trajectories for the Lorenz model). The ensemble mean prediction is an accurate and dynamically consistent prediction at the intermediate lead time, but at the final lead time where distinct regime partitioning has occurred, it is no longer dynamically consistent and delineates a region of low probability in the full ensemble distribution. I will explore properties of ensemble averaging, both good and bad, in a future post.

Lastly, I will note that the ensemble forecasting approach is a type of Monte Carlo procedure. Like other Monte Carlo approaches with complex systems, the methodology for sampling the initial condition uncertainty has a profound effect on the uncertainty quantification contained within the final NWP ensemble output, especially when considering the high degree of spatiotemporal relationships within the observed climatic variables that form the initial state vector. This is a key area of continued research and improvement in ensemble forecasting models.

Final thoughts

I hope that you find this discussion of the theoretical underpinnings of chaotic dynamics and ensemble forecasting to be useful. I have always found these foundational concepts to be particularly fascinating. Moreover, the basic theory has strong connections outside of ensemble forecasting, including the ties to weather regime theory mentioned in this post. I also think these foundational concepts are necessary to understand how much actual ensemble forecasting techniques can diverge from the theoretical stochastic-dynamic framework. This will be the subject of some future posts that will delve into more practical and applied aspects of ensemble forecasting with a focus on water resources management.

Reference

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130–141.

Lorenz, E. N. (2006). Reflections on the Conception , Birth , and Childhood of Numerical Weather Prediction. Annual Review of Earth and Planetary Science, 34, 37–45. https://doi.org/10.1146/annurev.earth.34.083105.102317

Palmer, T. N. (1993). Extended-range atmospheric prediction and the Lorenz model. Bulletin – American Meteorological Society, 74(1), 49–65. https://doi.org/10.1175/1520-0477(1993)074<0049:ERAPAT>2.0.CO;2

Rayleigh, L. (1916). On convective currents in a horizontal layer when the higher temperature is on the underside. Phil. Mag., 32, 529-546.

Wilks, D. S., (2019). Statistical Methods in the Atmospheric Sciences, 4th ed. Cambridge, MA: Elsevier.

Root finding algorithm basics

Motivation

Finding the zero(s), or “root(s)”, of an equation is often necessary in engineering or mathematical problems. Fortunately for us, there are a suite of different pre-packaged root finding algorithms available for different programming languages. Julie Quinn wrote a blog post in 2016 summarizing some of the root finding tools available in MATLAB, R, Python, and C++ which can be found here.

Although these tools make the process of finding the root of an equation quick and easy, it is important for users to have some basic knowledge regarding the numerical methods being used.

In this post, I will provide background on the fundamentals of some common root finding algorithms, point out the strengths and weaknesses of the different algorithms, and highlight some points that you might want to take into consideration when using these algorithms. Ultimately, the goal here is to encourage more informed use of well-established tools.

The Bisection Method

The bisection method is a simple bracketing method. The start and end points of a range are specified, [a, b], and if the signs of the value of the function are opposite (i.e., sign(f(a)) ≠ sign(f(b))), then the intermediate value theorem guarantees the existence of a root within that range. The range is sequentially split in half, or bisected, and the sign of the midpoint function value determines which half of the range is used in the subsequent iteration.

The algorithm generally follows:

The simplest version of this algorithm, implemented in Python follows:

import numpy as np

def Bisect(Function, range_start, range_end, maxNFE = 1000, tolerance = 10**-6):
 
    A = range_start
    B = range_end
    
    # Check that range bounds have alternate signs
    if np.sign(Function(A)) == np.sign(Function(B)):
        print("Start and end values for the range must correspond to function values of different sign.")
        
    RootRange = [A, B]
    
    # Perform Bisection method until tolerance is achieved
    NFE = 0
    spread = RootRange[1] - RootRange[0]
        
    while (spread > tolerance) and (NFE < maxNFE):
        
        MidPoint = (RootRange[1] + RootRange[0])/2
            
        if Fluxes(MidPoint == 0):
            Root = MidPoint
            NFE += 1
            return Root, NFE
            
        elif np.sign(Function(MidPoint)) != np.sign(Function(RootRange[1])):
            RootRange[0] = MidPoint
                
        else:
            RootRange[1] = MidPoint
            
        NFE += 1        
        spread = RootRange[1] - RootRange[0]
        
    Root = (RootRange[1] + RootRange[0])/2
    
    return Root, NFE

The Newton-Raphson Method

The Newton-Raphson method is an iterative root finding approach which uses information about the value and derivative of the function to produce subsequently better approximations of the root. The user must provide some initial guess of the root. If the value of the function inadequately far from zero, then the slope of the function at that point is used to determine the next estimate value.

The mathematic representation of the algorithm follows:

The use of the derivative makes this algorithm much quicker than the bisection method. The analytical derivative can either be provided to the algorithm, or an estimation can be used. Here, I use use a simple estimation of the derivative by calculating the slope of the function across some very small step size.

def df(Function, x , h = 10**-6):
    
    df = (Function(x+h) - Function(x))/h
    
    return df

With the derivative estimation function ready, I can implement the Newton-Raphson method:

def NewtonRaphson(Function, x0, maxNFE = 1000, tol = 10**-10):
    
    NFE = 0

    while abs(Function(x0)) > tol and NFE < maxNFE:
        NFE += 1
        derivative = df(Function, x0, h = 10**-6)
        
        x0 = x0 - Function(x0)/derivative
        
    return x0, NFE

When the Newton-Raphson method does converge to a solution, it converges at a quadratic rate, which is much quicker than the linear convergence of the bisection method. When choosing to implement a root finding technique, it is important to have some understand of that technique’s unique strengths and weakness and choose an algorithm accordingly. For example, while the Newton-Raphson method is quicker than the bisection method, it does not provide guaranteed convergence to the intended root in the way that the bisection method does.

An example in custom root finding algorithms

Here, I would like to provide an example of how to implement a custom root finding algorithm, to demonstrate some challenges that can occur when using generic root finding algorithms, and how custom algorithms can help avoid potential pitfalls.

The shallow lake problem [1, 2], which has been used as an example in many other posts on this site, is a great example of a dynamic environmental system with multiple equilibria of interest. The problem centers around a shallow lake which receives both natural, Yt~LN(μ,σ2), and anthropogenic, at, phosphorous (P) pollution inputs, along with a natural recycling of the P back into the water column characterized by the shape parameter q. The lake experiences some natural P removal at rate b. The combination of these inputs and outputs results in the presence of three different equilibrium states, where the inputs are equal to the outputs. The figure below (Quinn, 2016) helps to illustrate this behavior.

Figure generated by Prof. Julie Quinn (2016).

Notice that of the three equilibrium states, two are stable and one is unstable. The first stable equilibrium, the oligotrophic equilibrium, is the preferred state and will result in healthy ecosystem conditions. The unstable equilibrium presents a dangerous tipping point, referred to as the critical P threshold, beyond which the lake will irreversible transition into an unhealthy, stable eutrophic equilibrium.

When considering pollution control policies, it is critical that the analyst identify the unstable, critical P threshold so that it can be avoided!

For the purposes of this example, I will consider the lake dynamics under the influence of only the natural removal and recycling, and make assumptions regarding these parameter values (b = 0.42, q = 2.0). The net flux of P experienced by the lake can be defined by the function:

def Fluxes(x):
    # Set parameters
    b = 0.42
    q = 2
    flux = (x**q) / (1 + x**q) - b*x
    return flux

Let’s use the Newton-Raphson root solver defined above to try and find the critical P threshold. In the case that I do not have prior knowledge of the critical P threshold, it seems reasonable to begin the Newton-Raphson algorithm with an arbitrary initial guess of 0.25.

Root, NFE = NewtonRaphson(Fluxes, x0 = 0.25)

From the above, I find that:
Root = 1.836

Just out of curiosity, I would like to test my Newton-Raphson function with a different initial estimate… if I place my initial estimate closer to the previously found root, then I should find the same root again, right?

Root, NFE = NewtonRaphson(Fluxes, 1.0)

Root = 0.0

In fact, the Newton-Raphson approach can result in some very unexpected convergence behavior, and might result in the convergence to a wildly different root if not used carefully.

For the sake of completeness, I will use the method once more with an initial guess between my first two:

Root, NFE = NewtonRaphson(Fluxes, 0.75)

Root = 0.5445

Finally, I find the unstable tipping point that I have been looking for. However, this process has revealed how unpredictable the convergence of this method can be.

The figure below shows which equilibria was found for different initial guesses of the root. The color of the horizontal line shows which of the three equilibria will be found given that value as the initial estimate.

This is where one might recognize the need to construct a unique algorithm better suited for this analysis. With regard to the lake problem, I want to be certain that I am identifying the correct critical P threshold. To do this, I will design an algorithm which is capable of finding all roots of the flux equation as well as identify the stability condition of each found equilibria.

The bisection method is preferable for this purpose because the bracketing method will avoid the unpredictable convergence patterns of the Newton-Raphson method, and can guarantee that I find every root within the specified range.

I’ve created a new algorithm, building off of the basic Bisect() function included above, which take the entire range of interest and first discretizes the range into a set of subspaces. It then checks the signs of the function value at the boundary of each subspace to identify those subspaces which have opposite boundary value signs, and thus identifies each subspace containing a zero.

After identify the number and subspaces containing zeros, the algorithm performs the traditional bisection method as shown above. Lastly, I’ve included the evaluation of stability conditions once each root has been found within the specified tolerance.

def ManyRootBisect(Function, range_start, range_end, step_size, maxNFE = 1000, tolerance = 10**-6):
 
    A = range_start
    B = range_end
    
    # Check that range bounds have alternate signs
    if np.sign(Function(A)) == np.sign(Function(B)):
        print("Start and end values for the range must correspond to function values of different sign.")
        
    # Discretize the range space
    n = np.floor((B-A)/step_size)
    nodes = np.linspace(A, B, int(n), endpoint = True)
    
    # Count sign-changes (roots) and record ranges where roots exist
    nRoots = 0
    RootRange = []
    
    for i in list(range(len(nodes)-1)):
        
        if np.sign(Function(nodes[i])) != np.sign(Function(nodes[i+1])):
            nRoots += 1
            AddRange = [nodes[i], nodes[i+1]]
            RootRange.append(AddRange)
    
    # Perform Bisection method on each range with a root
    
    Roots = []
    NFE = 0

    for r in list(range(nRoots)):
        
        spread = RootRange[r][1] - RootRange[r][0]
        
        while (spread > tolerance) and (NFE < maxNFE):
        
            MidPoint = (RootRange[r][1] + RootRange[r][0])/2
            
            if Fluxes(MidPoint == 0):
                
                RootRange[r][0] = MidPoint
                RootRange[r][1] = MidPoint
            
            elif np.sign(Function(MidPoint)) != np.sign(Function(RootRange[r][1])):
                RootRange[r][0] = MidPoint
                
            else:
                RootRange[r][1] = MidPoint
                
            NFE += 1
            spread = RootRange[r][1] - RootRange[r][0]
    
    for i in list(range(nRoots)):
        
        Root = (RootRange[i][1] + RootRange[i][0])/2
        
        Roots.append(Root)
    
    # Check the stability of each root
    Stability = []
    for i in list(range(nRoots)):
        
        if ((Function(Roots[i] + tolerance)) < 0):
            Stability.append("Stable")
        else:
            Stability.append("Unstable")
        
    return Roots, Stability, nRoots, NFE

The above function is much more robust, and is capable of identify each of the three roots and their stability conditions.

Roots = [0.0, 0.5445, 1.8384]
Stability = [‘Stable’, ‘Unstable’, ‘Stable’]

The goal of this blog post was not to argue that any particular root finding algorithm is superior to any other. Rather, I hoped to encourage more informed use of these common tools.

P.s. For a very interesting and tangentially related exploration of how the Newton-Raphson method can produce fractal geometries, check out the video by 3Blue1Brown here.

References

[1] Carpenter, Stephen R., Donald Ludwig, and William A. Brock. “Management of eutrophication for lakes subject to potentially irreversible change.” Ecological applications 9.3 (1999): 751-771.

[2] Quinn, Julianne D., Patrick M. Reed, and Klaus Keller. “Direct policy search for robust multi-objective management of deeply uncertain socio-ecological tipping points.” Environmental Modelling & Software 92 (2017): 125-141.