Root finding algorithm basics

Motivation

Finding the zero(s), or “root(s)”, of an equation is often necessary in engineering or mathematical problems. Fortunately for us, there are a suite of different pre-packaged root finding algorithms available for different programming languages. Julie Quinn wrote a blog post in 2016 summarizing some of the root finding tools available in MATLAB, R, Python, and C++ which can be found here.

Although these tools make the process of finding the root of an equation quick and easy, it is important for users to have some basic knowledge regarding the numerical methods being used.

In this post, I will provide background on the fundamentals of some common root finding algorithms, point out the strengths and weaknesses of the different algorithms, and highlight some points that you might want to take into consideration when using these algorithms. Ultimately, the goal here is to encourage more informed use of well-established tools.

The Bisection Method

The bisection method is a simple bracketing method. The start and end points of a range are specified, [a, b], and if the signs of the value of the function are opposite (i.e., sign(f(a)) ≠ sign(f(b))), then the intermediate value theorem guarantees the existence of a root within that range. The range is sequentially split in half, or bisected, and the sign of the midpoint function value determines which half of the range is used in the subsequent iteration.

The algorithm generally follows:

The simplest version of this algorithm, implemented in Python follows:

import numpy as np

def Bisect(Function, range_start, range_end, maxNFE = 1000, tolerance = 10**-6):
 
    A = range_start
    B = range_end
    
    # Check that range bounds have alternate signs
    if np.sign(Function(A)) == np.sign(Function(B)):
        print("Start and end values for the range must correspond to function values of different sign.")
        
    RootRange = [A, B]
    
    # Perform Bisection method until tolerance is achieved
    NFE = 0
    spread = RootRange[1] - RootRange[0]
        
    while (spread > tolerance) and (NFE < maxNFE):
        
        MidPoint = (RootRange[1] + RootRange[0])/2
            
        if Fluxes(MidPoint == 0):
            Root = MidPoint
            NFE += 1
            return Root, NFE
            
        elif np.sign(Function(MidPoint)) != np.sign(Function(RootRange[1])):
            RootRange[0] = MidPoint
                
        else:
            RootRange[1] = MidPoint
            
        NFE += 1        
        spread = RootRange[1] - RootRange[0]
        
    Root = (RootRange[1] + RootRange[0])/2
    
    return Root, NFE

The Newton-Raphson Method

The Newton-Raphson method is an iterative root finding approach which uses information about the value and derivative of the function to produce subsequently better approximations of the root. The user must provide some initial guess of the root. If the value of the function inadequately far from zero, then the slope of the function at that point is used to determine the next estimate value.

The mathematic representation of the algorithm follows:

The use of the derivative makes this algorithm much quicker than the bisection method. The analytical derivative can either be provided to the algorithm, or an estimation can be used. Here, I use use a simple estimation of the derivative by calculating the slope of the function across some very small step size.

def df(Function, x , h = 10**-6):
    
    df = (Function(x+h) - Function(x))/h
    
    return df

With the derivative estimation function ready, I can implement the Newton-Raphson method:

def NewtonRaphson(Function, x0, maxNFE = 1000, tol = 10**-10):
    
    NFE = 0

    while abs(Function(x0)) > tol and NFE < maxNFE:
        NFE += 1
        derivative = df(Function, x0, h = 10**-6)
        
        x0 = x0 - Function(x0)/derivative
        
    return x0, NFE

When the Newton-Raphson method does converge to a solution, it converges at a quadratic rate, which is much quicker than the linear convergence of the bisection method. When choosing to implement a root finding technique, it is important to have some understand of that technique’s unique strengths and weakness and choose an algorithm accordingly. For example, while the Newton-Raphson method is quicker than the bisection method, it does not provide guaranteed convergence to the intended root in the way that the bisection method does.

An example in custom root finding algorithms

Here, I would like to provide an example of how to implement a custom root finding algorithm, to demonstrate some challenges that can occur when using generic root finding algorithms, and how custom algorithms can help avoid potential pitfalls.

The shallow lake problem [1, 2], which has been used as an example in many other posts on this site, is a great example of a dynamic environmental system with multiple equilibria of interest. The problem centers around a shallow lake which receives both natural, Yt~LN(μ,σ2), and anthropogenic, at, phosphorous (P) pollution inputs, along with a natural recycling of the P back into the water column characterized by the shape parameter q. The lake experiences some natural P removal at rate b. The combination of these inputs and outputs results in the presence of three different equilibrium states, where the inputs are equal to the outputs. The figure below (Quinn, 2016) helps to illustrate this behavior.

Figure generated by Prof. Julie Quinn (2016).

Notice that of the three equilibrium states, two are stable and one is unstable. The first stable equilibrium, the oligotrophic equilibrium, is the preferred state and will result in healthy ecosystem conditions. The unstable equilibrium presents a dangerous tipping point, referred to as the critical P threshold, beyond which the lake will irreversible transition into an unhealthy, stable eutrophic equilibrium.

When considering pollution control policies, it is critical that the analyst identify the unstable, critical P threshold so that it can be avoided!

For the purposes of this example, I will consider the lake dynamics under the influence of only the natural removal and recycling, and make assumptions regarding these parameter values (b = 0.42, q = 2.0). The net flux of P experienced by the lake can be defined by the function:

def Fluxes(x):
    # Set parameters
    b = 0.42
    q = 2
    flux = (x**q) / (1 + x**q) - b*x
    return flux

Let’s use the Newton-Raphson root solver defined above to try and find the critical P threshold. In the case that I do not have prior knowledge of the critical P threshold, it seems reasonable to begin the Newton-Raphson algorithm with an arbitrary initial guess of 0.25.

Root, NFE = NewtonRaphson(Fluxes, x0 = 0.25)

From the above, I find that:
Root = 1.836

Just out of curiosity, I would like to test my Newton-Raphson function with a different initial estimate… if I place my initial estimate closer to the previously found root, then I should find the same root again, right?

Root, NFE = NewtonRaphson(Fluxes, 1.0)

Root = 0.0

In fact, the Newton-Raphson approach can result in some very unexpected convergence behavior, and might result in the convergence to a wildly different root if not used carefully.

For the sake of completeness, I will use the method once more with an initial guess between my first two:

Root, NFE = NewtonRaphson(Fluxes, 0.75)

Root = 0.5445

Finally, I find the unstable tipping point that I have been looking for. However, this process has revealed how unpredictable the convergence of this method can be.

The figure below shows which equilibria was found for different initial guesses of the root. The color of the horizontal line shows which of the three equilibria will be found given that value as the initial estimate.

This is where one might recognize the need to construct a unique algorithm better suited for this analysis. With regard to the lake problem, I want to be certain that I am identifying the correct critical P threshold. To do this, I will design an algorithm which is capable of finding all roots of the flux equation as well as identify the stability condition of each found equilibria.

The bisection method is preferable for this purpose because the bracketing method will avoid the unpredictable convergence patterns of the Newton-Raphson method, and can guarantee that I find every root within the specified range.

I’ve created a new algorithm, building off of the basic Bisect() function included above, which take the entire range of interest and first discretizes the range into a set of subspaces. It then checks the signs of the function value at the boundary of each subspace to identify those subspaces which have opposite boundary value signs, and thus identifies each subspace containing a zero.

After identify the number and subspaces containing zeros, the algorithm performs the traditional bisection method as shown above. Lastly, I’ve included the evaluation of stability conditions once each root has been found within the specified tolerance.

def ManyRootBisect(Function, range_start, range_end, step_size, maxNFE = 1000, tolerance = 10**-6):
 
    A = range_start
    B = range_end
    
    # Check that range bounds have alternate signs
    if np.sign(Function(A)) == np.sign(Function(B)):
        print("Start and end values for the range must correspond to function values of different sign.")
        
    # Discretize the range space
    n = np.floor((B-A)/step_size)
    nodes = np.linspace(A, B, int(n), endpoint = True)
    
    # Count sign-changes (roots) and record ranges where roots exist
    nRoots = 0
    RootRange = []
    
    for i in list(range(len(nodes)-1)):
        
        if np.sign(Function(nodes[i])) != np.sign(Function(nodes[i+1])):
            nRoots += 1
            AddRange = [nodes[i], nodes[i+1]]
            RootRange.append(AddRange)
    
    # Perform Bisection method on each range with a root
    
    Roots = []
    NFE = 0

    for r in list(range(nRoots)):
        
        spread = RootRange[r][1] - RootRange[r][0]
        
        while (spread > tolerance) and (NFE < maxNFE):
        
            MidPoint = (RootRange[r][1] + RootRange[r][0])/2
            
            if Fluxes(MidPoint == 0):
                
                RootRange[r][0] = MidPoint
                RootRange[r][1] = MidPoint
            
            elif np.sign(Function(MidPoint)) != np.sign(Function(RootRange[r][1])):
                RootRange[r][0] = MidPoint
                
            else:
                RootRange[r][1] = MidPoint
                
            NFE += 1
            spread = RootRange[r][1] - RootRange[r][0]
    
    for i in list(range(nRoots)):
        
        Root = (RootRange[i][1] + RootRange[i][0])/2
        
        Roots.append(Root)
    
    # Check the stability of each root
    Stability = []
    for i in list(range(nRoots)):
        
        if ((Function(Roots[i] + tolerance)) < 0):
            Stability.append("Stable")
        else:
            Stability.append("Unstable")
        
    return Roots, Stability, nRoots, NFE

The above function is much more robust, and is capable of identify each of the three roots and their stability conditions.

Roots = [0.0, 0.5445, 1.8384]
Stability = [‘Stable’, ‘Unstable’, ‘Stable’]

The goal of this blog post was not to argue that any particular root finding algorithm is superior to any other. Rather, I hoped to encourage more informed use of these common tools.

P.s. For a very interesting and tangentially related exploration of how the Newton-Raphson method can produce fractal geometries, check out the video by 3Blue1Brown here.

References

[1] Carpenter, Stephen R., Donald Ludwig, and William A. Brock. “Management of eutrophication for lakes subject to potentially irreversible change.” Ecological applications 9.3 (1999): 751-771.

[2] Quinn, Julianne D., Patrick M. Reed, and Klaus Keller. “Direct policy search for robust multi-objective management of deeply uncertain socio-ecological tipping points.” Environmental Modelling & Software 92 (2017): 125-141.

Milton Friedman’s thermostat and sensitivity analysis of control policies

If you’re reading this blog, you probably already know that correlation does not necessarily imply causation. However, does a lack of correlation necessarily imply a lack of causation? Despite widespread misconception, even by Nobel Laureates, it does not. This is especially true for controlled systems, as explained in Milton Friedman’s thermostat example. The act of controlling a system for stability (e.g., a thermostat that expends energy to stabilize indoor air temperature) will tend to remove correlations between variables that are, in a certain sense, causally related (e.g., between energy expenditure and indoor temperature). More generally, the control of a dynamical system can introduce complex, path-dependent behavior that complicates the analysis of observed data using standard statistical techniques. This blog post will introduce these issues in the context of reservoir operations. I will then connect this to recent developments in using sensitivity analysis to improve understanding of how complex control policies respond to changing information over time, and demonstrate the benefits of using simulated state trajectories rather than random sampling approaches when analyzing controlled systems.

A Jupyter Notebook containing the Python code used for this analysis can be found in this GitHub repository.

Reservoir storage stabilization policy

First, consider a reservoir with a capacity of 20 MG, and daily inflows that average 10 MGD, with moderate persistence. Let’s assume the reservoir operator follows a “storage stabilization” policy that attempts to keep the storage constant at 10 MG. This simply requires the operator to release (S – S’ + I*) each day, where S is the current storage, S’ is the storage target, and I* is the projected inflow over the course of the next time step. How will this system evolve over time? With a perfect forecast, we have a situation analogous to Milton Friedman’s thermostat: the control policy keeps storage constant over time, leading it to become uncorrelated from both inflows and releases. Meanwhile, because storage is unvarying, the release depends only on inflows in a perfect linear relationship.

Figure 1: Time series of state variables for storage stabilization policy with perfect forecast.
Figure 2: Pairwise relationships between state variables for storage stabilization policy with perfect forecast.

The fact that inflow is uncorrelated with storage may appear paradoxical at first, but it makes sense once we understand the storage variable contributes no useful information to the system, since it is unchanging over time.

With imperfect forecasts, the operator sometimes needs to increase the release to avoid overtopping the reservoir, or decrease the release to avoid negative storage. This adds noise to the relationships above, but the reservoir releases are still minimally correlated to storages.

Figure 3: Pairwise relationships between state variables for storage stabilization policy with imperfect forecasts.

Power production policy

The reservoir storage stabilization policy and Milton Friedman’s thermostat example demonstrate just one particular aspect of a much broader point: the act of controlling a dynamical system will fundamentally alter the trajectories of the system through state-space. This has important implications for how observed data from these systems should be understood.

But first, consider an alternative reservoir control policy that attempts to take advantage of power price information when making releases for the purpose of hydropower production. The power price has a mean of 10 and lower persistance than the inflow distribution. We assume the following simple set of rules: (1) when price is below 8, release S/4 MG; (2) when price is between 8 and 12, release S/2; (3) when price is above 12, release (S + I*/2). Again, we also adjust the release to avoid overtopping the reservoir or incurring negative storages. Despite the simplicity of this policy, it produces surprisingly complex dynamics.

Figure 4: Time series of state variables for power production policy with perfect forecast.
Figure 5: Pairwise relationships between state variables for power production policy with imperfect forecasts.

In the bottom row of Figure 5, we see interesting and highly non-linear relationships driving the release policy as a function of the projected inflow, storage, and power price. Additionally, the relationships between the state variables themselves (i.e., power price and storage) can display non-linear and thresholding-type behavior. Comparing the storage stabilization policy and the power production policy underscores the importance of the control process in dictating the dynamics of the system and the regions of state-space that are explored.

Figure 6: Comparison of state-space inhabited by storage stabilization policy and power production policy.

Implications for sensitivity analysis of control policies

So what does any of this have to do with sensitivity analysis? Recent advances in adaptive control (e.g. Direct Policy Search, Reinforcement Learning) have allowed for the development of complex operating policies for managing water resource systems under uncertainty by adapting to evolving conditions over time (see recent reviews by Giuliani et al., 2021, and Herman et al., 2020). These operating policies can be built from Artificial Neural Networks, Radial Basis Functions, Decision Trees, or other functional forms, and can exhibit highly non-linear behavior. This complicates efforts to understand how they work “under the hood,” leading to skepticism from some researchers and decision-makers who prefer simpler and more transparent operating policies. Recent work has attempted to “open the black box” by combining sensitivity analysis and visualization in order to illuminate how different operating policies monitor and respond to different sources of information (e.g., Quinn et al., 2019; Hamilton et al., 2022).

In most applications of global sensitivity analysis, researchers want to understand the impacts of uncertain model parameters (e.g., soil permeability) or inputs (e.g., precipitation) on model outputs (e.g., runoff) (see Pianosi et al., 2016, for a review of sensitivity analysis in environmental models). Importantly, in general the inputs to the sensitivity analysis are considered to be exogenous factors and forcings that do not respond to the dynamics of the rest of the system model. For this reason, (quasi)random sampling approaches are often used to generate samples from across the feasible range of each parameter.

However, when sensitivity analysis is applied to the problem of better understanding adaptive control policies (e.g., reservoir releases), then the policy inputs may be either exogenous (e.g., inflows, power prices) or endogenous (e.g., storage). The latter, as we have seen, can cause highly non-linear and path-dependent dynamics within the system, resulting in simulated trajectories that are not uniformly distributed across state space. Any non-uniformity in the distribution of states is potentially valuable “information” about the dynamics of the control policy and the broader system, which we would like to capture with our sensitivity analysis. For this reason, it is preferable to use simulated system trajectories as inputs to the sensitivity analysis, rather than randomly generated inputs, in order to ensure that our sensitivity analysis reflects the actual data generating process of the system.

Figure 7: Comparison of state-space inhabited by storage stabilization policy, using randomly generated input data vs. simulated state trajectories.
Figure 8: Comparison of state-space inhabited by power production policy, using randomly generated input data vs. simulated state trajectories.

Due to the non-linear, non-independent, non-normally distributed nature of these simulated data, many common variance-based global sensitivity analysis methods may not be appropriate. However, moment-independent methods such as information theoretic sensitivity analysis and the delta-moment independent method can help overcome some of these challenges. See Hamilton et al., 2022, Hadjimichael et al., 2020, and references therein, as well as this blog post by Keyvan Malek, for more discussion of these issues.

For the sake of simplicity, I use R-squared as a simple indicator of variance-based sensitivity rather than more sophisticated measures such as Sobol sensitivity. I also apply a discretized version of information theoretic sensitivity analysis (ITSA). Both indices range between 0 and 1, but they cannot be directly compared to each other (e.g., R2 is often higher than ITSA given the same data) . Tables 1 and 2 show the sensitivity indices for the two operating policies.

ITSA, simR2, simITSI, unifR2, unif
Projected inflow0.520.950.260.69
Storage0.060.040.140.29
Power price0.050.020.070.00
Table 1: Comparison of sensitivity indicies for policy inputs governing reservoir releases, under storage stabilization policy
ITSA_simR2_simITSA_unifR2_unif
Projected inflow0.290.480.140.30
Storage0.410.450.170.31
Power price0.090.010.120.21
Table 2: Comparison of sensitivity indicies for policy inputs governing reservoir releases, under power production policy

The most obvious contrast is between the uniformly sampled data and the simulated data; the former tends to estimate significantly lower sensitivity. This is due to the missing information on the path-dependent relationships between states within the system, as a result of using randomly sampled input data rather than actual simulated trajectories. Another interesting comparison is ITSA_sim vs R2_sim for the power production policy. While R2 classifies projected inflow and storage as having roughly equivalent influence on release decisions, ITSA finds substantially more influence from storage. This finding makes sense when comparing the two relationships in Figure 5; both have similar shapes and variances if you squint, but the storage-release relationship has more well-defined fine structure that cannot be discerned by variance-based approaches.

References

Giuliani, M., Lamontagne, J. R., Reed, P. M., & A. Castelletti. (2021). A State-of-the-Art Review of Optimal Reservoir Control for Managing Conflicting Demands in a Changing World. Water Resources Research, 57, e2021WR029927. https://doi.org/10.1029/2021WR029927

Hadjimichael, A., Quinn, J., & P. Reed. (2020). Advancing diagnostic model evaluation to better understand water shortage mechanisms in institutionally complex river basins. Water Resources Research, 56, e2020WR028079. https://doi.org/10.1029/2020WR028079

Hamilton, A. L., Characklis, G. W., & P. M. Reed. (2022). From stream flows to cash flows: Leveraging Evolutionary Multi-Objective Direct Policy Search to manage hydrologic financial risks. Water Resources Research, 58, e2021WR029747. https://doi.org/10.1029/2021WR029747

Herman, J. D., Quinn, J. D., Steinschneider, S., Giuliani, M., & S. Fletcher (2020). Climate adaptation as a control problem: Review and perspectives on dynamic water resources planning under uncertainty. Water Resources Research, 56, e24389. https://doi.org/10.1029/2019WR025502

Pianosi, F., Beven, K., Freer, J., Hall, J. W., Rougier, J., Stephenson, D. B., & T. Wagener. (2016). Sensitivity analysis of environmental models: A systematic review with practical workflow. Environmental Modelling & Software, 79, 214-232. http://dx.doi.org/10.1016/j.envsoft.2016.02.008

Quinn, J. D., Reed, P. M., Giuliani, M., & A. Castelletti (2019). What is controlling our control rules? Opening the black box of multireservoir operating policies using time-varying sensitivity analysis. Water Resources Research, 55, 5962–5984. https://doi.org/10.1029/2018WR024177

Understanding Information Usage in Machine Learning Models

Deep learning models still largely remain black box concepts, where users have little understanding of how and when input variables are being used for prediction. In some applications, simpler and more interpretable models may be appropriate, but there are instances where deep learning models are simply more accurate. The question is, does one have to sacrifice accuracy for interpretability? Ideally, no. If you truly understand how a model works, this puts you in an optimal position to determine what changes need to be made to make it more accurate. Techniques for better understanding the inner workings of more complex ML models likely will lead to the acceptance of these models in process-driven fields like hydrology. We should aspire to move away from the practice of blindly tuning hyperparameters to get good validation results, and rather to focus more about what these parameters may physically represent (which is certainly not always easy or applicable, otherwise it would be done more often). The goal of this blog post is to outline some concepts within the computer science community that can help users understand information usage in their ML models that can be applied for hydrology applications.

An example of a complex ML model

The current state of the art for recurrent-style neural networks is a Long Short-Term Memory (LSTM) network which has become increasingly popular in hydrology applications. These RNN style networks contain a cell state that has the ability of learn long-term dependencies as well as gates to regulate the flow of information in and out the cell.

A single LSTM cell and its components (Colah, 2020)

The top horizontal line denotes the cell state, Ct ,which is continually being updated, either removing irrelevant information from subsequent time periods or incorporating new recent information. The first gate of the cell state is denoted as the forget gate, or ft. The forget gate takes the previous hidden state ht and the current input xt and decides what to forget from the previous time step’s cell state Ct-1 through the application of a sigmoid function, which effectively can set values in the cell state to a value between 0 and 1 (i.e. representing completely forget to completely retain). Next the input gate, it ,decides which values to update using a sigmoid function. A tanh layer creates new cell state candidate, C~t ,which are multiplied by the values from the input gate to create an update. The cell state can then be updated first by forgetting the prescribed values at the forget gate, and then adding the scaled values, it* C~t. Finally the output gate is used to describe what will be output to the next LSTM unit, which is represented by ht  at the bottom of the unit. The input xt is run through a sigmoidal function. The cell state is pushed through a tanh function from above and then multiplied with the transformed input to decide what the hidden state will be that is relayed to the next unit. The LSTM layer can be made up of hundreds of these cells and multiple layers can be used. Most importantly, the LSTM allows for the preservation and memory of past inputs and outputs, which can help expedite training in a system with slow changing dynamics. 

Clearly an LSTM is one of the more complex deep learning models. However, it has shown great success in hydrology applications, particularly where it’s important to capture physical processes that occur at different time scales. For example, Kratzert et al. (2018) demonstrate how an LSTM network used for rainfall-runoff across 241 catchments is able to perform comparably to SAC-SMA in a fraction of the computational time. The authors also do a great job of unboxing some of the LSTM behavior, including demonstrating how the cell state captures temperature and snow dynamics. However, very few other studies investigate their ML models to this degree.

Techniques for Interpretability

Model-Agnostic Local Models

Many interpretability techniques have been introduced that utilize simple local methods to explain model predictions for a single sample to better understand the behavior of a more complex model. Local interpretable model-agnostic explanations (LIME) is one such technique introduced by Ribeiro et al. (2016) that utilizes a local surrogate model. LIME generates perturbations of input and output data, weighs the samples according to proximity to the baseline sample, and trains a localized (usually linear) model. The goal is to find a model that will minimize loss (minimize the distance between the prediction and estimation) and also minimize complexity.

Another localized technique utilizes Shapley values. Originating from cooperative game theory, Shapley values assign a payout depending on a player’s contribution to the total payout (Shapley, 1953). The analog to ML is that the Shapley value becomes the marginal contribution of a feature across many coalitions (subsets) of possible feature combinations. Because Shapley values are calculated across many possible coalitions, computation time to compute them can be cumbersome.

Model-Specific Local Models

DeepLift is a backpropogation-based approach that propagates the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. DeepSHAP is a modification of the DeepLift algorithm used to efficiently estimate Shapley values over the input feature space for a given instance. DeepShap has been shown to be faster than model-agnostic methods and can explain series of complex models more than model specific local models (Chen et al., 2021). Rather than passing multipliers comparing neuron activation, DeepShap will backpropagate SHAP values.

DeepShap

The authors of DeepShap have created a GitHub repository here that allows you to implement SHAP for any machine learning model and DeepShap for deep learning models implemented using TensorFlow and Keras. The package has some really beautiful figures that you can create with single lines of code. My goal was to see if I could harvest some tools from DeepShap to help make sense of a rainfall runoff LSTM model for an ephemeral subbasin in the Tuolumne River Basin that I had previously coded.

I utilize the following features to predict outflow: Precipitation, Temperature, DOY (represented as sine/cosine of day in this case), prior precipitation aggregated for the past three days ago and two weeks, and then interaction variables. In a prior blog post, I demonstrate why these aggregate variables and interaction variables help to capture different regimes of flow. The results look great, but can we use DeepShap to think more about the parameters of the LSTM?

LSTM Prediction

One parameter of interest is the lag of information that the LSTM feeds in at every time step. In my LSTM, I decide to feed my inputs in in batches of 30, so at every time step, the LSTM is seeing the past 30 days of inputs. I use the DeepExplainer function to calculate the average Shapley values across a set of 500 time steps of the input data and I plot these features for the past 30 days.

#Deep Shap Implementation (Training)
#Choose random points from the dataset to train the DeepExplainer
random_ind = np.random.choice(X_Train.shape[0], 1000, replace=False)
data = X_Train[random_ind[0:500]]
e = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output),data,K.get_session())

#Deep Shap Testing
#Create a test set out of another part of the training set 
test = X_Train[random_ind[500:1000]]
shap_values = e.shap_values(test)
shap_val = np.array(shap_values)
shap_val = np.reshape(shap_val,(int(shap_val.shape[1]),int(shap_val.shape[2]),int(shap_val.shape[3])))
shap_abs = np.absolute(shap_val)
sum_0 = np.mean(shap_abs,axis=0)

#Plot Shapley Values
shap_plot = pd.DataFrame(sum_0, columns=["Precipitation","Temperature","DOY","Three Day Precip","Two Week Precip","Precip_Times_Temperature","Temperature_Times_Day","Precip_Times_Temperature_Times_Day"])
shap_plot['days'] = [i-16 for i in list(range(1,16))]
shap_plot.head()
shap_plot.plot.area(x='days',figsize=(10, 6), cmap='rainbow')
plt.title("Deep SHAP - Feature Importance")
plt.savefig('SHAP.png', bbox_inches='tight')
plt.show()
Feature importance for a 30-day lag

The figure above shows at what times in the batch certain information is being utilized based on the magnitude of the Shapley value associated with that information. We see primarily that current precipitation is the most prominent driver of the next day’s outflow along with the precipitation*temperature interactive variable which intuitively makes sense.

Let’s look back 5 days though. Many of the interactive variables are still relevant but see that temperature and day of the year are now given a higher priority over precipitation .

Shapley factors for 5 days prior to the current time step

Lags of these variables may give the algorithm more information about seasonality that is more informative than looking at these variables in their current state. Furthermore, precipitation at this lag may not be relevant. There’s lots of additional work to be done, but hopefully this tutorial illustrates that If you have the tools to look into your model and a good understanding of the system, it then becomes possible to better attribute your model’s behavior to a process-based understanding.

References

Chen, H., Lundberg, S. M., & Lee, S. I. (2021). Explaining a Series of Models by Propagating Shapley Values. arXiv preprint arXiv:2105.00108.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K., & Herrnegger, M. (2018). Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences22(11), 6005-6022.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). ” Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).

Shapley, Lloyd S. “A value for n-person games.” Contributions to the Theory of Games 2.28 (1953): 307-317.

I followed tutorials shown in these blogs as well:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://github.com/danielhkt/deep-forecasting/blob/main/notebook/test_forecasting.ipynb

https://medium.datadriveninvestor.com/time-step-wise-feature-importance-in-deep-learning-using-shap-e1c46a655455

Containerizing Rhodium with Docker

Ever installed a new library only for it to throw version depreciation errors up on your terminal? Or have warnings print in your output line instead of the figure you so painstakingly coded? Fear not – containerization is here to save the day! But before we get too excited, there are a few things (and terms) to learn about containerization using Docker.

In this post, we will be walking through a brief description of containerization and explain a few of its key terms. At the end, we will perform an exercise by containerizing the Rhodium robust decision-making library. More information about this library and how it can be used for exploratory modeling can be found in this post by Andrew Dircks. For a specific application of this library to the Lake Problem, please refer to this post by Antonia Hadjimichael.

Explaining containerization (and what is a base image?)

Visualizing containerization as a portable, app-specific virtual environment (Source: https://www.armor.com/resources/blog/containerization-the-need-to-know/)

Using the image above, picture your hardware (laptop, desktop, supercomputer) as a large cargo ship, with its engines being its operating system. In the absence of containerization, an application (app) is developed in a specific computing environment, akin to placing cargo in a permanent storage hold under the deck of a ship. Methods for cargo loading and removal are strongly dictated by the shape and size of the ship. Similarly, a non-containerized app can only be reliably executed given that it is installed in a computing environment that is nearly or almost completely identical to that in which is was developed in.

On the contrary, containerization bundles everything an app might need to run in a ‘container’ – the code, its required libraries, and their associated dependencies – therefore enabling an app to be run consistently on any infrastructure. By extension, this renders a containerized application version- and operating system (OS)-independent. These ‘containers’ are thus easily loaded and installed onto any ‘cargo ship’. The piece of software that enables the efficient execution of containerized apps is the container engine. This nifty tool is responsible for handling app user input and ensuring the correct installation, startup and running of the containerized app. The engine also pulls, loads, and builds the container image, which is a (misleadingly-named) file, or repository of files, that contains all the information that the engine will need to build the app on a new machine.

In this post, we will be walking through the containerization of the Rhodium library using Docker, which is a container hub that let’s you develop, store and build your container images. It is the first and most commonly-used container hub (at the moment).

Let’s containerize!

Setup

If you use either a Windows or Mac machine, please install Docker Desktop from this site. Linux machines should first install Docker and then Docker Compose. Make sure to create an account and login.

Next, clone the PRIM, Platypus, and Rhodium repositories onto your local machine. You can directly download a .zip file of the repository here or you can clone the repository via your command line/terminal into a folder within a directory of your choice:

C:\Users\username> cd directory-of-choice
C:\Users\username\directory-of-choice>git clone https://github.com/lbl59/Rhodium.git
C:\Users\username\directory-of-choice>git clone https://github.com/Project-Platypus/Platypus.git
C:\Users\username\directory-of-choice>git clone https://github.com/Project-Platypus/PRIM.git

Great, your repositories are set and ready to go! These should result in three new folders: Rhodium, Platypus, and PRIM. Now, in the same terminal window, navigate to the PRIM folder and run the following:

C:\Users\username\directory-of-choice\PRIM>python setup.py develop

Repeat for the Platypus folder. This is to make sure that you have both PRIM and Project Platypus installed and setup on your local machine.

Updating the requirements.txt file

Now, navigate back to the original directory-of-choice. Open the Rhodium folder, locate and open the requirements.txt file. Modify it so it looks like this:

matplotlib==3.5.1
numpy==1.22.1
pandas==1.4.0
mpldatacursor==0.7.1
six==1.16.0
scipy==1.7.3
prim
platypus-opt
sklearn==1.0.2

This file tells Docker that these are the required versions of libraries to install when building and installing your app.

Creating a Dockerfile

To begin building the container image for Docker to pack and build as Rhodium’s container, first create a new text file and name is Dockerfile within the Rhodium folder. Make sure to remove the .txt extension and save it as “All types” to avoid appending an extension. Open it using whichever text file you are comfortable with. The contents of this file should look like as follows. Note that the comments are for explanatory purposes only.

# state the base version of Python you are working with
# for my machine, it is Python 3.9.1
FROM python:3.9.1

# set the Rhodium repository as the container
WORKDIR /rhodium_app

# copy the requirements file into the new working directory
COPY requirements.txt .

# install all libraries and dependencies declared in the requirements file
RUN pip install -r requirements.txt

# copy the rhodium subfolder into the new working directory
# find this subfolder within the main Rhodium folder
COPY rhodium/ .

# this is the command the run when the container starts
CMD ["python", "./setup.py"]

The “.” indicates that you will be copying the file from your present directory into your working directory.

Build the Docker image

Once again in your terminal, check that you are in the same directory as before. Then, type in the following:

$ docker build -t rhodium_image .

Hit enter. If the containerization succeeded, you should see the following in your terminal (or something similar to it):

Containerization successful!

Congratulations, you have successfully containerized Rhodium! You are now ready for world domination!

References

2022. [online] Available at: <https://www.redhat.com/en/topics/cloud-native-apps/what-is-containerization&gt; [Accessed 1 February 2022].

Education, I., 2022. containerization. [online] Ibm.com. Available at: <https://www.ibm.com/cloud/learn/containerization&gt; [Accessed 1 February 2022].

Iordache, A., Scott, D., Turner, S. and Dalleau, F., 2022. Containerized Python Development – Part 1 – Docker Blog. [online] Docker Blog. Available at: <https://www.docker.com/blog/containerized-python-development-part-1/&gt; [Accessed 1 February 2022].

Runnable Docker Guides. 2022. Dockerize your Python Application. [online] Available at: <https://runnable.com/docker/python/dockerize-your-python-application&gt; [Accessed 1 February 2022].