Extensions of SALib for more complex sensitivity analyses

Over the past few weeks, I’ve had some helpful discussions with users of SALib that I thought would be worth sharing. These questions mostly deal with using the existing library in clever ways for more complicated modeling scenarios, but there is some extra information about library updates at the end.

1. How to sample parameters in log space

All three methods in the library (Sobol, Morris, and extended FAST) currently assume an independent uniform sampling of the parameters to be analyzed. (This is described in the documentation). However, lots of models have parameters that should be sampled in log space. This is especially true of environmental parameters, like hydraulic conductivity for groundwater models. In this case, uniform sampling over several orders of magnitude will introduce bias away from the smaller values.

One approach is instead to uniformly sample the exponent of the parameter. For example, if your parameter value ranges from [0.001, 1000], sample from [-3, 3]. Then transform the value back into real space after you read it into your model (and of course, before you do any calculations!) This way you can still use uniform sampling while ensuring fair representation in your parameter space.

2. How to sample discrete scenarios

In some sensitivity analysis applications, the uncertain factor you’re sampling isn’t a single value, but an entire scenario! This could be, for example, a realization of future streamflow or climate conditions—we would like to compare the sensitivity of some model output to streamflow and climate scenarios, without reducing the latter to a single value.

This can be done in SALib as follows. Say that you have an ensemble of 1,000 possible streamflow scenarios. Sample a uniform parameter on the range [0, 999]. Then, in your model, round it down to the nearest integer, and use it as an array index to access a particular scenario. This is the approach used in the “General Probabilistic Framework” described by Baroni and Tarantola (2014). Discretizing the input factor should not affect the Sobol and FAST methods. It will affect the Morris method, which uses the differences between input factors to determine elementary effects, so use with caution.

This approach was recently used by Matt Perry to analyze the impact of climate change scenarios on forest growth.

3. Dealing with model-specific configuration files

In Matt’s blog post (linked above), he mentioned an important issue: the space-separated columns of parameter samples generated by SALib may not be directly compatible with model input files. Many models, particularly those written in compiled languages, will have external configuration files (plaintext/XML/etc.) to specify their parameters. Currently SALib doesn’t have a solution for this—you’ll have to roll your own script to convert the parameter samples to the format of your model’s configuration file. (Update 11/16/14: here is an example of using Python to replace template variables with parameter samples from SALib).

One idea for how to do this in the future would be to have the user specify a “template file”, which is a configuration file where the parameter values are replaced with tags (for example, “{my_parameter}”. The location of this file could be specified as a command line parameter. Then, while generating parameter samples, SALib could make a copy of the template for each model run, overwriting the tags with parameter values along the way. The downside of this approach is that you would have thousands of input files instead of one. I’m going to hold off on this for now, but feel free to submit a pull request.

4. Confidence intervals for Morris and FAST

Previously, only the Sobol method printed confidence intervals for the sensitivity indices. These are generated by bootstrapping with subsets of the sample matrix. I updated the Morris method with a similar technique, where confidence intervals are bootstrapped by sampling subsets of the distribution of elementary effects.

For FAST (and extended FAST), there does not appear to be a clear way to get confidence intervals by bootstrapping. The original extended FAST paper by Saltelli et al. displayed confidence intervals on sensitivity indices, but these were developed by replicating the experiment, adding a random phase shift to generate alternate sequences of points as given in Section 2.2 of the linked paper. I added this random phase shift to SALib such that a different random seed will produce a different sampling sequence for FAST (previously this was not the case).

However, my attempts to bootstrap the FAST results were unsuccessful. The sequence of model outputs are FFT‘d to develop the sensitivity indices, which means that they cannot be sub-sampled or taken out of order. So for now, FAST does not provide confidence intervals. You can generate your own confidence intervals by replicating the full sensitivity analysis with different random seeds. This is usually very difficult for environmental models, given the computational expense, but not for test functions.

Thanks for reading. Email me at jdh366-at-cornell-dot-edu if you have any questions, or want to share a successful (or unsuccessful) application of SALib!

Advertisements

2 thoughts on “Extensions of SALib for more complex sensitivity analyses

  1. Pingback: Python’s template class – Water Programming: A Collaborative Research Blog

  2. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s