Running Sobol Sensitivity Analysis using SALib

This post was updated on August 11, 2014 to update the SALib module structure.

The Sensitivity Analysis Library (SALib) is an open-source Python library for common sensitivity analysis routines, including the Sobol, Morris, and FAST methods.

Step 0: Get the library

The easiest way to install is pip install SALib, which will pull the latest version from the Python package index. Or, you can download at the above link and run python setup.py install. If you’re interested in contributing, you can clone the git repository:

git clone https://github.com/jdherman/SALib .

Step 1: Choose sampling bounds for your parameters

First, you will need to create a simple text file with the form [parameter] [lower bound] [upper bound]. For example, such a file for the Hymod model might look like this:

Cmax 0.0 1000.0
B 0.0 3.0
Alpha 0.0 1.0
Kq 0.15 1.0
Ks 0.0 0.15

The bounds are used to sample parameter values. The names of the parameters themselves are only used to print the final sensitivity indices, so you can name them whatever you want. Let’s call this file params.txt.

Step 2: Generate parameter sets using the Sobol Sequence

Put your params.txt file in the same directory as the SALib folder. Move to this directory and type the following command:

python -m SALib.sample.saltelli -n 1000 -p params.txt -o my_samples_file.txt

The -n flag specifies the number of initial samples to generate from the pseudo-random Sobol sequence. The -p flag specifies the parameter bounds file that you created in the first step.

In this example, 1000 parameter sets are generated from the Sobol sequence. After that, the Saltelli method of cross-sampling is applied (for more information, see: Saltelli 2008, “Global Sensitivity Analysis: The Primer“). The cross-sampling scheme creates a total of 2N(p+1) total parameter sets to be run in your model; for the Hymod example, we would have 1000*(5+1) = 6000 total model runs. The parameter samples will be printed to the file specified with the -o flag, which in this case is called my_samples_file.txt.

Note that the Sobol method can be computationally intensive depending on the model being analyzed. Even for a simple model like Hymod, from personal experience I would recommend a sample size of at least N = 10,000 (which translates to 60,000 model runs). More complex models will be slower to run and will also require more samples to calculate accurate estimates of Sobol indices. Once you complete this process, pay attention to the confidence bounds on your sensitivity indices to see whether you need to run more samples.

Step 3: Run the parameter sets through your model

The parameter sets are now saved in my_samples_file.txt. Run these parameter sets through your model, and save the output(s) to a text file. The output file should contain one row of output values for each model run. This process is performed outside of SALib, so the details will be language-specific. Be careful to read in your parameter sets in the same order they were sampled.

Step 4: Calculate Sobol Indices

You now have the output values for all of your model runs saved to a file. For the sake of example, let’s call that file objectiveValues.txt. We need to send this information back to SALib to compute the sensitivity indices, using following command:

python -m SALib.analyze.sobol -p params.txt -Y objectiveValues.txt -c 0

The options here are: the parameter file (-p), the file containing calculated outputs (-Y), and the column of the objective values file to read (-c). The columns are assumed to be zero-indexed; if you have calculated multiple objective values, you would continue on to -m 1, etc., repeating the same command as above. By default, this will print sensitivity indices to the command line. You may want to print them to a file using the “>” operator.

Step 5: Interpret your results

Say that you saved the indices from the above command into the file sobolIndices.txt. If you open this file in a text editor, it will look something like this:

Parameter First_Order First_Order_Conf Total_Order Total_Order_Conf
x1 0.696371 0.183873 0.704233 0.211868
x2 0.232399 0.119619 0.264305 0.129080
x3 -0.021573 0.048209 0.027243 0.066093
...(etc.)

Parameter_1 Parameter_2 Second_Order Second_Order_Conf
x1 x2 -0.142104 0.307560
x1 x3 -0.009698 0.271062
x1 x4 -0.049298 0.283457
...(etc.)

The parameter names will match the ones you specified in params.txt (Here they don’t, but this is just an example). The first order, total order, and second order sensitivities are specified as indicated, along with their respective confidence intervals. Most of the indices are omitted here for the sake of brevity. Typically we use the total order indices to get a broad picture of model behavior, since they estimate all of the interactions between parameters. If the confidence intervals of your dominant indices are larger than roughly 10% of the value itself, you may want to consider increasing your sample size as computation permits. For total-order indices to be important, they will usually need to be above 0.05 at the very least (the most dominant parameters will have values upward of 0.8).

For a full description of available methods and options, please consult the readme and examples in the Github repository or on the SALib website. Github users can also post issues or submit pull requests. Thanks for reading!

Advertisements

18 thoughts on “Running Sobol Sensitivity Analysis using SALib

  1. Pingback: Method of Morris (Elementary Effects) using SALib | Water Programming: A Collaborative Research Blog

  2. I am adding to the function “scale_samples” an option to apply log-uniform scaling of the pseudo-randomly generated [0,1] parameter values (using an extra input column). Will such an addition be compatible with the way that the Sobol and FAST analysis routines are designed? I wonder if a certain type of scaling is being assumed within these routines, because the model parameter values themselves are not needed (unlike in the case of the Morris method). It seems like the log-scaling might change some of the values computed in the analysis, but I could very well be mistaken.

    Thanks,
    Steve

  3. Steve — if I understand this right, I think you’ll want to do the transformation out of log space inside your model. (For example, do uniform sampling between -3 and 5, then do 10^x in your model). Sobol and FAST are assuming uniform parameter ranges, so if you need to log-scale a parameter it’s best to do it this way.

  4. Jon,

    Yeah…you’re right. I forgot that the analyses assume uniform sampling between the lower and upper bounds contained in the input file.

    Thanks for your help, and the great library.

    Steve

  5. Hello guys,
    I have the system which is described by several ODE. The solution looks good for me. Now I need to implement the sensitivity analyses of parameters which I used in the model. Therefore, I have the following questions:

    I have an output in the form of the vector(solution in time). But the sensitivity routines (such SALib) requires the output to be 1 variable F(X) but not a vector. What should I do? I found in several resources that they model some “true” solution without any variance in parameters and later (when they perform the sensitivity) they compare it with “true” solution via RMSD. Is it correct way of doing that? (could you provide some references)

    Parameter boundaries: how to choose the range of parameters? In the some papers the modelers do just ±5% of their values of parameters. What is the most typical way of doing this? based on the literature values and taking [min-5% ; max+5%] will be good? What if I dont have any literature values?

  6. Hi Igor,
    I’d say you have two options

    (A) Somehow aggregate the time series into a single number. If you have a “true” solution, yes you can calculate RMS error or some other error measure. If you don’t have a true solution to compare against, you can aggregate using some other meaningful statistic (like the sum, or the peak, etc. depending on the application). But in general with this option you’re calculating the sensitivity of just one number representing the whole time series.

    (B) Or, you can calculate the sensitivity of the output value at every timestep. You would create a loop and run the analyze() function once for every timestep. This is more computationally intensive, but can result in some more dynamic information.

    As for the parameter boundaries, that is very important and very domain-specific. First look and see if you can find some ranges from another paper where they used your model, and cite those. If not, choosing a percent value is okay, but 5% seems a bit narrow — you might try 10% instead. The most important thing is that your outputs still make sense. If your ranges produce parameter samples that are “non-physical”, in other words the output has no bearing on reality anymore, then you should narrow them. There is a lot of subjectivity here and it can change your results quite a bit, so spend some time on this.

  7. Hi, thanks for all the information. I was wondering if it is possible to define different distributions to perform the sampling.
    Thanks for the help.

  8. Hi Oscar,

    Currently only uniform distributions are supported. However, if you sample your variable on the uniform [0,1] range, you can use a CDF transform in your simulation model to convert this into any distribution you want. In other words you sample in the percentile space, then use something like (for example) the Matlab “norminv” command to get a normal distribution, or “gammainv”, etc.

    Hope this helps,
    Jon

  9. Hi,
    I am a little confused because of the step of running my model. Indeed, my model is written in Fortran and the output file includes daily estimates such as soil moisture and leaf area index. This means that an output file containing 365 raw of soil moisture and Leaf area index each time the model run. How can I combine all this outputs in only 1 file in which each line represents one time run of the model

  10. Hi Omar,

    I’m not sure I understand the question. The only thing SALib expects is a column of output values for each sensitivity analysis. How you read those values into Python, or how you store them in files, is entirely up to you.

    Thanks for using the library.

    • well, this issue is solved. However, I am interested to perform the analysis among groups of parameters. as posted below:
      How about performing the analysis among groups of parameters. For example, if there are 100 parameters divided into 10 groups based on the physical characteristics. is that possible to perform the sobol sensitivity analysis among the group. if yes, how to assign the group names in the script.

  11. hi,
    How about performing the analysis among groups of parameters. For example, if there are 100 parameters divided into 10 groups based on the physical characteristics. is that possible to perform the sobol sensitivity analysis among the group. if yes, how to assign the group names in the script.

    Regards

  12. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s