This post discusses the changes to the Python library SALib in version 0.7.1, with some examples of how to use the new capabilities. The two major additions in this version were: group sampling for Sobol’ sensitivity analysis and specifying nonuniform distributions for variables.

**Sobol’ Indices Group Sampling**

Previous versions of SALib allowed one to calculate the first-order, total-order, and second-order indices for individual input parameters. These same indices can be defined for groups of input parameters (see Saltelli (2002) for more discussion). The main change is adding an item called ‘groups’ to the problem dictionary, which specifies the group of each parameter. Here is some example code. Notice in the ‘groups’ entry in the problem definition.

from SALib.sample import saltelli from SALib.analyze import sobol from SALib.util import read_param_file import numpy as np # example function def sampleModel(x): y = x[:,0]**1.5 + x[:,1] + 2*x[:,2] + x[:,3] + np.exp(0.3*x[:,4]) + x[:,5] \ + 2*x[:,1]*x[:,4] + (x[:,0]*x[:,5])**2 return y # problem definition prob_gps_code = { 'num_vars':6, 'names': ['P1','P2','P3','P4','P5','P6'], 'bounds':[[0.0, 1.0], [2.0, 3.0], [0.5, 1.0], [0.0, 5.0], [-0.5, 0.5], [0.0, 1.0]], 'groups':['group1','group2','group1','group1','group2','group3'] } # generating parameter values param_vals_gps_code = saltelli.sample(prob_gps_code, 10000,calc_second_order=True) # calculating model output values Y_gps_code = sampleModel(param_vals_gps_code) # completing Sobol' sensitivity analysis Si_gps_code = sobol.analyze(prob_gps_code,Y_gps_code,calc_second_order=True,print_to_console=True)

The output from this code is given below. In this case the first-order indices (S1’s) are the index that is closed for the group. The S1 for group1 ( P1, P3, and P4) would be equivalent to summing: S1 for P1, P3, and P4; S2 for (P1 & P3), (P1 & P4), and (P3 & P4); and S3 for (P1 & P3 & P4). All of the equations used in calculating the sensitivity indices are the same, but now they are for groups of variables.

Group S1 S1_conf ST ST_conf group1 0.471121 0.081845 0.472901 0.011769 group2 0.498600 0.078950 0.497005 0.013081 group3 0.030502 0.019188 0.031736 0.001041 Group_1 Group_2 S2 S2_conf group1 group2 0.000618 0.159951 group1 group3 0.002170 0.161403 group2 group3 -0.003324 0.155224

**Note**: You can also use the `read_param_file()`

function to define the problem. For the above example the problem file would look like:

P1,0.0,1.0,group1 P2,2.0,3.0,group2 P3,0.5,1.0,group1 P4,0.0,5.0,group1 P5,-.5,.5,group2 P6,0.0,1.0,group3

**Nonuniform Distributions**

Often the variables in a sensitivity analysis are assumed to be distributed uniformly over some interval. In the updated version of the SALib it is possible to specify whether the each input parameter is triangular, normal, lognormal, or uniform. Each of these distributions interprets the ‘bounds’ in the problem dictionary separately, as listed below.

- Triangular, “triang” (assumed lower bound of 0)
- first “bound” is width of distribution (scale, must be greater than 0)
- second “bound” is location of peak as a fraction of the scale (must be on [0,1])

- Normal, “norm”
- first “bound” is the mean (location)
- second “bound” is the standard deviation (scale, must be greater than 0)

- Lognormal, “lognorm” (natural logarithms, assumed lower bound of 0)
- first “bound” is the ln-space mean
- second “bound” is the ln-space standard deviation (must be greater than 0)

- Uniform, “unif”
- first “bound” is the lower bound
- second “bound” is the upper bound (must be greater than lower bound)

Triangular and lognormal distributions with a non-zero lower bound can be obtained by adding the lower bound to the generated parameters before sending the input data to be evaluated by the model.

Building on the same example as above, the problem dictionary and related analysis would be completed as follows.

# problem definition prob_dists_code = { 'num_vars':6, 'names': ['P1','P2','P3','P4','P5','P6'], 'bounds':[[0.0,1.0], [1.0, 0.75], [0.0, 0.2], [0.0, 0.2], [-1.0,1.0], [1.0, 0.25]], 'dists':['unif','triang','norm','lognorm','unif','triang'] } # generating parameter values param_vals_dists_code = saltelli.sample(prob_dists_code, 10000,calc_second_order=True) # calculating model output Y_dists_code = sampleModel(param_vals_dists_code) # complete Sobol' sensitivity analysis Si_dists_code = sobol.analyze(prob_dists_code,Y_dists_code,calc_second_order=True,print_to_console=True)</pre>

The output from this analysis is given below, which is consistent with the format in previous versions of SALib.

Parameter S1 S1_conf ST ST_conf P1 0.106313 0.030983 0.110114 0.003531 P2 0.037785 0.027335 0.085197 0.003743 P3 0.128797 0.029834 0.128702 0.003905 P4 0.034284 0.016997 0.034193 0.001141 P5 0.579715 0.071509 0.627896 0.017935 P6 0.062791 0.021743 0.065357 0.002221 Parameter_1 Parameter_2 S2 S2_conf P1 P2 0.001783 0.060174 P1 P3 0.001892 0.060389 P1 P4 0.001753 0.060155 P1 P5 0.001740 0.062130 P1 P6 0.004774 0.060436 P2 P3 -0.003539 0.051611 P2 P4 -0.003500 0.051186 P2 P5 0.044591 0.054837 P2 P6 -0.003585 0.051388 P3 P4 -0.000562 0.058972 P3 P5 -0.000533 0.059584 P3 P6 -0.000480 0.059923 P4 P5 -0.000364 0.034382 P4 P6 -0.000191 0.034301 P5 P6 -0.001293 0.137576

**Note 1**: You can also use the `read_param_file()`

function to define the problem. The one catch is when you want to use nonuniform distributions without grouping the variables. In this case the fourth column in the input file (column for ‘groups’) must be the parameter name repeated from the first column. For the above example the problem file would look like:

P1,0.0,1.0,P1,unif P2,1.0,.75,P2,triang P3,0.0,.2,P3,norm P4,0.0,.2,P4,lognorm P5,-1.0,1.0,P5,unif P6,1.0,.25,P6,triang

**Note 2**: If you are uncertain that the distribution transformation yielded the desired results, especially since the ‘bounds’ are interpreted differently by each distribution, you can check by plotting histograms of the data. The histograms of the data used in the example are shown below. (The data was actually saved to a .txt file for reference and then imported to `R`

to plot these histograms, but matplotlib has a function `histogram()`

.)

**References**

Saltelli, Andrea (2002). Making best use of model evaluations to compute sensitivity indices. *Computer Physics Communications 145*(2):280-297. doi:10.1016/S0010-4655(02)00280-1

Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

Hello,

Thank you, I have a question : I’ve used your code and I get almost the same results but not exactly. It seems it’s not a “random” thing since results don’t change when I execute again the code. Is there a reason why we could get different results ?

Thank you for the question. It could be related to the version of the libraries that you are using. I reran the code in my example and I got the values reported, but after I updated SAlib the values changed slightly. For instance, in the first example with the new libraries I got a value of 0.471054 for the group1 S1 index compared to 0.471121 reported above. The new values are consistent with a different computer that also has a more recent installation of the libraries. I hope this helps.

Hello,

I am intending to use a Morris approach in combination with a computational extensive numerical model. For this reason I try to keep the number of trajectories low. Only after plotting histograms of the sample parameters I found out that they are not yielding an exact uniform distribution. I used the default grid levels (4) and a grid jump size of (2). This should result in a delta of 2/3, which should guarantee equal probability sampling according to Global Sensitivity Analysis by Saltelli et. al 2008.

I am wondering why this problem arises and how to solve it.

Thank you in advance.

thank you for your answer

Pingback: Open Source Sensitivity Analysis Tools – Water Programming: A Collaborative Research Blog

Hello Calvin,

Thank you for the examples. I have a question related to the groups analysis, I hope you could help me.

I’m trying to use SALib for a study in which I have five input parameters. I’d like to perform two analysis, the first considering each parameter individually and the second considering that two of parameters will form a group. I know I could do the calculations (problem definition, saltelli sampling, model evaluation and sobol analysis) for each case independently, but my model evaluation takes too much time. That way, was wondering if it would be possible to use the same parameter samples and model evaluations to perform both sobol analysis using SALib. If it is possible, could you give an example?