This post discusses the changes to the Python library SALib in version 0.7.1, with some examples of how to use the new capabilities. The two major additions in this version were: group sampling for Sobol’ sensitivity analysis and specifying nonuniform distributions for variables.

**Sobol’ Indices Group Sampling**

Previous versions of SALib allowed one to calculate the first-order, total-order, and second-order indices for individual input parameters. These same indices can be defined for groups of input parameters (see Saltelli (2002) for more discussion). The main change is adding an item called ‘groups’ to the problem dictionary, which specifies the group of each parameter. Here is some example code. Notice in the ‘groups’ entry in the problem definition.

from SALib.sample import saltelli from SALib.analyze import sobol from SALib.util import read_param_file import numpy as np # example function def sampleModel(x): y = x[:,0]**1.5 + x[:,1] + 2*x[:,2] + x[:,3] + np.exp(0.3*x[:,4]) + x[:,5] \ + 2*x[:,1]*x[:,4] + (x[:,0]*x[:,5])**2 return y # problem definition prob_gps_code = { 'num_vars':6, 'names': ['P1','P2','P3','P4','P5','P6'], 'bounds':[[0.0, 1.0], [2.0, 3.0], [0.5, 1.0], [0.0, 5.0], [-0.5, 0.5], [0.0, 1.0]], 'groups':['group1','group2','group1','group1','group2','group3'] } # generating parameter values param_vals_gps_code = saltelli.sample(prob_gps_code, 10000,calc_second_order=True) # calculating model output values Y_gps_code = sampleModel(param_vals_gps_code) # completing Sobol' sensitivity analysis Si_gps_code = sobol.analyze(prob_gps_code,Y_gps_code,calc_second_order=True,print_to_console=True)

The output from this code is given below. In this case the first-order indices (S1’s) are the index that is closed for the group. The S1 for group1 ( P1, P3, and P4) would be equivalent to summing: S1 for P1, P3, and P4; S2 for (P1 & P3), (P1 & P4), and (P3 & P4); and S3 for (P1 & P3 & P4). All of the equations used in calculating the sensitivity indices are the same, but now they are for groups of variables.

Group S1 S1_conf ST ST_conf group1 0.471121 0.081845 0.472901 0.011769 group2 0.498600 0.078950 0.497005 0.013081 group3 0.030502 0.019188 0.031736 0.001041 Group_1 Group_2 S2 S2_conf group1 group2 0.000618 0.159951 group1 group3 0.002170 0.161403 group2 group3 -0.003324 0.155224

**Note**: You can also use the `read_param_file()`

function to define the problem. For the above example the problem file would look like:

P1,0.0,1.0,group1 P2,2.0,3.0,group2 P3,0.5,1.0,group1 P4,0.0,5.0,group1 P5,-.5,.5,group2 P6,0.0,1.0,group3

**Nonuniform Distributions**

Often the variables in a sensitivity analysis are assumed to be distributed uniformly over some interval. In the updated version of the SALib it is possible to specify whether the each input parameter is triangular, normal, lognormal, or uniform. Each of these distributions interprets the ‘bounds’ in the problem dictionary separately, as listed below.

- Triangular, “triang” (assumed lower bound of 0)
- first “bound” is width of distribution (scale, must be greater than 0)
- second “bound” is location of peak as a fraction of the scale (must be on [0,1])

- Normal, “norm”
- first “bound” is the mean (location)
- second “bound” is the standard deviation (scale, must be greater than 0)

- Lognormal, “lognorm” (natural logarithms, assumed lower bound of 0)
- first “bound” is the ln-space mean
- second “bound” is the ln-space standard deviation (must be greater than 0)

- Uniform, “unif”
- first “bound” is the lower bound
- second “bound” is the upper bound (must be greater than lower bound)

Triangular and lognormal distributions with a non-zero lower bound can be obtained by adding the lower bound to the generated parameters before sending the input data to be evaluated by the model.

Building on the same example as above, the problem dictionary and related analysis would be completed as follows.

# problem definition prob_dists_code = { 'num_vars':6, 'names': ['P1','P2','P3','P4','P5','P6'], 'bounds':[[0.0,1.0], [1.0, 0.75], [0.0, 0.2], [0.0, 0.2], [-1.0,1.0], [1.0, 0.25]], 'dists':['unif','triang','norm','lognorm','unif','triang'] } # generating parameter values param_vals_dists_code = saltelli.sample(prob_dists_code, 10000,calc_second_order=True) # calculating model output Y_dists_code = sampleModel(param_vals_dists_code) # complete Sobol' sensitivity analysis Si_dists_code = sobol.analyze(prob_dists_code,Y_dists_code,calc_second_order=True,print_to_console=True)</pre>

The output from this analysis is given below, which is consistent with the format in previous versions of SALib.

Parameter S1 S1_conf ST ST_conf P1 0.106313 0.030983 0.110114 0.003531 P2 0.037785 0.027335 0.085197 0.003743 P3 0.128797 0.029834 0.128702 0.003905 P4 0.034284 0.016997 0.034193 0.001141 P5 0.579715 0.071509 0.627896 0.017935 P6 0.062791 0.021743 0.065357 0.002221 Parameter_1 Parameter_2 S2 S2_conf P1 P2 0.001783 0.060174 P1 P3 0.001892 0.060389 P1 P4 0.001753 0.060155 P1 P5 0.001740 0.062130 P1 P6 0.004774 0.060436 P2 P3 -0.003539 0.051611 P2 P4 -0.003500 0.051186 P2 P5 0.044591 0.054837 P2 P6 -0.003585 0.051388 P3 P4 -0.000562 0.058972 P3 P5 -0.000533 0.059584 P3 P6 -0.000480 0.059923 P4 P5 -0.000364 0.034382 P4 P6 -0.000191 0.034301 P5 P6 -0.001293 0.137576

**Note 1**: You can also use the `read_param_file()`

function to define the problem. The one catch is when you want to use nonuniform distributions without grouping the variables. In this case the fourth column in the input file (column for ‘groups’) must be the parameter name repeated from the first column. For the above example the problem file would look like:

P1,0.0,1.0,P1,unif P2,1.0,.75,P2,triang P3,0.0,.2,P3,norm P4,0.0,.2,P4,lognorm P5,-1.0,1.0,P5,unif P6,1.0,.25,P6,triang

**Note 2**: If you are uncertain that the distribution transformation yielded the desired results, especially since the ‘bounds’ are interpreted differently by each distribution, you can check by plotting histograms of the data. The histograms of the data used in the example are shown below. (The data was actually saved to a .txt file for reference and then imported to `R`

to plot these histograms, but matplotlib has a function `histogram()`

.)

**References**

Saltelli, Andrea (2002). Making best use of model evaluations to compute sensitivity indices. *Computer Physics Communications 145*(2):280-297. doi:10.1016/S0010-4655(02)00280-1

Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog