# SALib v0.7.1: Group Sampling & Nonuniform Distributions

This post discusses the changes to the Python library SALib in version 0.7.1, with some examples of how to use the new capabilities. The two major additions in this version were: group sampling for Sobol’ sensitivity analysis and specifying nonuniform distributions for variables.

## Sobol’ Indices Group Sampling

Previous versions of SALib allowed one to calculate the first-order, total-order, and second-order indices for individual input parameters. These same indices can be defined for groups of input parameters (see Saltelli (2002) for more discussion). The main change is adding an item called ‘groups’  to the problem dictionary, which specifies the group of each parameter. Here is some example code. Notice in the ‘groups’  entry in the problem definition.

```
from SALib.sample import saltelli
from SALib.analyze import sobol
import numpy as np

# example function
def sampleModel(x):
y = x[:,0]**1.5 + x[:,1] + 2*x[:,2] + x[:,3] + np.exp(0.3*x[:,4]) + x[:,5] \
+ 2*x[:,1]*x[:,4] + (x[:,0]*x[:,5])**2
return y

# problem definition
prob_gps_code = {
'num_vars':6,
'names': ['P1','P2','P3','P4','P5','P6'],
'bounds':[[0.0, 1.0], [2.0, 3.0], [0.5, 1.0], [0.0, 5.0], [-0.5, 0.5], [0.0, 1.0]],
'groups':['group1','group2','group1','group1','group2','group3']
}
# generating parameter values
param_vals_gps_code = saltelli.sample(prob_gps_code, 10000,calc_second_order=True)

# calculating model output values
Y_gps_code = sampleModel(param_vals_gps_code)

# completing Sobol' sensitivity analysis
Si_gps_code = sobol.analyze(prob_gps_code,Y_gps_code,calc_second_order=True,print_to_console=True)

```

The output from this code is given below. In this case the first-order indices (S1’s) are the index that is closed for the group. The S1 for group1 ( P1, P3, and P4) would be equivalent to summing:  S1 for P1, P3, and P4; S2 for (P1 & P3), (P1 & P4), and (P3 & P4); and S3 for (P1 & P3 & P4). All of the equations used in calculating the sensitivity indices are the same, but now they are for groups of variables.

```Group S1 S1_conf ST ST_conf
group1 0.471121 0.081845 0.472901 0.011769
group2 0.498600 0.078950 0.497005 0.013081
group3 0.030502 0.019188 0.031736 0.001041

Group_1 Group_2 S2 S2_conf
group1 group2 0.000618 0.159951
group1 group3 0.002170 0.161403
group2 group3 -0.003324 0.155224```

Note: You can also use the `read_param_file()` function to define the problem. For the above example the problem file would look like:

```P1,0.0,1.0,group1
P2,2.0,3.0,group2
P3,0.5,1.0,group1
P4,0.0,5.0,group1
P5,-.5,.5,group2
P6,0.0,1.0,group3```

## Nonuniform Distributions

Often the variables in a sensitivity analysis are assumed to be distributed uniformly over some interval. In the updated version of the SALib it is possible to specify whether the each input parameter is triangular, normal, lognormal, or uniform. Each of these distributions interprets the ‘bounds’ in the problem dictionary separately, as listed below.

• Triangular, “triang” (assumed lower bound of 0)
• first “bound” is width of distribution (scale, must be greater than 0)
• second “bound” is location of peak as a fraction of the scale (must be on [0,1])
• Normal, “norm”
• first “bound” is the mean (location)
• second “bound” is the standard deviation (scale, must be greater than 0)
• Lognormal, “lognorm” (natural logarithms, assumed lower bound of 0)
• first “bound” is the ln-space mean
• second “bound” is the ln-space standard deviation (must be greater than 0)
• Uniform, “unif”
• first “bound” is the lower bound
• second “bound” is the upper bound (must be greater than lower bound)

Triangular and lognormal distributions with a non-zero lower bound can be obtained by adding the lower bound to the generated parameters before sending the input data to be evaluated by the model.

Building on the same example as above, the problem dictionary and related analysis would be completed as follows.

```# problem definition
prob_dists_code = {
'num_vars':6,
'names': ['P1','P2','P3','P4','P5','P6'],
'bounds':[[0.0,1.0], [1.0, 0.75], [0.0, 0.2], [0.0, 0.2], [-1.0,1.0], [1.0, 0.25]],
'dists':['unif','triang','norm','lognorm','unif','triang']
}

# generating parameter values
param_vals_dists_code = saltelli.sample(prob_dists_code, 10000,calc_second_order=True)

# calculating model output
Y_dists_code = sampleModel(param_vals_dists_code)

# complete Sobol' sensitivity analysis
Si_dists_code = sobol.analyze(prob_dists_code,Y_dists_code,calc_second_order=True,print_to_console=True)&lt;/pre&gt;
```

The output from this analysis is given below, which is consistent with the format in previous versions of SALib.

```Parameter S1 S1_conf ST ST_conf
P1 0.106313 0.030983 0.110114 0.003531
P2 0.037785 0.027335 0.085197 0.003743
P3 0.128797 0.029834 0.128702 0.003905
P4 0.034284 0.016997 0.034193 0.001141
P5 0.579715 0.071509 0.627896 0.017935
P6 0.062791 0.021743 0.065357 0.002221
Parameter_1 Parameter_2 S2 S2_conf
P1 P2 0.001783 0.060174
P1 P3 0.001892 0.060389
P1 P4 0.001753 0.060155
P1 P5 0.001740 0.062130
P1 P6 0.004774 0.060436
P2 P3 -0.003539 0.051611
P2 P4 -0.003500 0.051186
P2 P5 0.044591 0.054837
P2 P6 -0.003585 0.051388
P3 P4 -0.000562 0.058972
P3 P5 -0.000533 0.059584
P3 P6 -0.000480 0.059923
P4 P5 -0.000364 0.034382
P4 P6 -0.000191 0.034301
P5 P6 -0.001293 0.137576```

Note 1: You can also use the `read_param_file()` function to define the problem. The one catch is when you want to use nonuniform distributions without grouping the variables. In this case the fourth column in the input file (column for ‘groups’) must be the parameter name repeated from the first column. For the above example the problem file would look like:

```P1,0.0,1.0,P1,unif
P2,1.0,.75,P2,triang
P3,0.0,.2,P3,norm
P4,0.0,.2,P4,lognorm
P5,-1.0,1.0,P5,unif
P6,1.0,.25,P6,triang```

Note 2: If you are uncertain that the distribution transformation yielded the desired results, especially since the ‘bounds’ are interpreted differently by each distribution, you can check by plotting histograms of the data. The histograms of the data used in the example are shown below. (The data was actually saved to a .txt file for reference and then imported to `R` to plot these histograms, but matplotlib has a function `histogram()`.)

## References

Saltelli, Andrea (2002). Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications 145(2):280-297. doi:10.1016/S0010-4655(02)00280-1

## 7 thoughts on “SALib v0.7.1: Group Sampling & Nonuniform Distributions”

1. Matthias Degrave |

Hello,

Thank you, I have a question : I’ve used your code and I get almost the same results but not exactly. It seems it’s not a “random” thing since results don’t change when I execute again the code. Is there a reason why we could get different results ?

• Thank you for the question. It could be related to the version of the libraries that you are using. I reran the code in my example and I got the values reported, but after I updated SAlib the values changed slightly. For instance, in the first example with the new libraries I got a value of 0.471054 for the group1 S1 index compared to 0.471121 reported above. The new values are consistent with a different computer that also has a more recent installation of the libraries. I hope this helps.

2. Vincent van Zelst |

Hello,

I am intending to use a Morris approach in combination with a computational extensive numerical model. For this reason I try to keep the number of trajectories low. Only after plotting histograms of the sample parameters I found out that they are not yielding an exact uniform distribution. I used the default grid levels (4) and a grid jump size of (2). This should result in a delta of 2/3, which should guarantee equal probability sampling according to Global Sensitivity Analysis by Saltelli et. al 2008.

I am wondering why this problem arises and how to solve it.