Note: this post describes a version of Sobol written in C/C++ that is no longer supported. We now recommend the version of Sobol analysis included in SALib (https://github.com/SALib/SALib), an open-source Python library for sensitivity analysis methods. The link describes how to run different methods from the command line. If you prefer, there is another version of Sobol analysis included in MOEAFramework (http://moeaframework.org), written in Java. There is a separate post here describing how to use the MOEAFramework version.
Sobol sensitivity analysis is used to attribute the variance in a model’s output to its parameters and their interactions. Our group maintains a C/C++ version of the analysis that can be used as a black box. This post will describe how to hook your model into the Sobol framework and compute some sensitivity indices.
Step 1: Set up the environment
First, you will need to create a simple text file with the form [parameter] [lower bound] [upper bound]. For example, such a file for the Hymod model might look like this:
Cmax 0.0 1000.0
B 0.0 3.0
Alpha 0.0 1.0
Kq 0.15 1.0
Ks 0.0 0.15
The bounds are used to sample parameter values. The names of the parameters themselves are only used to print the final sensitivity indices (i.e., you can name them whatever you want). For this example, I will refer to this file as params.txt.
Obtain the Sobol source code from one of the group members and extract the folder somewhere in a UNIX environment (Cygwin on Windows will work, but you might as well put it on the cluster somewhere). Go to the Sobol directory and compile the source code (just type make). This will create three executables named generate, generate_sobol, and sobol.
Step 2: Generate parameter sets using the Sobol Sequence
From the Sobol directory, type the following command:
./generate -p params.txt -n 1000 -m 1 | ./generate_sobol -o sobolParams.txt
We are running two different executables here. The first, generate, reads the parameter file you give it and generates 1000 samples (specified by the -n flag) using the Sobol Sequence (specified by the -m flag). This output is piped (the “|” operator) to the executable generate_sobol, which resamples the parameters holding each one fixed at a time. This results in N(p+1) total parameter sets to be run in your model; for the Hymod example, we would have 1000*(5+1) = 6000 total model runs. The parameter sets are saved in sobolParams.txt.
Note that the Sobol method is very computationally intensive. Even for a simple model like Hymod, from personal experience I would recommend a sample size of at least N = 10,000 (which translates to 60,000 model runs). More complex models will be slower to run and will also require more samples to calculate accurate estimates of Sobol indices. Pay attention to the confidence bounds on your sensitivity indices to see whether you need to run more samples.
Step 3: Run the parameter sets through your model
The parameter sets are now saved in sobolParams.txt—the analysis so far has no idea what model you’re running. You need to set up your model to read in parameter sets from the file sobolParams.txt, run them through the model, and save your desired outputs in a file called objectiveValues.txt. You can use multiple objective values here—each row in the objectives file will correspond to a model run. Here is an example of reading in parameter sets for the Hymod model in C/C++ in the main() function:
//begin main()
string sampleFile = "sobolParams.txt"; //Input Sobol samples file to read
int numSamples = 60000; //make sure your program knows how many samples it has to read
int nParams = 5; //how many parameters do you have?
int nObj = 4; //how many objectives do you have?
ifstream in; //Input file stream
ofstream of; //output file stream for objective values
//obj is an array holding the objective values, and xreal holds the parameter values for each run
double *obj = new double [nObj];
double *xreal = new double [nParams];
//Intialize the model only once
init_HyMod(argc, argv);
//Open the output file and set some properties for it
of.open("objectiveValues.txt", ios::app);
of.setf(ios_base::fixed);
of.precision(8);
of.width(20);
in.open(sampleFile.c_str(), ios_base::in);
if (!in)
{
cout << "Error: sampleFile could not be opened!" << endl;
exit(1);
}
//Loop through parameter samples in sobolParams file
for (int s = 0; s < numSamples; s++)
{
if (s%100 == 0) cout << s << endl; //keep track of your progress
//Read in the parameter values from Sobol
for (int i=0; i<nParams; i++) { in >> xreal[i]; //read in each parameter individually for this set
}
in.ignore(1000,'\n'); //ignore the rest of the line
//Run this sample once all parameters have been read in
calc_HyMod(indPtr);
//Print the objective value outputs
for(int i = 0; i < nObj; i++)
{
of << setw(15) << obj[i];
}
of << endl; } //close the "samples" loop //Close the output file of.close(); //Remember to clean up any memory you allocated! deleteAllStuff(&hymodPtr->data, &hymodPtr->model, &hymodPtr->snowModel);
//end main()
As you can see, even this fairly simple task becomes complicated in C/C++. It might only take you a handful of lines to do the same thing in Matlab. But if you want speed, C/C++ is the way to go. Be careful to read in your parameters in the same order in which you sampled them!
Step 4: Calculate Sobol Indices
You now have the objective values for all of your model runs, arranged in columns in the file objectiveValues.txt. We need to send this back to the final executable, sobol, to compute the sensitivity indices. Do this with the following command:
./sobol -p params.txt -i objectiveValues.txt -o sobolIndices.txt -m 0
As shown here, the sobol executable requires several flags: the parameter file (-p), the file containing calculated objective values (-i), the output file for the sensitivity indices (-o), and the column of the objective values file to read (-m). The columns are assumed to be zero-indexed; if you have calculated multiple objective values, you would continue on to -m 1, etc., repeating the same command as above.
Optional parameters for the sobol program include the (-r) flag, which specifies the number of boostrap resamples, and the (-s) flag, which specifies the random seed for bootstrapping. These parameters are used to calculate the confidence intervals of the sensitivity indices. If these flags are excluded, the program defaults to use the system time as the random seed, with 1000 bootstrap resamples.
Step 5: Interpret your results
Open the file sobolIndices.txt in a regular text editor. You will see something like this:
Parameter Sensitivity [Confidence] Rank
1st-Order Sensitivity Indices
Ks +0.004141 [ +0.004347 ] 5
Kq +0.128900 [ +0.022733 ] 2
DDF +0.000007 [ +0.000006 ] 8
Tb +0.000007 [ +0.000006 ] 7
Tth +0.000007 [ +0.000006 ] 6
Alpha +0.262308 [ +0.037787 ] 1
B +0.039529 [ +0.018653 ] 4
CMax +0.112149 [ +0.037824 ] 3
Total-Order Sensitivity Indices
Ks +0.011856 [ +0.003890 ] 5
Kq +0.298414 [ +0.036949 ] 3
DDF -0.000000 [ +0.000000 ] 8
Tb +0.000000 [ +0.000000 ] 7
Tth +0.000000 [ +0.000000 ] 6
Alpha +0.617916 [ +0.038536 ] 1
B +0.172242 [ +0.029841 ] 4
CMax +0.379264 [ +0.053695 ] 2
2nd-Order Sensitivity Indices
Ks * Kq +0.000842 [ +0.006566 ] 12
Ks * DDF +0.005705 [ +0.006541 ] 10
Ks * Tb +0.005705 [ +0.006556 ] 9
Ks * Tth +0.005705 [ +0.006331 ] 8
Ks * Alpha +0.006472 [ +0.005962 ] 6
.....(second order indices continued).....
The parameter names will match the ones you specified in params.txt. The first order, total order, and second order sensitivities are specified as indicated. Confidence intervals for each indice are shown in brackets. Rankings are included in the rightmost column. Typically we use the total order indices to get a broad picture of model behavior—here you can see that the Alpha parameter is controlling most of the variance in the output. If the confidence intervals of your dominant indices are larger than ~10% of the value itself, you may want to consider increasing your sample size as computation permits.
Those are the basics for running Sobol. There are some cool extensions to this framework that I plan to cover in future posts, including time-varying sensitivities, sensitivity of parameter groups, and how to visualize indices in Matlab. Feel free to email jdh33 (at) psu (dot) edu with questions. Thanks for reading.