Factor prioritization and factor fixing: how to know what’s important

There have been several blog posts on sensitivity analysis (SA) on this blog, focusing primarily on tools to perform it (e.g., SALib) and to visualize its outputs. Today I’ll provide some guidance on how to decide which factors most strongly affect our output and which are largely inconsequential. What actually matters for a given application is obviously subjective and case-dependent, but this post is meant to support that exercise. I will be performing a Global Sensitivity Analysis of a system to produce a rank-ordering of the factors driving variability in the output (i.e., factor prioritization), which can then be used to decide which of the least influential factors can be fixed to simplify the model (i.e., factor fixing) [1].

The scripts I’ll be using can be found here. I’ll demonstrate with a fishery model, a simplified representation of a socio-ecological system we’re trying to manage. The procedure I’ll be following is based on the work in [2-4].

The idea is this:
1. I generate 1000 samples of the uncertain factors that might be driving variability in my outcome (let’s call this Set 1). I apply an SA method to the samples and the outcomes and get sensitivity indices for each factor, ranking them from most to least important. But where do I draw the line between important and not important?
2. We can create a Set 2, using the sampled values of only the T most important factors from Set 1 and fixing all other factors to their default values.
3. We can also create a Set 3, now fixing the T most important factors to their defaults and using the sampled values of all other factors from Set 1.

If we classified our important and unimportant factors correctly, then the correlation coefficient between the model outputs of Set 2 and Set 1 should approximate 1 (since we’re fixing all factors that don’t matter), and the correlation coefficient between outputs from Set 3 and Set 1 should approximate 0 (since the factors we sampled are inconsequential to the output).

Here’s how it’s done using SALib and the Delta method (in the interest of space I’ll only share the most important snippets of code; you’ll need the full scripts, found in this repository, to make it run):

First we set up our problem using SALib nomenclature, generate 1000 samples using all factors (which will be our Set 1) and run the model for all 1000 samples. Finally we analyze our output using the Delta method. (This should take a couple minutes to run on your personal computer.)

# Set up dictionary with system parameters
problem = {
  'num_vars': 9,
  'names': ['a', 'b', 'c', 'd','h',
            'K','m','sigmaX','sigmaY'],
  'bounds': [[ 0.002, 2],
             [0.005, 1],
             [0.2, 1],
             [0.05, 0.2],
             [0.001, 1],
             [100, 5000],
             [0.1, 1.5],
             [0.001, 0.01],
             [0.001, 0.01]]
}

defaultvalues = np.array([0.005, 0.5, 0.5, 0.1, 0.1, 2000, 0.7, 0.004, 0.004])

# Generate samples
nsamples = 1000
X_Set1 = latin.sample(problem, nsamples) # This is Set 1

# Run model for all samples
output = [fish_game(*X_Set1[j,:]) for j in range(nsamples)]

# Perform analysis
results = delta.analyze(problem, X_Set1, np.asarray(output), print_to_console=True)

This will produce output like the table below, giving us the Delta index of each sampled parameter, its confidence interval, the first-order Sobol index (S1) of each parameter, and its respective confidence interval.

Parameter   delta     delta_conf   S1        S1_conf
a           0.102206  0.021648     0.052453  0.033510
b           0.139056  0.018379     0.065019  0.022922
c           0.090550  0.016505     0.006749  0.007823
d           0.076542  0.005375     0.003923  0.009140
h           0.097057  0.016910     0.021070  0.009275
K           0.267461  0.020434     0.190670  0.057397
m           0.252351  0.040149     0.315562  0.031664
sigmaX      0.076175  0.014001     0.005930  0.005333
sigmaY      0.075390  0.015346     0.004970  0.011557

Without further analysis, one simple way of determining whether a parameter is unimportant is to check whether the confidence interval of its index overlaps 0 (i.e., whether delta minus delta_conf drops below zero). For these particular results this doesn’t happen for any of the delta values, though it does for some of the S1 values (c, d, sigmaY). You can refer to this post for discussion on what this might mean.
Looking at the delta values, two factors clearly come out on top (K and m), followed by b, with a close behind it. The remaining parameters decrease in importance in small decrements after that. So where should we draw the line of importance? Another simple approach is to use a threshold (say, 0.1) as a cutoff value [3], but one could argue over including a and not h, given how close their indices are and the wider confidence interval of a (see also the appendix below on this).
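
For reference, here is a minimal sketch of both screening heuristics, assuming the problem dictionary and the results returned by delta.analyze above (the 0.1 cutoff is just the example value discussed):

names = np.array(problem['names'])
delta_vals = np.asarray(results['delta'])
delta_conf = np.asarray(results['delta_conf'])

# Screening heuristic 1: flag parameters whose delta confidence interval overlaps zero
print('CI overlaps zero:', names[(delta_vals - delta_conf) <= 0])

# Screening heuristic 2: keep only parameters whose delta exceeds a chosen cutoff
threshold = 0.1
print('Above threshold:', names[delta_vals >= threshold])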

But let’s continue with our analysis. What I am doing below is the following: first, I sort the factors from most to least important based on their delta indices. Then, I create my Sets 2 and 3, in which I’ll iteratively replace the values of important factors with either their sampled values from Set 1 or their defaults. Finally, I loop through all possible numbers of important factors (1 to 9), generate Sets 2 and 3 for each, calculate outputs for all samples in each, and calculate their correlation with the outputs from Set 1. (This should take 20-30 minutes to run on your personal computer.)

# Sort factors by importance
factors_sorted = np.argsort(results['delta'])[::-1]

# Set up array of default values to use for the experiment
X_defaults = np.tile(defaultvalues, (nsamples, 1))

# Create initial Sets 2 and 3
X_Set2 = np.copy(X_defaults)
X_Set3 = np.copy(X_Set1)

for f in range(1, len(factors_sorted)+1):
    ntopfactors = f

    for i in range(ntopfactors): # Loop through the f most important factors
        X_Set2[:,factors_sorted[i]] = X_Set1[:,factors_sorted[i]] # Use sampled values for important factors
        X_Set3[:,factors_sorted[i]] = X_defaults[:,factors_sorted[i]] # Fix important factors to defaults

    # Run model for all samples
    output_Set2 = [fish_game(*X_Set2[j,:]) for j in range(nsamples)]
    output_Set3 = [fish_game(*X_Set3[j,:]) for j in range(nsamples)]

    # Calculate correlation coefficients with the full-model output (Set 1)
    coefficient_S1_S2 = np.corrcoef(output, output_Set2)[0][1]
    coefficient_S1_S3 = np.corrcoef(output, output_Set3)[0][1]

I can also plot the outputs from each iteration, which should look something like this (this is animated to show all figures, in the interest of space):
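
(The animated figure isn’t reproduced here, but a minimal matplotlib sketch along the lines below, dropped at the end of each iteration of the loop above and indented accordingly, would produce one such figure per number of retained factors; the comparison_%d_factors.png filename is just a placeholder of mine:)

import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 5))
# Left: Set 2 (unimportant factors fixed to defaults) vs. the full model (Set 1)
ax_left.scatter(output, output_Set2, s=5)
ax_left.set_xlabel('Set 1 output')
ax_left.set_ylabel('Set 2 output')
ax_left.set_title('R = %.3f' % coefficient_S1_S2)
# Right: Set 3 (important factors fixed to defaults) vs. the full model (Set 1)
ax_right.scatter(output, output_Set3, s=5)
ax_right.set_xlabel('Set 1 output')
ax_right.set_ylabel('Set 3 output')
ax_right.set_title('R = %.3f' % coefficient_S1_S3)
fig.savefig('comparison_%d_factors.png' % f) # hypothetical filename, one figure per iteration
plt.close(fig)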

The figures above tell us the following:
If we choose only the single most important factor (K) and fix all other parameters, our outputs don’t really capture the variability of outcomes produced when considering all nine (this is also a case against one-at-a-time analyses). The correlation coefficient between the outputs of Sets 1 and 2 is pretty low (0.44), suggesting we’re still missing important parameters. The model that instead fixes our most important parameter and varies all the others (figure on the right) actually tracks the full model more closely, with R=0.763.
Adding the second most important factor (m) shifts things significantly in the right direction, increasing the coefficient on the left and reducing the one on the right to R=0.203.
There is only a slight improvement with the addition of the third factor (b), but with the inclusion of the fourth (a), our reduced model already looks very close to the full one, with R=0.94. The counter model excluding these four factors (on the right) also has a very low coefficient, R=0.025.
One could consider this performance sufficient, with the model reduced to four parameters instead of nine. Adding parameter h and then c would improve the correlation to a near-perfect match between Set 2 and Set 1, but this is where subjectivity takes over, depending on the cost of including these variables and how much we care about fidelity in this case.
It also appears safe to fix the last three parameters, as they don’t have any consequential effects on our outcomes.

References:
[1] A. Saltelli et al., Global Sensitivity Analysis: The Primer. Wiley, 2008.
[2] T. H. Andres, “Sampling methods and sensitivity analysis for large parameter sets,” Journal of Statistical Computation and Simulation, vol. 57, no. 1–4, pp. 77–110, Apr. 1997, doi: 10.1080/00949659708811804.
[3] Y. Tang, P. Reed, T. Wagener, and K. van Werkhoven, “Comparing sensitivity analysis methods to advance lumped watershed model identification and evaluation,” Hydrology and Earth System Sciences, vol. 11, no. 2, pp. 793–817, Feb. 2007, doi: 10.5194/hess-11-793-2007.
[4] J. Nossent, P. Elsen, and W. Bauwens, “Sobol’ sensitivity analysis of a complex environmental model,” Environmental Modelling & Software, vol. 26, no. 12, pp. 1515–1525, Dec. 2011, doi: 10.1016/j.envsoft.2011.08.010.


Appendix:
Another way to identify a threshold of importance for classifying parameters is to add a dummy parameter to your model that does nothing. Re-performing the SA for this same system with the dummy included produces this:

Parameter   delta     delta_conf   S1        S1_conf
a           0.105354  0.019236     0.040665  0.020949
b           0.144955  0.023576     0.050471  0.014810
c           0.075516  0.009578     0.003889  0.006113
d           0.081177  0.011604     0.004186  0.007235
h           0.101583  0.010008     0.032759  0.021343
K           0.261329  0.022876     0.174340  0.038246
m           0.258345  0.024750     0.325690  0.052234
sigmaX      0.071862  0.008620     0.001681  0.006720
sigmaY      0.077337  0.009344     0.003131  0.006918
dummy       0.072546  0.008313     0.004176  0.009567

Even though the dummy does absolutely nothing in our model, the analysis still assigns it a non-zero delta index (0.07). One could use this value as the cutoff for non-importance and choose to fix parameters c, sigmaX, and sigmaY.
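
For reference, a minimal sketch of how the dummy can be appended to the problem definition above (the [0, 1] bounds are an arbitrary choice of mine, and the model itself never reads the extra column):

# Extend the problem definition with a dummy factor that the model never uses
problem_dummy = {
    'num_vars': problem['num_vars'] + 1,
    'names': problem['names'] + ['dummy'],
    'bounds': problem['bounds'] + [[0, 1]],  # arbitrary bounds; the dummy has no effect
}

X_dummy = latin.sample(problem_dummy, nsamples)
# Only the original nine factors are passed to the model; the dummy column is ignored
output_dummy = [fish_game(*X_dummy[j, :9]) for j in range(nsamples)]
results_dummy = delta.analyze(problem_dummy, X_dummy, np.asarray(output_dummy),
                              print_to_console=True)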

Magnitude-varying sensitivity analysis and visualization (Part 2)

In my last post, I talked about producing these flow-duration-curve-type figures for an output time series of interest and about their potential use in an exploratory approach for robust decision making. Again, the code to perform the analysis and visualization is in this GitHub repository.


Fig. 1: Historical data vs. range of experiment outputs

As already discussed, there are multiple benefits to visualizing the output in this manner: when making decisions about systems we are often concerned with the levels and frequencies of extremes (e.g., “how bad is the worst case?”, “how rare is the worst case?”), or we might like to know how often we exceed a certain threshold (e.g., “how many years exceed an annual shortage of 1000 af?”). The various percentiles each tell a different part of the story of how a system operates: the 5th percentile tells us that its level is exceeded 95% of the time, while the 99th tells us that its level is only reached once in every 100 years in our records. These might seem obvious to the readers of this blog, but oftentimes we perform our analyses for only some of these percentiles (“the worst event”, “the average”, etc.), which is certainly informative but can miss part of the bigger picture.
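
(As a small illustration of those questions, here is a sketch assuming a hypothetical 1-D array of annual shortage values, annual_shortages, in af:)

import numpy as np

annual_shortages = np.loadtxt('annual_shortages.txt')  # hypothetical record of annual shortages (af)
# "How many years exceed an annual shortage of 1000 af?" -- as a frequency of exceedance
exceedance_freq = np.mean(annual_shortages > 1000)
# The 5th-percentile level is exceeded ~95% of the time;
# the 99th-percentile level is only reached ~once per 100 years of record
p5, p99 = np.percentile(annual_shortages, [5, 99])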

In this post I’m going to walk the reader through performing a sensitivity analysis on the output of an experiment using multiple Latin Hypercube Samples. The analysis will be magnitude-varying, i.e., it will be performed at different magnitudes of our output of interest. For this particular example, we aim to see which factors are the most significant drivers of shortage at the different levels experienced by this user. In other words, if some factors appear to drive the frequent small shortages, are those the same factors that drive the rare large shortages?

To perform the sensitivity analysis, I am going to use SALib (featured on this blog multiple times already) to perform a Delta Moment-Independent Analysis [1] (which also produces a first-order Sobol sensitivity index [2]). You’ll probably need to install SALib if it’s not a package you’ve used already. I’m also going to use statsmodels to perform a simple linear regression on the outputs and look at the R2 values. But why, you might ask, perform not one, not two, but three sensitivity analyses? There are nuanced, yet potentially important, differences between what the three methods capture:

Delta method: Look for parameters most significantly affecting the density function of observed shortages. This method is moment-independent, i.e., it looks at differences in the entire distribution of the output we’re interested in.
First order Sobol (S1): Look for parameters that most significantly affect the variance of observed outputs, including non-linear effects.
R2: Look for parameters best able to describe the variance of observed outputs, limited to linear effects.

Another important thing to note is that the first-order Sobol indices are fractions of the total output variance, which sums to 1. This means that if we sum up the S1’s from our analysis, the sum represents the fraction of variance explained by the first-order effects of our parameters, leaving whatever is left to interactions between the variables (which S1 cannot capture). The same logic applies to R2, as we are repeatedly fitting each parameter on its own and scoring it on how much of the output variance it describes as a sole linear predictor (with no interactions or other relationships).
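
(A quick sketch of that bookkeeping, assuming result is the dictionary returned by delta.analyze for a single percentile and r2_scores is a hypothetical list of the single-predictor R2 values for that same percentile:)

import numpy as np

first_order = np.sum(result['S1'])           # variance fraction explained by first-order effects
interactions = max(0.0, 1.0 - first_order)   # remainder attributed to parameter interactions
linear = np.sum(r2_scores)                   # variance fraction described by sole linear predictors
non_linear_or_interactions = max(0.0, 1.0 - linear)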

The following Python script will produce all three, as well as confidence intervals for the Delta index and S1. The script essentially loops through all percentiles in the time series and performs the analyses for each one. In other words, we’re looking at how sensitive each magnitude percentile is to each of the sampled parameters.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from SALib.analyze import delta

# Load parameter samples
LHsamples = np.loadtxt('./LHsamples.txt')
params_no = len(LHsamples[0,:])
param_bounds = np.loadtxt('./uncertain_params.txt', usecols=(1,2))

# Parameter names
param_names = ['IWRmultiplier','RESloss','TBDmultiplier','M_Imultiplier',
               'Shoshone','ENVflows','EVAdelta','XBM_mu0','XBM_sigma0',
               'XBM_mu1','XBM_sigma1','XBM_p00','XBM_p11']

# Define problem class
problem = {
    'num_vars': params_no,
    'names': param_names,
    'bounds': param_bounds.tolist()
}

# Percentiles for analysis to loop over
percentiles = np.arange(0,100)

# Function to fit regression with Ordinary Least Squares using statsmodels
def fitOLS(dta, predictors):
    # concatenate intercept column of 1s
    dta['Intercept'] = np.ones(np.shape(dta)[0])
    # get columns of predictors
    cols = dta.columns.tolist()[-1:] + predictors
    # fit OLS regression
    ols = sm.OLS(dta['Shortage'], dta[cols])
    result = ols.fit()
    return result

# Create empty dataframes to store results
DELTA = pd.DataFrame(np.zeros((params_no, len(percentiles))), columns = percentiles)
DELTA_conf = pd.DataFrame(np.zeros((params_no, len(percentiles))), columns = percentiles)
S1 = pd.DataFrame(np.zeros((params_no, len(percentiles))), columns = percentiles)
S1_conf = pd.DataFrame(np.zeros((params_no, len(percentiles))), columns = percentiles)
R2_scores = pd.DataFrame(np.zeros((params_no, len(percentiles))), columns = percentiles)
DELTA.index = DELTA_conf.index = S1.index = S1_conf.index = R2_scores.index = param_names

# Read in experiment data
expData = np.loadtxt('./experiment_data.txt')

# Identify magnitude at each percentile
syn_magnitude = np.zeros([len(percentiles), len(LHsamples[:,0])])
for j in range(len(LHsamples[:,0])):
    syn_magnitude[:,j] = [np.percentile(expData[:,j], i) for i in percentiles]

# Delta Method analysis
for i in range(len(percentiles)):
    if syn_magnitude[i,:].any():
        try:
            result = delta.analyze(problem, LHsamples, syn_magnitude[i,:], print_to_console=False, num_resamples=2)
            DELTA[percentiles[i]] = result['delta']
            DELTA_conf[percentiles[i]] = result['delta_conf']
            S1[percentiles[i]] = result['S1']
            S1_conf[percentiles[i]] = result['S1_conf']
        except:
            pass
S1.to_csv('./S1_scores.csv')
S1_conf.to_csv('./S1_conf_scores.csv')
DELTA.to_csv('./DELTA_scores.csv')
DELTA_conf.to_csv('./DELTA_conf_scores.csv')

# OLS regression analysis
dta = pd.DataFrame(data = LHsamples, columns=param_names)
for i in range(len(percentiles)):
    shortage = np.zeros(len(LHsamples[:,0]))
    for k in range(len(LHsamples[:,0])):
        shortage[k] = syn_magnitude[i,k]
    dta['Shortage'] = shortage
    for m in range(params_no):
        predictors = dta.columns.tolist()[m:(m+1)]
        result = fitOLS(dta, predictors)
        R2_scores.at[param_names[m], percentiles[i]] = result.rsquared
R2_scores.to_csv('./R2_scores.csv')

The script produces the sensitivity analysis indices for each magnitude percentile and stores them as .csv files.

I will now present a way of visualizing these outputs, using the curves from Fig. 1 as context. The code below reads in the values for each sensitivity index, normalizes them to the range of magnitude at each percentile, and then plots them using matplotlib’s stackplot function, which stacks the contribution of each parameter up to the sum (in this case the maximum of the resulting range).

I’ll go through what the code does in more detail:

First, we take the range boundaries (globalmax and globalmin), which give us the max and min values at each percentile. We then read in the values for each sensitivity index and normalize them to that range (i.e., globalmax − globalmin for each percentile). The script also adds two more arrays (rows in the pandas dataframe), one representing interactions and one representing the globalmin, upon which we’re going to stack the rest of the values. [Note: this is a bit of a roundabout way of getting the figures how we like them, but it essentially creates a pseudo-stack for the globalmin, which we plot in white.]

The interaction array is only used when normalizing the S1 and R2 values, where we attribute to it the difference between 1 and the sum of the calculated indices (i.e., we attribute the remainder to interactions between the parameters). We don’t need to do this for the Delta method indices (if you run the code, the array remains empty), but keeping it there makes it simpler to create the labels and a single legend later.

The plotting simply creates three subplots and, for each one, uses stackplot to plot the normalized values and then the range edges in black. It is important to note that the color blocks in each figure do not represent the volume of shortage attributed to each parameter at each percentile, but rather the contribution of each parameter to the change in the respective metric, namely the density function (Delta method) and the variance (S1 and R2). The code for this visualization is provided at the bottom of the post.


Fig. 2: Magnitude sensitivity curves using three sensitivity indices

The first thing that pops out from this figure is the large blob of peach, which represents the irrigation demand multiplier in our experiment. The user of interest here was an irrigation user, which would suggest that their shortages are primarily driven by increases in their own demands and those of other irrigation users. This is important, because irrigation demand is an uncertainty over which we could potentially have direct or indirect control, e.g., through conservation efforts.

Looking at the other factors, performing the analysis in a magnitude-varying manner allowed us to explore the vulnerabilities of this metric across its different levels. For example, dark blue and dark green represent the mean flow of dry and wet years, respectively. Across the three figures we can see that the contribution of mean wet-year flow is larger at the low-magnitude percentiles (left-hand side) and diminishes as we move toward the larger-magnitude percentiles.

Another thing I thought was interesting to note is the difference between the S1 and the R2 plots. They are both variance-based metrics, with R2 limited to linear effects in this case. Here the two plots are fairly similar, which would suggest that much of the parameters’ effect on the output variance is linear. Larger differences between the two would point to non-linearities between changes in parameter values and the output.

The code to produce Fig. 2:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumes param_names (defined in the script above), parameter_names_long (the longer
# legend labels), and expData_sort (the sorted experiment output) are already in the namespace

# Percentiles for analysis to loop over
percentiles = np.arange(0,100)

# Estimate upper and lower bounds of the experiment range at each percentile
globalmax = [np.percentile(np.max(expData_sort[:,:],1), p) for p in percentiles]
globalmin = [np.percentile(np.min(expData_sort[:,:],1), p) for p in percentiles]

# Delta indices: normalize to the range at each percentile
delta_values = pd.read_csv('./DELTA_scores.csv')
delta_values.set_index(list(delta_values)[0], inplace=True)
delta_values = delta_values.clip(lower=0)
bottom_row = pd.DataFrame(data=np.array([np.zeros(100)]), index=['Interaction'], columns=list(delta_values.columns.values))
top_row = pd.DataFrame(data=np.array([globalmin]), index=['Min'], columns=list(delta_values.columns.values))
delta_values = pd.concat([top_row, delta_values, bottom_row])
for p in range(len(percentiles)):
    total = np.sum(delta_values[str(percentiles[p])]) - delta_values.at['Min', str(percentiles[p])]
    if total != 0:
        for param in param_names:
            value = (globalmax[p]-globalmin[p])*delta_values.at[param, str(percentiles[p])]/total
            delta_values.at[param, str(percentiles[p])] = value
delta_values = delta_values.round(decimals=2)
delta_values_to_plot = delta_values.values.tolist()

# S1 indices: attribute the remainder to interactions, then normalize to the range
S1_values = pd.read_csv('./S1_scores.csv')
S1_values.set_index(list(S1_values)[0], inplace=True)
S1_values = S1_values.clip(lower=0)
bottom_row = pd.DataFrame(data=np.array([np.zeros(100)]), index=['Interaction'], columns=list(S1_values.columns.values))
top_row = pd.DataFrame(data=np.array([globalmin]), index=['Min'], columns=list(S1_values.columns.values))
S1_values = pd.concat([top_row, S1_values, bottom_row])
for p in range(len(percentiles)):
    total = np.sum(S1_values[str(percentiles[p])]) - S1_values.at['Min', str(percentiles[p])]
    if total != 0:
        diff = 1 - total
        S1_values.at['Interaction', str(percentiles[p])] = diff
        for param in param_names+['Interaction']:
            value = (globalmax[p]-globalmin[p])*S1_values.at[param, str(percentiles[p])]
            S1_values.at[param, str(percentiles[p])] = value
S1_values = S1_values.round(decimals=2)
S1_values_to_plot = S1_values.values.tolist()

# R2 scores: same treatment as S1
R2_values = pd.read_csv('./R2_scores.csv')
R2_values.set_index(list(R2_values)[0], inplace=True)
R2_values = R2_values.clip(lower=0)
bottom_row = pd.DataFrame(data=np.array([np.zeros(100)]), index=['Interaction'], columns=list(R2_values.columns.values))
top_row = pd.DataFrame(data=np.array([globalmin]), index=['Min'], columns=list(R2_values.columns.values))
R2_values = pd.concat([top_row, R2_values, bottom_row])
for p in range(len(percentiles)):
    total = np.sum(R2_values[str(percentiles[p])]) - R2_values.at['Min', str(percentiles[p])]
    if total != 0:
        diff = 1 - total
        R2_values.at['Interaction', str(percentiles[p])] = diff
        for param in param_names+['Interaction']:
            value = (globalmax[p]-globalmin[p])*R2_values.at[param, str(percentiles[p])]
            R2_values.at[param, str(percentiles[p])] = value
R2_values = R2_values.round(decimals=2)
R2_values_to_plot = R2_values.values.tolist()

color_list = ["white", "#F18670", "#E24D3F", "#CF233E", "#681E33", "#676572", "#F3BE22", "#59DEBA", "#14015C", "#DAF8A3", "#0B7A0A", "#F8FFA2", "#578DC0", "#4E4AD8", "#F77632"]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(14.5,8))
ax1.stackplot(percentiles, delta_values_to_plot, colors=color_list, labels=parameter_names_long)
l1 = ax1.plot(percentiles, globalmax, color='black', linewidth=2)
l2 = ax1.plot(percentiles, globalmin, color='black', linewidth=2)
ax1.set_title("Delta index")
ax1.set_xlim(0,100)
ax2.stackplot(percentiles, S1_values_to_plot, colors=color_list, labels=parameter_names_long)
ax2.plot(percentiles, globalmax, color='black', linewidth=2)
ax2.plot(percentiles, globalmin, color='black', linewidth=2)
ax2.set_title("S1")
ax2.set_xlim(0,100)
ax3.stackplot(percentiles, R2_values_to_plot, colors=color_list, labels=parameter_names_long)
ax3.plot(percentiles, globalmax, color='black', linewidth=2)
ax3.plot(percentiles, globalmin, color='black', linewidth=2)
ax3.set_title("R^2")
ax3.set_xlim(0,100)
handles, labels = ax3.get_legend_handles_labels()
ax1.set_ylabel('Annual shortage (af)', fontsize=12)
ax2.set_xlabel('Shortage magnitude percentile', fontsize=12)
ax1.legend(l1, ('Global ensemble',), fontsize=10, loc='upper left')
fig.legend(handles[1:], labels[1:], fontsize=10, loc='lower center', ncol=5)
plt.subplots_adjust(bottom=0.2)
fig.savefig('./experiment_sensitivity_curves.png')

References:

[1]: Borgonovo, E. “A New Uncertainty Importance Measure.” Reliability Engineering & System Safety 92, no. 6 (June 1, 2007): 771–84. https://doi.org/10.1016/j.ress.2006.04.015.

[2]: Sobol, I. M. (2001). “Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates.” Mathematics and Computers in Simulation, 55(1-3):271-280, doi:10.1016/S0378-4754(00)00270-6.

Some ideas for your Bash submission scripts

I’ve been playing around with some design options for PBS submission scripts that may help people doing cluster work.  Some things to look for in the source code:

  • You can use a list in bash that contains multiple text entries, and then access those text entries to create strings for your submissions.  Note that you can actually display the text first (see the ‘echo ${PBS}’) before you do anything; that way you aren’t requesting thousands of jobs that have a typo in them!
  • Using “read” allows the bash programmer to interact with the user.  Well, in reality you are usually both the programmer and the user.  But lots of times, I want to write a script and try it out first, before I submit hundreds of hours of time on the cluster.  The flags below can help with that process.
  • I added commands to compile the source code before actually submitting the jobs.  Plus, by using flags and pauses intelligently, you can bail out of the script if there’s a problem with compilation.
#!/bin/bash
NODES=32
WALLHOURS=5

PROBLEMS=("ProblemA" "ProblemB")
NSEEDS=10
SEEDS=$(seq 1 ${NSEEDS}) #note there are multiple ways to declare lists and sequences in bash

NFES=1000000
echo "NFEs is ${NFES}" #echo statements can improve usability of the script, especially if you're modifying it a lot for various trials

ASSUMEPERMISSIONFLAG=No #This is for pausing the submission script later

echo "Compile? Y or N."

read COMPILEFLAG
if [ "$COMPILEFLAG" = "Y" ]; then
    echo "Cleaning.."
    make clean -f MakefileParallel
    echo "Compiling.."
    make -f MakefileParallel
else
    echo "Not compiling."
fi

for PROBINDEX in ${!PROBLEMS[*]}
do
    PROBLEM=${PROBLEMS[$PROBINDEX]} #note the syntax to pull a list member out here
    echo "Problem is ${PROBLEM}"

    for SEED in ${SEEDS}
    do
        NAME=${PROBLEM}_${SEED} #Bash is really nice for manipulating strings like this
        echo "Submitting: ${NAME}"

        #Here is the actual PBS command, with bash variables used in place of different experimental parameters.  Note the use of getopt-style command line parsing to pass different arguments into the myProgram executable.  This implementation is also designed for parallel processing, but it can also be used for serial jobs too.

        PBS="#PBS -l nodes=32\n\
        #PBS -N ${NAME}\n\
        #PBS -l walltime=05:00:00\n\
        #PBS -j oe\n\
        #PBS -o ${NAME}.out\n\
        cd \$PBS_O_WORKDIR\n\
        module load openmpi/intel\n\
        mpirun ./myProgram -b ${PROBLEM} -c combined -f ${NFES} -s ${SEED}"

        #The first echo shows the user what is about to be passed to PBS.  The second echo then pipes it to the command qsub, and actually submits the job.

        echo ${PBS}

        if [ "$ASSUMEPERMISSIONFLAG" = "No" ]; then

            echo "Continue submitting? Y or N."

            read SUBMITFLAG

            #Here, the code is designed to just keep going after the user says Y once.  You can redesign this for your own purposes.  Also note that this code is fairly brittle in that the user MUST say Y, not y or yes.  You can build that functionality into the if statements if you'd like it.

            if [ "$SUBMITFLAG" = "Y" ]; then
                 ASSUMEPERMISSIONFLAG=Yes #this way, the user won't be asked again
                 echo -e ${PBS} | qsub
                 sleep 0.5
                 echo "done."
            fi
        else
            echo -e ${PBS} | qsub
            sleep 0.5
            echo "done."
        fi
    done
done