In this post, we will break down the key concepts underlying synthetic streamflow generation, and how it fits within the Many Objective Robust Decision Making (MORDM) framework (Kasprzyk, Nataraj et. al, 2012). This post is the first in a series on MORDM which will begin here: with generating and validating the data used in the framework. To provide some context as to what we are about to attempt, please refer to this post by Jon Herman.
What is synthetic streamflow generation?
Synthetic streamflow generation is a non-parametric, direct statistical approach used to generate synthetic streamflow timeseries from a reasonably long historical record. It is used when there is a need to diversify extreme event scenarios, such as flood and drought, or when we want to generate flows to reflect a shift in the hydrologic regime due to climate change. It is favored as it relies on a re-sampling of the historical record, preserving temporal correlation up to a certain degree, and results in a more realistic synthetic dataset. However, its dependence on a historical record also implies that this approach requires a relatively long historical inflow data. Jon Lamontagne’s post goes into further detail regarding this approach.
Why synthetic streamflow generation?
An important step in the MORDM framework is scenario discovery, which requires multiple realistic scenarios to predict future states of the world (Kasprzyk et. al., 2012). Depending solely on the historical dataset is insufficient; we need to generate multiple realizations of realistic synthetic scenarios to facilitate a comprehensive scenario discovery process. As an approach that uses a long historical record to generate synthetic data that has been found to preserve seasonal and annual correlation (Kirsch et. al., 2013; Herman et. al., 2016), this method provides us with a way to:
- Fully utilize a large historical dataset
- Stochastically generate multiple synthetic datasets while preserving temporal correlation
- Explore many alternative climate scenarios by changing the mean and the spread of the synthetic datasets
The basics of synthetic streamflow generation in action
To better illustrate the inner workings of synthetic streamflow generation, it is helpful to use a test case. In this post, the historical dataset is obtained from the Research Triangle Region in North Carolina. The Research Triangle region consists of four main utilities: Raleigh, Durham, Cary and the Orange County Water and Sewer Authority (OWASA). These utilities are receive their water supplies from four water sources: the Little River Reservoir, Lake Wheeler, Lake Benson, and the Jordan Lake (Figure 1), and historical streamflow data is obtained from ten different stream gauges located at each of these water sources. For the purpose of this example, we will be using 81 years’ worth of weekly streamflow data available here.
The statistical approach that drive synthetic streamflow generation is called the Kirsch Method (Kirsch et. al., 2013). In plain language, this method does the following:
- Converts the historical streamflows from real space to log space, and then standardize the log-space data.
- Bootstrap the log-space historical matrix to obtain an uncorrelated matrix of historical data.
- Obtain the correlation matrix of the historical dataset by performing Cholesky decomposition.
- Impose the historical correlation matrix upon the uncorrelated matrix obtained in (2) to generate a standardized synthetic dataset. This preserves seasonal correlation.
- De-standardize the synthetic data, and transform it back into real space.
- Repeat steps (1) to (5) with a historical dataset that is shifted forward by 6 months (26 weeks). This preserves year-to-year correlation.
This post by Julie Quinn delves deeper into the Kirsch Method’s theoretical steps. The function that executes these steps can be found in the
stress_dynamic.m Matlab file, which in turn is executed by the
wsc_main_rate.m file by setting the input variable p = 0 as shown on Line 27. Both these files are available on GitHub here.
However, this is simply where things get interesting. Prior to this, steps (1) to (6) would have simply generated a synthetic dataset based on only historical statistical characteristics as validated here in Julie’s second blog post on a similar topic. Out of the three motivations for using synthetic streamflow generation, the third one (exploration of multiple scenarios) has yet to be satisfied. This is a nice segue into out next topic:
Generating multiple scenarios using synthetic streamflow generation
The true power of synthetic streamflow generation lies in its ability to generate multiple climate (or in this case, streamflow) scenarios. This is done in
stress_dynamic.m using three variables:
|Input variable||Data type|
|p||The lowest x% of streamflows|
|n||A vector where each element ni is the number of copies of the p-lowest streamflow years to be added to the bootstrapped historical dataset.|
|m||A vector where each element mi is the number of copies of the (1-p)-highest streamflow years to be added to the bootstrapped historical dataset.|
These three variables bootstrap (increase the length of) the historical record while allow us to perturb the historical streamflow record streamflows to reflect an increase in frequency or severity of extreme events such as floods and droughts using the following equation:
new_hist_years = old_historical_years + [(p*old_historical_years)*ni ] + (old_hist_years – [(p*old_historical_years)mi])
stress_dynamic.m file contains more explanation regarding this step.
This begs the question: how do we choose the value of p? This brings us to the topic of the standardized streamflow indicator (SSI6).
The SSI6 is the 6-month moving average of the standardized streamflows to determine the occurrence and severity of drought on the basis of duration and frequency (Herman et. al., 2016). Put simply, this method determines the occurrence of drought if the the value of the SSI6 < 0 continuously for at least 3 months, and SSI6 < -1 at least once during the 6-month interval. The periods and severity (or lack thereof) of drought can then be observed, enabling the decision on the length of both the n and m vectors (which correspond to the number of perturbation periods, or climate event periods). We will not go into further detail regarding this method, but there are two important points to be made:
- The SSI6 enables the determination of the frequency (likelihood) and severity of drought events in synthetic streamflow generation through the values contained in p, n and m.
- This approach can be used to generate flood events by exchanging the values between the n and m vectors.
A good example of point (2) is done in this test case, in which more-frequent and more-severe floods was simulated by ensuring that most of the values in m where larger than those of n. Please refer to Jon Herman’s 2016 paper titled ‘Synthetic drought scenario generation to support bottom-up water supply vulnerability assessments’ for further detail.
A brief conceptual letup
Now we have shown how synthetic streamflow generation satisfies all three factors motivating its use. We should have three output folders:
- synthetic-data-stat: contains the synthetic streamflows based on the unperturbed historical dataset
- synthetic-data-dyn: contains the synthetic streamflows based on the perturbed historical dataset
Comparing these two datasets, we can compare how increasing the likelihood and severity of floods has affected the resulting synthetic data.
To exhaustively compare the statistical characteristics of the synthetic streamflow data, we will perform two forms of validation: visual and statistical. This method of validation is based on Julie’s post here.
Done by generating flow duration curves (FDCs) . Figure 2 below compares the unperturbed (left) and perturbed (right) synthetic datasets.
The bottom plots in Figure 2 shows an increase in the volume of weekly flows, as well as an smaller return period, when the the historical streamflows were perturbed to reflect an increasing frequency and magnitude of flood events. Together with the upper plots in Figure 2, this visually demonstrates that the synthetic streamflow generation approach (1) faithfully reconstructs historical streamflow patterns, (2) increases the range of possible streamflow scenarios and (3) can model multiple extreme climate event scenarios by perturbing the historical dataset. The file to generate this Figure can be found in the
The mean and standard deviation of the perturbed and unperturbed historical datasets are compared to show if the perturbation resulted in significant changes in the synthetic datasets. Ideally, the perturbed synthetic data would have higher means and similar standard deviations compared to the unperturbed synthetic data.
The mean and tails of the synthetic streamflow values of the bottom plots in Figure 3 show that the mean and maximum values of the synthetic flows are significantly higher than the unperturbed values. In addition, the spread of the standard deviations of the perturbed synthetic streamflows are similar to that of its unperturbed counterpart. This proves that synthetic streamflow generation can be used to synthetically change the occurrence and magnitude of extreme events while maintaining the periodicity and spread of the data. The file to generate Figure 3 can be found in
Synthetic streamflow generation and internal variability
The generation of multiple unperturbed realizations of synthetic streamflow is vital for characterizing the internal variability of a system., otherwise known as variability that arises from natural variations in the system (Lehner et. al., 2020). As internal variability is intrinsic to the system, its effects cannot be eliminated – but it can be moderated. By evaluating multiple realizations, we can determine the number of realizations at which the internal variability (quantified here by standard deviation as a function of the number of realizations) stabilizes. Using the synthetic streamflow data for the Jordan Lake, it is shown that more than 100 realizations are required for the standard deviation of the 25% highest streamflows across all years to stabilize (Figure 4). Knowing this, we can generate sufficient synthetic realizations to render the effects of internal variability insignificant.
internal-variability.py contains the code to generate the above figure.
How does this all fit within the context of MORDM?
So far, we have generated synthetic streamflow datasets and validated them. But how are these datasets used in the context of MORDM?
Synthetic streamflow generation lies within the domain of the second part of the MORDM framework as shown in Figure 5 above. Specifically, synthetic streamflow generation plays an important role in the design of experiments by preserving the effects of deeply uncertain factors that cause natural events. As MORDM requires multiple scenarios to reliably evaluate all possible futures, this approach enables the simulation of multiple scenarios, while concurrently increasing the severity or frequency of extreme events in increments set by the user. This will allow us to evaluate how coupled human-natural systems change over time given different scenarios, and their consequences towards the robustness of the system being evaluated (in this case, the Research Triangle).
Typically, this evaluation is performed in two main steps:
- Generation and evaluation of multiple realizations of unperturbed annual synthetic streamflow. The resulting synthetic data is used to generate the Pareto optimal set of policies. This step can help us understand how the system’s internal variability affects future decision-making by comparing it with the results in step (2).
- Generation and evaluation of multiple realizations of perturbed annual synthetic streamflow. These are the more extreme scenarios in which the previously-found Pareto-optimal policies will be evaluated against. This step assesses the robustness of the base state under deeply uncertain deviations caused by the perturbations in the synthetic data and other deeply uncertain factors.
Overall, synthetic streamflow generation is an approach that is highly applicable in the bottom-up analysis of a system. It preserves historical characteristics of a streamflow timeseries while providing the flexibility to modify the severity and frequency of extreme events in the face of climate change. It also allows the generation of multiple realizations, aiding in the characterization and understanding of a system’s internal variability, and a more exhaustive scenario discovery process.
This summarizes the basics of data generation for MORDM. In my next blog post, I will introduce risk-of-failure (ROF) triggers, their background, key concepts, and how they are applied within the MORDM framework.
Herman, J. D., Reed, P. M., Zeff, H. B., & Characklis, G. W. (2015). How should robustness be defined for water systems planning under change? Journal of Water Resources Planning and Management, 141(10), 04015012. doi:10.1061/(asce)wr.1943-5452.0000509
Herman, J. D., Zeff, H. B., Lamontagne, J. R., Reed, P. M., & Characklis, G. W. (2016). Synthetic drought scenario generation to support bottom-up water supply vulnerability assessments. Journal of Water Resources Planning and Management, 142(11), 04016050. doi:10.1061/(asce)wr.1943-5452.0000701
Kasprzyk, J. R., Nataraj, S., Reed, P. M., & Lempert, R. J. (2013). Many objective robust decision making for complex environmental systems undergoing change. Environmental Modelling & Software, 42, 55-71. doi:10.1016/j.envsoft.2012.12.007
Kirsch, B. R., Characklis, G. W., & Zeff, H. B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements: Practical improvement for generating synthetic streamflows. Journal of Water Resources Planning and Management, 139(4), 396-406. doi:10.1061/(asce)wr.1943-5452.0000287
Mankin, J. S., Lehner, F., Coats, S., & McKinnon, K. A. (2020). The value of initial condition large ensembles to Robust Adaptation Decision‐Making. Earth’s Future, 8(10). doi:10.1029/2020ef001610
Trindade, B., Reed, P., Herman, J., Zeff, H., & Characklis, G. (2017). Reducing regional drought vulnerabilities and multi-city robustness conflicts using many-objective optimization under deep uncertainty. Advances in Water Resources, 104, 195-209. doi:10.1016/j.advwatres.2017.03.023