Introducing the R Package “biascorrection”

For variety of reasons, we need hydrological models for our short- and long-term predictions and planning.  However, it is no secret that these models always suffer from some degree of bias. This bias can stem from many different and often interacting sources. Some examples are biases in underlying model assumptions, missing processes, model parameters, calibration parameters, and imperfections in input data (Beven and Binley, 1992).

The question of how to use models, given all these uncertainties, has been an active area of research for at least 50 years and will probably remain so for the foreseeable future, but going through that is not the focus of this blog post.

In this post, I explain a technique called bias correction that is frequently used in an attempt to improve model predictions. I also introduce an R package for bias correction that I recently developed; the package is called “biascorrection.” Although most of the examples in this post are about hydrological models, the arguments and the R package might be useful in other disciplines, for example with atmospheric models that have been one of the hotspots of bias correction applications (for example, here, here and here). The reason is that the algorithm follows a series of simple mathematical procedures that can be applied to other questions and research areas.

What Do I Mean by Bias Correction?

Bias correction in the context of hydrological modeling usually means “manual” improvement of model simulations to ensure that they match the observed data. Here’s a simple example: Let’s say we know that our model tends to underestimate summer flow by 30%. Then we might reason that to improve our predictions, we could add 30% to our simulated summer flows. In this blog post, I use a more sophisticated method, but this example should give you an idea of what I’m talking about.

Bias Correction using Quantile Mapping

Quantile mapping is a popular bias correction method that has been used in various applications. In this method, we first create quantiles of observed and simulated data. After that, whenever we have a simulated value we can find its simulated quantile and replace it with the value of the closest observed quantile.

Figure 1 – A simplified workflow of the quantile mapping technique

We generally follow the following steps to do the bias correction (see here):

  • Monthly Quantiles

We need to have two monthly time series of observed and simulated streamflow. If both series use daily time steps, we must aggregate the daily values to monthly values first, to create the monthly quantiles. We then sort the observed and simulated streamflows and assign each value to a quantile. This process generates streamflow quantiles for each month (January, February, etc.).

  • Monthly Bias Correction

When the quantiles are ready, we can start from the first month of the simulated results and determine what quantile its values belong to. Then we can go to the same quantile of the observed values and use it instead of the simulated one. This creates a monthly bias-corrected stream.

  • Annual Adjustment

Hamlet and Lettenmaier 1999, (also here) argue that the monthly bias correction can dramatically change the magnitude of annual streamflow predictions. However, although hydrologic models usually perform poorly at monthly time steps, they are pretty good at capturing annual variations. Therefore, we tend to rely on them. We calculate the annual difference between the bias-corrected and simulated flows and then we apply that to each individual month. This way, we can make sure that while the monthly variations are consistent with the recorded streamflow, our model is still able to determine how the average annual flow looks.

  • Disaggregation to Daily

After we apply the annual adjustments, we can use simulated or historical observed values to disaggregate the monthly time series to a daily one. The “biascorrection” package provides two methods for doing that: (1) Rescaling the raw simulated daily time series to match the monthly bias-corrected values. For example, if the total simulated streamflow for a month is half of the bias-corrected values for that month, the disaggregation module multiplies the raw daily simulated values by two for that specific month. (2) Sampling from the daily observed historical times series. In this case, the model uses KNN to find some of the closest months in the historical period (in terms of average monthly values) and randomly selects one of them.

In some situations, in large river basins, several upstream stations contribute to a downstream station. Bias correction can add or remove water from the system, and that can cause spatial inconsistencies and water budget problems. To ensure that the basin-wide water balance isn’t violated in these cases, we start from the station furthest downstream station and move upward to make sure that the total water generated upstream of each station and the incremental flow between the stations sum up to the same total downstream flow. This is not included in our model.

In some situations, in large river basins, several upstream stations contribute to a downstream station. Bias correction can add or remove water from the system, and that can cause spatial inconsistencies and water budget problems. To ensure that the basin-wide water balance isn’t violated in these cases, we start from the station furthest downstream station and move upward to make sure that the total water generated upstream of each station and the incremental flow between the stations sum up to the same total downstream flow. This is not included in our model.

In climate change studies, if the historical, simulated, and observed quantiles do not differ widely from the projected future scenarios, we can use the historical quantiles. However, if the future values are fundamentally different from the historical time series, this might not be justifiable. In such a case, synthetic generation of future data may be the way to go. The R package doesn’t include this either.

There are just two quick disclaimers:

  1. The R package has been tested and seems to perform well, but its complete soundness is not guaranteed.
  2. This blog post and the R package only introduce this bias correction as a common practice; I do not endorse it as a remedy for all the problems of hydrological models. Readers should keep in mind that there are serious criticisms of bias correction (for example here). I will discuss some in the following sections.

Arguments Against Bias Correction

One of the main advantages of hydrological models is that they can simultaneously and, arguably, consistently simulate different water and energy balance processes. However, bias correction manually perturbs streamflow without taking into account its effects on other components of the system. Therefore, it takes away one of the main advantages of hydrological models. In some cases, this can even distort climate change signals.

The other problem is that bias correction tries only to match the overall, aggregate statistics of the simulated flow with the observed flow, although streamflow has many more attributes. For example, current bias-correction algorithms can systematically ignore extremes that occur in daily to weekly time steps.

The “biascorrection” Package

I recently developed an R package that can be used for bias correction in simulated streamflow. The package has four main functions. Its workflow is consistent with the four bias-correction steps described above: (1) quantile creation, (2) monthly quantile mapping, (3) annual adjustment, (4) disaggregation to daily. The package doesn’t have a prescribed unit, and it can be used in different applications that require bias correction.

How to Install “biascorrection” Package

The package is available on GitHub (here) and it can be installed using the following command:


You can also go to my “blog” folder on GitHub to download simulated and observed datasets of streamflow at inlet of the Arrow dam in British Columbia (AKA Keenleyside Dam). The simulated flow has been generated using the Variable Infiltration Capacity model (VIC), and the observed flow is from Bonneville Power Administration’s No Regulation-No Irrigation streamflow datasets.

observed_input<-read.table("sample_data/Arrow_observed.txt", header = T)
simulated_input<-read.table("sample_data/Arrow_simulated.txt", header = T)

Note that the two datasets have different starting and ending dates. I intentionally used them to show how biascorrection package handles these types of datasets.

However, I plotted the overlapping period of the two datasets to demonstrate the difference between them. The simulated data set tends to underestimate streamflow during low-flow periods and overestimate during the high-flow periods.

Figure 2 – Observed vs simulated inflow to Arrow dam

Monthly Bias-Corrections

To run the monthly bias-correction function, first, you need to define the following starting and ending dates of your observed and simulated data frames:


You also need to define the following two conditions:

date_type<-"JY" ## Water year (WY) or Julian Year (JY)

Finally, we can use the following to calculate the monthly bias corrected flow:


df_bc_month<-bias_correct_monthly(observed_flow, simulatred_flow, start_date_observed, end_date_observed, start_date_simulated, end_date_simulated, time_step, date_type)

Note that the format of observed and simulated inputs to the “bias_correct_monthly” function are not data frames. Here is how bias correction changes the simulated streamflow. As you see in the followin figure the bias-corrected flow doesn’t seem to have the underestimation problem during the low-flow anymore.

Figure 3- Monthly VIC simulated flow vs. bias-corrected flow

Annual Adjustment

You can use the following command to return the average annual flow back to what model originally simulated. Note that the inputs to the function is output of the monthly function.

df_bc_annual<-bias_correct_annual(df_bc_month$simulated, df_bc_month$bias_corrected, start_date_simulated, end_date_simulated)

Daily Disaggregation

The following function first uses the monthly function, then applies annual adjustment, and finally disaggregate the monthly streamflow to daily. The package has two options to convert monthly to daily: 1- it can multiply the simulated streamflow of each month by its bias-correction coefficient 2- it can use the K-nearest Neighbors method to find the closest month in the observed record

disaggregation_method<-"scaling_coeff" # "scaling_coeff" or "knn"

df_bc_daily<-bias_correct_daily(observed_flow, simulatred_flow, start_date_observed, end_date_observed, start_date_simulated, end_date_simulated, time_step, date_type, disaggregation_method)

Here is how the entire bias-correction procedure affect the simulated streamflow:

Figure 4 – Simulated vs. bias-corrected flow after annual adjustment and daily disaggregation

Known Issues and Future Plans

  • Currently the biascorrection package only accepts complete years. For example, if your year starts in January it has to end in December, and it can not continue to, let’s say, next February.
  • I am thinking about adding some more functionalities to the biascorrection package in a few months. Some built-in post-processing options, built-in example datasets, and at least one more bias-correction techniques are some of the options.
  • For the next version, I am also thinking about releasing the model through CRAN.

Hydro Packages in R: HydroGOF

In this blog post, I will go over a very helpful hydrologic package in R that can make your hydro-life much easier. The package is called HydroGOF, and it can be used to make different types of plots, including mean monthly, annual, and seasonal plots for streamflow, rainfall, temperature, and other environmental variables. You can also use HydroGOF to compare your simulated flow to observed flow and calculate various performance metrics such as Nash-Sutcliffe efficiency. Indeed, the GOF part of HydroGOF stands for “goodness of fit.” More information about HydroGOF and its applications for hydrologists can be found here. Also, you can find a more comprehensive list of hydrologic R packages from this water programming blog post.

1- Library and Data Preparation

HydroGOF accepts R data frames and R zoo objects. If you are not familiar with R’s zoo objects, you can find more information at here. In this tutorial, I use HydroGOF’s test case streamflow data, which are in zoo format. Here is how you can install and load zoo and HydroGOF.

install.packages("hydroGOF ")

After you load the package, you need to activate your streamflow data. This is how you do so.

# Activate HydroGOF's streamflow data

Now, let’s take a look at the first few lines of our streamflow data.


Note that the first column is date and that the second column is streamflow data; the unit is m3/sec. Also, keep in mind that you can use zoo to manipulate the temporal regime of your data. For example, you can convert your daily data to monthly or annual.

2- Streamflow Plots

Now, let’s use HydroGOF to visualize our observed streamflow data. You can use the following commands to generate some nice figures that will help you explore the overall regime of the streamflow data.


hydroplot(x = obs,var.type = "FLOW", var.unit = "m3/s", ptype = "ts+boxplot", FUN=mean)
# Note that "hydroplot" command is very flexible and there are many
# options that users can add or remove

3- Generate Simulated Dataset

For this tutorial, I have written the following function, which uses observed streamflow to generate a very simple estimation of daily streamflow. Basically, the function takes the observed data and calculates daily average flow for each day of the year. Then, the function repeats the one-year data as many times as you need, which for our case, is ten times to match the observed flow.

  # This function generates a very simple prediction of streamflow 
  # based on observed streamflow inputs

  DOY<-data.frame(matrix(ncol =1, nrow = length(EgaEnEstellaQts))) 
  for (i_day in 1:length(EgaEnEstellaQts)){
    DOY[i_day,]=as.numeric(strftime(index(EgaEnEstellaQts[i_day]), format = "%j"))
# Create a 365 day timeseries of average daily streamflow.  
  m_inflow_obs<-as.numeric(aggregate(obs[,1], by=list(DOY[,1]), mean)) 
  simplest_predictor<-data.frame(matrix(ncol=3, nrow =length(obs )))
  names(simplest_predictor)<-c("Date", "Observed", "Predicted")
  for (i_d in 1:length(obs)){
   # Iterates average daily flow for entire simulation period 
  # Convert to zoo format
  dt_z<-read.zoo(simplest_predictor, format="%Y-%m-%d") 

After loading the function, you can use the following to create your combined observed and simulated data frame.

# Here we just call the function

4- Hydrologic Error Metrics Using HydroGOF

There are twenty error metrics in HydroGOF—for example, mean error (ME), mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), percent bias (PBIAS), ratio of standard deviations (Rsd), and Nash-Sutcliffe efficiency (NSE). You can find more information about them here. You can use the following commands to calculate specific error metrics.

# Nash-Sutcliffe Efficiency
NSE(sim=obs_sim$Predicted, obs=obs_sim$Observed) 
# Root Mean Squared Error
rmse(sim=obs_sim$Predicted, obs=obs_sim$Observed) 
# Mean Error
me(sim=obs_sim$Predicted, obs=obs_sim$Observed) 

You can also use this cool command to see all of the available error metrics in HydroGOF.

gof(sim=obs_sim$Predicted, obs=obs_sim$Observed)

5- Visualizing Observed and Simulated Streamflow

Here is the most interesting part: you can plot observed and simulated on the same graph and add all error metrics to the plot.

ggof(sim=obs_sim$Predicted, obs=obs_sim$Observed, ftype="dm", gofs = c("NSE", "rNSE", "ME", "MSE",  "d", "RMSE", "PBIAS"), FUN=mean)
# You should explore different options that you can add to this figure. 
# For example you can choose which error metrics you want to display, etc

Variable Infiltration Capacity (VIC) Model- Part 2

As promised, I am back with the second part of my blog post for variable infiltration capacity (VIC) model training. Before we start talking about how you can run the model, take a look at my first blog post on VIC; that post will hopefully give you a high-level understanding of the model as well as its application, theoretical background, and main processes.

In this blog post, we’ll go over model input and output files and things that you need to do to run the model. We’ll use a “popular among first-timers” real-world example provided by the VIC development team. I found the instructions provided by the VIC website clear and very helpful, and I recommend referring to the site for more information beyond this post. However, the goal of this blog post is to provide easy-to-follow hands-on instructions on how to run the VIC model. Lastly, before I forget, although many of these details are applicable to all VIC versions, this instruction is specifically for VIC-4.x and VIC-5 (classic mode) versions.

Input Files

The most important input to the VIC model is likely the global parameter file. The global parameter file has two main responsibilities: (1) setting the model’s main options and (2) providing the paths to other input files. The other input files include soil parameters, meteorological data, vegetation library, vegetation parameter file, and snow band file. The following sections start by providing more information about these input files and then discuss the global parameter file.

Soil File

Broadly speaking, the soil file provides two categories of information to the model. (1) Calibration parameters include the coefficient that adjusts the rainfall-runoff formulation (bi), and additional parameters control base flow generation (Ws, Ds, Dsmax, C). Calibrating the depths of the middle and last layers of VIC is also a standard practice. However, keep in mind that although the snow model of VIC can be calibrated, its parameters are not in the soil file. (2) Soil textural information includes the parameters of the Brooks-Corey/Campbell relationship, soil depth, soil bulk density, field capacity, permanent wilting point, and more. The soil file usually has multiple lines, and each line corresponds to a specific grid cell. The soil file also tells the model whether or not a specific grid cell should be part of the simulation, and the first number in each line indicates these data. If the number is zero (0), the cell will not be part of the simulation.

Met Data

Met data in VIC also has two options. (1) Users can only provide precipitation, maximum temperature, minimum temperature, and wind speed; VIC’s internal weather generator (MTCLIM; Bohn et al. 2013) calculates the rest of the parameters such as shortwave and longwave radiation, atmospheric pressure, and vapor pressure. (2) Users can provide a complete time series of input data.

Vegetation Parameter File

The vegetation parameter file tells the model what percentage of each grid cell is occupied by each vegetation cover type. If you have a soil file (see the test case for an example) and sum up all the fractions in each grid cell, you will probably notice that the figure is almost always less than one. This is because the rest of the grid cell is bare soil with no vegetation cover, but keep in mind that bare soil is part of simulations. The vegetation parameter file also includes information about root depth, root fraction, and other vegetation-related parameters.

Vegetation Library

The vegetation library provides the model with characteristics of each vegetation type—for example, albedo, LAI, canopy coverage, roughness, and other parameters that the model needs to calculate Penman-Monteith’s evapotranspiration. The original vegetation library comes with twelve vegetation classes, and users usually don’t need to modify this file unless they want to add a new class of vegetation. Keep in mind that the original vegetation file does not do a good job of representing different crop types. That’s one of the reasons that VIC-CropSyst exists.

Snow Band (aka Elevation Band) File

The snow band file divides each grid cell into different elevations. The model simulates each elevation band separately while lapsing temperature and precipitation for each elevation. VIC does this to improve the accuracy of the simulation. Remember that VIC is a large-scale model; a 50 km by 50 km grid cell might contain mountains and valleys with various climates. Although using the snow band file is optional (as specified in the global parameter file), doing so is usually recommended—especially over snow-dominant regions—because snow is very sensitive to elevation.

Global Parameter File

The global parameter file provides two things:

(1) Model main options such as output parameters of interest; whether or not the model should simulate frozen soil and full energy balance; and information related to start and end date of simulation, start date of met data file, parameters included in the met data file, number of soil layers, and maximum number of snow bands. (2) Path to different input files.

How to Download VIC

VIC works in the Linux/Unix environment. The VIC website has recently been updated and provides everything that you need to know about the model. To download the model, go to its GitHub page. The model comes with all necessary codes. You should explore different folders in your “VIC-master” folder. However, in this example, we are only interested in the “/VIC-master/vic/drivers/classic” directory, which provides the VIC code and executable files.

How to Download the Test Dataset

VIC has a test dataset, which is for Stehekin river basin in Pacific Northwest. You can download it from here. This provides you with all the input files needed for the test case.

How to Adjust the Global Parameter File

The global parameter file needs to be adjusted based on what you want from the model. For example, if you need a specific output parameter, you must include it in the list of outputs and modify the number of output parameters accordingly. For this test case, let’s stick to the original setting of the model. You just need to point the global parameter file to your directory. To do so, open “VIC_sample_data-master/classic/Stehekin/parameters/global_param.STEHE.txt”

Create the Executable File

To run VIC, you need to have an executable file. To create an executable file, go to “/VIC-master/vic/drivers/classic,” and use Linux’s make command:

make clean


Finally, you’re ready to run the model. It’s super easy to type in the following command on your Linux terminal:

/VIC-master/vic/drivers/classic/vic_classic.exe -g VIC_sample_data-master/classic/Stehekin/parameters/global_param.STEHE.txt

I think that’s enough for the VIC model. As I mentioned in my last blog post, there is a coupled agro-hydrologic model called VIC-CropSyst that simulates agricultural processes on top of hydrology. If time allows, I will post about VIC-CropSyst in the future.

Synthetic streamflow generation

A recent research focus of our group has been the development and use of synthetic streamflow generators.  There are many tools one might use to generate synthetic streamflows and it may not be obvious which is right for a specific application or what the inherent limitations of each method are.  More fundamentally, it may not be obvious why it is desirable to generate synthetic streamflows in the first place.  This will be the first in a series of blog posts on the synthetic streamflow generators in which I hope to sketch out the various categories of generation methods and their appropriate use as I see it.  In this first post we’ll focus on the motivation and history behind the development of synthetic streamflow generators and broadly categorize them.

Why should we use synthetic hydrology?

The most obvious reason to use synthetic hydrology is if there is little or no data for your system (see Lamontagne, 2015).  Another obvious reason is if you are trying to evaluate the effect of hydrologic non-stationarity on your system (Herman et al. 2015; Borgomeo et al. 2015).  In that case you could use synthetic methods to generate flows reflecting a shift in hydrologic regime.  But are there other reasons to use synthetic hydrology?

In water resources systems analysis it is common practice to evaluate the efficacy of management or planning strategies by simulating system performance over the historical record, or over some critical period.  In this approach, new strategies are evaluated by asking the question:  How well would we have done with your new strategy?

This may be an appealing approach, especially if some event was particularly traumatic to your system. But is this a robust way of evaluating alternative strategies?  It’s important to remember that any hydrologic record, no matter how long, is only a single realization of a stochastic process.  Importantly, drought and flood events emerge as the result of specific sequences of events, unlikely to be repeated.  Furthermore, there is a 50% chance that the worst flood or drought in an N year record will be exceeded in the next N years.  Is it well advised to tailor our strategies to past circumstances that will likely never be repeated and will as likely as not be exceeded?  As Lettenmaier et al. [1987] reminds us “Little is certain about the future except that it will be unlike the past.”

Even under stationarity and even with long hydrologic records, the use of synthetic streamflow can improve the efficacy of planning and management strategies by exposing them to larger and more diverse flood and drought than those in the record (Loucks et al. 1981; Vogel and Stedinger, 1988; Loucks et al. 2005).  Figure 7.12 from Loucks et al. 2005 shows a typical experimental set-up using synthetic hydrology with a simulation model.  Often our group will wrap an optimization model like Borg around this set up, where the system design/operating policy (bottom of the figure) are the decision variables, and the system performance (right of the figure) are the objective(s).


(Loucks et al. 2005)


What are the types of generators?

Many synthetic streamflow generation techniques have been proposed since the early 1960s.  It can be difficult for a researcher or practitioner to know which method is best suited to the problem at hand.  Thus, we’ll start with a very broad characterization of what is out there, then proceed to some history.

Broadly speaking there are two approaches to generating synthetic hydrology: indirect and direct.  The indirect approach generates streamflow by synthetically generating the forcings to a hydrologic model.  For instance one might generate precipitation and temperature series and input them to a hydrologic model of a basin (e.g. Steinschneider et al. 2014).  In contrast, direct methods use statistical techniques to generate streamflow timeseries directly.

The direct approach is generally easier to apply and more parsimonious because it does not require the selection, calibration, and validation of a separate hydrologic model (Najafi et al. 2011).  On the other hand, the indirect approach may be desirable.  Climate projections from GCMs often include temperature or precipitation changes, but may not describe hydrologic shifts at a resolution or precision that is useful.  In other cases, profound regime shifts may be difficult to represent with statistical models and may require process-driven models, thus necessitating the indirect approach.

Julie’s earlier series focused on indirect approaches, so we’ll focus on the direct approach.  Regardless of the approach many of the methods are same.  In general generator methods can be divided into two categories: parametric and non-parametricParametric methods rely on a hypothesized statistical model of streamflow whose parameters are selected to achieve a desired result (Stedinger and Taylor, 1982a).  In contrast non-parametric methods do not make strong structural assumptions about the processes generating the streamflow, but rather rely on re-sampling from the hydrologic record in some way (Lall, 1995).  Some methods combine parametric and non-parametric techniques, which we’ll refer to as semi-parametric (Herman et al. 2015).

Both parametric and non-parametric methods have advantages and disadvantages.  Parametric methods are often parsimonious, and often have analytical forms that allow easy parameter manipulation to reflect non-stationarity.  However, there can be concern that the underlying statistical models may not reflect the hydrologic reality well (Sharma et al. 1997).  Furthermore, in multi-dimensional, multi-scale problems the proliferation of parameters can make parametric models intractable (Grygier and Stedinger, 1988).  Extensive work has been done to confront both challenges, but they may lead a researcher to adopt a non-parametric method instead.

Because many non-parametric methods ‘re-sample’ flows from a record, realism is not generally a concern, and most re-sampling schemes are computationally straight forward (relatively speaking).  On the other hand, manipulating synthetic flows to reflect non-stationarity may not be as straightforward as a simple parameter change, though methods have been suggested (Herman et al. 2015Borgomeo et al. 2015).  More fundamentally, because non-parametric methods rely so heavily on the data, they require sufficiently long records to ensure there is enough hydrologic variability to sample.  Short records can be a concern for parametric methods as well, though parametric uncertainty can be explicitly considered in those methods (Stedinger and Taylor, 1982b).  Of course, parametric methods also have structural uncertainty that non-parametric models largely avoid by not assuming an explicit statistical model.

In the coming posts we’ll dig into the nuances of the different methods in greater detail.

A historical perspective

The first use of synthetic flow generation seems to have been by Hazen [1914].  That work attempted to quantify the reliability of a water supply by aggregating the streamflow records of local streams into a 300-year ‘synthetic record.’  Of course the problem with this is that the cross-correlation between concurrent flows rendered the effective record length much less than the nominal 300 years.

Next Barnes [1954] generated 1,000 years of streamflow for a basin in Australia by drawing random flows from a normal distribution with mean and variance equal to the sample estimates from the observed record.  That work was extended by researchers from the Harvard Water Program to account for autocorrelation of monthly flows (Maass et al., 1962; Thomas and Fiering, 1962).  Later work also considered the use of non-normal distributions (Fiering, 1967), and the generation of correlated concurrent flows at multiple sites (Beard, 1965; Matalas, 1967).

Those early methods relied on first-order autoregressive models that regressed flows in the current period on the flows of the previous period (see Loucks et al.’s Figure 7.13  below).  Box and Jenkins [1970] extended those methods to autoregressive models of arbitrary order, moving average models of arbitrary order, and autoregressive-moving average models of arbitrary order.  Those models were the focus of extensive research over the course of the 1970s and 1980s and underpin many of the parametric generators that are widely used in hydrology today (see Salas et al. 1980; Grygier and Stedinger, 1990; Salas, 1993; Loucks et al. 2005).


(Loucks et al. 2005)

By the mid-1990s, non-parametric methods began to gain popularity (Lall, 1995).  While much of this work has its roots in earlier work from the 1970s and 1980s (Yakowitz, 1973, 1979, 1985; Schuster and Yakowitz, 1979; Yakowitz and Karlsson, 1987; Karlson and Yakowitz, 1987), improvements in computing and the availability of large data sets meant that by the mid-1990s non-parametric methods were feasible (Lall and Sharma, 1996).  Early examples of non-parametric methods include block bootstrapping (Vogel and Shallcross, 1996), k-nearest neighbor (Lall and Sharma, 1996), and kernel density methods (Sharma et al. 1997).  Since that time extensive research has made improvement to these methods, often by incorporating parametric elements.  For instance, Srinivas and Srinivasan (2001, 2005, and 2006) develop a hybrid autoregressive-block bootstrapping method designed to improve the bias in lagged correlation and to generate flows other than the historical, for multiple sites and multiple seasons.  K-nearest neighbor methods have also been the focus of extensive research (Rajagopalan and Lall, 1999; Harrold et al. 2003; Yates et al. 2003; Sharif and Burn, 2007; Mehortra and Sharma, 2006; Prairie et al. 2006; Lee et al. 2010, Salas and Lee, 2010, Nowak et al., 2010), including recent work by our group  (Giuliani et al. 2014).

Emerging work focuses on stochastic streamflow generation using copulas [Lee and Salas, 2011; Fan et al. 2016], entropy theory bootstrapping [Srivastav and Simonovic, 2014], and wavelets [Kwon et al. 2007; Erkyihun et al., 2016], among other methods.

In the following posts I’ll address different challenges in stochastic generation [e.g. long-term persistence, parametric uncertainty, multi-site generation, seasonality, etc.] and the relative strengths and shortcomings of the various methods for addressing them.

Works Cited

Barnes, F. B., Storage required for a city water supply, J. Inst. Eng. Australia 26(9), 198-203, 1954.

Beard, L. R., Use of interrelated records to simulate streamflow, J. Hydrol. Div., ASCE 91(HY5), 13-22, 1965.

Borgomeo, E., Farmer, C. L., and Hall, J. W. (2015). “Numerical rivers: A synthetic streamflow generator for water resources vulnerability assessments.” Water Resour. Res., 51(7), 5382–5405.

Y.R. Fan, W.W. Huang, G.H. Huang, Y.P. Li, K. Huang, Z. Li, Hydrologic risk analysis in the Yangtze River basin through coupling Gaussian mixtures into copulas, Advances in Water Resources, Volume 88, February 2016, Pages 170-185.

Fiering, M.B, Streamflow Synthesis, Harvard University Press, Cambridge, Mass., 1967.

Giuliani, M., J. D. Herman, A. Castelletti, and P. Reed (2014), Many-objective reservoir policy identification and refinement to reduce policy inertia and myopia in water management, Water Resour. Res., 50, 3355–3377, doi:10.1002/2013WR014700.

Grygier, J.C., and J.R. Stedinger, Condensed Disaggregation Procedures and Conservation Corrections for Stochastic Hydrology, Water Resour. Res. 24(10), 1574-1584, 1988.

Grygier, J.C., and J.R. Stedinger, SPIGOT Technical Description, Version 2.6, 1990.

Harrold, T. I., Sharma, A., and Sheather, S. J. (2003). “A nonparametric model for stochastic generation of daily rainfall amounts.” Water Resour. Res., 39(12), 1343.

Hazen, A., Storage to be provided in impounding reservoirs for municipal water systems, Trans. Am. Soc. Civ. Eng. 77, 1539, 1914.

Herman, J.D., H.B. Zeff, J.R. Lamontagne, P.M. Reed, and G. Characklis (2016), Synthetic Drought Scenario Generation to Support Bottom-Up Water Supply Vulnerability Assessments, Journal of Water Resources Planning & Management, doi: 10.1061/(ASCE)WR.1943-5452.0000701.

Karlsson, M., and S. Yakowitz, Nearest-Neighbor methods for nonparametric rainfall-runoff forecasting, Water Resour. Res., 23, 1300-1308, 1987.

Kwon, H.-H., U. Lall, and A. F. Khalil (2007), Stochastic simulation model for nonstationary time series using an autoregressive wavelet decomposition: Applications to rainfall and temperature, Water Resour. Res., 43, W05407, doi:10.1029/2006WR005258.

Lall, U., Recent advances in nonparametric function estimation: Hydraulic applications, U.S. Natl. Rep. Int. Union Geod. Geophys. 1991- 1994, Rev. Geophys., 33, 1093, 1995.

Lall, U., and A. Sharma (1996), A nearest neighbor bootstrap for resampling hydrologic time series, Water Resour. Res. 32(3), pp. 679-693.

Lamontagne, J.R. 2015,Representation of Uncertainty and Corridor DP for Hydropower 272 Optimization, PhD edn, Cornell University, Ithaca, NY.

Lee, T., J. D. Salas, and J. Prairie (2010), An enhanced nonparametric streamflow disaggregation model with genetic algorithm, Water Resour. Res., 46, W08545, doi:10.1029/2009WR007761.

Lee, T., and J. Salas (2011), Copula-based stochastic simulation of hydrological data applied to Nile River flows, Hydrol. Res., 42(4), 318–330.

Lettenmaier, D. P., K. M. Latham, R. N. Palmer, J. R. Lund and S. J. Burges, Strategies for coping with drought Part II: Planning techniques for planning and reliability assessment, EPRI P-5201, Final Report Project 2194-1, June 1987.

Loucks, D.P., Stedinger, J.R. & Haith, D.A. 1981, Water Resources Systems Planning and Analysis, 1st edn, Prentice-Hall, Englewood Cliffs, N.J.

Loucks, D.P. et al. 2005, Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Delft, The Netherlands.

Maass, A., M. M. Hufschmidt, R. Dorfman, H. A. Thomas, Jr., S. A. Marglin and G. M. Fair,

Design of Water Resource Systems, Harvard University Press, Cambridge, Mass., 1962.

Matalas, N. C., Mathematical assessment of synthetic hydrology, Water Resour. Res. 3(4), 937-945, 1967.

Najafi, M. R., Moradkhani, H., and Jung, I. W. (2011). “Assessing the uncertainties of hydrologic model selection in climate change impact studies.” Hydrol. Process., 25(18), 2814–2826.

Nowak, K., J. Prairie, B. Rajagopalan, and U. Lall (2010), A nonparametric stochastic approach for multisite disaggregation of annual to daily

streamflow, Water Resour. Res., 46, W08529, doi:10.1029/2009WR008530.

Nowak, K., J. Prairie, B. Rajagopalan, and U. Lall (2010), A nonparametric stochastic approach for multisite disaggregation of annual to daily

streamflow, Water Resour. Res., 46, W08529, doi:10.1029/2009WR008530.

Nowak, K., J. Prairie, B. Rajagopalan, and U. Lall (2010), A nonparametric stochastic approach for multisite disaggregation of annual to daily

streamflow, Water Resour. Res., 46, W08529, doi:10.1029/2009WR008530.

Nowak, K., J. Prairie, B. Rajagopalan, and U. Lall (2010), A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow, Water Resour. Res., 46, W08529, doi:10.1029/2009WR008530.

Prairie, J. R., Rajagopalan, B., Fulp, T. J., and Zagona, E. A. (2006). “Modified K-NN model for stochastic streamflow simulation.” J. Hydrol. Eng., 11(4), 371–378.

Rajagopalan, B., and Lall, U. (1999). “A k-nearest-neighbor simulator for daily precipitation and other weather variables.” Water Resour. Res., 35(10), 3089–3101.

Salas, J. D., J. W. Deller, V. Yevjevich and W. L. Lane, Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colo., 1980.

Salas, J.D., 1993, Analysis and Modeling of Hydrologic Time Series, Chapter 19 (72 p.) in The McGraw Hill Handbook of Hydrology, D.R. Maidment, Editor.

Salas, J.D., T. Lee. (2010). Nonparametric Simulation of Single-Site Seasonal Streamflow, J. Hydrol. Eng., 15(4), 284-296.

Schuster, E., and S. Yakowitz, Contributions to the theory of nonparametric regression, with application to system identification, Ann. Stat., 7, 139-149, 1979.

Sharif, M., and Burn, D. H. (2007). “Improved K-nearest neighbor weather generating model.” J. Hydrol. Eng., 12(1), 42–51.

Sharma, A., Tarboton, D. G., and Lall, U., 1997. “Streamflow simulation: A nonparametric approach.” Water Resour. Res., 33(2), 291–308.

Srinivas, V. V., and Srinivasan, K. (2001). “A hybrid stochastic model for multiseason streamflow simulation.” Water Resour. Res., 37(10), 2537–2549.

Srinivas, V. V., and Srinivasan, K. (2005). “Hybrid moving block bootstrap for stochastic simulation of multi-site multi-season streamflows.” J. Hydrol., 302(1–4), 307–330.

Srinivas, V. V., and Srinivasan, K. (2006). “Hybrid matched-block bootstrap for stochastic simulation of multiseason streamflows.” J. Hydrol., 329(1–2), 1–15.

Roshan K. Srivastav, Slobodan P. Simonovic, An analytical procedure for multi-site, multi-season streamflow generation using maximum entropy bootstrapping, Environmental Modelling & Software, Volume 59, September 2014a, Pages 59-75.

Stedinger, J. R. and M. R. Taylor, Sythetic streamflow generation, Part 1. Model verification and validation, Water Resour. Res. 18(4), 909-918, 1982a.

Stedinger, J. R. and M. R. Taylor, Sythetic streamflow generation, Part 2. Parameter uncertainty,Water Resour. Res. 18(4), 919-924, 1982b.

Steinschneider, S., Wi, S., and Brown, C. (2014). “The integrated effects of climate and hydrologic uncertainty on future flood risk assessments.” Hydrol. Process., 29(12), 2823–2839.

Thomas, H. A. and M. B. Fiering, Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation, in Design of Water Resource Systems, by A. Maass, M. Hufschmidt, R. Dorfman, H. A. Thomas, Jr., S. A. Marglin and G. M. Fair, Harvard University Press, Cambridge, Mass., 1962.

Vogel, R.M., and J.R. Stedinger, The value of stochastic streamflow models in over-year reservoir design applications, Water Resour. Res. 24(9), 1483-90, 1988.

Vogel, R. M., and A. L. Shallcross (1996), The moving block bootstrap versus parametric time series models, Water Resour. Res., 32(6), 1875–1882.

Yakowitz, S., A stochastic model for daily river flows in an arid region, Water Resour. Res., 9, 1271-1285, 1973.

Yakowitz, S., Nonparametric estimation of markov transition functions, Ann. Stat., 7, 671-679, 1979.

Yakowitz, S. J., Nonparametric density estimation, prediction, and regression for markov sequences J. Am. Stat. Assoc., 80, 215-221, 1985.

Yakowitz, S., and M. Karlsson, Nearest-neighbor methods with application to rainfall/runoff prediction, in Stochastic  Hydrology, edited by J. B. Macneil and G. J. Humphries, pp. 149-160, D. Reidel, Norwell, Mass., 1987.

Yates, D., Gangopadhyay, S., Rajagopalan, B., and Strzepek, K. (2003). “A technique for generating regional climate scenarios using a nearest-neighbor algorithm.” Water Resour. Res., 39(7), 1199.