Synthetic Weather Generation: Part V

Conditioning Synthetic Weather Generation on Seasonal Climate Forecasts

This is the final blog post in a five part series on synthetic weather generators. See Parts I and II for a description of single-site parametric and non-parametric generators, respectively, and Part III for a description of multi-site generators of both types.

In my previous post, Part IV, I discussed how parametric and non-parametric weather generators can be modified to produce weather that is consistent with climate change projections for use in long-term planning. In the shorter-term, water managers may be able to exploit mid-range climate forecasts to inform seasonal reservoir operations (see e.g., Kim and Palmer (1997), Chiew et al. (2003), Block (2011), Block and Goddard (2012), Anghileri et al. (2016)). For such analyses, it could be useful to tailor management plans to simulated weather conditions consistent with these probabilistic forecasts. Here I discuss how one can condition weather generators on seasonal climate forecasts for such purposes.

Two major forecasting groups, the International Research Institute (IRI) for Climate and Society at Columbia University and the Climate Prediction Center (CPC) of the U.S. National Centers for Environmental Prediction, issue tercile seasonal forecasts that specify the probabilities of observing above normal (pA), near normal (pN) and below normal (pB) precipitation and temperature. Forecasts are issued each month for the upcoming three months (see those from IRI and CPC). While these forecasts are derived from dynamical and statistical models that include a variety of physically-based processes, most of the forecast skill can be explained by the effects of the El Niño-Southern Oscillation (ENSO) on the climate system (Barnston et al., 1994). As most of you probably know, ENSO refers to the quasi-periodic cycling of sea surface temperatures (SSTs) in the tropical eastern Pacific Ocean. The warm phase is known as El Niño and the cool phase as La Niña. These are quantified as five consecutive three-month periods with SST anomalies in the Niño 3.4 region of the Pacific > +0.5°C (El Niño) or < -0.5°C (La Niña). All other periods are considered neutral (Climate Prediction Center, 2016). Because much of seasonal climate forecast skill is derived from ENSO, current or forecasted SST anomalies in the Niño 3.4 region are sometimes used by themselves as proxy seasonal forecasts. Here I will discuss techniques for conditioning weather generator parameters on either tercile forecasts or current or projected ENSO conditions.

Parametric Weather Generators

Wilks (2002) presents a method for conditioning parametric weather generators of the Richardson (1981) type on tercile seasonal climate forecasts using a network of sites in New York State. The key idea, derived from Briggs and Wilks (1996), is to estimate the weather generator parameters not from the entire historical record, but from a weighted bootstrapped sample of the historical record consistent with the forecast. This is similar to the method of streamflow generation use by Herman et al. (2016) to increase the frequency of droughts of a given magnitude. As an illustrative example, Herman et al. (2016) empirically estimate the quantile of a noteworthy drought from the historical record and show the system impacts of droughts of that magnitude or worse becoming n times more likely. This is done by adapting the semi-parametric streamflow generator developed by Kirsch et al. (2013) to sample years with historical droughts of at least that magnitude n times more often.

While Herman et al. (2016) take a fully non-parametric approach by estimating quantiles empirically, Briggs and Wilks (1996) estimate terciles parametrically by fitting a normal distribution to the historical mean seasonal temperatures and a Gamma distribution to the historical seasonal precipitation totals at each site. After estimating terciles, Briggs and Wilks (1996) classify each year in the historical record as below normal, near normal or above normal in terms of temperature and precipitation in the season of interest. Because terciles are estimated parametrically, each category will not necessarily contain an equal number of years, even if the record length is a multiple of three. It should be noted, however, that like Herman et al. (2016), IRI defines terciles empirically from the most recent 30-yr record (see more here and here) and CPC from the most recent 30-yr record updated only every 10 years, i.e., the current reference frame is 1981-2010 (see more here).  IRI provides a discussion of the advantages and disadvantages to parametric and non-parametric tercile estimation methods. For consistency, it may be best to take the same approach as the agency whose forecasts are being used.

Once terciles have been defined, weather generator parameters can be estimated from a bootstrapped sample of the historical record consistent with the forecast (Wilks, 2002). Consider a historical record with NB years classified as being below normal, NN as near normal, and NA as above normal. Then the expected value of a given seasonal statistic, X, can be estimated from a bootstrapped sample of L years from the historical record in which the below normal, near normal and above normal years are sampled with probabilities pB, pN, and pA, respectively. Representing the statistic of interest in the ith below, near and above normal year as xi(B), xi(N) and xi(A), respectively, then the expected value of X is the following:

(1) E\left[X\right] = \lim_{L\to\infty} \frac{1}{L}\left[\sum_{i=1}^{N_B} \frac{p_BL}{N_B}x_{i}^{\left(B\right)} + \sum_{i=1}^{N_N} \frac{p_NL}{N_N}x_{i}^{\left(N\right)} + \sum_{i=1}^{N_A} \frac{p_AL}{N_A}x_{i}^{\left(A\right)}\right]

That is, the forecast-conditional value of X is a weighted sum of its average in the below normal, near normal and above normal years, where the weights are equal to the forecast probabilities for each of the three respective categories (Wilks, 2002). Note that X is a seasonal statistic, not a parameter, so one cannot simply estimate a weather generator parameter as a weighted average of its value in each of the terciles unless the parameter is itself a simple statistic.

Fortunately, for some of the weather generator parameters this is the case. Recall from Part I that the first step of the Richardson generator is to generate a sequence of daily rainfall occurrences from a first order Markov chain. This chain is defined by the probabilities of transitioning from a dry day to a wet day, p01, or from a wet day to another wet day, p11. As discussed in Part IV, these two parameters together define the unconditional probability of a wet day, π, and the first-order autocorrelation of the occurrences, d, where π = p01/(1 + p01p11) and d = p11p01 (Katz, 1983). The unconditional probability of a wet day is a simple statistic. Therefore, since pN = 1 – pApB, π can be estimated each month as a function of pB, pA and the average portion of wet days in below normal, near normal and above normal years for that month ((B),(N) and (A)):

(2) π = pB(N) + (1 – pBpA)(N) + pA(A).

More generically, π =b0 + bBpB + bApA, where b0 = (N), bB = (B) – x̅(N) and bA = (A) – x̅(N). Wilks, 2002 recommends that the parameter π be estimated separately at each site and for each month, but that the below normal, near normal and above normal years be defined based on the total precipitation at that site in the entire three-month season. This is because the forecast is for the entire season, but the portion of wet days varies on a shorter time scale.

Like the unconditional probability of a wet day, the persistence parameter d can also be estimated as a function of pB, pA and the value of d in below normal, near normal and above normal years. Wilks (2002) shows that d can be represented by a quadratic function of pA and p:

(3) d = b0 + bBpB + bBBpB2 + bApA + bAApA2 + bBApBpA.

but finds that variations in d across forecasts are small such that one can reasonably assume the climatological estimate of d for all sites and months, regardless of the forecast.

The remaining weather generator parameters related to precipitation are those defining the distribution of precipitation amounts. Because these parameters (α and β if fitting a Gamma distribution, and α, λ1 and λ2 if fitting a mixed exponential distribution) are estimated iteratively in an MLE approach, they cannot be estimated as a function of the forecast probabilities like the occurrence parameters can. Instead, Wilks (2002) suggests using the Briggs and Wilks (1996) approach of bootstrapping a large sample of years from the historical record consistent with the forecast and fitting separate probability distributions to the precipitation amounts in each month’s weighted sample. When performing this estimation using a mixed exponential distribution to model precipitation amounts, Wilks (2002) found the estimates of the mixing parameter, α, to be the least consistent across the investigated sites and chose to hold it constant across sites and forecasts. Thus, only λ1 and λ2 were re-estimated for each month and site as a function of the seasonal forecast.

One drawback to the sampling scheme employed by Briggs and Wilks (1996) is that all historical years within each tercile have an equal probability of being sampled: pB/NB, pN/NN and pA/NA for below, near and above normal years. In reality, years similar to those in the tail are less likely to occur than years similar to those near the median. An alternative, more precise sampling scheme called the pdf-ratio method suggested by Stedinger and Kim (2010) assigns each year i an un-normalized probability of selection, qi = (1/N)*f1(xi)/f0(xi) where N is the number of years in the historical record and f1 and f0 are pdfs of the statistic X under forecast and climatological conditions, respectively. The qi are then normalized such that they sum to 1. f1 and f0 can be analytical or empirical distributions.

After estimating parameters of the precipitation amounts distributions using either the approach of Briggs and Wilks (1996) or Stedinger and Kim (2010), one must estimate the forecast-conditional temperature parameters. Recall that in the Richardson generator, separate harmonics are fit to the eight time series of means and standard deviations of minimum and maximum temperature on wet and dry days. Historical residuals from these fits are determined by first subtracting the predicted mean and then dividing by the predicted standard deviation. Finally, the residuals are modeled by an order-one vector autoregression, or VAR(1) model. Because forecast-conditional weather generators are only applied three months at a time, Wilks (2002) suggests instead fitting quadratic functions to these eight time series within the season of interest.

Like the parameters describing precipitation occurrences, the parameters of the quadratic functions of time describing the mean temperature on wet and dry days can be estimated as a weighted average of fits in each of the three terciles. As shown in Wilks (2002), the four mean temperature functions (minimum and maximum on wet and dry days), µ(t), at each site are specified by the function:

(4) µ(t) = (β0 + βBpB + βApA) + (γ0 + γBpB + γApA)t + (δ0 + δBpB + δApA)t2


(5) µ̅t(N) = β0 + γ0t + δ0t2,

(6) [µ̅t(B) µ̅t(N)] = βB + γBt + δBt2,

(7) [µ̅t(A) µ̅t(N)] = βA + γAt + δAt2,

t is the day and µ̅t(B), µ̅t(N) and µ̅t(A) are the mean temperature statistics of concern on each day of below normal, near normal and above normal years in the historical record, respectively.

Finally, Wilks (2002) shows that the standard deviations of minimum and maximum temperature on wet and dry days can be estimated by an extension of Equation 1:

(8) s\left(t\right) = \left[\frac{p_B}{N_B}\sum_{i=1}^{N_B}\left[x_{i}^{\left(B\right)}\left(t\right) - \mu\left(t\right)\right]^2 + \frac{p_N}{N_N}\sum_{i=1}^{N_N}\left[x_{i}^{\left(N\right)}\left(t\right) - \mu\left(t\right)\right]^2 + \frac{p_A}{N_A}\sum_{i=1}^{N_A}\left[x_{i}^{\left(A\right)}\left(t\right) - \mu\left(t\right)\right]^2\right]^{1/2}

where µ(t) is defined as in Equation 4. Once again, the forecast-conditional standard deviations of the four temperature series, σ(t), can then be estimated by quadratic functions of time, conditional on the forecast probabilities pA and pB:

(9) σ(t) = (β0 + βBpB + βBBpB2 + βApA + βAApA2 + βBApBpA)+ (γ0 + γBpB + γBBpB2 + γApA + γAApA2 + γBApBpA)+ (δ0 + δBpB + δBBpB2 + δApA + δAApA2 + δBApBpA)t2.

For the VAR(1) model of temperature residuals, Wilks (2002) found that variations in the estimates of these parameters as a function of the forecast, like d, were minor for the investigated sites in New York. For this reason, the VAR(1) model was fit separately for each month and site based on the entire historical record and these estimates were unchanged with the forecast. Finally, Wilks (2002) found that the spatial correlation of temperature and precipitation also did not change significantly between climatic terciles, and so they too were assumed independent of the forecast. Correlations in temperature were included in the VAR(1) model as described in Part III. Correlations in precipitation occurrences ω, and amounts, ζ, between sites k and l were approximated for all site pairs each month as a function of the horizontal distance, c, between them (see Equations 10 and 11, respectively, for which parameters θ1, θ2 and θ3 were estimated).

(10) \omega = \left(1 + \frac{c_{k,l}}{\theta_1}\right)^{\theta_2}

(11) \zeta = \exp\left(-\theta_3c_{k,l}\right)

If one received a tercile ENSO forecast, the same approach could be used as in Wilks (2002), except the season of interest in each historical year would be classified as La Niña, Neutral or El Niño instead of below normal, near normal or above normal.

Non-parametric Weather Generators

The key idea of weighted sampling from Briggs and Wilks (1996) has also been applied in non-parametric weather generators to condition synthetic weather series on seasonal climate forecasts. For example, Apipattanavis et al. (2007) modify their semi-parametric k-nn generator to find and probabilistically select neighbors, not from the entire historical record, but from a bootstrapped sample of the historical record consistent with the forecast. Again, this can be applied using tercile forecasts of either the {Below Normal, Near Normal, Above Normal} type or {La Niña, Neutral, El Niño} type.

Clark et al. (2004a) develop a more innovative approach that combines ideas from the non-parametric Schaake Shuffle method used to spatially correlate short-term precipitation and temperature forecasts (Clark et al., 2004b) with a parametric approach to weighted resampling presented by Yates et al. (2003) for the k-nn generator of Rajagopalan and Lall (1999). The Schaake Shuffle, originally devised by Dr. J. Schaake of the National Weather Service Office of Hydrologic Development, is a method of reordering ensemble precipitation and temperature forecasts to better capture the spatial and cross correlation of these spatial fields (Clark et al., 2004b).

Traditionally, model output statistics (MOS) from the Numerical Weather Prediction (NWP) model such as temperature, humidity and wind speed at different pressure levels, are used as predictors in a regression model to forecast daily temperature and precipitation at a number of sites. To generate an ensemble of predictions for each forecasted day, normal random variables with mean 0 and variance σε2 are added back to the mean prediction, where σε2 is the variance of the regression residuals. However, these regressions are generally developed independently for each variable at each site and therefore do not reproduce the spatial or temporal correlation between the variables across sites and time (Clark et al., 2004b). To better capture these correlations, the Schaake Shuffle, illustrated in Figure 2 from Clark et al. (2004a) for a 10-member ensemble, re-orders the ensemble members each day in order to preserve the Spearman-rank correlation in the temperature and precipitation variables between sites.


The Schaake Shuffle proceeds as follows. For a particular day, the original ensemble members for each variable at each station are ranked from lowest to highest, as shown in Table A of Figure 2 above. Next, a set of historical observations of the same size is generated by randomly selecting days from the historical record within a window of 7 days before and after the forecast date (Table B). Third, the historical observations are sorted from lowest to highest for each variable at each site, as shown in Table C. Finally, the original ensemble members in Table A are re-shuffled to form the final, spatially correlated ensembles in Table D in the following way:

  1. The rank of the data in the first historical observation (shown with dark circles in Tables B and C) is determined at each site
  2. At each site, the member of the original ensemble with the same rank as the first historical observation for that site becomes the first member of the final, correlated ensemble (see dark circles in Table A and location in Table D).
  3. Steps 1 and 2 are repeated for every historical observation/ensemble member.

As stated earlier, this process reproduces the Spearman rank correlation of the observations across sites (Clark et al., 2004b). In order to preserve the temporal correlation for each variable, instead of re-generating a random set of historical observations to use for shuffling the next day’s forecast, the observations from the day following that used for the previous time step is utilized. While the Schaake Shuffle does not guarantee reproduction of the spatial correlation in the observations, just in their rank, the results presented in Clark et al. (2004b) indicate that the method does reasonably well for both, and significantly improves upon the un-shuffled forecasts.

In the weather generator presented by Clark et al. (2004a), the same approach is used to simulate weather sequences except the ensembles in Table A are not generated by MOS regressions but by independently sampling historical observations within +/- 7 days of the simulated day at each site. To condition this weather generator on seasonal climate forecasts, the unshuffled ensembles are formed by preferential selection of different years from the historical record following an approach inspired by Yates et al. (2003). The first step in this approach is to sort all N historical years in terms of their similarity to a climate index, such as current SSTs in the Niño 3.4 region. The most similar year is given rank i = 1 and the least similar i = N. Next, a standard uniform random variable u is drawn and the year of rank i is chosen as an ensemble member, where i = INT(uλN/α) + 1. Here INT(·) is the integer operator, λ is a weighting parameter, and α a selection parameter. Values of λ greater (less) than 1 increase (decrease) the probability of selecting years ranked more similar to the climate index. Values of α greater than 1 restrict the number of sampled years such that α = 5, for example, results in only the most similar 1/5 of years being selected (Clark et al., 2004a).

Yates et al. (2003) apply a simplified version of this method with only one parameter, λ, in a scenario discovery-type approach, investigating the effects of e.g. warmer-drier springs and cooler-wetter summers. Clark et al. (2004a) first take this approach by ranking the historical years according to their similarity to the current Niño 3.4 index and exploring the effects of different choices of λ and α on the skill of the generated weather sequences in forecasting total winter precipitation at Petrified Forest in Arizona, measuring skill by the ranked probability skill score (RPSS). Interestingly, they find that high values of both λ and α, where years more similar to the climate index at the beginning of the season are selected, result in negative forecast skill. This highlights the importance of not being overconfident by only sampling years closest to current or forecast conditions. They note that the values of λ and α should depend on the strength of the Niño 3.4 index, and therefore should be re-optimized for different values of the index in order to maximize the RPSS.

All of these approaches could prove informative for seasonal water resources planning, if the forecasts being used are reliable. In the case of tercile forecasts, this means that, on average, when a given climate state is forecast to occur with probability p, it does in fact occur with that probability. Given that past diagnostic assessments of IRI and CPC forecasts have found biases and overconfidence in some locations (Wilks and Godfrey, 2002; Wilks, 2000), water managers should proceed with caution in using them for seasonal planning. At a minimum, one should perform an analysis of the forecast value for the system of concern (Anghileri et al., 2016) before changing system operations. Fortunately, these forecasts continue to improve over time and several studies have already found value in using them to inform seasonal operations (e.g. Kim and Palmer (1997)Block (2011), Block and Goddard (2012), Anghileri et al. (2016)), indicating promise in their use for water resources planning.

Works Cited

Anghileri, D. Voisin, N., Castelletti, A., Pianosi, A., Nijssen, B., & Lettenmaier, D. P. (2016). Value of long-term streamflow forecasts to reservoir operations for water supply in snow-dominated river catchments. Water Resources Research, 52(6), 4209-4225.

Apipattanavis, S., Podestá, G., Rajagopalan, B., & Katz, R. W. (2007). A semiparametric multivariate and multisite weather generator. Water Resources Research, 43(11).

Barnston, A. G., van den Dool, H. M., Rodenhuis, D. R., Ropelewski, C. R., Kousky, V. E., O’Lenic, E. A., et al. (1994). Long-lead seasonal forecasts-Where do we stand?. Bulletin of the American Meteorological Society75(11), 2097-2114.

Block, P. (2011). Tailoring seasonal climate forecasts for hydropower operations. Hydrology and Earth System Sciences, 15, 1355-1368.

Block, P., & Goddard, L. (2012). Statistical and dynamical climate predictins to guide water resources in Ethiopoia. Journal of Water Resources Planning and management, 138(3), 287-298.

Briggs, W. M., & Wilks, D. S. (1996). Extension of the Climate Prediction Center long-lead temperature and precipitation outlooks to general weather statistics. Journal of climate, 9(12), 3496-3504.

Chiew, F. H. S., Zhou, S. L., & McMahon, T. A. Use of seasonal streamflow forecasts in water resources management. Journal of Hydrology, 270(1), 135-144.

Clark, M. P., Gangopadhyay, S., Brandon, D., Werner, K., Hay, L., Rajagopalan, B., & Yates, D. (2004a). A resampling procedure for generating conditioned daily weather sequences. Water Resources Research, 40(4).

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., & Wilby, R. (2004b). The Schaake shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields. Journal of Hydrometeorology, 5(1), 243-262.

Climate Prediction Center (2016). ENSO: Recent Evolution, Current Status and Predictions. National Oceanic and Atmospheric Administration, pp. 19-20.

Herman, J. D., Zeff, H. B., Lamontagne, J. R., Reed, P. M., & Characklis, G. W. (2016). Synthetic drought scenario generation to support bottom-up water supply vulnerability assessments. Journal of Water Resources Planning and Management, 04016050.

Katz, R. W. (1983). Statistical procedures for making inferences about precipitation changes simulated by an atmospheric general circulation model. Journal of the Atmospheric Sciences, 40(9), 2193-2201.

Kim, Y., & Palmer, R. (1997). Value of seasonal flow forecasts in Bayesian stochastic programming. Journal of Water Resources Planning and Management, 123(6), 327-335.

Kirsch, B. R. , Characklis, G. W., & Zeff, H. B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements: Practical improvement for generating synthetic streamflows. Journal of Water Resources Planning and Management, 139(4), 396-406.

Rajagopalan, B., & Lall, U. (1999). A k‐nearest‐neighbor simulator for daily precipitation and other weather variables. Water Resources Research35(10), 3089-3101.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Stedinger, J. R., & Kim, Y. O. (2010). Probabilities for ensemble forecasts reflecting climate information. Journal of hydrology, 391(1), 9-23.

Wilks, D. S. (2000). Diagnostic verification of the Climate Prediction Center long-lead outlooks, 1995-1998. Journal of Climate, 13, 2389-2403.

Wilks, D. S. (2002). Realizations of daily weather in forecast seasonal climate. Journal of Hydrometeorology, 3(2), 195-207.

Wilks, D. S. & Godfrey, C. M. (2002). Diagnostic verification of the IRI net assessment forecasts, 1997-2000. Journal of Climate, 15(11), 1369-1377.

Yates, D., Gangopadhyay, S., Rajagopalan, B., & Strzepek, K. A technique for generating regional climate scenarios using a nearest-neighbor algorithm. Water Resources Research, 39(7).

Synthetic Weather Generation: Part IV

Conditioning Synthetic Weather Generation on Climate Change Projections

This is the fourth blog post in a five part series on synthetic weather generators. You can read about common single-site parametric and non-parametric weather generators in Parts I and II, respectively, as well as multi-site generators of both types in Part III. Here I discuss how parametric and non-parametric weather generators can be modified to simulate weather that is consistent with climate change projections.

As you are all well aware, water managers are interested in finding robust water management plans that will consistently meet performance criteria in the face of hydrologic variability and change. One method for such analyses is to re-evaluate different water management plans across a range of potential future climate scenarios. Weather data that is consistent with these scenarios can be synthetically generated from a stochastic weather generator whose parameters or resampled data values have been conditioned on the climate scenarios of interest, and methods for doing so are discussed below.

Parametric Weather Generators

Most climate projections from GCMs are given in terms of monthly values, such as the monthly mean precipitation and temperature, as well as the variances of these monthly means. Wilks (1992) describes a method for adjusting parametric weather generators of the Richardson type in order to reproduce these projected means and variances. This method, described here, has been used for agricultural (Mearns et al., 1996; Riha et al., 1996) and hydrologic (Woodbury and Shoemaker, 2012) climate change assessments.

Recall from Part I that the first step of the Richardson generator is to simulate the daily precipitation states with a first order Markov chain, and then the precipitation amounts on wet days by independently drawing from a Gamma distribution. Thus, the total monthly precipitation is a function of the transition probabilities describing the Markov chain and the parameters of the Gamma distribution describing precipitation amounts. The transition probabilities of the Markov chain, p01 and p11, represent the probabilities of transitioning from a dry day to a wet day, or a wet day to another wet day, respectively. An alternative representation of this model shown by Katz (1983) is with two different parameters, π and d, where π = p01/(1 + p01 p11) and d = p11p01. Here π represents the unconditional probability of a wet day, and d represents the first-order autocorrelation of the Markov chain.

Letting SN be the sum of N daily precipitation amounts in a month (or equivalently, the monthly mean precipitation), Katz (1983) shows that if the precipitation amounts come from a Gamma distribution with shape parameter α and scale parameter β, the mean, μ, and variance, σ2, of SN are described by the following equations:

(1) μ(SN) = Nπαβ

(2) σ2(SN) = Nπαβ2[1 + α(1-π)(1+d)/(1-d)].

For climate change impact studies, one must find a set of parameters θ=(π,d,α,β) that satisfies the two equations above for the projected monthly mean precipitation amounts and variances of those means. Since there are only 2 equations but 4 unknowns, 2 additional constraints are required to fully specify the parameters (Wilks, 1992). For example, one might assume that the frequency and persistence of precipitation do not change, but that the mean and variance of the amounts distribution do. In that case, π and d would be unchanged from their estimates derived from the historical record, while α and β would be re-estimated to satisfy Equations (1) and (2). Other constraints can be chosen based on the intentions of the impacts assessment, or varied as part of a sensitivity analysis.

To modify the temperature-specific parameters, recall that in the Richardson generator, the daily mean and standard deviation of the non-precipitation variables are modeled separately on wet and dry days by annual Fourier harmonics. Standardized residuals of daily minimum and maximum temperature are calculated for each day by subtracting the daily mean and dividing by the daily standard deviation given by these harmonics. The standardized residuals are then modeled using a first-order vector auto-regression, or VAR(1) model.

For generating weather conditional on climate change projections, Wilks (1992) assumes that the daily temperature auto and cross correlation structure remains the same under the future climate so that the VAR(1) model parameters are unchanged. However, the harmonics describing the mean and standard deviation of daily minimum and maximum temperature must be modified to capture projected temperature changes. GCM projections of temperature changes do not usually distinguish between wet and dry days, but it is reasonable to assume the changes are the same on both days (Wilks, 1992). However, it is not reasonable to assume that changes in minimum and maximum temperatures are the same, as observations indicate that minimum temperatures are increasing by more than maximum temperatures (Easterling et al., 1997; Vose et al., 2005).

Approximating the mean temperature, T, on any day t as the average of that day’s mean maximum temperature, µmax(t), and mean minimum temperature, µmin(t), the projected change in that day’s mean temperature, ΔT(t), can be modeled by Equation 3:

(3) \Delta \overline{T}\left(t\right) = \frac{1}{2}\left[\mu_{min}\left(t\right) + \mu_{max}\left(t\right)\right] = \frac{1}{2} \left(CX_0 + CX_1\cos\left[\frac{2\pi\left(t-\phi\right)}{365}\right] + CN_0 + CN_1\cos\left[\frac{2\pi\left(t-\phi\right)}{365}\right]\right)

where CX0 and CN0 represent the annual average changes in maximum and minimum temperatures, respectively, and CX1 and CN1 the corresponding amplitudes of the annual harmonics. The phase angle, φ, represents the day of the year with the greatest temperature change between the current and projected climate, which is generally assumed to be the same for the maximum and minimum temperature. Since GCMs predict that warming will be greater in the winter than the summer, a reasonable value of φ is 21 for January 21st, the middle of winter (Wilks, 1992).

In order to use Equation 3 to estimate harmonics of mean minimum and maximum temperature under the projected climate, one must estimate the values of CX0, CN0, CX1 and CN1. Wilks (1992) suggests a system of four equations that can be used to estimate these parameters:

(4) ΔT = 0.5*(CX0 + CN0)

(5) Δ[T(JJA) – T(DJF)] = -0.895(CX1 + CN1)

(6) ΔDR(DJF) = CX0 − CN0 + 0.895(CX1 − CN1)

(7) ΔDR(JJA) = CX0 − CN0 − 0.895(CX1 − CN1)

where the left hand sides of Equations (4)-(7) represent the annual average temperature change, the change in temperature range between summer (JJF) and winter (DJF), the change in average diurnal temperature range (DR = µmax – µmin) in winter, and the change in average diurnal temperature range in summer, respectively. The constant ±0.895 is simply the average value of the cosine term in equation (3) evaluated at φ = 21 for the winter (+) and summer (−) seasons. The values for the left hand side of these equations can be taken from GCM projections, either as transient functions of time or as step changes.

Equations (4)-(7) can be used to estimate the mean minimum and maximum temperature harmonics for the modified weather generator, but the variance in these means may also change. Unlike changes in mean daily minimum and maximum temperature, it is fair to assume that changes in the standard deviation of these means are the same as each other and the GCM projections for changes in the standard deviation of daily mean temperature for both wet and dry days. Thus, harmonics modeling the standard deviation of daily minimum and maximum temperature on wet and dry days can simply be scaled by some factor σd’/ σd, where σd is the standard deviation of the daily mean temperature under the current climate, and σd’ is the standard deviation of the daily mean temperature under the climate change projection (Wilks, 1992). Like the daily mean temperature changes, this ratio can be specified as a transient function of time or a step change.

It should be noted that several unanticipated changes can occur from the modifications described above. For instance, if one modifies the probability of daily precipitation occurrence, this will change both the mean daily temperature (since temperature is a function of whether or not it rains) and its variance and autocorrelation (Katz, 1996). See Katz (1996) for additional examples and suggested modifications to overcome these potential problems.

Non-parametric Weather Generators

As described in Part II, most non-parametric and semi-parametric weather generators simulate weather data by resampling historical data. One drawback to this approach is that it does not simulate any data outside of the observed record; it simply re-orders them. Modifications to the simple resampling approach have been used in some stationary studies (Prairie et al., 2006; Leander and Buishand, 2009) as mentioned in Part II, and can be made for climate change studies as well. Steinschneider and Brown (2013) investigate several methods on their semi-parametric weather generator. Since their generator does have some parameters (specifically, transition probabilities for a spatially averaged Markov chain model of precipitation amounts), these can be modified using the methods described by Wilks (1992). For the non-parametric part of the generator, Steinschneider and Brown (2013) modify the resampled data itself using a few different techniques.

The first two methods they explore are common in climate change assessments: applying scaling factors to precipitation data and delta shifts to temperature data. Using the scaling factor method, resampled data for variable i, xi, are simply multiplied by a scaling factor, a, to produce simulated weather under climate change, axi. Using delta shifts, resampled data, xi, are increased (or decreased) by a specified amount, δ, to produce simulated weather under climate change, xi + δ.

Another more sophisticated method is the quantile mapping approach. This procedure is generally applied to parametric CDFs, but can also be applied to empirical CDFs, as was done by Steinschneider and Brown (2013). Under the quantile mapping approach, historical data of the random variable, X, are assumed to come from some distribution, FX, under the current climate. The CDF of X under climate change can be specified by a different target distribution, FX*. Simulated weather variables xi under current climate conditions can then be mapped to values under the projected climate conditions, xi*, by equating their values to those of the same quantiles in the target distribution, i.e. xi* = F*-1(F(xi)).

While simple, these methods are effective approaches for top-down or bottom-up robustness analyses. Unfortunately, what one often finds from such analyses is that there is a tradeoff between meeting robustness criteria in one objective, and sacrificing performance in another, termed regret. Fortunately, this tradeoff can occasionally be avoided if there is an informative climate signal that can be used to inform water management policies. In particular, skillful seasonal climate forecasts can be used to condition water management plans for the upcoming season. In order to evaluate these conditioned plans, one can generate synthetic weather consistent with such forecasts by again modifying the parameters or resampling schemes of a stochastic weather generator. Methods that can be used to modify weather generators consistent with seasonal climate forecasts will be discussed in my final blog post on synthetic weather generators.

Works Cited

Easterling, D. R., Horton, B., Jones, P. D., Peterson, T. C., Karl, T. R., Parker, D. E., et al. Maximum and minimum temperature trends for the globe. Science, 277(5324), 364-367.

Katz, R. W. (1983). Statistical procedures for making inferences about precipitation changes simulated by an atmospheric general circulation model. Journal of the Atmospheric Sciences, 40(9), 2193-2201.

Katz, R. W. (1996). Use of conditional stochastic models to generate climate change scenarios. Climatic Change, 32(3), 237-255.

Leander, R., & Buishand, T. A. (2009). A daily weather generator based on a two-stage resampling algorithm. Journal of Hydrology, 374, 185-195.

Mearns, L. O., Rosenzweig, C., & Goldberg, R. (1996). The effect of changes in daily and interannual climatic variability on CERES-Wheat: a sensitivity study. Climatic Change, 32, 257-292.

Prairie, J. R., Rajagopalan, B., Fulp, T. J., & Zagona, E. A. (2006). Modified K-NN model for stochastic streamflow simulation. Journal of Hydrologic Engineering11(4), 371-378.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Riha, S. J., Wilks, D. S., & Simoens, P. (1996). Impact of temperature and precipitation variability on crop model predictions. Climatic Change, 32, 293-311.

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49, 7205-7220.

Vose, R. S., Easterling, D. R., & Gleason, B. (2005). Maximum and minimum temperature trends for the globe: An update through 2004. Geophysical research letters, 32(23).

Wilks, D. S. (1992). Adapting stochastic weather generation algorithms for climate change studies. Climatic Change, 22(1), 67-84.

Woodbury, J., & Shoemaker, C. A. (2012). Stochastic assessment of long-term impacts of phosphorus management options on sustainability with and without climate change. Journal of Water Resources Planning and Management, 139(5), 512-519.

Synthetic Weather Generation: Part II

Non-Parametric Generators

This is the second part of a blog series on synthetic weather generators. In Part I, I described common parametric methods of weather generation. Here I am going to discuss a subset of non-parametric methods. There are several techniques for non-parametric weather generation (Apipattanavis et al., 2007): simulating from empirical distributions of wet and dry spells and precipitation amounts (Semenov and Porter, 1995); using simulated annealing for generating precipitation (Bárdossy, 1998) and neural networks for generating temperature (Trigo and Palutikof, 1999); fitting multivariate PDFs to several weather variables using Kernel-based methods (Rajagopalan et al, 1997); and resampling historical observations (Young, 1994). For brevity, all of the methods I will discuss employ the nearest neighbor, or k-nn, bootstrap resampling technique introduced by Lall and Sharma (1996). Many of you are probably familiar with this method from streamflow generation, as our group has been using it to disaggregate monthly and annual flows to daily values using an adaptation of Lall and Sharma’s method described by Nowak, et al. (2010).

The original purpose of the nearest-neighbor bootstrap was to reproduce the autocorrelation in simulated time series without assuming a parametric generating process (Lall and Sharma, 1996). Traditionally, auto-correlated time series are modeled using autoregressive (AR) or autoregressive moving average (ARMA) models, as described in Part I for simulating non-precipitation variables in parametric weather generators. Recall that in the AR model, the values of the non-precipitation variables at any given time step are linear combinations of the values of the non-precipitation variables at the previous time step(s), plus some Gaussian-distributed random error. Hence, ARMA models assume a linear generating process with independent, normally distributed errors. Non-parametric methods do not require that these assumptions be met.

The most common non-parametric method is the bootstrap, in which each simulated value is generated by independently sampling (with replacement) from the historical record (Efron, 1979; Efron and Tibshirani, 1993). This is commonly used to generate confidence intervals. If simulating time series, though, this method will not reproduce the auto-correlation of the observations. For this reason, the moving-blocks bootstrap has often been used, where “blocks” of consecutive values are sampled jointly with replacement (Kunsch, 1989; Vogel and Shallcross, 1996). This will preserve the correlation structure within each block, but not between blocks. The k-nn bootstrap, however, can reproduce the auto-correlation in the simulated time series without discontinuities between blocks (Lall and Sharma, 1996).

In the simplest case of the nearest neighbor bootstrap, the historical record is divided into overlapping blocks of length d, where d represents the dependency structure of the time series. So if d=2, then each value in the time series depends on the preceding two values. To simulate the next value in the time series, the most recent block of length d is compared to all historical blocks of length d to find the k “nearest” neighbors (hence k-nn). How “near” two blocks are to one another is determined using some measure of distance, such as Euclidean distance. The k nearest neighbors are then sorted and assigned a rank j from 1 to k, where 1 is the closest and k the furthest. The next simulated value will be that which succeeds one of the k nearest neighbors, where the probability of selecting the successor of each neighbor is inversely proportional to the neighbor’s rank, j.

P(j) = (1/j) / Σi=1,…,k (1/i)

To simulate the second value, the most recent block of length d will now include the first simulated value, as well as the last d-1 historical values, and the process repeats.

Lall and Sharma recommend that d and k be chosen using the Generalized Cross Validation score (GCV) (Craven and Wahba, 1979). However, they also note that choosing k = √n, where n is the length of the historical record, works well for 1 ≤ d ≤ 6 and n ≥ 100. Using d=1, k = √n and selecting neighbors only from the month preceding that which is being simulated, Lall and Sharma demonstrate the k-nn bootstrap for simulating monthly streamflows at a snowmelt-fed mountain stream in Utah. They find that it reproduces the first two log-space moments and auto-correlation in monthly flows just as well as an AR(1) model on the log-transformed flows, but does a much better job reproducing the historical skew.

Because of its simplicity and successful reproduction of observed statistics, Rajagopalan and Lall (1999) extend the k-nn bootstrap for weather generation. In their application, d is again set to 1, so the weather on any given day depends only on the previous day’s weather. Unlike Lall and Sharma’s univariate application of the nearest neighbor bootstrap for streamflow, Rajagopalan and Lall simulate multiple variables simultaneously. In their study, a six dimensional vector of solar radiation, maximum temperature, minimum temperature, average dew point temperature, average wind speed and precipitation is simulated each day by resampling from all historical vectors of these variables.

First, the time series of each variable is “de-seasonalized” by subtracting the calendar day’s mean and dividing by the calendar day’s standard deviation. Next, the most recent day’s vector is compared to all historical daily vectors from the same season using Euclidean distance. Seasons were defined as JFM, AMJ, JAS, and OND. The k nearest neighbors are then ranked and one probabilistically selected using the same Kernel density estimator given by Lall and Sharma. The vector of de-seasonalized weather variables on the day following the selected historical neighbor is chosen as the next simulated vector, after “re-seasonalizing” by back-transforming the weather variables. Rajagopalan and Lall find that this method better captures the auto- and cross-correlation structure between the different weather variables than a modification of the popular parametric method used by Richardson (1981), described in Part I of this series.

One shortcoming of the nearest neighbor bootstrap is that it will not simulate any values outside of the range of observations; it will only re-order them. To overcome this deficiency, Lall and Sharma (1996) suggest, and Prairie et al. (2006) implement, a modification to the k-nn approach. Using d=1 for monthly streamflows, a particular month’s streamflow is regressed on the previous month’s streamflow using locally-weighted polynomial regression on the k nearest neighbors. The order, p, of the polynomial is selected using the GCV score. The residuals of the regression are saved for future use in the simulation.

The first step in the simulation is to predict the expected flow in the next month using the local regression. Next, the k nearest neighbors to the current month are found using Euclidean distance and given the probability of selection suggested by Lall and Sharma. After a neighbor has been selected, its residual from the locally weighted polynomial regression is added to the expected flow given by the regression, and the process is repeated. This will yield simulated flows that are outside of those observed in the historical record.

One problem with this method is that adding a negative residual could result in the simulation of negative values for strictly positive variables like precipitation. Leander and Buishand (2009) modify this approach by multiplying the expected value from the regression by a positive residual factor. The residual factor is calculated as the quotient when the observation is divided by the expected value of the regression, rather than the difference when the expected value of the regression is subtracted from the observation.

Another noted problem with the k-nn bootstrap for weather generation is that it often under-simulates the lengths of wet spells and dry spells (Apipattanavis et al., 2007). This is because precipitation is intermittent, while the other weather variables are not, but all variables are included simultaneously in the k-nn resampling. Apipattanavis et al. (2007) developed a semi-parametric approach to solve this problem. In their weather generator, they adopt the Richardson approach to modeling the occurrence of wet and dry days through a lag-1 Markov process. Once the precipitation state is determined, the k-nn resampling method is used, but with only neighbors of that precipitation state considered for selection.

Steinschneider and Brown (2013) expand on the Apipattanavis et al. (2007) generator by incorporating wavelet decomposition to better capture low-frequency variability in weather simulation, as both parametric and non-parametric weather generators are known to under-simulate inter-annual variability (Wilks and Wilby, 1999). Their generator uses a method developed by Kwon et al. (2007) in which the wavelet transform is used to decompose the historical time series of annual precipitation totals into H orthogonal component series, zh (plus a residual noise component), that represent different low-frequency signals of annual precipitation totals. Each signal, zh, is then simulated separately using auto-regressive models. The simulated precipitation time series is the sum of these H component models. This approach is called WARM for wavelet auto-regressive model.

The time series of annual precipitation totals produced by the WARM simulation model is used to condition the sampling of daily weather data using Apipattanavis et al.’s method. First, the precipitation total in each year of the WARM simulation is compared to the precipitation in all historical years using Euclidean distance. As in the k-nn generator, the k nearest neighbors are ranked and given a probability of selection according to Lall and Sharma’s Kernel density function. Next, instead of sampling one of these neighbors, 100 of them are selected with replacement, so there will be some repeats in the sampled set. The Apipattanavis generator is then used to generate daily weather data from this set of 100 sampled years rather than the entire historical record. Effectively, this method conditions the simulation such that data from years most similar to the WARM-simulated year are given a greater probability of being re-sampled.

Another characteristic of the Apipattanavis et al. (2007) and Steinschneider and Brown (2013) weather generators is their ability to simulate correlated weather across multiple sites simultaneously. Part III discusses methods used by these authors and others to generate spatially consistent weather using parametric or non-parametric weather generators.

Works Cited

Apipattanavis, A., Podestá, G., Rajagopalan, B., & Katz, R. W. (2007). A semiparametric multivariate and multisite weather generator. Water Resources Research, 43.

Bárdossy, A. (1998). Generating precipitation time series using simulated annealing. Water Resources Research34(7), 1737-1744.

Craven, P., & Wahba, G. (1978). Smoothing noisy data with spline functions. Numerische Mathematik31(4), 377-403.

Efron, B. (1979). Bootstrap methods: another look at the jackknife. The annals of Statistics, 1-26.

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.

Kunsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 1217-1241.

Kwon, H. H., Lall, U., & Khalil, A. F. (2007). Stochastic simulation model for nonstationary time series using an autoregressive wavelet decomposition: Applications to rainfall and temperature. Water Resources Research43(5).

Lall, U., & Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research, 32(3), 679-693.

Leander, R., & Buishand, T. A. (2009). A daily weather generator based on a two-stage resampling algorithm. Journal of Hydrology, 374, 185-195.

Nowak, K., Prairie, J., Rajagopalan, B., & Lall, U. (2010). A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow. Water Resources Research, 46.

Prairie, J. R., Rajagopalan, B., Fulp, T. J., & Zagona, E. A. (2006). Modified K-NN model for stochastic streamflow simulation. Journal of Hydrologic Engineering11(4), 371-378.

Rajagopalan, B., & Lall, U. (1999). A k‐nearest‐neighbor simulator for daily precipitation and other weather variables. Water Resources Research35(10), 3089-3101.

Rajagopalan, B., Lall, U., Tarboton, D. G., & Bowles, D. S. (1997). Multivariate nonparametric resampling scheme for generation of daily weather variables. Stochastic Hydrology and Hydraulics11(1), 65-93.

Semenov, M. A., & Porter, J. R. (1995). Climatic variability and the modelling of crop yields. Agricultural and forest meteorology73(3), 265-283.

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49, 7205-7220.

Trigo, R. M., & Palutikof, J. P. (1999). Simulation of daily temperatures for climate change scenarios over Portugal: a neural network model approach.Climate Research13(1), 45-59.

Vogel, R. M., & Shallcross, A. L. (1996). The moving blocks bootstrap versus parametric time series models. Water Resources Research, 32(6), 1875-1882.

Wilks, D. S., & Wilby, R. L. (1999). The weather generation game: a review of stochastic weather models. Progress in Physical Geography, 23(3), 329-357.

Young, K. C. (1994). A multivariate chain model for simulating climatic parameters from daily data. Journal of Applied Meteorology33(6), 661-671.

Synthetic Weather Generation: Part I

Parametric Generators

As water resources systems engineers, we spend a lot of time evaluating and comparing the performance of different water management policies. This is often achieved by modeling the system behavior under each policy when subjected to historical or synthetically generated hydrology. Testing performance under synthetic hydrology allows us to see how well the policy generalizes across plausible futures. One can generate synthetic hydrologies by directly simulating future streamflows, or by indirectly simulating them through a hydrologic model fed by real or synthetically generated weather conditions. Weather generation methods are also useful for predicting how hydrology will change given seasonal climate forecasts or long-term climate change projections.

Our group seems to be fairly well-versed in direct streamflow generation methods, but less familiar with indirect, weather generation methods. There are several similarities, though. First, both streamflow and weather (precipitation, temperature, solar radiation, etc.) can be simulated through “parametric” or “non-parametric” methods. In parametric methods, the streamflow and weather characteristics are assumed to come from some known distribution, whose parameters are fit with the historical record. In non-parametric methods, no such distribution is assumed, and instead future realizations are simulated by resampling from the historical record. Some methods, called “semi-parametric,” combine elements of each of these.

This will be the first of a series of blog posts on weather generators. In this blog, I will describe a handful of parametric weather generation techniques. In Part II, I will describe several non-parametric and semi-parametric weather generators. In Part III, I will cover different techniques for correlating weather generated at multiple sites. Finally, in Parts IV and V, I will discuss different methods of conditioning weather generation on climate change projections and seasonal climate forecasts, respectively. These blog posts will by no means be an exhaustive review of the literature, but a quick glimpse into commonly used weather generation techniques.

Generating Precipitation

Most parametric weather generation methods stem from the weather generator developed by Richardson (1981). In Richardson’s simulator, daily minimum temperature, maximum temperature and solar radiation are generated simultaneously, conditional on the occurrence or not of precipitation. Since both temperature and solar radiation are conditioned on whether or not precipitation has occurred, the first step is to generate a sequence of wet and dry days. This is done by assuming the occurrence of precipitation is a first-order Markov process, meaning whether or not it rains today depends solely on whether or not it rained yesterday.

Using the method of moments (MM), one can easily estimate the probability that today will be wet given that yesterday was wet or dry. Denoting a wet day at time t as Wt and a dry day at time t as Dt,

P(Wt+1|Dt) = (# of dry days followed by a wet day) / (# of dry days)

P(Wt+1|Wt) = (# of wet days followed by a wet day) / (# of wet days)

And by the law of total probability:

P(Dt+1|Dt) = 1 – P(Wt+1|Dt)

P(Dt+1|Wt) = 1 – P(Wt+1|Wt)

These estimates of the transition probabilities are the same as the Maximum Likelihood Estimates (MLE).

Wilks (1999) finds that the first-order Markov process often works well, except in the case of really dry climates, where it tends to under-simulate the lengths of dry spells. In this case, one could instead fit higher (k-)order Markov models (Chin, 1977; Gates and Tong, 1976), where the occurrence of rainy days depends on whether or not it rained on each of the previous k days. Unfortunately, the number of parameters that need to be estimated for a Markov model of order k is 2k, so a linear increase in the order of the process requires an exponential increase in the number of estimated parameters. Since it is often only dry spell lengths that are under-simulated, one could instead fit a hybrid-order Markov model where the occurrence of a wet or dry day depends on the previous min(l, k) days, where l is the number of days since it last rained, and k is the order of the hybrid model (Stern and Coe, 1984).

Another way to better capture persistence is to simulate the occurrence of wet and dry days by alternating between simulating the lengths of wet spells (consecutive wet days) and dry spells. A first order Markov process will create wet and dry spells whose lengths are geometrically distributed with parameter p, where p = P(Wt+1|Dt) for dry spells, and P(Dt+1|Wt) for wet spells. Racsko et al. (1991) found that they could better simulate dry spells in Hungary by using a mixed geometric distribution, where the probability of getting a dry spell, X, of length x is:

P(X = x) = αp1(1-p1)x-1 + (1-α)p2(1-p2)x-1.

So in this model, (100×α)% of dry spells are generated from a geometric distribution with parameter p1, and the rest of them are generated from a geometric distribution with parameter p2, thus requiring the estimation of three parameters. Since only dry spells were being under-simulated by the one-parameter geometric distribution, Racsko et al. (1991) still simulated wet spells from the one-parameter geometric distribution, requiring a total of four parameter estimates. Alternatively, one could simulate wet and dry spells from the negative binomial distribution (Wilby et al., 1998), which requires four total parameters if used for both wet and dry spells, and three if used only for dry spells while the geometric is used for wet spells.

Regardless of how precipitation occurrences are determined, once a wet day has been simulated, the amount of rainfall on that day must also be generated. It is often assumed that the amount of rainfall on any given day is independent of the amount of rainfall on any other day. While the auto-correlation in precipitation amounts is statistically significant, the consequences of assuming independence are not (Wilks, 1999). Richardson assumes that rainfall amounts are generated from an exponential distribution, which only has one parameter, λ, that must be estimated. More commonly (Katz, 1977; Wilks, 1989, 1992) a Gamma distribution is assumed, which has two parameters, α and β. Some authors (Foufoula-Georgiou and Lettenmaier, 1987; Wilks, 1998; 1999; Woolhiser and Roldan, 1982) find that a mixed exponential distribution works best. Like the mixed geometric distribution, a mixed exponential distribution is a weighted average of two exponential distributions:

f(x) = αλ1exp(-λ1x) + (1-α) λ2exp(-λ2x)

So on (100×α)% of wet days, rainfall amounts are generated from an exponential distribution with parameter λ1, and the rest of the time they are generated from an exponential distribution with parameter λ2.

Each of these models has different numbers of parameters that must be estimated; 1 for exponential, 2 for gamma, and 3 for mixed exponential. Using additional parameters should improve the fit, but at the risk of overfitting the data. One method of choosing between these models is to use the Akaike Information Criterion (AIC) which compares the log likelihood function of the different fitted models but with a penalty for the number of parameters estimated. The model with the lowest AIC is said to lose the least information in representing the generating process of the data. Another option is the Bayesian Information Criterion (BIC), which more harshly penalizes the use of additional parameters (Wilks and Wilby, 1999).

In Richardson’s model, he lumps the data from 26 14-day periods in each year of the historical record and calculates the transition probabilities and parameters of the amounts distribution over each of those 2-week periods. He then fits a harmonic function to the time series of transition probabilities and parameter estimates to capture their seasonality. So each day, the transition probabilities and distribution parameters are determined by this continuous function (see Figure 4 from his paper, below). It is more common to simply calculate transition probabilities and parameters of the amounts distribution each month, and use those for precipitation generation each day of that month (Wilks and Wilby, 1999). Despite the discontinuity from month to month, this method does a good job reproducing the seasonal pattern of observed weather.


Generating Temperature and Solar Radiation

Once the daily rainfall occurrence and amounts have been generated, the daily minimum temperature, maximum temperature and solar radiation can be simulated. These must be conditioned on the occurrence of rainfall because they are not independent; a rainy day in July is likely to be cooler than a dry day in July due to, among other reasons, increased cloud cover blocking incident solar radiation. Clearly, then, temperature and radiation are not only correlated with precipitation, but with each other. Additionally, like precipitation occurrence, there is great persistence in daily temperature and daily radiation time series. One model that can account for the persistence in the daily time series of each variable, as well as the cross-correlation between them, is a vector auto-regression (VAR) (Richardson, 1981). Often, a VAR(1) model is used, meaning the auto-regression is of order 1. This is similar to a Markov model of order 1, where the values of the weather variables on a particular day depend only on the values of those variables on the previous day.

To fit the vector auto-regression, one must first calculate the historical time series of “residual” minimum temperature, maximum temperature and solar radiation (Richardson, 1981). Outside of the tropics, temperature and solar radiation tend to vary sinusoidally over the course of a year, with peaks in the summer and troughs in the winter. One must first “remove” this signal before fitting the auto-regressive model. Removing the signal will yield “residuals” to which the VAR(1) model will be fit. Using minimum temperature as an example, the average minimum temperature of every January 1st in the historical record that was wet will be calculated, as well as the standard deviation. This will be done every single day, and a Fourier series fit to the resulting time series of means and standard deviations. The same will be done for the minimum temperature on dry days.

To remove the signal, for each observation of minimum temperature on a wet day in the historical record, that day’s mean wet-day minimum temperature (predicted by the Fourier model) will be subtracted, and the resulting difference divided by that day’s standard deviation of wet-day minimum temperature (predicted by the Fourier model) to produce a residual. The analogous process is performed for each observation of minimum temperature on a dry day. What will result is a historical record of “residuals”, also called “standardized anomalies”; the “anomaly” is the difference between the observation and the mean, which is “standardized” by dividing by the standard deviation. See Figure 1 from Richardson (1981) for a visual depiction of this process for solar radiation. Note that if the residuals are not normal, it may be appropriate to first transform the raw data to normality using a Box-Cox transformation, as Parlange and Katz (2000) do for radiation and wind speed in their extension of the VAR(1) model to include daily mean wind speed and dew point.


Sometimes, one might not have a long enough or complete enough historical record to calculate reliable means and standard deviations of wet day and dry day temperature and solar radiation. In this case, the daily data can be lumped to calculate average monthly wet day and dry day temperature and solar radiation data over the historical record. Fourier series can be fit to the mean and standard deviation of these time series instead, with a few caveats. Consider fitting a harmonic function to mean monthly wet day temperature. If one is going to interpolate daily values from this function, several of the daily values within each month should be above the mean and several below. However, oftentimes the peak and trough of the harmonic will lie on the interpolated value. In this case, what is supposed to be the mean daily value for that month will actually be the minimum or maximum. To account for this, one can adjust the amplitudes of the harmonics using a method described by Epstein (1991). A second caveat is that the standard deviation in average monthly wet day temperature will be much smaller than the standard deviation in average daily wet day temperature. If the variance of some random variable, X, is σ2, the variance in its average is σ2/n, where n is the number of observations being averaged. For this reason, the standard deviation of the average monthly wet day temperature must first be multiplied by √n before fitting a Fourier series to the monthly values, which again should be adjusted using Epstein’s method.

The residual time series resulting from this process should be auto-correlated within a particular variable’s time series, and cross-correlated between the three variables of minimum temperature, maximum temperature and solar radiation. This is captured by a VAR(1) process:

z(t) = [A] z(t-1) + [B] ε(t)

where z(t) and z(t-1) are 3×1 vectors of the residuals of the three variables at times t and t-1­, respectively, ε(t) is a 3×1 vector of independent standard normal random variables at time t, and [A] and [B] are 3×3 matrices that together capture the auto- and cross-correlation between the time series of the three variables. [A] and [B] are determined by the following two equations:

[A] = [S1][S]-1

[B][B]T = [S] – [A][S1]T

where [S] is the covariance matrix of the three variables, and [S1] the matrix of lagged (by one time step) covariances of the three variables. [B] can be found using either Cholesky decomposition or spectral decomposition (Wilks, 2011, pp. 470-481, 500-504). Sometimes [A] and [B] will be similar throughout the year, while seasonal changes in the autocovariance structure may require different estimates for different times of the year (Wilks and Wilby, 1999).

Once all of these parameters have been fit, you can start generating synthetic weather!

Works Cited

Chin, E. H. (1977). Modeling daily precipitation occurrence process with Markov chain. Water Resources Research, 13, 949-956.

Epstein, E. S. (1991). On obtaining daily climatological values from monthly means. Journal of Climate, 4, 365-368.

FouFoula-Georgiou, E., & Lettenmaier, D. P. (1987). A Markov renewal model for rainfall occurrences. Water Resources Research, 23, 875-884.

Gates, P., & Tong, H. (1976). On Markov chain modeling to some weather data. Journal of Applied Meteorology, 15, 1145-1151.

Katz, R. W. (1977). Precipitation as a chain-dependent process. Journal of Applied Meteorology, 16, 671-676.

Parlange, M. B., & Katz, R. W. (2000). An extended version of the Richardson model for simulating daily weather variables. Journal of Applied Meteorology, 39, 610-622.

Racsko, P., Szeidl, L., & Semenov, M. (1991). A serial approach to local stochastic weather models. Ecological Modelling, 57, 27-41.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Stern, R. D., & Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society, A147, 1-34.

Wilby, R. L., Wigley, T. M. L., Conway, D., Jones, P. D., Hewitson, B. C., Main, J., & Wilks, D. S. (1998). Statistical downscaling of general circulation model output: a comparison of methods. Water Resources Research, 34, 2995-3008.

Wilks, D. S. (1989). Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resources Research, 25, 1429-1439.

Wilks, D. S. (1992). Adapting stochastic weather generation algorithms for climate change studies. Climatic Change, 22, 67-84.

Wilks, D. S. (1998). Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology, 210, 178-191.

Wilks, D. S. (1999). Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agricultural and Forest Meteorology, 93, 153-169.

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences (Vol. 100). Academic press.

Wilks, D. S., & Wilby, R. L. (1999). The weather generation game: a review of stochastic weather models. Progress in Physical Geography, 23(3), 329-357.

Woolhiser, D. A., & Roldan, J. (1982). Stochastic daily precipitation models. 2. A comparison of distribution amounts. Water Resources Research, 18, 1461-1468.