Synthetic Weather Generation: Part I

Parametric Generators

As water resources systems engineers, we spend a lot of time evaluating and comparing the performance of different water management policies. This is often achieved by modeling the system behavior under each policy when subjected to historical or synthetically generated hydrology. Testing performance under synthetic hydrology allows us to see how well the policy generalizes across plausible futures. One can generate synthetic hydrologies by directly simulating future streamflows, or by indirectly simulating them through a hydrologic model fed by real or synthetically generated weather conditions. Weather generation methods are also useful for predicting how hydrology will change given seasonal climate forecasts or long-term climate change projections.

Our group seems to be fairly well-versed in direct streamflow generation methods, but less familiar with indirect, weather generation methods. There are several similarities, though. First, both streamflow and weather (precipitation, temperature, solar radiation, etc.) can be simulated through “parametric” or “non-parametric” methods. In parametric methods, the streamflow and weather characteristics are assumed to come from some known distribution, whose parameters are fit with the historical record. In non-parametric methods, no such distribution is assumed, and instead future realizations are simulated by resampling from the historical record. Some methods, called “semi-parametric,” combine elements of each of these.

This will be the first of a series of blog posts on weather generators. In this blog, I will describe a handful of parametric weather generation techniques. In Part II, I will describe several non-parametric and semi-parametric weather generators. In Part III, I will cover different techniques for correlating weather generated at multiple sites. Finally, in Parts IV and V, I will discuss different methods of conditioning weather generation on climate change projections and seasonal climate forecasts, respectively. These blog posts will by no means be an exhaustive review of the literature, but a quick glimpse into commonly used weather generation techniques.

Generating Precipitation

Most parametric weather generation methods stem from the weather generator developed by Richardson (1981). In Richardson’s simulator, daily minimum temperature, maximum temperature and solar radiation are generated simultaneously, conditional on the occurrence or not of precipitation. Since both temperature and solar radiation are conditioned on whether or not precipitation has occurred, the first step is to generate a sequence of wet and dry days. This is done by assuming the occurrence of precipitation is a first-order Markov process, meaning whether or not it rains today depends solely on whether or not it rained yesterday.

Using the method of moments (MM), one can easily estimate the probability that today will be wet given that yesterday was wet or dry. Denoting a wet day at time t as Wt and a dry day at time t as Dt,

P(Wt+1|Dt) = (# of dry days followed by a wet day) / (# of dry days)

P(Wt+1|Wt) = (# of wet days followed by a wet day) / (# of wet days)

And by the law of total probability:

P(Dt+1|Dt) = 1 – P(Wt+1|Dt)

P(Dt+1|Wt) = 1 – P(Wt+1|Wt)

These estimates of the transition probabilities are the same as the Maximum Likelihood Estimates (MLE).

Wilks (1999) finds that the first-order Markov process often works well, except in the case of really dry climates, where it tends to under-simulate the lengths of dry spells. In this case, one could instead fit higher (k-)order Markov models (Chin, 1977; Gates and Tong, 1976), where the occurrence of rainy days depends on whether or not it rained on each of the previous k days. Unfortunately, the number of parameters that need to be estimated for a Markov model of order k is 2k, so a linear increase in the order of the process requires an exponential increase in the number of estimated parameters. Since it is often only dry spell lengths that are under-simulated, one could instead fit a hybrid-order Markov model where the occurrence of a wet or dry day depends on the previous min(l, k) days, where l is the number of days since it last rained, and k is the order of the hybrid model (Stern and Coe, 1984).

Another way to better capture persistence is to simulate the occurrence of wet and dry days by alternating between simulating the lengths of wet spells (consecutive wet days) and dry spells. A first order Markov process will create wet and dry spells whose lengths are geometrically distributed with parameter p, where p = P(Wt+1|Dt) for dry spells, and P(Dt+1|Wt) for wet spells. Racsko et al. (1991) found that they could better simulate dry spells in Hungary by using a mixed geometric distribution, where the probability of getting a dry spell, X, of length x is:

P(X = x) = αp1(1-p1)x-1 + (1-α)p2(1-p2)x-1.

So in this model, (100×α)% of dry spells are generated from a geometric distribution with parameter p1, and the rest of them are generated from a geometric distribution with parameter p2, thus requiring the estimation of three parameters. Since only dry spells were being under-simulated by the one-parameter geometric distribution, Racsko et al. (1991) still simulated wet spells from the one-parameter geometric distribution, requiring a total of four parameter estimates. Alternatively, one could simulate wet and dry spells from the negative binomial distribution (Wilby et al., 1998), which requires four total parameters if used for both wet and dry spells, and three if used only for dry spells while the geometric is used for wet spells.

Regardless of how precipitation occurrences are determined, once a wet day has been simulated, the amount of rainfall on that day must also be generated. It is often assumed that the amount of rainfall on any given day is independent of the amount of rainfall on any other day. While the auto-correlation in precipitation amounts is statistically significant, the consequences of assuming independence are not (Wilks, 1999). Richardson assumes that rainfall amounts are generated from an exponential distribution, which only has one parameter, λ, that must be estimated. More commonly (Katz, 1977; Wilks, 1989, 1992) a Gamma distribution is assumed, which has two parameters, α and β. Some authors (Foufoula-Georgiou and Lettenmaier, 1987; Wilks, 1998; 1999; Woolhiser and Roldan, 1982) find that a mixed exponential distribution works best. Like the mixed geometric distribution, a mixed exponential distribution is a weighted average of two exponential distributions:

f(x) = αλ1exp(-λ1x) + (1-α) λ2exp(-λ2x)

So on (100×α)% of wet days, rainfall amounts are generated from an exponential distribution with parameter λ1, and the rest of the time they are generated from an exponential distribution with parameter λ2.

Each of these models has different numbers of parameters that must be estimated; 1 for exponential, 2 for gamma, and 3 for mixed exponential. Using additional parameters should improve the fit, but at the risk of overfitting the data. One method of choosing between these models is to use the Akaike Information Criterion (AIC) which compares the log likelihood function of the different fitted models but with a penalty for the number of parameters estimated. The model with the lowest AIC is said to lose the least information in representing the generating process of the data. Another option is the Bayesian Information Criterion (BIC), which more harshly penalizes the use of additional parameters (Wilks and Wilby, 1999).

In Richardson’s model, he lumps the data from 26 14-day periods in each year of the historical record and calculates the transition probabilities and parameters of the amounts distribution over each of those 2-week periods. He then fits a harmonic function to the time series of transition probabilities and parameter estimates to capture their seasonality. So each day, the transition probabilities and distribution parameters are determined by this continuous function (see Figure 4 from his paper, below). It is more common to simply calculate transition probabilities and parameters of the amounts distribution each month, and use those for precipitation generation each day of that month (Wilks and Wilby, 1999). Despite the discontinuity from month to month, this method does a good job reproducing the seasonal pattern of observed weather.


Generating Temperature and Solar Radiation

Once the daily rainfall occurrence and amounts have been generated, the daily minimum temperature, maximum temperature and solar radiation can be simulated. These must be conditioned on the occurrence of rainfall because they are not independent; a rainy day in July is likely to be cooler than a dry day in July due to, among other reasons, increased cloud cover blocking incident solar radiation. Clearly, then, temperature and radiation are not only correlated with precipitation, but with each other. Additionally, like precipitation occurrence, there is great persistence in daily temperature and daily radiation time series. One model that can account for the persistence in the daily time series of each variable, as well as the cross-correlation between them, is a vector auto-regression (VAR) (Richardson, 1981). Often, a VAR(1) model is used, meaning the auto-regression is of order 1. This is similar to a Markov model of order 1, where the values of the weather variables on a particular day depend only on the values of those variables on the previous day.

To fit the vector auto-regression, one must first calculate the historical time series of “residual” minimum temperature, maximum temperature and solar radiation (Richardson, 1981). Outside of the tropics, temperature and solar radiation tend to vary sinusoidally over the course of a year, with peaks in the summer and troughs in the winter. One must first “remove” this signal before fitting the auto-regressive model. Removing the signal will yield “residuals” to which the VAR(1) model will be fit. Using minimum temperature as an example, the average minimum temperature of every January 1st in the historical record that was wet will be calculated, as well as the standard deviation. This will be done every single day, and a Fourier series fit to the resulting time series of means and standard deviations. The same will be done for the minimum temperature on dry days.

To remove the signal, for each observation of minimum temperature on a wet day in the historical record, that day’s mean wet-day minimum temperature (predicted by the Fourier model) will be subtracted, and the resulting difference divided by that day’s standard deviation of wet-day minimum temperature (predicted by the Fourier model) to produce a residual. The analogous process is performed for each observation of minimum temperature on a dry day. What will result is a historical record of “residuals”, also called “standardized anomalies”; the “anomaly” is the difference between the observation and the mean, which is “standardized” by dividing by the standard deviation. See Figure 1 from Richardson (1981) for a visual depiction of this process for solar radiation. Note that if the residuals are not normal, it may be appropriate to first transform the raw data to normality using a Box-Cox transformation, as Parlange and Katz (2000) do for radiation and wind speed in their extension of the VAR(1) model to include daily mean wind speed and dew point.


Sometimes, one might not have a long enough or complete enough historical record to calculate reliable means and standard deviations of wet day and dry day temperature and solar radiation. In this case, the daily data can be lumped to calculate average monthly wet day and dry day temperature and solar radiation data over the historical record. Fourier series can be fit to the mean and standard deviation of these time series instead, with a few caveats. Consider fitting a harmonic function to mean monthly wet day temperature. If one is going to interpolate daily values from this function, several of the daily values within each month should be above the mean and several below. However, oftentimes the peak and trough of the harmonic will lie on the interpolated value. In this case, what is supposed to be the mean daily value for that month will actually be the minimum or maximum. To account for this, one can adjust the amplitudes of the harmonics using a method described by Epstein (1991). A second caveat is that the standard deviation in average monthly wet day temperature will be much smaller than the standard deviation in average daily wet day temperature. If the variance of some random variable, X, is σ2, the variance in its average is σ2/n, where n is the number of observations being averaged. For this reason, the standard deviation of the average monthly wet day temperature must first be multiplied by √n before fitting a Fourier series to the monthly values, which again should be adjusted using Epstein’s method.

The residual time series resulting from this process should be auto-correlated within a particular variable’s time series, and cross-correlated between the three variables of minimum temperature, maximum temperature and solar radiation. This is captured by a VAR(1) process:

z(t) = [A] z(t-1) + [B] ε(t)

where z(t) and z(t-1) are 3×1 vectors of the residuals of the three variables at times t and t-1­, respectively, ε(t) is a 3×1 vector of independent standard normal random variables at time t, and [A] and [B] are 3×3 matrices that together capture the auto- and cross-correlation between the time series of the three variables. [A] and [B] are determined by the following two equations:

[A] = [S1][S]-1

[B][B]T = [S] – [A][S1]T

where [S] is the covariance matrix of the three variables, and [S1] the matrix of lagged (by one time step) covariances of the three variables. [B] can be found using either Cholesky decomposition or spectral decomposition (Wilks, 2011, pp. 470-481, 500-504). Sometimes [A] and [B] will be similar throughout the year, while seasonal changes in the autocovariance structure may require different estimates for different times of the year (Wilks and Wilby, 1999).

Once all of these parameters have been fit, you can start generating synthetic weather!

Works Cited

Chin, E. H. (1977). Modeling daily precipitation occurrence process with Markov chain. Water Resources Research, 13, 949-956.

Epstein, E. S. (1991). On obtaining daily climatological values from monthly means. Journal of Climate, 4, 365-368.

FouFoula-Georgiou, E., & Lettenmaier, D. P. (1987). A Markov renewal model for rainfall occurrences. Water Resources Research, 23, 875-884.

Gates, P., & Tong, H. (1976). On Markov chain modeling to some weather data. Journal of Applied Meteorology, 15, 1145-1151.

Katz, R. W. (1977). Precipitation as a chain-dependent process. Journal of Applied Meteorology, 16, 671-676.

Parlange, M. B., & Katz, R. W. (2000). An extended version of the Richardson model for simulating daily weather variables. Journal of Applied Meteorology, 39, 610-622.

Racsko, P., Szeidl, L., & Semenov, M. (1991). A serial approach to local stochastic weather models. Ecological Modelling, 57, 27-41.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Stern, R. D., & Coe, R. (1984). A model fitting analysis of daily rainfall data. Journal of the Royal Statistical Society, A147, 1-34.

Wilby, R. L., Wigley, T. M. L., Conway, D., Jones, P. D., Hewitson, B. C., Main, J., & Wilks, D. S. (1998). Statistical downscaling of general circulation model output: a comparison of methods. Water Resources Research, 34, 2995-3008.

Wilks, D. S. (1989). Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resources Research, 25, 1429-1439.

Wilks, D. S. (1992). Adapting stochastic weather generation algorithms for climate change studies. Climatic Change, 22, 67-84.

Wilks, D. S. (1998). Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology, 210, 178-191.

Wilks, D. S. (1999). Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agricultural and Forest Meteorology, 93, 153-169.

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences (Vol. 100). Academic press.

Wilks, D. S., & Wilby, R. L. (1999). The weather generation game: a review of stochastic weather models. Progress in Physical Geography, 23(3), 329-357.

Woolhiser, D. A., & Roldan, J. (1982). Stochastic daily precipitation models. 2. A comparison of distribution amounts. Water Resources Research, 18, 1461-1468.

9 thoughts on “Synthetic Weather Generation: Part I

  1. Pingback: Synthetic weather generation | Software Engineer InformationSoftware Engineer Information

  2. Pingback: Synthetic Weather Generation: Part II | Water Programming: A Collaborative Research Blog

  3. Pingback: Synthetic Weather Generation: Part III – Water Programming: A Collaborative Research Blog

  4. Pingback: Synthetic Weather Generation: Part IV – Water Programming: A Collaborative Research Blog

  5. Pingback: Synthetic Weather Generation: Part V – Water Programming: A Collaborative Research Blog

  6. Pingback: A simple command for plotting autocorrelation functions in Matlab – Water Programming: A Collaborative Research Blog

  7. Pingback: Synthetic streamflow generation – Water Programming: A Collaborative Research Blog

  8. Pingback: Water Programming Blog Guide (Part 2) – Water Programming: A Collaborative Research Blog

  9. Pingback: Fitting Hidden Markov Models Part I: Background and Methods – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s