**Multi-Site Weather Generators**

This is the third part of a blog series on synthetic weather generators. In Parts I and II, I described common parametric and non-parametric methods of weather generation, respectively. Both of those posts discussed only how to generate daily weather at a single site. However, we are often interested in generating weather at multiple sites simultaneously, for instance, if we are using the weather as input to a distributed watershed model. If there is a high degree of spatial heterogeneity within the system being modeled, applying average weather conditions everywhere will not provide representative simulations of the hydrology, nor will independently generating weather at multiple sites within the system. Fortunately, several techniques have been developed to generate synthetic weather sequences at multiple nearby sites whose spatial correlations are consistent with observed data.

This blog will focus on two methods of generating correlated weather data: 1) driving multiple single-site parametric weather generators with spatially correlated random numbers (Wilks, 1998; Wilks, 1999; Wilks, 2008; Wilks, 2009; Srikanthan and Pegram, 2009; Baigorria and Jones, 2010) and 2) resampling concurrent weather data from multiple sites with a non-parametric weather generator (Buishand and Brandsma, 2001; Beersma and Buishand, 2003; Apipattanavis et al., 2007; Steinschneider and Brown, 2013). Other methods exist, such as jointly sampling weather at multiple sites from an empirical copula (Bardossy and Pegram, 2009; Serinaldi, 2009; Aghakouchak et al., 2010a; Aghakouchak et al., 2010b), modeling rainfall occurrence and amounts processes with generalized linear models (GLMs) (Kigobe et al., 2011), and disaggregating spatially averaged synthetic rainfall with a nonhomogeneous random cascade process (Jothityangkoon et al., 2000), but these will not be discussed here.

**Method 1**

The first method, driving multiple single-site weather generators with spatially correlated random numbers, is described by Wilks (1998) for use with parametric generators of the Richardson type described in Part I. In the single-site formulation of this model, the first step is to generate precipitation occurrences at each site. Normally, this is done by generating a uniform random number, *u* ~ U[0,1], and comparing it to the critical probability, *p*_{c}, of transitioning from the current state (wet or dry) to a wet state. If *u* ≤ *p*_{c}, the state transitions to a wet state; otherwise it transitions to a dry state.

An alternative to generating uniform random numbers for the transition computation is to generate standard normal random numbers, *w* ~ N(0,1). In this case, state transitions from the current state to a wet state occur if *w* ≤ Φ^{-1}(*p*_{c}), where Φ^{-1}(·) is the inverse of the standard normal CDF. The benefit of using this alternative is that it is easy to generate correlated normally distributed random variables (Wilks, 2011, pp. 499-500), and we will need to correlate the random number streams generated at each of *K* sites to ensure realistic weather simulations.
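To make the two transition rules concrete, here is a minimal single-site sketch. The function names and the transition probabilities `p01` and `p11` are hypothetical values chosen for illustration, not parameters from Wilks (1998). It uses `statistics.NormalDist` from the Python standard library for Φ^{-1}.

```python
import numpy as np
from statistics import NormalDist

def next_state_uniform(u, p_c):
    """Wet (1) if the uniform draw u is at or below the critical probability."""
    return int(u <= p_c)

def next_state_normal(w, p_c):
    """Same rule driven by a standard normal draw: wet if w <= Phi^{-1}(p_c)."""
    return int(w <= NormalDist().inv_cdf(p_c))

rng = np.random.default_rng(0)
p01, p11 = 0.3, 0.6      # hypothetical P(wet | dry yesterday), P(wet | wet yesterday)
n_days = 20000
w = rng.standard_normal(n_days)
states = np.zeros(n_days, dtype=int)   # start in a dry state
for t in range(1, n_days):
    p_c = p11 if states[t - 1] == 1 else p01   # critical probability depends on yesterday
    states[t] = next_state_normal(w[t], p_c)

# For a two-state first-order chain the long-run wet fraction is p01 / (1 + p01 - p11)
wet_fraction = states.mean()
expected_fraction = p01 / (1 + p01 - p11)
```

Because Φ(*w*) is uniform on [0,1] when *w* is standard normal, the two rules produce the same occurrence process; the normal version is simply easier to correlate across sites.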

Let *X*_{t}(*k*) and *X*_{t}(*l*) be the precipitation states on day *t* at sites *k* and *l*, respectively, where *X* = 0 corresponds to a dry day and *X* = 1 corresponds to a wet day. In order to simulate realistic sequences of precipitation states at these two sites, one must find the correlation between the normally distributed random variables driving the transitions, ω^{o}(*k*,*l*) = ρ(*w*_{t}(*k*), *w*_{t}(*l*)), that reproduces the historical correlation between the precipitation states, ξ^{o}(*k*,*l*) = ρ(*X*_{t}(*k*), *X*_{t}(*l*)). Unfortunately, this relationship must be calculated empirically by simulating precipitation states at two sites for different values of ω and calculating the corresponding values of ξ. This must be done for every one of the *K*(*K*-1)/2 station pairs. Figure 1 shows one such relationship for the July transition parameters at Ithaca and Freeville from Wilks (1998).
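The empirical mapping from ω to ξ for a single station pair could be sketched roughly as follows. The transition probabilities, the target correlation `xi_obs`, and the function name are all made up for illustration, and the linear interpolation used to invert the curve is my own simplification rather than the procedure in Wilks (1998).

```python
import numpy as np
from statistics import NormalDist

def simulated_state_correlation(omega, p01, p11, z1, z2):
    """Return xi, the correlation between simulated wet/dry states at two sites
    whose driving standard normals have correlation omega."""
    # Reuse the same z1, z2 for every omega so the xi(omega) curve is smooth
    w = np.column_stack([z1, omega * z1 + np.sqrt(1.0 - omega**2) * z2])
    inv = NormalDist().inv_cdf
    # thresholds[s][prev] = Phi^{-1}(p_c) at site s given yesterday's state prev
    thresholds = [(inv(p01[s]), inv(p11[s])) for s in range(2)]
    x = np.zeros((len(z1), 2), dtype=int)
    for t in range(1, len(z1)):
        for s in range(2):
            x[t, s] = int(w[t, s] <= thresholds[s][x[t - 1, s]])
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

rng = np.random.default_rng(0)
n_days = 10000
z1, z2 = rng.standard_normal(n_days), rng.standard_normal(n_days)
p01, p11 = (0.30, 0.35), (0.60, 0.65)   # hypothetical values for sites k and l

omegas = np.linspace(0.0, 0.99, 10)
xis = np.array([simulated_state_correlation(om, p01, p11, z1, z2) for om in omegas])

xi_obs = 0.5   # hypothetical historical state correlation
# np.interp assumes xis is increasing in omega, which the shared draws encourage
omega_o = float(np.interp(xi_obs, xis, omegas))
```

In a real application this loop would be repeated for every station pair and for each month's transition parameters.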

Once the relationship between ξ and ω has been mapped for each station pair, one can determine the correlation ω^{o}(*k*,*l*) that should be used for the weather generator. In the case of Ithaca and Freeville, the historical correlation in precipitation states between the two sites, ξ^{o}, is 0.800. The dotted line in Figure 1 shows that ξ^{o} = 0.8 corresponds to a value of ω^{o} = 0.957. Once all of the *K*(*K*-1)/2 pairs of correlations ω^{o}(*k*,*l*) have been determined, one can generate a daily vector of standard normal random variables, *w*_{t}, from a multivariate normal distribution with mean vector **0** and correlation matrix [Ω], whose elements are the correlations ω^{o}(*k*,*l*). One drawback to this method is that, while it does a good job replicating the lag 0 correlations between sites, it tends to under-represent lag 1 correlations between them (Wilks, 1998).
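As a rough illustration of this step, the sketch below draws the daily vectors from a multivariate normal distribution using a Cholesky factor of [Ω] and then applies the transition rule at each site. The 3-site matrix `Omega` and the transition probabilities are invented for the example; the Cholesky approach is one standard way to generate correlated normals, not necessarily the one used in the cited papers.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)

# Hypothetical correlation matrix [Omega] of omega^o(k,l) for K = 3 sites
Omega = np.array([[1.0, 0.9, 0.7],
                  [0.9, 1.0, 0.8],
                  [0.7, 0.8, 1.0]])
L = np.linalg.cholesky(Omega)   # raises LinAlgError if Omega is not positive definite

n_days, K = 5000, 3
# Each row is one daily vector w_t with mean 0 and correlation matrix Omega
w = rng.standard_normal((n_days, K)) @ L.T

# Hypothetical site-specific transition probabilities
p01 = np.array([0.30, 0.35, 0.25])   # P(wet | dry yesterday)
p11 = np.array([0.60, 0.65, 0.55])   # P(wet | wet yesterday)
z01 = np.array([NormalDist().inv_cdf(p) for p in p01])
z11 = np.array([NormalDist().inv_cdf(p) for p in p11])

X = np.zeros((n_days, K), dtype=int)   # all sites start dry
for t in range(1, n_days):
    thresh = np.where(X[t - 1] == 1, z11, z01)   # Phi^{-1}(p_c) at each site
    X[t] = (w[t] <= thresh).astype(int)

sample_corr_w = np.corrcoef(w, rowvar=False)   # should be close to Omega
```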

As in the single-site weather generator, after simulating whether or not rain occurs at a particular site each day, the next step is to determine the depth of rain on wet days. Not surprisingly, the conditional distribution of rainfall amounts at a particular site depends on whether or not it is raining at nearby sites. A convenient way to accommodate this is to generate the precipitation amounts at wet sites from the mixed exponential distribution (Wilks, 1998).

Recall that the mixed exponential distribution, f(*x*) = αλ_{1}exp(-λ_{1}*x*) + (1–α) λ_{2}exp(-λ_{2}*x*), has three parameters: two rate parameters, λ_{1} and λ_{2} where λ_{1} < λ_{2}, and a weight parameter, α. Thus, with probability α the rainfall amounts are generated from an exponential distribution with rate λ_{1}, and with probability 1–α they are generated from an exponential distribution with rate λ_{2}. Since the mean of the exponential distribution is 1/λ and λ_{1} < λ_{2}, rainfall amounts generated from an exponential distribution with rate λ_{1} will have a greater mean than those generated from an exponential distribution with rate λ_{2}. Therefore, rainfall amounts at sites neighboring dry sites are more likely to have come from an exponential distribution with rate λ_{2}, while rainfall amounts at wet sites neighboring other wet sites are more likely to have come from an exponential distribution with rate λ_{1}.
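A quick numerical check of the mixed exponential may help here. The parameter values below are arbitrary, and the sampler is a plain unconditional draw, so it ignores what is happening at neighboring sites. The overall mean should be α/λ_{1} + (1–α)/λ_{2}.

```python
import numpy as np

def sample_mixed_exponential(alpha, lam1, lam2, size, rng):
    """Draw from f(x) = alpha*lam1*exp(-lam1*x) + (1-alpha)*lam2*exp(-lam2*x)."""
    use_first = rng.random(size) < alpha          # component 1 with probability alpha
    rates = np.where(use_first, lam1, lam2)
    return rng.exponential(scale=1.0 / rates)     # NumPy parameterizes by the mean 1/lambda

rng = np.random.default_rng(2)
alpha, lam1, lam2 = 0.4, 0.1, 0.5   # hypothetical; component means are 10 and 2
x = sample_mixed_exponential(alpha, lam1, lam2, 200000, rng)

theoretical_mean = alpha / lam1 + (1 - alpha) / lam2   # 0.4*10 + 0.6*2 = 5.2
sample_mean = x.mean()
```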

Keeping this in mind, one can conditionally generate rainfall amounts at a given site, *k*, from one of these two exponential distributions. As stated earlier, if *w*(*k*) ≤ Φ^{-1}(*p*_{c}(*k*)), or equivalently if Φ(*w*(*k*)) < *p*_{c}(*k*), then rainfall is generated at site *k*. Because the transition probabilities at nearby sites are correlated, if Φ(*w*(*k*)) << *p*_{c}(*k*), then it is very likely that nearby sites also transitioned to a wet state, so rainfall amounts are likely high. Therefore, if Φ(*w*(*k*))/*p*_{c}(*k*) ≤ *α*(*k*), rainfall at the site should be generated from the exponential distribution with the greater mean, that with rate λ_{1}. On the contrary, if Φ(*w*(*k*)) is less than, but close to, *p*_{c}(*k*), it is very likely that nearby sites did not transition to a wet state, so rainfall amounts are likely lower. Therefore, if Φ(*w*(*k*))/*p*_{c}(*k*) > *α*(*k*), rainfall at the site should be generated from the exponential distribution with the smaller mean, that with rate λ_{2} (Wilks, 1998).
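The selection rule above can be written in a few lines. This is only a sketch: `choose_rate` is a hypothetical helper name, and the numbers in the usage lines are arbitrary.

```python
from statistics import NormalDist

def choose_rate(w_k, p_c, alpha, lam1, lam2):
    """Return None on a dry day; otherwise the exponential rate for site k.

    w_k   : standard normal draw that drove the occurrence at site k
    p_c   : critical probability of transitioning to a wet state at site k
    alpha : mixing weight of the mixed exponential at site k
    """
    u = NormalDist().cdf(w_k)      # Phi(w(k))
    if u > p_c:
        return None                # no rain at this site today
    # Phi(w)/p_c well below 1 suggests neighbors are wet too: use the larger mean (lam1)
    return lam1 if u / p_c <= alpha else lam2

rate_far_below = choose_rate(-2.0, 0.5, 0.4, 0.1, 0.5)    # Phi(-2) is about 0.023
rate_just_below = choose_rate(-0.1, 0.5, 0.4, 0.1, 0.5)   # Phi(-0.1) is about 0.46
rate_dry = choose_rate(1.0, 0.5, 0.4, 0.1, 0.5)           # Phi(1) is about 0.84
```

One reason this rule is consistent with the fitted distribution: given that the day is wet, Φ(*w*(*k*))/*p*_{c}(*k*) is uniformly distributed between 0 and 1, so rate λ_{1} is still selected with probability α.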

This is a coarse method that can be used to decide on the rate parameter for rainfall generation; Wilks (1998) also describes a method to continuously vary the smaller rate parameter to improve the simulation of precipitation fields. Regardless of which method is chosen, once one has decided on the rate parameter, the rainfall amount itself can be generated from the exponential distribution with this parameter. Just as the precipitation state can be determined by generation of a standard normal random variable, so too can the precipitation amount. Letting the rainfall amount at a particular site *k* equal *r*(*k*), the depth of precipitation is simulated as *r*(*k*) = *r*_{min} – (1/λ) ln(Φ(*v*(*k*))), where *v*(*k*) is a standard normal random variable, *r*_{min} is the minimum threshold below which a day is recorded as dry and λ is the rate parameter determined earlier.

The final step in precipitation simulation then is to generate a daily vector of correlated standard normal random variables, **v**_{t}, that reproduces the historical spatial correlation structure in precipitation amounts across sites. This procedure is analogous to that used to reproduce the historical spatial correlation structure in precipitation *occurrences* across sites. However, Wilks (1998) found that this procedure did not always produce positive definite correlation matrices [Z] for generation of the multivariate normal random variables, **v**_{t}. To overcome this problem, the correlations ζ^{o}(*k*,*l*) making up [Z] can be determined as smooth functions of the distance between station pairs. See Wilks (1998) for further details.
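Two small pieces of this step are easy to sketch: the transformation from a standard normal draw to a rainfall depth, and a check of whether a candidate [Z] is positive definite before trying to sample from it. The default `r_min` of 0.3 and the example matrices are assumptions for illustration; the eigenvalue test is a generic check, not something taken from Wilks (1998).

```python
import numpy as np
from statistics import NormalDist

def rainfall_amount(v_k, lam, r_min=0.3):
    """r(k) = r_min - (1/lam) * ln(Phi(v(k))), with r_min a hypothetical threshold."""
    return r_min - np.log(NormalDist().cdf(v_k)) / lam

def is_positive_definite(Z):
    """True if every eigenvalue of the symmetric matrix Z is strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(Z) > 0))

# Phi(0) = 0.5, so the depth is 0.3 - ln(0.5)/0.5, roughly 1.686
example_depth = rainfall_amount(0.0, lam=0.5)

# A valid 2-site matrix and an inconsistent 3-site matrix (invented for illustration)
Z_ok = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
Z_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
ok_flag = is_positive_definite(Z_ok)
bad_flag = is_positive_definite(Z_bad)
```

Since Φ(*v*) lies between 0 and 1, the logarithm is negative and every simulated depth exceeds *r*_{min}.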

After generating spatially correlated precipitation, one must then generate spatially correlated minimum temperature, maximum temperature and solar radiation. Recall from Part I that in single-site weather generators of the Richardson type, these three variables are generated from a first-order vector auto-regressive (VAR(1)) model. As Wilks (1999) points out, this method can easily be extended by fitting a VAR(1) model to the simultaneous minimum temperature, maximum temperature and solar radiation across all sites, rather than just at one site. This increases the dimensionality from 3 for the 3 variables at one site, to 3*K* for the 3 variables at *K* sites. However, Wilks cautions that this may be difficult due to limited solar radiation data at many sites and inconsistent correlations if the record lengths are different at the *K* sites. See Wilks (1999) for details on how one can adjust the observed historical correlations such that they are mutually consistent.
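For readers who want to see the mechanics, here is a minimal sketch of fitting and simulating a VAR(1) model of the form z_t = A z_{t-1} + B ε_t from lag-0 and lag-1 covariance matrices. It assumes the input is already a matrix of standardized anomalies with one column per variable-site combination, and it ignores the conditioning on wet and dry days as well as the consistency adjustments discussed by Wilks (1999). The function names and the test parameters are hypothetical.

```python
import numpy as np

def fit_var1(z):
    """Fit z_t = A z_{t-1} + B eps_t to an (n_days, m) array of standardized anomalies."""
    z_prev, z_next = z[:-1], z[1:]
    n = len(z_prev)
    M0 = z_prev.T @ z_prev / n          # lag-0 covariance
    M1 = z_next.T @ z_prev / n          # lag-1 cross-covariance E[z_t z_{t-1}^T]
    A = M1 @ np.linalg.inv(M0)
    BBt = M0 - A @ M1.T                 # covariance of the innovations B eps_t
    B = np.linalg.cholesky(BBt)         # one valid choice of B
    return A, B

def simulate_var1(A, B, n_days, rng):
    """Generate n_days of anomalies from the fitted model, starting from zero."""
    m = A.shape[0]
    z = np.zeros((n_days, m))
    for t in range(1, n_days):
        z[t] = A @ z[t - 1] + B @ rng.standard_normal(m)
    return z

# Self-check with m = 3K = 6 (K = 2 sites) and made-up true parameters
rng = np.random.default_rng(3)
A_true = 0.5 * np.eye(6)
B_true = np.sqrt(0.75) * np.eye(6)
z = simulate_var1(A_true, B_true, 20000, rng)
A_hat, B_hat = fit_var1(z)
```

With real data, the matrix `BBt` can fail to be positive definite when the correlations are estimated from records of different lengths, which is one way the inconsistency problem mentioned above shows up.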

The method just described will generate weather at multiple sites with long enough historical records to fit the weather generator parameters. One can also interpolate these parameters using locally weighted regressions (Wilks, 2008) to generate weather at arbitrary locations between stations with historical weather records. Wilks (2009) shows how to do just that to generate gridded weather data, which is particularly useful for driving distributed hydrologic models.

**Method 2**

The second method, resampling concurrent weather data from multiple sites with a non-parametric weather generator, is a simpler method and a natural extension of the non-parametric single-site weather generator. As discussed in Part II, most non-parametric weather generators probabilistically re-sample weather data from the historical record based on how “near” the most recently generated daily weather characteristics are to previous days in the historical record. “Nearness” is typically measured using Euclidean distance.

Buishand and Brandsma (2001) modify the single-site approach to generate weather at multiple sites by describing nearness in terms of Euclidean distance of *spatially averaged* temperature and precipitation standardized anomalies. The simulation begins with a randomly selected day from the historical record. The temperature and precipitation on that day are standardized by subtracting the historical daily mean and dividing by the historical daily standard deviation. The standardized temperature and precipitation anomalies at each site are then averaged and compared to the spatially averaged standardized anomalies on all other historical days within some window of the current day using Euclidean distance. One of the *k* nearest neighbors (*k*-nn) is then probabilistically selected using the kernel density estimator derived by Lall and Sharma (1996), and the weather at all sites on the following day in the historical record is chosen to be the weather on the next simulated day.
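A stripped-down sketch of this resampling loop is shown below. To keep it short I make several simplifying assumptions that depart from the description above: the "historical" data are random stand-ins, anomalies are standardized with a single overall mean and standard deviation per site rather than calendar-day statistics, and all historical days are candidates rather than only those within a seasonal window. The neighbor weights follow the 1/j form of the Lall and Sharma (1996) kernel, and k = √n is used as a common heuristic.

```python
import numpy as np

def knn_select(current, candidates, k, rng):
    """Pick one of the k nearest candidate days, with weight proportional to 1/rank."""
    dist = np.sqrt(((candidates - current) ** 2).sum(axis=1))   # Euclidean distance
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    return nearest[rng.choice(k, p=weights)]

def standardize(a):
    """Simplification: one mean and standard deviation per site for the whole record."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

rng = np.random.default_rng(4)
n_days, K = 365, 3
temp = rng.normal(15.0, 5.0, (n_days, K))      # synthetic stand-in for observations
precip = rng.gamma(0.5, 4.0, (n_days, K))

# Spatially averaged standardized anomalies: one (temperature, precipitation) pair per day
features = np.column_stack([standardize(temp).mean(axis=1),
                            standardize(precip).mean(axis=1)])

k = int(np.sqrt(n_days))
day = int(rng.integers(n_days - 1))            # random starting day
sim_days = [day]
for _ in range(30):
    # The last historical day has no successor, so it is excluded from the candidates
    neighbor = knn_select(features[day], features[:-1], k, rng)
    day = int(neighbor) + 1                    # weather on the day following the neighbor
    sim_days.append(day)

sim_temp = temp[sim_days]       # all K sites are resampled together
sim_precip = precip[sim_days]
```

Because whole days are resampled across all sites at once, the spatial correlation in the historical record is carried into the simulation automatically.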

While this non-parametric method of multi-site weather generation is much simpler than the parametric method of Wilks (1998), Mehrotra et al. (2006) found that the Wilks multi-site weather generator was better able to accommodate varying orders of serial dependence at each site and still maintain the historical correlation structure across sites. Similarly, Young (1994) found that the *k*-nn weather generation approach tended to under-simulate the lengths of wet and dry spells. For this reason, Apipattanavis et al. (2007) adopt a semi-parametric multi-site weather generation approach that combines elements of each method.

The Apipattanavis generator, also employed by Steinschneider and Brown (2013), first generates spatially averaged precipitation *states* with a Markov chain model. To begin the simulation, the historical precipitation record is first converted to a time series of spatially averaged precipitation. If the spatially averaged rainfall on the first simulated day is below 0.3 mm, it is classified as a dry day. If it is greater than the 80^{th} percentile of spatially averaged daily precipitation in the simulated month, it is classified as extremely wet. Otherwise it is simply wet. A three-state, first-order Markov chain is then used to simulate the next day’s spatially averaged precipitation state. One of the *k*-nearest neighbors from the historical record *with the same transition* (e.g. wet to extremely wet) is selected as in Buishand and Brandsma (2001). The weather at each site on the next simulated day is then chosen to be the weather at each site on the day following the selected neighbor. The main difference between these two methods, then, is that Buishand and Brandsma (2001) consider all historical neighbors within some window of the simulated day, while Apipattanavis et al. (2007) only consider those with the same transition in precipitation states as is simulated.
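The state classification and the transition-conditioned candidate pool could look something like the sketch below. It rests on assumptions of mine: the precipitation series is synthetic, the 80th percentile is computed over all days in each calendar month, and a single transition matrix is estimated for the whole record instead of one per month. The *k*-nn selection within the pool is omitted.

```python
import numpy as np

def classify_states(avg_precip, months, dry_threshold=0.3, q=0.8):
    """0 = dry, 1 = wet, 2 = extremely wet, from spatially averaged precipitation (mm)."""
    states = np.ones(len(avg_precip), dtype=int)
    for m in np.unique(months):
        in_month = months == m
        cutoff = np.quantile(avg_precip[in_month], q)    # monthly 80th percentile
        states[in_month & (avg_precip > cutoff)] = 2
    states[avg_precip < dry_threshold] = 0               # dry overrides everything else
    return states

def transition_matrix(states, n_states=3):
    """Row-normalized counts of day-to-day transitions between states."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def candidates_with_transition(states, s_from, s_to):
    """Indices of historical days in state s_from that were followed by state s_to."""
    return np.where((states[:-1] == s_from) & (states[1:] == s_to))[0]

rng = np.random.default_rng(5)
months = np.repeat(np.arange(1, 13), 30)                 # toy 360-day calendar
avg_precip = rng.gamma(0.4, 5.0, size=months.size)       # synthetic stand-in data

states = classify_states(avg_precip, months)
P = transition_matrix(states)

current = int(states[-1])
next_state = int(rng.choice(3, p=P[current]))            # simulate tomorrow's state
pool = candidates_with_transition(states, current, next_state)
```

The *k* nearest neighbors would then be chosen from `pool` only, and the weather at all sites on the day after the selected neighbor becomes the next simulated day.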

All of the methods described above can reasonably reproduce the spatial and temporal dependence structure of weather at multiple sites, while maintaining the moments at each site. Additionally, they can be modified to be conditional on short-term, seasonal or long-term forecasts of climate change. That will be the subject of the next blog in this five-part series.

**Works Cited**

Aghakouchak, A., Bárdossy, A. & Habib, E. (2010a). Conditional simulation of remotely sensed rainfall data using a non-Gaussian v-transformed copula. *Advances in Water Resources, 33*, 624-634.

Aghakouchak, A., Bárdossy, A. & Habib, E. (2010b). Copula-based uncertainty modelling: application to multisensor precipitation estimates. *Hydrological Processes, 24*, 2111-2124.

Apipattanavis, S., Podesta, G., Rajagopalan, B., & Katz, R. W. (2007). A semiparametric multivariate and multisite weather generator. *Water Resources Research, 43(11)*.

Baigorria, G. A., & Jones, J. W. (2010). GiST: A stochastic model for generating spatially and temporally correlated daily rainfall data. *Journal of Climate, 23(22)*, 5990-6008.

Bárdossy, A., & Pegram, G. G. S. (2009). Copula based multisite model for daily precipitation simulation. *Hydrology and Earth System Sciences, 13*, 2299-2314.

Beersma, J. J., & Buishand, T. A. (2003). Multi-site simulation of daily precipitation and temperature conditional on the atmospheric circulation. *Climate Research, 25*, 121-133.

Buishand, T. A., & Brandsma, T. (2001). Multisite simulation of daily precipitation and temperature in the Rhine basin by nearest-neighbor resampling. *Water Resources Research, 37(11)*, 2761-2776.

Jothityangkoon, C., Sivapalan, M., & Viney, N. R. (2000). Tests of a space-time model of daily rainfall in southwestern Australia based on nonhomogeneous random cascades. *Water Resources Research, 36(1)*, 267-284.

Kigobe, M., McIntyre, N., Wheater, H., & Chandler, R. (2011). Multi-site stochastic modelling of daily rainfall in Uganda. *Hydrological Sciences Journal, 56(1)*, 17-33.

Lall, U., & Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. *Water Resources Research, 32(3)*, 679-693.

Mehrotra, R., Srikanthan, R., & Sharma, A. (2006). A comparison of three stochastic multi-site precipitation occurrence generators. *Journal of Hydrology, 331(1-2)*, 280-292.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. *Water Resources Research, 17*, 182-190.

Serinaldi, F. (2009). A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. *Journal of Geophysical Research: Atmospheres, 114(D10).*

Srikanthan, R., & Pegram, G. G. S. (2009). A nested multisite daily rainfall stochastic generation model. *Journal of Hydrology, 371*, 142-153.

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. *Water Resources Research, 49*, 7205-7220.

Wilks, D. S. (1998). Multisite generalization of a daily stochastic precipitation generation model. *Journal of Hydrology, 210*, 178-191.

Wilks, D. S. (1999). Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. *Agricultural and Forest Meteorology, 96*, 85-101.

Wilks, D. S. (2008). High-resolution spatial interpolation of weather generator parameters using local weighted regressions. *Agricultural and Forest Meteorology, 148*, 111-120.

Wilks, D. S. (2009). A gridded multisite weather generator and synchronization to observed weather data. *Water Resources Research, 45(10)*.

Wilks, D. S. (2011). *Statistical methods in the atmospheric sciences* (Vol. 100). Academic Press.

Young, K. C. (1994). A multivariate chain model for simulating climatic parameters from daily data. *Journal of Applied Meteorology*, *33*(6), 661-671.
