Interactive plotting basics in matplotlib

The main goals of this post are:

  • Summarize Matplotlib’s architecture in a few bullet points and point to documentation for more in-depth exploration.
  • Build two examples plotting multidimensional data with basic interactive capabilities.

1. Matplotlib architecture bullets

To make the most of Matplotlib’s interactive capabilities, it helps to understand how the library is structured. The current Matplotlib architecture is composed of three layers: the scripting layer, the artist layer, and the backend layer, which interact in the following way:

  • The data is either created or loaded in the scripting layer. This layer supports the programmatic interaction and gives users the ability to manipulate figures with a reasonably intuitive syntax.
  • The data is transformed into various objects in the artist layer and adjusted as scripted. This layer is responsible for the abstraction of each visual component that you see in a figure.
  • These objects are then rendered by the backend. This last layer enables users to create, render, and update the figure objects. Figures can be displayed and interacted with via common user interface events such as keyboard and mouse input.
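To make these three layers concrete, here is a minimal sketch (mine, not part of the original post) that touches each of them: the backend is selected explicitly, pyplot acts as the scripting layer, and the Figure, Axes, and Line2D objects it creates belong to the artist layer.

import matplotlib
matplotlib.use('Agg')            # backend layer: a non-interactive renderer, chosen explicitly
import matplotlib.pyplot as plt  # scripting layer

fig, ax = plt.subplots()                 # pyplot builds artist-layer containers (Figure, Axes)
line, = ax.plot([0, 1, 2], [0, 1, 4])    # artist layer: a Line2D object attached to the Axes
print(type(fig.canvas))                  # backend layer: the canvas that will render the artists
fig.savefig('layers_demo.png')           # the backend rasterizes the artists to a file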

Matplotlib has extensive documentation; however, the best way to learn about backends is to explore the source code. Some references I also found helpful, aside from the ever-useful Stack Overflow, are:

Raman, Kirthi. Mastering Python Data Visualization. Packt Publishing Limited, 2015.

Root, Benjamin V. “Interactive Applications Using Matplotlib.” (2015).

McGreggor, Duncan M. “Mastering matplotlib.” (2015).

2. Example: Fun click

Here’s a simple example that connects a 5D scatter plot to a pick event triggered by a mouse click. The goal is to show information about the point being clicked. In this case, the row index of the data matrix and the corresponding values are printed in the Python terminal, as shown in the following figure.

[Figure: 5D scatter plot with the selected point’s index and objective values printed in the terminal]

This could be useful when plotting a multidimensional Pareto front. If you see a solution of interest in your scatter plot, you can directly access its exact values and its index with a simple mouse click. This index may be useful for tracking the corresponding decision vector in your decision matrix. Also, note that even though I am only plotting a 5D scatter plot, I used a 6D data matrix, so I can see what the sixth value is in the terminal. The following code was used to generate the previous figure. Parts 2.1 through 2.5 were adapted from the Visualization strategies for multidimensional data post; refer to that link for a detailed explanation of these fragments.

2.1. Required Libraries

import pylab as plt
import mpl_toolkits.mplot3d.axes3d as p3  # registers the 3d projection
import numpy as np
import seaborn  # not required, only used for styling

 

2.2. Loading or generating data

# data = np.loadtxt('your_data.txt')
data = np.random.random((50, 6))  # random 6D data matrix used for this example
scale = 1000  # scale factor for the size objective (same value as in the second example)

 

2.3. Accessing Matplotlib’s object-oriented plotting

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
im = ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=data[:, 3], s=data[:, 4]*scale,
                alpha=1, cmap=plt.cm.spectral, picker=True)  # use plt.cm.nipy_spectral on newer Matplotlib

 

2.4. Setting the main axis labels

ax.set_xlabel('OBJECTIVE 1')
ax.set_ylabel('OBJECTIVE 2')
ax.set_zlabel('OBJECTIVE 3')

 

2.5. Setting the colorbar and its vertical label

cbar= fig.colorbar(im)
cbar.ax.set_ylabel('OBJECTIVE 4')

 

2.6. Setting size legend

objs = data[:, 4]
max_size = np.amax(objs)*scale
min_size = np.amin(objs)*scale
handles, labels = ax.get_legend_handles_labels()
display = (0, 1, 2)
size_max = plt.Line2D((0, 1), (0, 0), color='k', marker='o', markersize=max_size, linestyle='')
size_min = plt.Line2D((0, 1), (0, 0), color='k', marker='o', markersize=min_size, linestyle='')
legend1 = ax.legend([handle for i, handle in enumerate(handles) if i in display] + [size_max, size_min],
                    [label for i, label in enumerate(labels) if i in display] + ["%.2f" % np.amax(objs), "%.2f" % np.amin(objs)],
                    labelspacing=1.5, title='OBJECTIVE 5', loc=1, frameon=True, numpoints=1, markerscale=1)

 

2.7. Setting the picker function

This part defines and connects the events. The mpl_connect() method connects the event to the figure. It accepts two arguments: the name of the event and a callable object (such as a function).

def onpick(event):
    ind = event.ind[0]  # row index of the picked point
    print('index: %d\nobjective 1: %0.2f\nobjective 2: %0.2f\nobjective 3: %0.2f\n'
          'objective 4: %0.2f\nobjective 5: %0.2f\nobjective 6: %0.2f'
          % (ind, data[ind, 0], data[ind, 1], data[ind, 2], data[ind, 3], data[ind, 4], data[ind, 5]))

fig.canvas.mpl_connect('pick_event', onpick)
plt.show()

 

For a list of events that Matplotlib supports, please refer to: Matplotlib event options. To download the previous code, go to the following link: https://github.com/JazminZatarain/Basic-interactive-plotting/blob/master/fun_click.py

3. Example: Fun Annotation

Transitioning to a slightly more sophisticated interaction, we now use annotations to link the event to the figure. That is, the information is shown directly in the figure rather than in the terminal as in the previous example. In the following snippet, the row index of the data matrix is shown directly on the figure canvas by simply clicking on the point of interest.

[Figure: 3D scatter plot with the selected point’s row index annotated directly on the figure canvas]

The following code was adapted from Matplotlib’s documentation and this Stack Overflow post to generate the previous figure. Sections 3.1 and 3.2 define our scatter plot with its corresponding labels, as we saw in the previous example. Note that the helper functions in Sections 3.3 through 3.7 are defined inside the visualize3DData function from Section 3.2 (they use its fig and ax objects), which is why they appear indented below.

3.1. Required Libraries

import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import proj3d

 

3.2. Visualizing data in 3d plot with popover next to mouse position

def visualize3DData(X, scale, cmap):
    fig = plt.figure(figsize=(16, 10))
    ax = fig.add_subplot(111, projection='3d')
    im = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=X[:, 3], s=X[:, 4]*scale, cmap=cmap,
                    alpha=1, picker=True)
    ax.set_xlabel('OBJECTIVE 1')
    ax.set_ylabel('OBJECTIVE 2')
    ax.set_zlabel('OBJECTIVE 3')
    cbar = fig.colorbar(im)
    cbar.ax.set_ylabel('OBJECTIVE 4')
    objs = X[:, 4]
    max_size = np.amax(objs)*scale
    min_size = np.amin(objs)*scale
    handles, labels = ax.get_legend_handles_labels()
    display = (0, 1, 2)
    size_max = plt.Line2D((0, 1), (0, 0), color='k', marker='o', markersize=max_size, linestyle='')
    size_min = plt.Line2D((0, 1), (0, 0), color='k', marker='o', markersize=min_size, linestyle='')
    legend1 = ax.legend([handle for i, handle in enumerate(handles) if i in display] + [size_max, size_min],
                        [label for i, label in enumerate(labels) if i in display] + ["%.2f" % np.amax(objs), "%.2f" % np.amin(objs)],
                        labelspacing=1.5, title='OBJECTIVE 5', loc=1, frameon=True, numpoints=1, markerscale=1)

 

3.3. Return distance between mouse position and given data point

    def distance(point, event):
        # Return the distance between the mouse position and the given data point
        assert point.shape == (3,), "distance: point.shape is wrong: %s, must be (3,)" % point.shape

 

3.4. Project 3D data space to 2D data space

 
        x2, y2, _ = proj3d.proj_transform(point[0], point[1], point[2], plt.gca().get_proj())
        # Convert 2d data space to 2d screen space
        x3, y3 = ax.transData.transform((x2, y2))
        return np.sqrt((x3 - event.x)**2 + (y3 - event.y)**2)

 

3.5. Calculate which data point is closest to the mouse position

    def calcClosestDatapoint(X, event):
        distances = [distance(X[i, 0:3], event) for i in range(X.shape[0])]
        return np.argmin(distances)

    def annotatePlot(X, index):
        # If we have previously displayed another label, remove it first
        if hasattr(annotatePlot, 'label'):
            annotatePlot.label.remove()
        # Get the data point from the array of points X, at position index
        x2, y2, _ = proj3d.proj_transform(X[index, 0], X[index, 1], X[index, 2], ax.get_proj())

 

3.6. Specify the information to be plotted in the annotation label

        annotatePlot.label = plt.annotate("index: %d" % index,
                                          xy=(x2, y2), xytext=(-20, 20), textcoords='offset points',
                                          ha='right', va='bottom',
                                          bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                                          arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
        fig.canvas.draw()

 

3.7. Defining the event

This last part defines the event that is triggered when the mouse is moved. It also shows the text annotation over the data point closest to the mouse.

    def onMouseMotion(event):
        closestIndex = calcClosestDatapoint(X, event)
        annotatePlot(X, closestIndex)

    fig.canvas.mpl_connect('motion_notify_event', onMouseMotion)  # on mouse motion
    plt.show()

 

3.8. You can forget the previous code and simply insert your data here

if __name__ == '__main__':
    import seaborn  # not required, only used for styling
    # X = np.loadtxt('your_data.txt')  # your data goes here
    X = np.random.random((50, 6))  # randomly generated data for this example
    scale = 1000  # scale of the size objective
    cmap = plt.cm.spectral  # choose your colormap (plt.cm.nipy_spectral on newer Matplotlib)
    visualize3DData(X, scale, cmap)

 

To download the previous code, go to the following link: https://github.com/JazminZatarain/Basic-interactive-plotting/blob/master/annotacious.py

Visualizing multidimensional data: a brief historical overview

The results of a MOEA search are presented as a set of multidimensional data points. In order to form useful conclusions from our results, we must have the ability to comprehend the multidimensional differences between results and effectively analyze and communicate them to decision makers.

Navigating through multiple dimensions is an inherently complex task for the human mind. We perceive the world in three dimensions, and thinking in higher dimensional space can be heavily taxing.  The difficulty of comprehending multidimensional data is compounded when one must display the data on a two dimensional surface such as a sheet of paper or a computer screen. The challenge of “flattening” data has persisted for centuries, and has plagued not only those who were concerned with gleaning scientific insights from data, but also artists and those seeking to accurately portray the physical world as perceived by the human eye.

For much of human history, even the greatest artists were unable to accurately express the three dimensional world in a two dimensional plane. Nicolo da Bologna’s 14th century work, The Marriage, fails to convey any sense of three dimensional space, giving the viewer the impression that the figures painted have been pressed against a pane of glass.


Nicolo da Bologna’s The Marriage (1350s) is unable to convey any sense of depth to the viewer.

During the Italian Renaissance, artists rediscovered the mathematics of perspective, allowing them to break free of the constraints of their two dimensional canvas and convey realistic images that gave the illusion of a third dimension. Raphael’s The school of Athens masterfully uses perspective to imbue his painting with a sense of depth. Through clever exploitation of Euclidean geometry and the mechanics of the human eye, Raphael is able to use the same medium (paint on a two dimensional surface) to convey a much richer representation of his subjects than his Bolognese predecessor.


Raphael’s The School of Athens (1509-1511) is an example of a masterful use of perspective. The painting vividly depicts a three dimensional space.

In the twentieth century, artists began attempting to convey more than three dimensions in two dimensional paintings. Cubists such as Picasso attempted to portray multiple viewpoints of the same subject simultaneously, and futurists such as Umberto Boccioni attempted to depict motion and “Dynamism” in their paintings to convey time as a fourth dimension.


Pablo Picasso’s Portrait of Dora Maar (1938), depicts a woman’s face from multiple viewpoints simultaneously


Umberto Boccioni’s Dynamism of a Cyclist (1913) attempts to portray a fourth dimension, time, through a sense of motion and energy in the painting. Can you tell this is supposed to be a cyclist, or did I go too far out there for a water programming blog?

Regardless of your views on the validity of modern art, as engineers and scientists we have to admit that in this area we share similar goals and challenges with these artists: to effectively convey multidimensional data in a two dimensional space. Unlike artists, whose objectives are to convey emotions, beauty or abstract ideas through their work, we in the STEM fields seek to gain specific insights from multidimensional data that will guide our actions or investigations.

A notable historical example of the effective use of clever visualization was English physician John Snow’s map of the London Cholera epidemic of 1854. Snow combined data of cholera mortality with patient home addresses to map the locations of cholera deaths within the city.


John Snow’s map of the 1854 London Cholera Epidemic. Each black bar is proportional to the number of cholera deaths at a given residence. Pumps are depicted using black circles. One can clearly see that the cholera deaths are clustered around the pump on Broad Street (which I’ve circled in red).

The results of his analysis led Snow to conclude that a contaminated well was the likely source of the outbreak, a pioneering feat in the field of public health. Snow’s effective visualization not only provided him with insights into the nature of the problem, but also allowed him to effectively communicate his results to a general public who had previously been resistant to the idea of water borne disease.

In his insightful book Visual Explanations: Images and Quantities, Evidence and Narrative, Edward Tufte points to three strengths within John Snow’s use of data visualization in his analysis of the epidemic. First, Snow provided the appropriate context for his data. Rather than simply plotting a time series of cholera deaths, Snow placed those deaths within a new context, geographic location, which allowed him to make the connection to the contaminated pump. Second, Snow made quantitative comparisons within his data. As Tufte points out, a fundamental question when dealing with statistical analysis is “Compared with what?” It is not sufficient to simply provide data about those who were struck by the disease; one must also explain why certain populations were not affected. By complementing his data collection with extensive interviews of the local population, Snow was able to show that there were indeed people who escaped disease within the area of interest, but these people all got their water from other sources, which strengthened his argument that the pump was the source of the epidemic. Finally, Tufte insists that one must always consider alternative explanations to the one that seems apparent from the data visualization before drawing final conclusions. It is easy to make a slick but misleading visualization, and in order to maintain credibility as an analyst, one must always keep an open mind to alternative explanations. Snow took the utmost care in crafting and verifying his conclusion, and as a result his work stands as a shining example of the use of visualization to explore multidimensional data.

While Snow’s methodology is impressive, and Tufte’s observations about his work helpful, we cannot directly apply his methodology to future evaluations of multidimensional data, because his map is only useful when evaluating data from the epidemic of 1854. There is a need for general tools that can be applied to multidimensional data to provide insights through visualizations. Enter the field of visual analytics. As defined by Daniel Keim, “Visual analytics combines automated-analysis techniques with interactive visualization for an effective understanding, reasoning and decision making on the basis of very large and complex data sets”. The field of visual analytics combines the disciplines of data analysis, data management, geospatial and temporal processes, spatial decision support, human-computer interaction and statistics. The goal of the field is to create flexible tools for visual analysis and data mining. Noted visualization expert Alfred Inselberg proposed six criteria that successful visualization tools should have:

  1. Low representational complexity.
  2. Works for any number of dimensions.
  3. Every variable is treated uniformly.
  4. The displayed object can be recognized under progressive transformations (i.e., rotation, translation, scaling, perspective).
  5. The display easily and intuitively conveys information on the properties of the N-dimensional object it represents.
  6. The methodology is based on rigorous mathematical and algorithmic results.

Using the above criteria, Inselberg developed the Parallel Coordinate plot. Parallel Coordinate plots transform multidimensional relationships into two dimensional patterns which are well suited for visual data mining.


An example of a five dimensional data set plotted on a Parallel Coordinate plot. Each line represents a data point, while each axis represents the point’s value in each dimension.
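For readers who want to try this in Python, here is a hedged sketch (not from the original post) using pandas’ built-in parallel_coordinates function; the column names and the random five-objective data are hypothetical.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical five-objective data; the 'set' column is only used to color the lines
df = pd.DataFrame(np.random.random((30, 5)),
                  columns=['OBJ 1', 'OBJ 2', 'OBJ 3', 'OBJ 4', 'OBJ 5'])
df['set'] = 'solutions'
parallel_coordinates(df, 'set', alpha=0.5)  # each line is one solution, each axis one objective
plt.show()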

As water resources analysts dealing with multiobjective problems, it is critical that we have the ability to comprehend and communicate the complexities of multidimensional data. By learning from historical data visualization examples and making use of cutting edge visual analytics, we can make this task much more manageable. Parallel coordinate plots are just one example of the many visualization tools that have been created in recent decades by the ever evolving field of visual analytics.  As computing power continues its rapid advancement, it is important that we as analysts continue to ask ourselves whether we can improve our ability to visualize and gain insights from complex multidimensional data sets.

 

Synthetic Weather Generation: Part IV

Conditioning Synthetic Weather Generation on Climate Change Projections

This is the fourth blog post in a five part series on synthetic weather generators. You can read about common single-site parametric and non-parametric weather generators in Parts I and II, respectively, as well as multi-site generators of both types in Part III. Here I discuss how parametric and non-parametric weather generators can be modified to simulate weather that is consistent with climate change projections.

As you are all well aware, water managers are interested in finding robust water management plans that will consistently meet performance criteria in the face of hydrologic variability and change. One method for such analyses is to re-evaluate different water management plans across a range of potential future climate scenarios. Weather data that is consistent with these scenarios can be synthetically generated from a stochastic weather generator whose parameters or resampled data values have been conditioned on the climate scenarios of interest, and methods for doing so are discussed below.

Parametric Weather Generators

Most climate projections from GCMs are given in terms of monthly values, such as the monthly mean precipitation and temperature, as well as the variances of these monthly means. Wilks (1992) describes a method for adjusting parametric weather generators of the Richardson type in order to reproduce these projected means and variances. This method, described here, has been used for agricultural (Mearns et al., 1996; Riha et al., 1996) and hydrologic (Woodbury and Shoemaker, 2012) climate change assessments.

Recall from Part I that the first step of the Richardson generator is to simulate the daily precipitation states with a first-order Markov chain, and then the precipitation amounts on wet days by independently drawing from a Gamma distribution. Thus, the total monthly precipitation is a function of the transition probabilities describing the Markov chain and the parameters of the Gamma distribution describing precipitation amounts. The transition probabilities of the Markov chain, p01 and p11, represent the probabilities of transitioning from a dry day to a wet day, or a wet day to another wet day, respectively. An alternative representation of this model shown by Katz (1983) uses two different parameters, π and d, where π = p01/(1 + p01 − p11) and d = p11 − p01. Here π represents the unconditional probability of a wet day, and d represents the first-order autocorrelation of the Markov chain.

Letting SN be the sum of the N daily precipitation amounts in a month (i.e., the total monthly precipitation), Katz (1983) shows that if the precipitation amounts come from a Gamma distribution with shape parameter α and scale parameter β, the mean, μ, and variance, σ², of SN are described by the following equations:

(1) \mu\left(S_N\right) = N\pi\alpha\beta

(2) \sigma^2\left(S_N\right) = N\pi\alpha\beta^2\left[1 + \frac{\alpha\left(1-\pi\right)\left(1+d\right)}{1-d}\right]

For climate change impact studies, one must find a set of parameters θ=(π,d,α,β) that satisfies the two equations above for the projected monthly mean precipitation amounts and variances of those means. Since there are only 2 equations but 4 unknowns, 2 additional constraints are required to fully specify the parameters (Wilks, 1992). For example, one might assume that the frequency and persistence of precipitation do not change, but that the mean and variance of the amounts distribution do. In that case, π and d would be unchanged from their estimates derived from the historical record, while α and β would be re-estimated to satisfy Equations (1) and (2). Other constraints can be chosen based on the intentions of the impacts assessment, or varied as part of a sensitivity analysis.
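As an illustration, the sketch below (my own, not from Wilks) adopts the constraint just mentioned: π and d are held at their historical values, and α and β are re-estimated so that Equations (1) and (2) reproduce a projected monthly mean and variance. The numbers in the example are hypothetical.

import numpy as np

def adjust_gamma_params(mu_target, var_target, N, pi, d):
    # From Eq. (1): alpha*beta = mu / (N*pi)
    m = mu_target / (N * pi)
    # Autocorrelation factor appearing in Eq. (2)
    c = (1.0 - pi) * (1.0 + d) / (1.0 - d)
    # Substitute alpha = m/beta into Eq. (2) and solve for beta, then alpha
    beta = (var_target - N * pi * m**2 * c) / (N * pi * m)
    alpha = m / beta
    return alpha, beta

# Hypothetical projection: 110 mm mean monthly total with variance 2400 mm^2,
# keeping historical occurrence parameters pi = 0.4 and d = 0.3 for a 31-day month
alpha_new, beta_new = adjust_gamma_params(110.0, 2400.0, N=31, pi=0.4, d=0.3)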

To modify the temperature-specific parameters, recall that in the Richardson generator, the daily mean and standard deviation of the non-precipitation variables are modeled separately on wet and dry days by annual Fourier harmonics. Standardized residuals of daily minimum and maximum temperature are calculated for each day by subtracting the daily mean and dividing by the daily standard deviation given by these harmonics. The standardized residuals are then modeled using a first-order vector auto-regression, or VAR(1) model.

For generating weather conditional on climate change projections, Wilks (1992) assumes that the daily temperature auto and cross correlation structure remains the same under the future climate so that the VAR(1) model parameters are unchanged. However, the harmonics describing the mean and standard deviation of daily minimum and maximum temperature must be modified to capture projected temperature changes. GCM projections of temperature changes do not usually distinguish between wet and dry days, but it is reasonable to assume the changes are the same on both days (Wilks, 1992). However, it is not reasonable to assume that changes in minimum and maximum temperatures are the same, as observations indicate that minimum temperatures are increasing by more than maximum temperatures (Easterling et al., 1997; Vose et al., 2005).

Approximating the mean temperature, T, on any day t as the average of that day’s mean maximum temperature, µmax(t), and mean minimum temperature, µmin(t), the projected change in that day’s mean temperature, ΔT(t), can be modeled by Equation 3:

(3) \Delta \overline{T}\left(t\right) = \frac{1}{2}\left[\Delta\mu_{min}\left(t\right) + \Delta\mu_{max}\left(t\right)\right] = \frac{1}{2} \left(CX_0 + CX_1\cos\left[\frac{2\pi\left(t-\phi\right)}{365}\right] + CN_0 + CN_1\cos\left[\frac{2\pi\left(t-\phi\right)}{365}\right]\right)

where CX0 and CN0 represent the annual average changes in maximum and minimum temperatures, respectively, and CX1 and CN1 the corresponding amplitudes of the annual harmonics. The phase angle, φ, represents the day of the year with the greatest temperature change between the current and projected climate, which is generally assumed to be the same for the maximum and minimum temperature. Since GCMs predict that warming will be greater in the winter than the summer, a reasonable value of φ is 21 for January 21st, the middle of winter (Wilks, 1992).

In order to use Equation 3 to estimate harmonics of mean minimum and maximum temperature under the projected climate, one must estimate the values of CX0, CN0, CX1 and CN1. Wilks (1992) suggests a system of four equations that can be used to estimate these parameters:

(4) \Delta \overline{T} = \frac{1}{2}\left(CX_0 + CN_0\right)

(5) \Delta\left[\overline{T}\left(JJA\right) - \overline{T}\left(DJF\right)\right] = -0.895\left(CX_1 + CN_1\right)

(6) \Delta DR\left(DJF\right) = CX_0 - CN_0 + 0.895\left(CX_1 - CN_1\right)

(7) \Delta DR\left(JJA\right) = CX_0 - CN_0 - 0.895\left(CX_1 - CN_1\right)

where the left-hand sides of Equations (4)-(7) represent the annual average temperature change, the change in the temperature difference between summer (JJA) and winter (DJF), the change in average diurnal temperature range (DR = µmax – µmin) in winter, and the change in average diurnal temperature range in summer, respectively. The constant ±0.895 is simply the average value of the cosine term in Equation (3) evaluated at φ = 21 for the winter (+) and summer (−) seasons. The values for the left-hand sides of these equations can be taken from GCM projections, either as transient functions of time or as step changes.
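A small sketch of this calculation (mine, not Wilks’): the four equations are linear in CX0, CN0, CX1 and CN1, so they can be solved directly; the GCM change values in the example are hypothetical.

import numpy as np

def solve_temperature_harmonics(dT_annual, dT_jja_minus_djf, dDR_winter, dDR_summer):
    # Rows correspond to Equations (4)-(7); columns to [CX0, CN0, CX1, CN1]
    A = np.array([[0.5,  0.5,  0.0,    0.0],
                  [0.0,  0.0, -0.895, -0.895],
                  [1.0, -1.0,  0.895, -0.895],
                  [1.0, -1.0, -0.895,  0.895]])
    b = np.array([dT_annual, dT_jja_minus_djf, dDR_winter, dDR_summer])
    return np.linalg.solve(A, b)  # returns [CX0, CN0, CX1, CN1]

# Hypothetical step change: +2 C annual warming, winters warming 1 C more than summers,
# diurnal range shrinking by 0.5 C in winter and 0.3 C in summer
CX0, CN0, CX1, CN1 = solve_temperature_harmonics(2.0, -1.0, -0.5, -0.3)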

Equations (4)-(7) can be used to estimate the mean minimum and maximum temperature harmonics for the modified weather generator, but the variance in these means may also change. Unlike changes in mean daily minimum and maximum temperature, it is fair to assume that changes in the standard deviation of these means are the same as each other and the GCM projections for changes in the standard deviation of daily mean temperature for both wet and dry days. Thus, harmonics modeling the standard deviation of daily minimum and maximum temperature on wet and dry days can simply be scaled by some factor σd’/ σd, where σd is the standard deviation of the daily mean temperature under the current climate, and σd’ is the standard deviation of the daily mean temperature under the climate change projection (Wilks, 1992). Like the daily mean temperature changes, this ratio can be specified as a transient function of time or a step change.

It should be noted that several unanticipated changes can result from the modifications described above. For instance, if one modifies the probability of daily precipitation occurrence, this will change not only the mean daily temperature (since temperature is a function of whether or not it rains) but also its variance and autocorrelation (Katz, 1996). See Katz (1996) for additional examples and suggested modifications to overcome these potential problems.

Non-parametric Weather Generators

As described in Part II, most non-parametric and semi-parametric weather generators simulate weather data by resampling historical data. One drawback to this approach is that it does not simulate any data outside of the observed record; it simply re-orders them. Modifications to the simple resampling approach have been used in some stationary studies (Prairie et al., 2006; Leander and Buishand, 2009) as mentioned in Part II, and can be made for climate change studies as well. Steinschneider and Brown (2013) investigate several methods on their semi-parametric weather generator. Since their generator does have some parameters (specifically, transition probabilities for a spatially averaged Markov chain model of precipitation amounts), these can be modified using the methods described by Wilks (1992). For the non-parametric part of the generator, Steinschneider and Brown (2013) modify the resampled data itself using a few different techniques.

The first two methods they explore are common in climate change assessments: applying scaling factors to precipitation data and delta shifts to temperature data. Using the scaling factor method, resampled data for variable i, xi, are simply multiplied by a scaling factor, a, to produce simulated weather under climate change, axi. Using delta shifts, resampled data, xi, are increased (or decreased) by a specified amount, δ, to produce simulated weather under climate change, xi + δ.
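In code, these two adjustments amount to one line each; the sketch below (my own, with hypothetical values of a and δ) applies them to resampled daily values.

import numpy as np

precip_resampled = np.random.gamma(0.75, 12.0, size=365)                  # hypothetical resampled precipitation
temp_resampled = 15.0 + 10.0*np.sin(np.linspace(0, 2*np.pi, 365))         # hypothetical resampled temperature

a = 1.10       # scaling factor: 10% more precipitation under climate change
delta = 2.0    # delta shift: 2 degrees C of warming

precip_cc = a * precip_resampled   # scaled precipitation
temp_cc = temp_resampled + delta   # shifted temperature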

Another more sophisticated method is the quantile mapping approach. This procedure is generally applied to parametric CDFs, but can also be applied to empirical CDFs, as was done by Steinschneider and Brown (2013). Under the quantile mapping approach, historical data of the random variable, X, are assumed to come from some distribution, FX, under the current climate. The CDF of X under climate change can be specified by a different target distribution, FX*. Simulated weather variables xi under current climate conditions can then be mapped to values under the projected climate conditions, xi*, by equating their values to those of the same quantiles in the target distribution, i.e. xi* = F*-1(F(xi)).
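Here is a minimal empirical-CDF sketch of quantile mapping (my own simplification, not Steinschneider and Brown’s implementation); the Gamma samples standing in for the current and target climates are hypothetical.

import numpy as np

def quantile_map(x, x_current, x_target):
    # Empirical non-exceedance probability of each value of x under the current climate
    q = np.searchsorted(np.sort(x_current), x, side='right') / float(len(x_current))
    # Evaluate the inverse empirical CDF of the target distribution at those quantiles
    return np.quantile(np.sort(x_target), np.clip(q, 0.0, 1.0))

x_hist = np.random.gamma(0.75, 12.0, size=1000)    # wet-day amounts, current climate
x_proj = np.random.gamma(0.75, 14.0, size=1000)    # wetter target distribution
x_mapped = quantile_map(x_hist, x_hist, x_proj)    # resampled values mapped to the target climate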

While simple, these methods are effective approaches for top-down or bottom-up robustness analyses. Unfortunately, what one often finds from such analyses is that there is a tradeoff between meeting robustness criteria in one objective, and sacrificing performance in another, termed regret. Fortunately, this tradeoff can occasionally be avoided if there is an informative climate signal that can be used to inform water management policies. In particular, skillful seasonal climate forecasts can be used to condition water management plans for the upcoming season. In order to evaluate these conditioned plans, one can generate synthetic weather consistent with such forecasts by again modifying the parameters or resampling schemes of a stochastic weather generator. Methods that can be used to modify weather generators consistent with seasonal climate forecasts will be discussed in my final blog post on synthetic weather generators.

Works Cited

Easterling, D. R., Horton, B., Jones, P. D., Peterson, T. C., Karl, T. R., Parker, D. E., et al. (1997). Maximum and minimum temperature trends for the globe. Science, 277(5324), 364-367.

Katz, R. W. (1983). Statistical procedures for making inferences about precipitation changes simulated by an atmospheric general circulation model. Journal of the Atmospheric Sciences, 40(9), 2193-2201.

Katz, R. W. (1996). Use of conditional stochastic models to generate climate change scenarios. Climatic Change, 32(3), 237-255.

Leander, R., & Buishand, T. A. (2009). A daily weather generator based on a two-stage resampling algorithm. Journal of Hydrology, 374, 185-195.

Mearns, L. O., Rosenzweig, C., & Goldberg, R. (1996). The effect of changes in daily and interannual climatic variability on CERES-Wheat: a sensitivity study. Climatic Change, 32, 257-292.

Prairie, J. R., Rajagopalan, B., Fulp, T. J., & Zagona, E. A. (2006). Modified K-NN model for stochastic streamflow simulation. Journal of Hydrologic Engineering, 11(4), 371-378.

Richardson, C. W. (1981). Stochastic simulation of daily precipitation, temperature and solar radiation. Water Resources Research, 17, 182-190.

Riha, S. J., Wilks, D. S., & Simoens, P. (1996). Impact of temperature and precipitation variability on crop model predictions. Climatic Change, 32, 293-311.

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49, 7205-7220.

Vose, R. S., Easterling, D. R., & Gleason, B. (2005). Maximum and minimum temperature trends for the globe: An update through 2004. Geophysical Research Letters, 32(23).

Wilks, D. S. (1992). Adapting stochastic weather generation algorithms for climate change studies. Climatic Change, 22(1), 67-84.

Woodbury, J., & Shoemaker, C. A. (2012). Stochastic assessment of long-term impacts of phosphorus management options on sustainability with and without climate change. Journal of Water Resources Planning and Management, 139(5), 512-519.

Debugging in Python (using PyCharm) – Part 3


This post is part 3 of a multi-part series intended to discuss some basic debugging tools that I have found helpful in developing a pure Python simulation model using a Python Integrated Development Environment (IDE) called PyCharm.

Before I begin this post, the following are links to previous blog posts I have written on this topic:

In this post I will focus on PyCharm’s “Coverage” features, which are very useful for debugging by allowing you to see what parts of your program (e.g., modules, classes, methods) are/are not being accessed for a given implementation (run) of the model. If instead you are interested in seeing how much time is being spent running particular sections of code, or want to glimpse into the values of variables during execution, see the previous posts I linked above on profiling and breakpoints.

To see what parts of my code are being accessed, I have found it helpful to create and run what are called “unit tests”. You can find more on unit testing here, or just by googling it. (Please note that I am not a computer scientist, so I am not intending to provide a comprehensive summary of all possible approaches you could take to do this. I am just going to describe something that has worked well for me). To summarize, unit testing refers to evaluating sections (units) of source code to determine whether those units are performing as they should. I have been using unit testing to execute a run of my model (called “PySedSim”) to see what sections of my code are and are not being accessed.

I integrated information from the following sources to prepare this post:

Please follow these steps:

Step 1. Open the script (or class, or method) you want to assess, and click on the function or method you want to assess.

In my case, I am assessing the top-level Python file “PySedSim.py”, which is the file in my program that calls all of the classes to run a simulation (e.g., Reservoirs and River Channels). Within this file, I have clicked on the PySedSim function. Note that these files are already part of a PyCharm project I have created, and a Python interpreter has already been established. You need to do that first.

Step 2. With your cursor still on the function/method of interest, press “Ctrl + Shift + T”.

A window should appear as it does below. Click to “Create New Test”.

Figure 1

Step 3. Create a new test. Specify the location of the script you are testing, and keep the suggested test file and class names, or modify them. Then click to add a check mark to the box next to the Test Method, and click “OK”.

Figure 2

Step 4. Modify the new script that has been created. (In my case, this file is called “test_pySedSim.py”, and appears initially as it does below).

Figure 3

I then modified this file so that it reflects testing I want to conduct on the PySedSim method in the PySedSim.py file.

In my case, it appears like this.


from unittest import TestCase
from PySedSim import PySedSim

class TestPySedSim(TestCase):
    def test_PySedSim(self):
        PySedSim()

Note that there is a ton of functionality that is now possible in this test file. I suggest reviewing this website again carefully for ideas. You can raise errors and use the self.fail() function to indicate whether or not your program is producing acceptable results, as in the sketch below. For example, if the program produces a negative result when it should produce a positive result, you can indicate to PyCharm that this represents a fail, and the test has not been passed. This offers you a lot of flexibility in testing various methods in your program.
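For example, here is a hedged sketch of such a test; the result object and its reservoir_storage attribute are hypothetical and not part of PySedSim.

from unittest import TestCase
from PySedSim import PySedSim

class TestPySedSimOutput(TestCase):
    def test_storage_is_nonnegative(self):
        result = PySedSim()  # hypothetical: assume the model returns an object with outputs
        if result is not None and (result.reservoir_storage < 0).any():
            self.fail("PySedSim produced negative reservoir storage values")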

In my case, all I want to do is run the model and see which sections were accessed, not specifically evaluate the results it produces, so PyCharm should execute the model and indicate that it has “passed” the unit test (once I create and run the unit test).

Step 5. In the menu at the top of the screen that I show clicked on in the image below, click on “Edit configurations”.

Figure 4

From here, click on the “+” button, and go to Python tests –> Unittests.

Figure 5

Step 6. In the “Run/Debug Configurations” window, give your test a name in the “Name” box, and in the “script” box locate the script you created in Step 4, and indicate its path. Specify any method parameters that need to be specified to run the method. I did not specify any environment preferences, as the interpreter was already filled in. Click OK when you are done.

Figure 6

Step 7. Your test should now appear in the same configuration menu you clicked on earlier in Step 5. So, click the button at the top to “Run with Coverage”. (In my case, run water_programming_blog_post with coverage)

Figure 7

Note that it is likely going to take some time for the test to run (more than it would take for a normal execution of your code).

Step 8. Review the results.

A coverage window should appear to the right of your screen, indicating what portions (%) of the various functions and methods contained in this program were actually entered.

Figure 8

To generate a more detailed report, you can click on the button with the green arrow inside the coverage window, which will offer you options for how to generate a report. I selected the option to generate an html report. If you then select the “index” html file that appears in the directory you’re working in, you can click to see the coverage for each method.
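As an aside, if you ever need the same kind of HTML report outside of PyCharm, a hedged alternative (assuming the standalone coverage package is installed) is to drive coverage.py directly from Python:

import coverage
from PySedSim import PySedSim

cov = coverage.Coverage()
cov.start()
PySedSim()            # run the code you want measured
cov.stop()
cov.save()
cov.html_report(directory='htmlcov')  # then open htmlcov/index.html in a browser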

For example, here is an image of a particular class (reservoir.py), showing in green those sections of code that were entered, and in red the sections that were not. I used this to discover that particular portions of some methods were not being accessed when they should have been. The script files themselves also now show green and red markers next to the code that was or was not entered, respectively. See the image above for an example of this.

Figure 9

PyCharm also indicates whether or not the unittest has passed. (Though I did not actually test for specific outputs from the program, I could have done tests on model outputs as I described earlier, and any test failure would be indicated here).