ggplot (Part 1)

In this blog post, I am going to introduce a powerful plotting package in R. The plotting package is called ggplo2. This library allows us to quickly create different plots (scatter plots, boxplots, histograms, density plots, time series plots: you name it!) while also customizing them to produce elegant graphics beyond regular line or bar charts. First, we need to download the library and then activate it:

install.packages("ggplot2")
library(ggplot2)

I am going to outline how to build two different types of map: (1) calendar heat map and (2) alluvial map. The first is used to present the variations of a variable or an activity over a long period of time through color coding so that we can easily recognize the trend, the seasonality, or any patterns or anomalies. The alluvial map provides better visualization for categorical and multidimensional data. They show the changes in magnitude of some variables at different situations that can be any discrete indexes. To create this type of map, we also need the “ggalluvial” library that has to be called with ggplot2.

The general code structure for plotting calendar heat map is the following:

ggplot ( dataframe , aes ( x , y , fill )) + geom_tile ( ) + facet_grid ( )

With “aes,” which stands for aesthetics, we describe how variables in the data frame are mapped to visual properties, so we specify the x- and y-axes. We use the fill in aesthetic to specify the fill color for different variables.

“Geom” specifies the geometric objects that define the graph type—which can be point, line, etc.—and show the value we assigned in “aes(fill)”. The “geom_tile” tiles plane with rectangles uses the center of the tile and its size. A rectangle can be divided into smaller rectangles or squares.

The “facet” command creates a trellis graph or panel by specifying two variables or one on top of “aes.”

We can show the daily, weekly, or monthly values in the calendar heat map. As an example of calendar heat map, I am using weather data for the Yakima River Basin in central Washington State; these data were originally downloaded from Northwest Knowledge Network. The data set includes three downscaled global climate models (GCMs) from 2019 to 2061 with the resolution of 1/24th degree; the data is also aggregated for the basin and monthly time step. You can download data here. Now, let’s read the data.

gcm1 <- read.csv("(your directory)/CanESM2_rcp85.csv")
gcm2 <- read.csv("(your directory)/inmcm4_rcp85.csv")
gcm3 <- read.csv("(your directory)/GFDL-ESM2G_rcp85.csv")

By running header(gcm1) or colnames(gcm1), you can see different columns in each data set, including “Year”; “Month”; name of the “GCM”; and weather variables including tasmin, tasmax, pr, PotEvap corresponded to minimum and maximum temperature, precipitation, and potential evapotranspiration. The goal is to visualize the difference between these realizations of monthly precipitation for 21 future years from 2020 to 2040. To lay out panels in a faced_grid, I want to show years and months. In each month, I am going to show precipitation values for each GCM.

gcms<- rbind(gcm1,gcm2,gcm3)  # Join three data frames into one

# Add a new column to the data frame and call it “Month_name”; then, fill it with the name of the months for each row
tst<- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct",
"Nov","Dec")  
gcms$Month_name <- tst[gcms$Month]  
gcms$nmonth<- as.factor(1)  # add a new column to a data frame and fill it with value of 1, as a factor
gcm_2040<- subset(gcms, gcms$Year<2041) # select just the years before 2041
prec_fut<- ggplot(gcm_2040, aes(x=gcm_2040$gcm,nmonth,fill = pr)) +
  geom_tile()+
  facet_grid(Year~Month_name) +
  theme(strip.text.x = element_text(colour = "black",size =9 ,margin=margin(0.2,0,0.2,0,"cm")),
        strip.text.y = element_text(colour = "black",size = 9,angle =360),
        strip.background = element_rect(colour="black", size=1, linetype="solid"),
        axis.text.x=element_text(size=9,angle = 90),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        panel.background = element_rect(fill = "white"))+
scale_fill_gradient(low="green",high="red",limits=c(0,230),breaks=c(0,25,50,75,100,125,150,175,200,230),labels=c(0,25,50,75,100,125,150,175,200,230)) +
  labs(x="",
       y="",
       title = "Time-Series Calendar Heatmap",
       subtitle="Projected Monthly Precipitation-Yakima Rive Basin",
       fill="Precipitation (mm)") 
ggsave("(your directory)/name.png", prec_fut)

With theme(), you can modify the non-data parts of your plot. For example, “strip.text.x and .y” and “strip.background“ adjust facet labels of each panel along horizontal and vertical directions and background format.

The “axis.text” and “axis.ticks” commands adjust the tick labels and marks along axes, and by using element_blank(), you can remove specific labels. With “panel.background,” the underlying background of the plot can be customized.

With argument “limits” in “scale_fill_gradient,” you can extend the color bar to the minimum and maximum values you need. In our example, these limits are obtained by the following commands.

max(gcm_2040$pr) 
min(gcm_2040$pr)

With the labs() command, you can change axis labels, plot, and legend titles. Finally, ggsave() is used to save a plot; the scale, size, and resolution of the saved plot can also be specified. Ggsave() supports various formats including “eps,” “ps,” “tex” (pictex), “pdf,” “jpeg,” “tiff,” “png,” “bmp,” “svg,” and “wmf” (Windows only).

What can we learn from this graph? Well, we can see the interannual variability of precipitation based on three GCMs and how monthly precipitation varies in different GCMs. Compared to precipitation values for wet months, when variations are generally higher, precipitation values for dry months are more consistent among GCMs.

Now, let’s create an alluvial diagram. For that, we need to prepare the essential components. The data should be categorical, and for each row, there should be a frequency for a category that we are interested in presenting. In this example, I am going to show simulated (by a crop model) winter wheat yield changes in dryland low rainfall zone of the Pacific Northwest during the future period (2055–2085) compared to the historical period (1980–2010). The low zone in this region includes 1,384 grid cells with the dimension of 4 by 4 km. Different GCMs projected different weather scenarios, which we showed in the calendar heat map plot. As you can imagine, if you force your crop model with different GCMs, you will get different projections for the crop yield. The example data set can be read in R by the following:

L_zones<- read.csv("(your directory)/yield_gcms.csv",head=T)

This data set includes a few columns: “fre_yield,” “GCM,” “Zone,” “RCP,” and “Ratio.”

For each row, there is a GCM name that corresponds to RCP and Zone. Then, there is a Ratio column, which shows the category of the yield ratio. These yield ratio categories correspond to the average winter wheat yield during the future period, divided by average yield during the historical period. For each of the categories under the Ratio column, the number of grid cells was counted and is reported under the “fre_ yield” column. For example, Row 2 (L_zones[2,]) shows that 976 grid cells out of 1,384 cells in a low-rainfall zone are projected to have yield ratio between 1.2 and 1.5 during 2055–2085 compared to 1980–2010, under CanESM2 and RCP 4.5 future weather scenarios.

Ggplot and ggalluvial provide an easy way to illustrate this type of data set. At each category on the x-axis, we can have multiple groups, and they are called “strata.” Alluvial diagrams have horizontal splines that span across the categories at the x-axis, and they are called “alluvia.” A fixed value is assigned to an alluvium at each category at the x-axis that can be represented by a fill color.

install.packages("ggalluvial")
library(ggalluvial)

yield_ratio<- ggplot(data = L_zones, aes(axis=GCM, axis2=RCP, y = fre_yield)) + 
 # We can add more (axis4, …) to have more groups in the x-axis
  scale_x_discrete(limits = c("GCM","RCP"), expand = c(.1, .1)) +
  geom_alluvium(aes(fill = Ratio)) +
  geom_stratum(width = 1/12, fill = "lightgrey", color = "black") + 
  geom_label(stat = "stratum", label.strata = TRUE) +
  scale_fill_brewer(type = "qual", palette = "Set1")+
  labs(x="",
       y="Number of Grid cells in the Zone",
       title = "Average Change in Winter Wheat Yield During 2055-2085 Compared to 1980-2010",
       subtitle="Low Rainfall Zone in Pacific Northwest")
ggsave("(your directory)/Future_WW_Yield.jpeg",yield_ratio, width=10, height=10)

Y in the aes() controls the heights of the alluvia and is aggregated across equivalent observations.

“Scale_x_discrete” allows you to place labels between discrete position scales. You can use “limit” to define values of the scale and also their order, and “expand” adds some space around each value of the scale.

“Geom_alluvium” receives the x and y from the data set from ggplot and plots the flows between categories.

“Geom_stratum” plots the rectangles for the categories, and we can adjust their appearance.

 Labels can be assigned to strata by adding “stat = stratum” and “label.strata = TRUE” to the geom_label. Then, the unique values within each stratum are shown on the map.

“Scale_fill_brewer” is useful for displaying discrete values on a map. The type can be seq (sequential), div (diverging), or qual (qualitative). The “palette” can be a string of the named palette or a number, which will index into the list of palettes of appropriate type.

Now, we can easily see in this graph that the three GCMs used in the crop model produced different results. The changes in winter wheat yield during the future period compared to the historical period are not predicted with the same magnitude based on different future weather scenarios, and these differences are more profound under RCP 8.5 compared to RCP 4.5.

3 thoughts on “ggplot (Part 1)

  1. Pingback: ggplot (Part 2) – Water Programming: A Collaborative Research Blog

  2. Pingback: Publication-Quality Illustrations in R – Water Programming: A Collaborative Research Blog

  3. Pingback: Basics of data visualization with ggplot2 – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s