R Shiny – Part 1

In this blog post, I explain a helpful R package called shiny that you can use to create interactive plots, web applications, and websites with interactive user interfaces. The interactive plots can contain graphs, maps, tables and other details you need on your site. In this post, I cover some basics of R shiny and some guidelines for creating interactive plots with it. In the next two posts, I’ll provide more details on how the package can be used to create simple web apps.

Shiny apps have two main components: ui.R and server.R. They can live in two separate R files or in a single one. The ui.R component is used to create the graphical user interface; I provide more details below. The server.R component is where calculations are done and graphs are generated. In other words, ui.R is equivalent to front-end development and server.R to the back-end part of the code. Before I get into the details, I’ll note that there are several nice instruction manuals out there that you can refer to for more information. Probably the best is Shiny’s original manual, available here and here.
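To make this two-part structure concrete before the full example, here is a minimal single-file sketch. The input ID "n" and the output ID "hist" are hypothetical names chosen only for illustration; they are not part of the app built later in this post.

```r
library(shiny)

# Front end: one slider and one placeholder for a plot
ui <- fluidPage(
  sliderInput(inputId = "n", label = "Number of points:", min = 10, max = 500, value = 100),
  plotOutput(outputId = "hist")
)

# Back end: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal draws", xlab = "Value")
  })
}

# Combine both parts into an app object.
# Printing the object at the console (or calling runApp(app)) launches the app.
app <- shinyApp(ui = ui, server = server)
```

The slider value arrives in the server as input$n, and whatever is assigned to output$hist is drawn where plotOutput("hist") sits in the page.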

To showcase some of the things you can create with R shiny, I downloaded the package download logs from November 1, 2019 here. This file has more than 3 million lines, corresponding to all the CRAN packages that were downloaded that day. I post-processed the data so that the final file lists the packages by the number of times they were downloaded that day. For example, this graph shows the first thirty packages. You can download the final post-processed file from here.
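The post-processing script itself is not shown here, so the following is only a sketch of how such a count could be produced. It assumes the raw log has one row per download with a column named package, and it uses a tiny toy data frame in place of the real file. The output column names pkg_name and pkg_count match the ones used in the server code later in this post.

```r
# Toy stand-in for the raw log: one row per download (the real file has millions of rows)
raw_log <- data.frame(
  package = c("ggplot2", "shiny", "ggplot2", "dplyr", "ggplot2", "shiny"),
  stringsAsFactors = FALSE
)

# Count downloads per package
counts <- as.data.frame(table(raw_log$package), stringsAsFactors = FALSE)
names(counts) <- c("pkg_name", "pkg_count")

# Sort from most to least downloaded
counts <- counts[order(counts$pkg_count, decreasing = TRUE), ]
rownames(counts) <- NULL

print(counts)
```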

User Interface

The user interface part of our script divides the page into several components and defines the characteristics of each. Let’s say we want a sidebar that provides options for the user. We can specify the layout of our web app, the size of each component, the details of any sliders or dropdown menus we want, and so forth. The following is a user interface script I created, with comments explaining what each part of the code does.

library(shiny)
library(ggpubr)
library(ggplot2)

setwd("~/blog posts/R shiny/") 

# UI 

ui <- fluidPage(  
  # The fluidPage command translates R shiny codes to HTML format 
  
  
  
  titlePanel("Most Downloaded R Packages"), # Here we can define the main title of our web app
  
  
  fluidRow( 
    
    # The fluidRow command gives us the flexibility to define
    # our own layout; the column function and its offset argument are
    # two of the popular tools for positioning content within a row.
    # There are also other predefined layouts available in
    # R-Shiny, such as sidebar layout, vertical layout, and split layout
    
    
    # Here we set the width and position of our object in the web app
    column(10, offset = 4,
           
    # Users can interact with different types of widgets
    # in R-Shiny, such as dropdown menus and date range inputs.
    # One of the popular ways to set a number
    # is through a slider widget; the following two calls
    # define two sliders that set different values.
    # The sliderInput function allows us to define the range
    # and default value of each slider widget
           
           sliderInput(inputId = "NumLibs1",
                       label = "1- Number of libraries in PDF plot:", min = 30, max = 1000, value = 30),
           
           
           sliderInput(inputId = "NumLibs2",
                       label = "2- Number of libraries in bar plot:", min = 3, max = 30, value = 10)
    ),
    
    # The mainPanel command defines the size and layout 
    # structure of our main panel where we show our actual plots
    column(12, 
           mainPanel(
             
    # Creates a plot that will be placed in the main panel
             plotOutput(outputId = "plot",width = "150%", height = "450")
             
           )
    )
    
  ) 
)
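The comments above mention that Shiny also ships predefined layouts. As a sketch of the sidebar layout (this is an alternative, not the layout used in this post, and the object name ui_sidebar and the input ID "n_packages" are hypothetical), the same kind of page could be written like this:

```r
library(shiny)

ui_sidebar <- fluidPage(
  titlePanel("Most Downloaded R Packages"),

  # sidebarLayout places the controls in a narrow panel and the output in a wider one
  sidebarLayout(
    sidebarPanel(
      sliderInput(inputId = "n_packages", label = "Number of packages:",
                  min = 3, max = 30, value = 10)
    ),
    mainPanel(
      plotOutput(outputId = "plot")
    )
  )
)
```

With this layout there is no need to pick column widths and offsets by hand, at the cost of less control over where each piece lands.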

Server

The server does the calculations that need to happen behind the scenes, generates the plots, defines the properties of any tables, and so on. The following is the server code I developed to build all the components of my plot.


# Server 

server<-function(input, output){ 
  # Here we define a server function that does 
  # behind the scene calculations and generates our plot
  
  input_data<- read.table("pkg_by_freq.txt", header = TRUE)
  # Reads the input file from our working directory
  
  reorderd_freq<-input_data[order(input_data$pkg_count, decreasing = TRUE),]
  # This command sorts our input data based on the number of downloads 
  # on that particular day (11/01/2019)
  
  output$plot <- renderPlot({
    # This part renders our output plot
    
    max_numb<-input$NumLibs1
    num_pop_libs<-input$NumLibs2
    # Here our code receives the number that will 
    # be set by users through the slider widget
    
    p1<-ggplot(reorderd_freq[6:max_numb,], aes(x=pkg_count)) +geom_density(fill="lightblue")+
      labs(title = "1- PDF of Number of Downloads of R Packages",  x="Number of Downloads", y="") +
      theme_bw() +theme(axis.text.x  = element_text(size = rel(1.8)) )
    
    reorderd_freq$pkg_name <- reorder(reorderd_freq$pkg_name, reorderd_freq$pkg_count)
    p2<-ggplot(reorderd_freq[1:num_pop_libs,])+ geom_bar(aes(x=pkg_name, y=pkg_count), stat="identity", fill="purple1") +
      labs(title = "2- Most Downloaded R Packages", y="Number of Downloads", x="Package Name") +
      coord_flip() +theme_bw() +theme(axis.text.y  = element_text(size = rel(1.4)) )
    # Now we use the ggplot2 package to generate two figures: a PDF plot (geom_density) and a bar plot (geom_bar)
    # Note that we use the slider inputs to change the characteristics of our plots
    
    ggarrange(p1, p2)
    # Finally, we combine our two plots using the ggarrange function from the ggpubr package
  })
  
}
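When developing a server function, it can help to try the plotting logic outside of Shiny first. The following sketch does that for the bar plot with a small toy data frame. The helper name plot_top_packages and the toy values are hypothetical; only the column names pkg_name and pkg_count come from the code above.

```r
library(ggplot2)

# Toy data with the same column names as the post-processed file
toy_freq <- data.frame(
  pkg_name  = c("pkgA", "pkgB", "pkgC", "pkgD"),
  pkg_count = c(120, 450, 80, 300)
)

# Hypothetical helper: horizontal bar plot of the n most downloaded packages
plot_top_packages <- function(freq, n) {
  # Sort by download count and keep the first n rows
  # (head() returns fewer rows if n exceeds the number available)
  top <- head(freq[order(freq$pkg_count, decreasing = TRUE), ], n)
  ggplot(top, aes(x = reorder(pkg_name, pkg_count), y = pkg_count)) +
    geom_col(fill = "purple1") +
    coord_flip() +
    labs(x = "Package Name", y = "Number of Downloads") +
    theme_bw()
}

p <- plot_top_packages(toy_freq, n = 3)
```

Inside renderPlot(), the value of n would come from the slider, for example input$NumLibs2.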

One thing to keep in mind is that if you have two separate files for ui.R and server.R, you always have to save them in the same folder.

When you’re done with your ui.R and server.R, you can either use the Run App button in RStudio or the runApp() command to combine all the components and create your final output.

# This command connects the UI code with the server code and 
# generates our final output 
shinyApp(ui, server)
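For the two-file variant, a sketch of the folder setup and the runApp() call might look like the following. The folder name my_shiny_app and the tiny app contents are hypothetical, and a temporary directory is used only so that the example is self-contained.

```r
library(shiny)

# Hypothetical app folder inside a temporary directory
app_dir <- file.path(tempdir(), "my_shiny_app")
dir.create(app_dir, showWarnings = FALSE)

# ui.R: the last expression in the file must be the UI object
writeLines(c(
  'library(shiny)',
  'fluidPage(titlePanel("Hello"), textOutput("msg"))'
), file.path(app_dir, "ui.R"))

# server.R: the last expression in the file must be the server function
writeLines(c(
  'function(input, output) {',
  '  output$msg <- renderText("Rendered by server.R")',
  '}'
), file.path(app_dir, "server.R"))

# runApp() blocks until the app is stopped, so launch it only in an interactive session
if (interactive()) {
  runApp(app_dir)
}
```

Because runApp() is pointed at the folder, it finds ui.R and server.R there, which is why the two files have to sit side by side.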

And here is the final interactive figure that you will produce:

Also, I created this simple interactive webpage to show the results of my R code example (download the R code from here). The left graph shows the probability density function of the number of downloads per R library on November 1, 2019. Because more than 15,000 R libraries were downloaded that day, the slider lets you limit the graph to the libraries with the highest download rates by changing how many libraries are included. The right plot shows the most downloaded libraries, and its slider lets you choose how many of the most popular libraries (between 3 and 30) to include in the figure.

In the next couple of posts, I will give more examples of R shiny’s capabilities and ways to make websites using it.

ggplot (Part 2)

This is the second part of the ggplot introduction. In this blog post, I go over how you can make a decent density plot in ggplot. A density plot is essentially a smoothed version of a histogram: it shows the distribution of your data as a continuous curve estimated with the kernel density estimation procedure. For example, when we have a regional data set, it is important to look at the distribution of our data across the region instead of just considering the region average.

In our example (download the data set from here), we are going to visualize the regional distribution of simulated average winter wheat yield for the 30 years from 1981 to 2010. Each value in the “ID” column of the data set represents one grid cell in the region, and there are 1,812 grid cells in total. For each grid cell, the data set gives the average historical yield and the standard deviation of yield over the 30 years. First, we need to load the library; then, in the general code structure “ggplot(dataframe, aes(x, y, fill)),” we need to map the x-axis to the yield column (“period_ave_Y”). The y-axis values are calculated and added by “geom_density()”. Then, we can add a color, title, and labels and customize the background.

example1<- read.csv("(your directory)/example_1.csv")
library(ggplot2)   
ggplot(example1, aes(x=period_ave_Y))+ 
  geom_density(fill="blue")+
  theme(panel.background = element_rect(fill = 'white'),axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"))+
  labs(title = paste("Density Plot of Regional Average Historical Yield (30 years)"),x = "Winter Wheat Yield (tonnes/ha)", y = "Density", color="black")
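To see what geom_density() computes for the y-axis, the kernel density estimate can also be calculated directly with the density() function from base R, which uses a Gaussian kernel by default. The sketch below uses simulated values as a stand-in for the yield column, so the numbers are illustrative only and do not come from the data set.

```r
set.seed(1)

# Simulated stand-in for the yield column (the real values come from example_1.csv)
yield <- rnorm(1812, mean = 5.4, sd = 0.8)

# Kernel density estimate with the default Gaussian kernel and bandwidth
kde <- density(yield)

# The estimate is a grid of (x, y) points; the area under the curve is approximately 1
area <- sum(kde$y) * diff(kde$x)[1]
print(area)

plot(kde, main = "Kernel density estimate of simulated yield", xlab = "Yield (tonnes/ha)")
```

Because the curve integrates to about 1, the height at any yield value is a density, not a count, which is why the y-axis of the ggplot figure is labeled "Density".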

Now, we want to see how the 30-year standard deviation of yield in each grid cell of the region maps onto this density plot.

We can add another column (name it “SD_class”) to the data set and classify the standard deviations. The maximum and minimum standard deviations among all the grid cells are the following.

max(example1$period_sd_Y)
# [1] 3.605131
min(example1$period_sd_Y)
# [1] 0.8645882

For example, I want to see this plot categorized by standard deviations from 0.8 to 1.5, from 1.5 to 2.5, and from 2.5 to the maximum value. Here, I am writing a simple loop that goes over each row (each row corresponds to one grid cell in the region), checks its standard deviation value, and fills the newly added column (“SD_class”) with the class that matches the condition in the “if” statement.

example1$SD_class<- NA
for (i in 1:nrow(example1)){
  sd_i <- example1$period_sd_Y[i]  # standard deviation of yield for grid cell i
  if(sd_i>0.8 && sd_i<= 1.5) {example1$SD_class[i]<- "0.8-1.5"}
  if(sd_i>1.5 && sd_i<= 2.5) {example1$SD_class[i]<- "1.5-2.5"}
  if(sd_i>2.5) {example1$SD_class[i]<- "2.5-3.6"}
}
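The same classification can also be done without a loop by using the cut() function from base R. This is an alternative sketch rather than the code used for the figures in this post, and it is shown on a few toy values standing in for the standard deviation column.

```r
# Toy standard deviations standing in for example1$period_sd_Y
sd_values <- c(0.9, 1.5, 2.0, 3.6)

# Intervals are right-closed by default: (0.8, 1.5], (1.5, 2.5], (2.5, Inf]
sd_class <- cut(sd_values,
                breaks = c(0.8, 1.5, 2.5, Inf),
                labels = c("0.8-1.5", "1.5-2.5", "2.5-3.6"))

print(data.frame(sd_values, sd_class))
```

On the real data, the result of cut() applied to the standard deviation column could be assigned to example1$SD_class directly. Note that cut() returns a factor, which ggplot accepts for the fill aesthetic.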

Now, we just need to add “fill” to the aesthetics section of the code, specify the column with the classifications, and add “alpha” to make the color transparent in order to see the shapes of the graphs and whether they have overlaps.

ggplot(example1, aes(x=period_ave_Y,fill =SD_class))+
  geom_density(alpha=0.4)+
  theme(panel.background = element_rect(fill = 'white'),axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"),
        axis.text=element_text(size=16),axis.title=element_text(size=16,face="bold"),plot.title = element_text(size = 20, face = "bold"),
        legend.text=element_text(size=13),legend.title=element_text(size=14))+
  labs(title = paste("Density Plot of Regional Average Historical Yield (30 years)"),x = "Winter Wheat Yield (tonnes/ha)", y = "Density", color="black")

We can also use the “facet_grid()” option, like the plot in Part (1), and specify the column with classification to show each of these classes in a separate panel.

ggplot(example1, aes(x=period_ave_Y,fill =SD_class))+
  geom_density(alpha=0.4)+facet_grid(SD_class ~ .)+
  theme(panel.background = element_rect(fill = 'white'),axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"),
        axis.text=element_text(size=16),axis.title=element_text(size=16,face="bold"),plot.title = element_text(size = 20, face = "bold"),
        legend.text=element_text(size=13),legend.title=element_text(size=14))+
  labs(title = paste("Density Plot of Regional Average Historical Yield (30 years)"),x = "Winter Wheat Yield (tonnes/ha)", y = "Density", color="black")

Other interesting quantities that we can explore are the percentiles of our data set and where they fall on the density plot. For this, we need the density values (y-axis on the plot) at the percentiles that we are interested in, for example 10%, 25%, 50%, 75%, and 90%. We also need to find the actual yield value corresponding to each percentile:

quantiles_yield <- quantile(example1$period_ave_Y, probs=c(0.1, 0.25, 0.5, 0.75, 0.9))
#     10%      25%      50%      75%      90% 
#  4.229513 5.055070 5.582192 5.939071 6.186014

Now, we are going to estimate the density value for each of the yields at the 10th, 25th, 50th, 75th, and 90th percentiles.

df <- approxfun(density(example1$period_ave_Y))

The above function will give us the approximate density value for each point (yield) in which we are interested—in our case, yields for the above percentiles:

df(c(quantiles_yield))
#[1] 0.1176976 0.3267841 0.6129621 0.6615790 0.4345247
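As a self-contained sketch of this interpolation step, the following code repeats it on simulated data. The values are a stand-in for the yield column, so the printed numbers will differ from the ones above, and the function name density_at is a hypothetical choice.

```r
set.seed(1)

# Simulated stand-in for example1$period_ave_Y
yield <- rnorm(1812, mean = 5.4, sd = 0.8)

# density() returns the estimate on a grid of 512 points by default;
# approxfun() turns that grid into a function by linear interpolation
kde <- density(yield)
density_at <- approxfun(kde$x, kde$y)

# Evaluate the interpolated density at a few percentiles of the data
q <- quantile(yield, probs = c(0.1, 0.5, 0.9))
dens_q <- density_at(q)
print(dens_q)
```

The interpolating function returns NA for points outside the range of the density grid, which is not a concern here because percentiles of the data always fall inside it.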

Now, we can add several vertical segments to the density plot that show where each percentile is located on this graph. The limits of these segments on the y-axis are based on the density values for each percentile that we got above. Also, note that I used those values to adjust the positions of the labels for the segments.

ggplot()+ 
      geom_density(aes(x=example1$period_ave_Y),fill="blue",alpha=0.4) + 
    geom_segment(aes(x=quantiles_yield, y=0, xend =quantiles_yield,
                     yend= df(c(quantiles_yield))),size=1,colour =c("red","green","blue","purple","orange"),linetype='dashed')+
      theme(panel.background = element_rect(fill = 'white'),axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"),
            axis.text=element_text(size=16),axis.title=element_text(size=16,face="bold"),plot.title = element_text(size = 20, face = "bold"),
            legend.text=element_text(size=13),legend.title=element_text(size=14))+
      labs(title = paste("Density Plot of Regional Average Historical Yield (30 years) and Percentiles"),x = "Winter Wheat Yield (tonnes/ha)", y = "Density", color="black")+
    annotate("text", x=4.229513, y=0.15, label=paste("10%"),size=5)+
    annotate("text", x=5.055070, y=0.36, label=paste("25%"),size=5)+
    annotate("text", x=5.582192, y=0.65, label=paste("50%"),size=5)+
    annotate("text", x=5.939071, y=0.7, label=paste("75%"),size=5)+
    annotate("text", x=6.186014, y=0.47, label=paste("90%"),size=5)
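The label positions above are typed in by hand. As an alternative sketch, the segment ends and the labels can both be driven by one small data frame, so that nothing has to be retyped if the data change. This version uses simulated yields as a stand-in for the data set, and the names marks and density_at are hypothetical.

```r
library(ggplot2)

set.seed(1)
yield <- rnorm(1812, mean = 5.4, sd = 0.8)  # simulated stand-in for the yield column

# Percentiles and the interpolated density value at each of them
q <- quantile(yield, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
density_at <- approxfun(density(yield))

# One row per percentile: x position, segment height, and label text
marks <- data.frame(
  yield   = as.numeric(q),
  density = density_at(q),
  label   = names(q)
)

p <- ggplot() +
  geom_density(data = data.frame(yield = yield), aes(x = yield), fill = "blue", alpha = 0.4) +
  geom_segment(data = marks,
               aes(x = yield, xend = yield, y = 0, yend = density, colour = label),
               linetype = "dashed", show.legend = FALSE) +
  # Place each label just above the top of its segment
  geom_text(data = marks, aes(x = yield, y = density, label = label), vjust = -0.5, size = 5) +
  labs(x = "Winter Wheat Yield (tonnes/ha)", y = "Density") +
  theme_bw()
```

Printing p draws the figure. The vjust value controls how far above the segment each label sits and may need adjusting for a different plot size.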