Visuals vs. Visual Analytics
How do visuals differ from visual analytics? In a scientific sense, a visual is a broad term for any picture, illustration, or graph that can be used to convey an idea. However, visual analytics is more than just generating a graph of complex data and handing it to a decision maker. Visual analytic tools help create graphs that allow the user to interact with the data, whether that involves manipulating a graph in three-dimensional space or allowing users to filter or brush for solutions that match certain criteria. Ultimately, visual analytics seeks to help in making decisions as fast as possible and to “enable learning through continual [problem] reformulation” (Woodruff et al., 2013) by presenting large data sets in an organized way so that the user can better recognize patterns and make inferences.
My goal with this blog post is to introduce two R libraries that are particularly useful to develop interactive graphs that will allow for better exploration of a three-dimensional space. I have found that documentation on these libraries and potential errors was sparse, so this post will consolidate my hours of Stack Overflow searching into a step-by-step process to produce beautiful graphs!
Use rgl to create a GIF of a 3D graph
Spinning graphs can be especially useful to visualize a 3D Pareto front and a nice visualization for a Power Point presentation. I will be using an example three-objective Pareto set from Julie’s work on the Red River Basin for this tutorial. The script has been broken down and explained in the following sections.
#Set working directory setwd("C:/Users/Me/Folder/Blog_Post_1") #Read in in csv of pareto set data=read.csv("pareto.csv") #Create three vectors for the three objectives hydropower=data$WcAvgHydro deficit=data$WcAvgDeficit flood=data$WcAvgFlood
In this first block of code, the working directory is set, the data set is imported from a CSV file, and each column of the data frame is saved as a vector that is conveniently named. Now we will generate the plot.
#call the rgl library library(rgl) #Adjust the size of the window par3d(windowRect=c(0,0,500,500))
If the rgl package isn’t installed on your computer yet, simply type install.packages(“rgl”) into the console. Otherwise, use the library function in line 2 to call the rgl package. The next line of code is used to adjust the window that the graph will pop up in. The default window is very small and as such, the movie will have a small resolution if the window is not adjusted!
#Plot the set in 3D space plot3d(hydropower,deficit,flood,col=brewer.pal(8,"Blues"), size=2, type='s', alpha=0.75)
Let’s plot these data in 3D space. The first three components of the plot3d function are the x,y, and z vectors respectively. The rest of the parameters are subject to your personal preference. I used the Color Brewer (install package “RColorBrewer”) to color the data points in different blue gradients. The first value is the number of colors that you want, and the second value is the color set. Color Brewer sets can be found here: http://www.datavis.ca/sasmac/brewerpal.html. My choice of colors is random, so I opted not to create a color scale. Creating a color scale is more involved in rgl. One option is to split your data into classes and to use legend3d and the cut function to cut your legend into color levels. Unfortunately, there simply isn’t an easy way to create a color scale in rgl. Finally, I wanted my data points to be spheres, of size 2, that were 50% transparent, which is specified with type, size, and alpha respectively. Plot3d will open a window with your graph. You can use your mouse to rotate it.
Now, let’s make a movie of the graph. The movie3d function requires that you install ImageMagick, a software that allows you to create a GIF from stitching together multiple pictures. ImageMagick also has cool functionalities like editing, resizing, and layering pictures. It can be installed into your computer through R using the first two lines of code below. Make sure not to re-run these lines once ImageMagick is installed. Note that ImageMagick doesn’t have to be installed in your directory, just on your computer.
require(installr) install.ImageMagick() #Create a spinning movie of your plot movie3d(spin3d(axis = c(0, 0, 1)), duration = 20, dir = getwd())
Finally, the last line of code is used to generate the movie. I have specified that I want the plot to spin about the z axis, specified a duration (you can play around with the number to see what suits your data), and that I want the movie to be saved in my current working directory. The resulting GIF is below. If the GIF has stopped running, reload the page and scroll down to this section again.
I have found that creating the movie can be a bit finicky and the last step is where errors usually occur. When you execute your code, make sure that you keep the plot window open while ImageMagick stitches together the snapshots otherwise you will get an error. If you have errors, please feel free to share because I most likely had them at one point and was able to ultimately fix them.
Overall, I found this package to be useful for a quick overview of the 3D space, but I wasn’t pleased with the way the axes values and titles overlap sometimes when the graph spins. The way to work around this is to set the labels and title to NULL and insert your own non-moving labels and title when you add the GIF to a PowerPoint presentation.
Use plotly to create an interactive scatter
setwd("C:/Users/Me/Folder/Blog_Post_1") library(plotly) library(ggplot2) #Set environment variables Sys.setenv("plotly_username"="rg727") Sys.setenv("plotly_api_key"="insert key here") #Read in pareto set data pareto=read.csv("ieee_synthetic_thinned.csv")
Set the working directory, install the relevant libraries, set the environment variables and load in the data set. Be sure to insert your API key. You will need to regenerate a new key every time you make a new graph. Also, note that your data must be in the form of a data frame for plotly to work.
#Plot your data plot= plot_ly(pareto, x = ~WcAvgHydro, y = ~WcAvgDeficit, z = ~WcAvgFlood, color = ~WcAvgFlood, colors = c('#00d6f7', '#ebfc2f')) add_markers() #Add axes titles layout(title="Pareto Set", scene = list(xaxis = list(title = 'Hydropower'),yaxis = list(title = 'Deficit'), zaxis = list(title = 'Flood'))) #call the name of your plot to appear in the viewer plot
To correctly use the plotly command, the first input needed is the data frame, followed by the column names of the x,y, and z columns in the data frame. Precede each column name with a “~”.
I decided that I wanted the colors to scale with the value of the z variable. The colors were defined using color codes available at http://www.color-hex.com/. Use the layout function to add a main title and axis labels. Finally call the name of your plot and you will see it appear in the viewer at the lower right of your screen.If your viewer shows up blank with only the color scale, click on the viewer or click “zoom”. Depending on how large the data set is, it may take some time for the graph to load.
#Create a link to your chart and call it to launch the window chart_link = api_create(plot, filename = "public-graph") chart_link
Finally create the chart link using the first line of code above and the next line will launch the graph in Plotly. Copy and save the URL and anyone with it can access your graph, even if they don’t have a Plotly account. Play around with the cool capabilities of my Plotly graph, like mousing over points, rotating, and zooming!
Woodruff, M.J., Reed, P.M. & Simpson, T.W. Struct Multidisc Optim (2013) 48: 201. https://doi.org/10.1007/s00158-013-0891-z
James J. Thomas and Kristin A. Cook (Ed.) (2005). Illuminating the Path: The R&D Agenda for Visual Analytics National Visualization and Analytics Center.