A dendrogram is an effective way to visualize the results of hierarchical clustering. This post shows how to make a basic dendrogram in R and how to color its labels and branches to help identify what drives the clustering. Making a dendrogram in R is quite straightforward. Customizing one is less so, so this post collects some tricks I learned that should help expedite the process!
First and foremost, your data must be in an appropriate form for hierarchical clustering. Table 1 shows an example of how your data can be set up. Four different spatial temperatures projected by CMIP5 models are shown, along with attributes that could be driving the clustering: the institution the model comes from, the RCP (Representative Concentration Pathway, the radiative forcing scenario) used in the model, and the initial conditions with which the model was run.
At this point, it is helpful to set the model names as the row names of your data frame (shown in the leftmost column). Otherwise, the dendrogram function will label each leaf with its row number, which makes the clustering results hard to interpret.
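Setting the row names takes one line. Here is a minimal sketch, assuming the model names sit in a column called `Model` (a hypothetical name). The toy values are made up purely for illustration and are not taken from the CMIP5 table:

```r
# Toy stand-in for the Model_Attributes table (all values are made up)
Model_Attributes <- data.frame(
  Model = c("model_a", "model_b", "model_c"),  # hypothetical column of model names
  Institution = c(1, 2, 2),
  Temp_1 = c(14.2, 15.1, 13.8),
  Temp_2 = c(16.0, 16.4, 15.2)
)

# Use the model names as row names so they become the dendrogram labels
rownames(Model_Attributes) <- Model_Attributes$Model

rownames(Model_Attributes)
```

Row names must be unique, so this only works if no model name appears twice in the table.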
Next, create a distance matrix composed of the Euclidean distances between pairs of model projections. This is what the clustering will be based on. We first create a new data frame containing just the temperature values (shown below) by removing the attribute columns from the Model Attributes table.
The following code can be used to create Table 2 from the original table and then the distance matrix.
#Create a new data frame with just temperature values
just_temperature=Model_Attributes[ -c(1:4) ]
#Create a distance matrix
d=dist(just_temperature)
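To see what `dist` produces, here is a small sketch on made-up values (the three rows and two columns are hypothetical, not the CMIP5 data). `dist` uses Euclidean distance by default, so each entry is the square root of the summed squared differences between two rows:

```r
# Made-up temperature values for three hypothetical models
toy <- data.frame(
  loc_1 = c(0, 3, 0),
  loc_2 = c(0, 4, 1),
  row.names = c("model_a", "model_b", "model_c")
)

# Euclidean distances between every pair of rows (the default method)
d_toy <- dist(toy)

# View as a full symmetric matrix for easier reading
d_mat <- as.matrix(d_toy)
d_mat["model_a", "model_b"]  # sqrt(3^2 + 4^2) = 5
```

Because the distances are computed on raw values, columns with larger magnitudes dominate the result. Whether to rescale first depends on your data.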
Now, one can make the clustering diagram. Here I chose to use complete linkage clustering as the agglomeration method and wanted my dendrogram to be horizontal.
#Perform clustering
complete_linkage_cluster=as.dendrogram(hclust(d,method="complete"))
#Adjust dimensions of dendrogram so that it fits in plotting window
par(mar=c(3,4,1,15))
plot(complete_linkage_cluster,horiz =TRUE)
And that’s it! Here is the most basic dendrogram.
Now for customization. You will first need to install the “dendextend” package in R.
There are 11 institutions that the models can come from, and we want to see whether institution has some impact on clustering by coloring each label according to its institution. Here we use the rainbow color palette to assign each institution a color and then replot the dendrogram.
library(dendextend)
#Create a vector of colors with one color for each institution
col=rainbow(max(Model_Attributes$Institution))
#Add colors to the ordered dendrogram
labels_colors(complete_linkage_cluster)= col[Model_Attributes$Institution][order.dendrogram(complete_linkage_cluster)]
#Replot the dendrogram
par(mar=c(3,4,1,15)) #Dendrogram parameters
plot(complete_linkage_cluster,horiz =TRUE)
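The `order.dendrogram` step matters because the leaves are drawn in clustering order, not in the row order of the data frame. Here is a minimal sketch on made-up data (the names and values are hypothetical) that checks this with base R only:

```r
# Made-up one-dimensional data for four hypothetical models
toy <- data.frame(
  temp = c(1, 10, 2, 11),
  row.names = c("model_a", "model_b", "model_c", "model_d")
)

dend <- as.dendrogram(hclust(dist(toy), method = "complete"))

# Row indices of the original data, in the order the leaves are drawn
leaf_order <- order.dendrogram(dend)

# Reordering the row names by leaf_order should reproduce the leaf labels
labels_match <- all(rownames(toy)[leaf_order] == labels(dend))
labels_match
```

Any per-model vector (colors, plot characters, and so on) needs the same reordering before it is attached to the dendrogram, or it will end up on the wrong leaves.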
Now suppose we wanted to change the branch colors to show what RCP each model was run with. Here, we assign a color from the rainbow palette to each of the four RCPs and add it to the dendrogram.
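#Create a vector of colors with one color for each RCP
col=rainbow(max(Model_Attributes$RCP))
#Order the colors to match the dendrogram leaves
col_branches= col[Model_Attributes$RCP][order.dendrogram(complete_linkage_cluster)]
#Add the colors to the dendrogram branches
colored_dendrogram=color_branches(complete_linkage_cluster,col=col_branches)
par(mar=c(3,4,1,15))
plot(colored_dendrogram,horiz =TRUE)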
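Both color steps size the palette with `max(...)`, which only works if the attribute is stored as integer codes 1, 2, 3, and so on. If an attribute is stored as text instead, one option is to convert it through a factor first. Here is a sketch, assuming a hypothetical character vector of RCP labels:

```r
# Hypothetical RCP labels stored as text, one per model
rcp_text <- c("RCP4.5", "RCP8.5", "RCP2.6", "RCP4.5")

# Convert to integer codes; factor levels are sorted by default
rcp_factor <- factor(rcp_text)
rcp_code <- as.integer(rcp_factor)

# One palette color per distinct RCP, then one color per model
col <- rainbow(nlevels(rcp_factor))
col_per_model <- col[rcp_code]
```

The resulting `col_per_model` vector is still in data-frame row order, so it would need the same `order.dendrogram` reordering before being attached to a dendrogram.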
Finally, we can change the leaf node shapes to reflect the initial conditions. There are 10 total initial conditions, so we’re going to use the first 10 standard pch (plot character) symbols to represent the individual nodes.
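#Create a vector of plot characters with one for each initial condition
pch=c(1:max(Model_Attributes$Initial_Conditions))
#Order the plot characters to match the dendrogram leaves
nodes=pch[Model_Attributes$Initial_Conditions][order.dendrogram(complete_linkage_cluster)]
nodePar = list(lab.cex = 0.6, pch = c(NA,19),cex = 0.7, col = "black") #node parameters
#Set the leaf shapes on the colored dendrogram
dend1 = colored_dendrogram %>% set("leaves_pch", c(nodes))
par(mar=c(3,4,1,15))
plot(dend1,horiz =TRUE)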
And that’s how you customize a dendrogram in R!