Today I’d like to share some simple code for creating parallel coordinate plots in R, which we worked up with the help of CU graduate student Rebecca Smith. The code plots a Pareto-approximate set in light gray and places several highlighted solutions on top of it in color. Showing 3 highlighted solutions gives a nice view of the tradeoffs in many dimensions simultaneously. You can follow the colored lines to see how objectives and decisions interact, and having the colored solutions above the full tradeoff set gives you a sense of how individual solutions compare to it (e.g., is a solution the lowest in the set? The highest?). I would show you an example, but, well, that is left to the reader 🙂
I’ll post the code here and then explain below.
###
#R template to make Parallel Coordinate plots with the full set in gray and highlighted solutions in color
#
#by Rebecca Smith and Joe Kasprzyk
###

#
#User defined parameters
#

#the excel file containing the data, with headers and rows
myFilename="an-excel-file.xlsx"

#the file folder you're working in
myWorkingDirectory="D:/Data/Folder/"

#a vector of the column indices you want to plot, in order
objIndices=c(16, 2, 11, 5, 9, 17)

#highlighted solutions. The default is 3; for more or fewer you have to change the code below
sol1.row = 125
sol1.color = "blue"
sol2.row = 109 #the red one
sol2.color = "red"
sol3.row = 179 #the green one
sol3.color = "green"

#
#The code is below. You should have to make minimal changes below this point!
#

setwd(myWorkingDirectory)

#give Java more memory; this must be set before XLConnect is loaded
options(java.parameters = "-Xmx4g")
require(XLConnect)
library(MASS)

wb1 = loadWorkbook(myFilename)
archive=as.matrix(readWorksheet(wb1,sheet="Sheet1",header = TRUE))
N=length(archive[,1])

#save the specific data you want to plot
par.plot.arch=archive[,objIndices]

#Concatenate a matrix that starts with the rows of all the solutions,
#then appends the specific solutions you want.
par.plot.arch.sol.1.2.3=rbind(par.plot.arch,par.plot.arch[sol1.row,],par.plot.arch[sol2.row,],par.plot.arch[sol3.row,])

#parcoord takes the following arguments:
#the data
#col is a vector of the colors you want to plot
#lwd is a vector of values that will be assigned to line weight
#lty is a vector of values that will be assigned to line type
#note, below that rep.int repeats the integer the specified number of times. Basically we are telling R to make the first N values
#gray, and then the rest are assigned a specific colour.

#including linetype...
#parcoord(par.plot.arch.sol.1.2.3, col=c(rep.int("grey",N),sol1.color,sol2.color,sol3.color), lwd=c(rep.int(1,N),6,6,6),lty=c(rep.int(1,N),2,2,2),var.label=TRUE)

#without linetype
parcoord(par.plot.arch.sol.1.2.3, col=c(rep.int("grey",N),sol1.color,sol2.color,sol3.color), lwd=c(rep.int(1,N),6,6,6),var.label=TRUE)
Most of my explanation is in the code comments above, but here are a few quick thoughts.
First, the code is separated into parts the user can change and parts the user shouldn’t have to play around with too much. One big limitation is that the number of ‘highlighted’ solutions is hardcoded to 3. It is pretty simple to add more; you just have to make sure that you add the new entries everywhere they belong (i.e., in the rbind call that concatenates the data, and in the col and lwd vectors passed to parcoord).
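One way to avoid the hardcoded count is to keep the highlighted rows and colors in vectors. The following is only a sketch, not part of the original template: the names sol.rows, sol.colors, K and plot.data are made up for illustration, and a random matrix stands in for the data read from Excel.

```r
library(MASS)  # provides parcoord()

# Hypothetical stand-in for par.plot.arch: 50 solutions, 4 objectives
set.seed(1)
par.plot.arch <- matrix(runif(200), nrow = 50,
                        dimnames = list(NULL, paste0("obj", 1:4)))
N <- nrow(par.plot.arch)

# Assumed replacement for sol1.row, sol1.color, etc.: one vector per property
sol.rows   <- c(12, 27, 33, 41)
sol.colors <- c("blue", "red", "green", "orange")
K <- length(sol.rows)

# Append the highlighted rows after the full set so they are drawn last (on top)
plot.data <- rbind(par.plot.arch, par.plot.arch[sol.rows, , drop = FALSE])

# First N lines are thin and gray, the last K are thick and colored
parcoord(plot.data,
         col = c(rep("grey", N), sol.colors),
         lwd = c(rep(1, N), rep(6, K)),
         var.label = TRUE)
```

With this layout, adding or removing a highlighted solution only means editing sol.rows and sol.colors, as long as the two vectors stay the same length.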
Second, play around with the ordering of the columns for the best effect. The objIndices vector is built with R’s c() function, so you can list the column indices in any order.
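As a small illustration of how the index vector controls axis order, here is a toy example. The column names are invented; the real ones would come from the header row of your spreadsheet.

```r
# Hypothetical archive with four named columns
archive <- matrix(1:12, nrow = 3,
                  dimnames = list(NULL, c("cost", "reliability", "storage", "demand")))

# The order of the indices is the left-to-right order of the axes in the plot
objIndices <- c(3, 1, 4)
par.plot.arch <- archive[, objIndices]
print(colnames(par.plot.arch))  # "storage" "cost" "demand"

# If you would rather think in names than numbers, match() looks up the indices
objIndices.byName <- match(c("storage", "cost", "demand"), colnames(archive))
print(objIndices.byName)        # 3 1 4
```

Looking indices up by name can be less error-prone than counting columns in a wide spreadsheet, though it assumes the header names are unique.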
Finally, remember you can also use the col option in parcoord to color the lines by a real-valued variable along a color spectrum. We have found, though, that this is kind of difficult to do well if there isn’t a clear monotonic relationship between the colored variable and the others.
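For reference, a rough sketch of one way to build such a color vector is below. It is not from the original template: the data are random, the blue-yellow-red ramp and the choice of the first column as the coloring variable are arbitrary assumptions, and names like color.var and line.colors are made up.

```r
library(MASS)  # provides parcoord()

# Hypothetical stand-in for par.plot.arch: 50 solutions, 4 objectives
set.seed(2)
par.plot.arch <- matrix(runif(200), nrow = 50,
                        dimnames = list(NULL, paste0("obj", 1:4)))

# Assumption: color every line by its value in the first column
color.var <- par.plot.arch[, 1]

# Build a 100-step color ramp and assign each solution to one step
n.colors <- 100
ramp <- colorRampPalette(c("blue", "yellow", "red"))(n.colors)
bins <- cut(color.var, breaks = n.colors, labels = FALSE)
line.colors <- ramp[bins]

# Draw low values first so the high (red) end of the ramp ends up on top
draw.order <- order(color.var)
parcoord(par.plot.arch[draw.order, ],
         col = line.colors[draw.order],
         var.label = TRUE)
```

With random data like this the colors look scrambled on every axis except the first, which is exactly the readability problem described above.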
That’s all for now but feel free to comment below!