Monthly Archives: March 2012

Using the col parameter in plot() to display a third variable

I’ve been working on a project recently where I needed to produce an xy scatterplot, and I wanted to color each data point according to a third variable (let’s call it “z”). I developed a quick and easy way to do this, and it seems to work pretty well! I’m still working out how to add a color ramp legend to plot() that displays the values of “z”, but in the meantime, I’ll show you what I’ve come up with:

First, I create a vector of colors. Below is a nice color scale I use often; it goes from black to blue to purple, orange and finally a deep red:

plotclr <- c(colorRampPalette(c("black", "blue"))(50),
             colorRampPalette(c("blue", "purple", "orange"))(55),
             colorRampPalette(c("orange", "red", "darkred"))(70))
 

Here’s an image of the ramp (in case you were wondering):

Color ramp produced by the script above.
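
The post does not show how the ramp image was drawn. One possible way, using only base graphics, is to pass the palette to image() so that each color appears as a thin vertical strip. This is a sketch, not necessarily how the figure above was made; the variable n is introduced here for convenience.

```r
# Palette from the post
plotclr <- c(colorRampPalette(c("black", "blue"))(50),
             colorRampPalette(c("blue", "purple", "orange"))(55),
             colorRampPalette(c("orange", "red", "darkred"))(70))

# Draw the palette as a row of thin vertical strips, left to right.
# z is an n-by-1 matrix holding 1..n, so the strips follow palette order.
n <- length(plotclr)
image(x = seq_len(n), y = 1, z = matrix(seq_len(n), ncol = 1),
      col = plotclr, axes = FALSE, xlab = "", ylab = "")
```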

Next, I scale the third variable, z, to between 0 and 1. Then I multiply z_scl by the length of plotclr (50 + 55 + 70 = 175) and round the result, which gives integers from 0 to 175. R vectors are indexed from 1, so I set any 0 in color_scl (i.e. the smallest values of z) to 1 so that I can use color_scl as an index into plotclr:

z_scl <- (z - min(z, na.rm = TRUE)) / (max(z, na.rm = TRUE) - min(z, na.rm = TRUE))
color_scl <- round(z_scl * length(plotclr))
color_scl[color_scl == 0] <- 1
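
The scaling steps above could also be wrapped in a small reusable function. The sketch below is one way to do that. The name scale_to_index and its n_colors argument are made up for illustration, and the guard for the case where every value of z is identical (which would otherwise divide by zero) is an addition that the original snippet does not have.

```r
# Hypothetical helper (not from the original post): map a numeric vector
# onto integer indices in 1..n_colors using min-max scaling.
scale_to_index <- function(z, n_colors) {
  rng <- range(z, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # All values equal: min-max scaling would divide by zero,
    # so use the first color for every non-missing value.
    return(ifelse(is.na(z), NA_integer_, 1L))
  }
  z_scl <- (z - rng[1]) / (rng[2] - rng[1])  # now between 0 and 1
  idx <- round(z_scl * n_colors)             # integers in 0..n_colors
  idx[idx == 0] <- 1                         # 0 is not a valid R index
  as.integer(idx)
}

# Example with a 175-color palette, as in the post; NA values stay NA
idx <- scale_to_index(c(0, 2, 4, NA, 10), 175)
```

With a palette in hand, the colors for each point would then be plotclr[scale_to_index(z, length(plotclr))].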
 

Now, let’s plot something with it:

 ### Fake data
 # Note: the second argument of rnorm() is the mean, so the noise
 # added here has mean 0.25 and the default sd of 1.
 n <- length(seq(0.1, 10, 0.5))
 x <- seq(0.1, 10, 0.5) + rnorm(n, mean = 0.25)
 y <- seq(0.1, 10, 0.5) + rnorm(n, mean = 0.25)
 z <- seq(0.1, 10, 0.5) + rnorm(n, mean = 0.25)
 
 ### Plotting
 plot(x, y, type = "n") # create a new, empty plot
 z_scl <- (z - min(z, na.rm = TRUE)) / (max(z, na.rm = TRUE) - min(z, na.rm = TRUE))
 color_scl <- round(z_scl * length(plotclr))
 color_scl[color_scl == 0] <- 1

 # col accepts a vector of colors, one per point, so no loop is needed
 points(x, y, pch = 20, col = plotclr[color_scl], cex = 2.5)
 ### End of Plotting
 

Here’s the final result:

xy scatterplot with the color of each point based on a third variable.
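
For the color ramp legend mentioned at the top, one possible approach in base graphics is to split the device into two panels with layout() and draw the palette next to the scatterplot with image(). The following is only a sketch under some assumptions: the data and the shorter palette are stand-ins so that the block runs on its own, and the names color_idx, z_breaks and old_mar are introduced here for illustration.

```r
# Stand-in data and palette so the sketch is self-contained
set.seed(1)
x <- runif(20)
y <- runif(20)
z <- runif(20, 0, 10)
plotclr <- colorRampPalette(c("black", "blue", "purple", "orange", "red"))(175)

# Same min-max scaling as in the post, with 0 bumped up to 1
color_idx <- pmax(round((z - min(z)) / (max(z) - min(z)) * length(plotclr)), 1)

# Two panels: wide scatterplot on the left, narrow color bar on the right
old_mar <- par("mar")
layout(matrix(1:2, nrow = 1), widths = c(4, 1))

par(mar = c(5, 4, 2, 1))
plot(x, y, pch = 20, cex = 2.5, col = plotclr[color_idx])

par(mar = c(5, 1, 2, 4))
z_breaks <- seq(min(z), max(z), length.out = length(plotclr))
image(x = 1, y = z_breaks, z = matrix(z_breaks, nrow = 1),
      col = plotclr, axes = FALSE, xlab = "", ylab = "")
axis(4)                            # tick labels are in units of z
mtext("z", side = 4, line = 2.5)

# Restore the single-panel layout and the previous margins
layout(1)
par(mar = old_mar)
```

Because the bar’s axis runs from min(z) to max(z), a reader can estimate the value behind each point’s color, though still only approximately.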

I should mention that with a color ramp, the reader may have some difficulty determining the exact value of the third variable for a given point, but as long as that is OK, I think this is a pretty nifty way of plotting three variables together! Till next time…