NicolasH2/ggdendroplot

ggdendroplot

An R package that draws highly modifiable dendrograms in ggplot2. The dendrogram can easily be modified and added to an existing ggplot object. ggdendroplot takes as input the output of the R stats function hclust(). It visualizes the clustering using ggplot2's geom_path layers.

Installation

Install the ggdendroplot package from the GitHub repository:

devtools::install_github("NicolasH2/ggdendroplot")

Default dendrogram

We build a random example matrix, called df. We use the functions dist and hclust from base R to get hclust objects. We cluster rows (rowclus) and columns (colclus) individually. You can change the distance measure and the clustering algorithm; see the respective functions' help pages (?dist and ?hclust). Then we can directly take one of these clusterings and visualize a dendrogram from it.

library(ggdendroplot)
library(ggplot2)

#a test matrix: columns 1-4 draw values from a normal distribution with mean 0, columns 5-8 from one with mean 1
df <- matrix(c(rnorm(64, mean=0), rnorm(64, mean=1)), ncol = 8, dimnames=list(
  rownames=paste0("trait",seq(16)),
  colnames=paste0("sample",seq(8))
))

#perform hierarchical clustering
rowclus <- hclust(dist( df ))    #cluster the rows
colclus <- hclust(dist( t(df) )) #cluster the columns

ggplot() + geom_dendro(colclus)
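
As a minimal sketch of changing the distance measure and the clustering algorithm, the following uses only base R. The object names mat, colclus_alt and colclus_cor are illustrative and not part of ggdendroplot; the seed is only there to make the example reproducible.

```r
# Same shape as the example matrix above
set.seed(42)
mat <- matrix(c(rnorm(64, mean = 0), rnorm(64, mean = 1)), ncol = 8,
              dimnames = list(paste0("trait", seq(16)), paste0("sample", seq(8))))

# Manhattan distance with average linkage instead of the defaults
# (dist defaults to "euclidean", hclust to "complete")
colclus_alt <- hclust(dist(t(mat), method = "manhattan"), method = "average")

# Correlation-based distance between columns:
# as.dist() turns a square matrix into a dist object that hclust accepts
colclus_cor <- hclust(as.dist(1 - cor(mat)), method = "ward.D2")
```

Since both results are ordinary hclust objects, they should be usable wherever colclus is used in this README, for example ggplot() + geom_dendro(colclus_alt).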

Often, we don't just want a dendrogram, but also a heatmap. ggdendroplot provides the function hmReady, which takes the original table and the clustering you made. It uses reshape2 to output a ready-to-plot data.frame. This data.frame has columns x and y for coordinates, and a value column for the color in the heatmap. It also has the columns rowid and variable, which contain the row and column names of the original table. We can supply colclus or rowclus or both to get a data.frame that is clustered accordingly.

Here we supply both the column clustering (colclus) and the row clustering (rowclus), so that both dendrograms can later be lined up with the heatmap.

hm <- hmReady(df, colclus=colclus, rowclus=rowclus)

hmplot <- ggplot() + 
  geom_tile(data=hm, aes(x=x, y=y, fill=value)) +
  theme(axis.text.x=element_text(angle=45, hjust=1))
  
print(hmplot)
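
To make the shape of such a ready-to-plot table more concrete, here is a rough base R sketch that reorders a matrix by its clusterings and reshapes it into long format. This is not hmReady's implementation; the coordinate convention (integer positions starting at 1) and the names mat, ordered and long are assumptions made for illustration.

```r
set.seed(1)
mat <- matrix(rnorm(128), ncol = 8,
              dimnames = list(paste0("trait", seq(16)), paste0("sample", seq(8))))
rowclus <- hclust(dist(mat))
colclus <- hclust(dist(t(mat)))

# Reorder rows and columns by the leaf order of each clustering
ordered <- mat[rowclus$order, colclus$order]

# Long format: one row per cell; as.vector() walks the matrix column by column
long <- data.frame(
  x        = rep(seq_len(ncol(ordered)), each  = nrow(ordered)),
  y        = rep(seq_len(nrow(ordered)), times = ncol(ordered)),
  value    = as.vector(ordered),
  rowid    = rep(rownames(ordered), times = ncol(ordered)),
  variable = rep(colnames(ordered), each  = nrow(ordered))
)
head(long)
```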

ggdendroplot also comes with a small function, hmGradient, that provides a color gradient for the heatmap (ggplot2's scale_fill_gradient2 could also be used, but its blue turns purple midway through). When we pass it to ggplot2's scale_fill_gradientn function and define limits that are centered on 0, the colors give a good indication of each cell's value.

hmplot <- hmplot + scale_fill_gradientn(colors=hmGradient(), limits=c(-4,4))

print(hmplot)
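
If you would rather derive the limits from the data than hard-code c(-4,4), one option is to take the largest absolute value and mirror it around 0. The sketch below uses a stand-in vector called values; with the heatmap table from above you would use hm$value instead.

```r
set.seed(1)
values <- c(rnorm(64, mean = 0), rnorm(64, mean = 1))  # stand-in for hm$value

# Symmetric limits: round the largest absolute value up to the next integer
bound <- ceiling(max(abs(values)))
lims  <- c(-bound, bound)
lims
```

The result can then be passed on as scale_fill_gradientn(colors=hmGradient(), limits=lims). Keep in mind that with ggplot2's default settings, values outside the limits are drawn in the scale's NA color, so the limits should cover the full range of the data.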

When we simply add the dendrogram to the plot, we see that it is not in the correct place. We can move it up by specifying the ylim argument.

hmplot + geom_dendro(colclus)
hmplot + geom_dendro(colclus, ylim=c(17,20))

We can add a second dendrogram that shows the clustering of the rows. For that we have to specify that it is pointing sideways.

hmplot + 
  geom_dendro(colclus, ylim=c(17, 20)) +
  geom_dendro(rowclus, xlim=c(8.5, 10), pointing="side")
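
The limits used above follow from the size of the heatmap. Assuming the tiles are centered on the integer positions 1 to 16 (rows) and 1 to 8 (columns) with a width and height of 1, the heatmap ends at 16.5 and 8.5. The sketch below reproduces the numbers from the example; gap and dendro_height are illustrative names, not ggdendroplot arguments.

```r
n_rows <- 16   # traits
n_cols <- 8    # samples

gap           <- 0.5  # space between the top of the heatmap and the column dendrogram
dendro_height <- 3    # how much room the column dendrogram gets

# Column dendrogram sits above the heatmap
col_ylim <- c(n_rows + 0.5 + gap, n_rows + 0.5 + gap + dendro_height)  # c(17, 20)

# Row dendrogram starts directly at the right edge of the heatmap
row_xlim <- c(n_cols + 0.5, n_cols + 0.5 + 1.5)                        # c(8.5, 10)
```

For a table of a different size, the same arithmetic with nrow(df) and ncol(df) should give a sensible starting point.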

Custom dendrogram

You can tell ggdendroplot to color the clusters according to how they group on a certain level. Imagine a horizontal line being drawn: every cluster below that line has the same color as the cluster it originated from at that line. The integer you provide refers to the cluster level, so for this example we select the 5th cluster, counting from bottom to top. You can customize the colors if you want.

ggplot() + geom_dendro(colclus, dendrocut=5)
ggplot() + geom_dendro(colclus, dendrocut=5, groupCols=c("green","orange","gray20","purple"))
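
The idea of a horizontal cut can also be explored in base R with cutree, which returns the group each sample falls into below a given height. The sketch below cuts between the 5th and 6th merge. How exactly ggdendroplot maps its dendrocut integer onto such a cut is not covered here, so treat this as an illustration of the concept rather than of the package's internals.

```r
set.seed(42)
mat <- matrix(c(rnorm(64, mean = 0), rnorm(64, mean = 1)), ncol = 8,
              dimnames = list(paste0("trait", seq(16)), paste0("sample", seq(8))))
colclus <- hclust(dist(t(mat)))

# One height per merge: 8 samples give 7 merges
colclus$height

# Draw the imaginary line halfway between the 5th and 6th merge
h_cut  <- mean(colclus$height[5:6])
groups <- cutree(colclus, h = h_cut)

# After 5 of the 7 merges, 8 - 5 = 3 groups remain
table(groups)
```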

You can reverse the order or the direction of the dendrogram. This happens when you set the limits so that the first limit number (here: 3) is higher than the second (here: 0). In the following example we reverse the order because we change xlim while the pointing argument is in its default state "updown" (you would also achieve an order reversal by changing ylim while pointing="side"). Note that the dendrogram will then no longer line up with your heatmap and will give you a false impression, which is why this reversal is only possible when you set the failsafe argument to FALSE.

ggplot() + geom_dendro(colclus, xlim=c(3,0), failsafe=FALSE)
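
To see why a reversed order no longer matches the heatmap, it helps to look at the leaf order stored in the hclust object. The base R sketch below assumes that the dendrogram leaves are drawn in the order given by colclus$order; the names leaf_order and reversed are illustrative.

```r
set.seed(42)
mat <- matrix(c(rnorm(64, mean = 0), rnorm(64, mean = 1)), ncol = 8,
              dimnames = list(paste0("trait", seq(16)), paste0("sample", seq(8))))
colclus <- hclust(dist(t(mat)))

# Sample names in the order of the dendrogram leaves
leaf_order <- colclus$labels[colclus$order]

# A reversed dendrogram shows the mirror image of that order,
# while a heatmap built from the original clustering keeps leaf_order
reversed <- rev(leaf_order)

data.frame(position = seq_along(leaf_order), heatmap = leaf_order, dendrogram = reversed)
```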

When we change ylim while pointing="updown", we reverse the direction, which is less problematic (the same applies to changing xlim while pointing="side").

ggplot() + geom_dendro(colclus, ylim=c(3,0))

You can stop geom_dendro from displaying the sample names:

ggplot() + geom_dendro(colclus, axis.labels=FALSE)

You can change the dendrogram in the same way that you would change a geom_path object. Specifically, you can change color, size, linetype and lineend. Possible options for linetype are: solid (default), dotted, dotdash, twodash, dashed, longdash, blank.

ggplot() + geom_dendro(colclus, size=2, color="blue", linetype="dashed")

The lineend argument introduces subtle changes, affecting only how the ends of the lines look. Possible options are: butt (default), square, round.

ggplot() + geom_dendro(colclus, size=4, lineend="round")