karyoploteR tutorial
NB: This vignette illustrates use of karyoploteR
in a GWAS setting. It also depicts the use of some convenience functions I have gathered together in pidProjCode
. Installing what was only meant to be a toy package would be silly, though, so I encourage anyone who thinks a function like back_to_back_manhattan
could be useful to simply copy the source code from my package’s GitHub repo. These days, however (autumn 2024, about three years after first writing this tutorial), I use
locuszoomr
for zoomed-in Manhattan plots and ggplot2
code adapted from some written by Daniel Roelfs for genome-wide Manhattan plots.
library(pidProjCode)
library(GenomicRanges)
library(karyoploteR)
library(data.table)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
As sample data sets we use subsets of four sets of GWAS summary statistics: one of selective IgA deficiency by Bronson et al. (2016), another of antibody-deficient primary immunodeficiency by Thaventhiran et al. (2020), a third of systemic lupus erythematosus by Bentham et al. (2015), and a fourth of ulcerative colitis by de Lange et al. (2017). These contain only SNPs on chromosome 6 from 25Mb to 36Mb (this interval includes the MHC locus).
igadDat <- fread(file = system.file('extdata', 'igadGwas.tsv.gz', package = 'pidProjCode', mustWork = T), sep = '\t', header = T, select = c('SNPID','CHR38','BP38','P'))
pidDat <- fread(file = system.file('extdata', 'pidGwas.tsv.gz', package = 'pidProjCode', mustWork = T), sep = '\t', header = T, select = c('SNPID','CHR38','BP38','P'))
sleDat <- fread(file = system.file('extdata', 'sleGwas.tsv.gz', package = 'pidProjCode', mustWork = T), sep = '\t', header = T, select = c('SNPID','CHR38','BP38','P'))
ucDat <- fread(file = system.file('extdata', 'ucGwas.tsv.gz', package = 'pidProjCode', mustWork = T), sep = '\t', header = T, select = c('SNPID','CHR38','BP38','P'))
# GenomicRanges require that chromosome seqnames have the prefix 'chr'
igadDat[ , CHR38 := paste0('chr', CHR38)]
pidDat[ , CHR38 := paste0('chr', CHR38)]
sleDat[ , CHR38 := paste0('chr', CHR38)]
ucDat[ , CHR38 := paste0('chr', CHR38)]
karyoploteR
requires you pass it genomic data in the form of GenomicRanges
objects. The GenomicRanges
package provides the makeGRangesFromDataFrame
function to do this.
A useful feature of karyoploteR
is the highlighting of specified SNPs. These need to be given in a GenomicRanges
object, too, so we create topSnpGRanges
to hold these (in this case just a single SNP). For a Manhattan plot we want a y axis on the negative log10 scale, but karyoploteR::kpPoints
doesn’t seem to respect this rescaling, so we specify the y coordinate as an extra metadata column in the GenomicRanges
object
igadGRanges <- makeGRangesFromDataFrame(data.frame(igadDat), start.field = 'BP38', end.field = 'BP38', seqnames.field = 'CHR38', ignore.strand = T, keep.extra.columns = T)
pidGRanges <- makeGRangesFromDataFrame(data.frame(pidDat), start.field = 'BP38', end.field = 'BP38', seqnames.field = 'CHR38', ignore.strand = T, keep.extra.columns = T)
sleGRanges <- makeGRangesFromDataFrame(data.frame(sleDat), start.field = 'BP38', end.field = 'BP38', seqnames.field = 'CHR38', ignore.strand = T, keep.extra.columns = T)
ucGRanges <- makeGRangesFromDataFrame(data.frame(ucDat), start.field = 'BP38', end.field = 'BP38', seqnames.field = 'CHR38', ignore.strand = T, keep.extra.columns = T)
topSnps <- c('rs116540075')
topSnpGRanges <- makeGRangesFromDataFrame(data.frame(igadDat[SNPID %in% topSnps][, y:= -log10(P)]), start.field = 'BP38', end.field = 'BP38', seqnames.field = 'CHR38', ignore.strand = T, keep.extra.columns = T)
The first reference for use of karyoploteR
should be the docs themselves. In particular, I relied on the author’s Manhattan tutorial but some of the details are now outdated, such as its use of the annotations from the hg19
draft genome. Look at that page first to see the full array of Manhattan plot functionality offered by this package.
I spent some hours muddling through the process of adapting the examples given in the tutorial to my own use case. Copying my code here rather than that in the tutorial should hopefully spare you some of the trouble I went through getting the tutorial code to work.
ymax <- 137
r0 <- 0
cex.main <- 1.3
cex.tick <- 0.75
pp <- getDefaultPlotParams(plot.type=4)
pp$leftmargin <- 0.08
pp$rightmargin <- 0.08
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, zoom = "chr6:25e6-36e6", plot.params = pp)
title(main = 'IgAD GWAS, chr6:25M-36M, MHC boxed in blue,\ntop extra-MHC SNP rs116540075 circled in red', cex.main = cex.main)
kpAddBaseNumbers(kp, add.units = T, cex = cex.tick, tick.dist = 1e6)
kpAxis(kp, ymin = 0,ymax = ymax, r0 = r0, cex = cex.tick)
# karyoploteR looks for a column with the name 'p' or 'P' in the metadata columns of the GenomicRanges object. Alternatively, y coordinates can be passed as a vector to the argument 'pval' (see the docs).
kp <- kpPlotManhattan(kp, data = igadGRanges, points.col = 'brewer.set3', ymax = ymax, r0 = r0)
kpRect(kp, chr = 'chr6', x0 = 28510120, x1 = 33480577,r0 = r0, y0 = 0,y1 = 1, col = NA, border = 'blue', lwd = 3)
kp <- kpPoints(kp, data = topSnpGRanges, pch = 1, cex = 1.6, col = "red", lwd = 2, ymax = ymax, r0 = r0)
ymax <- 38
r0 <- 0.25
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, zoom = "chr6:33e6-34e6", plot.params = pp)
title(main = 'IgAD GWAS, chr6:33M-34M, neighbourhood of\ntop extra-MHC SNP rs116540075 (circled in red)', cex.main = cex.main)
kpAddBaseNumbers(kp, add.units = T, cex = cex.tick, tick.dist = 1e5)
kpAxis(kp, ymin = 0,ymax = ymax, r0 = r0)
kp <- kpPlotManhattan(kp, data = igadGRanges, points.col = 'brewer.set3', ymax = ymax, r0 = r0)
kpRect(kp, chr = 'chr6', x0 = 33e6, x1 = 33480577,r0 = r0, y0 = 0,y1 = 1, col = NA, border = 'blue', lwd = 3)
kp <- kpPoints(kp, data = topSnpGRanges, pch = 1, cex = 1.6, col = "red", lwd = 2, ymax = ymax, r0 = r0)
# Needed to install org.Hs.eg.db with BiocManager::install to get this running
genes.data <- makeGenesDataFromTxDb(karyoplot = kp, txdb = TxDb.Hsapiens.UCSC.hg38.knownGene)
genes.data <- addGeneNames(genes.data)
genes.data.merged <- mergeTranscripts(genes.data)
kp <- kpPlotGenes(kp, data = genes.data.merged, r0 = 0, r1 = 0.2, gene.name.cex = 0.8, gene.name.position = 'left')
cex.label <- 1.0
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, zoom = "chr6:25e6-36e6", plot.params = pp)
kpAddBaseNumbers(kp, add.units = T, cex = cex.tick, tick.dist = 1e6)
title(main = 'IgAD and PID GWAS, chr6:25M-36M', cex.main = cex.main)
kpAddLabels(kp, labels = "IgAD", srt = 90, pos = 3, r0 = 0.7, r1 = 1, cex = cex.label, label.margin = 0.025)
kpAxis(kp, ymin = 0, ymax = 40, r0 = 0.5)
kp <- kpPlotManhattan(kp, data = igadGRanges, r0 = 0.5, r1 = 1, ymax = 40)
kpAddLabels(kp, labels = "PID", srt = 90, pos = 3, r0 = 0, r1 = 0.3, cex = cex.label, label.margin = 0.025)
kpAxis(kp, ymin = 0, ymax = 10, r0 = 0.5, r1 = 0)
kp <- kpPlotManhattan(kp, data = pidGRanges, r0 = 0.5, r1 = 0, ymax = 10, points.col = "2blues")
pidGRanges<-subset(pidGRanges, -log10(P) < 10)
sleGRanges<-subset(sleGRanges, -log10(P) < 10)
ucGRanges<-subset(ucGRanges, -log10(P) < 10)
igadGRanges<-subset(igadGRanges, -log10(P) < 10)
cex.label <- 1
ymax <- 10
label.margin <- 0.05
# The % of the total plotting space to leave in each margin between the tracks
autotrack.margin <- 0.18
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, zoom = "chr6:25e6-36e6", plot.params = pp)
kpAddBaseNumbers(kp, add.units = T, cex = cex.tick, tick.dist = 1e6)
# If plotting genome-wide loci, replace the call to kpAddBaseNumbers with the line below
# kpAddChromosomeNames(kp, col = 'black', srt = 90, cex = 2)
title(main = 'IgAD, PID, UC, and SLE GWAS, chr6:25M-36M', cex.main = cex.main)
# Note that the last keyword argument, which specifies that this plot should occupy position 1 of 4
# kpPlotManhattan will look for a column labelled 'p' or 'P' in the GenomicRanges object, but you can specify it explicitly by passing it as a vector to the 'pval' argument
# Note the points.col argument; I recommend brewer.set3 when looking at the whole genome because the points from different chromosomes will be coloured distinctly, although we can't see that here as I have data from only one chromosome
at<-autotrack(1, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = pidGRanges, pval = pidGRanges$P, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'PID', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(2, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = igadGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'IgAD', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(3, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = sleGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'SLE', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(4, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = ucGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'UC', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, plot.params = pp)
kpAddChromosomeNames(kp, col = 'black', srt = 90, cex = 1)
title(main = 'IgAD, PID, UC, and SLE GWAS', cex.main = cex.main)
at<-autotrack(1, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = pidGRanges, pval = pidGRanges$P, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'PID', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(2, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = igadGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'IgAD', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(3, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = sleGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'SLE', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
at<-autotrack(4, 4, margin = autotrack.margin)
kp <- kpPlotManhattan(kp, data = ucGRanges, points.col = 'brewer.set3', r0 = at$r0, r1 = at$r1, ymax = ymax)
kpAddLabels(kp, labels = 'UC', srt = 90, pos = 3, r0 = at$r0, r1 = at$r1, cex = cex.label, label.margin = label.margin)
kpAxis(kp, ymin = 0, ymax = ymax, r0 = at$r0, r1 = at$r1)
Adding legends
As discussed here, legends can be added as usual when working with base graphics.
kp <- plotKaryotype(plot.type = 4, labels.plotter = NULL, zoom = "chr6:25e6-36e6", plot.params = pp)
kpAddBaseNumbers(kp, add.units = T, cex = cex.tick, tick.dist = 1e6)
title(main = 'IgAD and PID GWAS, chr6:25M-36M', cex.main = cex.main)
kpAddLabels(kp, labels = "IgAD", srt = 90, pos = 3, r0 = 0.7, r1 = 1, cex = cex.label, label.margin = 0.025)
kpAxis(kp, ymin = 0, ymax = 40, r0 = 0.5)
kp <- kpPlotManhattan(kp, data = igadGRanges, r0 = 0.5, r1 = 1, ymax = 40)
kpAddLabels(kp, labels = "PID", srt = 90, pos = 3, r0 = 0, r1 = 0.3, cex = cex.label, label.margin = 0.025)
kpAxis(kp, ymin = 0, ymax = 10, r0 = 0.5, r1 = 0)
kp <- kpPlotManhattan(kp, data = pidGRanges, r0 = 0.5, r1 = 0, ymax = 10, points.col = "2blues")
legend(x = "topright", fill = c("#6caeff", "#2b5d9b"), legend = c("IgAD", "PID"))
## back_to_back_manhattan
I have written a convenience function, back_to_back_manhattan
, for drawing ‘back-to-back’ Manhattan plots with karyoploteR
. These plots allow one to compare results from two GWAS. The function itself does nothing clever and was written merely to save my having to copy and tweak the same boilerplate plotting code over and over.
In my own work I have had cause to plot (transformed) GWAS results in this manner in order to assess two competing conditional FDR methods, cfdr
and fcfdr
. These condition p-values for the association of SNPs with a principal trait upon p-values for the association of the same SNPs with some informative auxiliary trait, and produce p-value-like ‘v-values’. In this use case, it can be useful to have to hand the original principal trait p-values, so back_to_back_manhattan
allows one to specify a third Manhattan.
With back_to_back_manhattan
one can draw: - whole-genome Manhattans - results from a selection of chromosomes (see the chromosomes
argument) - a Manhattan which magnifies an interval on a single chromosome (see the zoom
argument)
In addition, the plot_genes
flag has karyoploteR
draw gene structures on the lowermost track. At the moment this works only when the zoom
argument is specified because it takes a long time to draw the genes and the larger the interval, the longer the plotting time. Varying the size of the interval in on which you have zoom
ed should give you a sense of what is a sensible interval. Longer intervals also lead to the gene track getting rather busy, which is no help if you are trying to pick out the genes in which your SNPs of interest lie.
label.margin <- 0.07
chrom.names.cex <- 0.9
pp <- getDefaultPlotParams(plot.type = 4)
pp$leftmargin <- 0.1
pp$rightmargin <- 0.1
# Can't get ymax to work atm
igadGRanges <- subset(igadGRanges, P > 1e-15)
sleGRanges <- subset(sleGRanges, P > 1e-15)
back_to_back_manhattan(pidGRanges, igadGRanges, top_label='PID', bottom_label='IgAD', main='PID and IgAD', axis_label_cex=1, axis_label_margin = label.margin, main_title_cex = cex.main, plot_params = pp, chrom_names_cex = chrom.names.cex, ymax = 15)
multitrack_manhattan
This function allows one to draw a plot of stacked Manhattans, all regularly oriented.
multitrack_manhattan(gRanges = list(pidGRanges, igadGRanges, sleGRanges, ucGRanges),
axis_labels = c('PID', 'IgAD', 'SLE', 'UC'),
main = 'PID, IgAD, SLE, and UC',
chrom_names_cex = 1.1,
track_margin = 0.18,
main_title_cex = 1.1,
axis_label_margin = 0.06,
axis_tick_cex = 0.7,
axis_label_cex = 1.5,
plot_params = pp)
multitrack_manhattan(gRanges = list(pidGRanges, igadGRanges, sleGRanges, ucGRanges),
axis_labels = c('PID', 'IgAD', 'SLE', 'UC'),
chromosomes = 'chr6',
main = 'PID, IgAD, SLE, and UC, chr6',
chrom_names_cex = 0.9,
track_margin = 0.18,
main_title_cex = 1.1,
axis_label_margin = 0.06,
axis_label_cex = 1.5,
axis_tick_cex = 0.7,
chrom_tick_dist = 5e7,
plot_params = pp)
Saving time when fiddling with parameters
A process of trial and error is usually necessary to tune the parameters of a karyoploteR
plot. Given that the number of points on a Manhattan plot routinely runs into the millions, if the parameters you wish to tune do not relate to the points (e.g. point size, style, or colour) you can save time by passing an empty GenomicRanges
object with a single (empty) P
metadata column. The plot will then be drawn without points, which should save a considerable amount of time and allow you to iterate more quickly.
axis_label_margin <- -0.05
emptyGRanges <- GRanges(P = numeric())
multitrack_manhattan(list(emptyGRanges), axis_labels = list('-log10(p)'), main = 'The virtues of an empty GenomicRanges object', axis_label_margin = axis_label_margin, chrom_names_cex = 0.9, track_margin = 0.18, main_title_cex = 1.1, axis_label_cex = 1.5, axis_tick_cex = 0.7, chrom_tick_dist = 5e7, plot_params = pp)
#> Warning in min(data$y): no non-missing arguments to min; returning Inf
Boooooo! Our axis label is in the wrong place.
axis_label_margin <- 0.05
multitrack_manhattan(list(emptyGRanges), axis_labels = list('-log10(p)'), main = 'The virtues of an empty GenomicRanges object', axis_label_margin = axis_label_margin, chrom_names_cex = 0.9, track_margin = 0.18, main_title_cex = 1.1, axis_label_cex = 1.5, axis_tick_cex = 0.7, chrom_tick_dist = 5e7, plot_params = pp)
#> Warning in min(data$y): no non-missing arguments to min; returning Inf
That’s better.