Codeon-lecture-3-2: RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

Charity Law1, Monther Alhamdoosh2, Shian Su3, Gordon K. Smyth4 and Matthew E. Ritchie5

1The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia
2CSL Limited, Bio21 Institute, 30 Flemington Road, Parkville, Victoria 3010, Australia
3The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Melbourne, Australia
4The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Melbourne, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia
5The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia\

08/12/2019

1 Follow RNA-seq analysis pipeline from: Law CW, Alhamdoosh M, Su S et al. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR [version 2; referees: 3 approved]. F1000Research 2016, 5:1408
2 Set-up
3 Data packaging
4 Data pre-processing
5 Differential expression analysis
6 Gene set testing with camera
7 Software and code used

1 Follow RNA-seq analysis pipeline from: Law CW, Alhamdoosh M, Su S et al. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR [version 2; referees: 3 approved]. F1000Research 2016, 5:1408

(doi: 10.12688/f1000research.9005.2)

This script will create some interactive plots for you to explore that you can access via links on the .html output file.

2 Set-up

if(!require("gplots")) install.packages("gplots")

## Loading required package: gplots

## 
## Attaching package: 'gplots'

## The following object is masked from 'package:stats':
## 
##     lowess

if(!require("R.utils")) install.packages("R.utils")

## Loading required package: R.utils

## Loading required package: R.oo

## Loading required package: R.methodsS3

## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.

## Registered S3 method overwritten by 'R.oo':
##   method        from       
##   throw.default R.methodsS3

## R.oo v1.22.0 (2018-04-21) successfully loaded. See ?R.oo for help.

## 
## Attaching package: 'R.oo'

## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods

## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save

## R.utils v2.8.0 successfully loaded. See ?R.utils for help.

## 
## Attaching package: 'R.utils'

## The following object is masked from 'package:utils':
## 
##     timestamp

## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
##     warnings

if(!require("BiocStyle")) BiocManager::install("BiocStyle")
if(!require("RNAseq123")) BiocManager::install("RNAseq123")

## Loading required package: RNAseq123

## Loading required package: Glimma

## Loading required package: limma

## Loading required package: edgeR

## Loading required package: RColorBrewer

## Loading required package: Mus.musculus

## Loading required package: AnnotationDbi

## Loading required package: stats4

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following object is masked from 'package:limma':
## 
##     plotMA

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get,
##     grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match,
##     mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position,
##     rank, rbind, Reduce, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Loading required package: IRanges

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:gplots':
## 
##     space

## The following object is masked from 'package:base':
## 
##     expand.grid

## 
## Attaching package: 'IRanges'

## The following object is masked from 'package:R.oo':
## 
##     trim

## Loading required package: OrganismDbi

## Loading required package: GenomicFeatures

## Loading required package: GenomeInfoDb

## Loading required package: GenomicRanges

## Loading required package: GO.db

##

## Loading required package: org.Mm.eg.db

##

## Loading required package: TxDb.Mmusculus.UCSC.mm10.knownGene

if (!require("Mus.musculus")) {BiocManager::install("Mus.musculus")}

library(knitr)

# set global variables
options(digits=3)
options(width=90)

url <- "http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63310&format=file"
utils::download.file(url, destfile="GSE63310_RAW.tar", mode="wb") 
utils::untar("GSE63310_RAW.tar", exdir = ".")
files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt", "GSM1545538_purep53.txt",
  "GSM1545539_JMS8-2.txt", "GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt",
  "GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt", "GSM1545545_JMS9-P8c.txt")
for(i in paste(files, ".gz", sep=""))
  R.utils::gunzip(i, overwrite=TRUE)

Note that this workflow uses the developmental version of the Glimma package, so be ensure to install as follows before running.

BiocManager::install("Glimma", suppressUpdates=TRUE)

## Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)

## Installing package(s) 'Glimma'

## installation path not writeable, unable to update packages: actuar, AER, AICcmodavg,
##   analogsea, aplpack, aroma.affymetrix, aroma.core, arulesViz, ATACseqQC, AUCell,
##   aws.ec2metadata, bayesplot, bayestestR, benchr, betareg, BIFIEsurvey, BiocParallel,
##   biocViews, biomaRt, blob, bold, bookdown, boot, bootstrap, brglm, bridgesampling, brms,
##   btergm, Cairo, callr, car, checkmate, ChIPpeakAnno, circlize, class, classInt, clipr,
##   cluster, clusterSEs, cmprsk, coda, codetools, covr, cowplot, coxme, crul, curl, dbplyr,
##   deldir, derfinder, derfinderData, derfinderHelper, derfinderPlot, deSolve, devEMF,
##   devtools, digest, doMC, doParallel, DOSE, doSNOW, dplyr, dqrng, DropletUtils, DT,
##   e1071, edgeR, effects, egg, ellipsis, emmeans, enc, energy, Epi, ergm, ergm.count,
##   ergm.userterms, etm, evaluate, fastICA, feather, fields, flextable, foreach, forecast,
##   foreign, formatR, fpc, future, future.apply, gam, gamlss, GDINA, gdtools, geiger,
##   GenomicAlignments, GenomicFeatures, GenomicScores, geometry, geosphere, ggeffects,
##   ggfortify, ggplot2, ggplotify, ggpubr, git2r, glmnet, googleVis, gridGraphics, gss,
##   gstat, HardyWeinberg, haven, HDF5Array, hdi, hms, httr, insight, IRanges, Iso,
##   iterators, jomo, JuliaCall, KernSmooth, KFAS, kinship2, kmi, knitr, ks, labelled,
##   lattice, lava, lavaan, lfda, lfe, limma, logspline, lpSolve, magick, markdown, MASS,
##   Matrix, maxLik, MBESS, mclust, meta, mets, mgcv, mice, miceadds, micemd, mime, mirtCAT,
##   mlogit, mlr, MPV, MultiAssayExperiment, mvtnorm, nanotime, ndtv, network, nimble, nlme,
##   nnet, nor1mix, numDeriv, officer, OpenMx, openssl, openxlsx, optmatch, optparse,
##   parallelMap, partykit, pbapply, pbmcapply, performance, PerformanceAnalytics, phangorn,
##   phytools, pillar, pkgbuild, plm, plotmo, plotrix, pmml, pmmlTransformations, polspline,
##   polycor, prabclus, pre, prediction, prettydoc, princurve, pROC, processx, profileModel,
##   progress, quantmod, quantreg, randomForestSRC, raster, rasterVis, Rcpp, RcppArmadillo,
##   RcppCCTZ, RcppNumerical, RcppParallel, RCy3, recipes, RedeR, RefManageR, regioneR,
##   regionReport, rematch2, remotes, repr, reprex, reticulate, rgl, Rhdf5lib, RItools,
##   rJava, RJSONIO, rlang, rmarkdown, Rmosek, rngtools, robust, rotl, rpart, rsample,
##   rsconnect, RSpectra, RSQLite, rstan, Rsubread, rsvd, rticles, RTN, rtracklayer,
##   R.utils, rvest, scater, segmented, seriation, servr, Seurat, shinyAce, showtext, sirt,
##   sjlabelled, sjmisc, sjPlot, sjstats, skimr, snakecase, sparsesvd, spatial, spatstat,
##   StanHeaders, statmod, statnet, statnet.common, stringdist, SummarizedExperiment,
##   survival, sva, svglite, systemPipeR, TAM, taxize, tensorflow, tergm, testthat, tibble,
##   tikzDevice, timereg, tinytex, trackViewer, tseries, tsna, TSP,
##   TxDb.Dmelanogaster.UCSC.dm6.ensGene, TxDb.Drerio.UCSC.danRer10.refGene,
##   TxDb.Ggallus.UCSC.galGal5.refGene, TxDb.Hsapiens.UCSC.hg38.knownGene,
##   TxDb.Mmusculus.UCSC.mm10.knownGene, TxDb.Rnorvegicus.UCSC.rn5.refGene,
##   TxDb.Rnorvegicus.UCSC.rn6.refGene, tximport, unitizer, UpSetR, uroot, usethis,
##   VariantAnnotation, vctrs, vdiffr, vioplot, viper, visNetwork, WGCNA, worrms, WriteXLS,
##   xfun, xgboost, XML, xml2, zip, zoo

3 Data packaging

files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt", 
   "GSM1545538_purep53.txt", "GSM1545539_JMS8-2.txt", 
   "GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt", 
   "GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt", 
   "GSM1545545_JMS9-P8c.txt")
read.delim(files[1], nrow=5)

##    EntrezID GeneLength Count
## 1    497097       3634     1
## 2 100503874       3259     0
## 3 100038431       1634     0
## 4     19888       9747     0
## 5     20671       3130     1

library(limma)
library(edgeR)
x <- readDGE(files, columns=c(1,3))
class(x)

## [1] "DGEList"
## attr(,"package")
## [1] "edgeR"

dim(x)

## [1] 27179     9

samplenames <- substring(colnames(x), 12, nchar(colnames(x)))
samplenames

## [1] "10_6_5_11" "9_6_5_11"  "purep53"   "JMS8-2"    "JMS8-3"    "JMS8-4"    "JMS8-5"   
## [8] "JMS9-P7c"  "JMS9-P8c"

colnames(x) <- samplenames
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", 
                     "Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004","L006","L008"), c(3,4,2)))
x$samples$lane <- lane
x$samples

##                              files group lib.size norm.factors lane
## 10_6_5_11 GSM1545535_10_6_5_11.txt    LP 32863052            1 L004
## 9_6_5_11   GSM1545536_9_6_5_11.txt    ML 35335491            1 L004
## purep53     GSM1545538_purep53.txt Basal 57160817            1 L004
## JMS8-2       GSM1545539_JMS8-2.txt Basal 51368625            1 L006
## JMS8-3       GSM1545540_JMS8-3.txt    ML 75795034            1 L006
## JMS8-4       GSM1545541_JMS8-4.txt    LP 60517657            1 L006
## JMS8-5       GSM1545542_JMS8-5.txt Basal 55086324            1 L006
## JMS9-P7c   GSM1545544_JMS9-P7c.txt    ML 21311068            1 L008
## JMS9-P8c   GSM1545545_JMS9-P8c.txt    LP 19958838            1 L008

library(Mus.musculus)
geneid <- rownames(x)
genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"), 
                keytype="ENTREZID")
head(genes)

##    ENTREZID  SYMBOL TXCHROM
## 1    497097    Xkr4    chr1
## 2 100503874 Gm19938    <NA>
## 3 100038431 Gm10568    <NA>
## 4     19888     Rp1    chr1
## 5     20671   Sox17    chr1
## 6     27395  Mrpl15    chr1

genes <- genes[!duplicated(genes$ENTREZID),]

x$genes <- genes

## An object of class "DGEList"
## $samples
##                              files group lib.size norm.factors lane
## 10_6_5_11 GSM1545535_10_6_5_11.txt    LP 32863052            1 L004
## 9_6_5_11   GSM1545536_9_6_5_11.txt    ML 35335491            1 L004
## purep53     GSM1545538_purep53.txt Basal 57160817            1 L004
## JMS8-2       GSM1545539_JMS8-2.txt Basal 51368625            1 L006
## JMS8-3       GSM1545540_JMS8-3.txt    ML 75795034            1 L006
## JMS8-4       GSM1545541_JMS8-4.txt    LP 60517657            1 L006
## JMS8-5       GSM1545542_JMS8-5.txt Basal 55086324            1 L006
## JMS9-P7c   GSM1545544_JMS9-P7c.txt    ML 21311068            1 L008
## JMS9-P8c   GSM1545545_JMS9-P8c.txt    LP 19958838            1 L008
## 
## $counts
##            Samples
## Tags        10_6_5_11 9_6_5_11 purep53 JMS8-2 JMS8-3 JMS8-4 JMS8-5 JMS9-P7c JMS9-P8c
##   497097            1        2     342    526      3      3    535        2        0
##   100503874         0        0       5      6      0      0      5        0        0
##   100038431         0        0       0      0      0      0      1        0        0
##   19888             0        1       0      0     17      2      0        1        0
##   20671             1        1      76     40     33     14     98       18        8
## 27174 more rows ...
## 
## $genes
##    ENTREZID  SYMBOL TXCHROM
## 1    497097    Xkr4    chr1
## 2 100503874 Gm19938    <NA>
## 3 100038431 Gm10568    <NA>
## 4     19888     Rp1    chr1
## 5     20671   Sox17    chr1
## 27174 more rows ...

4 Data pre-processing

cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)

table(rowSums(x$counts==0)==9)

## 
## FALSE  TRUE 
## 22026  5153

keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)

## [1] 14165     9

library(RColorBrewer)
nsamples <- ncol(x)
col <- brewer.pal(nsamples, "Paired")
par(mfrow=c(1,2))
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.21), las=2, 
     main="", xlab="")
title(main="A. Raw data", xlab="Log-cpm")
abline(v=0, lty=3)
for (i in 2:nsamples){
 den <- density(lcpm[,i])
 lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")
lcpm <- cpm(x, log=TRUE)
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.21), las=2, 
     main="", xlab="")
title(main="B. Filtered data", xlab="Log-cpm")
abline(v=0, lty=3)
for (i in 2:nsamples){
   den <- density(lcpm[,i])
   lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")

x <- calcNormFactors(x, method = "TMM")
x$samples$norm.factors

## [1] 0.896 1.035 1.044 1.041 1.032 0.922 0.984 1.083 0.979

x2 <- x
x2$samples$norm.factors <- 1
x2$counts[,1] <- ceiling(x2$counts[,1]*0.05)
x2$counts[,2] <- x2$counts[,2]*5

par(mfrow=c(1,2))
lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="A. Example: Unnormalised data",ylab="Log-cpm")
x2 <- calcNormFactors(x2)  
x2$samples$norm.factors

## [1] 0.0547 6.1306 1.2293 1.1705 1.2149 1.0562 1.1459 1.2613 1.1170

lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="B. Example: Normalised data",ylab="Log-cpm")

lcpm <- cpm(x, log=TRUE)
par(mfrow=c(1,2))
col.group <- group
levels(col.group) <-  brewer.pal(nlevels(col.group), "Set1")
col.group <- as.character(col.group)
col.lane <- lane
levels(col.lane) <-  brewer.pal(nlevels(col.lane), "Set2")
col.lane <- as.character(col.lane)
plotMDS(lcpm, labels=group, col=col.group)
title(main="A. Sample groups")
plotMDS(lcpm, labels=lane, col=col.lane, dim=c(3,4))
title(main="B. Sequencing lanes")

lcpm <- cpm(x, log=TRUE)
par(mfrow=c(1,2))
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Set1")
col.group <- as.character(col.group)
col.lane <- lane
levels(col.lane) <- brewer.pal(nlevels(col.lane), "Set2")
col.lane <- as.character(col.lane)
plotMDS(lcpm, labels=group, col=col.group)
title(main="A. Sample groups")

plotMDS(lcpm, labels=lane, col=col.lane, dim=c(3,4))
title(main="B. Sequencing lanes")

library(Glimma)
glMDSPlot(lcpm, labels=paste(group, lane, sep="_"), groups=x$samples[,c(2,5)],
          launch=FALSE)

Link to Interactive MDS plot

5 Differential expression analysis

design <- model.matrix(~0+group+lane)
colnames(design) <- gsub("group", "", colnames(design))
design

##   Basal LP ML laneL006 laneL008
## 1     0  1  0        0        0
## 2     0  0  1        0        0
## 3     1  0  0        0        0
## 4     1  0  0        1        0
## 5     0  0  1        1        0
## 6     0  1  0        1        0
## 7     1  0  0        1        0
## 8     0  0  1        0        1
## 9     0  1  0        0        1
## attr(,"assign")
## [1] 1 1 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
## 
## attr(,"contrasts")$lane
## [1] "contr.treatment"

contr.matrix <- makeContrasts(
   BasalvsLP = Basal-LP, 
   BasalvsML = Basal - ML, 
   LPvsML = LP - ML, 
   levels = colnames(design))
contr.matrix

##           Contrasts
## Levels     BasalvsLP BasalvsML LPvsML
##   Basal            1         1      0
##   LP              -1         0      1
##   ML               0        -1     -1
##   laneL006         0         0      0
##   laneL008         0         0      0

par(mfrow=c(1,2))

v <- voom(x, design, plot=TRUE)
v

## An object of class "EList"
## $genes
##    ENTREZID SYMBOL TXCHROM
## 1    497097   Xkr4    chr1
## 6     27395 Mrpl15    chr1
## 7     18777 Lypla1    chr1
## 9     21399  Tcea1    chr1
## 10    58175  Rgs20    chr1
## 14160 more rows ...
## 
## $targets
##                              files group lib.size norm.factors lane
## 10_6_5_11 GSM1545535_10_6_5_11.txt    LP 29409426        0.896 L004
## 9_6_5_11   GSM1545536_9_6_5_11.txt    ML 36528591        1.035 L004
## purep53     GSM1545538_purep53.txt Basal 59598629        1.044 L004
## JMS8-2       GSM1545539_JMS8-2.txt Basal 53382070        1.041 L006
## JMS8-3       GSM1545540_JMS8-3.txt    ML 78175314        1.032 L006
## JMS8-4       GSM1545541_JMS8-4.txt    LP 55762781        0.922 L006
## JMS8-5       GSM1545542_JMS8-5.txt Basal 54115150        0.984 L006
## JMS9-P7c   GSM1545544_JMS9-P7c.txt    ML 23043111        1.083 L008
## JMS9-P8c   GSM1545545_JMS9-P8c.txt    LP 19525423        0.979 L008
## 
## $E
##         Samples
## Tags     10_6_5_11 9_6_5_11 purep53 JMS8-2 JMS8-3 JMS8-4 JMS8-5 JMS9-P7c JMS9-P8c
##   497097     -4.29    -3.87    2.52   3.30  -4.48  -3.99   3.31    -3.20    -5.29
##   27395       3.88     4.40    4.52   4.57   4.32   3.79   3.92     4.35     4.13
##   18777       4.71     5.56    5.40   5.17   5.63   5.08   5.08     5.76     5.15
##   21399       4.78     4.74    5.37   5.13   4.85   4.94   5.16     5.04     4.99
##   58175       3.94     3.29   -1.77  -1.88   2.99   3.36  -2.11     3.14     3.52
## 14160 more rows ...
## 
## $weights
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
## [1,]  1.18  1.18 20.53 20.98  1.77  1.22 21.13  1.18  1.18
## [2,] 20.88 26.56 31.60 29.66 32.56 26.75 29.79 21.90 17.15
## [3,] 28.00 33.70 34.85 34.46 35.15 33.55 34.52 31.44 25.23
## [4,] 27.67 29.60 34.90 34.43 34.84 33.16 34.49 26.14 24.50
## [5,] 19.74 18.66  3.18  2.63 24.19 24.01  2.65 13.15 14.35
## 14160 more rows ...
## 
## $design
##   Basal LP ML laneL006 laneL008
## 1     0  1  0        0        0
## 2     0  0  1        0        0
## 3     1  0  0        0        0
## 4     1  0  0        1        0
## 5     0  0  1        1        0
## 6     0  1  0        1        0
## 7     1  0  0        1        0
## 8     0  0  1        0        1
## 9     0  1  0        0        1
## attr(,"assign")
## [1] 1 1 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
## 
## attr(,"contrasts")$lane
## [1] "contr.treatment"

vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
plotSA(efit)

summary(decideTests(efit))

##        BasalvsLP BasalvsML LPvsML
## Down        4127      4338   2895
## NotSig      5740      5655   8825
## Up          4298      4172   2445

tfit <- treat(vfit, lfc=1)
dt <- decideTests(tfit)
summary(dt)

##        BasalvsLP BasalvsML LPvsML
## Down        1417      1512    203
## NotSig     11030     10895  13780
## Up          1718      1758    182

de.common <- which(dt[,1]!=0 & dt[,2]!=0)
length(de.common)

## [1] 2409

head(tfit$genes$SYMBOL[de.common], n=20)

##  [1] "Xkr4"          "Rgs20"         "Cpa6"          "Sulf1"         "Eya1"         
##  [6] "Msc"           "Sbspon"        "Pi15"          "Crispld1"      "Kcnq5"        
## [11] "Ptpn18"        "Arhgef4"       "2010300C02Rik" "Aff3"          "Npas2"        
## [16] "Tbc1d8"        "Creg2"         "Il1r1"         "Il18r1"        "Il18rap"

vennDiagram(dt[,1:2], circle.col=c("turquoise", "salmon"))

write.fit(tfit, dt, file="results.txt")

basal.vs.lp <- topTreat(tfit, coef=1, n=Inf)
basal.vs.ml <- topTreat(tfit, coef=2, n=Inf)
head(basal.vs.lp)

##        ENTREZID SYMBOL TXCHROM logFC AveExpr     t  P.Value adj.P.Val
## 12759     12759    Clu   chr14 -5.44    8.86 -33.4 3.99e-10   2.7e-06
## 53624     53624  Cldn7   chr11 -5.51    6.30 -32.9 4.50e-10   2.7e-06
## 242505   242505  Rasef    chr4 -5.92    5.12 -31.8 6.06e-10   2.7e-06
## 67451     67451   Pkp2   chr16 -5.72    4.42 -30.7 8.01e-10   2.7e-06
## 228543   228543   Rhov    chr2 -6.25    5.49 -29.5 1.11e-09   2.7e-06
## 70350     70350  Basp1   chr15 -6.07    5.25 -28.6 1.38e-09   2.7e-06

head(basal.vs.ml)

##        ENTREZID  SYMBOL TXCHROM logFC AveExpr     t  P.Value adj.P.Val
## 242505   242505   Rasef    chr4 -6.51    5.12 -35.5 2.57e-10  1.92e-06
## 53624     53624   Cldn7   chr11 -5.47    6.30 -32.5 4.98e-10  1.92e-06
## 12521     12521    Cd82    chr2 -4.67    7.07 -31.8 5.80e-10  1.92e-06
## 71740     71740 Nectin4    chr1 -5.56    5.17 -31.3 6.76e-10  1.92e-06
## 20661     20661   Sort1    chr3 -4.91    6.71 -31.2 6.76e-10  1.92e-06
## 15375     15375   Foxa1   chr12 -5.75    5.63 -28.3 1.49e-09  2.28e-06

plotMD(tfit, column=1, status=dt[,1], main=colnames(tfit)[1], 
       xlim=c(-8,13))

glMDPlot(tfit, coef=1, status=dt[,1], main=colnames(tfit)[1],
   counts=x$counts, samples=colnames(x), anno=x$genes, groups=group,
   side.main="ENTREZID", display.columns=c("SYMBOL", "ENTREZID"), 
   search.by="SYMBOL", launch=FALSE)

Link to Interactive MD plot

library(gplots)
basal.vs.lp.topgenes <- basal.vs.lp$ENTREZID[1:100]
i <- which(v$genes$ENTREZID %in% basal.vs.lp.topgenes)
mycol <- colorpanel(1000,"blue","white","red")
heatmap.2(v$E[i,], scale="row",
   labRow=v$genes$SYMBOL[i], labCol=group, 
   col=mycol, trace="none", density.info="none", 
   margin=c(8,6), lhei=c(2,10), dendrogram="column")

6 Gene set testing with camera

load(url("http://bioinf.wehi.edu.au/software/MSigDB/mouse_c2_v5p1.rdata"))
idx <- ids2indices(Mm.c2,id=rownames(v))
cam.BasalvsLP <- camera(v,idx,design,contrast=contr.matrix[,1],
                        inter.gene.cor=0.01)
head(cam.BasalvsLP,5)

##                                             NGenes Direction   PValue      FDR
## LIM_MAMMARY_STEM_CELL_UP                       739        Up 1.13e-18 5.36e-15
## LIM_MAMMARY_STEM_CELL_DN                       630      Down 1.57e-15 3.71e-12
## ROSTY_CERVICAL_CANCER_PROLIFERATION_CLUSTER    163        Up 1.44e-13 2.26e-10
## SOTIRIOU_BREAST_CANCER_GRADE_1_VS_3_UP         183        Up 2.18e-13 2.58e-10
## LIM_MAMMARY_LUMINAL_PROGENITOR_UP               87      Down 6.73e-13 6.36e-10

cam.BasalvsML <- camera(v,idx,design,contrast=contr.matrix[,2],
                        inter.gene.cor=0.01)
head(cam.BasalvsML,5)

##                                             NGenes Direction   PValue      FDR
## LIM_MAMMARY_STEM_CELL_UP                       739        Up 5.09e-23 2.40e-19
## LIM_MAMMARY_STEM_CELL_DN                       630      Down 5.13e-19 1.21e-15
## LIM_MAMMARY_LUMINAL_MATURE_DN                  166        Up 8.88e-16 1.40e-12
## LIM_MAMMARY_LUMINAL_MATURE_UP                  180      Down 6.29e-13 7.43e-10
## ROSTY_CERVICAL_CANCER_PROLIFERATION_CLUSTER    163        Up 1.68e-12 1.59e-09

cam.LPvsML <- camera(v,idx,design,contrast=contr.matrix[,3],
                     inter.gene.cor=0.01)
head(cam.LPvsML,5)

##                                         NGenes Direction   PValue      FDR
## LIM_MAMMARY_LUMINAL_MATURE_UP              180      Down 8.50e-14 3.40e-10
## LIM_MAMMARY_LUMINAL_MATURE_DN              166        Up 1.44e-13 3.40e-10
## LIM_MAMMARY_LUMINAL_PROGENITOR_UP           87        Up 3.84e-11 6.05e-08
## REACTOME_RESPIRATORY_ELECTRON_TRANSPORT     91      Down 2.66e-08 3.14e-05
## NABA_CORE_MATRISOME                        222        Up 4.43e-08 4.19e-05

barcodeplot(efit$t[,3], index=idx$LIM_MAMMARY_LUMINAL_MATURE_UP, 
            index2=idx$LIM_MAMMARY_LUMINAL_MATURE_DN, main="LPvsML")

7 Software and code used

sessionInfo()

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
## [9] base     
## 
## other attached packages:
##  [1] knitr_1.22                               RNAseq123_1.8.0                         
##  [3] Mus.musculus_1.3.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
##  [5] org.Mm.eg.db_3.8.2                       GO.db_3.8.2                             
##  [7] OrganismDbi_1.26.0                       GenomicFeatures_1.34.7                  
##  [9] GenomicRanges_1.36.0                     GenomeInfoDb_1.20.0                     
## [11] AnnotationDbi_1.46.0                     IRanges_2.18.0                          
## [13] S4Vectors_0.22.0                         Biobase_2.44.0                          
## [15] BiocGenerics_0.30.0                      RColorBrewer_1.1-2                      
## [17] edgeR_3.26.1                             limma_3.40.0                            
## [19] Glimma_1.12.0                            R.utils_2.8.0                           
## [21] R.oo_1.22.0                              R.methodsS3_1.7.1                       
## [23] gplots_3.0.1.1                           BiocStyle_2.12.0                        
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.0                  bit64_0.9-7                 jsonlite_1.6               
##  [4] gtools_3.8.1                assertthat_0.2.1            BiocManager_1.30.4         
##  [7] RBGL_1.60.0                 blob_1.1.1                  GenomeInfoDbData_1.2.1     
## [10] Rsamtools_2.0.0             yaml_2.2.0                  progress_1.2.1             
## [13] RSQLite_2.1.1               lattice_0.20-38             digest_0.6.18              
## [16] XVector_0.24.0              htmltools_0.3.6             Matrix_1.2-17              
## [19] XML_3.98-1.19               pkgconfig_2.0.2             biomaRt_2.40.0             
## [22] bookdown_0.10               zlibbioc_1.30.0             gdata_2.18.0               
## [25] BiocParallel_1.18.0         SummarizedExperiment_1.14.0 magrittr_1.5               
## [28] crayon_1.3.4                memoise_1.1.0               evaluate_0.13              
## [31] graph_1.62.0                tools_3.6.0                 prettyunits_1.0.2          
## [34] hms_0.4.2                   matrixStats_0.54.0          stringr_1.4.0              
## [37] locfit_1.5-9.1              DelayedArray_0.10.0         Biostrings_2.52.0          
## [40] compiler_3.6.0              caTools_1.17.1.2            rlang_0.3.3                
## [43] grid_3.6.0                  RCurl_1.95-4.12             bitops_1.0-6               
## [46] rmarkdown_1.12              DBI_1.0.0                   R6_2.4.0                   
## [49] GenomicAlignments_1.20.0    rtracklayer_1.44.0          bit_1.1-14                 
## [52] KernSmooth_2.23-15          stringi_1.4.3               Rcpp_1.0.1                 
## [55] xfun_0.6