EdgeR

From SEQwiki

Application data

Created by: Robinson MD, Smyth GK, McCarthy DJ
Biological application domain(s): RNA-Seq, RNA-Seq quantification, ChIP-seq, Gene expression analysis, DNA methylation
Principal bioinformatics method(s): Statistical calculation
Technology: Illumina, ABI SOLiD, 454, Any
Created at: The Walter + Eliza Hall Institute, Melbourne, Australia
Maintained: Yes
Input format(s): Table with count data
Output format(s): Table
Programming language(s): R
Licence: LGPL
Operating system(s): Windows, Mac OS X, UNIX

Summary: edgeR is an R/Bioconductor software package for the statistical analysis of replicated count data. Its methods are designed for assessing differential expression in comparative RNA-Seq experiments but are generally applicable to count data from other genome-scale platforms (ChIP-seq, MeDIP-Seq, Tag-Seq, SAGE-Seq, etc.).

Description

edgeR is an R/Bioconductor package that provides methods for the statistical analysis of count data from comparative experiments on high-throughput sequencing platforms. Particular attention is given to designed multi-factor experiments and to experiments with minimal replication.

The package provides methods for assessing differential expression in RNA-Seq, Tag-Seq, SAGE-Seq and other digital gene expression experiments. RNA-Seq is the most common source of gene expression count data (or digital gene expression data), but the methods implemented in edgeR are general and can also be used with ChIP-seq and other genome-scale count data.

Over-dispersed Poisson count models (primarily the negative binomial) are used to separate biological from technical variation. We use generalized linear modelling to handle complex multi-factor experiments. Information-sharing techniques, which borrow strength across genes, give reliable results even for experiments with minimal biological replication.
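The negative binomial can be constructed as a gamma-Poisson mixture, which makes the split between the two sources of variation concrete. The plain Python sketch below (not edgeR code) simulates one gene this way: each replicate's true abundance is drawn from a gamma distribution with mean `mu` and squared coefficient of variation `phi` (biological variation), and the observed count is Poisson around that abundance (technical variation). The values `mu = 20` and `phi = 0.2`, and the helper names, are illustrative assumptions.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for the moderate means used here; slow for very large lam.
    """
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def gamma_poisson_counts(rng, mu, phi, n):
    """Simulate n replicate counts for one gene (hypothetical helper).

    Biological variation: each replicate's true abundance is
    Gamma(shape=1/phi, scale=mu*phi), with mean mu and variance phi*mu^2.
    Technical variation: the observed count is Poisson around that abundance.
    """
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(1.0 / phi, mu * phi)
        counts.append(poisson_sample(rng, lam))
    return counts


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    return m, v


rng = random.Random(1)  # fixed seed so the run is reproducible
mu, phi = 20.0, 0.2     # illustrative values (assumptions)
counts = gamma_poisson_counts(rng, mu, phi, 20000)
m, v = mean_and_variance(counts)
print(f"sample mean                = {m:.1f}")
print(f"sample variance            = {v:.1f}")
print(f"Poisson variance mu        = {mu:.1f}")
print(f"NB variance mu + phi*mu^2  = {mu + phi * mu ** 2:.1f}")
```

The sample variance should land close to the NB value μ + φμ² = 100, five times the Poisson variance of 20; that excess is the biological component the negative binomial captures.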

Modelling transcript counts with the negative binomial (NB) distribution allows gene-specific variability, so that some genes may show more biological variability than others. This variability is measured by the biological coefficient of variation (BCV), which is inferred from how much the variance of the counts exceeds the variance that Poisson counts would have. For a count with mean μ, the NB variance is μ + φμ², where μ is the Poisson (technical) part and φ is the dispersion; the BCV is √φ.
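As a rough illustration of the "excess over Poisson" idea, the Python sketch below computes a method-of-moments dispersion estimate for a single gene, assuming equal library sizes and at least two replicates: φ̂ = (s² - m)/m², truncated at zero, with BCV = √φ̂. This is not how edgeR estimates the BCV (it uses conditional or adjusted profile likelihood together with information sharing); the function name `moment_bcv` is hypothetical.

```python
import math
from statistics import mean, variance


def moment_bcv(counts):
    """Method-of-moments dispersion (phi) and BCV for one gene (sketch).

    Assumes equal library sizes and at least two replicates. Under the NB
    model Var(Y) = mu + phi * mu^2, so the variance in excess of the Poisson
    variance (mu), divided by mu^2, estimates phi; the BCV is sqrt(phi).
    """
    m = mean(counts)
    if m == 0:
        return 0.0, 0.0                 # all-zero gene: nothing to estimate
    s2 = variance(counts)               # sample variance, n - 1 denominator
    phi = max((s2 - m) / m ** 2, 0.0)   # truncate: no detectable excess over Poisson
    return phi, math.sqrt(phi)


print(moment_bcv([10, 20, 30, 40]))
print(moment_bcv([10, 10, 10, 10]))
```

For counts 10, 20, 30 and 40 the estimate is φ ≈ 0.23 (BCV ≈ 0.48), whereas four identical counts show no excess over Poisson and give zero.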

For simple one-way layouts (e.g. pairwise comparisons between treatments), conditional likelihood and weighted conditional likelihood are used to estimate the NB model parameters. Conditioning on the total count for each gene permits an exact test for differential expression, analogous to Fisher's exact test but adapted to over-dispersed data. Information sharing allows each gene to take its own BCV value while stabilizing (shrinking) the gene-wise values towards a common BCV. This substantially improves inference in small-sample experiments.
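The conditioning idea behind the exact test can be sketched in Python for one gene and two groups. The sketch assumes equal library sizes and a known common dispersion `phi`; edgeR instead equalizes library sizes with quantile-adjusted pseudo-counts and estimates dispersions with information sharing, and neither step is reproduced here. Under the null hypothesis the sum of n independent NB counts with mean μ and dispersion φ is NB with mean nμ and dispersion φ/n, so conditioning on the overall total leaves a discrete distribution for the group-1 total. The two-sided rule used here (sum every split no more probable than the observed one) is one common choice and may not match edgeR's default rejection region. Function names are hypothetical.

```python
import math


def nb_logpmf(y, mean, phi):
    """Log NB probability of count y with the given mean and dispersion phi > 0."""
    r = 1.0 / phi  # NB 'size' parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mean)) + y * math.log(mean / (r + mean)))


def exact_test(group1, group2, phi):
    """Two-group conditional exact test for one gene (sketch).

    Assumes equal library sizes and a common, known dispersion phi.
    """
    n1, n2 = len(group1), len(group2)
    z1 = sum(group1)
    z = z1 + sum(group2)
    if z == 0:
        return 1.0
    mu = z / (n1 + n2)  # pooled per-library mean under H0
    # Joint probability of each split (a, z - a) of the total between groups.
    logp = [nb_logpmf(a, n1 * mu, phi / n1) + nb_logpmf(z - a, n2 * mu, phi / n2)
            for a in range(z + 1)]
    top = max(logp)
    probs = [math.exp(lp - top) for lp in logp]  # rescale to avoid underflow
    observed = probs[z1]
    # Two-sided: add up every split that is no more likely than the observed one.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail / sum(probs))


print(exact_test([5, 8, 6], [30, 25, 41], phi=0.05))
print(exact_test([20, 22, 19], [21, 18, 23], phi=0.05))
```

In the first call the group totals (19 versus 96) are far from an even split of 115, so the p-value is very small; in the second the split is close to even and the p-value is near 1.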

For more complex experimental designs, we use generalized linear models (GLMs) based on the negative binomial distribution to test for differential expression. For each gene, a GLM with the same set of explanatory variables, but possibly a different BCV, is fitted to the counts. The BCV values are estimated by Cox-Reid adjusted profile likelihood, a standard method for reducing bias when estimating variance parameters in non-linear models, and information sharing is applied in the GLM setting as well. P-values for differential expression are computed with a likelihood ratio test. Extensive optimization of the GLM-fitting routines allows edgeR to fit tens of thousands of GLMs in a matter of seconds.
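A minimal Python sketch of the GLM likelihood ratio test for one gene follows. Its assumptions are stated loudly: the dispersion `phi` is taken as known (edgeR would estimate it by Cox-Reid adjusted profile likelihood with information sharing), the design is a two-group one-way layout with a log link and log library sizes as offsets, and the reference distribution is chi-square with 1 degree of freedom. In a one-way layout each group's log-rate can be fitted separately, so a one-parameter Newton-Raphson replaces general IRLS. All names are hypothetical; this is not edgeR's implementation.

```python
import math


def nb_loglik(y, mu, phi):
    """NB log-likelihood of counts y with means mu and dispersion phi > 0."""
    r = 1.0 / phi
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(r / (r + m)) + yi * math.log(m / (r + m))
        for yi, m in zip(y, mu)
    )


def fit_log_rate(y, lib, phi, max_iter=50, tol=1e-10):
    """MLE of b in mu_j = lib_j * exp(b): an intercept-only log-link GLM with
    log(library size) as offset. Newton-Raphson on the concave
    log-likelihood; assumes sum(y) > 0."""
    b = math.log(sum(y) / sum(lib))  # Poisson MLE as the starting value
    for _ in range(max_iter):
        mu = [n * math.exp(b) for n in lib]
        score = sum((yi - m) / (1 + phi * m) for yi, m in zip(y, mu))
        info = sum(m * (1 + phi * yi) / (1 + phi * m) ** 2 for yi, m in zip(y, mu))
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b


def glm_lrt(y, lib, group, phi):
    """LR test of 'no group effect' for one gene; returns (LR, p-value)."""
    levels = sorted(set(group))
    if len(levels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    # Null model: one log-rate shared by all samples.
    b0 = fit_log_rate(y, lib, phi)
    ll_null = nb_loglik(y, [n * math.exp(b0) for n in lib], phi)
    # Full model: a separate log-rate for each group.
    ll_full = 0.0
    for g in levels:
        yg = [yi for yi, gi in zip(y, group) if gi == g]
        ng = [n for n, gi in zip(lib, group) if gi == g]
        bg = fit_log_rate(yg, ng, phi)
        ll_full += nb_loglik(yg, [n * math.exp(bg) for n in ng], phi)
    lr = max(2.0 * (ll_full - ll_null), 0.0)  # guard against rounding below 0
    p = math.erfc(math.sqrt(lr / 2.0))         # upper tail of chi-square(1)
    return lr, p


y = [12, 15, 9, 40, 52, 38]
lib = [1.0e6, 1.2e6, 0.9e6, 1.1e6, 1.0e6, 0.95e6]
group = ["A", "A", "A", "B", "B", "B"]
print(glm_lrt(y, lib, group, phi=0.05))
```

With these made-up counts, group B is expressed at roughly 3.7 times the rate of group A after adjusting for library size, so the LR statistic is large and the p-value very small. A design with more factors would need a full design matrix and iteratively reweighted least squares rather than separate one-parameter fits.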

Through integration with goseq, another R/Bioconductor package, differential expression results from edgeR can easily be related to annotation databases such as the Gene Ontology or the Molecular Signatures Database, while accounting for gene-length bias in the detection of differential expression.

In summary, edgeR provides a powerful, flexible and modular pipeline for the statistical analysis of comparative RNA-Seq experiments and other genome-scale count data.





