Hello,
In an effort to get a reliable list of DEGs, I ran CuffDiff, edgeR, and DESeq2. I thought I would take the genes common to all the lists and be good to go. However, the DEG lists from each tool were quite different. Because of this, we decided to also write a GLMM that would be more transparent and easier to understand. The GLMM also gave a fairly different list (with some overlap, of course).
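For reference, taking the overlap is the strictest way to combine the lists; a looser option is to keep genes called by at least k of the methods. Below is a minimal Python sketch of both. The helper `consensus_degs` and the gene IDs are hypothetical and only illustrate the set logic, not real results.

```python
from collections import Counter


def consensus_degs(deg_lists, min_methods):
    """Return genes called DE by at least `min_methods` of the supplied lists."""
    # Count, for each gene, how many methods reported it (each method counted once).
    counts = Counter(g for genes in deg_lists.values() for g in set(genes))
    return {g for g, n in counts.items() if n >= min_methods}


# Hypothetical DEG calls for one contrast, for illustration only.
deg_lists = {
    "CuffDiff": {"gene1", "gene2", "gene3"},
    "edgeR": {"gene2", "gene3", "gene4"},
    "DESeq2": {"gene2", "gene4", "gene5"},
    "GLMM": {"gene2", "gene3", "gene5"},
}

strict = consensus_degs(deg_lists, min_methods=len(deg_lists))  # plain intersection
majority = consensus_degs(deg_lists, min_methods=3)             # called by >= 3 of 4
print(sorted(strict))
print(sorted(majority))
```

With `min_methods` equal to the number of methods this reduces to the intersection; lowering it trades stringency for sensitivity.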
The group I'm working with is inclined to go with the GLMM, since the statistician in our group knows exactly what it is doing. My main worry is that Cufflinks, edgeR, and DESeq2 make corrections to account for the biology of RNA-seq data that we may not understand or may not have incorporated into our GLMM.
In what important ways do Cufflinks, edgeR, and DESeq2 diverge from a GLM? We're using glmer.nb with offset=log(librarysize).
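One concrete place where the approaches can differ is normalization: an offset of log(library size) assumes total counts reflect sequencing depth, whereas DESeq2 estimates per-sample size factors with a median-of-ratios method (edgeR uses TMM, not shown here). The Python sketch below is my own re-implementation of the median-of-ratios idea, not DESeq2's code, and the toy count matrix is hypothetical. It shows how a single highly expressed, strongly changed gene inflates library size but not the size factor.

```python
import math
from statistics import median


def library_size_offsets(counts):
    """log(total counts) per sample; `counts` is a list of per-gene rows."""
    n_samples = len(counts[0])
    return [math.log(sum(row[j] for row in counts)) for j in range(n_samples)]


def median_of_ratios_size_factors(counts):
    """Size factors in the style of DESeq2's median-of-ratios estimator.

    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Per-gene log geometric mean across samples (the pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / geometric mean), back-transformed.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical genes x samples matrix: genes 1-3 unchanged, gene 4 strongly up in sample B.
counts = [
    [100, 100],
    [200, 200],
    [50, 50],
    [10, 1000],
]
print([math.exp(o) for o in library_size_offsets(counts)])  # library sizes 360 and 1350
print(median_of_ratios_size_factors(counts))                # both size factors are 1.0
```

Here the library sizes differ 3.75-fold even though three of the four genes are unchanged, while both median-of-ratios size factors are 1. In a model like ours, the analogous offset would be log(size factor) rather than log(library size).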
Thank you,
-James
Details:
We have 14 samples, done in triplicate.
Read coverage ranges from 16 to 55 million 100 bp paired-end reads per sample.
Aligned to mm10 with TopHat (as part of the Cufflinks pipeline); 83-91% of reads mapped.
Used the TopHat alignments with HTSeq-count to generate count tables for DESeq2, edgeR, and the GLMM.
Did 7 pairwise contrasts between treatments.
I attached a couple of images.
The Venn diagrams show the overlap in the DEG lists output by the 4 methods for each of the 7 contrasts.
The second image shows MA plots for each contrast. The x-axis is log2 of the mean HTSeq counts for each gene; the y-axis is the log2 fold change of treatment vs. mock.
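For anyone reproducing the Venn counts, each region is the exact set of methods that called a gene. A minimal Python sketch of that bookkeeping follows; the helper `venn_regions` and the gene sets are hypothetical, for illustration only.

```python
from collections import Counter


def venn_regions(deg_sets):
    """Map each exclusive Venn region (exact set of methods calling a gene) to its gene count."""
    membership = {}
    for method, genes in deg_sets.items():
        for g in genes:
            membership.setdefault(g, set()).add(method)
    return Counter(frozenset(methods) for methods in membership.values())


# Hypothetical DEG sets for one contrast.
deg_sets = {
    "CuffDiff": {"g1", "g2"},
    "edgeR": {"g2", "g3"},
    "DESeq2": {"g2", "g3"},
    "GLMM": {"g2", "g4"},
}

regions = venn_regions(deg_sets)
# Print regions from most to fewest methods.
for region, n in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(region)), n)
```

The region counts always sum to the size of the union of all lists, which is a quick sanity check on a Venn diagram.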
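The two axes of these MA-style plots can be computed per gene roughly as in the Python sketch below. The function name and the pseudocount of 0.5 (used to avoid log2(0)) are assumptions of mine, and these are simple fold changes of mean counts, not the model-based estimates that edgeR or DESeq2 report.

```python
import math
from statistics import mean


def ma_coordinates(treatment, mock, pseudocount=0.5):
    """Return (A, M) for one gene.

    A = log2 of the mean count over all samples in the contrast.
    M = log2 fold change of the treatment mean over the mock mean.
    The pseudocount (an assumption) keeps log2 finite for zero counts.
    """
    a = math.log2(mean(list(treatment) + list(mock)) + pseudocount)
    m = math.log2((mean(treatment) + pseudocount) / (mean(mock) + pseudocount))
    return a, m


# Hypothetical triplicate counts for one gene.
a, m = ma_coordinates([40, 40, 40], [10, 10, 10])
print(round(a, 3), round(m, 3))
```

Using normalized rather than raw HTSeq counts would keep the y-axis from reflecting differences in sequencing depth.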