
**Michael Love** · 01-13-2015, 06:41 AM

hi Ben,

DESeq2's results() with no arguments contrasts the last level of condition over the first. As we say in the vignette, if you do not specify the levels, they are chosen alphabetically.

The best approach when you have anything other than a simple two-group comparison is to use the contrast argument of results() to specify exactly what you want to compare; the software does the rest for you, e.g.:

results(dds, contrast=c("condition","Psa","Control"))

So it appears that in DESeq2 you are now contrasting PsaD over Control. (DESeq2 also reports the contrast that was performed at the top of the results object when it is printed to the console, and in mcols(res).)

edgeR also uses R's factor() and model.matrix(), so here too the levels are chosen alphabetically unless you specify otherwise. So for edgeR it looks like you are contrasting Sdw over Psa. Is Sdw the same as Control?
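The alphabetical default is plain base-R behaviour, so it can be checked without either package. This minimal sketch uses the group labels from this thread; treating Sdw as the reference level via relevel() is an assumption for illustration, not something either package does for you.

```r
# factor() sorts levels alphabetically, regardless of the order samples appear in
group <- factor(c("Sdw", "Psa", "SdwD", "PsaD"))
print(levels(group))       # levels: Psa, PsaD, Sdw, SdwD

# With an intercept, model.matrix() treats the first level (Psa) as the baseline
design <- model.matrix(~ group)
print(colnames(design))    # (Intercept), groupPsaD, groupSdw, groupSdwD

# relevel() moves the chosen reference (assumed here to be the control, Sdw)
# to the front and keeps the remaining levels in their existing order
group_ctrl <- relevel(group, ref = "Sdw")
print(levels(group_ctrl))  # levels: Sdw, Psa, PsaD, SdwD
```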

So it seems like you are not making the same comparison.

**tirohia** · 01-13-2015, 03:34 PM

Apologies, that was a copy/paste error on my part.
The DESeq comparisons that I've made are:

Code:

results <- results(dds,contrast=c("condition","Control","Psa"), lfcThreshold=2, altHypothesis="greaterAbs")
and
results <- results(dds,contrast=c("condition","Control","Psa"), lfcThreshold=2, altHypothesis="greaterAbs",independentFiltering=FALSE,cooksCutoff=FALSE)

The first gives 48 genes; the second gives 57.

The edgeR comparison that I've made is:

Code:

group <- factor(c("Psa","Psa","Psa","PsaD","PsaD","PsaD","Sdw","Sdw","Sdw","SdwD","SdwD","SdwD"))
...
lrt <- glmLRT(fit, contrast=c(-1,0,1,0))

Unless I'm reading the edgeR manual incorrectly, this should again be comparing Psa and Sdw, the first and third columns of the design matrix.
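The design line is elided above, so this sketch assumes a no-intercept design, model.matrix(~ 0 + group), which gives one column per group. Under that assumption, naming the contrast weights after the design columns makes it easy to see which groups c(-1, 0, 1, 0) compares. The group-level coefficients are made-up numbers on a log scale.

```r
group <- factor(c("Psa", "Psa", "Psa", "PsaD", "PsaD", "PsaD",
                  "Sdw", "Sdw", "Sdw", "SdwD", "SdwD", "SdwD"))

# Assumed design: one column per group, no intercept
design <- model.matrix(~ 0 + group)
print(colnames(design))  # groupPsa, groupPsaD, groupSdw, groupSdwD

# Attach the design column names to the contrast weights
contrast <- setNames(c(-1, 0, 1, 0), colnames(design))
print(contrast)

# With hypothetical group coefficients, the contrast yields Sdw minus Psa,
# i.e. a log fold change of Sdw over Psa
group_coefs <- c(groupPsa = 5.0, groupPsaD = 6.0, groupSdw = 8.0, groupSdwD = 7.5)
lfc <- sum(contrast * group_coefs)
print(lfc)  # 3, which is 8.0 - 5.0
```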

Sdw stands for sterile distilled water; yes, it is the control. I should probably keep the naming conventions the same across the methods to avoid confusion. I'll change that.

**Michael Love** · 01-14-2015, 07:00 AM

OK, so from that code it looks like you are testing the same contrast. However, the tests are not exactly the same. In DESeq2 the p-value is from a test that the absolute value of the LFC is greater than 2, while in edgeR the p-value is from a test that the absolute value of the LFC is greater than 0. For comparison, can you try:

Code:

res = results(dds, contrast=c("condition","Control","Psa"), addMLE=TRUE)
sum(res$padj < .05 & abs(res$lfcMLE) > 2, na.rm=TRUE)

**tirohia** · 01-14-2015, 06:48 PM

I'm assuming that plotMA() will plot lfcMLE rather than log2FoldChange if it's present?

So in edgeR, to get those 3200 DE genes, I'm using the test which, as you point out, tests whether the absolute LFC is > 0. I'm then filtering on the logFC:

Code:

tags<- tags[tags$logFC > 2,]
tags<- tags[tags$FDR < 0.1,]

I thought this was analogous to what I was originally doing with DESeq2, i.e. testing lfc > 0 and then filtering on logFC > 2.

Code:

res <- results(dds,contrast=c("condition","Control","Psa"), lfcThreshold=2, altHypothesis="greaterAbs")
sum(res$padj < 0.05,na.rm=TRUE)

(i.e. what I have been doing) gives me 48 genes. This is the test that |LFC| > 2.

Code:

res = results(dds, contrast=c("condition","Control","Psa"), addMLE=TRUE)
sum(res$padj < .05 & abs(res$lfcMLE) > 2, na.rm=TRUE)

gives me 506. Does this test whether lfcMLE is > 0, or does it test log2FoldChange > 0 and add the lfcMLE column so that I can filter on it?

That is pretty much what I would expect. However, by using lfcMLE this way, don't I reintroduce all of the high variation associated with low counts, which is otherwise assiduously removed? And since edgeR also takes that into account, wouldn't I now be comparing two even more different analyses, one with high variation at low counts and one with low variation at low counts? Or am I missing something?

Ben.

**Michael Love** · 01-15-2015, 06:57 AM

What do lfcThreshold/altHypothesis do? The answer is right at hand; check the man page, ?results:

lfcThreshold - a non-negative value, which specifies the test which should be applied to the log2 fold changes. The standard is a test that the log2 fold changes are not equal to zero. However, log2 fold changes greater or less than lfcThreshold can also be tested....

This is also described in a section on threshold tests in our paper:


http://genomebiology.com/2014/15/12/550

and in the DESeq2 vignette in the section "Tests of log2 fold change above or below a threshold".

Using lfcThreshold means that the p-values are from a much more stringent test of |LFC| > lfcThreshold. It's not simply a filter on large LFCs with a test of LFC = 0.
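To make the difference concrete, here is a base-R sketch with made-up LFC estimates and standard errors for three hypothetical genes. It compares a Wald test of LFC = 0 followed by an |LFC| > 2 filter against a simplified normal-approximation threshold test of |LFC| > 2, in which the estimate must be far from the threshold, not just from zero. This illustrates the idea only; DESeq2's own implementation may differ in detail, and raw p-values are used without multiple-testing correction.

```r
# Hypothetical LFC estimates and standard errors
lfc <- c(geneA = 2.5, geneB = 4.0, geneC = 3.0)
se  <- c(geneA = 0.3, geneB = 0.4, geneC = 2.0)  # geneC mimics a noisy, low-count gene
thr   <- 2
alpha <- 0.05  # raw p-value cutoff, for illustration only

# (1) Wald test of LFC = 0, then filter on the size of the estimate
p_zero <- 2 * pnorm(abs(lfc) / se, lower.tail = FALSE)
test_then_filter <- p_zero < alpha & abs(lfc) > thr

# (2) Simplified threshold test of |LFC| > thr: the distance of the estimate
#     from the threshold, scaled by its SE, drives the p-value
p_thr <- pmin(1, 2 * pnorm((abs(lfc) - thr) / se, lower.tail = FALSE))
threshold_test <- p_thr < alpha

print(round(cbind(p_zero, p_thr), 4))
print(cbind(test_then_filter, threshold_test))
```

geneA is overwhelmingly different from zero and its estimate exceeds 2, so it survives test-then-filter, but it sits only about 1.7 standard errors above the threshold, so the threshold test does not call it. The noisy geneC fails both approaches, because its large standard error already makes the LFC = 0 test non-significant.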

Now that you are using more comparable steps for edgeR and DESeq2 (a test of LFC = 0 followed by a filter on the MLE LFC), the numbers are coming closer together. From the code above, it looks like you are now using 5% FDR for DESeq2 but 10% FDR for edgeR.
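The effect of mismatched FDR cutoffs can be sketched with base R's p.adjust() on made-up p-values. Benjamini-Hochberg is the default adjustment in DESeq2's results() and edgeR's topTags(), but the numbers here are purely illustrative.

```r
# Made-up raw p-values for ten hypothetical genes
pvals <- c(0.0001, 0.0004, 0.002, 0.004, 0.006, 0.009, 0.012, 0.05, 0.08, 0.7)

# Benjamini-Hochberg adjustment
padj <- p.adjust(pvals, method = "BH")

# The same adjusted p-values give different gene counts at 5% and 10% FDR
n_fdr05 <- sum(padj < 0.05)
n_fdr10 <- sum(padj < 0.10)
print(c(fdr05 = n_fdr05, fdr10 = n_fdr10))
```

Applying one cutoff to both packages (for example 0.05 to both padj and FDR) keeps this difference out of the comparison.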

The tests (and hence the p-values) from both packages take into account the high variance at low counts, so filtering on p-values takes care of this in both cases.

I'm not recommending that you filter on lfcMLE; I'm just trying to answer your initial question about why there are many genes from an edgeR test of LFC = 0 followed by filtering on |LFC| > 2, while there are far fewer when you test |LFC| > 2. It is because the threshold test is much more stringent.

**tirohia** · 01-15-2015, 08:39 PM

Ahh... the light dawns.

Thanks for the link. I think I've got it all sorted in my head now.

Also, I've just noticed that when filtering on the LFC in edgeR I was filtering on lfc > 2 and lfc < 2 rather than lfc > 2 and lfc < -2. That was silly, and fixing it changes things dramatically: the edgeR results are now well within the same ballpark as the DESeq2 test of lfc > 0. (I'm still going to use the banded results.)
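A minimal sketch of that filtering slip, on a hypothetical stand-in for the tags table with the logFC and FDR columns used earlier (all values are made up). The buggy version is one reading of "lfc > 2 and lfc < 2" as a combined filter; taking the absolute value gives the intended two-sided cut.

```r
# Hypothetical stand-in for the edgeR results table held in tags
tags <- data.frame(
  logFC = c(3.1, -2.8, 1.5, -0.4, 2.2, -3.5),
  FDR   = c(0.01, 0.02, 0.03, 0.50, 0.20, 0.04),
  row.names = paste0("gene", 1:6)
)

# Buggy: logFC > 2 | logFC < 2 is TRUE for every gene except logFC exactly 2,
# so only the FDR cutoff actually filters anything
buggy <- tags[(tags$logFC > 2 | tags$logFC < 2) & tags$FDR < 0.1, ]

# Intended: |logFC| > 2 keeps strongly up- and down-regulated genes
fixed <- tags[abs(tags$logFC) > 2 & tags$FDR < 0.1, ]

print(rownames(buggy))
print(rownames(fixed))
```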

And entertainingly, all 48 of the genes that I get with DESeq2's banded testing are also identified by edgeR, so that's good.
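Checking that one gene list is contained in another is a simple set operation in base R. In a real analysis the two vectors would presumably come from the row names of the filtered DESeq2 and edgeR result tables; the IDs below are hypothetical placeholders.

```r
# Hypothetical gene IDs standing in for the filtered DESeq2 and edgeR results
deseq2_genes <- c("g01", "g07", "g12")
edger_genes  <- c("g01", "g03", "g07", "g09", "g12", "g15")

# TRUE if every DESeq2 call also appears in the edgeR list
all_in_edger <- all(deseq2_genes %in% edger_genes)

# Genes called by both methods, and genes called only by edgeR
shared     <- intersect(deseq2_genes, edger_genes)
edger_only <- setdiff(edger_genes, deseq2_genes)

print(all_in_edger)
print(shared)
print(edger_only)
```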

Many thanks.
Ben.

Large difference in number of genes DESeq and EdgeR detect

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News