SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   RNA Sequencing (http://seqanswers.com/forums/forumdisplay.php?f=26)
-   -   Use of test="Wald" in the results of a time-course experiment (http://seqanswers.com/forums/showthread.php?t=62743)

EVELE 09-16-2015 03:51 AM

Use of test="Wald" in the results of a time-course experiment
 
Hi all,

Concerning the example of the manual :
http://bioconductor.org/help/workflows/rnaseqGene/#time
in the "Time Course Experiments" paragraph,

library("fission")
data("fission")
ddsTC <- DESeqDataSet(fission, ~ strain + minute + strain:minute)
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute)

resultsNames(ddsTC)

## [1] "Intercept" "strain_mut_vs_wt" "minute_15_vs_0" "minute_30_vs_0"
## [5] "minute_60_vs_0" "minute_120_vs_0" "minute_180_vs_0" "strainmut.minute15"
## [9] "strainmut.minute30" "strainmut.minute60" "strainmut.minute120" "strainmut.minute180"

res30 <- results(ddsTC, name="strainmut.minute30", test="Wald")
res30[which.min(resTC$padj),]

## log2 fold change (MLE): strainmut.minute30
## Wald test p-value: strainmut.minute30
## DataFrame with 1 row and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## SPBC2F12.09c 174.6712 -2.601034 0.6314737 -4.11899 3.805364e-05 0.2572426

When should we use the test="Wald" in the results function?

What are the results I get when I write:
res30 <- results(ddsTC, name="strainmut.minute30")
and what is the difference in the results by adding test="Wald"?
res30 <- results(ddsTC, name="strainmut.minute30", test="Wald")?

Thanks in advance for any ideas

dpryan 09-16-2015 06:35 AM

Normally you get a Wald test unless you specify otherwise.

EVELE 09-16-2015 07:15 AM

It is not the same.
I can see differences when I apply both ways to my data.

Common part of code:
> times=factor(c(1,rep(2,2),rep(3,2),0,1,rep(2,2),rep(3,2),0), levels=c(0,1,2,3))
> condition = factor(c(rep("factors", 6), rep("controls",6)), levels=c("controls","factors"))

> colData = data.frame( condition=condition, times=times)
> rownames(colData)=colnames(duo)

> dds<-DESeqDataSetFromMatrix(countData= as.matrix(duo), colData= colData ,design=formula(~condition+times+condition:times))
> dds <- DESeq(dds, test="LRT", reduced = ~ condition + times)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> resultsNames(dds)
[1] "Intercept" "condition_factors_vs_controls"
[3] "times_1_vs_0" "times_2_vs_0"
[5] "times_3_vs_0" "conditionfactors.times1"
[7] "conditionfactors.times2" "conditionfactors.times3"


The way without test="Wald"

> res1 <- results(dds, name="conditionfactors.times1")
> res2 <- results(dds, name="conditionfactors.times2")
> res3 <- results(dds, name="conditionfactors.times3")


# Here I filter padj <= 0.05
> res1 <- res1[!is.na(res1$padj) & res1$padj <=0.05,]
> res2 <- res2[!is.na(res2$padj) & res2$padj <=0.05,]
> res3 <- res3[!is.na(res3$padj) & res3$padj <=0.05,]

For the gene PTET.51.1.G0050374 I obtain:

> res1["PTET.51.1.G0050374",]

log2 fold change (MLE): conditionfactors.times1
LRT p-value: '~ condition + times + condition:times' vs '~ condition + times'
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
PTET.51.1.G0050374 481.4894 5.916279 0.5890732 140.9655 2.340315e-30
padj
<numeric>
PTET.51.1.G0050374 6.717171e-26



The way with test="Wald"

> res1W <- results(dds, name="conditionfactors.times1",test="Wald")
> res2W <- results(dds, name="conditionfactors.times2",test="Wald")
> res3W <- results(dds, name="conditionfactors.times3",test="Wald")

> res1W <- res1W[!is.na(res1W$padj) & res1W$padj <=0.05,]
> res2W <- res2W[!is.na(res2W$padj) & res2W$padj <=0.05,]
> res3W <- res3W[!is.na(res3W$padj) & res3W$padj <=0.05,]

For the same gene I obtain different values in stat pvalue and padj columns

> res1W["PTET.51.1.G0050374",]

log2 fold change (MLE): conditionfactors.times1
Wald test p-value: conditionfactors.times1
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
PTET.51.1.G0050374 481.4894 5.916279 0.5890732 10.04337 9.826083e-24
padj
<numeric>
PTET.51.1.G0050374 1.210279e-19

Furthermore, the number of genes (after applying the same filter) vary between res1W, res2W and res3W
> dim(res1W)[1]
[1] 46
> dim(res2W)[1]
[1] 2
> dim(res3W)[1]
[1] 345


while the numbers are the same for res1, res2, res3 (without test="Wald")

> dim(res1)[1]
[1] 780
> dim(res2)[1]
[1] 780
> dim(res3)[1]
[1] 780

I cannot see what's happening here, but I'd like to know which of the 2 methods I should use in order to obtain all genes having a time-specific differential expression and a padj value <=0.05 in my time-course experiment.

Thank you

EVELE 09-16-2015 07:35 AM

Actually when you don't put test="Wald" I can see that it is a LRT applied and we get LRT pvalues while when putting test="Wald" it is the Wald test pvalues we obtain.

I don't know which approach would give me what I want but I tend to believe in the LRT method for my case (genes with time-specific differential expression).

In other words, I think I should not put test="Wald".

dpryan 09-16-2015 08:25 AM

Ah, I had missed that you explicitly requested an LRT with DESeq(). The default is Wald there too. The models that you're testing are completely different between the two methods. With a Wald test, you can test individual coefficient, with the LRT you're testing a more generic interaction.

Edit: Yes, an LRT is likely what you want, since it asks the question, "what genes generally interact in a time:condition dependent manner, regardless of the exact time point".

EVELE 09-16-2015 08:41 AM

In the manual link I sent, it goes like that:

Wald tests for the log2 fold changes at individual time points can be investigated using the test argument to results:

res30 <- results(ddsTC, name="strainmut.minute30", test="Wald")

which seems to be consistent with what you say "With a Wald test, you can test individual coefficient".
However, I remain with my questions because I cannot really see what a Wald Test and LRT test do.

For the moment, I will just apply an LRT test and I'll try to read some things about Wald and LRT tests.

Thank you Devon


All times are GMT -8. The time now is 12:00 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.