SEQanswers DESeq2 lrt with multiple factors and batch effect

01-25-2017, 05:00 AM   #1
krausezuhause
Junior Member

Location: Netherlands

Join Date: May 2013
Posts: 5
DESeq2 lrt with multiple factors and batch effect

Dear all,
I have a question concerning a multi-factor analysis with a batch effect reflecting the day of library preparation (2 dates). I am using the likelihood ratio test in DESeq2. My variable of interest is a continuous variable indicating how much a person is exposed. In addition, I want to control for the possible confounders sex, age and BMI.

My first question would be if the design makes sense:

Code:
```
dds <- DESeqDataSetFromMatrix(countData = MyCounts,
                              colData = MyData,
                              design = ~ libbatch + sex + age + BMI + Exposure)

dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ libbatch)

res <- results(dds, name = "Exposure", pAdjustMethod = "fdr")
```
or would I need to do something like this:

Code:
`dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ libbatch + sex + age + BMI)`
And the second question, concerning the results:
Why are there so many NAs among the adjusted p-values, and why are many of the adjusted p-values equal?

Quote:
 res <- results(dds, name = "Expo.delta", pAdjustMethod = "fdr")

         baseMean   log2FoldChange  lfcSE        stat      pvalue        padj
 gene1   13009.48564  -0.0005561894  0.001880162  25.89735  0.0002326616  0.01561069
 gene2     163.28590  -0.0043968404  0.003520945  25.21172  0.0003119647  0.01561069
 gene4      88.93107  -0.0074868961  0.006939026  26.88819  0.0001519605  0.01561069
 gene5     121.15589  -0.0092059826  0.004727699  25.50741  0.0002749380  0.01561069
 ...       ...          ...           ...          ...       ...           ...
 genex      24.22494  -0.0117729650  0.011910591   5.048386 0.5376224     NA
 geney      23.03576   0.0070158920  0.010191693   5.260325 0.5108840     NA

Last edited by krausezuhause; 01-25-2017 at 05:09 AM.

01-25-2017, 05:18 AM   #2
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Your reduced design should be "~libbatch + sex + age + BMI", though I'm curious why you explicitly want an LRT. The NAs are probably due to independent filtering. I'd have to look up which method the "fdr" correction is using; I only ever use the standard BH method.
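
On the "fdr" question: in base R's `p.adjust`, "fdr" is documented as an alias for "BH" (Benjamini-Hochberg). The sketch below checks that equivalence and illustrates why BH-adjusted values are often tied. It uses the four p-values shown in the first post plus made-up filler p-values for the remaining genes (an assumption, purely for illustration), so the adjusted values will not match the ones in the thread.

```r
# p-values of gene1, gene2, gene4 and gene5 from the output quoted in post #1
top <- c(0.0002326616, 0.0003119647, 0.0001519605, 0.0002749380)

# Hypothetical filler p-values standing in for all other genes (made up)
rest <- seq(0.01, 1, length.out = 96)
p <- c(top, rest)

padj_fdr <- p.adjust(p, method = "fdr")
padj_bh  <- p.adjust(p, method = "BH")

# "fdr" and "BH" give the same adjusted values
same_method <- isTRUE(all.equal(padj_fdr, padj_bh))

# BH computes p * n / rank and then forces the result to be monotone in rank,
# so a block of small p-values can share one adjusted value
n_distinct_top <- length(unique(padj_bh[1:4]))
print(same_method)
print(padj_bh[1:4])
```

In this toy input the four smallest p-values all receive the adjusted value of the fourth-ranked one, which is the same kind of tie visible in the `padj` column above.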

01-25-2017, 05:36 AM   #3
krausezuhause
Junior Member

Location: Netherlands

Join Date: May 2013
Posts: 5

I did not know you can also correct for a batch effect using the Wald test. What would the model look like then? The reduced model is ignored in a Wald test.

01-25-2017, 05:41 AM   #4
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Your full design is the same for Wald and LRT; only the latter needs a reduced design.

Code:
```
dds <- DESeqDataSetFromMatrix(countData = MyCounts,
                              colData = MyData,
                              design = ~ libbatch + sex + age + BMI + Exposure)
dds <- DESeq(dds)
res <- results(dds)  # I think this will default to Exposure, being the last variable in the design
```

01-25-2017, 05:44 AM   #5
krausezuhause
Junior Member

Location: Netherlands

Join Date: May 2013
Posts: 5

And what would be the interpretation of this model?

Quote:
 dds <- DESeqDataSetFromMatrix(countData = MyCounts, colData = MyData, design = ~ libbatch + sex + age + BMI + Exposure)
 dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ libbatch)
Is this only correcting for the batch effect, while the other confounders are ignored?

01-25-2017, 05:57 AM   #6
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

? It's the same model; you're still correcting for the batch and accounting for changes in the confounders. You're just getting your p-value according to whether the log2FC of "Exposure" is different from 0 (Wald test), as opposed to whether the full model fits better than the reduced model (LRT).

01-25-2017, 06:19 AM   #7
krausezuhause
Junior Member

Location: Netherlands

Join Date: May 2013
Posts: 5

So, if I understand correctly now, with the LRT above I account for batch and confounders but cannot relate the significant genes to exposure? I get about 300 significant hits with the LRT (reduced = ~libbatch) model, but 0 hits with the Wald test or the LRT (reduced = ~libbatch + sex + age + BMI).

01-25-2017, 06:58 AM   #8
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Your confounders are masking any effect of exposure. Sorry your results didn't turn out better.
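
One way to read the two hit counts reported in this thread: an LRT asks whether the terms present in the full model but absent from the reduced model improve the fit. With reduced = ~libbatch, that means sex, age, BMI and Exposure are tested jointly, so a gene can be significant because of sex alone. The sketch below illustrates this for a single simulated gene. It is only an analogy under stated assumptions: it uses a base R Poisson `glm` rather than DESeq2's negative binomial model, and the sample table and counts are simulated (hypothetical), with a sex effect and no Exposure effect built in.

```r
set.seed(1)
n <- 40

# Hypothetical sample table; column names mirror the design used in the thread
d <- data.frame(
  libbatch = factor(rep(c("day1", "day2"), each = n / 2)),
  sex      = factor(rep(c("F", "M"), times = n / 2)),
  age      = round(runif(n, 20, 60)),
  BMI      = rnorm(n, mean = 25, sd = 3),
  Exposure = runif(n, 0, 10)
)

# Simulated counts for one gene: strong sex effect, no Exposure effect
d$count <- rpois(n, lambda = exp(4 + 1 * (d$sex == "M")))

full <- glm(count ~ libbatch + sex + age + BMI + Exposure,
            family = poisson, data = d)
red_batch <- glm(count ~ libbatch, family = poisson, data = d)
red_confounders <- glm(count ~ libbatch + sex + age + BMI,
                       family = poisson, data = d)

# LRT against ~libbatch: sex, age, BMI and Exposure are tested together
lrt_joint <- anova(red_batch, full, test = "Chisq")

# LRT against ~libbatch + sex + age + BMI: only Exposure is tested
lrt_exposure <- anova(red_confounders, full, test = "Chisq")

# Wald test for the Exposure coefficient in the full model
wald_exposure <- summary(full)$coefficients["Exposure", ]

print(lrt_joint)
print(lrt_exposure)
print(wald_exposure)
```

The first comparison has 4 degrees of freedom and is driven by the simulated sex effect; the second has 1 degree of freedom and addresses the same question as the Wald test on Exposure.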

01-25-2017, 07:03 AM   #9
krausezuhause
Junior Member

Location: Netherlands

Join Date: May 2013
Posts: 5

Many thanks for the explanation!

 Tags batch effect, deseq2, lrt, rnaseq