SEQanswers: DESeq2 LRT with multiple factors and batch effect
krausezuhause

Dear all,
I have a question about a multiple-factor analysis with a batch effect reflecting the day of library preparation (2 dates). I am using the likelihood ratio test (LRT) in DESeq2. My variable of interest is a continuous variable indicating how much a person is exposed. I also want to control for the possible confounders sex, age, and BMI.

My first question is whether the design makes sense:

Code:
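```
library(DESeq2)

# Full model: batch, confounders, and the variable of interest (last)
dds <- DESeqDataSetFromMatrix(countData = MyCounts,
                              colData = MyData,
                              design = ~ libbatch + sex + age + BMI + Exposure)

# LRT against a reduced model that keeps only the batch term
dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ libbatch)

res <- results(dds, name = "Exposure", pAdjustMethod = "fdr")
```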
Or would I need to do something like this:

Code:
```
dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~ libbatch + sex + age + BMI)
```
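To see what each reduced design actually tests, it can help to count the coefficients the LRT drops, since that difference is the test's degrees of freedom. The sketch below uses base R's `model.matrix` on a made-up sample table (`col_data` is hypothetical; its column names mirror the thread, and `age` and `BMI` are assumed to be numeric). With `reduced = ~libbatch`, the LRT jointly tests sex, age, BMI and Exposure; with `reduced = ~libbatch + sex + age + BMI`, it tests Exposure alone. If age or BMI were coded as factors, each would add one coefficient per non-reference level, so the counts would rise. Per the DESeq2 documentation, LRT p-values depend only on the full versus reduced comparison; the `name` argument to `results()` only changes which log2 fold change is printed.

```r
# Hypothetical toy sample table: column names follow the thread, values are made up.
# age and BMI are assumed numeric; libbatch and sex are two-level factors.
set.seed(1)
col_data <- data.frame(
  libbatch = factor(rep(c("day1", "day2"), each = 6)),
  sex      = factor(rep(c("F", "M"), times = 6)),
  age      = round(runif(12, 20, 60)),
  BMI      = round(runif(12, 18, 35), 1),
  Exposure = runif(12)
)

full      <- ~ libbatch + sex + age + BMI + Exposure
reduced_a <- ~ libbatch                    # first option in the question
reduced_b <- ~ libbatch + sex + age + BMI  # second option in the question

# Number of model coefficients (design-matrix columns) for a formula
n_coef <- function(f) ncol(model.matrix(f, data = col_data))

# LRT degrees of freedom = number of coefficients the reduced model drops
df_a <- n_coef(full) - n_coef(reduced_a)  # 4: sex, age, BMI and Exposure tested jointly
df_b <- n_coef(full) - n_coef(reduced_b)  # 1: Exposure tested on its own

cat("reduced = ~libbatch:                   df =", df_a, "\n")
cat("reduced = ~libbatch + sex + age + BMI: df =", df_b, "\n")
```

With the first reduced model, a gene can reach significance because of sex, age or BMI even if Exposure has no effect.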
My second question concerns the results: why are there so many NAs among the adjusted p-values, and why are many of them equal?

Quote:
```
res <- results(dds, name = "Expo.delta", pAdjustMethod = "fdr")

         baseMean log2FoldChange       lfcSE     stat       pvalue       padj
gene1 13009.48564  -0.0005561894 0.001880162 25.89735 0.0002326616 0.01561069
gene2   163.28590  -0.0043968404 0.003520945 25.21172 0.0003119647 0.01561069
gene4    88.93107  -0.0074868961 0.006939026 26.88819 0.0001519605 0.01561069
gene5   121.15589  -0.0092059826 0.004727699 25.50741 0.0002749380 0.01561069
...           ...            ...         ...      ...          ...        ...
genex    24.22494  -0.0117729650 0.011910591 5.048386    0.5376224         NA
geney    23.03576   0.0070158920 0.010191693 5.260325    0.5108840         NA
```

 Your reduced design should be "~libbatch + sex + age + BMI", though I'm curious why you explicitly want an LRT. The NAs are probably due to independent filtering. I'd have to look up which method the "fdr" correction uses; I only ever use the standard BH method.
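Both observations about the results table can be reproduced in base R. DESeq2's `results()` passes `pAdjustMethod` on to `p.adjust`, where `"fdr"` is simply an alias for `"BH"` (Benjamini-Hochberg). The BH adjustment takes a cumulative minimum over the scaled p-values, which is why several genes often share exactly the same adjusted value, as in the table above. The NA step is a simplified stand-in for DESeq2's independent filtering: it uses a fixed, hypothetical mean-count cutoff, whereas DESeq2 chooses its threshold automatically. All p-values and mean counts here are made up.

```r
# Hypothetical raw p-values and mean counts, made up for illustration
pvals     <- c(0.00023, 0.00031, 0.00015, 0.00027, 0.54, 0.51, 0.02, 0.9)
base_mean <- c(13009, 163, 89, 121, 24, 23, 500, 300)

# In base R, "fdr" is an alias for "BH", so both give identical results
stopifnot(identical(p.adjust(pvals, "fdr"), p.adjust(pvals, "BH")))

# BH scales each p-value by n / rank, then takes a cumulative minimum from the
# largest p-value down, so neighbouring genes often get the same padj
padj_all <- p.adjust(pvals, "BH")
print(padj_all)  # the first four entries are identical

# Simplified stand-in for independent filtering: genes below a (hypothetical)
# mean-count cutoff are left out of the adjustment and get NA
keep <- base_mean >= 50
padj_filtered <- rep(NA_real_, length(pvals))
padj_filtered[keep] <- p.adjust(pvals[keep], "BH")
print(padj_filtered)
```

In DESeq2 itself, `results(dds, independentFiltering = FALSE)` turns the filtering off, at the cost of a heavier multiple-testing burden.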
 I did not know you could also correct for a batch effect using the Wald test. What would the model look like then, given that the reduced model is ignored in a Wald test?
 Your full design is the same for the Wald test and the LRT; only the latter needs a reduced design.

Code:
```
dds <- DESeqDataSetFromMatrix(countData = MyCounts,
                              colData = MyData,
                              design = ~ libbatch + sex + age + BMI + Exposure)
dds <- DESeq(dds)    # the Wald test is the default
res <- results(dds)  # I think this will default to Exposure, being the last variable in the design
```
krausezuhause

Location: Netherlands

Join Date: May 2013

And what would be the interpretation of this model?

Quote:
```
dds <- DESeqDataSetFromMatrix(countData = MyCounts,
                              colData = MyData,
                              design = ~libbatch + sex + age + BMI + Exposure)
dds <- DESeq(dds, test = "LRT", full = design(dds), reduced = ~libbatch)
```
Is it only correcting for the batch effect, with the other confounders ignored?

 ? It's the same model: you're still correcting for the batch and accounting for changes in the confounders. You're just getting your p-value from whether the log2FC of "Exposure" differs from 0 (Wald test) as opposed to whether the full or the reduced model fits better (LRT).
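The Wald/LRT distinction can be illustrated outside DESeq2 with an ordinary Poisson GLM in base R. This is a swapped-in stand-in: DESeq2 fits negative binomial GLMs with estimated dispersions, and the simulated counts and covariates below are hypothetical. The Wald test looks at the `Exposure` coefficient inside the full model, while the LRT compares the deviance of the full model against a reduced model that drops only `Exposure` (1 degree of freedom). Both address the same hypothesis, so their p-values are usually similar but not identical.

```r
# Stand-in illustration with a Poisson GLM from base R (DESeq2 itself fits
# negative binomial GLMs). All data below are simulated and hypothetical.
set.seed(42)
n <- 40
d <- data.frame(
  libbatch = factor(rep(c("day1", "day2"), each = n / 2)),
  sex      = factor(rep(c("F", "M"), times = n / 2)),
  age      = runif(n, 20, 60),
  BMI      = runif(n, 18, 35),
  Exposure = runif(n)
)
# Simulated counts with a batch effect and an Exposure effect
mu <- exp(3 + 0.3 * (d$libbatch == "day2") + 0.5 * d$Exposure)
d$count <- rpois(n, mu)

full    <- glm(count ~ libbatch + sex + age + BMI + Exposure, family = poisson, data = d)
reduced <- glm(count ~ libbatch + sex + age + BMI,            family = poisson, data = d)

# Wald test: z statistic for the Exposure coefficient within the full model
wald_p <- coef(summary(full))["Exposure", "Pr(>|z|)"]

# LRT: difference in deviance between the nested models, 1 degree of freedom
lrt   <- anova(reduced, full, test = "LRT")
lrt_p <- lrt[2, "Pr(>Chi)"]

cat("Wald p =", signif(wald_p, 3), "  LRT p =", signif(lrt_p, 3), "\n")
```

Swapping `reduced` for a model with only `libbatch` would turn the LRT into a joint test of sex, age, BMI and Exposure, which no longer isolates the exposure effect.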
 So if I now understand correctly, with the LRT above I account for batch and confounders but cannot relate the significant genes to exposure? I get about 300 significant hits with the LRT (reduced = ~libbatch) model, but 0 hits with the Wald test or with the LRT (reduced = ~libbatch + sex + age + BMI).
 Your confounders are masking any effect of exposure. Sorry your results didn't turn out better.
 Many thanks for the explanation!

