-   RNA Sequencing (http://seqanswers.com/forums/forumdisplay.php?f=26)
-   -   multifactor analysis or pairwise comparison? (http://seqanswers.com/forums/showthread.php?t=67902)

 illinu 04-27-2016 04:50 AM

multifactor analysis or pairwise comparison?

I can't find a simple answer to what's the difference between doing multifactor analysis or pairwise comparisons, example:

I have two factors to consider: condition (treatment or control) and genotype (A and B). If I do
1. a pairwise comparison between control and treatment across both genotypes and
2. a pairwise comparison between treatment and control in genotype A and treatment and control in genotype B
I expect the intersection of these two pairwise comparisons to give the same results as if I do mutlifactor analysis taking into account both factors: condition and genotype (in DESeq2 something like design(ddsM) <- formula(~condition*genotype)) but I don't

What's in this case the biological interpretation of both analyses?

 dpryan 04-27-2016 05:29 AM

Your wording is a little confused so it's hard to tell exactly what you're doing in method (1) and (2). How about you just tell us the biological question you want to answer and then one of us can then just tell you the simplest way to do it.

In general, you can get the same answer treating each condition:genotype combination as a separate group and doing comparisons as if you use a more classical factorial design (e.g., "~condition*genotype"). Some questions are simpler to answer with one design vs. the other.

I would strongly encourage you not to do multiple comparisons and then take the intersection of the results. You're absolutely killing your statistical power when you're doing that. You also then lack a fold-change or p-value, which you kind of need to both publishing and follow-up experiments.

 illinu 04-27-2016 06:07 AM

My biological question is how do genotype A and B respond differently to the treatment. I need to take both factors into account. I wan't to know what's the difference between doing multifactor (condition*genotype) or treat vs control in A and treat vs ctl in B and compare the list of genes. Basically if I do the latter I get more genes differentially expressed than in the former.

How do I interepret biologically the results of the two approaches?

 dpryan 04-27-2016 06:11 AM

You'll want to use the ~condition*genotype model and extract the interaction term from it. That will directly address your question. The second method is statistically invalid and not answering any biological question.

 illinu 04-27-2016 06:51 AM

Thank you, I would like to know why the second option is statistically incorrect.

And for sanity could you or someone confirm that this is the correct code in R (DESeq2 package):

Code:

```countData=read.table(file.choose(),header=TRUE,row.names=1,sep="\t") condition=rep(rep(c("Ctr","Treat"),each=3),4) genotype=rep(rep(c("A","B"),each=6),2) colData=data.frame(condition,genotype,row.names=names(countData)) dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData,design = ~ condition+genotype) ddsM=dds design(ddsM) <- formula(~condition*genotype) ddsM <- DESeq(ddsM) resM <- results(ddsM) ddsMN=ddsM ddsMN = estimateSizeFactors(ddsM) ddsMN = estimateDispersions(ddsM) resGenotype=nbinomLRT(ddsMN, full = design(ddsM), reduced =  formula(~condition*genotype), maxit = 1000) resGenotype <- results(resMTolFact, contrast=c("genotype","A","B"))```

 dpryan 04-27-2016 11:03 PM

By intersecting lists you're not measuring anything, just asking, "if I take two sets of results and intersect them, what do I get?" That's not answering the question, "what's the additional effect of condition on genotype B?".

Regarding the code, you can skip the LRT stuff and just use "results(ddsM)" and specify the interaction coefficient. For the actual LRT test, you're using the same formula for the full and reduced design. You want just "reduced = ~condition+genotype". "resMTolFact" is never defined. You can skip all of the "ddsMN" definition stuff.

 illinu 04-28-2016 05:00 AM

Hi Devon,

Thank you for replying. If I understand correct I should avoid the LRT test? I could use the interaction term like so:
resGenotype <- results(ddsM, contrast=list("conditionTreat.genotypeB")) ??

This arises a new question, if I define my conditions variable as:

Code:

` condition=rep(rep(c("A.ctl","A.treat","B.ctl","B.treat"),each=3),2)`
To the question, how do A and B respond differently to the treatment could also be answered like this?

Code:

```AB <- DESeqDataSetFromMatrix(countData = countData,colData = colDataAB, design = ~ condition) ABpairwise <- DESeq(AB) resAB <- results(ABpairwise, contrast = c("condition", "A.treat", "B.treat"))```

 dpryan 04-28-2016 05:38 AM

Code:

`results(ABpairwise, contrast=c(1, -1, -1, 1))`
You want the additional effect due to the interaction of treatment and genotype B. In other words, you need to take B.treat - B.ctl - (A.treat - A.ctl) (represented by a vector of "1, -1, -1, 1").

Regarding the LRT, you can do that too, but you'd need to be a bit more comfortable with what that's actually testing.

 All times are GMT -8. The time now is 11:05 PM.