
04-07-2015, 08:21 AM  #1
Senior Member
Location: Germany Join Date: May 2010
Posts: 150

multifactor vs. pairwise design
Hi everyone,
I have a data set of nine different groups, each with three samples. The groups are being compared against each other to find differentially regulated genes; all in all I have 14 different comparisons. I tested one design matrix for all samples against a pairwise approach, where only the samples of the two compared groups were loaded. I was wondering which way makes more sense, since I get different results when comparing the two.

This is my design matrix for all the samples:
Code:
              conditionT
HP4_1         HP4
HP4_2         HP4
HP4_3         HP4
HP24_1        HP24
HP24_2        HP24
HP24_3        HP24
CR4w_1        CR4w4
CR4w_2        CR4w4
CR4w_3        CR4w4
CR4w24_1      CR4w24
CR4w24_2      CR4w24
CR4w24_3      CR4w24
CTRL4_1       CTRL4
CTRL4_2       CTRL4
CTRL4_3       CTRL4
CTRL24_1      CTRL24
CTRL24_2      CTRL24
CTRL24_3      CTRL24
basalCR4w_1   basalCR4w
basalCR4w_2   basalCR4w
basalCR4w_3   basalCR4w
basalCTRL_1   basalCTRL
basalCTRL_2   basalCTRL
basalCTRL_3   basalCTRL
basalHP_1     basalHP
basalHP_2     basalHP
basalHP_3     basalHP

Code:
           conditionT
HP4_1      HP4
HP4_2      HP4
HP4_3      HP4
CTRL4_1    CTRL4
CTRL4_2    CTRL4
CTRL4_3    CTRL4

Code:
            conditionT
HP24_1      HP24
HP24_2      HP24
HP24_3      HP24
basalHP_1   basalHP
basalHP_2   basalHP
basalHP_3   basalHP

Here is a sample of one of the comparisons from the full matrix design:
Code:
miRNA          log2FoldChange   padj
mmumiR29a3p    0.534368658      0.000259248
mmumiR26a5p    0.378956528      0.000310647
mmumiR200a3p   0.299780505      0.00060916
mmumiR29c3p    0.433273797      0.00060916
mmumiR29b3p    0.625200783      0.001034352
mmumiR30d5p    0.253729371      0.00715
mmumiR30a5p    0.289108972      0.00715
mmumiR26b5p    0.287258966      0.009435688
mmumiR30a3p    0.263099811      0.012596849
mmumiR200c3p   0.480164731      0.016093411
mmumiR4553p    0.734375756      0.016093411
mmumiR101a3p   0.231741597      0.019381496
mmumiR101c     0.23216037       0.021264359
mmumiR30e3p    0.276381941      0.026896293
mmumiR92b3p    0.491022916      0.041665933
mmumiR99a5p    0.332214684      0.049609316
mmumiR1515p    0.259334395      0.08039887
mmumiR181c5p   0.226533316      0.08039887
mmumiR1273p    0.5404365        0.092739149
mmumiR1825p    0.460856503      0.095664474
mmumiR30e5p    0.212060214      0.095664474

Code:
miRNA          log2FoldChange   padj
mmumiR29a3p    0.488112296      0.054110034
mmumiR29b3p    0.531545957      0.080779499
mmumiR29c3p    0.398383972      0.080779499
mmumiR451a     0.515259487      0.080779499
mmumiR26a5p    0.35141262       0.086831362

On the other hand, probably also due to the differences in the size factors, I am getting log2FC values for miRNAs that have no reads attached at all.
Code:
miRNA          baseMeanA  baseMeanB  baseMean     log2FoldChange  lfcSE        stat         pvalue       padj  HP24_1  HP24_2  HP24_3  CTRL24_1  CTRL24_2  CTRL24_3
mmumiR376b5p   0          0          0.127918331  0.000507796     0.046785551  0.010853703  0.991340168  NA    0       0       0       0         0         0
mmumiR19683p   0          0          0.127682606  0.000515398     0.046192878  0.011157523  NA           NA    0       0       0       0         0         0

The full design matrix shows better p-values, but creates possible artefacts in the data set. The pairwise design shows less significant results. So, which one of the matrices will show me more realistic results?

Thanks,
Assa
04-07-2015, 09:06 AM  #2
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311

In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.
If you search the limma and/or edgeR vignettes and/or the Bioconductor mailing list, you should be able to find similar cases.
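As a rough illustration of the "one model, many contrasts" idea, here is a minimal base R sketch. It builds a 0/1 design matrix for nine groups with three replicates each (group labels taken from the design table in the first post) and one contrast vector for HP4 vs. CTRL4. The sample names and the no-intercept parameterization are assumptions made for the example; DESeq2 builds its own model matrix internally, and it may look different.

```r
# Group labels as they appear in the conditionT column of the first post.
groups <- c("HP4", "HP24", "CR4w4", "CR4w24", "CTRL4", "CTRL24",
            "basalCR4w", "basalCTRL", "basalHP")

# Three replicates per group; fixing the level order keeps the columns readable.
conditionT <- factor(rep(groups, each = 3), levels = groups)

# One indicator (0/1) column per group, no intercept (an assumption for clarity).
design <- model.matrix(~ 0 + conditionT)
colnames(design) <- levels(conditionT)
# Hypothetical sample names of the form <group>_<replicate>.
rownames(design) <- paste0(rep(groups, each = 3), "_", 1:3)

# Contrast vector for "HP4 minus CTRL4": +1 and -1 on the two groups, 0 elsewhere.
contrast <- setNames(numeric(length(groups)), groups)
contrast["HP4"] <- 1
contrast["CTRL4"] <- -1

print(design[1:6, 1:3])  # first six samples, first three group columns
print(contrast)
```

All 27 samples go into a single fit, and each of the 14 comparisons is then just a different contrast vector. In DESeq2 the equivalent request after fitting the full data set would be along the lines of results(dds, contrast = c("conditionT", "HP4", "CTRL4")).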
04-07-2015, 09:59 PM  #3
Senior Member
Location: Germany Join Date: May 2010
Posts: 150

My question was more about finding out why there is such a difference in the p-values between the two result tables. Is there a way to see exactly which design matrix is used for the actual DESeq2 analysis (I mean the one with the 0s and 1s)?

04-10-2015, 06:31 AM  #4
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333

The difference in the results tables is typically due to increased or decreased dispersion estimates when other samples are included.
Take a look at the PCA plot. If the two groups you are comparing, say A and B, have higher within-group variance than the other groups, then the dispersion estimates can be lowered by including the other groups (because we estimate a single dispersion value per gene). See the DESeq2 paper for details on the dispersion estimation.
See the vignette section "Access to all calculated values" for extracting parameters. The model matrix is attr(dds, "modelMatrix").
Regarding the LFCs that are near zero but not equal to zero for a contrast of two groups that have only zeros within a larger analysis: this is expected, but I have also fixed this behavior in the next release (v1.8, to be released in one week), so that these will be zeroed out (https://support.bioconductor.org/p/65213/#65254).
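To make the dispersion argument more concrete, here is a small base R sketch with made-up numbers. It uses an ordinary pooled within-group variance as a stand-in for the single per-gene dispersion; that is an assumption for illustration only, since DESeq2 actually estimates a negative binomial dispersion and shrinks it toward a fitted trend. Groups A and B are noisy, while the seven hypothetical groups G1 to G7 are tight.

```r
# Toy values for ONE gene (invented numbers, not from the thread).
noisy <- c(-4, 0, 4)  # replicate deviations in groups A and B
tight <- c(-1, 0, 1)  # replicate deviations in the seven other groups
values <- c(100 + noisy, 120 + noisy, 110 + rep(tight, 7))
group  <- factor(rep(c("A", "B", paste0("G", 1:7)), each = 3))

# Pooled within-group variance: one variance estimate shared by all groups.
pooled_var <- function(x, g) {
  g <- droplevels(g)
  ss <- sum(tapply(x, g, function(v) sum((v - mean(v))^2)))
  ss / (length(x) - nlevels(g))
}

# Pairwise analysis: only A and B contribute to the variance estimate.
keep   <- group %in% c("A", "B")
v_pair <- pooled_var(values[keep], group[keep])

# Full analysis: all nine groups contribute to the same single estimate.
v_full <- pooled_var(values, group)

# Standard error of the A-vs-B difference with three replicates per group.
se_diff <- function(v) sqrt(v * (1 / 3 + 1 / 3))

print(c(pairwise = v_pair, full = v_full))                    # 16 vs. about 4.33
print(c(pairwise = se_diff(v_pair), full = se_diff(v_full)))  # full is smaller
```

The difference between A and B is the same in both cases; only the shared variability estimate changes. With the tight groups included the estimate drops, so the same difference looks more significant, which matches the smaller p-values seen with the full design. If A and B were the tight groups instead, the effect would go the other way.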
04-17-2015, 01:40 AM  #5
Senior Member
Location: Germany Join Date: May 2010
Posts: 150

Thanks for the information and the news about the new version. I will look at the PCA.

Tags 
deseq2, multifactor design, pairwise 

