#1
Senior Member
Location: Germany Join Date: May 2010
Posts: 150
Hi everyone,
I have a data set of nine different groups, each with three samples. These groups are being compared against each other to find differentially regulated genes. All in all I have 14 different comparisons. I have tested one design matrix for all samples vs. a pair-wise approach, where only the two compared groups were uploaded. I was wondering which way makes more sense, since I'm getting different results when comparing the two approaches.

This is my design matrix for all the samples:

Code:
                 conditionT
    HP4_1        HP4
    HP4_2        HP4
    HP4_3        HP4
    HP24_1       HP24
    HP24_2       HP24
    HP24_3       HP24
    CR4w_1       CR4w4
    CR4w_2       CR4w4
    CR4w_3       CR4w4
    CR4w24_1     CR4w24
    CR4w24_2     CR4w24
    CR4w24_3     CR4w24
    CTRL4_1      CTRL4
    CTRL4_2      CTRL4
    CTRL4_3      CTRL4
    CTRL24_1     CTRL24
    CTRL24_2     CTRL24
    CTRL24_3     CTRL24
    basalCR4w_1  basalCR4w
    basalCR4w_2  basalCR4w
    basalCR4w_3  basalCR4w
    basalCTRL_1  basalCTRL
    basalCTRL_2  basalCTRL
    basalCTRL_3  basalCTRL
    basalHP_1    basalHP
    basalHP_2    basalHP
    basalHP_3    basalHP

Code:
             conditionT
    HP4_1    HP4
    HP4_2    HP4
    HP4_3    HP4
    CTRL4_1  CTRL4
    CTRL4_2  CTRL4
    CTRL4_3  CTRL4

Code:
               conditionT
    HP24_1     HP24
    HP24_2     HP24
    HP24_3     HP24
    basalHP_1  basalHP
    basalHP_2  basalHP
    basalHP_3  basalHP

Here is a sample of one of the comparisons from the full matrix design:

Code:
    miRNA            log2FoldChange  padj
    mmu-miR-29a-3p   0.534368658     0.000259248
    mmu-miR-26a-5p   0.378956528     0.000310647
    mmu-miR-200a-3p  0.299780505     0.00060916
    mmu-miR-29c-3p   0.433273797     0.00060916
    mmu-miR-29b-3p   0.625200783     0.001034352
    mmu-miR-30d-5p   0.253729371     0.00715
    mmu-miR-30a-5p   0.289108972     0.00715
    mmu-miR-26b-5p   0.287258966     0.009435688
    mmu-miR-30a-3p   0.263099811     0.012596849
    mmu-miR-200c-3p  0.480164731     0.016093411
    mmu-miR-455-3p   0.734375756     0.016093411
    mmu-miR-101a-3p  0.231741597     0.019381496
    mmu-miR-101c     0.23216037      0.021264359
    mmu-miR-30e-3p   0.276381941     0.026896293
    mmu-miR-92b-3p   0.491022916     0.041665933
    mmu-miR-99a-5p   0.332214684     0.049609316
    mmu-miR-151-5p   0.259334395     0.08039887
    mmu-miR-181c-5p  -0.226533316    0.08039887
    mmu-miR-127-3p   -0.5404365      0.092739149
    mmu-miR-182-5p   0.460856503     0.095664474
    mmu-miR-30e-5p   0.212060214     0.095664474

Code:
    miRNA           log2FoldChange  padj
    mmu-miR-29a-3p  0.488112296     0.054110034
    mmu-miR-29b-3p  0.531545957     0.080779499
    mmu-miR-29c-3p  0.398383972     0.080779499
    mmu-miR-451a    -0.515259487    0.080779499
    mmu-miR-26a-5p  0.35141262      0.086831362

On the other hand, probably also due to differences in the size factors, I am getting log2FC values even for miRNAs that have no reads at all.

Code:
    miRNA            baseMeanA  baseMeanB  baseMean     log2FoldChange  lfcSE        stat          pvalue       padj  HP24_1  HP24_2  HP24_3  CTRL24_1  CTRL24_2  CTRL24_3
    mmu-miR-376b-5p  0          0          0.127918331  -0.000507796    0.046785551  -0.010853703  0.991340168  NA    0       0       0       0         0         0
    mmu-miR-1968-3p  0          0          0.127682606  -0.000515398    0.046192878  -0.011157523  NA           NA    0       0       0       0         0         0

The full design matrix gives better p-values, but possibly creates artefacts in the data set. The pair-wise design gives less significant results. So, which of the two designs will give me more realistic results?

Thanks,
Assa
#2
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.
If you search the limma and edgeR vignettes and the Bioconductor mailing list, you should be able to find similar cases.
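To make the "one model, then contrasts" idea concrete, here is a minimal sketch in plain Python. It is not limma, edgeR, or DESeq2 code: the group names are taken from the thread, but the expression values, the function names, and the choice of basalCTRL as reference level are assumptions made purely for illustration. It builds a 0/1 design matrix (intercept plus one indicator per non-reference group) and then reads the HP4 vs CTRL4 comparison off the fitted coefficients with a contrast vector.

```python
# Illustrative sketch only: plain Python, not limma/edgeR/DESeq2 code.
# Group names follow the thread; the expression values are made up.

groups = ["basalCTRL", "CTRL4", "HP4"]  # first level is treated as the reference
samples = [(f"{g}_{i}", g) for g in groups for i in (1, 2, 3)]

# Hypothetical log2 expression values for a single miRNA, one per sample.
values = {
    "basalCTRL_1": 5.0, "basalCTRL_2": 5.2, "basalCTRL_3": 4.8,
    "CTRL4_1": 5.9, "CTRL4_2": 6.1, "CTRL4_3": 6.0,
    "HP4_1": 6.4, "HP4_2": 6.6, "HP4_3": 6.5,
}

def design_matrix(samples, groups):
    """Intercept column plus one 0/1 indicator per non-reference group."""
    return [[1] + [1 if g == level else 0 for level in groups[1:]]
            for _, g in samples]

def fit_coefficients(samples, groups, values):
    """For a one-factor design, the least-squares fit reduces to group means:
    intercept = reference mean, other coefficients = group mean - reference mean."""
    means = {}
    for level in groups:
        obs = [values[name] for name, g in samples if g == level]
        means[level] = sum(obs) / len(obs)
    ref = means[groups[0]]
    return [ref] + [means[level] - ref for level in groups[1:]]

X = design_matrix(samples, groups)
beta = fit_coefficients(samples, groups, values)

# Contrast HP4 vs CTRL4: -1 on the CTRL4 coefficient, +1 on the HP4 coefficient.
contrast = [0, -1, 1]
hp4_vs_ctrl4 = sum(c * b for c, b in zip(contrast, beta))

for (name, _), row in zip(samples, X):
    print(name, row)
print("HP4 vs CTRL4:", round(hp4_vs_ctrl4, 3))
```

In this simplified linear setting the point estimate for HP4 vs CTRL4 is the same whether or not the basalCTRL samples are in the model; what the extra samples contribute is information about the variability, which is where the p-values can change.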
#3
Senior Member
Location: Germany Join Date: May 2010
Posts: 150
My question was more about finding out why the p-values differ so much between the two result tables. Is there a way to see exactly which design matrix is used for the actual DESeq2 analysis (I mean the one with the 0s and 1s)?
#4
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333
The difference in the results table is typically due to increased or decreased dispersion estimates, when including other samples.
Take a look at the PCA plot. If the two groups you are comparing, say A and B, have higher within-group variance than the other groups, then the dispersion estimates can be lowered by including the other groups (because we estimate a single dispersion value per gene). See the DESeq2 paper for details on the dispersion estimation.

See the vignette section "Access to all calculated values" for extracting parameters. The model matrix is attr(dds, "modelMatrix").

Regarding the LFCs which are near zero but not equal to zero for a contrast of two groups with zeros within a larger analysis: this is expected, but I have also fixed this behavior in the next release (v1.8, released in one week), so that these will be zeroed out (https://support.bioconductor.org/p/65213/#65254).
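The effect of sharing one variability estimate across all groups can be sketched with a rough analogy in plain Python. This is not how DESeq2 works internally: DESeq2 estimates a negative binomial dispersion per gene and shrinks it, whereas the sketch below uses an ordinary pooled sample variance, and the group labels and values are made up for illustration. It only shows the direction of the effect: if A and B are noisier than the other groups, pooling over all groups gives a lower estimate (and more degrees of freedom) than using A and B alone.

```python
from statistics import variance

# Hypothetical log2 values for one gene; groups A and B are noisier than C..E.
data = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [5.0, 5.1, 4.9],
    "D": [5.5, 5.6, 5.4],
    "E": [6.0, 6.1, 5.9],
}

def pooled_variance(groups):
    """Within-group variance pooled over groups, weighted by degrees of freedom.
    Returns (pooled variance, total residual degrees of freedom)."""
    num = sum((len(v) - 1) * variance(v) for v in groups.values())
    df = sum(len(v) - 1 for v in groups.values())
    return num / df, df

# Pair-wise analysis: only the two compared groups contribute.
pairwise_var, pairwise_df = pooled_variance({k: data[k] for k in ("A", "B")})

# Full analysis: one shared estimate from all groups.
full_var, full_df = pooled_variance(data)

print("pair-wise:", round(pairwise_var, 3), "on", pairwise_df, "df")
print("full:     ", round(full_var, 3), "on", full_df, "df")
```

The reverse also holds: if the two compared groups are tighter than the rest, including the other groups raises the shared estimate, so the full design is not guaranteed to give smaller p-values.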
#5
Senior Member
Location: Germany Join Date: May 2010
Posts: 150
Thanks for the information and the news about the new version. I will look at the PCA.
Tags: deseq2, multi-factor design, pair-wise