SEQanswers

04-07-2015, 08:21 AM   #1
frymor

multi-factor vs. pair-wise design

Hi everyone,

I have a data set of nine different groups, each with three samples. These groups are being compared against each other to find differentially regulated genes. All in all I have 14 different comparisons.
I have tested one design matrix for all samples vs. a pair-wise approach, where only the samples of the two compared groups were loaded.

I was wondering which way makes more sense, since I'm getting different results when comparing the two.
This is my design matrix for all the samples:
Code:
            conditionT
HP4_1              HP4
HP4_2              HP4
HP4_3              HP4
HP24_1            HP24
HP24_2            HP24
HP24_3            HP24
CR4w_1           CR4w4
CR4w_2           CR4w4
CR4w_3           CR4w4
CR4w24_1        CR4w24
CR4w24_2        CR4w24
CR4w24_3        CR4w24
CTRL4_1          CTRL4
CTRL4_2          CTRL4
CTRL4_3          CTRL4
CTRL24_1        CTRL24
CTRL24_2        CTRL24
CTRL24_3        CTRL24
basalCR4w_1  basalCR4w
basalCR4w_2  basalCR4w
basalCR4w_3  basalCR4w
basalCTRL_1  basalCTRL
basalCTRL_2  basalCTRL
basalCTRL_3  basalCTRL
basalHP_1      basalHP
basalHP_2      basalHP
basalHP_3      basalHP
and accordingly the pair-wise design:
Code:
            conditionT
HP4_1              HP4
HP4_2              HP4
HP4_3              HP4
CTRL4_1          CTRL4
CTRL4_2          CTRL4
CTRL4_3          CTRL4
or
Code:
            conditionT
HP24_1            HP24
HP24_2            HP24
HP24_3            HP24
basalHP_1      basalHP
basalHP_2      basalHP
basalHP_3      basalHP
Comparing the results, I get lower adjusted p-values with the full design matrix than with the pair-wise approach. As I expected, I get similar (but not identical, probably due to the different size factors) log2 fold-changes.

Here is a sample of one of the comparisons from the full matrix design:
Code:
miRNA	log2FoldChange	padj
mmu-miR-29a-3p	0.534368658	0.000259248
mmu-miR-26a-5p	0.378956528	0.000310647
mmu-miR-200a-3p	0.299780505	0.00060916
mmu-miR-29c-3p	0.433273797	0.00060916
mmu-miR-29b-3p	0.625200783	0.001034352
mmu-miR-30d-5p	0.253729371	0.00715
mmu-miR-30a-5p	0.289108972	0.00715
mmu-miR-26b-5p	0.287258966	0.009435688
mmu-miR-30a-3p	0.263099811	0.012596849
mmu-miR-200c-3p	0.480164731	0.016093411
mmu-miR-455-3p	0.734375756	0.016093411
mmu-miR-101a-3p	0.231741597	0.019381496
mmu-miR-101c	0.23216037	0.021264359
mmu-miR-30e-3p	0.276381941	0.026896293
mmu-miR-92b-3p	0.491022916	0.041665933
mmu-miR-99a-5p	0.332214684	0.049609316
mmu-miR-151-5p	0.259334395	0.08039887
mmu-miR-181c-5p	-0.226533316	0.08039887
mmu-miR-127-3p	-0.5404365	0.092739149
mmu-miR-182-5p	0.460856503	0.095664474
mmu-miR-30e-5p	0.212060214	0.095664474
and the same samples from the pair-wise design:
Code:
miRNA	log2FoldChange	padj
mmu-miR-29a-3p	0.488112296	0.054110034
mmu-miR-29b-3p	0.531545957	0.080779499
mmu-miR-29c-3p	0.398383972	0.080779499
mmu-miR-451a	-0.515259487	0.080779499
mmu-miR-26a-5p	0.35141262	0.086831362
It is clear that there are far fewer DE miRNAs in the pair-wise comparison than in the full matrix design.
On the other hand, probably also due to the differences in the size factors, the full design gives me non-zero log2FC values even for miRNAs that have no reads at all in the compared samples.

Code:
miRNA	baseMeanA	baseMeanB	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj	HP24_1	HP24_2	HP24_3	CTRL24_1	CTRL24_2	CTRL24_3
mmu-miR-376b-5p	0	0	0.127918331	-0.000507796	0.046785551	-0.010853703	0.991340168	NA	0	0	0	0	0	0
mmu-miR-1968-3p	0	0	0.127682606	-0.000515398	0.046192878	-0.011157523	NA	NA	0	0	0	0	0	0
I can see that these rows are negligible, since the adjusted p-values are NA, but it still makes me wonder which of the two designs is better to continue with.

The full design matrix gives lower p-values, but possibly creates artefacts in the data set. The pair-wise design gives fewer significant results.

So, which of the two designs will give me more realistic results?

Thanks,
Assa

04-07-2015, 09:06 AM   #2
dariober

In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.
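As a rough illustration of "fit once, then contrast", here is a minimal Python sketch. It is not DESeq2; the values are made-up log2-scale numbers for a single miRNA, and the helper `contrast` is a hypothetical name. It assumes a design with one indicator column per group and no intercept, in which case the least-squares coefficient of each group is simply its mean.

```python
from statistics import mean

# Hypothetical log2-scale values for ONE miRNA (not data from this thread).
samples = {
    "HP4":   [6.0, 6.2, 6.4],
    "CTRL4": [5.6, 5.7, 5.8],
    "HP24":  [7.0, 7.1, 7.2],
}

# With one 0/1 column per group and no intercept, the least-squares
# coefficient for each group equals that group's mean.
coef = {group: mean(values) for group, values in samples.items()}

def contrast(weights):
    """Linear combination of the fitted coefficients (hypothetical helper)."""
    return sum(w * coef[group] for group, w in weights.items())

# One fit, several comparisons: each contrast is just a weight vector.
lfc_hp4_vs_ctrl4 = contrast({"HP4": 1, "CTRL4": -1})
lfc_hp24_vs_hp4 = contrast({"HP24": 1, "HP4": -1})
print(lfc_hp4_vs_ctrl4, lfc_hp24_vs_hp4)
```

The point of the sketch is only that all comparisons reuse the same fitted coefficients; the benefit of the full model comes from the shared variance estimate, which this toy example does not show.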

If you search the limma vignettes and/or edgeR vignettes and/or bioconductor mailing list you should be able to find similar cases.

04-07-2015, 09:59 PM   #3
frymor

Quote:
Originally Posted by dariober
In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.
Thanks for the reply. I know about that, which is why I did it that way in the first place.
My question was more about finding out why the p-values differ so much between the two tables of results.

Is there a way to see exactly what design matrix is used for the actual DESeq2 analysis (I mean the one with the 0s and 1s)?

04-10-2015, 06:31 AM   #4
Michael Love

The difference in the results table is typically due to increased or decreased dispersion estimates, when including other samples.

Take a look at the PCA plot. If the two groups you are comparing, say A and B, have higher within-group variance than the other groups, then what might be happening is that the dispersion estimates are lowered by including the other groups (because we estimate a single dispersion value per gene). See the DESeq2 paper for details on the dispersion estimation.
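The effect described above can be sketched with a toy calculation. This is a minimal Python illustration, not DESeq2's method: DESeq2 estimates a negative binomial dispersion with shrinkage, whereas the stand-in here is a plain pooled within-group variance. The values and the helper `pooled_variance` are hypothetical.

```python
from statistics import variance

# Hypothetical log-scale values for ONE gene (not data from this thread).
# Groups A and B are noisier than the remaining groups.
groups = {
    "A": [5.0, 6.5, 8.0],
    "B": [7.0, 8.5, 10.0],
    "C": [6.0, 6.1, 6.2],
    "D": [7.0, 7.1, 7.2],
    "E": [8.0, 8.1, 8.2],
}

def pooled_variance(names):
    """Within-group variance pooled over the named groups (hypothetical helper)."""
    sum_sq = sum(variance(groups[g]) * (len(groups[g]) - 1) for g in names)
    dof = sum(len(groups[g]) - 1 for g in names)
    return sum_sq / dof

# Pair-wise analysis: only the two noisy groups inform the variance.
pairwise_var = pooled_variance(["A", "B"])
# Full analysis: one variance per gene, estimated from every group.
full_var = pooled_variance(list(groups))

print(f"A and B only: {pairwise_var:.3f}")
print(f"all groups:   {full_var:.3f}")
```

Because the quieter groups pull the single per-gene estimate down, the same A vs. B difference is tested against a smaller variance in the full analysis, which is one way lower p-values can arise.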

See the vignette section "Access to all calculated values" for extracting parameters. The model matrix is attr(dds, "modelMatrix").
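To show what a matrix "with the 0s and 1s" looks like, here is a minimal Python sketch of a treatment-coded indicator matrix for the HP4 vs. CTRL4 samples. It is an illustration only: the function `design_matrix` and the column names are hypothetical, and the matrix DESeq2 actually stores may be coded differently.

```python
def design_matrix(conditions, reference):
    """Intercept column plus one 0/1 column per non-reference level.

    Hypothetical helper for illustration; not part of DESeq2.
    """
    other_levels = sorted(set(conditions) - {reference})
    header = ["Intercept"] + other_levels
    rows = [[1] + [int(cond == lvl) for lvl in other_levels] for cond in conditions]
    return header, rows

# Condition labels as in the pair-wise design from the first post.
conditions = ["HP4"] * 3 + ["CTRL4"] * 3
header, rows = design_matrix(conditions, reference="CTRL4")

print(header)
for cond, row in zip(conditions, rows):
    print(cond, row)
```

With CTRL4 as the reference level, the HP4 column is 1 for the HP4 samples and 0 for the controls, so its coefficient corresponds to the HP4 vs. CTRL4 difference.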

Regarding the LFCs which are near zero but not equal to zero for a contrast of two groups with zeros within a larger analysis, this is expected, but I have also fixed this behavior in the next release (v1.8 released in one week), so that these will be zeroed out ( https://support.bioconductor.org/p/65213/#65254 )

04-17-2015, 01:40 AM   #5
frymor

Thanks for the information and the news about the new version. I will look at the PCA.

Tags
deseq2, multi-factor design, pair-wise
