![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
EDGE-pro into DESeq, DESeq run error? | epistatic | Bioinformatics | 5 | 09-02-2015 05:32 AM |
DESeq2 | Simon Anders | Bioinformatics | 123 | 07-06-2015 02:45 AM |
Cuffdiff or DESeq | ETHANol | RNA Sequencing | 47 | 08-27-2013 12:19 AM |
DESeq error | Carmen | Bioinformatics | 2 | 01-17-2013 06:22 AM |
DESeq - get chromosome | glados | Bioinformatics | 4 | 09-13-2012 08:51 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Ireland Join Date: Nov 2011
Posts: 6
|
![]()
Hi
I get alot more DE genes when I analyse my RNA Seq dataset using the current version of DESeq versus DESeq2 -just using the default settings for both. Its a simple design of 10 disease V 10 control samples but I get an extra ~1500 genes identified as significantly DE at an FDR of 5% using DESeq2. I know DESeq2 has more powerful statistic etc but is this great of a difference likely or could I be doing something wrong...(more likely!) . I'd appreciated any advice! thanks! |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
This is actually interesting. Would you mind sending me your count table? (Blank out gene names and obscure sample names if your data is confidential.)
|
![]() |
![]() |
![]() |
#3 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
Yes that seems interesting to analyze. 1500 extra DE genes seems a lot.. I suppose you used the same count matrix as input ?
|
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: Ireland Join Date: Nov 2011
Posts: 6
|
![]()
Thanks Simon -I've sent you the count table via email if that's OK.
Yep its the same count matrix. |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Nebraska Join Date: Sep 2011
Posts: 9
|
![]()
I am actually seeing something quite similar.
Using the same counts table, DESeq2 is giving vastly more differentially expressed genes than DESeq (at FDR 0.05). (3 biological rep vs 3 biological rep comparisons) Set1: 5469 DEG with DESeq, 13307 DEG with DESeq2 Set2: 15123 DEG with DESeq, 22127 DEG with DESeq2 Set3: 3818 DEG with DESeq, 10221 DEG with DESeq2 That just seems a bit odd to me. (Using R 3.0.1) |
![]() |
![]() |
![]() |
#6 |
Member
Location: Aperture Science Join Date: Mar 2012
Posts: 59
|
![]()
Same thing for me, getting A LOT of more significantly different genes (tried with 2 different projects). Though I assumed that was intended.
|
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Well, yes, it is intended. DESeq had a rather ugly hack to ensure type-I error control, which cost a lot of power, and fixing this was the core aim of the development of DESeq2.
It's only that for the datasets we used to test it on the improvement was not as dramatic - hence my surprise. However, we used designed experiments with cell cultures to test, and we expect that the improvement will be more pronounced for experiments involving patient data with different genotype, which show stronger within-group variability. So, I guess this is all as it should be, and our improvement was just really needed. |
![]() |
![]() |
![]() |
#8 |
Member
Location: Ireland Join Date: May 2012
Posts: 11
|
![]()
hi simon
as a converse to the above i used deseq to analyse data from human t cells from controls (n=6) vs disease (n=5). the analysis that showed differential expression was based on a dispersion estimate using the method "per-condition" and sharing mode "fit-only" now having analysed the same samples using the standard deseq2 analysis, there is no differential expression between the 2 groups (at a fdr<0.1). is it reasonable to conclude that there is in fact no difference between groups as indicated by the deseq2 analysis and that the only reason deseq gave a significant result was becuase the tweaks above rendered the analysis less robust? |
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Why did you use the "fit-only" mode? Ysing DESeq with "fit-only" is not robust, which is why we advise against using it except in special cases. The more robust analysis with DESeq2 is most likely closer to the truth.
|
![]() |
![]() |
![]() |
#10 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
Hi there, in my case, I got some genes that were significant (padj of 0.000000000495 in DESeq and also 2.06326E-15 in edgeR) but got NA in DESeq2. I am not sure what is the particular behaviour of NA or when will DESeq2 return NA. I used the standard pipeline of DESeq2 as written in the manual and the counts for this gene are as follow:
Case: 7 30 60 24 35 Control: 127 188 101 236 116 And I did get some output from DESeq2, not all fields are NA: baseMean log2FoldChange lfcSE pvalue padj 89.18039938 1.87590805 0.299465406 NA NA |
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333
|
![]()
hi choishingwan,
Could you provide the version number for DESeq2? Also could you provide the output of mcols( dds )[geneNumber, ] It could either be due to a count outlier (although it doesn't appear as if any of these are outliers) or due to independent filtering if you are using the latest development branch. I plan to add a metadata column which would explain why a p-value is set to NA. |
![]() |
![]() |
![]() |
#12 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
Hi Michael,
Thank you for your reply, my sessionInfo are as follow: Code:
R version 3.0.1 (2013-05-16) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.0.18 RcppArmadillo_0.3.900.0 Rcpp_0.10.4 lattice_0.20-15 Biobase_2.20.1 GenomicRanges_1.12.4 IRanges_1.18.2 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 genefilter_1.42.0 grid_3.0.1 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.1 stats4_3.0.1 survival_2.37-4 [12] XML_3.98-1.1 xtable_1.7-1 When using mcols(dds)[geneNumber,], I've got the following: Code:
DataFrame with 1 row and 26 columns baseMean baseVar allZero dispGeneEst dispGeneEstConv dispFit dispersion dispIter dispConv dispOutlier dispMAP Intercept condition_Control_vs_Case SE_Intercept SE_condition_Control_vs_Case WaldStatistic_Intercept <numeric> <numeric> <logical> <numeric> <logical> <numeric> <numeric> <numeric> <logical> <logical> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> 1 89.1804 4592.645 FALSE 0.1996928 TRUE 0.02251914 0.1365872 9 TRUE FALSE 0.1365872 5.166256 1.875908 0.2398174 0.2994654 21.54245 WaldStatistic_condition_Control_vs_Case WaldPvalue_Intercept WaldPvalue_condition_Control_vs_Case WaldAdjPvalue_Intercept WaldAdjPvalue_condition_Control_vs_Case betaConv betaIter deviance maxCooks cooksOutlier <numeric> <numeric> <numeric> <numeric> <numeric> <logical> <numeric> <numeric> <numeric> <logical> 1 6.26419 NA NA NA NA TRUE 9 95.78522 2.211495 TRUE It seems messy when I paste the output here, is there anyway for me to make it more organized? Last edited by choishingwan; 07-25-2013 at 12:48 AM. Reason: Just make the output more organize |
![]() |
![]() |
![]() |
#13 | |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]() Quote:
Code:
Put "[CODE_]" (without the "_") before and "[/CODE_]" (again, without the "_") after the block to make it appear in monospace and with line breaks at the right places. |
|
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333
|
![]()
The Cook's cutoff used here is the .75 quantile of the F(p, m-p) distribution. So for 2 parameters (intercept and condition) and 10 samples, we use a cutoff of:
> qf(.75, 2, 10 - 2) [1] 1.656854 The sample with count 60 is getting 2.211495, a high estimate of Cook's distance (meaning the log fold change would change a lot if this sample were removed), although from looking at the counts I agree that this is not desirable behavior to give this gene a p-value of NA. I will investigate how we might better set this cutoff. You can set a higher cutoff manually or turn off the filtering by Cook's with: dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) # or use a higher cutoff: p <- 2 m <- 10 dds <- nbinomWaldTest(dds, cooksCutoff=qf(.95,p,m-p)) # turn off filtering: dds <- nbinomWaldTest(dds, cooksCutoff=FALSE) Last edited by Michael Love; 07-25-2013 at 03:22 AM. |
![]() |
![]() |
![]() |
#15 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
I see, maybe I should try turning off the Cook's filtering and see if these NA still persist.
Thank you |
![]() |
![]() |
![]() |
#16 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
By turning off the cook's cutoff I was able to obtain a p-value for all the genes that have a p-value from DESeq and EdgeR but not from DESeq2. However, I also notice something interesting: Some of the genes have a different p-value before and after turning off the Cook's Cutoff (The change was small though)
using mcols shows that all the values are the same. I guess the difference is not that big so it should be fine. Code:
On Off 2.0392E-06 1.91466E-06 6.24279E-06 5.89022E-06 3.3727E-06 3.17301E-06 8.30502E-06 7.83552E-06 2.62679E-06 2.46954E-06 3.31702E-06 3.11984E-06 5.42979E-06 5.11889E-06 2.6574E-06 2.49848E-06 3.11848E-06 2.93252E-06 1.50445E-05 1.42226E-05 5.14826E-06 4.85033E-06 2.46503E-06 2.31563E-06 3.11574E-06 2.92985E-06 3.64453E-06 3.43143E-06 |
![]() |
![]() |
![]() |
#17 |
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333
|
![]()
I would guess that these are adjusted p-values?
By excluding the genes with an outlier count (which is independent from the test statistic under the null hypothesis), we get a small boost in the number of genes with adjusted p-value < alpha. |
![]() |
![]() |
![]() |
#18 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
Yes, that's true, those are the padj values
|
![]() |
![]() |
![]() |
#19 |
Junior Member
Location: Melbourne, Australia Join Date: Jul 2014
Posts: 1
|
![]()
We have also found that DESeq2 is giving many more differentially expressed genes than DESeq (at adjusted p-value 0.05).
[on our 2 sets of 3 biological replicates at 2 different timepoints] Given the explanation by Simon Anders, if I understand correctly, this increase in DEGs in DESeq2 is reflective of too tight a constriction on false positives in the original DESeq so that actual DEGs were being missed. However I am curious as to why some of the DEGs we obtained with DESeq are no longer classified as DEGs in the results from DESeq2. For example if you compare the new DEGs [DESeq2] with the original DEGs [DESeq] (we used a cut-off of log2FC of <-1 to >+1), there is always quite a large proportion of genes that are different: eg. in our T80 dataset 55 of 103 under-expressed DEGs are different (58 are the same) However in our T80 dataset NONE of 29 over-expressed DEGs is the same Has anyone else had a similar experience with their data? Thanks! Tanya |
![]() |
![]() |
![]() |
#20 |
Member
Location: Hong Kong Join Date: Feb 2012
Posts: 21
|
![]()
Hi Tanya,
From what I understand, the log2FC provided by the DESeq2 might be a bit different from that produced from the DESeq programme because it tried to perform a shrinkage according to the gene count. If you look at the MA plot on page 8 of the tutorial, it states that: "... shows the shrinkage of log2 fold changes resulting from the incorporation of zero-centered normal prior. The shrinkage is greater for the log2 fold change estimates from genes with low counts and high dispersion, as can be seen by the narrowing of spread of leftmost points in the right plot." Which suggest that the log2 fold change is changed according to the gene counts. Maybe you can check if all those genes that were significant in DESeq but not in DESeq2 have a low count? If so, than that might explain the difference |
![]() |
![]() |
![]() |
Thread Tools | |
|
|