
08-22-2011, 07:51 AM  #1
Member
Location: Dundee, Scotland Join Date: Apr 2008
Posts: 52

DESeq results give extremely small p-values?
Hi,
I've got some sequencing data which, following DE analysis with DESeq, gives p-values of < 1x10^-80 for many genes. Is this typical? How does DESeq generate such small p-values when I don't think there's enough information to do so? Does this mean that using a p-value cutoff of 0.05 or 0.01 is too lenient? Or am I missing something? The data is four replicates of one condition versus six replicates of another, sequenced with direct RNA seq. Any clarification much appreciated.
08-22-2011, 10:19 AM  #2
Senior Member
Location: Germany Join Date: Feb 2011
Posts: 108

Roughly put, a p-value from a statistical test is the probability of seeing data (or an observation) at least as extreme as yours by chance, assuming the null hypothesis is true. A p-value of 1e-80 (approx. 0) means that this data is very unlikely to have occurred at random. Usually, this means something significant and you reject your null hypothesis.
It is normal, depending on the observation, to have p-values that are very small or 0. Usually, I test my hypothesis at alpha=0.05. Of course you could use both cutoffs (0.05 and 0.01). If you do multiple testing, then a correction for multiple testing must also be applied to adjust the p-values to account for false positives. If you used DESeq for differential gene expression, then it just means that you have a lot of genes which are very highly significant / differentially expressed.
Last edited by cedance; 08-22-2011 at 10:23 AM.
08-23-2011, 12:09 AM  #3
Member
Location: Dundee, Scotland Join Date: Apr 2008
Posts: 52

Thanks for the reply.
I do understand what p-values mean, but the issue here (for me) is that the p-values are so small. I've never seen such small p-values in a differential expression analysis and am querying whether these values are believable. The example in the manual shows p-values as small as 1x10^-17. I've compared the results from DESeq with limma, and I know limma is not ideal for read count data, but although the log fold-changes are very comparable, the p-values are completely different.
08-23-2011, 12:38 AM  #4
Senior Member
Location: Germany Join Date: Feb 2011
Posts: 108

If you could give an example of your data for which you got such a p-value, maybe I or others could comment further on it. But, assuming your statistic and observations are right, there is no reason to believe the p-value is wrong, however small it may be. It just tells you that your data corresponding to that particular observation(s) are *very* significant.
Did you go back and check the observations for which you obtained high significance to see if that is indeed the case? I mean, could you just look at them and tell that they could be differentially expressed?
08-23-2011, 06:54 AM  #5
Member
Location: Dundee, Scotland Join Date: Apr 2008
Posts: 52

Here's an example of a gene in my data (normalised counts):
cond1 (6 reps): 32.94 52.71 53.99 33.60 38.03 49.97
cond2 (4 reps): 53.09 42.20 1.02 0.64
Limma gives an adjusted p-value of 0.326 whereas DESeq gives 5.14e-06. For me those counts do not show a believable difference between the two conditions, so limma is correct. However, in the context of DESeq values of < 1e-80, a p-value of 5e-6 is not significant either. So does that mean that for DESeq I need to arbitrarily reduce my p-value cutoff to p < 5e-6? That doesn't seem right. BTW, I believe the counts, as the gene is a sex-determining gene and I have 8 females and 2 males in my samples; however, I'm not looking at sex in this data.
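The spread within each condition for the counts quoted in the post above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable and function names are made up for illustration, and it does not reproduce what DESeq or limma compute internally.

```python
import math
import statistics

# Normalised counts quoted in the post above.
cond1 = [32.94, 52.71, 53.99, 33.60, 38.03, 49.97]  # 6 replicates
cond2 = [53.09, 42.20, 1.02, 0.64]                  # 4 replicates


def summarise(counts):
    """Return mean, sample variance and coefficient of variation."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    return mean, var, math.sqrt(var) / mean


mean1, var1, cv1 = summarise(cond1)
mean2, var2, cv2 = summarise(cond2)

# Log2 ratio of the condition means (cond2 relative to cond1).
log2_fold_change = math.log2(mean2 / mean1)

print(f"cond1: mean={mean1:.2f} var={var1:.2f} cv={cv1:.2f}")
print(f"cond2: mean={mean2:.2f} var={var2:.2f} cv={cv2:.2f}")
print(f"log2 fold change (cond2/cond1): {log2_fold_change:.2f}")
```

The sample variance of cond2 comes out several times larger than that of cond1, because two of its four replicates are near zero while the other two look like cond1.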
08-23-2011, 07:27 AM  #6
Senior Member
Location: Germany Join Date: Feb 2011
Posts: 108

Okay, just one more question. I'm sure you're aware of it, but just to clarify... (else, I am out of ideas!)
Limma gave an adjusted p-value of 0.326 (fail to reject the null hypothesis), good. Does the 5.14e-06 from DESeq account for multiple testing? In other words, is it also an "adjusted" p-value? Or is it the p-value obtained directly for this particular table? If it is the latter, you'll have to use a package such as "multtest" to operate on your individual p-values and obtain adjusted ones.
08-23-2011, 07:30 AM  #7
Member
Location: Dundee, Scotland Join Date: Apr 2008
Posts: 52

Yup, they're both adjusted for multiple hypothesis testing using the BH method.
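For readers unfamiliar with the BH (Benjamini-Hochberg) adjustment mentioned here, a minimal pure-Python sketch follows. The function name and the raw p-values are made up for illustration; in practice the adjustment would be done by the analysis package itself.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]  # made-up raw p-values
padj = benjamini_hochberg(raw)
print(padj)
```

Each raw p-value is scaled by the number of tests divided by its rank, so the adjusted value depends on how many genes were tested as well as on the gene itself.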

08-23-2011, 07:34 AM  #8
Senior Member
Location: Germany Join Date: Feb 2011
Posts: 108

Okay, in that case, I would be in doubt as to what to infer as well. To be safe, how about going with those where both of them give p<0.05?

08-23-2011, 07:45 AM  #9
Junior Member
Location: USA Join Date: Mar 2011
Posts: 2

Your values in cond2 have a very high variance. You need to filter these out of DESeq's D.E. calls by thresholding on variance. DESeq estimates it for you in the resVarA and resVarB columns. Try filtering out all calls with resVarA or resVarB above the 99th percentile.
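The percentile filter suggested above could look roughly like this. It is a sketch only: it assumes the DESeq results table has been exported into a list of dicts, the column names resVarA and resVarB are taken from the post, and the function name and toy rows are hypothetical.

```python
import statistics


def filter_high_variance(rows, percentile=99):
    """Drop rows whose resVarA or resVarB exceeds the given percentile."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cut_a = statistics.quantiles([r["resVarA"] for r in rows], n=100)[percentile - 1]
    cut_b = statistics.quantiles([r["resVarB"] for r in rows], n=100)[percentile - 1]
    return [r for r in rows if r["resVarA"] <= cut_a and r["resVarB"] <= cut_b]


# Toy table: resVarA grows with the gene index, resVarB is constant.
rows = [
    {"id": f"gene{i}", "resVarA": float(i), "resVarB": 1.0}
    for i in range(1, 201)
]
kept = filter_high_variance(rows)
print(len(rows), "->", len(kept))
```

The cutoff is computed separately for each condition, so a gene is dropped if it is unusually variable in either one.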

08-23-2011, 11:40 AM  #10
Senior Member
Location: Santa Fe, NM Join Date: Oct 2010
Posts: 250

It depends on how you think about the number of observations. Are 300 reads aligning to a single locus 300 observations (technical replicates), or one observation of value 300 with one technical replicate? The former will give you much more power to discern a difference in expression than the latter, though the latter may have just as much biological relevance. IMO it is 300 semi-independent observations, though if you dump the data into your JMP Genomics (SAS) workbench it will assume the latter, because it was built for analyzing microarrays, where one real value (or at least some small number of spots) was observed per chip.
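One way to make the "300 observations" view concrete is to assume that reads at a locus follow Poisson sampling. That is an assumption for illustration only, and it covers technical counting noise, not biological variability between samples. Under it, a count of N has standard deviation sqrt(N), so its relative uncertainty shrinks as 1/sqrt(N). The function name is made up.

```python
import math


def poisson_relative_error(count):
    """Relative standard deviation of a Poisson count: sqrt(N) / N."""
    return 1.0 / math.sqrt(count)


# Relative counting noise for a low, medium and high read count.
for n in (3, 30, 300):
    print(n, round(poisson_relative_error(n), 3))
```

A count-based test that treats the reads this way will therefore report much smaller p-values for highly covered genes than a method that treats each library as a single measurement.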

08-24-2011, 06:08 AM  #11
Member
Location: Dundee, Scotland Join Date: Apr 2008
Posts: 52

Quote:


08-29-2011, 06:33 AM  #12
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994

Please also consider trying the new development (i.e., pre-release) version of DESeq, and see this thread for an explanation of what is going on.


