  • DESeq results give extremely small p-values?

    Hi,

    I've got some sequencing data which, following DE analysis with DESeq, gives p-values of < 1x10^-80 for many genes. Is this typical? How does DESeq generate such small p-values when I don't think there's enough information to do so? Does this mean that a p-value cut-off of 0.05 or 0.01 is too lenient? Or am I missing something?

    The data is four replicates of one condition versus six replicates of another, sequenced with direct RNA-seq.

    Any clarification much appreciated.

  • #2
    Roughly put, a p-value from a statistical test is the probability of observing data at least as extreme as yours by chance, assuming the null hypothesis is true. A p-value of 1e-80 (approx. 0) means that such data would be very unlikely to occur at random. Usually, this indicates a significant result, and you reject your null hypothesis.

    Depending on the data, it is normal to get p-values that are very small or even reported as 0. I usually test my hypothesis at alpha=0.05; of course, you could try both cut-offs. If you test many genes at once, you must also apply a multiple-testing correction, which adjusts the p-values to account for false positives.

    If you used DESeq for differential gene expression, then it just means that many of your genes are very highly significant, i.e. differentially expressed.
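As a rough illustration of the multiple-testing correction mentioned above, here is a minimal Python sketch of one common choice, the Benjamini-Hochberg (BH) adjustment. The function name and the example p-values are made up for illustration; this is not DESeq's or limma's own code.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by (number of tests / rank), so the adjustment gets harsher as more genes are tested.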
    Last edited by cedance; 08-22-2011, 10:23 AM.



    • #3
      Thanks for the reply.

      I do understand what p-values mean, but the issue here (for me) is that the p-values are so small. I've never seen such small p-values in a differential expression analysis and am questioning whether these values are believable. The example in the manual shows p-values only as small as 1x10^-17.

      I've compared the results from DESeq with those from limma. I know limma is not ideal for read count data, but although the log fold-changes are very comparable, the p-values are completely different.



      • #4
        If you could give an example of your data for which you got such a p-value, maybe I or others could comment further on it. But, assuming your statistic and observations are right, there is no reason to believe that the p-values are wrong, however small they may be. It just tells you that the data for those particular observations are *very* significant.

        Did you go back and check the observations for which you obtained high significance, to see whether they really look different? I mean, could you just look at them and tell that they could be differentially expressed?



        • #5
          Here's an example of a gene in my data (normalised counts):

          cond1 (6 reps): 32.94 52.71 53.99 33.60 38.03 49.97
          cond2 (4 reps): 53.09 42.20 1.02 0.64

          Limma gives an adjusted p-value of 0.326 whereas DESeq gives 5.14e-06.

          For me those counts do not show a believable difference between the two conditions, so limma is correct. However, in the context of DESeq values of <1e-80, a p-value of 5e-6 is not significant either. So does that mean that for DESeq I need to arbitrarily reduce my p-value cut-off to p < 5e-6? That doesn't seem right.

          BTW, I believe the counts: the gene is a sex-determining gene and I have 8 females and 2 males in my samples. However, I'm not looking at sex in this data.
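To make the spread in these counts concrete, here is a small Python sketch that summarises the two conditions using only the numbers quoted in the post. The variable and function names are illustrative, and the variance-to-mean ratio is just a crude dispersion check, not the dispersion estimate DESeq computes.

```python
import math
import statistics

# Normalised counts copied from the post above.
cond1 = [32.94, 52.71, 53.99, 33.60, 38.03, 49.97]
cond2 = [53.09, 42.20, 1.02, 0.64]


def summarise(counts):
    """Return (mean, sample variance, variance-to-mean ratio)."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance, n - 1 denominator
    return mean, var, var / mean


mean1, var1, ratio1 = summarise(cond1)
mean2, var2, ratio2 = summarise(cond2)
# Negative values mean lower average expression in cond2.
log2_fold_change = math.log2(mean2 / mean1)

print(f"cond1: mean={mean1:.2f} var={var1:.2f} var/mean={ratio1:.2f}")
print(f"cond2: mean={mean2:.2f} var={var2:.2f} var/mean={ratio2:.2f}")
print(f"log2 fold change (cond2 vs cond1): {log2_fold_change:.2f}")
```

A variance-to-mean ratio far above 1 indicates much more spread than pure counting (Poisson) noise would give, which is what the two low cond2 samples produce here.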



          • #6
            Okay, just one more question. I'm sure you're aware of it, but just to clarify... (else, I am out of ideas!)
            Limma gave an adjusted p-value of 0.326 (fail to reject the null hypothesis), good.
            Does the 5.14e-06 from DESeq account for multiple testing? In other words, is it also an "adjusted" p-value? Or is it the p-value obtained directly for this particular gene? If so, you'll have to use a package such as "multtest" on your individual p-values to obtain adjusted ones.



            • #7
              Yup, they're both adjusted for multiple hypothesis testing using the BH method.



              • #8
                Okay, in that case, I would be in doubt as to what to infer as well. To be safe, how about going with the genes for which both methods give p<0.05?



                • #9
                  Your values in cond2 have a very high variance. You need to filter these out of DESeq's D.E. calls by thresholding on variance. DESeq estimates it for you in the resVarA and resVarB columns. Try filtering out all calls with resVarA or resVarB above the 99th percentile.
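The filtering step suggested above could look roughly like the Python sketch below. It assumes the DESeq results have been exported to a list of dictionaries with `resVarA` and `resVarB` keys (the column names given in the post); the table itself, the `id` key, the function names, and the nearest-rank percentile definition are all assumptions made for illustration.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the data lie at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def filter_high_variance(rows, pct=99):
    """Drop rows whose resVarA or resVarB exceeds that column's pct-th percentile."""
    cut_a = nearest_rank_percentile([r["resVarA"] for r in rows], pct)
    cut_b = nearest_rank_percentile([r["resVarB"] for r in rows], pct)
    return [r for r in rows if r["resVarA"] <= cut_a and r["resVarB"] <= cut_b]


# Hypothetical results table: 100 genes with made-up variance estimates.
rows = [{"id": f"gene{i}", "resVarA": float(i), "resVarB": 1.0}
        for i in range(1, 101)]
rows[49]["resVarB"] = 500.0  # give gene50 an extreme variance in condition B

kept = filter_high_variance(rows)
print(len(rows), "genes before filtering,", len(kept), "after")
```

The 99th percentile is an arbitrary cut-off, so it is worth checking how the list of calls changes at a few different thresholds.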



                  • #10
                    It depends on how you think about the number of observations. Are 300 reads aligning to a single locus 300 observations (technical replicates), or one observation of value 300 with one technical replicate? The former will give you much more power to discern a difference in expression than the latter, though the latter may have just as much biological relevance. IMO it is 300 semi-independent observations, though if you dump the data into your JMP Genomics (SAS) workbench it will assume the latter, because it was built for analyzing microarrays, where one real value (or at least some small number of spots) was observed per chip.
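A toy Python calculation can show why treating every read as an observation yields such extreme p-values. It is a deliberate simplification, not what DESeq does (DESeq fits a negative binomial model): it assumes a gene's reads come from two equally sequenced samples and tests whether the split deviates from 50:50 with an exact binomial test. The function name and the read counts are made up.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k of n reads in sample A, null split 50:50."""
    k = min(k, n - k)  # the null is symmetric, so fold onto the lower tail
    # Exact integer count of outcomes at least as extreme in the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Same 2-fold difference between the samples, at two sequencing depths.
p_shallow = two_sided_binomial_p(10, 30)    # 10 vs 20 reads
p_deep = two_sided_binomial_p(1000, 3000)   # 1000 vs 2000 reads

print(f"10 vs 20 reads:     p = {p_shallow:.3g}")
print(f"1000 vs 2000 reads: p = {p_deep:.3g}")
```

The fold change is identical in both cases; only the number of reads counted as observations differs, and that alone moves the p-value by many orders of magnitude.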



                    • #11
                      Originally posted by paulr View Post
                      Your values in cond2 have a very high variance. You need to filter these out of DESeq's D.E. calls by thresholding on variance. DESeq estimates it for you in the resVarA and resVarB columns. Try filtering out all calls with resVarA or resVarB above the 99th percentile.
                      Yup, I see that's the issue, but I thought DESeq took large variance into account when calculating the p-values. I'll take a look at the percentiles and see what difference it makes.



                      • #12
                        Please also consider trying the new development (i.e., pre-release) version of DESeq, and see this thread for an explanation of what is going on.
