  • edgeR: How important is the FDR value?

    Hi,

    I was wondering if anyone can answer my question about FDR.

    How exactly should we determine the cutoff for the FDR value? Is 0.1 acceptable, or 0.2? The number of significantly expressed genes changes dramatically with even the slightest change in the FDR cutoff. For a publication, how much FDR is a good FDR?

    If I select P.value < 0.05 and ignore FDR, I get around 200 differentially expressed genes. But if I use FDR < 0.085 along with P.value < 0.05, the number drops to 65. Can we publish without FDR?

    Thanks in advance.

  • #2
    So, you mean "if ((FDR < 0.085) OR (p.value < 0.05))"?
    Sounds like you're fishing for the cutoff that includes the really cool thing you want to show.

    You'd probably want FDR < 0.05.


    • #3
      I feel that this is an appropriate contribution:


      • #4
        Originally posted by Richard Finney View Post
        So, you mean "if ((FDR < 0.085) OR (p.value < 0.05))"?
        Sounds like you're fishing for the cutoff that includes the really cool thing you want to show.

        You'd probably want FDR < 0.05.
        No, I meant "if ((FDR < 0.085) AND (p.value < 0.05))". There is no "fishing" or "fishy" business here. I am genuinely curious: how would anyone set the cutoff? There is no consensus in the published literature either.


        @swbarnes2 - thanks for the link.


        • #5
          You'll want to Bonferroni-adjust your p-values or use FDR.
          Stick with < 0.05 for FDR.


          • #6
            The good thing about the false discovery rate (FDR) is that it has a clear, easily understandable, meaning. If you cut at an FDR value of 0.1 (10%), your list of significant hits has (in expectation) at most 10% false positives. So, if you get 60 genes with FDR-adjusted p value below 10%, this list will contain around 6 false ones.
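The arithmetic in the paragraph above can be written out as a minimal sketch. The function name `expected_false_discoveries` is made up for illustration; it only multiplies the size of the hit list by the chosen FDR level, which gives the expected upper bound on false positives.

```python
def expected_false_discoveries(n_hits: int, fdr: float) -> float:
    """Expected upper bound on false positives in a hit list cut at `fdr`.

    Hypothetical helper for illustration only: it is just n_hits * fdr.
    """
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must lie in [0, 1]")
    return n_hits * fdr


# 60 genes called at an FDR of 10%: about 6 expected false ones.
print(expected_false_discoveries(60, 0.10))
```

The same list cut at an FDR of 5% would be shorter, but only about one in twenty of its entries would be expected to be false.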

            The reason that there is no consensus on which FDR level to choose is that it is not too much to ask to make an informed case-by-case decision what FDR might be acceptable for a given experiment, depending on the kind of conclusions one wishes to draw.

            And just as a reminder: Don't even think about thresholding the raw p values in genomic experiments. This is nearly always nonsense, and I wish editors would make it a rule to simply reject papers doing that immediately instead of waiting for the referees to spot it.


            • #7
              Originally posted by Simon Anders View Post
              And just as a reminder: Don't even think about thresholding the raw p values in genomic experiments. This is nearly always nonsense, and I wish editors would make it a rule to simply reject papers doing that immediately instead of waiting for the referees to spot it.
              First off, excuse my ignorance of statistics... but I'm trying to get better. So here's the stupid question.

              Why is it bad to threshold raw p-values? I always threshold FDR just because it makes more sense from my simplistic viewpoint.

              Off topic, but I agree: editors should have a list of stuff that just is not allowed. My personal favorites are quantitating western blots with no standard curve, and ChIPs where IgG is the only negative control.
              --------------
              Ethan


              • #8
                Originally posted by ETHANol View Post
                So why is it bad to threshold raw p-values?
                Didn't see the question until now, but to not leave it unanswered:

                Imagine your genome has 10,000 genes. You think that some of them are differentially expressed but, in reality, none of them is. You cut your p values at 0.05.

                Now remember the definition of a p value: If a test result is assigned the p value p, the probability of seeing a result this strong or stronger only due to noise (i.e., with there being no real effect) is p.

                Hence, even if no genes are differentially expressed, 5% of the genes will have a p value below 5%. For 10,000 genes, these are 500.
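A small simulation makes this concrete. This is only a sketch under the assumption that p values are uniformly distributed on [0, 1] when there is no real effect; the function name and the seed are arbitrary choices for illustration.

```python
import random


def count_null_hits(n_genes: int = 10_000, alpha: float = 0.05, seed: int = 0) -> int:
    """Count genes with p < alpha when NO gene is truly differentially expressed.

    Assumption: under the null hypothesis, p values are uniform on [0, 1].
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)


# Roughly 5% of 10,000 genes, i.e. around 500, pass the cut by chance alone.
print(count_null_hits())
```

The exact count varies with the seed, but it stays close to 500 because each of the 10,000 tests has a 5% chance of passing the cut.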

                Now, let's assume there are truly differentially expressed genes in your study. Let's say you find 1,000 of your 10,000 genes to have a raw p value below 5%. From the argument above, you should still expect this list of 1,000 genes to contain 500 false positives, i.e., your false discovery rate is 500/1000 = 50%. This is clearly unacceptably large.

                The Benjamini-Hochberg adjustment, which formalizes this argument, will hence adjust a raw p value of 0.05 to an adjusted p value of 0.5. In practice, you use the logic the other way round: you decide on a false discovery rate that you deem acceptable, and look up which genes got an adjusted p value below this.
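As a rough sketch of what the adjustment does, here is a plain-Python version of the Benjamini-Hochberg step-up procedure. The function name `bh_adjust` and the synthetic p values are assumptions for illustration: the p values are constructed so that 1,000 of 10,000 genes fall at or below 0.05, mirroring the example above, and are not real data.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Each p value is scaled by n / rank, then a running minimum is taken
    from the largest p value downwards so adjusted values stay monotone.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Synthetic example: 1,000 genes with p in (0, 0.05], 9,000 with p in (0.05, 1].
n_low, n_high = 1_000, 9_000
p_values = [0.05 * k / n_low for k in range(1, n_low + 1)]
p_values += [0.05 + 0.95 * j / n_high for j in range(1, n_high + 1)]

adjusted = bh_adjust(p_values)
# The gene with raw p = 0.05 sits at rank 1,000 of 10,000: 0.05 * 10000 / 1000.
print(round(adjusted[n_low - 1], 3))
```

In R, `p.adjust(p, method = "BH")` performs this adjustment; the sketch only spells out the steps.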
                Last edited by Simon Anders; 02-09-2012, 01:21 PM.


                • #9
                  Simon, thank you very much for the lesson. I'm trying to become more statistically literate at the moment. I didn't appreciate the difference between p-value and FDR.

                  BTW, the DESeq Bioconductor vignette is one of the few Bioconductor vignettes that make any sense whatsoever.
                  --------------
                  Ethan


                  • #10
                    That was indeed clarifying!

                    I agree on the DESeq2 manual, but my personal favourite is still the edgeR manual. There can never be too many examples or simple explanations, for my taste.


                    • #11
                      "make an informed case-by-case decision what FDR might be acceptable for a given experiment"

                      Could you please elaborate?

                      How about this case:

                      You run DESeq2, you pick out 10 genes you want to look at including p values.
                      Say 6 genes have p < 0.05.
                      You then use p.adjust in R.
                      What FDR do you choose and why?
                      Which n do you set?
                      Last edited by sindrle; 01-27-2014, 06:03 AM.


                      • #12
                        Originally posted by polsum View Post
                        Hi,

                        I was wondering if anyone can answer my question about FDR.

                        How exactly should we determine the cutoff for the FDR value? Is 0.1 acceptable, or 0.2? The number of significantly expressed genes changes dramatically with even the slightest change in the FDR cutoff. For a publication, how much FDR is a good FDR?

                        If I select P.value < 0.05 and ignore FDR, I get around 200 differentially expressed genes. But if I use FDR < 0.085 along with P.value < 0.05, the number drops to 65. Can we publish without FDR?

                        Thanks in advance.
                        I don't think FDR is very important for RNA-seq. For multiple hypothesis tests where each test has uniform variance and is sufficiently powered, FDR might be OK. For count data, however, FDR doesn't take into account the fact that many of the tests were negative due to insufficient coverage rather than because no difference was discernible, so FDR is confounded by the sampling methodology. For example, say you had sampled 1,000 genes, the null hypothesis was rejected for 50, and 80% of the other 950 had low coverage. In theory, you could sequence more from the same samples and some of that 80% could then become significant, which doesn't make sense from an FDR standpoint. I think this also applies to Bonferroni. On the flip side, you could get a fabulous FDR simply by not sequencing very much.


                        • #13
                          @rskr: Given that this is in the context of DESeq2 (I realize that the thread is titled with edgeR...), low-count genes are automatically dropped and power maximized (I have to admit that it's handy to not have to do this myself anymore). So, the low-coverage genes screwing the p-values critique doesn't apply.
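To illustrate why dropping low-count genes before the multiple-testing adjustment can increase power, here is a toy sketch. It is not DESeq2's actual procedure (which, as noted above, chooses the filter automatically); the function names, the fixed `min_mean` cutoff, and the numbers are all made up for illustration. The filter uses only the mean count, not the p value, to decide which genes to keep.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(mean_counts, p_values, min_mean, alpha=0.1):
    """Drop genes with mean count below min_mean, BH-adjust the rest,
    and return the indices of genes with adjusted p below alpha."""
    kept = [i for i, m in enumerate(mean_counts) if m >= min_mean]
    adjusted = bh_adjust([p_values[i] for i in kept])
    return [i for i, q in zip(kept, adjusted) if q < alpha]


# Toy data: 4 well-covered genes followed by 16 low-count, uninformative genes.
mean_counts = [250.0, 180.0, 90.0, 120.0] + [2.0] * 16
p_values = [0.001, 0.004, 0.02, 0.6] + [0.5 + 0.03 * k for k in range(16)]

print(len(filter_then_adjust(mean_counts, p_values, min_mean=0)))   # no filtering
print(len(filter_then_adjust(mean_counts, p_values, min_mean=10)))  # low counts dropped
```

With these toy numbers, removing the 16 low-count genes shrinks the number of tests from 20 to 4, so the third gene (raw p = 0.02) passes the 10% FDR cut only after filtering.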

                          @sindrle: The informed decision is basically short-hand for what you want to do downstream (at least that's what I would mean had I written that...perhaps Simon means something else). If you're just interested in generally describing broad changes (e.g. in enriched GO terms) then you can be a bit more lax with the adjusted p-value cutoff. If, on the other hand, you're going to generate a bunch of transgenic mice or start a large-scale drug screen (i.e., your next step involves large amounts of time/money), then you really really need to be positive that you're not following up a spurious result. In those cases, you'd use a much lower adjusted p-value threshold. A bit of understanding of the underlying biology can also help make an informed decision here.

                          Other considerations could be:
                          1) How many hits did you find at a given threshold and how many did you expect (given preliminary data or published literature)?
                          2) If there are known changes, how many of those did you get at a given threshold?
                          3) Do you lack ethics and just want to make a nice, but likely false, story to publish in Science/Cell/Nature? Then just use raw p-values (or "better" yet, fold-changes!) and request reviewers who only understand Western blots.
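Considerations 1 and 2 can be tabulated with a few lines of code. This is a sketch only: the gene names, adjusted p values, and the set of "known" genes are hypothetical placeholders, and the function name `summarize_thresholds` is made up for illustration.

```python
def summarize_thresholds(adjusted_p, known_genes, thresholds=(0.01, 0.05, 0.1, 0.2)):
    """For each adjusted-p cutoff, report (cutoff, number of hits,
    number of known genes recovered among those hits)."""
    rows = []
    for t in thresholds:
        hits = {gene for gene, q in adjusted_p.items() if q < t}
        rows.append((t, len(hits), len(hits & known_genes)))
    return rows


# Hypothetical adjusted p values and a hypothetical set of known changes.
adjusted_p = {"geneA": 0.004, "geneB": 0.03, "geneC": 0.08, "geneD": 0.15, "geneE": 0.6}
known_genes = {"geneA", "geneC"}

for cutoff, n_hits, n_known in summarize_thresholds(adjusted_p, known_genes):
    print(f"adjusted p < {cutoff}: {n_hits} hits, {n_known} known genes recovered")
```

A table like this is meant to inform the choice of a threshold before looking at which novel genes pass it, not to pick whichever cutoff includes a favourite gene.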


                          • #14
                            Edited, look below.
                            Last edited by sindrle; 01-27-2014, 11:05 AM. Reason: Wrong quote..


                            • #15
                              Originally posted by dpryan View Post
                              @rskr: Given that this is in the context of DESeq2 (I realize that the thread is titled with edgeR...), low-count genes are automatically dropped and power maximized (I have to admit that it's handy to not have to do this myself anymore). So, the low-coverage genes screwing the p-values critique doesn't apply.
                              Do you know how to do this in edgeR?


                              Originally posted by dpryan View Post
                              3) Do you lack ethics and just want to make a nice, but likely false, story to publish in Science/Cell/Nature? Then just use raw p-values (or "better" yet, fold-changes!) and request reviewers who only understand Western blots.
                              Thanks for the tip, I'll go for this one.
