
  • #31
    I'm not sure I can follow you here, but I guess this is the wrong place for this debate. As a co-worker of the authors of that paper, and after having discussed it with them several times, I became convinced that their argument is correct. (Intuitively, it looks wrong, of course. The fact that a statistician's intuition gives you a wrong idea here was actually the motivation to write a paper on this seemingly simple point.)

    I understand that you see a flaw in their argument, but I guess it would take a rather lengthy post to point it out, and SeqAnswers may not be the right forum for such a discussion, so let's leave it at that.



    • #32
      Originally posted by Simon Anders
      I'm not sure I can follow you here, but I guess this is the wrong place for this debate. As a co-worker of the authors of that paper, and after having discussed it with them several times, I became convinced that their argument is correct. (Intuitively, it looks wrong, of course. The fact that a statistician's intuition gives you a wrong idea here was actually the motivation to write a paper on this seemingly simple point.)

      I understand that you see a flaw in their argument, but I guess it would take a rather lengthy post to point it out, and SeqAnswers may not be the right forum for such a discussion, so let's leave it at that.
      Well, isn't that convenient. You say it needs to be simple, with no need to hear what I say, and then you say your idea is too complicated to explain.



      • #33
        No, my idea is not at all "too complicated to explain". And I think I did that -- or rather I just said that Bourgon et al.'s result is applicable and that we are applying it -- and that's all there is to it.

        In our DESeq vignette, we recommend a filtering strategy, which is based on the results of the Bourgon et al. paper. You came along and said that this filtering strategy is too simplistic. Upon my reply, you became more precise and said that, actually, you consider it invalid because you consider the paper by Bourgon et al. flawed.

        If you claimed that we are wrongly applying the result of Bourgon et al. to the problem at hand, then this forum would indeed be the right place for the debate. However, if I understood you correctly, you consider the paper itself flawed.

        And in this case it's your turn to explain specifically which of the paper's arguments is wrong. And I wondered if you would really be willing to now write a critique of this paper for the forum.
        Last edited by Simon Anders; 06-04-2013, 06:20 AM.



        • #34
          Originally posted by Simon Anders
          No need to make things overly complicated.

          The point of the Bourgon et al. paper is that the following is perfectly fine: Try different thresholds on the count sums (by simply scanning through a grid of values), always adjust the p values of the genes above the count sum threshold with BH, and then use the threshold that gives the largest absolute number of genes with an adjusted p value below your chosen FDR. (It may sound as if such a post-hoc choice of the threshold by peeking at the test outcome is "cheating" and breaks FDR control, but this is, somewhat surprisingly, not the case, as Bourgon et al. showed.)

          Of course, if you are specifically interested in lowly expressed genes, then such a way of choosing the filter may be permissible but disadvantageous, because your goal is not to optimise power to get many hits but to learn about the small genes. Then, it might be better to choose a lower threshold, just so low that you do not lose any hits at all compared to the no-filtering case.
          Maybe you could explain to me what was too complicated about my post?
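
For readers following along, the threshold scan described in the quoted post might be sketched roughly as below. This is only an illustration of the mechanics, not the DESeq vignette's code: the function names `bh_adjust` and `scan_thresholds`, the use of `>=` for "above the threshold", and the toy numbers are all assumptions made here. It says nothing about whether FDR control holds, which is the point being argued in the thread.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def scan_thresholds(count_sums, pvalues, thresholds, fdr=0.1):
    """Return (threshold, hits) for the threshold giving the most BH hits.

    Assumption: a gene passes the filter when its count sum is >= threshold.
    Ties are resolved in favour of the first threshold tried.
    """
    best_threshold, best_hits = None, -1
    for t in thresholds:
        kept = [p for c, p in zip(count_sums, pvalues) if c >= t]
        hits = sum(q < fdr for q in bh_adjust(kept))
        if hits > best_hits:
            best_threshold, best_hits = t, hits
    return best_threshold, best_hits


# Toy data (made up): three low-count genes with large p-values.
count_sums = [0, 1, 2, 50, 60, 70, 80, 90]
pvalues = [0.9, 0.8, 0.7, 0.001, 0.002, 0.02, 0.045, 0.5]
print(scan_thresholds(count_sums, pvalues, thresholds=[0, 10], fdr=0.05))
```

In this toy case, dropping the three low-count genes shrinks the multiple-testing burden, so one more gene falls below the chosen FDR.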



          • #35
            You wrote "I think you have to use a Chinese restaurant process or biological diversity estimate to justify your thresholds, since read counts aren't independent."

            What do you mean by "justify"? To me, this sounds as if you claimed that a choice of a threshold without taking into account the issues that you listed will yield incorrect results.

            So, sure, "too complicated" was a poor choice of word. What I meant is: I disagree that the threshold needs justification.

            So, now, please clarify, because I still don't get it: Is your point that our way of choosing a threshold leads to invalid results? Or merely that it leads to suboptimal power?



            • #36
              Originally posted by Simon Anders
              You wrote "I think you have to use a Chinese restaurant process or biological diversity estimate to justify your thresholds, since read counts aren't independent."

              What do you mean by "justify"? To me, this sounds as if you claimed that a choice of a threshold without taking into account the issues that you listed will yield incorrect results.

              So, sure, "too complicated" was a poor choice of word. What I meant is: I disagree that the threshold needs justification.

              So, now, please clarify, because I still don't get it: Is your point that our way of choosing a threshold leads to invalid results? Or merely that it leads to suboptimal power?
              You can use a Chinese restaurant process to derive an expected number of genes, with genes being analogous to tables, as in http://en.wikipedia.org/wiki/Chinese_restaurant_process

              To account for different probabilities of observing the genes, you could model the genes as species from UNTB http://en.wikipedia.org/wiki/Unified...f_biodiversity

              Then you could calculate the expected abundance of the ith most abundant gene for example.
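
As a rough illustration of the first suggestion above: for the standard one-parameter Chinese restaurant process, the expected number of occupied tables after n customers is the sum of theta / (theta + i) for i = 0 .. n-1. The sketch below computes that. Treating reads as customers and genes as tables is an assumption made here to match the analogy in the post, and `expected_tables` and the value of `theta` are hypothetical names and numbers, not anything from DESeq or the paper under discussion.

```python
def expected_tables(n_customers, theta):
    """Expected number of occupied tables in a one-parameter CRP.

    n_customers: number of customers seated (here: reads, by assumption)
    theta: concentration parameter, must be > 0
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    # Customer i+1 opens a new table with probability theta / (theta + i).
    return sum(theta / (theta + i) for i in range(n_customers))


# Hypothetical numbers, for illustration only.
print(expected_tables(1000, theta=50.0))
```

The expected count grows roughly logarithmically in the number of customers, which is why the analogy yields a saturating number of observed genes as read depth increases.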



              • #37
                I know what a Chinese Restaurant Process is.

                What I don't know is what our argument is about. So, could you please either stop the discussion or finally reply to my question, which was: "Is your point that our way of choosing a threshold leads to invalid results? Or merely that it leads to suboptimal power?"

                Or did you simply want to suggest an alternative approach without making any claims about the validity of the approaches discussed so far in the thread?



                • #38
                  Originally posted by Simon Anders
                  I know what a Chinese Restaurant Process is.

                  What I don't know is what our argument is about. So, could you please either stop the discussion or finally reply to my question, which was: "Is your point that our way of choosing a threshold leads to invalid results? Or merely that it leads to suboptimal power?"

                  Or did you simply want to suggest an alternative approach without making any claims about the validity of the approaches discussed so far in the thread?
                  I am just saying that maybe if you used a multinomial (negative multinomial) model, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC320217/, for assessing the probability that a gene will be significantly differentially expressed, then you wouldn't need to resort to crude filtering.



                  • #39
                    But we don't use any filtering "for assessing the probability that a gene is significantly expressed". It still seems to me that you are discussing a very different task than we are.



                    • #40
                      Originally posted by Simon Anders
                      But we don't use any filtering "for assessing the probability that a gene is significantly expressed". It still seems to me that you are discussing a very different task than we are.
                      No you are discussing the effect of ranked filtering on the p-values after accounting for false discovery, and I am saying this is an incorrect approach your filtering and p-value selection need to be done in one step for this type of data, since there are dependence between the genes with the sampling technique, furthermore it could be derived from first principles...



                      • #41
                        rskr, you make this debate more than painful! And I am getting fed up. There are two very different statements you can make in a debate on statistics:
                        (a) claim that a procedure yields wrong results, i.e., in the case of hypothesis testing, fails to maintain type-I error control.
                        (b) claim that a procedure is suboptimal, i.e. unnecessarily conservative or less robust or more complicated than needed

                        In post #38, you make a statement of type (b), saying that you think that filtering is "crude", which, I suppose, means inelegant. Now, you suddenly say, you consider the approach to be incorrect; even though I have problems parsing your sentence which seems to lack commas. Specifically, you seem to be saying that the filtering approach per se is invalid.

                        And this is why the debate is so annoying: After so many posts, I still do not know whether you want to claim that the paper by Bourgon et al. is flawed or not.

                        So, to make it easy for you, here is a flowchart of questions:

                        1. Do you claim that the paper by Bourgon et al. is flawed?

                        If Yes:
                        2a. Can you pinpoint the flaw in their reasoning? If so, please do.

                        If No:
                        2b. Do you dispute that the approach discussed here is a straightforward application of the procedure suggested by Bourgon et al.? If so, where do we deviate?

                        You will understand that it is no fun debating when not even the claims are clearly stated.



                        • #42
                          Originally posted by Simon Anders
                          rskr, you make this debate more than painful! And I am getting fed up. There are two very different statements you can make in a debate on statistics:
                          (a) claim that a procedure yields wrong results, i.e., in the case of hypothesis testing, fails to maintain type-I error control.
                          (b) claim that a procedure is suboptimal, i.e. unnecessarily conservative or less robust or more complicated than needed

                          In post #38, you make a statement of type (b), saying that you think that filtering is "crude", which, I suppose, means inelegant. Now, you suddenly say, you consider the approach to be incorrect; even though I have problems parsing your sentence which seems to lack commas. Specifically, you seem to be saying that the filtering approach per se is invalid.

                          And this is why the debate is so annoying: After so many posts, I still do not know whether you want to claim that the paper by Bourgon et al. is flawed or not.

                          So, to make it easy for you, here is a flowchart of questions:

                          1. Do you claim that the paper by Bourgon et al. is flawed?

                          If Yes:
                          2a. Can you pinpoint the flaw in their reasoning? If so, please do.

                          If No:
                          2b. Do you dispute that the approach discussed here is a straightforward application of the procedure suggested by Bourgon et al.? If so, where do we deviate?

                          You will understand that it is no fun debating when not even the claims are clearly stated.
                          They used the standard normal to model discrete data, then filtered out "low variance" data, then used a t-test. The article is just laced with all kinds of assumptions like that. They probably even generated the data according to the standard normal in their simulation. I am getting quite a bit of relish from poking fun at people who take this particular view of reality so seriously.



                          • #43
                            Sigh. This is getting way too stupid.

                            For the last 20 years, people have modelled the continuous fluorescence data from microarrays as log-normal. If you think that all the thousands of papers that do so overlooked that microarray data are in reality discrete, write a paper about this great discovery!

                            And before you now point out that this thread was about RNA-Seq data, remember that you were talking about the paper, which is not.

                            Anyway, I feel a bit stupid that I really allowed you to drag me into such a waste of time of a discussion. I'm outta here.



                            • #44
                              Originally posted by Simon Anders
                              Sigh. This is getting way too stupid.

                              For the last 20 years, people have modelled the continuous fluorescence data from microarrays as log-normal. If you think that all the thousands of papers that do so overlooked that microarray data are in reality discrete, write a paper about this great discovery!

                              And before you now point out that this thread was about RNA-Seq data, remember that you were talking about the paper, which is not.

                              Anyway, I feel a bit stupid that I really allowed you to drag me into such a waste of time of a discussion. I'm outta here.
                              So you are applying questionable microarray theory to sequencing data. How could you possibly be wrong?



                              • #45
                                rskr, are you suggesting that one should NOT filter out low-count features prior to running a differential expression test? I thought that's what we were discussing in this thread: filtering out low-count genes prior to running DE in order to eliminate a lot of the extreme fold change values that such low-count data tend to produce. Or did you guys somehow get into a debate over filtering p-values AFTER the DE test?
                                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                                Salk Institute for Biological Studies, La Jolla, CA, USA */
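
To make the point about extreme fold changes concrete, here is a small sketch of how a difference of a few reads at low counts produces a much larger log2 fold change than a difference of a hundred reads at high counts, and how a count-sum filter applied before the test would drop such a gene. The gene names, the counts, the pseudocount of 0.5, and the function names are all made up for illustration and are not taken from any particular package.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=0.5):
    """log2 ratio of condition B over condition A, with a pseudocount
    (an assumption here) so that zero counts do not divide by zero."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


def filter_low_counts(counts, min_total):
    """Keep genes whose summed count across conditions is >= min_total.

    counts: dict mapping gene name -> (count_a, count_b)
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}


# Hypothetical counts for two genes in two conditions.
counts = {"geneA": (1, 7), "geneB": (1000, 1100)}

for gene, (a, b) in counts.items():
    print(gene, round(log2_fold_change(a, b), 3))

print(filter_low_counts(counts, min_total=10))
```

Here geneA changes by only six reads yet shows a log2 fold change above 2, while geneB changes by a hundred reads and stays close to 0.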
