  • DEGseq VS edgeR, which one is more reliable?

    Hi there,

    I am working on RNA-seq data analysis and have used both the R packages DEGseq and edgeR to obtain DEGs. However, the DEG lists I get from the two packages are not much alike.

    Here is the total number of DEGs I get:
    The total number of matched genes is about 17380 (71.40%) in dataset 1 and 16000 (65.72%) in dataset 2, respectively.
    The DEG filter thresholds are: FDR <= 0.001, |log2 FC| > 1.
    I am confused now and just want to know which one is more reasonable.
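
A minimal sketch of how the thresholds above could be applied and the two DEG lists compared (Python, standard library only). The gene names and numbers are made up for illustration, and `call_degs` and `overlap_summary` are hypothetical helpers, not functions from DEGseq or edgeR:

```python
from typing import Dict, Set, Tuple

# gene -> (log2 fold change, FDR); all values are invented for illustration
Results = Dict[str, Tuple[float, float]]


def call_degs(results: Results, fdr_max: float = 0.001, lfc_min: float = 1.0) -> Set[str]:
    """Return genes with FDR <= fdr_max and |log2 FC| > lfc_min."""
    return {
        gene
        for gene, (lfc, fdr) in results.items()
        if fdr <= fdr_max and abs(lfc) > lfc_min
    }


def overlap_summary(a: Set[str], b: Set[str]) -> Dict[str, float]:
    """Count genes unique to each list, shared genes, and the Jaccard index."""
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical per-gene output from two tools run on the same data
tool_a = {"g1": (2.5, 1e-6), "g2": (1.4, 5e-4), "g3": (-3.0, 1e-8), "g4": (0.8, 1e-9)}
tool_b = {"g1": (2.3, 1e-5), "g2": (1.2, 0.02), "g3": (-2.8, 1e-7), "g4": (0.7, 1e-4)}

degs_a = call_degs(tool_a)
degs_b = call_degs(tool_b)
summary = overlap_summary(degs_a, degs_b)
print(sorted(degs_a), sorted(degs_b), summary)
```

Note that a large overlap only says the two tools agree with each other; it does not say either list is correct.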
    Last edited by tianyub836; 10-09-2011, 05:04 PM.

  • #2
    If that is supposed to be a table, enclose it in [ code ] ... [ / code ] tags to preserve the spacing.

    ...or use the [ table ] tag (syntax example in this thread: http://seqanswers.com/forums/showthread.php?t=948)



    • #3
      I think edgeR is better. I compared DEGseq, DESeq and edgeR and made Venn diagrams to find the overlap, and found that DESeq and edgeR overlap better. So I think edgeR is better.

      It's up to you!
      Best wishes!
      Engineer of Data Analysis



      • #4
        Originally posted by Chrevan View Post
        I think edgeR is better. I compared DEGseq, DESeq and edgeR and made Venn diagrams to find the overlap, and found that DESeq and edgeR overlap better. So I think edgeR is better.

        It's up to you!
        Best wishes!
        Thanks for your reply, Chrevan.

        I know that DEGseq is based on the Poisson distribution while edgeR is based on the negative binomial distribution. What I want to know is: apart from the methodology, which package's output is more reasonable based on common sense, if there is such a thing?



        • #5
          DESeq is basically edgeR with some improvements, so if you want common sense, that seems to be the winner. Since DESeq and edgeR use the same distribution while DEGseq uses a different one, they naturally get more similar results, and that's not a sensible way to conclude that they're better. However, both of the negative-binomial methods' authors provide good evidence that DEGseq's Poisson assumption is invalid.

          Here is the DESeq paper: http://genomebiology.com/2010/11/10/R106



          • #6
            Originally posted by jwfoley View Post
            DESeq is basically edgeR with some improvements, so if you want common sense, that seems to be the winner. Since DESeq and edgeR use the same distribution while DEGseq uses a different one, they naturally get more similar results, and that's not a sensible way to conclude that they're better. However, both of the negative-binomial methods' authors provide good evidence that DEGseq's Poisson assumption is invalid.

            Here is the DESeq paper: http://genomebiology.com/2010/11/10/R106
            Thanks, jwfoley.

            Well, I have read the DESeq paper and the edgeR one. They use the same NB distribution model, and both claim to be suited to identifying DEGs from RNA-seq without any replicates, which matches my situation.

            I am looking for DEGs in plants under abiotic stress, and my samples consist of only a control and a treated group. Both papers mentioned above suggest that a Poisson distribution model is acceptable for samples without replicates. Am I right?

            As I mentioned in the table above, nearly 1/3 of the matched genes output by DEGseq were DEGs. Does that make any sense?



            • #7
              No, the Poisson distribution is never appropriate, and I thought we said that quite clearly in our paper. You will always end up with loads of false positives.

              You simply cannot perform a proper analysis without replicates. The correct solution is to start over. (See also http://seqanswers.com/forums/showpos...04&postcount=2 )

              DESeq offers the possibility to perform a very conservative analysis for the no-replicates case which shows you only those genes which really "stick out". This can give you at least a few results.
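
To illustrate the false-positive point, here is a small simulation sketch (Python, standard library only). The mean count, the dispersion value, the number of genes and the simple z-test are all assumptions chosen for illustration; this is not the test that DESeq, edgeR or DEGseq actually performs. Every simulated gene has the same true expression in both samples, so every "hit" is a false positive:

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def draw_count(mu: float, dispersion: float, rng: random.Random) -> int:
    """Poisson count if dispersion == 0, otherwise a gamma-Poisson (negative binomial) count."""
    if dispersion > 0:
        # Gamma with shape 1/dispersion and scale dispersion*mu has mean mu,
        # so the mixture has variance mu + dispersion * mu**2.
        lam = rng.gammavariate(1.0 / dispersion, dispersion * mu)
    else:
        lam = mu
    return draw_poisson(lam, rng)


def false_positive_rate(mu: float, dispersion: float, n_genes: int, rng: random.Random) -> float:
    """Fraction of null genes called significant by a Poisson-based z-test at ~5%."""
    hits = 0
    for _ in range(n_genes):
        x = draw_count(mu, dispersion, rng)
        y = draw_count(mu, dispersion, rng)
        if x + y == 0:
            continue
        # Under a Poisson null, (x - y) / sqrt(x + y) is approximately standard normal.
        z = (x - y) / math.sqrt(x + y)
        if abs(z) > 1.96:
            hits += 1
    return hits / n_genes


rng = random.Random(1)
fpr_poisson = false_positive_rate(mu=100, dispersion=0.0, n_genes=2000, rng=rng)
fpr_nb = false_positive_rate(mu=100, dispersion=0.1, n_genes=2000, rng=rng)
print(f"Poisson data: {fpr_poisson:.3f}   overdispersed data: {fpr_nb:.3f}")
```

When the data really are Poisson, the rate stays near the nominal level; with even modest extra sample-to-sample variability, the same test calls a large share of unchanged genes significant.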



              • #8
                Originally posted by Simon Anders View Post
                No, the Poisson distribution is never appropriate, and I thought we said that quite clearly in our paper. You will always end up with loads of false positives.

                You simply cannot perform a proper analysis without replicates. The correct solution is to start over. (See also http://seqanswers.com/forums/showpos...04&postcount=2 )

                DESeq offers the possibility to perform a very conservative analysis for the no-replicates case which shows you only those genes which really "stick out". This can give you at least a few results.
                Well, you just frightened me, Simon Anders.

                I did not understand what you meant by "You simply cannot perform a proper analysis without replicates. The correct solution is to start over".

                Did you mean that the data I was working on, which simply came from control and treatment samples, are meaningless?
                Last edited by tianyub836; 10-11-2011, 04:49 PM.



                • #9
                  Originally posted by tianyub836 View Post
                  Well, you just frightened me, Simon Anders.

                  I did not understand what you meant by "You simply cannot perform a proper analysis without replicates. The correct solution is to start over".

                  Did you mean that the data I was working on, which simply came from control and treatment samples, are meaningless?
                  If you have no measure of variability of your measurements, how can you make any conclusions about how reliable/reproducible the differential expression you observe is?



                  • #10
                    Originally posted by frozenlyse View Post
                    If you have no measure of variability of your measurements, how can you make any conclusions about how reliable/reproducible the differential expression you observe is?
                    What if I assumed that the variability of my measurements was negligible, or not significant enough to impact my final output?

                    I mean that I am sure the technical noise is minimized and can be ignored, and the biological variance is not significant.



                    • #11
                      Originally posted by tianyub836 View Post
                      What if I assumed that the variability of my measurements was negligible, or not significant enough to impact my final output?

                      I mean that I am sure the technical noise is minimized and can be ignored, and the biological variance is not significant.

                      Well, then you'd be lying to yourself. Getting a list of DE genes isn't the problem (edgeR will of course still give you a table of p-values and logFC); knowing how many of those are at all trustworthy is the problem.



                      • #12
                        Originally posted by tianyub836 View Post
                        I mean that I am sure the technical noise is minimized and can be ignored, and the biological variance is not significant.
                        I am curious what makes you so sure of that?


                        There are, of course, some possibilities to find something in your data. You might just guess the amount of sample-to-sample variability and inject this information into the DESeq workflow. For a reasonable guess, however, you had better have performed this kind of analysis before, with replication, and still, I would not want to see something like this in a publication. You might also estimate the variance from comparing your treatment and control samples and limit your hits to genes with such extreme fold changes that they stick out even there. DESeq's "blind" dispersion estimation is meant for that. Again, such an analysis is not publication quality.
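
A rough sketch of why the guessed variability matters so much (Python, standard library only). It assumes the negative binomial variance function var = mu + dispersion * mu^2, equal library sizes, and a crude normal approximation; `approx_z` is a hypothetical helper for illustration and is not how DESeq computes its p-values:

```python
import math


def nb_sd(mu: float, dispersion: float) -> float:
    """Standard deviation of a negative binomial count with mean mu."""
    return math.sqrt(mu + dispersion * mu ** 2)


def approx_z(control: int, treated: int, dispersion: float) -> float:
    """Crude z-score for the difference between two single counts.

    Uses the average of the two counts as the null mean and assumes
    equal library sizes; dispersion = 0 reduces to the Poisson case.
    """
    mu = (control + treated) / 2
    var_diff = 2 * (mu + dispersion * mu ** 2)
    return (treated - control) / math.sqrt(var_diff)


# One hypothetical gene: 100 reads in control, 300 in treatment (3-fold change)
for guess in (0.0, 0.1, 0.4):
    print(f"guessed dispersion {guess}: z = {approx_z(100, 300, guess):.2f}")
```

Under the Poisson assumption (dispersion 0) this 3-fold change looks overwhelming, while a guessed dispersion of 0.4 makes it unremarkable. The conclusion depends entirely on a number that cannot be checked without replicates.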



                        • #13
                          Originally posted by Simon Anders View Post
                          I am curious what makes you so sure of that?


                          There are, of course, some possibilities to find something in your data. You might just guess the amount of sample-to-sample variability and inject this information into the DESeq workflow. For a reasonable guess, however, you had better have performed this kind of analysis before, with replication, and still, I would not want to see something like this in a publication. You might also estimate the variance from comparing your treatment and control samples and limit your hits to genes with such extreme fold changes that they stick out even there. DESeq's "blind" dispersion estimation is meant for that. Again, such an analysis is not publication quality.
                          Well, that was just an assumption that the impact is not significant.

                          Actually, when the samples were prepared, we collected material from multiple plants for both the control and treatment groups, which means we sent a mixed sample for each group to be sequenced. We originally thought that this might reduce the impact of biological variation between plants.

                          Does that make any sense?



                          • #14
                            Originally posted by tianyub836 View Post
                            Actually, when the samples were prepared, we collected material from multiple plants for both the control and treatment groups, which means we sent a mixed sample for each group to be sequenced. We originally thought that this might reduce the impact of biological variation between plants.

                            Does that make any sense?
                            Only a bit. If you pool N plants, then your variance goes down to 1/N (or the standard error of your expression estimates to 1/sqrt(N)) of the value for a single plant.

                            So, of course, the variance got smaller, but by pooling everything, you have lost all possibility of figuring out how small it is now.

                            What you should have done is make two or three pools for each group and add multiplexing tags to the samples so that you can put them together in one sequencing lane. Comparing the pools from the same group would have enabled you to assess the variance. Without it, you have to guess it blindly, and whatever guess you may come up with, you cannot expect anybody (especially not a reviewer of your paper) to believe that to be a good guess.
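
The pooling argument can be sketched with a small simulation (Python, standard library only). The per-plant mean and standard deviation, the pool size, and the normal model for a single gene's expression are assumptions made for illustration only:

```python
import random
import statistics


def simulate_pool(n_plants: int, mean: float, sd: float, rng: random.Random) -> float:
    """Expression of one pool, modelled as the average over the plants mixed into it."""
    return statistics.fmean([rng.gauss(mean, sd) for _ in range(n_plants)])


rng = random.Random(0)
MEAN, SD, N = 100.0, 20.0, 10  # hypothetical per-plant values, 10 plants per pool

# With many pools we can confirm that the variance shrinks to SD**2 / N.
many_pools = [simulate_pool(N, MEAN, SD, rng) for _ in range(5000)]
observed_var = statistics.variance(many_pools)
expected_var = SD ** 2 / N

# With a single pool per group there is nothing to estimate the variance from.
one_pool = [simulate_pool(N, MEAN, SD, rng)]
try:
    statistics.variance(one_pool)
    single_pool_ok = True
except statistics.StatisticsError:
    single_pool_ok = False

# Two or three pools per group already give a (noisy) estimate.
three_pools = [simulate_pool(N, MEAN, SD, rng) for _ in range(3)]
var_from_three = statistics.variance(three_pools)

print(f"expected {expected_var:.1f}, observed {observed_var:.1f}")
print(f"variance from one pool possible: {single_pool_ok}")
print(f"variance estimate from three pools: {var_from_three:.1f}")
```

The estimate from three pools is imprecise, but it is an estimate from the data rather than a guess.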



                            • #15
                              Originally posted by Simon Anders View Post
                              Only a bit. If you pool N plants, then your variance goes down to 1/N (or the standard error of your expression estimates to 1/sqrt(N)) of the value for a single plant.

                              So, of course, the variance got smaller, but by pooling everything, you have lost all possibility of figuring out how small it is now.

                              What you should have done is make two or three pools for each group and add multiplexing tags to the samples so that you can put them together in one sequencing lane. Comparing the pools from the same group would have enabled you to assess the variance. Without it, you have to guess it blindly, and whatever guess you may come up with, you cannot expect anybody (especially not a reviewer of your paper) to believe that to be a good guess.
                              Thanks, Simon Anders.

                              I admit that it was not a perfect experimental design, and I have many details to take care of.

                              It was wonderful to discuss this with you.
