  • counting wars ;) HTSeq vs RSEM

    We are having an internal lab discussion right now, a classic case of "mapper wars". Which is better, e.g. in the sense of being "closer to real biology":

    HTSeq (opponents claim: "does not do counting well in case of overlapping genes")
    RSEM on the level of gene summaries... - is the model there good enough to distinguish where the read is from in case of overlapping genes? If so - is this advantage so important that we should give up HTSeq?

    I was defending the HTSeq side a bit, as I know that SimonA. knows well what he's doing and RSEM is meant more for transcript deconvolution than for gene-level counting... but I ran out of arguments.

    Did anyone do a comparison like that or has a good intuition to help?

    Any suggestions welcome! Thanks!

  • #2
    If overlapping genes are such an issue for whatever you're working on, just use a stranded library prep. The likely more common objection to HTSeq is that it "ignores" multimappers rather than trying to extract some meaning from them. Honestly, that particular objection has never really swayed me, since the regions of genes not giving rise to multimapping reads should suffice to provide enough reliable signal for differential expression.

    Which method you choose will largely come down to how risk averse you are and what your downstream needs will be. If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, the validity of which I'm nowhere near 100% certain of.
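
To make the stranded-library point concrete, here is a minimal Python sketch (not HTSeq's actual code). The `Gene` tuple, the `overlapping_genes` helper, and all coordinates are hypothetical. It only illustrates that a read falling in the overlap of two opposite-strand genes is ambiguous without strand information and unambiguous with it.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlapping_genes(genes, read_start, read_end, read_strand=None):
    """Return the sorted names of genes overlapped by a read.

    If read_strand is None the library is treated as unstranded and
    strand is ignored; otherwise only genes on the read's strand count.
    """
    hits = set()
    for g in genes:
        overlaps = read_start < g.end and g.start < read_end
        same_strand = read_strand is None or g.strand == read_strand
        if overlaps and same_strand:
            hits.add(g.name)
    return sorted(hits)


# Hypothetical annotation: two genes overlapping on opposite strands.
genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 400, 900, "-")]

# A read lying entirely inside the shared region 400..500.
unstranded = overlapping_genes(genes, 420, 470)
stranded = overlapping_genes(genes, 420, 470, read_strand="-")

print(unstranded)  # both genes are hit, so the read is ambiguous
print(stranded)    # only the minus-strand gene is hit
```

Two caveats on this toy: genes that overlap on the same strand stay ambiguous even with a stranded prep, and with some stranded protocols the read maps to the strand opposite the gene (which is what htseq-count's `--stranded=reverse` setting is for).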

    • #3
      Thanks dpryan! The stranded protocol is definitely a good point here. Still it costs some $100 per sample, so thrifty biologists often skip it...

      Originally posted by dpryan View Post
      If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, the validity of which I'm nowhere near 100% certain of.
      Could you briefly write down your objections to RSEM? I have mine, such as its heavy dependence on the annotation and its uncertainty when a gene has many isoforms. Thanks!

      • #4
        So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

        Generally, the HTS count and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes - presumably ones which consist entirely of multi-mapped reads. HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing.

        I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.
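
A comparison like the one described above could be sketched roughly as follows. This is not the poster's actual script. It assumes htseq-count's two-column tab-separated output with `__`-prefixed summary rows, and an RSEM `genes.results` table with `gene_id` and `expected_count` columns. The helper names, the 10% tolerance, the minimum-count cutoff, and every number in the toy tables are made up for illustration.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse htseq-count output (gene_id<TAB>count), skipping '__' summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        counts[row[0]] = float(row[1])
    return counts


def read_rsem_expected_counts(handle):
    """Parse an RSEM genes.results table into gene_id -> expected_count."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def find_outliers(htseq, rsem, rel_tol=0.10, min_count=10.0):
    """List genes whose two counts differ by more than rel_tol of the larger one."""
    outliers = []
    for gene in sorted(set(htseq) | set(rsem)):
        h, r = htseq.get(gene, 0.0), rsem.get(gene, 0.0)
        largest = max(h, r)
        if largest < min_count:
            continue  # too few reads to say anything either way
        if abs(h - r) / largest > rel_tol:
            outliers.append((gene, h, r))
    return outliers


# Toy inputs with invented numbers, standing in for real output files.
htseq_text = (
    "GENE1\t100\n"
    "GENE2\t0\n"
    "PSEUDO1\t12\n"
    "__no_feature\t50\n"
    "__alignment_not_unique\t400\n"
)
rsem_text = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "GENE1\tT1\t1000\t900\t103.00\t10\t10\n"
    "GENE2\tT2\t800\t700\t250.00\t20\t20\n"
    "PSEUDO1\tT3\t500\t400\t0.00\t0\t0\n"
)

htseq = read_htseq_counts(io.StringIO(htseq_text))
rsem = read_rsem_expected_counts(io.StringIO(rsem_text))
outliers = find_outliers(htseq, rsem)
for gene, h, r in outliers:
    print(gene, h, r)
```

In the toy data, `GENE2` stands in for a gene covered only by multi-mapped reads (zero in HTSeq, non-zero in RSEM) and `PSEUDO1` for a pseudogene that only HTSeq counts. Both are flagged, while `GENE1` agrees within a few percent and is not.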

        • #5
          Thanks a lot too! That's what I suspected - some small artifacts on both sides, no big differences, at least at the gene level. I have to stop being lazy and try it myself. What was the species? H. sapiens?

          Originally posted by jparsons View Post
          both of genes and transcripts
          Did you run HTSeq at the transcript level? And were the results similar there too?

          • #6
            It was a human sample. HTSeq claims not to work on the transcript level, so I used other programs there. I might just throw it at the wall anyway, but I don't have high expectations.

            • #7
              Originally posted by jparsons View Post
              So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

              Generally, the HTS count and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes - presumably ones which consist entirely of multi-mapped reads. HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing.

              I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.
              The "HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing" does not make sense to me given how htseq-count works: the reads assigned to pseudogenes would have to be uniquely aligned there in the first place by the aligner. Unless, of course, these are specifically pseudogenes overlapping other genes, and even then the read would have to come largely from the pseudogene not to be discarded by htseq-count's default settings.
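
The argument above can be illustrated with a simplified decision function. This is a sketch of the general idea, not htseq-count's source code. It assumes the aligner reports the number of alignments in the SAM `NH` tag, that a MAPQ cutoff of 10 is used, and that a read overlapping more than one gene is set aside as ambiguous (roughly what a "union"-style overlap mode does). The function name and the gene IDs are hypothetical.

```python
def classify_read(nh, mapq, genes_hit, minaqual=10):
    """Bin one aligned read the way a simple count-based tool might.

    nh        -- number of reported alignments for the read (SAM NH tag)
    mapq      -- mapping quality assigned by the aligner
    genes_hit -- set of gene IDs whose exons the alignment overlaps
    """
    if nh > 1:
        return "__alignment_not_unique"  # multimapper: never reaches a gene
    if mapq < minaqual:
        return "__too_low_aQual"         # aligner itself was unsure
    if not genes_hit:
        return "__no_feature"
    if len(genes_hit) > 1:
        return "__ambiguous"             # overlapping genes, no single owner
    return next(iter(genes_hit))


# Hypothetical reads around a parent gene and its pseudogene.
unique_in_pseudogene = classify_read(1, 50, {"PSEUDO1"})
multimapped = classify_read(4, 0, {"GENE1"})
in_overlap = classify_read(1, 50, {"GENE1", "PSEUDO1"})
low_quality = classify_read(1, 3, {"GENE1"})

print(unique_in_pseudogene, multimapped, in_overlap, low_quality)
```

Under these assumptions a pseudogene only receives a count when the aligner has already placed the read there uniquely and with good quality, so every input to the decision comes from the alignment step.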

              • #8
                It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped up. If I had to hypothesize, I would think that the pseudogene has unique sequence relative to the main gene, which by chance a sequencing error manages to catch. The alignment settings that RSEM uses were not identical to the ones I used for HTS, and may have been differently tolerant of mismatches, or maybe RSEM decided that a mm1 alignment to the main gene was more likely than a perfect match to the pseudogene.

                • #9
                  Originally posted by jparsons View Post
                  It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped up. If I had to hypothesize, I would think that the pseudogene has unique sequence relative to the main gene, which by chance a sequencing error manages to catch. The alignment settings that RSEM uses were not identical to the ones I used for HTS, and may have been differently tolerant of mismatches, or maybe RSEM decided that a mm1 alignment to the main gene was more likely than a perfect match to the pseudogene.
                  Then that is a difference between aligners, not between htseq-count and RSEM. htseq-count does not align reads or determine their locations; that is done by whatever aligner is run before it. So the discrepancy observed in this instance arose at an earlier step and is not a valid comparison of RSEM and htseq-count.

                  • #10
                    I would like to add that RSEM and htseq-count are tools with different purposes. RSEM is designed to quantify expression strength; htseq-count is not! Rather, it is a tool for the express and sole purpose of forming the first step of an analysis for differential expression on the gene level. See my post #4 in this thread for an elaboration of why these two goals suggest different treatments of overlapping genes and multimapping reads.

                    • #11
                      Thanks a lot Simon! Precise and to the point, as usual!!

                      • #12
                        It's tempting to think that how one counts doesn't matter (for differential expression purposes), but here I argue that it does:

                        RNA-Seq is the new kid on the block, but there is still something to be learned from the stodgy microarray. One of the lessons is hidden in a tech report by Daniela Witten and Robert Tibshirani fro…
