
**dpryan** · 08-29-2013, 05:18 AM

If overlapping genes are such an issue for whatever you're working on, just use a stranded library prep. The more common objection to HTSeq is probably that it "ignores" multimappers rather than trying to extract some meaning from them. Honestly, that particular objection has never really swayed me, since the regions of a gene that don't give rise to multimapping reads should provide enough reliable signal for differential expression.

Which method you choose will largely come down to how risk-averse you are and what your downstream needs are. If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, whose validity I'm nowhere near 100% certain of.
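The two points above (overlapping genes and multimappers) can be seen in a toy version of union-mode counting in the style of htseq-count. This is a minimal sketch, not HTSeq's code: it assumes ungapped reads on one chromosome, and the `Feature`/`Read` classes, `count_union` and its `stranded` flag are hypothetical names. The special counter names mirror the `__no_feature`, `__ambiguous` and `__alignment_not_unique` lines that htseq-count reports. A read overlapping two genes is set aside as ambiguous unless strand information separates them, and multimappers are set aside instead of being split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    gene_id: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    start: int
    end: int
    strand: str
    n_hits: int  # number of reported alignments for this read (cf. the SAM NH tag)


def count_union(reads, features, stranded=False):
    """Toy union-mode counting over ungapped reads (hypothetical helper)."""
    counts = {f.gene_id: 0 for f in features}
    special = {"__no_feature": 0, "__ambiguous": 0, "__alignment_not_unique": 0}
    for r in reads:
        # Multimappers are set aside rather than split between genes.
        if r.n_hits > 1:
            special["__alignment_not_unique"] += 1
            continue
        # Union of all genes touched by any base of the read,
        # optionally restricted to features on the read's strand.
        genes = {
            f.gene_id
            for f in features
            if f.start < r.end and r.start < f.end
            and (not stranded or f.strand == r.strand)
        }
        if not genes:
            special["__no_feature"] += 1
        elif len(genes) > 1:
            special["__ambiguous"] += 1
        else:
            counts[next(iter(genes))] += 1
    return counts, special


# Hypothetical example: two genes overlapping on opposite strands.
features = [Feature("geneA", 0, 1000, "+"), Feature("geneB", 800, 2000, "-")]
reads = [
    Read(850, 950, "+", 1),    # falls in the overlap
    Read(100, 200, "+", 1),    # only in geneA
    Read(100, 200, "+", 3),    # multimapper
    Read(5000, 5100, "+", 1),  # outside all features
]
print(count_union(reads, features, stranded=False))
# → ({'geneA': 1, 'geneB': 0}, {'__no_feature': 1, '__ambiguous': 1, '__alignment_not_unique': 1})
print(count_union(reads, features, stranded=True))
# → ({'geneA': 2, 'geneB': 0}, {'__no_feature': 1, '__ambiguous': 0, '__alignment_not_unique': 1})
```

With strand information the read in the overlap is no longer ambiguous, which is the argument for a stranded prep; the multimapper is dropped either way.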

**MichalO** · 08-29-2013, 05:28 AM

Thanks dpryan! The stranded protocol is definitely a good point here. Still, it costs some $100 per sample, so thrifty biologists often skip it...

Originally posted by dpryan:

If I'm going to use RNAseq results to generate a transgenic mouse or start some drug screens, I'm not going to spend time with RSEM data, whose validity I'm nowhere near 100% certain of.

Could you briefly write down your objections to RSEM? I have mine, like heavy dependence on the annotation, uncertainty when a gene has many isoforms, etc. Thanks!

**jparsons** · 08-29-2013, 10:12 AM

So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

Generally, the HTS counts and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes, presumably ones consisting entirely of multimapped reads. HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing.

I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.
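The post doesn't say which metric was used to spot the outliers. A minimal sketch, assuming a simple relative difference against the larger of the two counts and assuming both outputs have already been parsed into dicts keyed by the same gene IDs (for RSEM, the `expected_count` column of its `genes.results` file). The function name, the 5% tolerance and the gene IDs in the usage line are hypothetical, and the counts are made up for illustration.

```python
def flag_discrepancies(htseq, rsem, rel_tol=0.05, min_count=1.0):
    """List genes whose two counts differ by more than rel_tol, relative to the larger count.

    htseq, rsem: dicts mapping gene ID -> count (RSEM expected counts may be fractional).
    Genes where both counts are below min_count (which must be > 0) are skipped.
    """
    if min_count <= 0:
        raise ValueError("min_count must be positive")
    flagged = []
    for gene in sorted(set(htseq) | set(rsem)):
        a = float(htseq.get(gene, 0.0))
        b = float(rsem.get(gene, 0.0))
        larger = max(a, b)
        if larger < min_count:
            continue
        rel_diff = abs(a - b) / larger
        if rel_diff > rel_tol:
            flagged.append((gene, a, b, rel_diff))
    return flagged


# Made-up counts: one concordant gene, one gene seen only by RSEM
# (e.g. all reads multimapped), one low count seen only by HTSeq.
htseq = {"gene_a": 1000, "mt_gene_b": 0, "pseudogene_c": 4}
rsem = {"gene_a": 1020.5, "mt_gene_b": 850.0, "pseudogene_c": 0.0}
print(flag_discrepancies(htseq, rsem))
# → [('mt_gene_b', 0.0, 850.0, 1.0), ('pseudogene_c', 4.0, 0.0, 1.0)]
```

A relative difference treats a 4-vs-0 pseudogene the same as an 850-vs-0 mitochondrial gene, so in practice one would also look at the absolute counts before reading much into a flag.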

**MichalO** · 08-29-2013, 10:50 AM

Thanks a lot to you too! That's what I suspected: some small artifacts on both sides, but no big differences, at least at the gene level. I have to stop being lazy and try it myself.

What was the species? H. sapiens?

Originally posted by jparsons:

both of genes and transcripts

Did you run HTSeq at the transcript level? And were the results similar?

**jparsons** · 08-29-2013, 10:53 AM

It was a human sample. HTSeq claims not to work at the transcript level, so I used other programs there. I might just throw it at the wall anyway, but I don't have high expectations.

**chadn737** · 08-29-2013, 11:40 AM

Originally posted by jparsons:

So I pulled up HTSeq data and RSEM data from the same run, which I have because I've been trying to come up with a good metric to judge quantitation (both of genes and transcripts).

Generally, the HTS counts and the RSEM expected counts are within a few percent of one another. However, there are some significant outliers, which from a cursory inspection appear to be almost exclusively mitochondrial genes, presumably ones consisting entirely of multimapped reads. HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing.

I usually advocate HTSeq for gene counting due to its simplicity, but I'd say that RSEM is on the right side of what we consider to be biological 'truth' in this comparison.

The statement "HTS also assigns some low counts to some pseudogenes which RSEM seems to avoid doing" does not make sense to me given how htseq-count works: those reads would have to have been uniquely aligned to the pseudogenes by the aligner in the first place. Unless, of course, these are specifically pseudogenes overlapping other genes, and even then the read would have to come largely from the pseudogene not to be discarded by htseq-count's default settings.

**jparsons** · 08-29-2013, 12:02 PM

It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped up. If I had to hypothesize, I would think that the pseudogene has sequence that differs from the main gene, and that a sequencing error in a read from the main gene happens to reproduce it by chance. The alignment settings that RSEM uses were not identical to the ones I used for HTS and may have been differently tolerant of mismatches, or maybe RSEM decided that a one-mismatch (mm1) alignment to the main gene was more likely than a perfect match to the pseudogene.
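The last guess can be illustrated with a toy expectation-maximization (EM) allocation, the general approach RSEM is built on. This is not RSEM's actual model (which also accounts for fragment lengths, effective transcript lengths and base qualities); it is a hypothetical sketch assuming 100 bp reads, a flat 1% per-base error rate and equal gene lengths, and the function name and gene IDs are made up. Each read is split between its candidate genes in proportion to abundance times alignment likelihood, so a highly expressed parent gene can absorb a one-mismatch read even though the pseudogene match is perfect.

```python
def em_expected_counts(reads, genes, read_len=100, err=0.01, n_iter=200):
    """Toy EM that splits each read over its candidate genes (hypothetical helper).

    reads: list of candidate lists, each [(gene_id, n_mismatches), ...].
    Assumes a flat per-base error rate `err`, equal gene lengths, and no
    quality scores, fragment-length model, or effective-length correction.
    """
    def likelihood(m):
        # P(read | gene) for an alignment with m mismatches over read_len bases.
        return (1 - err) ** (read_len - m) * (err / 3) ** m

    theta = {g: 1.0 / len(genes) for g in genes}  # relative abundances
    counts = {g: 0.0 for g in genes}
    for _ in range(n_iter):
        # E-step: expected number of reads originating from each gene.
        counts = {g: 0.0 for g in genes}
        for cands in reads:
            weights = [(g, theta[g] * likelihood(m)) for g, m in cands]
            total = sum(w for _, w in weights)
            if total == 0:
                continue
            for g, w in weights:
                counts[g] += w / total
        # M-step: re-estimate abundances from the expected counts.
        n = sum(counts.values())
        if n == 0:
            break
        theta = {g: counts[g] / n for g in genes}
    return counts


# Hypothetical: 1000 reads unique to the parent gene, plus one read that is a
# perfect match to the pseudogene but only one mismatch away from the parent.
reads = [[("parent", 0)]] * 1000 + [[("parent", 1), ("pseudo", 0)]]
counts = em_expected_counts(reads, ["parent", "pseudo"])
print({g: round(c, 3) for g, c in counts.items()})
# → {'parent': 1001.0, 'pseudo': 0.0}
```

By contrast, if the aligner reported only the perfect pseudogene hit as a unique alignment, htseq-count would give that read to the pseudogene, which is consistent with the point that the difference arises upstream of counting.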

**chadn737** · 08-29-2013, 12:07 PM

Originally posted by jparsons:

It didn't make sense to me either, but when I was looking for places where there were discrepancies, that's what popped up. If I had to hypothesize, I would think that the pseudogene has sequence that differs from the main gene, and that a sequencing error in a read from the main gene happens to reproduce it by chance. The alignment settings that RSEM uses were not identical to the ones I used for HTS and may have been differently tolerant of mismatches, or maybe RSEM decided that a one-mismatch (mm1) alignment to the main gene was more likely than a perfect match to the pseudogene.

Then that is a difference between aligners, not htseq-count vs RSEM. htseq-count does not align reads or determine their locations. That is done by whatever aligner is used prior to that. So an observed discrepancy in this instance will have occurred at earlier steps and is not a valid comparison of RSEM or htseq-count.

**Simon Anders** · 09-01-2013, 05:17 AM

I would like to add that RSEM and htseq-count are tools with different purposes. RSEM is designed to quantify expression strength; htseq-count is not! Rather, it is a tool for the express and sole purpose of forming the first step of a differential expression analysis at the gene level. See my post #4 in this thread for an explanation of why these two goals suggest different treatments of overlapping genes and multimapping reads.

**MichalO** · 09-04-2013, 04:03 AM

Thanks a lot Simon! Precise and to the point as usual!!

**lpachter** · 09-05-2013, 12:20 AM

It's tempting to think that how one counts doesn't matter (for differential expression purposes), but here I argue that it does:

Magnitude of effect vs. statistical significance

http://liorpachter.wordpress.com/2013/08/26/magnitude-of-effect-vs-statistical-significance/

RNA-Seq is the new kid on the block, but there is still something to be learned from the stodgy microarray. One of the lessons is hidden in a tech report by Daniela Witten and Robert Tibshirani fro…


counting wars ;) HTSeq vs RSEM
