SEQanswers (
-   Bioinformatics (
-   -   How to use HTSeq to count RNA-Seq overlap w/ whole genes (intron+exon) vs. exon only (

achamess 02-18-2017 08:07 AM

How to use HTSeq to count RNA-Seq overlap w/ whole genes (intron+exon) vs. exon only
So I recently performed RNA-seq using the SmartSeq2 on RNA from neuronal nuclei.

My sample was a pool of sorted neuronal nuclei from mouse. Uing STAR to align, I found that about 20% is exonic, and 60-70% is intronic. Even though I polyA selected, there is still a ton of retained intron, which is consistent with a recent paper that did single nucleus RNASeq in the human brain

But the question is, what to do with the intronic reads?

One of my goals is differential gene expression. I want to take my RNASeq reads and count the overlap with the whole gene (intron + exon), rather than just exons.

How would I do this? What annotation file should I use for HTSeq? I have a GTF from UCSC, but I see exons, CDS but not the whole gene. How would I go about doing this? Do I need to make a custom GTF using the transcription start and stop sites?

Michael.Ante 02-20-2017 11:36 PM

I don't know if it's a good idea to do it this way, since you'll get a higher amount of overlapping features. I hope you have a stranded library.

Nevertheless, in order to get what you described you'll need a GTF which has also "transcript" in the third field ( head -n 1000 my.gtf | cut -f 3 | sort -u ; should also report transcript). ENSEMBL's gtf has it for instance.
With such a gtf, you can use this feature type instead of exons for counting:


htseq-count -t transcript my_alignment.sam my.gtf


All times are GMT -8. The time now is 08:44 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.