Seqanswers Leaderboard Ad

**chadn737** · 04-04-2012, 12:29 PM

Read the instructions for htseq-count:

http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

-t <feature type>, --type=<feature type>

feature type (3rd column in GFF file) to be used, all features of other type are ignored (default, suitable for RNA-Seq and Ensembl GTF files: exon)

By using the -t, --type= argument you can tell what features you want htseq-count to use. The default is exon, so you cant use the defaults. The question then is do you want reads mapping only to the UTR or to the entire transcript, including the UTR? That will determine what feature type you give it.

So say you want reads mapping only to UTRs. Whatever your UTRs are named in your GFF file, use that with the -t or --type= arguments when you run htseq-count.

**emilyjia2000** · 04-06-2012, 06:15 AM

Hi chadn737,

I used -t feature in the command line and set feature to utr, please see the command i used:
htseq-count -s no -t utr -m intersection-nonempty file.sam utr.gff > urtcount.txt

the output looks like:

"06100073560ik"
"06100071803ik"
"0610008608Rik"
"0610008008Rik"
"0610008007Rik"
"0610008007Rik"
"0610009760Rik"
"06100094371ik"
"06100094166ik"
"06100094858ik"
"0610009020Rik"
"06100091744ik"

Only gene_id is there and no count. BTW, we used NCBI reference and modified it according to ensembl gff file format.

THanks,

**chadn737** · 04-06-2012, 06:37 AM

Look in your gff file to find out specifically how the UTRs are labeled. I'm not familiar with your specific species/gff file. For instance in the TAIR10 GFF they are named as three_prime_UTR or five_prime_UTR. Assuming your gff file even has the UTRs listed? You may need to modify the names using sed or alternatively count reads for 5' UTRs and 3' UTRs seperately.

**emilyjia2000** · 04-06-2012, 06:47 AM

Hi Chand737,

The GFF used had been modified according to ensembl format as following

1 bed2gff utr5 8794025 8794051 . - . transcript_id "NM_27671"; utr5_number "0016"; gene_id "NM_27671"
1 bed2gff utr5 8872242 8872295 . - . transcript_id "NM_27671"; utr5_number "0017"; gene_id "NM_27671"
1 bed2gff utr5 8989267 8989330 . - . transcript_id "NM_27671"; utr5_number "0018"; gene_id "NM_27671"
1 bed2gff utr5 9193260 9193341 . - . transcript_id "NM_27671"; utr5_number "0019"; gene_id "NM_27671"
1 bed2gff utr5 9288214 9289811 . - . transcript_id "NM_27671"; utr5_number "0020"; gene_id "NM_27671"

total 9 columns. each columns separated by tab. Starting from transcript_id to gene_id is one column (the last column).
I am using mouse genome. 5'UTR

Thanks,
Li

**chadn737** · 04-06-2012, 06:51 AM

Originally posted by emilyjia2000 View Post

Hi Chand737,

The GFF used had been modified according to ensembl format as following

1 bed2gff utr 8794025 8794051 . - . transcript_id "NM_27671"; utr_number "0016"; gene_id "NM_27671"
1 bed2gff utr 8872242 8872295 . - . transcript_id "NM_27671"; utr_number "0017"; gene_id "NM_27671"
1 bed2gff utr 8989267 8989330 . - . transcript_id "NM_27671"; utr_number "0018"; gene_id "NM_27671"
1 bed2gff utr 9193260 9193341 . - . transcript_id "NM_27671"; utr_number "0019"; gene_id "NM_27671"
1 bed2gff utr 9288214 9289811 . - . transcript_id "NM_27671"; utr_number "0020"; gene_id "NM_27671"

I am using mouse genome.

Thanks,
Li

Why does your output from your last post:

"06100073560ik"
"06100071803ik"
"0610008608Rik"
"0610008008Rik"
"0610008007Rik"
"0610008007Rik"
"0610009760Rik"
"06100094371ik"
"06100094166ik"
"06100094858ik"
"0610009020Rik"
"06100091744ik"

Not match the gene_ids of your gff file?

That suggests to me that the problem has to do with naming of something in one of your files.

**emilyjia2000** · 04-06-2012, 11:05 AM

Sorry I have two annotation files, one is the posted, another is the one with geneid as listed in the output. two files have the same format.

Sorry I didn't make it clear.

**chadn737** · 04-06-2012, 11:10 AM

Originally posted by emilyjia2000 View Post

Hi Chand737,

The GFF used had been modified according to ensembl format as following

1 bed2gff utr5 8794025 8794051 . - . transcript_id "NM_27671"; utr5_number "0016"; gene_id "NM_27671"
1 bed2gff utr5 8872242 8872295 . - . transcript_id "NM_27671"; utr5_number "0017"; gene_id "NM_27671"
1 bed2gff utr5 8989267 8989330 . - . transcript_id "NM_27671"; utr5_number "0018"; gene_id "NM_27671"
1 bed2gff utr5 9193260 9193341 . - . transcript_id "NM_27671"; utr5_number "0019"; gene_id "NM_27671"
1 bed2gff utr5 9288214 9289811 . - . transcript_id "NM_27671"; utr5_number "0020"; gene_id "NM_27671"

total 9 columns. each columns separated by tab. Starting from transcript_id to gene_id is one column (the last column).
I am using mouse genome. 5'UTR

Thanks,
Li

It looks like you edited this post while or after I replied. In this gff file your feature type is listed as "utr5" not utr.

So try using -t utr5 instead of -t utr or alternatively --type=utr5.

**chadn737** · 04-06-2012, 11:16 AM

Something else to check. At the very end of your output from htseq-count there should be the following categories:

no_feature
ambiguous
too_low_aQual
not_aligned
alignment_not_unique

How many reads fall into these categories? Assuming that everything in your sam and gff files is right, and the only problem is the feature type you give to htseq-count, then you should see read counts in these categories. Particularly since the majority of your reads are unlikely to be coming from UTR regions.

If these categories are empty, then something else is wrong.

**emilyjia2000** · 04-06-2012, 12:13 PM

Thanks Chand737, I tried both features: -t utr or -t utr5, both of them delivered the same results.

As you pointed out, I checked the category as follows:

no_feature 122533035
ambiguous 1417743
too_low_aQual 0
not_aligned 0
alignment_not_unique 20439904

Could you please let me know what's wrong with it?

Thanks

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 27 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 26 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

htseq-count on UTR

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News