**dariober** · 01-15-2014, 12:29 AM

Originally posted by zzhao2:

1) Why are there so many "no_feature" reads? Is this normal for a typical RNA-Seq experiment? Is 11732479 the actual number of reads (paired or not) or the number of single reads + pairs counted as one?

Hi- Do the chromosome names in the GTF file match those in the SAM file? Ensembl files have chromosome names like 1, 2, 3 etc. while UCSC uses chr1, chr2, chr3 etc. Also, maybe check you are using the same version for both the reference FASTA and the GTF (e.g. *both* are hg19).

I guess 11732479 refers to the number of fragments, so a single-end read counts as 1 and the two ends of a paired-end read together count as 1 as well.

Good luck
Dario
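
A quick way to run the chromosome-name check Dario suggests is to compare the `SN:` names in the SAM header with the first column of the GTF. The following is only a minimal sketch: the function names are made up for illustration, the example lines are toy data, and it assumes the SAM file actually carries `@SQ` header lines.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends where alignment records begin
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_sequence_names(gtf_lines):
    """Collect sequence names from column 1 of a GTF, skipping comments."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(sam_lines, gtf_lines):
    """Report which names are shared and which occur in only one file."""
    sam = sam_reference_names(sam_lines)
    gtf = gtf_sequence_names(gtf_lines)
    return {"shared": sam & gtf, "sam_only": sam - gtf, "gtf_only": gtf - sam}


# Toy data (hypothetical): a UCSC-style SAM header and an Ensembl-style GTF.
example_sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:1000\n",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
example_gtf = [
    "#!toy annotation\n",
    '1\ttoy\texon\t50\t200\t.\t+\t.\tgene_id "G1"; gene_name "geneA";\n',
    '2\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "geneB";\n',
]

report = compare_names(example_sam, example_gtf)
for key in ("shared", "sam_only", "gtf_only"):
    print(key, sorted(report[key]))
```

With real files you would pass open file handles instead of the toy lists. An empty `shared` set means no read can overlap any annotated feature, so every aligned read ends up as "no_feature".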

**kmcarr** · 01-15-2014, 04:09 AM

Originally posted by zzhao2:

Command:

Code:

python -m HTSeq.scripts.count -o out.sam -i gene_name sorted.sam Ensembl_GRCh37.74.gtf > counts.txt

You did not specify a --stranded option for htseq-count, which means it is using the default setting, --stranded=yes. There are three possible settings for --stranded (yes, no, reverse). This has to be set according to the protocol used to generate your RNA-Seq library. In fact, --stranded=yes is probably the least likely of the three to be correct. What kit/protocol was used to construct the library in this case?
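
To compare the three settings side by side, the original command can be rebuilt once per `--stranded` value. This is just a sketch: the helper name and the `counts_<setting>.txt` output names are made up for illustration, the input file names are taken from the command quoted above, and nothing is executed here, the commands are only printed.

```python
STRANDED_SETTINGS = ("yes", "no", "reverse")


def htseq_count_command(sam_path, gtf_path, stranded, id_attr="gene_name"):
    """Build the argument list for one htseq-count run (hypothetical helper)."""
    if stranded not in STRANDED_SETTINGS:
        raise ValueError("stranded must be one of: " + ", ".join(STRANDED_SETTINGS))
    return [
        "python", "-m", "HTSeq.scripts.count",
        "--stranded=" + stranded,
        "-i", id_attr,
        sam_path, gtf_path,
    ]


# One command per setting, using the file names from the quoted command.
commands = {
    setting: htseq_count_command("sorted.sam", "Ensembl_GRCh37.74.gtf", setting)
    for setting in STRANDED_SETTINGS
}

for setting in STRANDED_SETTINGS:
    # The redirect target is an illustrative name, one counts file per setting.
    print(" ".join(commands[setting]), ">", "counts_%s.txt" % setting)
```

If one setting matches the library protocol, its run would be expected to show clearly fewer "no_feature" fragments than the opposite-strand setting.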

**zzhao2** · 01-16-2014, 08:57 AM

Hi Dario,
Thanks for your reply. Yes, I noticed the chromosome name issue and changed them to chr1, chr2, etc. Without that change, every gene had a read count of zero and all reads were reported as "no_feature".

I didn't use a reference fasta file. Should I use it, and where?

If I have 11732479 "no_feature" fragments, then it's even worse, because the total number of my fragments is 14098588, so only about 17% of them are mapped to exons.
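
The 17% figure can be checked from the two numbers quoted in this post. A small sketch; note that the remainder is an upper bound on fragments assigned to genes, since it still includes the other special counters such as "ambiguous".

```python
total_fragments = 14098588  # total fragments, as stated in the post
no_feature = 11732479       # "no_feature" fragments reported by htseq-count

# Everything that is not "no_feature": an upper bound on assigned fragments.
not_no_feature = total_fragments - no_feature
percent = 100.0 * not_no_feature / total_fragments

print("fragments not in no_feature:", not_no_feature)
print("share of all fragments: %.1f%%" % percent)
```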

Originally posted by dariober:

Hi- Do the chromosome names in the GTF file match those in the SAM file? Ensembl files have chromosome names like 1, 2, 3 etc. while UCSC uses chr1, chr2, chr3 etc. Also, maybe check you are using the same version for both the reference FASTA and the GTF (e.g. *both* are hg19).

I guess 11732479 refers to the number of fragments, so a single-end read counts as 1 and the two ends of a paired-end read together count as 1 as well.

Good luck
Dario

**zzhao2** · 01-16-2014, 09:05 AM

Hi kmcarr,
Thanks for pointing this out. I didn't specify the stranded option because my reads are stranded. I've actually tried all three settings (yes, no, and reverse), but none of them gave a number of "no_feature" reads that is significantly lower than the others.

I don't actually know the kit used, because what I have is only the data from our sequencing facility, but their report says that my reads are stranded.

Originally posted by kmcarr:

You did not specify a --stranded option for htseq-count, which means it is using the default setting, --stranded=yes. There are three possible settings for --stranded (yes, no, reverse). This has to be set according to the protocol used to generate your RNA-Seq library. In fact, --stranded=yes is probably the least likely of the three to be correct. What kit/protocol was used to construct the library in this case?
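
To put numbers on "not significantly lower", the counts files from the three runs can be summarized by their special counters. This is a rough sketch on made-up toy data: the function name is hypothetical, and it assumes tab-separated `name<TAB>count` lines in which the special counters appear either bare (`no_feature`) or with a leading `__` (`__no_feature`), depending on the HTSeq version.

```python
SPECIAL = {
    "no_feature", "ambiguous", "too_low_aQual",
    "not_aligned", "alignment_not_unique",
}


def summarize_counts(lines):
    """Return (assigned, special, total) for one htseq-count output."""
    assigned = 0
    special = {}
    for line in lines:
        if not line.strip():
            continue
        name, value = line.rstrip("\n").split("\t")
        # Accept both the bare and the "__"-prefixed counter names.
        key = name[2:] if name.startswith("__") else name
        if key in SPECIAL:
            special[key] = int(value)
        else:
            assigned += int(value)
    total = assigned + sum(special.values())
    return assigned, special, total


# Toy outputs (hypothetical numbers), one per --stranded setting.
runs = {
    "yes": ["geneA\t10\n", "geneB\t5\n", "no_feature\t85\n", "ambiguous\t0\n"],
    "no": ["geneA\t11\n", "geneB\t6\n", "no_feature\t82\n", "ambiguous\t1\n"],
    "reverse": ["geneA\t1\n", "geneB\t0\n", "__no_feature\t99\n", "__ambiguous\t0\n"],
}

no_feature_fraction = {}
for setting, lines in runs.items():
    assigned, special, total = summarize_counts(lines)
    no_feature_fraction[setting] = special.get("no_feature", 0) / total
    print("%-8s assigned=%d no_feature=%.2f" % (
        setting, assigned, no_feature_fraction[setting]))
```

If the "no_feature" fraction stays high under all three settings, as described here, strandedness alone is unlikely to be the cause, and the chromosome naming or annotation version is worth rechecking.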

**bob-loblaw** · 01-16-2014, 09:23 AM

I wonder though, in everyone's experience, what's an "acceptable" percentage of "no_feature" reads? Surely no organism's annotation is perfect? Plus, how often is the RNA that you get 100% mRNA?

In this Nature paper they found that about 86% mapped to known exons: http://www.nature.com/nature/journal...ture08872.html

htseq-count: many reads are "no_feature".