Hi,
I just started to use htseq-count for my paired-end RNA-Seq data. I've name-sorted uniquely mapped reads outputted by tophat, converted the sorted bam file to sam, and used the sam file with an ensembl gtf file (version 74 for human) as the input to htseq-count. Here's the command and output.
Command:
python -m HTSeq.scripts.count -o out.sam -i gene_name sorted.sam Ensembl_GRCh37.74.gtf > counts.txt
Output:
...
14098588 sam line pairs processed.
no_feature 11732479
ambiguous 52152
too_low_aQual 0
not_aligned 0
alignment_not_unique 0
There are 21150698 reads (some paired, some not) in my sorted sam file. I understand that paired reads are counted as one, and that's why there are only 14098588 processed by htseq-count. Here are my questions:
1) Why are there so many "no_feature" reads? Is this normal for a typical RNA-Seq experiment? Is 11732479 the actual number of reads (paired or not) or the number of single reads + pairs counted as one?
2) I also checked the output.sam but there are only 11585093 reads there. It looks like htseq-count didn't record all the reads that it processed. So how does htseq-count choose which reads to record?
Thank you,
Sylvia
I just started to use htseq-count for my paired-end RNA-Seq data. I've name-sorted uniquely mapped reads outputted by tophat, converted the sorted bam file to sam, and used the sam file with an ensembl gtf file (version 74 for human) as the input to htseq-count. Here's the command and output.
Command:
python -m HTSeq.scripts.count -o out.sam -i gene_name sorted.sam Ensembl_GRCh37.74.gtf > counts.txt
Output:
...
14098588 sam line pairs processed.
no_feature 11732479
ambiguous 52152
too_low_aQual 0
not_aligned 0
alignment_not_unique 0
There are 21150698 reads (some paired, some not) in my sorted sam file. I understand that paired reads are counted as one, and that's why there are only 14098588 processed by htseq-count. Here are my questions:
1) Why are there so many "no_feature" reads? Is this normal for a typical RNA-Seq experiment? Is 11732479 the actual number of reads (paired or not) or the number of single reads + pairs counted as one?
2) I also checked the output.sam but there are only 11585093 reads there. It looks like htseq-count didn't record all the reads that it processed. So how does htseq-count choose which reads to record?
Thank you,
Sylvia
Comment