Here is how my data has been processed: 150bp PE RNAseq, trimmed, tophat2 mapped on both transcriptome and genome using default setting apart from insert-size and std (calculated from sampling), then HTSeq-count using:
htseq-count -q -a 5 --mode=union --stranded=no --type=exon --
idattr=gene_id\
$indir/$sample/accepted_nsort.sam \
$ref/Homo_sapiens.GRCh37.72.gtf > $outdir/$sample.htseq.counts
What worries me is there is only around 28% of the total unique mapped from tophat2 seem to have been properly counted by htseq-count. Could any one divulge what going on here?
Summary of unique mapped reads from tophat2 using samtools flagstat is:
60657139 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
60657139 + 0 mapped (100.00%:-nan%)
60657139 + 0 paired in sequencing
31424549 + 0 read1
29232590 + 0 read2
53208062 + 0 properly paired (87.72%:-nan%)
55095536 + 0 with itself and mate mapped
5561603 + 0 singletons (9.17%:-nan%)
365348 + 0 with mate mapped to a different chr
71148 + 0 with mate mapped to a different chr (mapQ>=5)
Bottom of htseq-count output:
no_feature 11321706
ambiguous 1434804
too_low_aQual 0
not_aligned 0
alignment_not_unique 13865867
The sum of all reads counts by gen_id by htseq-count using awk is 17,068,532.
htseq-count -q -a 5 --mode=union --stranded=no --type=exon --
idattr=gene_id\
$indir/$sample/accepted_nsort.sam \
$ref/Homo_sapiens.GRCh37.72.gtf > $outdir/$sample.htseq.counts
What worries me is there is only around 28% of the total unique mapped from tophat2 seem to have been properly counted by htseq-count. Could any one divulge what going on here?
Summary of unique mapped reads from tophat2 using samtools flagstat is:
60657139 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
60657139 + 0 mapped (100.00%:-nan%)
60657139 + 0 paired in sequencing
31424549 + 0 read1
29232590 + 0 read2
53208062 + 0 properly paired (87.72%:-nan%)
55095536 + 0 with itself and mate mapped
5561603 + 0 singletons (9.17%:-nan%)
365348 + 0 with mate mapped to a different chr
71148 + 0 with mate mapped to a different chr (mapQ>=5)
Bottom of htseq-count output:
no_feature 11321706
ambiguous 1434804
too_low_aQual 0
not_aligned 0
alignment_not_unique 13865867
The sum of all reads counts by gen_id by htseq-count using awk is 17,068,532.
Comment