View Single Post
Old 10-28-2014, 03:22 AM   #1
Location: London

Join Date: Jul 2010
Posts: 16
Default samtools sorting issue or HTSeq-count problem?

Hello all,

This has puzzled me for days and I couldnt find any explanation on the internet.
I noticed HTSeq-count gives the reads counts of a gene as 0. But when viewed in IGV with the accepted_hits.bam from tophat2 alignment, there are hundreds to thousands reads aligned in that gene region (from different samples).

This is how I used HTSeq-count from HTSeq/0.6.1p1 and samtools/1.1
samtools sort -T tmp -o accepted_hits_nsort.bam -n accepted_hits.bam
htseq-count --format=bam --strand=no --order=name -a 5 --mode=union accepted_hits_nsort.bam Homo_sapiens.GRCh37.72.gtf > htseq_count_0.6.txt

However, when I break down the accepted_hits.bam and extract the chromosome where the gene is, HTSeq-count gives the right reads counts for this BAM file.

My accepted_hits.bam contains 60,845,216 alginemnt using samtools view -c accepted_hits.bam. I dont think for PE RNA-seq, file at size of 4.5G would be a problem for samtools or HTSeq-count to deal with?

My suspicions are:
1. samtools name sort problem
2. htseq-count bug
3. out of memory

Could any one give any hint here? Any suggestion would be appreciated. Thanks.
bbl is offline   Reply With Quote