12-24-2016, 07:00 AM   #1
ronaldrcutler
Member
 
Location: Virginia

Join Date: May 2016
Posts: 80
HTSeq-count: Warning: x reads with missing mate encountered.

While running an RNA-Seq pipeline that uses Hisat2 for alignment and HTSeq-count for counting reads in features, I noticed this warning at the bottom of the log file:
Code:
Warning: 284233 reads with missing mate encountered.
Here are the "samtools flagstat" stats for the BAM file that produced the HTSeq-count warning:
Code:
76075665 + 0 in total (QC-passed reads + QC-failed reads)
1565341 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
71435955 + 0 mapped (93.90% : N/A)
74510324 + 0 paired in sequencing
37255162 + 0 read1
37255162 + 0 read2
64430312 + 0 properly paired (86.47% : N/A)
67187398 + 0 with itself and mate mapped
2683216 + 0 singletons (3.60% : N/A)
2452092 + 0 with mate mapped to a different chr
2095660 + 0 with mate mapped to a different chr (mapQ>=5)
In my previous RNA-Seq pipeline on the same data, where the only difference was Tophat2 for alignment, I did not see this warning in the HTSeq-count log files.

Here are the "samtools flagstat" stats for the Tophat2-aligned BAM file from the same sample as above:
Code:
85046681 + 0 in total (QC-passed reads + QC-failed reads)
18181171 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
85046681 + 0 mapped (100.00% : N/A)
66865510 + 0 paired in sequencing
34237825 + 0 read1
32627685 + 0 read2
16861294 + 0 properly paired (25.22% : N/A)
61704000 + 0 with itself and mate mapped
5161510 + 0 singletons (7.72% : N/A)
4055974 + 0 with mate mapped to a different chr
1899988 + 0 with mate mapped to a different chr (mapQ>=5)
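As a sanity check on the two reports, here is a rough sketch that parses the flagstat text and derives the number of primary records and how many of them are unmapped. The function names (`parse_flagstat`, `summarize`) are my own, and it assumes flagstat includes secondary records in both the "in total" and "mapped" lines, which the figures above are consistent with.

```python
import re

HISAT2_FLAGSTAT = """
76075665 + 0 in total (QC-passed reads + QC-failed reads)
1565341 + 0 secondary
0 + 0 supplementary
71435955 + 0 mapped (93.90% : N/A)
74510324 + 0 paired in sequencing
37255162 + 0 read1
37255162 + 0 read2
"""

TOPHAT2_FLAGSTAT = """
85046681 + 0 in total (QC-passed reads + QC-failed reads)
18181171 + 0 secondary
0 + 0 supplementary
85046681 + 0 mapped (100.00% : N/A)
66865510 + 0 paired in sequencing
34237825 + 0 read1
32627685 + 0 read2
"""


def parse_flagstat(text):
    """Map each flagstat label to its QC-passed count (hypothetical helper)."""
    counts = {}
    for line in text.strip().splitlines():
        m = re.match(r"\s*(\d+) \+ \d+ (.+)", line)
        if m:
            # Drop trailing annotations such as "(93.90% : N/A)" or
            # "(QC-passed reads + QC-failed reads)", but keep "(mapQ>=5)".
            label = re.sub(r"\s*\((?:QC-passed|[\d.]+%)[^)]*\)\s*$", "", m.group(2))
            counts[label] = int(m.group(1))
    return counts


def summarize(counts):
    """Derive primary-record figures, assuming secondary records are mapped."""
    extra = counts["secondary"] + counts["supplementary"]
    primary = counts["in total"] - extra
    mapped_primary = counts["mapped"] - extra
    return {
        "primary": primary,
        "unmapped": primary - mapped_primary,
        "pct_mapped": round(100 * counts["mapped"] / counts["in total"], 2),
        "read1_minus_read2": counts["read1"] - counts["read2"],
    }


if __name__ == "__main__":
    for name, text in [("Hisat2", HISAT2_FLAGSTAT), ("Tophat2", TOPHAT2_FLAGSTAT)]:
        print(name, summarize(parse_flagstat(text)))
```

On these numbers the Hisat2 file has 74510324 primary records, 4639710 of them unmapped, and equal read1/read2 counts. The Tophat2 file has 66865510 primary records, none unmapped, and 1610140 more read1 than read2 records. That would be consistent with unmapped reads simply not being present in the Tophat2 file, rather than every read having mapped.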

I know this HTSeq-count warning is characteristic of unsorted BAM files, as I have run into that problem in the past. However, I still get the warning with name-sorted files, and I made sure HTSeq-count was told to expect name-sorted input. I can see that the Hisat2 alignment does not have 100% mapping, which may explain the warning. Why are the two different? Both aligners were run with default settings.

Moreover, I am wondering how and why this warning occurs, since as far as I know HTSeq-count needs either paired-end or single-end alignments and cannot deal with both at the same time. Mixing the two is what normally produces this error message:
Code:
'pair_alignments' needs a sequence of paired-end alignments
Yet I see singletons in both the Tophat2 and Hisat2 stats.
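To check independently how many reads really lack a mate record in a name-sorted file, something like the following could be run over SAM text. This is only a minimal sketch of one way to count them, not HTSeq-count's actual pairing logic (which I have not inspected here). The function names are my own, and it looks only at primary, paired records and the read1/read2 flag bits.

```python
from itertools import groupby

# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def primary_paired(sam_lines):
    """Yield (qname, flag) for primary records of paired reads."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & PAIRED:
            continue
        yield qname, flag


def count_missing_mates(sam_lines):
    """Count read names with only one mate present (input must be name-sorted)."""
    missing = 0
    for _, group in groupby(primary_paired(sam_lines), key=lambda rec: rec[0]):
        flags = [flag for _, flag in group]
        has_read1 = any(flag & READ1 for flag in flags)
        has_read2 = any(flag & READ2 for flag in flags)
        if has_read1 != has_read2:
            missing += 1
    return missing


if __name__ == "__main__":
    import sys

    print(count_missing_mates(sys.stdin))
```

The input would be the text form of the name-sorted BAM, for example the output of "samtools view" piped into the script. If the count is near zero for the Hisat2 file, the warning is more likely coming from how secondary alignments or unmapped mates are paired up than from mates that are truly absent.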

TL;DR: Why isn't there 100% mapping in the Hisat2 output alignments when there is 100% mapping in the Tophat2 output alignments, using default settings for each?

Last edited by ronaldrcutler; 12-24-2016 at 07:04 AM.