SEQanswers

04-12-2016, 11:52 AM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 144
hisat2 output more reads than are in the file

Hi all,
I am testing the hisat2 mapper and have found a discrepancy between the summary hisat2 prints at the end of the mapping step and the read counts reported by samtools flagstat.

This is the output I get when hisat2 has finished:
Code:
cat ../hisat2Mapping/WCE7.stat 
11389273 reads; of these:
  11389273 (100.00%) were paired; of these:
    5961828 (52.35%) aligned concordantly 0 times
    4647893 (40.81%) aligned concordantly exactly 1 time
    779552 (6.84%) aligned concordantly >1 times
    ----
    5961828 pairs aligned concordantly 0 times; of these:
      138165 (2.32%) aligned discordantly 1 time
    ----
    5823663 pairs aligned 0 times concordantly or discordantly; of these:
      11647326 mates make up the pairs; of these:
        10779117 (92.55%) aligned 0 times
        510907 (4.39%) aligned exactly 1 time
        357302 (3.07%) aligned >1 times
52.68% overall alignment rate
Question: how is the 52.68% calculated? Which reads are counted as mapped here?

And this is what I get when I run samtools flagstat on the sorted and indexed BAM file:
Code:
samtools flagstat ../hisat2Mapping/WCE7.sorted.bam
14328481 + 0 in total (QC-passed reads + QC-failed reads)
2329052 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
14328481 + 0 mapped (100.00%:-nan%)
11999429 + 0 paired in sequencing
6098403 + 0 read1
5901026 + 0 read2
10854890 + 0 properly paired (90.46%:-nan%)
11407498 + 0 with itself and mate mapped
591931 + 0 singletons (4.93%:-nan%)
23248 + 0 with mate mapped to a different chr
18678 + 0 with mate mapped to a different chr (mapQ>=5)
As you can see, samtools finds more reads than there are supposed to be in the original file.

Is there a simple explanation for that?

thanks,
Assa
04-12-2016, 12:01 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476

52.68% = (2*(4647893+779552+138165) + 510907 + 357302)/(2*11389273)

Samtools is correct: if you sum the numerator you get just under 12 million aligned reads (11999429), which is exactly what's in the BAM file once the secondary alignments are subtracted from the total number of records (14328481 - 2329052 = 11999429).
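
For anyone who wants to check the arithmetic, here is a minimal Python sketch that redoes the calculation with the numbers copied from the two outputs in the first post. The variable and function names are made up for illustration and are not part of hisat2 or samtools; the script only reproduces the sums above and does not read the BAM file.

```python
# Counts copied from the hisat2 summary in post #1.
total_pairs = 11389273       # read pairs given to hisat2
concordant_once = 4647893    # pairs aligned concordantly exactly 1 time
concordant_multi = 779552    # pairs aligned concordantly >1 times
discordant_once = 138165     # pairs aligned discordantly 1 time
mate_once = 510907           # lone mates aligned exactly 1 time
mate_multi = 357302          # lone mates aligned >1 times

# Counts copied from the samtools flagstat output in post #1.
flagstat_total = 14328481      # all alignment records in the BAM file
flagstat_secondary = 2329052   # records flagged as secondary


def aligned_reads():
    """Reads with at least one alignment: two per aligned pair, one per lone mate."""
    aligned_pairs = concordant_once + concordant_multi + discordant_once
    return 2 * aligned_pairs + mate_once + mate_multi


def overall_rate():
    """Overall alignment rate in percent, relative to all reads (2 per pair)."""
    return 100.0 * aligned_reads() / (2 * total_pairs)


def primary_records():
    """Alignment records left after removing secondary alignments."""
    return flagstat_total - flagstat_secondary


if __name__ == "__main__":
    print("aligned reads (hisat2 summary):", aligned_reads())
    print("overall alignment rate: {:.2f}%".format(overall_rate()))
    print("non-secondary records (flagstat):", primary_records())
```

The first and last printed numbers agree, so the extra records in the flagstat total are the secondary alignments of multi-mapping reads, not extra reads.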

Tags
flagstat, hisat2, mapped reads, samtools

All times are GMT -8.

