SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Count number of genes matched for each primer pair - Bowtie2 - samtools danova Bioinformatics 0 07-20-2015 06:42 AM
Samtools flagstat - low % reads mapping nr23 Bioinformatics 5 11-01-2012 08:04 AM
Samtools flagstat Anelda Bioinformatics 0 09-26-2011 03:55 AM
SAMTools showing incorrect depth for a given position rdeborja Bioinformatics 1 05-07-2011 03:25 PM
samtools flagstat bair Bioinformatics 3 05-28-2010 06:15 AM

Reply
 
Thread Tools
Old 02-15-2016, 02:37 PM   #1
jmartin
Member
 
Location: St. Louis

Join Date: Dec 2009
Posts: 74
Default samtools flagstat reportin incorrect input count for Bowtie2 mapping

I am using bowtie2 (default settings) to map ~1.95 million reads against a small reference. The input queries I fed to bowtie2 were all fastq. There were 4 total files:

2 paired fastq (1523516 total reads ... 761758 pairs)
2 'orphan' files (not paired ... file1: 232282 reads, file2: 193747 reads)

In total my query files contain 1,949,545 reads. I ran all this in a single bowtie2 command using -1, -2 & -U with the 2 orphan files being comma-delimited list for -U.

Bowtie2 finished without complaint, and it reports these mapping stats:
+++++++++++++++++++++
1187787 reads; of these:
761758 (64.13%) were paired; of these:
471690 (61.92%) aligned concordantly 0 times
290068 (38.08%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
471690 pairs aligned concordantly 0 times; of these:
1089 (0.23%) aligned discordantly 1 time
----
470601 pairs aligned 0 times concordantly or discordantly; of these:
941202 mates make up the pairs; of these:
746446 (79.31%) aligned 0 times
194756 (20.69%) aligned exactly 1 time
0 (0.00%) aligned >1 times
426029 (35.87%) were unpaired; of these:
290590 (68.21%) aligned 0 times
135439 (31.79%) aligned exactly 1 time
0 (0.00%) aligned >1 times
46.81% overall alignment rate
+++++++++++++++++++++

Those numbers are wrong. My input had exactly 1,949,545 reads. Bowtie2 is reporting 1187787 total reads.

Then I took the output sam file and ran samtools flagstats, getting these numbers:
1252499 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
215463 + 0 mapped (17.20%:-nan%)
961790 + 0 paired in sequencing
519797 + 0 read1
441993 + 0 read2
212926 + 0 properly paired (22.14%:-nan%)
215046 + 0 with itself and mate mapped
298 + 0 singletons (0.03%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

this is also wrong. Then I tried running samstats, and when I add up all the MAPQ category numbers (including the unmapped), it sums up to:

1252499 reads


So, then I tried a simple samtools view command to try and count the total # reads reported:

samtools view <bam> | cut -f1 | sort | uniq | wc -l

and I get:

868833 reads



None of these methods agree with any other method. And none come close to my actual input read number. My understanding is that bowtie2 should (by default) only be reporting 1 alignment per query, so the total numbers reported shouldn't be confused by issues of supplemental & non-primary alignments.

Can anyone tell me what is going on? I see no evidence of truncation or failure in any of my outputs. And even if there had been, I would have expected these methods to report the same number of total reads. Help?
jmartin is offline   Reply With Quote
Old 02-16-2016, 06:05 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

This is bowtie2's behavior. But to check:


1) What version of bowtie2 are you using? Is it the most recent?
1a) Ditto with samtools.

2) Please copy and paste the *exact* command line you are using.

3) My version 2.2.7 shows output like the following (for 50 pairs, 25 singletons, and another 15 singletons. Note that the total starting number is exactly like yours: #_of_pairs plus #_of_singletons.

Your numbers are 761758 + 232282 + 193747 = 1187787

Mine are: 50 + 25 + 15 = 85

Code:
bowtie2 -x coli.fasta -1 R1.fastq -2 R2.fastq -U single.fastq,single2.fastq  -S r.sam


85 reads; of these:
  50 (58.82%) were paired; of these:
    39 (78.00%) aligned concordantly 0 times
    10 (20.00%) aligned concordantly exactly 1 time
    1 (2.00%) aligned concordantly >1 times
    ----
    39 pairs aligned concordantly 0 times; of these:
      3 (7.69%) aligned discordantly 1 time
    ----
    36 pairs aligned 0 times concordantly or discordantly; of these:
      72 mates make up the pairs; of these:
        71 (98.61%) aligned 0 times
        1 (1.39%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
  35 (41.18%) were unpaired; of these:
    18 (51.43%) aligned 0 times
    17 (48.57%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
34.07% overall alignment rate
4) As for the samtools issue this does seem to be a bit odd but I'd like to get a version check first before diving deeper into the problem.
westerman is offline   Reply With Quote
Old 02-16-2016, 07:22 AM   #3
Roy
Member
 
Location: Sheffield, UK

Join Date: Oct 2009
Posts: 17
Default

I think by 1187787 reads, Bowtie2 means 1187787 "read pairs and singlets", i.e. the paired reads are only counted once. Note that 1187787+761758=1949545

The rest of the figures do suggest a truncated file (or possibly a file mix-up). Have a look at the last line to check it is complete. Samtools flagstat should give the correct read count (1949545), and your samtools view command should give the same count as bowtie2 (assuming paired reads have the same id), so 1187787.
Roy is offline   Reply With Quote
Old 02-16-2016, 12:32 PM   #4
jmartin
Member
 
Location: St. Louis

Join Date: Dec 2009
Posts: 74
Default

Thanks for the replies & testing done on your end:

@westerman:
1) I am using bowtie2 v2.1.0. So it is an old version. I don't have permissions that allow me to do installations, but I'll see if I can somehow get the latest version and try that.

1a) samtools is also old. v0.1.18. I'll look into getting a new version of that as well

2) The exact command line (minus local paths) was this:

bowtie22.1.0 -x <db> -q -1 <fastq_PE_end1> -2 <fastq_PE_end2 -U <fastq_fragmentreads_end1>,<fastq_fragmentreads_end2> -S <output_name>.sam --rg-id <RGID> --rg ID:<RGID> --rg PL:illumina --rg PU:<UNIT> --rg LB:<LIB> --rg SM:<SAMPLE>

3) So bowtie2 output is reporting mapping % for PAIRS for the paired end input, and as FRAGMENTS for the fragment input. In hindsight I guess that makes sense. In the old version at least the reported numbers are not very clear. But I do see buried inside the report numbers that add up to my exact input.

4) The other methods for looking at these numbers still are confusing to me too, but I do have an old version of samtools. If I can get an updated version maybe at least that would agree w/ my input


@Roy:
Now that you & westerman say that I do see it. The organization & format are confusing to me, but in fairness I am using an old version of bowtie2.

I've been scrutinizing my files for truncation, and I don't see anything. But that was my first thought too. I am re-mapping the data and will confirm that I get the same results.

I still don't have any idea why the other counting methods are also not working to me, nor seem to agree with my input # or the bowtie2 reported numbers. Maybe running fragments and pairs in the same alignment is beyond what those other methods can handle?
jmartin is offline   Reply With Quote
Old 02-17-2016, 02:55 PM   #5
jmartin
Member
 
Location: St. Louis

Join Date: Dec 2009
Posts: 74
Default

I wanted to share my findings after reviewing & doing some small test runs using bowtie2's output for getting mapping %.

First let me say that bowtie2 is awesome, now that I see what its doing, I like it! The other methods I tried seem confused by a mapping that includes both fragment reads and read pairs in the same starting query pool. But bowtie2's alignment report does a good job of sorting things out once I knew what I was looking at.

Here is a typical report (using bowtie v2.2.5 ... not quite the newest, but I think the report format is the same):
++++++++++++++++++++++++++++
1187787 reads; of these:
761758 (64.13%) were paired; of these:
471690 (61.92%) aligned concordantly 0 times
290068 (38.08%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
471690 pairs aligned concordantly 0 times; of these:
1089 (0.23%) aligned discordantly 1 time
----
470601 pairs aligned 0 times concordantly or discordantly; of these:
941202 mates make up the pairs; of these:
746446 (79.31%) aligned 0 times
194756 (20.69%) aligned exactly 1 time
0 (0.00%) aligned >1 times
426029 (35.87%) were unpaired; of these:
290590 (68.21%) aligned 0 times
135439 (31.79%) aligned exactly 1 time
0 (0.00%) aligned >1 times
46.81% overall alignment rate
++++++++++++++++++++++++++++

By counting my input fastq query, I have exactly 1,949,545 reads being mapped. The report very diligently breaks down the component PE and FRAGMENT reads, and reports on them separately in these sections:

PE:
761758 (64.13%) were paired; of these:
471690 (61.92%) aligned concordantly 0 times
290068 (38.08%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times

FRAGMENT:
426029 (35.87%) were unpaired; of these:
290590 (68.21%) aligned 0 times
135439 (31.79%) aligned exactly 1 time
0 (0.00%) aligned >1 times

The number in the 'PE' block are PAIRS, while the numbers in the 'FRAGMENT' block are single reads. So 761758x2 + 426029 = 1,949,545 reads, which is exactly my input.

What had confused me is that Bowtie2's report then breaks down and reports all everything that does NOT align as PAIRS in series under the 'PE' block:

----
471690 pairs aligned concordantly 0 times; of these:
1089 (0.23%) aligned discordantly 1 time
----
470601 pairs aligned 0 times concordantly or discordantly; of these:
941202 mates make up the pairs; of these:
746446 (79.31%) aligned 0 times
194756 (20.69%) aligned exactly 1 time
0 (0.00%) aligned >1 times

This section is basically reviewing all the PAIRS that didn't map concordantly. And its saying that of those pairs, it found 1089 PAIRS that mapped discordantly (which I believe could mean having too much space between them, or possibly in the wrong orientation). Then of the PAIRS that still don't map, it looks at them as single reads ('mates') and it reports on them mapping as single reads. The 470601 PAIRS that don't align (either concordantly or disconcordantly) make up 941202 READS (mates). And of those guys, 194756 reads map by themselves.

So in total, we have this count of mapped reads:

471690 PAIRS mapped concordantly
1089 PAIRS mapped disconcordantly
194756 READS (that had been memberes of a PAIR) map by themselves
135439 READS (my FRAGMENT input) map

290068x2 + 1089x2 + 194756 + 135439 = 912509 reads mapping

So 912509 (mapped reads) / 1,949,545 (input reads) = 0.468062548

Which is exactly what the bowtie2 report lists as the "overall alignment rate" listed at the bottom of the report (46.81%).

Thanks again for the help!
jmartin is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:40 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO