  • wariobrega
    Member
    • Jul 2012
    • 11

    Bowtie 2 explanation of the paired-end mode report

    Hello everybody. I'm a Master's student in Molecular Biology and Bioinformatics and pretty new to NGS technologies. I am using Bowtie 2 in paired-end mode, after using Bowtie 1 in single-end mode, on some nanoCAGE RNA-seq data. I am doing this because Bowtie 1 in paired-end mode has fewer customizable options than Bowtie 2, and with it I could not choose the size of the alignment window for the paired-end mates, so I got a very low alignment rate against my reference genome (between 12% and 14%, versus 90% and over in single-end mode).

    I cannot understand some parts of the Bowtie 2 alignment report. Here's an example:
    16182999 reads; of these:
    16182999 (100.00%) were paired; of these:
    5731231 (35.42%) aligned concordantly 0 times
    4522376 (27.95%) aligned concordantly exactly 1 time
    5929392 (36.64%) aligned concordantly >1 times
    ----
    5731231 pairs aligned concordantly 0 times; of these:
    2381431 (41.55%) aligned discordantly 1 time
    ----
    3349800 pairs aligned 0 times concordantly or discordantly; of these:
    6699600 mates make up the pairs; of these:
    3814736 (56.94%) aligned 0 times
    1883429 (28.11%) aligned exactly 1 time
    1001435 (14.95%) aligned >1 times
    88.21% overall alignment rate

    I ran Bowtie 2 in paired-end mode using the
    Code:
    --fr
    option for the orientation of my reads.

    According to the bolded part of the report, 6699600 mates come from the 3349800 pairs that align neither concordantly nor discordantly (so, as I understand it, they are not aligned to the genome). Yet a fraction of these mates still aligns. How is that possible? They shouldn't align at all!

    Also, I do not understand the overall alignment rate. Is there somewhere I can look up how it is calculated? Where does the 88.21% come from?

    Thanks for any answer you can give me!

    Daniele
  • amitm
    Member
    • Feb 2011
    • 52

    #2
    Hi Daniele,
    the Bowtie 2 result summary is divided into 3 sections:
    • Concordant alignment - In your data, (4522376 + 5929392) = 10451768 pairs align concordantly, which is 64.58% of the pairs.
    • Discordant alignment - That leaves 5731231 pairs, which is 35.42% (100 - 64.58). Of these, 2381431 pairs align discordantly. That is to say, 41.55% of the non-concordant fraction aligns discordantly.
    • The rest - Remember that both concordant and discordant alignments are alignments in paired-end mode. The remaining reads either align as singles (i.e. Read1 at one locus and Read2 at a completely different locus, or one mate aligned and the other unaligned) or do not align at all. The number of pairs in this section is Total - (Concord. + Discord.), that is 16182999 - (10451768 + 2381431) = 3349800 pairs.
      Since any alignment here is in single fashion, the counts are given in mates (pairs x 2).


    Now, to get the overall alignment rate, count the aligned mates in total (i.e. mates aligned as pairs plus mates aligned in single fashion). That would be:
    (10451768 x 2) + (2381431 x 2) + 1883429 + 1001435 = 28551262 mates
    That is 28551262 mates aligned out of a total of (16182999 x 2) = 32365998 mates, which is 88.21%.
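
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation described in this reply, not Bowtie 2's own code; the function and parameter names are made up for illustration.

```python
def overall_alignment_rate(pairs, concordant_pairs, discordant_pairs,
                           single_mates_aligned):
    """Percentage of mates that aligned at least once in a paired-end run."""
    # Every concordant or discordant pair contributes two aligned mates.
    aligned_mates = 2 * (concordant_pairs + discordant_pairs)
    # Mates from the remaining pairs that aligned on their own.
    aligned_mates += single_mates_aligned
    total_mates = 2 * pairs
    return 100.0 * aligned_mates / total_mates


rate = overall_alignment_rate(
    pairs=16182999,
    concordant_pairs=4522376 + 5929392,
    discordant_pairs=2381431,
    single_mates_aligned=1883429 + 1001435,
)
print(f"{rate:.2f}% overall alignment rate")  # 88.21% overall alignment rate
```

The denominator is mates rather than pairs, which is why a pair with only one aligned mate still adds to the overall rate.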


    • wariobrega
      Member
      • Jul 2012
      • 11

      #3
      Thanks a lot for the explanation. So Bowtie 2 DOES align, as single ends, the mates of pairs that do not align concordantly or discordantly in PE mode. But why is this further alignment considered useful if the reads do not meet the PE criteria? Won't they probably align to other, non-significant regions and maybe distort the overall rate?

      Thanks a lot, also for the explanation of the overall rate.


      • vittoria_1198
        Junior Member
        • Jan 2013
        • 2

        #4
        Hi,
        I am a very new user of Bowtie. I used Bowtie 2.0.0 to run a mapping step with single reads against a transcriptome reference.
        At the end of the process, the alignment summary looked like this:

        20000 reads; of these:
        20000 (100.00%) were unpaired; of these:
        1247 (6.24%) aligned 0 times
        18739 (93.69%) aligned exactly 1 time
        14 (0.07%) aligned >1 times
        93.77% overall alignment rate

        I am having a hard time understanding the difference between the reads that align 1 time and the ones that align more than 1 time. What do the latter mean? Should I consider only those that align 1 time in my stats report?
        Moreover, I tried to look at the SAM file that was generated, and I could not understand what is in it.
        These are two lines of the SAM file:
        @HD VN:1.0 SO:unsorted
        @SQ SN:comp5504567_c0_seq1 LN:554

        What is the meaning of each symbol?
        Thanks so much for the help

        Vittoria


        • winsettz
          Member
          • Sep 2012
          • 91

          #5
          It suggests that some reads can theoretically align to more than one part of your reference.

          For the SAM specification:


          Tag   Description
          @HD   The header line. The first line if present.
          VN*   Format version. Accepted format: /^[0-9]+\.[0-9]+$/.
          SO    Sorting order of alignments. Valid values: unknown (default), unsorted, queryname and coordinate.

          @SQ   Reference sequence dictionary. The order of @SQ lines defines the alignment sorting order.
          SN*   Reference sequence name. Each @SQ line must have a unique SN tag.
          LN*   Reference sequence length.
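
To make the tag table concrete, here is a minimal sketch that splits a SAM header line into its record type and its TAG:VALUE fields. It assumes the fields are tab-separated, as in a real SAM file (the tabs show up as spaces in the post above), and the function name is only illustrative. Free-text @CO comment lines do not follow the TAG:VALUE layout and are not handled.

```python
def parse_sam_header_line(line):
    """Return (record_type, {tag: value}) for one @HD/@SQ-style header line."""
    record_type, *fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields:
        # Each field looks like "SN:comp5504567_c0_seq1"; split on the first colon only.
        tag, _, value = field.partition(":")
        tags[tag] = value
    return record_type, tags


header_lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:comp5504567_c0_seq1\tLN:554",
]
for line in header_lines:
    print(parse_sam_header_line(line))
```

For the two lines from the question this gives format version 1.0 with unsorted alignments, and one reference sequence named comp5504567_c0_seq1 of length 554.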


          • jajclement
            Junior Member
            • May 2012
            • 6

            #6
            Hi!
            Many thanks to amitm for his detailed answer. I had the same interpretation problem, and it is clearer to me now after reading his reply to wariobrega.

            I need one more clarification about the terms "concordantly" and "discordantly": what exactly do they mean?

            Thanks for any answers.
            Last edited by jajclement; 08-07-2013, 01:10 AM.


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Did you read the bowtie2 manual page? It's explained there.


              • jajclement
                Junior Member
                • May 2012
                • 6

                #8
                Thanks a lot for your answer.
                In fact, I had just found the answer to my question... and was unable to withdraw my post.
                Just a lesson to keep in mind: RTFM!!!


                • chin
                  Junior Member
                  • Mar 2017
                  • 2

                  #9
                  Bowtie2

                  Hello, I am completely new to NGS technologies, so there is a lot I do not understand. I am a new Bowtie2 user and cannot understand the alignment report in my Bowtie2 results. Here's an example:
                  112291170 reads; of these:
                  95730389 (85.25%) were paired; of these:
                  78503591 (82.00%) aligned concordantly 0 times
                  1638154 (1.71%) aligned concordantly exactly 1 time
                  15588644 (16.28%) aligned concordantly >1 times
                  ----
                  78503591 pairs aligned concordantly 0 times; of these:
                  1695209 (2.16%) aligned discordantly 1 time
                  ----
                  76808382 pairs aligned 0 times concordantly or discordantly; of these:
                  153616764 mates make up the pairs; of these:
                  98441019 (64.08%) aligned 0 times
                  17302306 (11.26%) aligned exactly 1 time
                  37873439 (24.65%) aligned >1 times
                  16560781 (14.75%) were unpaired; of these:
                  4655378 (28.11%) aligned 0 times
                  1290964 (7.80%) aligned exactly 1 time
                  10614439 (64.09%) aligned >1 times
                  50.44% overall alignment rate
                  There was a useful explanation earlier in this thread, but I am still unable to reproduce the overall alignment rate. Where does the 50.44% come from?
                  Thanks for your help.


                  • chin
                    Junior Member
                    • Mar 2017
                    • 2

                    #10
                    What about when not 100% of the reads are paired, such as:
                    112291170 reads; of these:
                    95730389 (85.25%) were paired; of these:
                    78503591 (82.00%) aligned concordantly 0 times
                    1638154 (1.71%) aligned concordantly exactly 1 time
                    15588644 (16.28%) aligned concordantly >1 times
                    ----
                    78503591 pairs aligned concordantly 0 times; of these:
                    1695209 (2.16%) aligned discordantly 1 time
                    ----
                    76808382 pairs aligned 0 times concordantly or discordantly; of these:
                    153616764 mates make up the pairs; of these:
                    98441019 (64.08%) aligned 0 times
                    17302306 (11.26%) aligned exactly 1 time
                    37873439 (24.65%) aligned >1 times
                    16560781 (14.75%) were unpaired; of these:
                    4655378 (28.11%) aligned 0 times
                    1290964 (7.80%) aligned exactly 1 time
                    10614439 (64.09%) aligned >1 times
                    50.44% overall alignment rate
                    How should I calculate its overall alignment rate?
                    Thank you.
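
The formula given earlier in the thread appears to extend to a mixed run like this one if the unpaired reads are added to both the aligned count and the total. The sketch below parses the summary text and recomputes the rate. It assumes the summary wording shown in this thread; the function name and the parsing rules are illustrative, not part of Bowtie 2.

```python
import re

REPORT = """\
112291170 reads; of these:
95730389 (85.25%) were paired; of these:
78503591 (82.00%) aligned concordantly 0 times
1638154 (1.71%) aligned concordantly exactly 1 time
15588644 (16.28%) aligned concordantly >1 times
----
78503591 pairs aligned concordantly 0 times; of these:
1695209 (2.16%) aligned discordantly 1 time
----
76808382 pairs aligned 0 times concordantly or discordantly; of these:
153616764 mates make up the pairs; of these:
98441019 (64.08%) aligned 0 times
17302306 (11.26%) aligned exactly 1 time
37873439 (24.65%) aligned >1 times
16560781 (14.75%) were unpaired; of these:
4655378 (28.11%) aligned 0 times
1290964 (7.80%) aligned exactly 1 time
10614439 (64.09%) aligned >1 times
50.44% overall alignment rate
"""


def overall_rate_from_report(report):
    """Recompute the overall alignment rate from a Bowtie 2 style summary."""
    pairs = unpaired = aligned = 0
    section = None  # "mates" or "unpaired" once we are inside those sections
    for raw in report.splitlines():
        line = raw.strip()
        match = re.match(r"(\d+)", line)
        if not match:
            continue  # separator lines such as "----"
        n = int(match.group(1))
        if "were paired" in line:
            pairs = n
        elif "were unpaired" in line:
            unpaired, section = n, "unpaired"
        elif "mates make up the pairs" in line:
            section = "mates"
        elif ("aligned concordantly exactly 1 time" in line
              or "aligned concordantly >1 times" in line
              or "aligned discordantly 1 time" in line):
            aligned += 2 * n  # both mates of the pair aligned
        elif line.endswith(("aligned exactly 1 time", "aligned >1 times")):
            if section in ("mates", "unpaired"):
                aligned += n  # single mates or unpaired reads
    return 100.0 * aligned / (2 * pairs + unpaired)


rate = overall_rate_from_report(REPORT)
print(f"{rate:.2f}% overall alignment rate")  # 50.44% overall alignment rate
```

Written out, that is (2 x (1638154 + 15588644 + 1695209) + 17302306 + 37873439 + 1290964 + 10614439) = 104925162 aligned reads out of (2 x 95730389 + 16560781) = 208021559, which is 50.44%.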


                    • yksikaksi
                      Member
                      • Dec 2009
                      • 20

                      #11
                      Please give more information about the data above. A brief summary of your method and work would be helpful for those who might be able to help.


                      • shauryajauhari
                        Junior Member
                        • Aug 2017
                        • 2

                        #12
                        Using Bowtie 2, I generated an alignment (SAM) file. The task at hand is to report the count and sequences of multiply mapped reads.
                        Following the flag and tag specifications of the SAM format, I am trying to filter the alignment file on the following basis:
                        1. The value of N in the XS:i:<N> and AS:i:<N> tags of a record must be identical.
                        2. The reads must be concordantly aligned, i.e. the "YT:Z:CP" tag is set.
                        3. The flags 0x100 (256) and 0x800 (2048) represent secondary and supplementary alignments respectively.
                        Despite these criteria, the output disagrees with the alignment summary provided by Bowtie 2. Is there an alternative to the logic stated above?
                        Thanks in advance.
                        SJ.
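
A minimal sketch of the three criteria as a Python predicate over one SAM record. The function name and the example records are hypothetical, and the reading of criterion 3 as "skip secondary and supplementary records" is an assumption. Two things may contribute to the mismatch with the summary, though neither is confirmed in this thread: Bowtie 2 writes XS:i only when it found a second alignment, so requiring AS == XS is stricter than "more than one alignment was found", and the summary counts concordant pairs while the SAM file holds one record per mate.

```python
def is_candidate_multimapper(sam_line):
    """Apply the post's three criteria to one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Criterion 3 (assumed): ignore secondary (0x100) and supplementary (0x800) records.
    if flag & 0x100 or flag & 0x800:
        return False
    # Optional fields start after the 11 mandatory columns: TAG:TYPE:VALUE.
    tags = {}
    for item in fields[11:]:
        name, _type, value = item.split(":", 2)
        tags[name] = value
    # Criterion 2: the pair aligned concordantly.
    if tags.get("YT") != "CP":
        return False
    # Criterion 1: a second-best alignment exists and scores the same as the best.
    return "XS" in tags and tags["XS"] == tags.get("AS")


def make_record(flag, *tags):
    # Hypothetical record for illustration only.
    core = ["read1", str(flag), "chr1", "100", "1", "4M", "=", "300", "204", "ACGT", "IIII"]
    return "\t".join(core + list(tags))


print(is_candidate_multimapper(make_record(99, "AS:i:-5", "XS:i:-5", "YT:Z:CP")))   # True
print(is_candidate_multimapper(make_record(99, "AS:i:-5", "YT:Z:CP")))              # False
print(is_candidate_multimapper(make_record(355, "AS:i:-5", "XS:i:-5", "YT:Z:CP")))  # False
```

Collecting the distinct read names (first column) for which this returns True gives a read count rather than a record count.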


                        • shauryajauhari
                          Junior Member
                          • Aug 2017
                          • 2

                          #13
                          How to derive multiple mapped reads from a SAM file?

                          A MAPQ score of 255 denotes uniquely aligned reads, which also have the highest mapping score. Based on some literature, and by logic too, isn't it obvious that any value between 0 and 255 represents multiply aligned reads? So, is it a subjective matter to choose the quality score (the 5th field in the SAM file) used to identify multiply mapped reads?
                          I believe the following will do the trick:
                          > samtools view -b -q <quality score> your_alignment_file.bam > filtered.bam
                          Additionally, could you also comment on the discrepancy (if any) with the alignment summary statistics from Bowtie 2?
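
Before settling on a threshold for -q, it may help to tabulate which MAPQ values actually occur in the file. The sketch below does that for SAM text; the function name and the example records are hypothetical. Note that the SAM specification defines 255 as "mapping quality not available", and Bowtie 2 is generally reported to emit values only up to 42, so the scale assumed in the question may not match Bowtie 2 output.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count mapped records per MAPQ value (5th column of a SAM record)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped record, MAPQ is not meaningful
        counts[mapq] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t50\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for mapq, n in sorted(mapq_histogram(example).items()):
    print(mapq, n)
```

On a real file the lines can come from `samtools view -h your_alignment_file.bam` piped into the script.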


                          • gopal_botany
                            Junior Member
                            • Jan 2014
                            • 3

                            #14
                            Hello Everyone,

                            I have 2x250 paired-end bacterial data. I aligned the reads to the reference with Bowtie 2 and got the SAM file as output.

                            My first question is how to get the mapping position of read 2 (the mate read), since in the SAM alignment the mapping position stays the same. See a few lines of the alignment:

                            M01976:19:000000000-D240E:1:1101:14558:1508 73 16CFP21_Rv1984c 449 3 110M = 449 0 GTGGTTTCTCCAGCATGTTGTGGGTCGGCTTTTCGTTTCCGACAATCGTTCCGCTGTTTATCTCTAAGTCCATTAACTTGTGTGCTCCCGACGTTCCATTTTGCACCGTT AFB0FFGEFAA1100B1DA2B////0/////012/>>/12/////B1//>00//>/?11212>>B2122211B2211>B1B1B01B11//////?0011?1?11110/0. AS:i:-45 XN:i:0 XM:i:15 XO:i:0 XG:i:0 NM:i:15 MD:Z:24G4G0G0G5G10G8A2G7A4A19A4A1A7G0A0 YT:Z:UP
                            M01976:19:000000000-D240E:1:1101:14558:1508 133 16CFP21_Rv1984c 449 0 * = 449 0 CCTTCCTGTTCGCCTCTATTTTCGCCGCCTGTCTTGTCATCCCCGTCTGTTCATTCGTTACATGCGCCTTATTATTTCCTCCTCCCTTCCATTTTTGTTCTTCGTGTTCTC 21221011AA2/////112A2A2////////11@@1B12@21/////>/1222>2>200//1>21////?11B22>222110000/01001@22@1/00<1<1/0//0111 YT:Z:UP

                            My second question is how to find the paired reads that aligned to the reverse strand of the reference.

                            Please help.

                            Thanks.

                            Gopal
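
Both questions come down to the FLAG column (2nd field). Here is a small sketch that decodes it using the bit meanings from the SAM specification; the short names in the table are my own labels. In the two records above, flag 73 marks read 1 as mapped with an unmapped mate, and flag 133 marks read 2 as unmapped (its CIGAR is `*`). By SAM convention an unmapped mate is given the reference name and position of its mapped mate, which would explain why both lines show 449. When both mates are mapped, the mate's position is in the 8th field (PNEXT), and a read on the reverse strand has bit 0x10 set.

```python
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_reverse_strand(flag):
    """True if the read itself is mapped and aligned to the reverse strand."""
    return bool(flag & 0x10) and not flag & 0x4


print(decode_flag(73))   # ['paired', 'mate_unmapped', 'first_in_pair']
print(decode_flag(133))  # ['paired', 'unmapped', 'second_in_pair']
```

With samtools, `samtools view -f 16 file.bam` selects records with the reverse-strand bit set.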
