Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA-MEM non-primary alignments "false proper pairing"

    I was running BWA-MEM (Version: 0.7.7-r441) on my illumina sequencing data with the "-a" option to report all alignments and noticed something really weird with the non-primary alignments. Some of the non-primary alignments were given the flag "Properly paired" when the orientation of the two reads doesn't seem to be consistent with what you expect of a proper pair for a illumina dataset (reads highlighted in blue).

    Read 1
    HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
    HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31

    Read 2
    HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
    HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
    HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
    HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;


    Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
    -------> <-------


    (FYI: I have not noticed such discrepancies with the primary alignments though.)

    Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)

  • #2
    Originally posted by kartong View Post
    I was running BWA-MEM (Version: 0.7.7-r441) on my illumina sequencing data with the "-a" option to report all alignments and noticed something really weird with the non-primary alignments. Some of the non-primary alignments were given the flag "Properly paired" when the orientation of the two reads doesn't seem to be consistent with what you expect of a proper pair for a illumina dataset (reads highlighted in blue).

    Read 1
    HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
    HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
    HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31

    Read 2
    HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
    HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
    HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
    HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;


    Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
    -------> <-------


    (FYI: I have not noticed such discrepancies with the primary alignments though.)

    Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)
    The order of secondary alignments reported is not meant to imply pairing. You have to look at the position of the aligned read and the position of the mate alignment to determine what it is "pairing with". See that all of the various read 1 alignments are being "paired" with the primary alignment of read 2 at position of 89619801, and all of the read 2 alignments are being "paired" with the primary alignment of read 1 at position 89619552.

    Comment


    • #3
      Yes, I do understand that the order of the secondary alignments do not imply pairing.

      But if you were to look at this secondary alignment, you'll notice that it has a flag of 371 (read mapped in proper pair + read reverse strand + mate reverse strand) which does not make sense at all for a "proper pair"
      HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51

      You're right to say that this secondary is paired to the primary alignment. However, if you were to look at the mate alignment "= 89619801", it has a flag of 147 which indicates that it is mapped to the reverse strand as per the secondary alignment I listed above. Thus, this still does not quite make sense.

      Comment


      • #4
        In my bwa version (0.7.8) the manual says for the "a" parameter:

        -a output all alignments for SE or unpaired PE
        As the parameter is not to be used for paired end data, I think some errors are likely to occur

        Comment


        • #5
          I've noticed the description when I was using the "-a" parameter. What you said, makes sense. If it was meant to be an "unpaired alignment", I just don't get why the "paired properly" flag was raised. It is probably an error like what you said.

          Comment


          • #6
            Well, it would be necessary to look into the code to find the problem that is causing the error. But I think the developers were aware of it (as "real" PE reads are excluded in the description) and either didn't find a good solution or didn't think it would be required?! (I'm just speculating, of course )

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Essential Discoveries and Tools in Epitranscriptomics
              by seqadmin


              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
              Yesterday, 07:01 AM
            • seqadmin
              Current Approaches to Protein Sequencing
              by seqadmin


              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
              04-04-2024, 04:25 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 04-11-2024, 12:08 PM
            0 responses
            49 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 10:19 PM
            0 responses
            50 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 09:21 AM
            0 responses
            43 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-04-2024, 09:00 AM
            0 responses
            55 views
            0 likes
            Last Post seqadmin  
            Working...
            X