
  • bwa mem mapping quality on ambiguous mapping reads

    Hello,

    I used bwa mem to align 125 bp single-end reads to the human decoy reference genome. I know bwa assigns a mapping quality of zero when a read maps to two or more locations in the genome. However, I noticed some reads that map equally well to different genomic locations, e.g. one read maps equally well to an autosome (chr16) and to one of the patches (GL000192). The CIGAR for both alignments is 125M. However, the mapping quality for the alignment on chr16 is 23, while the alignment to GL000192 got a mapping quality of zero. I thought both of them should have a mapping quality of zero. Is this right or not?

    thanks!

  • #2
    It is also my understanding that mapping quality in that case would be zero for both.

    Is there an option to randomly keep one of the multiple mappings rather than discard all of them in bwa mem?



    • #3
      Here is some more detail about this question:

      The FASTQ file used in the alignment did not come from a sequencer. I sliced the HYDIN2 sequence into small pieces, each 125 bp long, and assigned a base quality of 30 ("I") to all bases, so all bases have a high base quality. For the alignment, I asked bwa to also output secondary alignments (using the -a option). The records I mentioned are as follows:

      b38_1:146691684-146691808 16 16 71053369 23 125M * 0 0 AGCTGAAA.... IIIIIIIIIIII.... NM:i:1 MD:Z:88T36 AS:i:120 XS:i:110
      b38_1:146691684-146691808 272 GL000192.1 263206 0 125M * 0 0 * * NM:i:3 MD:Z:5G31G50T36 AS:i:110
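
As a rough aid for reading the two records above, here is a minimal Python sketch that decodes the FLAG column using the bit definitions from the SAM specification and pulls out the MAPQ, AS and NM values. The helper names (`decode_flag`, `summarize`) are made up for illustration, and the SEQ/QUAL fields are left truncated exactly as posted. It does not reproduce how bwa computes MAPQ; it only makes the posted fields easier to compare.

```python
# Flag bit values as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

# The two records from the post (SEQ/QUAL truncated as in the original).
RECORDS = [
    "b38_1:146691684-146691808 16 16 71053369 23 125M * 0 0 "
    "AGCTGAAA.... IIIIIIIIIIII.... NM:i:1 MD:Z:88T36 AS:i:120 XS:i:110",
    "b38_1:146691684-146691808 272 GL000192.1 263206 0 125M * 0 0 "
    "* * NM:i:3 MD:Z:5G31G50T36 AS:i:110",
]


def decode_flag(flag):
    """Return the names of all SAM flag bits that are set in `flag`."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def summarize(sam_line):
    """Extract a few columns and optional tags from one SAM record."""
    fields = sam_line.split()
    tags = {}
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields
        key, typ, value = field.split(":", 2)
        tags[key] = int(value) if typ == "i" else value
    return {
        "rname": fields[2],
        "flags": decode_flag(int(fields[1])),
        "mapq": int(fields[4]),
        "AS": tags.get("AS"),
        "NM": tags.get("NM"),
    }


for record in RECORDS:
    print(summarize(record))
```

In this sketch, FLAG 16 decodes to a reverse-strand alignment, while 272 (256 + 16) additionally has the secondary bit set. The alignment scores also differ (AS 120 versus 110), so the two hits are not scored identically even though both CIGAR strings are 125M.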



      • #4
        I can't find anywhere a formal definition for the meaning of MAPQ set to 0 by BWA.
        There are only forum posts saying that a MAPQ set to 0 means that a read has multiple hits.

        In your example, the second alignment has the NM tag set to 3, meaning the edit distance to the reference (number of nucleotide differences) is 3.
        The NM tag is set to 1 in the first alignment.

        One could surmise that the 1st alignment is unique in the sense that the second alignment is of such poor quality that it doesn't count.

        Admittedly, this is just wild speculation.
        There should be a formal definition of MAPQ set to 0 to which aligners should adhere, to make the interpretation of the mapping quality less arduous.

        It is certain that the second alignment is of far lesser quality than the first, so it does make sense that the mapping quality is much lower.



        • #5
          Hi blancha,

          Thanks for the explanation! But both alignments show 125 matching base pairs in the CIGAR, so there should be no base differences. Does the SAM record give conflicting information, or am I misunderstanding something?



          • #6
            But both alignments show 125 matching base pairs in the CIGAR, so there should be no base differences. Does the SAM record give conflicting information, or am I misunderstanding something?
            If you check the official SAM format specification, you'll see that M is for alignment match, and "can be a sequence match or mismatch". 125 bases aligned, but there still can be mismatches, in this case 3.


            At least, that is my understanding of the convoluted SAM format.
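
To illustrate the point that M covers both matches and mismatches, the following Python sketch counts mismatches from the MD tags in the two posted records and checks them against the 125M CIGAR and the NM values. The function names (`parse_md`, `cigar_m_length`) are hypothetical helpers written for this example, and it assumes the usual MD layout (match-run lengths, single mismatched reference bases, and `^`-prefixed deletions).

```python
import re


def parse_md(md):
    """Return (matched_bases, mismatches, deleted_bases) for an MD tag value."""
    matches = mismatches = deleted = 0
    tokens = re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md)
    for number, deletion, substitution in tokens:
        if number:
            matches += int(number)        # run of bases identical to the reference
        elif deletion:
            deleted += len(deletion)      # reference bases missing from the read
        else:
            mismatches += 1               # one mismatched reference base
    return matches, mismatches, deleted


def cigar_m_length(cigar):
    """Sum the lengths of the M, = and X operations in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "M=X")


for cigar, md, nm in [("125M", "88T36", 1), ("125M", "5G31G50T36", 3)]:
    matches, mismatches, deleted = parse_md(md)
    # With no indels, matches + mismatches should equal the M length,
    # and the mismatch count should equal NM.
    print(cigar, md, matches, mismatches, matches + mismatches == cigar_m_length(cigar), mismatches == nm)
```

For both records the matched and mismatched bases add up to 125, so "125M" is consistent with 1 and 3 mismatches respectively.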
            Last edited by blancha; 10-29-2015, 11:06 AM.



            • #7
              Thanks! This is much clearer. It seems 'M' and 'X'/'=' give somewhat redundant information.



              • #8
                Originally posted by blancha View Post
                If you check the official SAM format specification, you'll see that M is for alignment match, and "can be a sequence match or mismatch". 125 bases aligned, but there still can be mismatches, in this case 3.

                At least, that is my understanding of the convoluted SAM format.
                Yep, that's correct. But the most recent SAM specification also allows mismatches to be reported in the CIGAR string. You can see this by mapping with BBMap, which uses the 'X' and '=' symbols.
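
As a sketch of how the two notations relate, the Python function below rewrites a single-M CIGAR into the '=' / 'X' form using the MD tag. The name `to_extended_cigar` is made up for this example, and the code deliberately handles only the simple case seen in this thread (one M operation, no indels or clipping); it is not how BBMap or any other aligner produces its output.

```python
import re


def to_extended_cigar(cigar, md):
    """Rewrite '<n>M' as '='/'X' operations using the MD tag (no indels)."""
    whole = re.fullmatch(r"(\d+)M", cigar)
    if whole is None or "^" in md:
        raise ValueError("this sketch only handles a single M operation without deletions")

    ops = []  # list of [length, operation]
    for number, base in re.findall(r"(\d+)|([A-Za-z])", md):
        if number:
            length, op = int(number), "="   # run of sequence matches
        else:
            length, op = 1, "X"             # one sequence mismatch
        if length == 0:
            continue                        # MD uses 0 between adjacent mismatches
        if ops and ops[-1][1] == op:
            ops[-1][0] += length            # merge with the previous run
        else:
            ops.append([length, op])

    if sum(length for length, _ in ops) != int(whole.group(1)):
        raise ValueError("MD tag and CIGAR lengths disagree")
    return "".join(f"{length}{op}" for length, op in ops)


print(to_extended_cigar("125M", "88T36"))
print(to_extended_cigar("125M", "5G31G50T36"))
```

For the two records in this thread, the result would be `88=1X36=` and `5=1X31=1X50=1X36=`, which carry the same mismatch positions that the MD tag already encodes alongside 125M.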
