  • BWA : XM tag is sometimes wrong

    Hi,
    I've been using the XM tag to find reads with no mismatches, but sometimes this tag doesn't give the right number of mismatches.
    Has anyone else had this problem? How did you fix it?

    Here are a few examples:

    HWI-ST0787:100:C02F9ACXX:7:2307:2404:186548 163 gi|83578099:1-1090946 95044 60 37M1D2M1D62M = 95049 108 GGGGTTTCGGAAAACAAACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTC +1+4+0=D+<CFADB9E@@99:CG:BF)*9?DDDC@D?'-<;@=FHCHDB1?EEBCFEFDDCC;?B=8<35@C9?AA?A:?(:4<8ACBB<995>>158 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:2 MD:Z:0C1T34^G2^A62

    HWI-ST0787:100:C02F9ACXX:7:2307:9817:186685 147 gi|83578099:1-1090946 95060 60 21M1D2M1D78M = 94975 -188 AACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTCTTTCTTTCCCCCACTT @9B@DDBBDDEEDDDDDDDDBB<@DCDDDAB<@?:3EDC?<8DDDDBDCA??EBHHHHHHIIIIJIIGGJIHIGIIGJJJIIJIGEIHF;@?1FDDBB?B? XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:2 MD:Z:21^G2^A78

    HWI-ST0787:100:C02F9ACXX:7:2307:17522:186893 83 gi|83578099:1-1090946 999268 60 68M1I31M2D1M = 999191 -179 CCCTGTATAATGAAATTTCAAAAATATTTTCGTGAATAGTGATTTATTTAATTTAAGCACTAAATTATCCTTACGGACTTGGGCTACATTCATGTTTGCAC BCCCCDDCADCCCCCCCCEED?3HEEA;4EAHHEG>FDCB=CIHGIIGGIHFF<DBGIHEGEIGGFF<EGGBCFAFAB?B3HB<BFE9B>DDHD??DA@?1 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:1 XG:i:1 MD:Z:3C95^AG1

    HWI-ST0787:100:C02F9ACXX:7:2307:12781:188676 83 gi|50593115:1-813178 389360 60 61M1D3M1D37M = 389271 -192 TTTAACTTATGAATGTACTTTACTGGCCAAGAATCCGTCTGGAACCATTCTACGGTGCTCTTGCTAGCGCTAAAGACAGCTATAGTGGATATTCAGACGGT >DDCCCCC@DDFDCCCBCCCCDBCCAECECDB8HHHIHDA@==)GCGDCFC8GEJIFJFIGHDJGIIGGHIIGGEBJJFHIJIGGHFGGGHHDFDDBF@C@ XT:A:U NM:i:3 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:1 XG:i:2 MD:Z:61^C1T1^G37

  • #2
    BWA : XM tag is sometimes wrong

    From the CIGAR strings for the reads in your examples, it looks like you have some deletions or insertions, but no mismatches.

    • #3
      More details

      HWI-ST0787:100:C02F9ACXX:7:2307:2404:186548 163 gi|83578099:1-1090946 95044 60 37M1D2M1D62M = 95049 108 GGGGTTTCGGAAAACAAACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTC +1+4+0=D+<CFADB9E@@99:CG:BF)*9?DDDC@D?'-<;@=FHCHDB1?EEBCFEFDDCC;?B=8<35@C9?AA?A:?(:4<8ACBB<995>>158 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:2 MD:Z:0C1T34^G2^A62

      For example, in this line we have 2 deletions and 2 mismatches (see the MD tag), which sums to an edit distance of 4, in agreement with the NM tag.
      However, the XM tag is 3, whereas it should be 2.
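
For anyone who wants to check this on their own reads, here is a minimal sketch that recounts mismatches and deleted bases from the MD tag and inserted bases from the CIGAR string, then sums them to compare against NM. It assumes the MD grammar from the SAM specification (match-run lengths, single mismatched reference bases, and `^`-prefixed deleted bases). The function names `count_md`, `inserted_bases`, and `summarize` are made up for this example and are not part of BWA or samtools.

```python
import re

# MD tag tokens: a run of matching bases, a deletion (^ followed by the
# deleted reference bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")
# CIGAR tokens: a length followed by an operation code.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def count_md(md):
    """Return (mismatches, deleted_bases) counted from an MD tag value."""
    mismatches = 0
    deleted = 0
    for _match_run, deletion, mismatch in MD_TOKEN.findall(md):
        if deletion:
            deleted += len(deletion)
        elif mismatch:
            mismatches += 1
    return mismatches, deleted


def inserted_bases(cigar):
    """Return the total number of inserted bases (I operations) in a CIGAR."""
    return sum(int(n) for n, op in CIGAR_TOKEN.findall(cigar) if op == "I")


def summarize(cigar, md):
    """Recompute mismatch and indel counts; their sum should equal NM."""
    mismatches, deleted = count_md(md)
    inserted = inserted_bases(cigar)
    return {
        "mismatches": mismatches,
        "deleted_bases": deleted,
        "inserted_bases": inserted,
        "edit_distance": mismatches + deleted + inserted,
    }


if __name__ == "__main__":
    # CIGAR and MD values taken from the read quoted above.
    print(summarize("37M1D2M1D62M", "0C1T34^G2^A62"))
```

For the read quoted above this counts 2 mismatches and 2 deleted bases, so the recomputed edit distance is 4, the same as NM:i:4, while the record reports XM:i:3. The sketch only recounts what MD and CIGAR say; it does not explain how BWA derives XM.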

      • #4
        BWA : XM tag is sometimes wrong

        Originally posted by mastal:
        "from the CIGAR strings for the reads in your examples, it looks like you have some deletions or insertions, but not mismatches."

        Sorry, my error: M in the CIGAR string means either a match or a mismatch.
