  • Strange Results After maq2sam-long

    Hi all,

    I've been working with bwa, maq and samtools for a few weeks now, and my PI just came across an unusual result which now has me worried about my results. I started off my workflow by running MAQ on a data set and matching against a restricted chromosomal region of hg18. I now have output with an example as follows:

    HWUSI-EAS211R_5:2:6:90:1384 131 chr9_22054888_22134171 1 99 35M * 0 172 ATCCTTGGAGTTGTGAGGATTTAATGCAATTGTCT WWWWWWWWWWWWWWWWWVWWWWWWWUWWWVUUUUT MF:i:18 AM:i:99 SM:i:99 NM:i:1 UQ:i:30 H0:i:1 H1:i:0

    My question is this: what is going on with the tags NM, H0 and H1 (NM:i:1 H0:i:1 H1:i:0 at the end of the line above)? NM:i:1 should mean that the read has one mismatch to the genome, which seems to be true if I BLAT it back to the reference. However, H0:i:1 should mean that there is an exact match to the genome, and H1:i:0 should mean that there are no matches at distance 1 from the reference. Am I misinterpreting the tags, or is this really inconsistent? If it is inconsistent, where is the bug (MAQ or maq2sam-long), and how can I fix it?
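To make the tag values easier to inspect, here is a minimal sketch that pulls the optional fields out of the record quoted above. The function name `parse_sam_tags` is made up for illustration, and the code assumes the eleven mandatory SAM columns come first and that fields are separated by whitespace (real SAM files use tabs).

```python
def parse_sam_tags(line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict.

    Assumption: columns 1-11 are the mandatory SAM fields, so optional
    tags start at index 11 after splitting on whitespace.
    """
    fields = line.split()
    tags = {}
    for field in fields[11:]:
        name, value_type, value = field.split(":", 2)
        # Only integer tags ("i") are converted; everything else stays a string.
        tags[name] = int(value) if value_type == "i" else value
    return tags


# The record from the post, copied verbatim.
record = (
    "HWUSI-EAS211R_5:2:6:90:1384 131 chr9_22054888_22134171 1 99 35M * 0 172 "
    "ATCCTTGGAGTTGTGAGGATTTAATGCAATTGTCT WWWWWWWWWWWWWWWWWVWWWWWWWUWWWVUUUUT "
    "MF:i:18 AM:i:99 SM:i:99 NM:i:1 UQ:i:30 H0:i:1 H1:i:0"
)

tags = parse_sam_tags(record)
print(tags["NM"], tags["H0"], tags["H1"])
```

This only extracts the values; whether they contradict each other depends on how H0 and H1 are defined.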

    --Will

  • #2
    The tags NM, H0 and H1 are quite confusing. I discussed them in this thread; please take a look:
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    The read you listed above could be interpreted in two ways:
    NM:i:1 H0:i:1 H1:i:0
    1. The unique mapping has 1 mismatch, the number of hits with no mismatches is 1, and the number of hits with 1 mismatch is 0. The NM field contradicts the H0 field.

    2. The unique mapping has 1 mismatch, the number of best hits is 1, and the number of suboptimal hits with 1 more mismatch is 0. This is explainable.

    However, I've come across this as well:
    MF:i:32 AM:i:47 NM:i:2 UQ:i:60 H0:i:0 H1:i:1

    According to the second explanation, H0 should be 1 and the best hit has 2 mismatches, yet H0 is 0. I would not call this a bug in maq or maq2sam-long; rather, the definition of the tags is ambiguous.
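The two readings can be written down as simple checks and applied to both records. This is only a sketch of the interpretations discussed in this post, not of what maq or maq2sam-long actually implement; the function names are hypothetical, and the checks assume that NM describes the alignment reported on the line.

```python
def not_contradicted_absolute(nm, h0, h1):
    """Reading 1: H0 and H1 count hits with exactly 0 and 1 mismatches.

    The reported alignment has nm mismatches, so it must be counted
    in the matching bucket if one exists.
    """
    if nm == 0:
        return h0 >= 1
    if nm == 1:
        return h1 >= 1
    return True  # nm >= 2: neither counter covers the reported hit


def not_contradicted_relative(nm, h0, h1):
    """Reading 2: H0 counts best hits, H1 counts hits with one more mismatch.

    The reported alignment is itself a best hit, so H0 must be at least 1
    (nm and h1 place no further constraint in this simple check).
    """
    return h0 >= 1


# (NM, H0, H1) for the two records quoted in this thread.
records = {"first": (1, 1, 0), "second": (2, 0, 1)}

results = {
    name: (not_contradicted_absolute(*values), not_contradicted_relative(*values))
    for name, values in records.items()
}
for name, (absolute_ok, relative_ok) in results.items():
    print(name, "absolute:", absolute_ok, "relative:", relative_ok)
```

Under these assumptions the first record only fits the relative reading and the second only fits the absolute one, which is why neither definition explains both lines.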

    • #3
      I agree that this is ambiguous, but looking at the documentation is even more concerning. By its definition, NM should refer to the particular alignment being reported, not just to unique alignments. In this case the H1:i:0 tag is a misreport, because it implies that there are no reads with 1 difference from the reference, while the line itself reports a read with 1 difference from the reference.

      See: http://samtools.sourceforge.net/SAM1.pdf - page 7

      • #4
        This still confuses me. I thought NM (edit distance) was more or less equivalent to the "number of mismatches of the best hit" defined in the MAQ manual. I once parsed the maq output (.map) file, and the distribution I counted for the "number of mismatches of the best hit" field is exactly the same as the distribution I counted for the NM tag in the SAM file converted from the same map file.
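The SAM side of that comparison can be sketched as follows. The function name `nm_distribution` and the three example records are made up for illustration; the .map side is not shown, since it depends on how the maq output is parsed.

```python
from collections import Counter


def nm_distribution(sam_lines):
    """Count how often each NM value occurs in an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        # Optional tags start after the 11 mandatory SAM columns.
        for field in line.split()[11:]:
            if field.startswith("NM:i:"):
                counts[int(field[len("NM:i:"):])] += 1
                break
    return counts


# Hypothetical records, invented only to exercise the function.
example_lines = [
    "@HD\tVN:1.0",
    "r1\t0\tref\t1\t99\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\tH0:i:1\tH1:i:0",
    "r2\t0\tref\t5\t99\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tH0:i:1\tH1:i:0",
    "r3\t0\tref\t9\t47\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\tH0:i:0\tH1:i:1",
]

nm_counts = nm_distribution(example_lines)
for nm_value in sorted(nm_counts):
    print(nm_value, nm_counts[nm_value])
```

For a real file, passing an open file handle, for example `nm_distribution(open("reads.sam"))` with a hypothetical file name, works the same way because the function only iterates over lines.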

        • #5
          Yeah, I am getting the same result. I think that's because maq2sam-long returns only the best hit when it writes SAM output, so NM is both the number of mismatches in the best hit and the number of mismatches in the reported hit. NM is clearly consistent with the rest of the line. H0 and H1, however, are not, and I believe they need fixing in maq2sam-long or in maq (but probably not in maq).
