Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA unique mapping

    The question of how to get uniquely mapped reads from BWA-MEM really gives me a headache. I've searched online and found two ways:
    1. Filter by mapq (different thresholds have been suggested: >=1, >=4, >=10, etc)
    2. Filter by requiring AS score > XS score (or furthermore requiring a ratio)

    My basic understanding is:
    1. mapq=-10log(P). When mapq=3, P=0.5. So I would think mapq>=4 would be a lower bound threshold for unique mapping
    2. AS is alignment score, while XS is suboptimal alignment score, so AS > XS should be a sound threshold for unique mapping

    However, the conflicts I got are:

    1. After I selected the alignment with AS > XS, their mapq can be down to 1 (don't know if it can be down to 0 because I started with alignment with mapq>0). For example, in the following two reads:
    HISEQ-MFG:330:C6ATHACXX:8:2115:20746:12090 99 chr10 512755 1 51M = 512885 181 AATCAAAACCACTATGAGATATCATCGCACACCAGTTAGAATGGCAATCAT CCCFFFFFHHHHHJJJJJJJJJJJJJIJJJJJJJJJJJJIJJJJJJJJJJJ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
    HISEQ-MFG:330:C6ATHACXX:8:2106:17664:83684 99 chr10 70748 27 51M = 70878 181 ACAATGCAAATCAAGTTCATTCTCACTGTGCTTGATTAACCTTCAAAATTG C@@FDFFEHHFFHIGBFGIIIIJGGIGHHIHIIGGIGGIGIIJIIJIJGI@ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
    Their AS and XS are the same, but why do their mapq differ so much?

    2. After I selected the alignment with mapq >=4, their AS can be the same as XS. For example, in the following three reads:
    HISEQ-MFG:330:C6ATHACXX:8:1101:1320:1942 147 chr8 33349191 40 51M = 33349000 -242 GTCCAGGCTGGAGTGCAGTGGTGCGATCTTGGCTCACTGCAACCTCTGCTT DAIIHGHFFGFFGHHHGIIFCC?FCGHGEGHCJGIHJIHHHHHFFFDFB@@ NM:i:0 AS:i:51 XS:i:49 RG:Z:44187
    HISEQ-MFG:330:C6ATHACXX:8:1101:3430:1929 163 chr10 100935098 40 51M = 100935153 106 CCACAGAGCCCAGCAAGCTAAGATCCACTGGCTTGAAATTCTCGCTGCCAG CCCFFFFFFHHHHJJJJJJJJJJJJJJJJJJJJJGIHJGHHIIGGGGIHGE NM:i:0 AS:i:51 XS:i:51 RG:Z:44187
    HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969 147 chr8 95238495 40 1S50M = 95238418 -127 TTCTCTCTCTCTCTCTCTCTCTCTCTCTCACACACACACACACACACACAC ;HGJIIJIIHIHIGHGHHHHIGHHIHHDGEHHHHHFIHHFHHHDEDDDB?@ NM:i:0 AS:i:50 XS:i:51 RG:Z:44187
    They all have mapq=40. But how is it possible that AS = XS in the 2nd read, and even worse, AS < XS in the 3rd read?

    I've spent days in this problem and will almost give up. I'll probably require both of mapq>=4 and AS>XS for unique mapping.
    Last edited by metheuse; 04-10-2015, 10:46 AM.

  • #2
    The problem you're having is that you're conceptualizing things in terms of single-end reads, but have a paired-end dataset

    Let's just take the example of the MAPQ 40 alignments with XS>=AS. This can happen when the other mate in the pair has a confident alignment and either (A) the other considered alignments for the read in question are on a different chromosome or (B) would violate the assumptions of alignment-pair orientation and/or fragment length. So in the example of HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969, if the other valid alignment is on chr1 or even chr8:20000000 and those have higher alignment scores, it doesn't much matter since the other mate provides a reliable anchor.

    Comment


    • #3
      Originally posted by dpryan View Post
      The problem you're having is that you're conceptualizing things in terms of single-end reads, but have a paired-end dataset

      Let's just take the example of the MAPQ 40 alignments with XS>=AS. This can happen when the other mate in the pair has a confident alignment and either (A) the other considered alignments for the read in question are on a different chromosome or (B) would violate the assumptions of alignment-pair orientation and/or fragment length. So in the example of HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969, if the other valid alignment is on chr1 or even chr8:20000000 and those have higher alignment scores, it doesn't much matter since the other mate provides a reliable anchor.
      Thanks! Your explanation makes sense.
      Then what would be a good threshold for unique mapping in PE?

      Comment


      • #4
        It depends a bit on context. For BS-seq, I use 10. For RNAseq, I use 5. For SNP calling, 10 or 20 is probably good, though I don't do that enough to have a really good threshold there.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM
        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        24 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        25 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        21 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-04-2024, 09:00 AM
        0 responses
        52 views
        0 likes
        Last Post seqadmin  
        Working...
        X