Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA unique mapping

    The question of how to get uniquely mapped reads from BWA-MEM really gives me a headache. I've searched online and found two ways:
    1. Filter by mapq (different thresholds have been suggested: >=1, >=4, >=10, etc)
    2. Filter by requiring AS score > XS score (or furthermore requiring a ratio)

    My basic understanding is:
    1. mapq=-10log(P). When mapq=3, P=0.5. So I would think mapq>=4 would be a lower bound threshold for unique mapping
    2. AS is alignment score, while XS is suboptimal alignment score, so AS > XS should be a sound threshold for unique mapping

    However, the conflicts I got are:

    1. After I selected the alignment with AS > XS, their mapq can be down to 1 (don't know if it can be down to 0 because I started with alignment with mapq>0). For example, in the following two reads:
    HISEQ-MFG:330:C6ATHACXX:8:2115:20746:12090 99 chr10 512755 1 51M = 512885 181 AATCAAAACCACTATGAGATATCATCGCACACCAGTTAGAATGGCAATCAT CCCFFFFFHHHHHJJJJJJJJJJJJJIJJJJJJJJJJJJIJJJJJJJJJJJ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
    HISEQ-MFG:330:C6ATHACXX:8:2106:17664:83684 99 chr10 70748 27 51M = 70878 181 ACAATGCAAATCAAGTTCATTCTCACTGTGCTTGATTAACCTTCAAAATTG C@@FDFFEHHFFHIGBFGIIIIJGGIGHHIHIIGGIGGIGIIJIIJIJGI@ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
    Their AS and XS are the same, but why do their mapq differ so much?

    2. After I selected the alignment with mapq >=4, their AS can be the same as XS. For example, in the following three reads:
    HISEQ-MFG:330:C6ATHACXX:8:1101:1320:1942 147 chr8 33349191 40 51M = 33349000 -242 GTCCAGGCTGGAGTGCAGTGGTGCGATCTTGGCTCACTGCAACCTCTGCTT DAIIHGHFFGFFGHHHGIIFCC?FCGHGEGHCJGIHJIHHHHHFFFDFB@@ NM:i:0 AS:i:51 XS:i:49 RG:Z:44187
    HISEQ-MFG:330:C6ATHACXX:8:1101:3430:1929 163 chr10 100935098 40 51M = 100935153 106 CCACAGAGCCCAGCAAGCTAAGATCCACTGGCTTGAAATTCTCGCTGCCAG CCCFFFFFFHHHHJJJJJJJJJJJJJJJJJJJJJGIHJGHHIIGGGGIHGE NM:i:0 AS:i:51 XS:i:51 RG:Z:44187
    HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969 147 chr8 95238495 40 1S50M = 95238418 -127 TTCTCTCTCTCTCTCTCTCTCTCTCTCTCACACACACACACACACACACAC ;HGJIIJIIHIHIGHGHHHHIGHHIHHDGEHHHHHFIHHFHHHDEDDDB?@ NM:i:0 AS:i:50 XS:i:51 RG:Z:44187
    They all have mapq=40. But how is it possible that AS = XS in the 2nd read, and even worse, AS < XS in the 3rd read?

    I've spent days in this problem and will almost give up. I'll probably require both of mapq>=4 and AS>XS for unique mapping.
    Last edited by metheuse; 04-10-2015, 10:46 AM.

  • #2
    The problem you're having is that you're conceptualizing things in terms of single-end reads, but have a paired-end dataset

    Let's just take the example of the MAPQ 40 alignments with XS>=AS. This can happen when the other mate in the pair has a confident alignment and either (A) the other considered alignments for the read in question are on a different chromosome or (B) would violate the assumptions of alignment-pair orientation and/or fragment length. So in the example of HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969, if the other valid alignment is on chr1 or even chr8:20000000 and those have higher alignment scores, it doesn't much matter since the other mate provides a reliable anchor.

    Comment


    • #3
      Originally posted by dpryan View Post
      The problem you're having is that you're conceptualizing things in terms of single-end reads, but have a paired-end dataset

      Let's just take the example of the MAPQ 40 alignments with XS>=AS. This can happen when the other mate in the pair has a confident alignment and either (A) the other considered alignments for the read in question are on a different chromosome or (B) would violate the assumptions of alignment-pair orientation and/or fragment length. So in the example of HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969, if the other valid alignment is on chr1 or even chr8:20000000 and those have higher alignment scores, it doesn't much matter since the other mate provides a reliable anchor.
      Thanks! Your explanation makes sense.
      Then what would be a good threshold for unique mapping in PE?

      Comment


      • #4
        It depends a bit on context. For BS-seq, I use 10. For RNAseq, I use 5. For SNP calling, 10 or 20 is probably good, though I don't do that enough to have a really good threshold there.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Advancing Precision Medicine for Rare Diseases in Children
          by seqadmin




          Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
          12-16-2024, 07:57 AM
        • seqadmin
          Recent Advances in Sequencing Technologies
          by seqadmin



          Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

          Long-Read Sequencing
          Long-read sequencing has seen remarkable advancements,...
          12-02-2024, 01:49 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 12-17-2024, 10:28 AM
        0 responses
        36 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-13-2024, 08:24 AM
        0 responses
        52 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-12-2024, 07:41 AM
        0 responses
        37 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-11-2024, 07:45 AM
        0 responses
        46 views
        0 likes
        Last Post seqadmin  
        Working...
        X