The question of how to get uniquely mapped reads from BWA-MEM really gives me a headache. I've searched online and found two ways:
1. Filter by mapq (different thresholds have been suggested: >=1, >=4, >=10, etc)
2. Filter by requiring AS score > XS score (or furthermore requiring a ratio)
My basic understanding is:
1. mapq=-10log(P). When mapq=3, P=0.5. So I would think mapq>=4 would be a lower bound threshold for unique mapping
2. AS is alignment score, while XS is suboptimal alignment score, so AS > XS should be a sound threshold for unique mapping
However, the conflicts I got are:
1. After I selected the alignment with AS > XS, their mapq can be down to 1 (don't know if it can be down to 0 because I started with alignment with mapq>0). For example, in the following two reads:
Their AS and XS are the same, but why do their mapq differ so much?
2. After I selected the alignment with mapq >=4, their AS can be the same as XS. For example, in the following three reads:
They all have mapq=40. But how is it possible that AS = XS in the 2nd read, and even worse, AS < XS in the 3rd read?
I've spent days in this problem and will almost give up. I'll probably require both of mapq>=4 and AS>XS for unique mapping.
1. Filter by mapq (different thresholds have been suggested: >=1, >=4, >=10, etc)
2. Filter by requiring AS score > XS score (or furthermore requiring a ratio)
My basic understanding is:
1. mapq=-10log(P). When mapq=3, P=0.5. So I would think mapq>=4 would be a lower bound threshold for unique mapping
2. AS is alignment score, while XS is suboptimal alignment score, so AS > XS should be a sound threshold for unique mapping
However, the conflicts I got are:
1. After I selected the alignment with AS > XS, their mapq can be down to 1 (don't know if it can be down to 0 because I started with alignment with mapq>0). For example, in the following two reads:
HISEQ-MFG:330:C6ATHACXX:8:2115:20746:12090 99 chr10 512755 1 51M = 512885 181 AATCAAAACCACTATGAGATATCATCGCACACCAGTTAGAATGGCAATCAT CCCFFFFFHHHHHJJJJJJJJJJJJJIJJJJJJJJJJJJIJJJJJJJJJJJ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
HISEQ-MFG:330:C6ATHACXX:8:2106:17664:83684 99 chr10 70748 27 51M = 70878 181 ACAATGCAAATCAAGTTCATTCTCACTGTGCTTGATTAACCTTCAAAATTG C@@FDFFEHHFFHIGBFGIIIIJGGIGHHIHIIGGIGGIGIIJIIJIJGI@ NM:i:0 AS:i:51 XS:i:46 RG:Z:44188
2. After I selected the alignment with mapq >=4, their AS can be the same as XS. For example, in the following three reads:
HISEQ-MFG:330:C6ATHACXX:8:1101:1320:1942 147 chr8 33349191 40 51M = 33349000 -242 GTCCAGGCTGGAGTGCAGTGGTGCGATCTTGGCTCACTGCAACCTCTGCTT DAIIHGHFFGFFGHHHGIIFCC?FCGHGEGHCJGIHJIHHHHHFFFDFB@@ NM:i:0 AS:i:51 XS:i:49 RG:Z:44187
HISEQ-MFG:330:C6ATHACXX:8:1101:3430:1929 163 chr10 100935098 40 51M = 100935153 106 CCACAGAGCCCAGCAAGCTAAGATCCACTGGCTTGAAATTCTCGCTGCCAG CCCFFFFFFHHHHJJJJJJJJJJJJJJJJJJJJJGIHJGHHIIGGGGIHGE NM:i:0 AS:i:51 XS:i:51 RG:Z:44187
HISEQ-MFG:330:C6ATHACXX:8:1101:2657:1969 147 chr8 95238495 40 1S50M = 95238418 -127 TTCTCTCTCTCTCTCTCTCTCTCTCTCTCACACACACACACACACACACAC ;HGJIIJIIHIHIGHGHHHHIGHHIHHDGEHHHHHFIHHFHHHDEDDDB?@ NM:i:0 AS:i:50 XS:i:51 RG:Z:44187
I've spent days in this problem and will almost give up. I'll probably require both of mapq>=4 and AS>XS for unique mapping.
Comment