I ran TopHat 1.0.14 and got the mapping results. Now I'm trying to find out, for each read, which positions are mismatches against the reference genome. An example line from TopHat's SAM output is shown in the code below.
I checked the SAM format manual. The "NM" tag gives the number of differences between the query sequence and the reference sequence, so in my example (CIGAR 76M, NM:i:2) there are two mismatches. However, I cannot tell from this output which two positions of the read are the mismatches. The SAM format defines an "MD" tag that encodes the mismatch positions, but I cannot see one in the TopHat output. Can anyone tell me how to do this?
Thank you very much!
Code:
[ID] 16 chr1 14509 0 76M * 0 0 [Query_Seq] [Quality_Score] NM:i:2 NH:i:7 CC:Z:chr12 CP:i:91041
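To make the goal concrete, here is a minimal sketch of the comparison, not a tested pipeline. It assumes the reference bases spanned by the alignment (starting at the SAM POS field, including any skipped "N" regions) have already been extracted into a string. The function name `mismatch_positions` and the toy sequences are made up for illustration. Reads with flag 16 need no special handling, because SAM stores their sequence already reverse-complemented onto the forward reference strand.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def mismatch_positions(read_seq, ref_seq, cigar, ref_start):
    """Return (read_pos, ref_pos, ref_base, read_base) for each mismatch.

    read_seq  : SEQ field of the SAM record
    ref_seq   : reference bases starting at ref_start (assumed pre-extracted)
    cigar     : CIGAR field, e.g. "76M"
    ref_start : 1-based POS field of the SAM record
    read_pos is 1-based within the read; ref_pos is 1-based on the reference.
    """
    mismatches = []
    read_i = 0  # 0-based offset into the read
    ref_i = 0   # 0-based offset into ref_seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned columns: compare base by base.
            for k in range(length):
                ref_base = ref_seq[ref_i + k].upper()
                read_base = read_seq[read_i + k].upper()
                if ref_base != read_base:
                    mismatches.append(
                        (read_i + k + 1, ref_start + ref_i + k, ref_base, read_base)
                    )
            read_i += length
            ref_i += length
        elif op in "IS":
            # Insertions and soft clips consume read bases only.
            read_i += length
        elif op in "DN":
            # Deletions and skipped regions (introns) consume reference only.
            ref_i += length
        # H and P consume neither the read nor the reference.
    return mismatches


# Toy example with made-up sequences (not the real chr1 bases):
print(mismatch_positions("ACGTACGT", "ACCTACGA", "8M", 14509))
# → [(3, 14511, 'C', 'G'), (8, 14516, 'A', 'T')]
```

If samtools is available, its calmd command (named fillmd in older releases) may be a simpler route, since it adds the MD tag to existing alignments given the reference FASTA.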