Hi,
I am using BWA-MEM for split-read alignment of single-end Illumina genomic DNA-seq reads. I know that BWA uses the SA tag to mark chimeric reads, and when I manually BLAST individual reads that carry the SA tag, I can clearly verify that they are indeed chimeras. However, I could not find any details about the SA tag itself. What information is encoded in the SA field? Below is an example of a chimeric read that maps to two separate locations within the same contig (scf7180000067989):
HWI-ST387:139:C03WJABXX:5:2108:15315:193815 16 scf7180000067989 85156 60 60M41S * 0 0 TTGAAGTCAAGAAAGTGGTAAAGAGAGATTAATAGGGGTATCTCAGCTACAACAAATATTATATTAAATTAAATGGTTAATCTTGCTTTGCTCACCATAAA * NM:i:2 MD:Z:31G1C26 AS:i:50 XS:i:0 SA:Z:scf7180000067989,85273,-,54S47M,60,1;
HWI-ST387:139:C03WJABXX:5:2108:15315:193815 272 scf7180000067989 85273 60 54H47M * 0 0 AATATTATATTAAATTAAATGGTTAATCTTGCTTTGCTCACCATAAA * NM:i:1 MD:Z:11T35 AS:i:42 XS:i:22 SA:Z:scf7180000067989,85156,-,60M41S,60,2;
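For reference, the SAM specification defines the tag as SA:Z:(rname,pos,strand,CIGAR,mapQ,NM;)+, a semicolon-terminated list with one entry per other part of the chimeric alignment, giving its reference name, 1-based leftmost position, strand, CIGAR, mapping quality and NM edit distance. So in the first record, SA:Z:scf7180000067989,85273,-,54S47M,60,1; points at the second record, and the second record's SA points back at the first. Also note that flag 272 is 256 (secondary) + 16 (reverse strand) rather than 2048 (supplementary); BWA-MEM marks shorter split hits as secondary when run with -M, so filtering on 2048 alone would presumably miss these. Below is a minimal sketch for pulling the tag out of a plain-text SAM line; the helper names (sa_from_sam_line, parse_sa, flag_bits, SAEntry) are hypothetical, and it assumes whitespace-separated SAM text rather than using a library such as pysam.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SAEntry:
    rname: str   # reference (contig) name of the other piece
    pos: int     # 1-based leftmost mapping position of that piece
    strand: str  # '+' or '-'
    cigar: str   # CIGAR of that piece (here written with S even where the record uses H)
    mapq: int    # mapping quality of that piece
    nm: int      # edit distance (NM) of that piece


def sa_from_sam_line(line: str) -> Optional[str]:
    """Return the value of the SA:Z tag of a SAM record, or None if absent."""
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in line.split()[11:]:
        if field.startswith("SA:Z:"):
            return field[len("SA:Z:"):]
    return None


def parse_sa(value: str) -> List[SAEntry]:
    """Split an SA value into one entry per other piece of the chimeric alignment."""
    entries = []
    for chunk in value.split(";"):
        if not chunk:  # the list ends with ';', which leaves an empty chunk
            continue
        rname, pos, strand, cigar, mapq, nm = chunk.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


def flag_bits(flag: int) -> List[str]:
    """Name the FLAG bits that matter for split reads."""
    names = {16: "reverse", 256: "secondary", 2048: "supplementary"}
    return [name for bit, name in names.items() if flag & bit]
```

Applied to the first record above, parse_sa(sa_from_sam_line(line)) yields a single entry with pos 85273, strand '-', CIGAR 54S47M, MAPQ 60 and NM 1 (i.e. the second record), and flag_bits(272) returns ['reverse', 'secondary'].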
I expect a lot of genome rearrangements in this sample, so ultimately I want to isolate the reads that map to variant locations and identify regions of microhomology, which could help pinpoint the breakpoints. I am new to bioinformatics, so any input would be great.
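One way to approach the microhomology question is to compare the read intervals covered by the two pieces of a split read: bases aligned by both pieces are candidate microhomology at the junction, and bases aligned by neither may be an untemplated insertion. The sketch below assumes that seq is the full SEQ of the record that uses soft clipping (here the primary, flag 16, so its SEQ is reverse-complemented relative to the original read) and converts each CIGAR to original read orientation using that piece's strand. The function names are hypothetical, and an overlap found this way is only a candidate; it should be checked against the reference.

```python
import re
from typing import Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def query_interval(cigar: str, strand: str) -> Tuple[int, int]:
    """Half-open [start, end) of the aligned bases, in original read orientation."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = 0
    for n, op in ops:  # leading soft/hard clips
        if op not in "SH":
            break
        lead += n
    aligned = sum(n for n, op in ops if op in "MI=X")     # aligned query bases
    read_len = sum(n for n, op in ops if op in "MIS=XH")  # full read incl. clips
    start, end = lead, lead + aligned
    if strand == "-":
        # A reverse-strand CIGAR describes the reverse complement of the read.
        start, end = read_len - end, read_len - start
    return start, end


def junction(seq: str, seq_strand: str, cigar_a: str, strand_a: str,
             cigar_b: str, strand_b: str) -> Tuple[int, str]:
    """Return (overlap_len, bases) between two pieces of a split read.

    overlap_len > 0: bases aligned by both pieces (candidate microhomology)
    overlap_len < 0: bases aligned by neither piece (possible insertion)
    """
    read = revcomp(seq) if seq_strand == "-" else seq
    a_start, a_end = query_interval(cigar_a, strand_a)
    b_start, b_end = query_interval(cigar_b, strand_b)
    lo, hi = max(a_start, b_start), min(a_end, b_end)
    overlap = hi - lo
    bases = read[lo:hi] if overlap > 0 else read[hi:lo]
    return overlap, bases


if __name__ == "__main__":
    # SEQ of the primary record (flag 16), with its CIGAR and the SA piece's CIGAR
    seq = ("TTGAAGTCAAGAAAGTGGTAAAGAGAGATTAATAGGGGTATCTCAGCTACAACAAATATT"
           "ATATTAAATTAAATGGTTAATCTTGCTTTGCTCACCATAAA")
    print(junction(seq, "-", "60M41S", "-", "54S47M", "-"))  # → (6, 'AATATT')
```

For the posted read, the primary piece covers read positions [41, 101) and the SA piece covers [0, 47) in read orientation, so the two share the 6 bases AATATT.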
Thanks in advance!