I was running BWA-MEM (Version: 0.7.7-r441) on my illumina sequencing data with the "-a" option to report all alignments and noticed something really weird with the non-primary alignments. Some of the non-primary alignments were given the flag "Properly paired" when the orientation of the two reads doesn't seem to be consistent with what you expect of a proper pair for a illumina dataset (reads highlighted in blue).
Read 1
HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31
Read 2
HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;
Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
-------> <-------
(FYI: I have not noticed such discrepancies with the primary alignments though.)
Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)
Read 1
HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31
Read 2
HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;
Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
-------> <-------
(FYI: I have not noticed such discrepancies with the primary alignments though.)
Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)
Comment