SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
how to set parameters for mapping reads to a bacterial genome using "bwa-mem/bowtie"? joyjane88 RNA Sequencing 1 11-13-2013 02:54 AM
tophat marking "read in a proper pair" wrongly jay2008 Bioinformatics 3 08-12-2013 12:38 AM
"refine gapped alignments... Segmentation fault" while using bwa sampe dd_genome Bioinformatics 3 01-03-2013 10:42 AM
What is the meaning of "moved end" in BWA pairing process. hosang.jeon Bioinformatics 0 07-12-2012 04:50 PM
BWA sampe mapping result, what is "PROPER PAIR"? hl450 Bioinformatics 5 08-06-2010 01:26 PM

Reply
 
Thread Tools
Old 08-14-2014, 06:56 PM   #1
kartong
Junior Member
 
Location: Asia

Join Date: Mar 2014
Posts: 4
Default BWA-MEM non-primary alignments "false proper pairing"

I was running BWA-MEM (Version: 0.7.7-r441) on my illumina sequencing data with the "-a" option to report all alignments and noticed something really weird with the non-primary alignments. Some of the non-primary alignments were given the flag "Properly paired" when the orientation of the two reads doesn't seem to be consistent with what you expect of a proper pair for a illumina dataset (reads highlighted in blue).

Read 1
HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31

Read 2
HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;


Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
-------> <-------


(FYI: I have not noticed such discrepancies with the primary alignments though.)

Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)
kartong is offline   Reply With Quote
Old 08-15-2014, 07:37 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by kartong View Post
I was running BWA-MEM (Version: 0.7.7-r441) on my illumina sequencing data with the "-a" option to report all alignments and noticed something really weird with the non-primary alignments. Some of the non-primary alignments were given the flag "Properly paired" when the orientation of the two reads doesn't seem to be consistent with what you expect of a proper pair for a illumina dataset (reads highlighted in blue).

Read 1
HWI-ST513_0151:1:1101:4168:2366#0 99 2 89619552 0 76M = 89619801 287 TTCTGCTGATACCAATTTAAATCGCTGCCAATGTTCTGACTTGCCCGGCAAGTGATGGTGACTCCGTCTCCCACAG H@@@@DDD@DDDDEDDDDDDDDDDDBDDDFHDDHDEEDHDDHHHHGHFEHHHHHHHHHGHHHHHHHGHHHHHGGG? NM:i:5 MD:Z:22A5T4C30T6T4 AS:i:51 XS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51
HWI-ST513_0151:1:1101:4168:2366#0 355 2 89567927 0 76M = 89619801 51912 * * NM:i:8 MD:Z:22A1T3T5C6C3T18T6T4 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89953111 0 76M = 89619801 -333349 * * NM:i:8 MD:Z:4A6A18A3G6G5A3A1T22 AS:i:36
HWI-ST513_0151:1:1101:4168:2366#0 371 2 90458425 0 76M = 89619801 -838663 * * NM:i:9 MD:Z:4A6A22G6G0G4A3A0T0A22 AS:i:31

Read 2
HWI-ST513_0151:1:1101:4168:2366#0 147 2 89619801 0 38S38M = 89619552 -287 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCTCGGAGCCAGAGTAGCAGGAGCCCCAGGAGCTGAG E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD?<GED>FFDFFHHHCHFHHHHHHHFGHHHHHHHHH NM:i:0 MD:Z:38 AS:i:38 XS:i:38 SA:Z:2,89399608,-,42M34S,0,2;
HWI-ST513_0151:1:1101:4168:2366#0 387 2 89901334 0 38M38H = 89619552 -281783 * * NM:i:0 MD:Z:38 AS:i:38
HWI-ST513_0151:1:1101:4168:2366#0 387 2 114163992 0 38M38H = 89619552 -24544441 * * NM:i:1 MD:Z:22G15 AS:i:33
HWI-ST513_0151:1:1101:4168:2366#0 2195 2 89399608 0 42M34H = 89619552 219904 GAGGAGGGAGACTGGGTCATCAGGATGTCACATCTGGCACCT E;;??2F?DECEBEE@ACA?@.@;==@>A<A:>A<B:A7DD? NM:i:2 MD:Z:5T15T20 AS:i:32 XS:i:32 SA:Z:2,89619801,-,38S38M,0,0;


Specifically, if you look at the second alignment of each read, you'll notice that they have a flag of 371 and 387 indicating that the two reads align on the same strand. However, for an illumina paired end dataset, the expectation is for a proper pair to run in the following manner:
-------> <-------


(FYI: I have not noticed such discrepancies with the primary alignments though.)

Thus, is this a case where BWA-MEM only checks for the mate-pair distance and not the orientation of the reads for the non-primary alignments? (i.e. we cannot trust the "properly-paired" flag for the non-primary alignments??)
The order of secondary alignments reported is not meant to imply pairing. You have to look at the position of the aligned read and the position of the mate alignment to determine what it is "pairing with". See that all of the various read 1 alignments are being "paired" with the primary alignment of read 2 at position of 89619801, and all of the read 2 alignments are being "paired" with the primary alignment of read 1 at position 89619552.
kmcarr is offline   Reply With Quote
Old 08-15-2014, 03:44 PM   #3
kartong
Junior Member
 
Location: Asia

Join Date: Mar 2014
Posts: 4
Default

Yes, I do understand that the order of the secondary alignments do not imply pairing.

But if you were to look at this secondary alignment, you'll notice that it has a flag of 371 (read mapped in proper pair + read reverse strand + mate reverse strand) which does not make sense at all for a "proper pair"
HWI-ST513_0151:1:1101:4168:2366#0 371 2 89901545 0 76M = 89619801 -281783 * * NM:i:5 MD:Z:4A6A30G4A5T22 AS:i:51

You're right to say that this secondary is paired to the primary alignment. However, if you were to look at the mate alignment "= 89619801", it has a flag of 147 which indicates that it is mapped to the reverse strand as per the secondary alignment I listed above. Thus, this still does not quite make sense.
kartong is offline   Reply With Quote
Old 08-18-2014, 06:08 AM   #4
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

In my bwa version (0.7.8) the manual says for the "a" parameter:

Quote:
-a output all alignments for SE or unpaired PE
As the parameter is not to be used for paired end data, I think some errors are likely to occur
WhatsOEver is offline   Reply With Quote
Old 08-18-2014, 06:39 PM   #5
kartong
Junior Member
 
Location: Asia

Join Date: Mar 2014
Posts: 4
Default

I've noticed the description when I was using the "-a" parameter. What you said, makes sense. If it was meant to be an "unpaired alignment", I just don't get why the "paired properly" flag was raised. It is probably an error like what you said.
kartong is offline   Reply With Quote
Old 08-18-2014, 11:29 PM   #6
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Well, it would be necessary to look into the code to find the problem that is causing the error. But I think the developers were aware of it (as "real" PE reads are excluded in the description) and either didn't find a good solution or didn't think it would be required?! (I'm just speculating, of course )
WhatsOEver is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:12 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO