
apratap 06-23-2011 09:42 AM

BWA behaviour with Mate Pair data + Multi read mapping
Hi All

PS: This message was also posted on BWA mailing list but I did not get any response.

I can't logically understand why BWA is not able to work natively
with mate pair data. My second question: what's the best workaround if I
want to obtain multiple read mappings (if any) for a read? I am also pasting
the results of aligning the mate pair data both before and after reverse
complementing. The mapping gets better after rev comp but I don't quite
understand why. I'd appreciate your input on both questions.

A. Before Reverse comp : Mate pair data <----- ------>

356766 + 0 in total (QC-passed reads + QC-failed reads)
282236 + 0 mapped (79.11%:nan%)
356766 + 0 paired in sequencing
178383 + 0 read1
178383 + 0 read2
126094 + 0 properly paired (35.34%:nan%)

B. After Reverse Comp : Mate pair data --------------->

356766 + 0 in total (QC-passed reads + QC-failed reads)
265575 + 0 mapped (74.44%:nan%)
356766 + 0 paired in sequencing
178383 + 0 read1
178383 + 0 read2
11146 + 0 properly paired (3.12%:nan%)

It seems the algo takes a major hit during the pairing of reads.


Jon_Keats 06-23-2011 01:25 PM

What was your experimental design? Standard Illumina mate-pair? Read length? Also, how did you reverse complement, and what were the bwa command line arguments?

I've done standard Illumina MP preps before, reverse complemented with fastx-toolkit, and aligned with bwa using standard parameters with good success. See below:

133724796 in total
0 QC failure
45170770 duplicates
121800898 mapped (91.08%)
133724796 paired in sequencing
66862398 read1
66862398 read2
102897270 properly paired (76.95%)
115119322 with itself and mate mapped
6681576 singletons (5.00%)
3387642 with mate mapped to a different chr
2495203 with mate mapped to a different chr (mapQ>=5)

apratap 06-23-2011 01:37 PM

Hi Jon

Thanks for your reply. The protocol is not standard: we are trying to sequence the ends of transcripts using the Mate Pair technique.

The data that I get after linker removal has a variable read length of 60+/-20 bp. I reverse complement the reads using the basic definition: reverse the read, then complement it, and also reverse the quality string.

One thing that could trick BWA is the variable fragment size, as it depends on the length of the transcripts that we are trying to capture.

As for BWA options, I have pretty much used the standard ones. At this point I am less concerned about the mapping % than about why the reads need to be reverse complemented before mapping with BWA, and how it handles multi read mapping.
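
For anyone following along, the reverse-complement step described above (complement the bases, reverse the read, reverse the quality string) can be sketched in Python roughly as below. This is only an illustration, not the script actually used in this thread; the function names `reverse_complement` and `revcomp_record` are made up for the example, and it assumes plain A/C/G/T/N bases.

```python
# Sketch only: reverse-complement one FASTQ record (sequence + quality).
# Function names are hypothetical; only A/C/G/T/N bases are handled.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the whole read."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_record(header, seq, qual):
    """Return the record with bases reverse-complemented and qualities reversed.

    The quality string is only reversed (never complemented), so each
    score stays attached to the base it was called for.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header, reverse_complement(seq), qual[::-1]


if __name__ == "__main__":
    print(revcomp_record("@read1", "AACGT", "IIHG#"))
```

Because the reads here are of variable length after linker removal, working record by record like this avoids any assumption about a fixed read length.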


Jon_Keats 06-23-2011 01:48 PM

I'm assuming you are trying to get the 5' and 3' ends of each RNA species by circularizing the cDNA? Neat idea, but the weird insert size distribution when mapped to the genome will definitely give bwa some problems. You might want to try Tophat instead for the alignment.

For reads that map at multiple locations bwa will report the other potential sites and will randomly select one unless the mate/pair read dictates the location, but even then it should report the alternative options.
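
As a rough illustration of where those alternative sites end up: in the SAM output, BWA lists alternative hits in an optional `XA:Z:` tag, with entries of the form `chr,(+|-)pos,CIGAR,NM;`. The sketch below pulls them out of a single SAM line. The helper name `parse_xa` is made up for this example, and it assumes the tag layout just described.

```python
# Sketch only: extract alternative hits from BWA's XA tag on one SAM line.
# Assumes entries look like "chr,(+|-)pos,CIGAR,NM;" (parse_xa is a
# hypothetical helper name, not part of BWA or samtools).
def parse_xa(sam_line):
    """Return a list of (chrom, strand, pos, cigar, nm) alternative hits."""
    fields = sam_line.rstrip("\n").split("\t")
    hits = []
    for tag in fields[11:]:  # optional tags start at the 12th column
        if not tag.startswith("XA:Z:"):
            continue
        for entry in tag[5:].split(";"):
            if not entry:  # the tag ends with a trailing ';'
                continue
            chrom, pos, cigar, nm = entry.rsplit(",", 3)
            strand = "-" if pos.startswith("-") else "+"
            hits.append((chrom, strand, abs(int(pos)), cigar, int(nm)))
    return hits
```

A read with no `XA` tag simply yields an empty list, so the same function can be run over every alignment line.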

apratap 06-23-2011 01:51 PM

Any idea why BWA needs reads to be inner directional (---> <---) for it to map them?

I guess Tophat will not work as the read lengths are variable and, as per my understanding of the version I used, it requires read 1 and read 2 to be of equal length. In our case the read length will vary depending on where the linker is identified.


Jon_Keats 06-23-2011 04:38 PM

Maybe try BWA in single-end mode, filter out the reads aligning to multiple locations, then manually pair the reads using Perl or something to find the mates/pairs that mark the ends of your RNA species.
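
A minimal sketch of that idea, written in Python rather than Perl, might look like the following. It is built on several assumptions that are not from this thread: read 1 and read 2 were aligned separately in single-end mode into two SAM files, uniquely placed reads carry the `XT:A:U` tag that `bwa aln`/`samse` writes, and mates share a read name apart from a trailing `/1` or `/2`. The function names `unique_hits` and `pair_ends` are hypothetical.

```python
# Sketch only: pair up single-end alignments of read 1 and read 2.
# Assumptions (not from the thread): uniquely placed reads carry the
# XT:A:U tag, and mates differ only by a trailing "/1" or "/2".
def unique_hits(sam_lines):
    """Map read name -> (chrom, pos, strand) for uniquely aligned reads."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if flag & 0x4:  # read is unmapped
            continue
        if "XT:A:U" not in f[11:]:  # keep unique placements only
            continue
        name = f[0]
        if name.endswith(("/1", "/2")):  # hypothetical naming convention
            name = name[:-2]
        strand = "-" if flag & 0x10 else "+"
        hits[name] = (f[2], int(f[3]), strand)
    return hits


def pair_ends(read1_lines, read2_lines):
    """Return name -> (read1_hit, read2_hit) where both mates map uniquely."""
    h1 = unique_hits(read1_lines)
    h2 = unique_hits(read2_lines)
    return {name: (h1[name], h2[name]) for name in h1 if name in h2}
```

Both functions take any iterable of SAM lines, so open file handles can be passed in directly, e.g. `pair_ends(open("read1.sam"), open("read2.sam"))` (file names hypothetical). Since no insert size or orientation is enforced here, the variable transcript lengths should not cause pairs to be rejected.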

All times are GMT -8.
