SEQanswers

Old 05-11-2010, 01:32 PM   #1
Margarida
Junior Member
 
Location: Ithaca, NY

Join Date: Jan 2010
Posts: 6
Aligners for Illumina's mate-pairs

Hello everybody,

I just started working with mate-pair data from Illumina and I am a bit lost regarding which aligners can actually work with mate-pairs. Most aligners work with paired-ends but I am starting to realize that that does not mean they work with mate-pairs as well.

Am I correct that bwa can only work with paired-end reads but not mate-pairs?
How about Novoalign? I could not find an explicit mention of mate-pairs, but I could easily have overlooked it.

Mosaik does work with mate-pairs but MosaikSort won't work if more than 10% of the reads have paired-end orientation (which unfortunately happens in some samples due to contamination).

I would be very grateful if people could share their experience working with mate-pairs.

Thanks!!
Old 05-11-2010, 06:29 PM   #2
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

Novoalign will do mate-pair alignments on Illumina reads, but you need to take care of two things:

1) Reverse complement both reads (R1 and R2) in the pair
2) Set the "-i" option to specify your expected insert length for the library
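
Step 1 above can be sketched in a few lines of Python. This is only an illustration, not part of Novoalign: the function names are made up here, and it assumes plain 4-line FASTQ records. Note that the quality string has to be reversed along with the bases.

```python
# Sketch only: reverse-complement every read in a FASTQ file.
# All function names are hypothetical; assumes 4-line FASTQ records.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_fastq_record(header, seq, plus, qual):
    """Flip the bases and reverse the qualities so they still line up."""
    return header, reverse_complement(seq), plus, qual[::-1]


def revcomp_fastq(in_path, out_path):
    """Write a copy of a FASTQ file with every read reverse-complemented."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            lines = [fin.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                break  # end of file
            fout.write("\n".join(revcomp_fastq_record(*lines)) + "\n")
```

You would run `revcomp_fastq` once on the R1 file and once on the R2 file, then align the two outputs as an ordinary pair with "-i" set for the mate-pair insert length.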

Let us know how it goes.
Old 05-13-2010, 11:38 PM   #3
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by Margarida
I just started working with mate-pair data from Illumina and I am a bit lost regarding which aligners can actually work with mate-pairs. Most aligners work with paired-ends but I am starting to realize that that does not mean they work with mate-pairs as well.
I think any PE aligner will work with MP reads as long as you re-orient the reads to match the PE orientation, and can set the insert length to the appropriate higher value.

In terms of orientation, Illumina PE is (L-> <-R) whereas MP is (<-L R->). As zee said in his post, just reverse complement both L and R and they will look like PE-oriented reads. (FYI, SOLiD3 mate-pairs have (R-> F->) orientation.)
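
To make the arrow notation concrete, here is a rough sketch of how one could label a mapped pair. The function name and the "FR" / "RF" / "same" labels are my own choice for illustration, and it assumes you already have each read's leftmost mapping position and strand.

```python
def pair_orientation(pos1, rev1, pos2, rev2):
    """Label a mapped pair by the strands of its leftmost and rightmost reads.

    pos1/pos2: leftmost mapping coordinates of the two reads.
    rev1/rev2: True if the read aligned to the reverse strand.
    Returns "FR" for (L-> <-R), "RF" for (<-L R->), or "same" when both
    reads sit on the same strand.
    """
    # Sort by position so the label does not depend on which read is read 1.
    (_, left_rev), (_, right_rev) = sorted([(pos1, rev1), (pos2, rev2)])
    if left_rev == right_rev:
        return "same"
    return "RF" if left_rev else "FR"
```

Reverse complementing both reads flips both strands but leaves the mapping positions alone, so `pair_orientation(100, True, 3000, False)` is "RF" while `pair_orientation(100, False, 3000, True)` is "FR". That is why the re-oriented MP reads look like PE reads to the aligner.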

Quote:
Originally Posted by Margarida
Mosaik does work with mate-pairs but MosaikSort won't work if more than 10% of the reads have paired-end orientation (which unfortunately happens in some samples due to contamination).
The inexact nature of the MP library prep protocol means there will always be some PE contamination. If you only have 10%, you are lucky. I've seen as high as 50% in MP data sets! You can try to remove some PE reads by mapping the reads to de novo assembled contigs (treating them as SE reads?); pairs that map as (L-> 200bp <-R) can then be separated from your primary data set.
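
The separation step could look something like the sketch below. It assumes each pair has already been reduced to a name, an orientation label ("FR" for L-> <-R, "RF" for <-L R->) and an insert size; the 600 bp cutoff is a hypothetical value that you would have to choose from your own PE-sized peak.

```python
def split_pe_contaminants(pairs, max_pe_insert=600):
    """Split mapped pairs into (likely_mp, likely_pe) lists of names.

    pairs: iterable of (name, orientation, insert_size) tuples.
    max_pe_insert: hypothetical cutoff in bp; inward-facing pairs at or
    below it are treated as PE contamination.
    """
    likely_mp, likely_pe = [], []
    for name, orientation, insert_size in pairs:
        if orientation == "FR" and abs(insert_size) <= max_pe_insert:
            likely_pe.append(name)
        else:
            # Everything else stays in the primary data set.
            likely_mp.append(name)
    return likely_mp, likely_pe
```

Pairs that are neither clearly PE-like nor clearly MP-like (for example inward-facing with a long insert) are left in the main set here; whether that is the right call depends on the library.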
Old 05-14-2010, 07:24 AM   #4
Margarida
Junior Member
 
Location: Ithaca, NY

Join Date: Jan 2010
Posts: 6

Zee and Torst, thank you so much for your insight. Now I know how to proceed.

Torst, it's good to know that ~10% contamination of MP reads with PE reads is not excessive. I was worried it might be. I will follow your suggestion to try to remove some of the PE reads.
Old 07-06-2010, 01:25 PM   #5
ForeignMan
Member
 
Location: Germany

Join Date: Jun 2010
Posts: 20

Hello everyone,

I'm joining this thread to ask a follow-up question about how Illumina ensures the orientation of its paired-end or mate-pair data:
I recently noticed that (to keep it simple) about half of my mate-pair data mapped in forward-reverse orientation (FR, or L-> <-R) and the other half in RF (i.e. <-L R->). I used bwa for the alignment.
Torst already wrote that Illumina's mate-pair technology produces pairs in RF orientation. Does it somehow "sort" the reads so that the first one is always in the reverse direction? So, is every pair in fact nearly equally likely to map in either of these orientations on the reference?
Or could anything have gone wrong with bwa? Does it have any requirements for mate-pair data? Its manual page doesn't really help in this case. Still, I can't really picture how bwa would have problems with the orientation of the pairs. The results look pretty OK to me. I'm just curious how Illumina mate-pair (or paired-end) data can map in both (RF and FR) orientations.
I would be very thankful for any help and explanation.

Last edited by ForeignMan; 07-06-2010 at 01:28 PM.
Old 07-10-2010, 11:28 PM   #6
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by ForeignMan
Hello everyone,
I recently noticed that (to keep it simple) about half of my mate-pair data mapped in forward-reverse orientation (FR, or L-> <-R) and the other half in RF (i.e. <-L R->).
That sounds like a typically bad Illumina mate-pair (MP) library prep - about 50% contamination with PE reads.

Quote:
Torst already wrote that Illumina's mate-pair technology produces pairs in RF orientation. Does it somehow "sort" the reads so that the first one is always in the reverse direction? So, is every pair in fact nearly equally likely to map in either of these orientations on the reference?
No sorting is done. Which strand the "LEFT" read falls on is down to random sampling.

Quote:
Or could anything have gone wrong with bwa? Does it have any requirements for mate-pair data? Its manual page doesn't really help in this case. Still, I can't really picture how bwa would have problems with the orientation of the pairs.
My understanding is that "bwa sampe" expects paired reads to be oriented L-> <-R like Illumina PE (and not like Illumina MP).

Quote:
The results look pretty OK to me. I'm just curious how Illumina mate-pair (or paired-end) data can map in both (RF and FR) orientations.
I would be very thankful for any help and explanation.
In an ideal world the library preparation step (done by molecular biologists in lab coats with tubes etc.) for MP would be perfect and only <-L R-> pairs would be generated. However, the real world is imperfect: the protocol is complicated, purity is hard to achieve, and some undesirable DNA fragments get left in the mix and end up as TRUE PE reads in an MP prep.
Old 07-12-2010, 04:22 AM   #7
ForeignMan
Member
 
Location: Germany

Join Date: Jun 2010
Posts: 20

Thank you very much for your answer!
I understand the situation, especially the lab preparations, much better now.
I still need help with three other questions:

(1) About fragment sizes: I am wondering about the fragment sizes of the undesirable PE reads. According to the Illumina protocol, I would expect them to have a fragment size of 400-500 bp. But in my case, most of the "bad" reads in FR orientation (about 90%) have a fragment size between 1,000 bp and 3,000 bp (the mean fragment size is almost 2,200 bp).
What could be the reason for this?

(2) About alignment orientation: There is something else that is not really clear to me. In my data (and I have also seen this in other datasets and alignments), there are still numerous pairs where both ends mapped to the same strand. Did anyone else notice this with mate-pair (or paired-end) data? Or am I still confusing the orientations/alignments?

(3) About aligners: I have now read multiple times (in this and in other threads) that bwa does not work with mate-pairs. What exactly does that mean? Should it give an error message, or does it give wrong results? As a matter of fact, I tried bwa (version 0.5.7) with a few Illumina mate-pair datasets, and if I hadn't read here that it shouldn't work, I would not have noticed anything. The results seemed pretty OK and comprehensible to me (apart from the insert size distribution described in (1); but I also had a dataset where everything was fine...). It's kind of strange and contradictory.

Are there any good alternative aligners for structural variation analysis with mate-pair data? In my opinion, MAQ takes way too much time for this amount of data (about 8 lanes with > 200,000,000 pairs). I have never worked with Mosaik, but I fear that MosaikSort will not work if what Margarida said in the first post is right. Is it necessary to run MosaikSort, or can I just use MosaikAligner and then move on to other tools?

I also have experience with bowtie, but it seems to me that it discards any reads beyond the given insert size range (-I and -X parameters). They appear as unmapped in the SAM output file. I want to run structural variation tools like breakdancer and gasv on the alignment, and bowtie's behaviour of reporting only valid pairs is not very helpful. I could not find a way (or parameter) to get around that. I would really, really appreciate any tips!
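
For questions (1) and (2), the numbers can be pulled straight out of a SAM file. Below is a rough sketch of such a tally, under the assumption that the aligner fills in FLAG, PNEXT and TLEN as described in the SAM specification (0x10 = read on reverse strand, 0x20 = mate on reverse strand, 0x40 = first read in pair). The function name and the category labels are made up for this example.

```python
from collections import defaultdict


def tally_pairs(sam_lines):
    """Count pairs per orientation and report the mean |TLEN| for each.

    Only first-in-pair records (flag 0x40) are used, so each pair is
    counted once. Returns (summary, skipped): summary maps "FR", "RF" or
    "same_strand" to (count, mean insert size); skipped counts pairs
    with no usable orientation.
    """
    sizes = defaultdict(list)
    skipped = defaultdict(int)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank or header line
        f = line.rstrip("\n").split("\t")
        flag, rname, pos = int(f[1]), f[2], int(f[3])
        rnext, pnext, tlen = f[6], int(f[7]), int(f[8])
        if not flag & 0x1 or not flag & 0x40:
            continue  # unpaired read, or second read of a pair
        if flag & 0x4 or flag & 0x8:
            skipped["unmapped"] += 1
            continue
        if rnext not in ("=", rname):
            skipped["different_reference"] += 1
            continue
        read_rev, mate_rev = bool(flag & 0x10), bool(flag & 0x20)
        # Work out which strand the leftmost read of the pair is on.
        if pos <= pnext:
            left_rev, right_rev = read_rev, mate_rev
        else:
            left_rev, right_rev = mate_rev, read_rev
        if left_rev == right_rev:
            category = "same_strand"
        else:
            category = "RF" if left_rev else "FR"
        sizes[category].append(abs(tlen))
    summary = {c: (len(v), sum(v) / len(v)) for c, v in sizes.items()}
    return summary, dict(skipped)
```

Calling `tally_pairs(open("aln.sam"))` (the file name is just a placeholder) gives the count and mean insert size of the FR, RF and same-strand pairs side by side.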

Thanks in advance to everyone, who has some ideas.

P.S.: Sorry that my post is so long and that I'm asking so much.

Last edited by ForeignMan; 07-16-2010 at 02:08 AM. Reason: More questions
Old 08-24-2010, 10:03 AM   #8
srs57
Junior Member
 
Location: Ithaca

Join Date: Jun 2010
Posts: 1

Update: the latest version of Novoalign (v 2.07) will handle Illumina mate-pair libraries. There is no need to reverse complement the reads, and it deals with paired-end contamination.
Old 07-29-2013, 09:28 AM   #9
fishyu
Junior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 1
alignment orientation

I would like to bring this topic up again, especially the effect of using bwa sampe with mate pairs. Could anyone here help me understand whether, and why, bwa sampe can NOT handle pairs in RF orientation? What is the impact?

Thanks in advance!

