  • Aligners for Illumina's mate-pairs

    Hello everybody,

    I just started working with mate-pair data from Illumina and I am a bit lost regarding which aligners can actually work with mate-pairs. Most aligners work with paired-ends but I am starting to realize that that does not mean they work with mate-pairs as well.

    Am I correct that bwa can only work with paired-end reads but not mate-pairs?
    How about Novoalign? I could not find an explicit mention of mate-pairs, but I could easily have overlooked it.

    Mosaik does work with mate-pairs but MosaikSort won't work if more than 10% of the reads have paired-end orientation (which unfortunately happens in some samples due to contamination).

    I would be very grateful if people could share their experience working with mate-pairs.

    Thanks!!

  • #2
    Novoalign will do mate-pair alignments on Illumina reads, but you need to take care of two things:

    1) Reverse complement both reads (R1 and R2) in the pair
    2) Set the "-i" option to specify your expected insert length for the library
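For step 1, here is a minimal Python sketch of reverse-complementing the reads in a FASTQ file. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names `reverse_complement` and `revcomp_fastq` are made up for illustration and are not part of Novoalign or any other tool. The quality string is reversed along with the sequence so that each score stays with its base.

```python
# Sketch only: reverse-complement every read so that mate-pair reads
# (<-L R->) look like paired-end reads (L-> <-R) to the aligner.

# Translation table for the complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_fastq(lines):
    """Yield FASTQ lines with each read reverse-complemented.

    Assumes 4-line records: header, sequence, '+', qualities.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield reverse_complement(seq)
            yield plus
            yield qual[::-1]  # keep each quality score with its base
            record = []
```

You would run this over both the R1 and the R2 file, for example by passing an open file handle to `revcomp_fastq` and writing the yielded lines to a new file.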

    Let us know how it goes.


    • #3
      Originally posted by Margarida
      I just started working with mate-pair data from Illumina and I am a bit lost regarding which aligners can actually work with mate-pairs. Most aligners work with paired-ends but I am starting to realize that that does not mean they work with mate-pairs as well.
      I think any PE aligner will work with MP reads as long as you re-orient the reads to match the PE orientation and set the insert length to the appropriately higher value.

      In terms of orientation, Illumina PE is (L-> <-R) whereas MP is (<-L R->). As zee said in his post, just reverse complement L and R and they will look like PE-oriented reads. (FYI, SOLiD3 mate-pairs have (R-> F->) orientation.)
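To make the (L-> <-R) versus (<-L R->) notation concrete, here is a small Python sketch that labels a mapped pair by the strand of whichever read sits further left on the reference. The function name and arguments are hypothetical; it assumes you already have each read's leftmost mapping coordinate and strand (for example from an alignment file), and it ignores corner cases such as overlapping or chimeric reads.

```python
def pair_orientation(pos1, is_reverse1, pos2, is_reverse2):
    """Label a mapped pair as "FR", "RF", or "same_strand".

    pos1/pos2   -- leftmost reference coordinate of each read
    is_reverse* -- True if that read aligned to the reverse strand

    "FR" is L-> <-R (Illumina PE), "RF" is <-L R-> (Illumina MP).
    """
    if is_reverse1 == is_reverse2:
        return "same_strand"
    # Pick the read that maps further left, whichever read number it has.
    _, left_is_reverse = min((pos1, is_reverse1), (pos2, is_reverse2))
    return "RF" if left_is_reverse else "FR"


# A PE fragment gets the same label whether read 1 is the forward or the reverse read.
print(pair_orientation(100, False, 400, True))
print(pair_orientation(400, True, 100, False))
# An MP pair spanning about 3 kb.
print(pair_orientation(100, True, 3000, False))
```

Because the label is taken from the leftmost read rather than from read 1, it does not change with the strand the fragment happened to be sequenced from.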

      Originally posted by Margarida View Post
      Mosaik does work with mate-pairs but MosaikSort won't work if more than 10% of the reads have paired-end orientation (which unfortunately happens in some samples due to contamination).
      The inexact nature of the MP library prep protocol means there will always be some PE contamination. If you only have 10%, you are lucky. I've seen as high as 50% in MP data sets! You can try to remove some PE reads by mapping them to de novo assembled contigs (treating them as SE reads?); if a pair maps as (L-> 200bp <-R), you can separate it from your primary data set.
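A rough Python sketch of that separation step, assuming each pair has already been mapped and summarised as a (name, orientation, insert size) tuple. The function name and the 500 bp cutoff are assumptions for illustration only; pick a cutoff that suits your own library.

```python
def split_pe_contamination(pairs, max_pe_insert=500):
    """Split pairs into likely PE contaminants and everything else.

    pairs         -- iterable of (name, orientation, insert_size), where
                     orientation is "FR" for L-> <-R or "RF" for <-L R->
    max_pe_insert -- assumed upper bound (bp) for a contaminating PE fragment

    Returns (contaminant_names, kept_names).
    """
    contaminants, kept = [], []
    for name, orientation, insert_size in pairs:
        # Short-insert pairs in PE orientation are treated as contamination.
        if orientation == "FR" and insert_size <= max_pe_insert:
            contaminants.append(name)
        else:
            kept.append(name)
    return contaminants, kept


# Made-up example pairs, not real data.
example = [("a", "FR", 210), ("b", "RF", 2900), ("c", "FR", 2500)]
print(split_pe_contamination(example))
```

Pairs that are FR but far longer than the cutoff are kept here on purpose, since orientation alone does not tell you what they are.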


      • #4
        Zee and Torst, thank you so much for your insight. Now I know how to proceed.

        Torst, it's good to know that ~10% contamination of MP reads with PE reads is not excessive. I was worried it might be. I will follow your suggestion to try to remove some of the PE reads.


        • #5
          Hello everyone,

          I'm joining this thread to ask a follow-up question about how Illumina ensures the orientation of its paired-end or mate-pair data.
          I recently noticed that (to keep it simple) about half of my mate-pair data mapped in forward-reverse orientation (FR, or L-> <-R) and the other half in RF (i.e. <-L R->). I used bwa for the alignment.
          Torst already wrote that Illumina's mate-pair technology produces pairs in RF orientation. Does it somehow "sort" the reads so that the first one is always in the reverse direction? So, is every pair in fact nearly equally likely to map in either of these orientations on the reference?
          Or could anything have gone wrong with bwa? Does it have any requirements for mate-pair data? Its manual page doesn't really help in this case. Still, I can't really picture how bwa would have problems with the orientation of the pairs, since the results look pretty OK to me. I'm just curious how Illumina mate-pair (or paired-end) data can map in both (RF and FR) orientations.
          I would be very thankful for any help and explanation.
          Last edited by ForeignMan; 07-06-2010, 01:28 PM.


          • #6
            Originally posted by ForeignMan
            Hello everyone,
            I recently noticed that (to keep it simple) about half of my mate-pair data mapped in forward-reverse orientation (FR, or L-> <-R) and the other half in RF (i.e. <-L R->).
            That sounds like a typically bad Illumina mate-pair (MP) library prep - about 50% contamination with PE reads.

            Torst already wrote that Illumina's mate-pair technology produces pairs in RF orientation. Does it somehow "sort" the reads so that the first one is always in the reverse direction? So, is every pair in fact nearly equally likely to map in either of these orientations on the reference?
            No sorting is done. Which strand the "LEFT" read ends up on is a matter of random sampling.

            Or could anything have gone wrong with bwa? Does it have any requirements for mate-pair data? Its manual page doesn't really help in this case. Still, I can't really picture how bwa would have problems with the orientation of the pairs.
            My understanding is that "bwa sampe" expects paired reads to be oriented L-> <-R like Illumina PE (and not like Illumina MP).

            The results look pretty OK to me. I'm just curious how Illumina mate-pair (or paired-end) data can map in both (RF and FR) orientations.
            I would be very thankful for any help and explanation.
            In an ideal world, the library preparation step (done by molecular biologists in lab coats with tubes etc.) for MP would be perfect and only <-L R-> pairs would be generated. However, the real world is imperfect: the protocol is complicated, purity is challenging, and some undesirable DNA fragments get left in the mix and end up as TRUE PE reads in an MP prep.


            • #7
              Thank you very much for your answer!
              I understand the situation, especially the lab preparations, much better now.
              I still need help with three other questions:

              (1) About fragment sizes: I am wondering about the fragment sizes of the undesirable PE reads. According to the Illumina protocol, I would expect them to have a fragment size of 400-500 bp. But in my case, most of the "bad" reads in FR orientation (about 90%) have a fragment size between 1,000 bp and 3,000 bp (the mean fragment size is almost 2,200 bp).
              What could be the reason for this?

              (2) About alignment orientation: There is something else that is not really clear to me. In my data (and I also saw this in other datasets and alignments), there are still numerous pairs where both ends mapped to the same strand. Did anyone else notice this with mate-pair (or paired-end) data? Or am I still confusing the orientations/alignments?

              (3) About aligners: I have now read multiple times (in this and in other threads) that bwa does not work with mate-pairs. What exactly does that mean? Should it give an error message, or does it give wrong results? As a matter of fact, I tried bwa (version 0.5.7) with a few Illumina mate-pair datasets, and if I hadn't read here that it shouldn't work, I would not have noticed anything. The results seemed pretty OK and comprehensible to me (apart from the insert size distribution described in (1); but I also had a dataset where everything was fine...). It's kind of strange and contradictory.

              Are there any good alternative aligners for structural variation analysis with mate-pair data? In my opinion, MAQ takes way too much time for this amount of data (about 8 lanes with > 200,000,000 pairs). I have not worked with Mosaik yet, but I fear that MosaikSort will not work if what Margarida said in the first post is right. Is it necessary to run MosaikSort, or can I just use MosaikAligner and then move on to other tools?

              I also have experience with bowtie, but it seems to discard any reads beyond the given insert size range (-I and -X parameters). They appear as unmapped in the SAM output file. I want to run structural variation tools like breakdancer and gasv on the alignment, and bowtie's behaviour of reporting only valid pairs is not very helpful. I could not find a way (or parameter) to get around that. I would really, really appreciate any tips!

              Thanks in advance to everyone who has some ideas.

              P.S.: Sorry that my post is so long and that I'm asking so much.
              Last edited by ForeignMan; 07-16-2010, 02:08 AM. Reason: More questions


              • #8
                Update: the latest version of Novoalign (v 2.07) will handle Illumina mate-pair libraries. No need to reverse complement the reads and it deals with paired-end contamination.


                • #9
                  alignment orientation

                  I would like to bring this topic up again, especially the effect of using bwa sampe with mate-pair data. Could anyone here help me understand whether and why bwa sampe can NOT handle pairs in RF orientation? What is the impact?

                  Thanks in advance!


