  • How does paired-end alignment work?

    Hello, all

    I am completely new to bioinformatics. I have started reading papers about alignment, but I can't form a picture of how paired-end alignment works. I've been searching on Google, but I couldn't find a tutorial or paper that explains it clearly.

    What I need is a tutorial with some simple examples of PE alignment. Can anyone provide a link here? Thanks a lot.

  • #2
    Originally posted by totalnew
    Hello, all

    I am completely new to bioinformatics. I have started reading papers about alignment, but I can't form a picture of how paired-end alignment works. I've been searching on Google, but I couldn't find a tutorial or paper that explains it clearly.

    What I need is a tutorial with some simple examples of PE alignment. Can anyone provide a link here? Thanks a lot.

    Check out the papers for aligners like MAQ or SOAP. Most aligners align each end independently, or at most use one end and the expected insert size to infer the location of the other (to be more sensitive).
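For readers who want something concrete, here is a minimal Python sketch of the "use one end and the expected insert size to infer the location of the other" idea. It is not taken from MAQ, SOAP, or any other aligner. It assumes forward/reverse ("FR") read orientation, 0-based leftmost positions, half-open windows, and an insert size that lies within a few standard deviations of its mean; the function name and parameters are hypothetical.

```python
def mate_search_window(pos, read_len, is_forward, insert_mean, insert_sd, n_sd=3):
    """Return the half-open reference window (start, end) in which the mate
    of an already-mapped read is expected to align.

    Assumptions (illustrative only): FR orientation, `pos` is the 0-based
    leftmost base of the mapped read, both reads have length `read_len`,
    and the insert size is the outer fragment length.
    """
    lo = insert_mean - n_sd * insert_sd  # shortest plausible fragment
    hi = insert_mean + n_sd * insert_sd  # longest plausible fragment
    if is_forward:
        # The fragment starts at `pos`; the mate ends where the fragment ends.
        start = pos + lo - read_len
        end = pos + hi
    else:
        # The mapped read ends where the fragment ends; the mate starts
        # where the fragment starts.
        frag_end = pos + read_len
        start = frag_end - hi
        end = frag_end - lo + read_len
    return (max(0, start), end)


# A forward read at position 1000, 50 bp reads, insert size 300 +/- 20
print(mate_search_window(1000, 50, True, 300, 20))  # (1190, 1360)
```

An aligner using this idea would then search only that window for the second read, which allows a more permissive (for example gapped) search than would be affordable genome-wide.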


    • #3
      Thanks, nilshomer! I read Heng's paper about MAQ and the SOAP paper, but the PE parts are rather short and didn't make it clear to me how it works. Is there any other straightforward documentation?

      thanks
      Last edited by totalnew; 04-27-2009, 09:27 AM.


      • #4
        Originally posted by totalnew
        Thanks, nilshomer! I read Heng's paper about MAQ and the SOAP paper, but the PE parts are rather short and didn't make it clear to me how it works. Is there any other straightforward documentation?

        thanks

        There isn't too much to say about paired-end alignment: just align the ends independently. PM me if you want a draft of my own alignment paper.


        • #5
          Actually, there is a lot to say about paired-end mapping. This is where the accuracy of different aligners differs. The algorithms can be classified into four groups.

          a) Eland-like strategy. Eland first finds up to 10 equally best hits for each end and then checks which pair (10x10 in total) is consistent. SSAHA2 uses a similar strategy, but looks at more top hits.

          b) SOAP-like strategy. SOAP finds almost all the hits and then pairs them. I do not know whether it may map a read to a suboptimal position if its mate is nearby. I am sure SOAP-2.0.1 and BWA do this if necessary. You can say a) and b) are essentially the same, but only b) is useful for anchoring reads in repeats.

          c) MAQ-like strategy. MAQ does not find all the single-end hits first. It pairs the reads while doing the alignment. For programs that index reads, this strategy is more effective and efficient than collecting all the single-end hits first.

          d) We can map one end first and then do local alignment around the region pointed to by the mapped read. This strategy is usually combined with the previous ones. I believe NovoAlign/MAQ/BWA use this strategy as a complement to other strategies.

          For short reads, proper pairing increases the coverage of the genome and substantially reduces false alignments.
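Strategies (a) and (b) can be illustrated with a short Python sketch. This is not Eland's or SOAP's actual code: it only shows the "collect hits per end, then test every combination for consistency" idea. The `Hit` record, the function names, the FR orientation requirement, and the use of a summed mismatch count as the pair score are all assumptions made for illustration.

```python
from collections import namedtuple
from itertools import product

# Hypothetical single-end hit: 0-based leftmost position, '+' or '-' strand.
Hit = namedtuple("Hit", "chrom pos strand length mismatches")


def insert_size(a, b):
    """Outer fragment length if a and b could form an FR pair, else None."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return None
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    return rev.pos + rev.length - fwd.pos


def best_consistent_pair(hits1, hits2, min_insert, max_insert, max_hits=10):
    """Try up to max_hits x max_hits combinations and return the consistent
    pair with the fewest total mismatches, or None if no pair is consistent."""
    best = None
    for a, b in product(hits1[:max_hits], hits2[:max_hits]):
        size = insert_size(a, b)
        if size is None or not (min_insert <= size <= max_insert):
            continue
        cost = a.mismatches + b.mismatches
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return None if best is None else (best[1], best[2])


read1_hits = [Hit("chr1", 1000, "+", 50, 0), Hit("chr2", 5000, "+", 50, 0)]
read2_hits = [Hit("chr3", 700, "-", 50, 0), Hit("chr1", 1250, "-", 50, 1)]
print(best_consistent_pair(read1_hits, read2_hits, 200, 400))
```

In this toy input the second read's best single-end hit (chr3, 0 mismatches) cannot be paired, so the suboptimal chr1 hit is chosen because its mate anchors it. Restricting the hit lists to equally best hits only, as in (a), would miss that pair; that is the practical difference from (b).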


          • #6
            Originally posted by lh3
            For short reads, proper pairing increases the coverage of the genome and substantially reduces false alignments.

            The above is exactly right, although it may depend on the exact experiment. For example, in cancer sequencing we expect many translocations or large-scale rearrangements, and preferring "paired reads" may reduce our power.

            In general, if an aligner produces all hits for each end, any post-alignment filtering is possible (all the above classes). Of course, some limit must be placed on the number of hits returned (thousands is overkill), since my 4-petabyte array of solid-state hard drives has yet to arrive in the mail.


            • #7
              On the contrary, my experience is that detecting structural variations (SVs) particularly calls for highly effective pairing. In the real world, abnormal pairs are most likely to be caused by false alignments rather than true SVs, which is also true for cancer genomes. And if a read can be paired with its mate, the alignment tends to be correct. I know several groups working on SV detection put a lot of effort into getting more reads paired.

              Whether to keep all hits is debatable. Surely we can recover anything, but the cost is considerable. How to use them effectively for SV detection is also an open question, I think. In addition, for effective pairing, neither keeping thousands of hits nor keeping only equally best hits is good enough. It is important to see sufficient suboptimal hits. NovoAlign is the most accurate aligner mainly because it sees many suboptimal hits and achieves the highest pairing fraction.

              Alignment accuracy is not so important for resequencing, but it is one of the most important factors for SV detection.
              Last edited by lh3; 04-27-2009, 12:20 PM.


              • #8
                Thanks a lot, nils & Heng! But I still need time to digest what you have mentioned above.


                • #9
                  What I don't follow is this: if you align each end separately, you will get the highest pairing fraction, since you are at your most sensitive in this case (fewer constraints; in fact, no constraints between the ends). Furthermore, using one end to infer hits for the other can also increase sensitivity.

                  In my own experience, if one is sensitive enough, potentially false SVs (due to mapping) can be eliminated by examining the secondary hits for each end and seeing whether there exists a pair of alignments, one for each end, that falls within the expected insert size distribution and is not too much worse than the "best pair". Is this what you are talking about? If so, then we agree.
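The check described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: hits are plain tuples, FR orientation is required, the summed mismatch count stands in for an alignment score, the "best pair" is taken to be the best single-end hit of each end, and the function name and `max_extra_mismatches` threshold are hypothetical.

```python
from itertools import product


def concordant_alternative_exists(hits1, hits2, min_insert, max_insert,
                                  max_extra_mismatches=2):
    """Return True if some pair of hits (including secondary hits) is
    concordant and not much worse than the best single-end hits combined.

    Each hit is a tuple (chrom, pos, strand, length, mismatches) with a
    0-based leftmost position. Both hit lists must be non-empty.
    """
    best_cost = min(h[4] for h in hits1) + min(h[4] for h in hits2)
    for a, b in product(hits1, hits2):
        if a[0] != b[0] or a[2] == b[2]:
            continue  # different chromosome or same strand: not an FR pair
        fwd, rev = (a, b) if a[2] == "+" else (b, a)
        size = rev[1] + rev[3] - fwd[1]  # outer fragment length
        extra = a[4] + b[4] - best_cost
        if min_insert <= size <= max_insert and extra <= max_extra_mismatches:
            return True
    return False


# The best hits suggest a chr1/chr5 translocation, but a secondary hit for
# read 2 on chr1 gives a concordant pair at the cost of one extra mismatch.
read1_hits = [("chr1", 1000, "+", 50, 0)]
read2_hits = [("chr5", 9000, "-", 50, 0), ("chr1", 1250, "-", 50, 1)]
print(concordant_alternative_exists(read1_hits, read2_hits, 200, 400))  # True
```

A pair for which this returns True would be treated as a likely mapping artifact rather than evidence for an SV. How strict the threshold should be depends on the experiment.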

                  I would take exception to NovoAlign being the most accurate, since this is conditional on sensitivity, as well as on the many definitions of "accuracy".

                  Finally, I think you and I have a fundamental disagreement about what an aligner should do. I think it should return all hits for a given read that it can find (sensitivity), and let the user filter or choose the best alignment or alignment pair based on their experiment. I would prefer gapped Smith-Waterman, but this could vary based on the experiment. Given this, I agree to disagree.

                  The aligner is but one step in the whole process, and not everything should be lumped into the alignment algorithm.
