SEQanswers

04-24-2009, 05:04 PM   #1
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
How does paired-end alignment work?

Hello, all

I am completely new to bioinformatics. I have started reading papers about alignment, but I just can't picture how paired-end alignment works. I've been searching around on Google, but I couldn't find a tutorial or paper that explains it clearly.

What I need is a tutorial with some simple examples of paired-end (PE) alignment. Can anyone provide a link? Thanks a lot.
04-24-2009, 08:58 PM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by totalnew View Post
Hello, all

I am completely new to bioinformatics. I have started reading papers about alignment, but I just can't picture how paired-end alignment works. I've been searching around on Google, but I couldn't find a tutorial or paper that explains it clearly.

What I need is a tutorial with some simple examples of paired-end (PE) alignment. Can anyone provide a link? Thanks a lot.
Check out papers for aligners like MAQ or SOAP. Most aligners align each end independently, or at most consider using one end and the expected insert size to infer the location of the other (to be more sensitive).
04-27-2009, 10:21 AM   #3
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46

Thanks, nilshomer! I read Heng's paper on MAQ and the SOAP paper, but their PE sections are rather short and didn't make it clear to me how it works. Is there any other straightforward documentation?

thanks

Last edited by totalnew; 04-27-2009 at 10:27 AM.
04-27-2009, 12:21 PM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by totalnew View Post
Thanks, nilshomer! I read Heng's paper on MAQ and the SOAP paper, but their PE sections are rather short and didn't make it clear to me how it works. Is there any other straightforward documentation?

thanks
There isn't much to say about paired-end alignment: just align the ends independently. PM me if you want a draft of my own alignment paper.
04-27-2009, 12:45 PM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

Actually, there is a lot to say about paired-end mapping. This is where different aligners differ in accuracy. The algorithms can be classified into four groups.

a) Eland-like strategy. Eland first finds up to 10 equally best hits for each end and then checks which of the pairs (10x10 in total) are consistent. SSAHA2 uses a similar strategy but considers more top hits.

b) SOAP-like strategy. SOAP finds almost all the hits and then pairs them. I do not know whether it may map a read to a suboptimal position if its mate maps nearby, but I am sure SOAP-2.0.1 and BWA do this when necessary. You could say a) and b) are essentially the same, but only b) is useful for anchoring reads in repeats.

c) MAQ-like strategy. MAQ does not find all the single-end hits first; it pairs the reads while doing the alignment. For programs that index the reads rather than the genome, this strategy is more effective and efficient than collecting all the single-end hits first.

d) We can map one end first and then do a local alignment of the other end around the region indicated by the mapped read. This strategy is usually combined with the previous ones. I believe NovoAlign, MAQ and BWA use it to complement their other strategies.
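To make strategies a) and b) concrete, here is a toy Python sketch; it is not Eland's, SSAHA2's or SOAP's actual code. It assumes the hits for each end are already known and stored in a hypothetical `Hit` record, with a hypothetical fixed `READ_LEN`, Illumina-style forward/reverse (FR) orientation and a user-chosen insert-size range. With `equally_best_only=True` only the equally best hits of each end are paired (Eland-like, capped at `max_hits`); with `equally_best_only=False` every hit is paired (SOAP-like).

```python
from itertools import product
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical minimal record for one alignment of one read end."""
    chrom: str
    pos: int       # 0-based leftmost reference coordinate
    reverse: bool  # True if the end aligned to the reverse strand
    score: int     # alignment score, higher is better


READ_LEN = 36  # hypothetical fixed read length


def concordant(a, b, min_isize, max_isize):
    """FR pair on one chromosome whose outer distance lies in the given range."""
    if a.chrom != b.chrom or a.reverse == b.reverse:
        return False
    left, right = sorted((a, b), key=lambda h: h.pos)
    isize = right.pos + READ_LEN - left.pos  # outer distance of the fragment
    return not left.reverse and min_isize <= isize <= max_isize


def candidate_hits(hits, equally_best_only, max_hits=None):
    """Hits one end contributes to pairing, best score first."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if equally_best_only and ranked:
        ranked = [h for h in ranked if h.score == ranked[0].score]
    return ranked[:max_hits]  # max_hits=None keeps everything


def pair_hits(hits1, hits2, min_isize, max_isize, equally_best_only, max_hits=None):
    """Test every combination of candidate hits; best combined score first."""
    c1 = candidate_hits(hits1, equally_best_only, max_hits)
    c2 = candidate_hits(hits2, equally_best_only, max_hits)
    pairs = [(a, b) for a, b in product(c1, c2)
             if concordant(a, b, min_isize, max_isize)]
    pairs.sort(key=lambda p: p[0].score + p[1].score, reverse=True)
    return pairs


# End 1 is unique; end 2 sits in a repeat, so its best hit is on chr5 and
# the placement next to its mate on chr1 scores slightly lower.
end1 = [Hit("chr1", 1000, False, 72)]
end2 = [Hit("chr5", 50000, True, 72), Hit("chr1", 1264, True, 66)]

print(pair_hits(end1, end2, 200, 400, equally_best_only=True, max_hits=10))  # -> []
print(pair_hits(end1, end2, 200, 400, equally_best_only=False))  # -> chr1:1000 with chr1:1264
```

In this made-up example the Eland-like mode finds no consistent pair, because the only hit it keeps for end 2 is the repeat copy on chr5, whereas pairing all hits recovers the slightly worse chr1 placement next to the mate. Testing every combination with `product` is only for clarity; with many hits per end one would rather sort the hits by coordinate and sweep.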

For short reads, proper pairing increases the coverage of the genome and substantially reduces false alignments.
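Strategy d), often called mate rescue, can be sketched as follows. Assume one end is anchored on the forward strand at `anchor_pos`; under FR orientation the mate must then lie on the reverse strand inside a window set by the insert-size range, so we reverse-complement the mate and run a local (Smith-Waterman) alignment against that window only. The scoring values, the `min_score` threshold and all function names are illustrative assumptions, and this plain O(n·m) version is only meant to show the idea.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def smith_waterman(query, target, match=2, mismatch=-3, gap=-5):
    """Local alignment with linear gaps; returns (best score, exclusive end in target)."""
    best, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], j
        prev = cur
    return best, best_end


def rescue_mate(reference, anchor_pos, mate_seq, min_isize, max_isize, min_score):
    """Search for the mate of a forward-strand end anchored at anchor_pos.

    Under FR orientation the mate is on the reverse strand and its last base
    ends at anchor_pos + insert size, so only a small window is aligned.
    Returns (exclusive end of the mate on the reference, score) or None.
    """
    lo = max(0, anchor_pos + min_isize - len(mate_seq))
    hi = min(len(reference), anchor_pos + max_isize)
    score, end = smith_waterman(revcomp(mate_seq), reference[lo:hi])
    if score < min_score:
        return None
    return lo + end, score


# Simulate one fragment of 300 bp starting at position 500 of a random reference.
random.seed(0)
ref = "".join(random.choice("ACGT") for _ in range(2000))
anchor, read_len, true_isize = 500, 36, 300
mate = revcomp(ref[anchor + true_isize - read_len: anchor + true_isize])
print(rescue_mate(ref, anchor, mate, 200, 400, min_score=50))
```

Only the window next to the anchored end is aligned, a few thousand matrix cells here instead of a genome-wide search, which is why this step is cheap enough to use as a complement to the other strategies.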
04-27-2009, 12:54 PM   #6
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by lh3 View Post
For short reads, proper pairing increases the coverage of the genome and substantially reduces false alignments.
The above is exactly right, although it may depend on the exact experiment. For example, in cancer sequencing we expect many translocations or other large-scale rearrangements, and preferring properly paired reads may reduce our power to detect them.

In general, if an aligner produces all hits for each end, any post-alignment filtering is possible (covering all of the classes above). Of course, some limit must be placed on the number of hits returned (thousands is overkill), since my 4-petabyte array of solid-state drives has yet to arrive in the mail.
04-27-2009, 01:15 PM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

On the contrary, in my experience detecting structural variations (SVs) particularly demands highly effective pairing. In the real world, abnormal pairs are more likely to be caused by false alignments than by true SVs, and this is also true for cancer genomes. If a read can be paired with its mate, the alignment tends to be correct. I know several groups working on SV detection who put a lot of effort into getting more reads paired.

Whether to keep all hits is debatable. Surely we can recover anything that way, but the cost is considerable, and how to use all those hits effectively for SV detection is, I think, still an open question. In addition, for effective pairing, keeping thousands of hits, or keeping only the equally best hits, is not good enough: it is important to see sufficient suboptimal hits. NovoAlign is the most accurate aligner mainly because it sees many suboptimal hits and achieves the highest pairing fraction.

Alignment accuracy is not so important for resequencing, but it is one of the most important factors for SV detection.

Last edited by lh3; 04-27-2009 at 01:20 PM.
04-27-2009, 01:41 PM   #8
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46

Thanks a lot, nils & Heng! But I still need time to digest what you have mentioned above.
04-27-2009, 01:46 PM   #9
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

What I don't follow is this: if you align each end separately, you will get the highest pairing fraction, since you are most sensitive in this case (fewer constraints, in fact no constraints between the two ends). Furthermore, using one end to infer hits for the other can also increase sensitivity.

In my own experience, if the aligner is sensitive enough, potentially false SVs (due to mapping errors) can be eliminated by examining the secondary hits for each end and checking whether there is a pair of alignments, one per end, that falls within the expected insert-size distribution and is not much worse than the "best pair". Is this what you are talking about? If so, then we agree.
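The check described above might look like this minimal sketch. It assumes every end comes with its secondary hits as hypothetical `Hit` records, FR orientation, an insert-size distribution summarised by a mean and standard deviation (accepting mean +/- 3 sd is an arbitrary choice here), and a hypothetical `max_penalty` for how much worse than the best independent placement of the two ends a concordant pair may score. A returned pair suggests the apparent SV is a mapping artifact; `None` means the ends still look discordant.

```python
from itertools import product
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical minimal record for one alignment of one read end."""
    chrom: str
    pos: int       # 0-based leftmost reference coordinate
    reverse: bool  # True if the end aligned to the reverse strand
    score: int     # alignment score, higher is better


READ_LEN = 36  # hypothetical fixed read length


def fits_insert_size(a, b, mean, sd, k=3.0):
    """FR pair on one chromosome with outer distance within mean +/- k*sd."""
    if a.chrom != b.chrom or a.reverse == b.reverse:
        return False
    left, right = sorted((a, b), key=lambda h: h.pos)
    isize = right.pos + READ_LEN - left.pos
    return not left.reverse and abs(isize - mean) <= k * sd


def rescue_discordant(hits1, hits2, mean, sd, max_penalty):
    """Best concordant pair scoring within max_penalty of the best
    independent placement of the two ends, or None if there is none."""
    best_independent = max(h.score for h in hits1) + max(h.score for h in hits2)
    candidates = [
        (a, b) for a, b in product(hits1, hits2)
        if fits_insert_size(a, b, mean, sd)
        and a.score + b.score >= best_independent - max_penalty
    ]
    return max(candidates, key=lambda p: p[0].score + p[1].score, default=None)


# The best hits suggest a chr1-chr7 translocation, but end 2 also has a
# secondary chr1 hit that would give a 300 bp insert with end 1.
end1 = [Hit("chr1", 1000, False, 72)]
end2 = [Hit("chr7", 20000, True, 72), Hit("chr1", 1264, True, 66)]

print(rescue_discordant(end1, end2, mean=300, sd=30, max_penalty=10))  # chr1 pair
print(rescue_discordant(end1, end2, mean=300, sd=30, max_penalty=4))   # None
```

Keeping only the best hit per end would discard the secondary chr1 hit and report a spurious translocation, which is why how many suboptimal hits the aligner reports matters for this kind of filtering.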

I would take exception to Novoalign being the most accurate, since that depends on sensitivity as well as on which of the many definitions of "accuracy" is used.

Finally, I think you and I fundamentally disagree about what an aligner should do. I think it should return all hits it can find for a given read (sensitivity), and let the user filter or choose the best alignment or alignment pair based on their experiment. I would prefer gapped Smith-Waterman, but this could vary by experiment. Given this, let's agree to disagree.

The aligner is but one step in the whole process, and not everything should be lumped into the alignment algorithm.

All times are GMT -8.