SEQanswers

Old 05-28-2010, 06:37 AM   #1
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
TopHat SAM - Expressing Paired End Multi-reads

Hi,

I've been struggling to understand exactly how TopHat represents multi-reads in its output SAM file, especially for paired-end reads. I've done some background reading but haven't been able to clear things up; hopefully someone can help.

Let's say TopHat is considering a pair with ends A and B, and it finds two alignment combinations that make sense (i.e. where both ends map to opposite strands of the same chromosome, at an expected distance apart):

A->P1
B->P2

A->P3
B->P4

How are these represented in the SAM? The way I understand it at the moment, there will be 4 lines in the SAM, one for each alignment. However, I can't see how the knowledge that A->P1 and B->P2 are associated with each other in an important way is represented.

Put another way, I can't see how you could take the SAM record A->P1 and "find" the corresponding sibling B->P2 while recognizing that B->P4 is not the correct sibling.

If this information is in fact lost, does this mean that SAM is not expressive enough to capture ambiguous alignments for paired-end reads? And won't it mean that downstream processors, e.g. Cufflinks, will not have access to important information that was originally available?

Apologies for the long-winded question!

thanks for your time,
Old 05-28-2010, 07:16 AM   #2
Thomas Doktor
Senior Member
 
Location: University of Southern Denmark (SDU), Denmark

Join Date: Apr 2009
Posts: 105

I believe the position of the mate is stored in field 8 and the distance between mates in field 9 of the SAM format, so SAM should hold enough information to correctly match P1 with P2 and P3 with P4. From a brief look at my own SAM files produced by TopHat, it seems TopHat does not fill in field 9, but the mate position is reported in field 8 (if the mate is mapped).
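
As a rough illustration of the matching described above, here is a minimal Python sketch that pairs records of the same fragment by checking that each record's mate position (field 8, PNEXT in the current SAM spec, with the mate reference in field 7) points at the other record. The read name, coordinates, sequences and the `pair_up` helper are all made up for the example; field 9 is left at 0 to mirror the observation that TopHat does not seem to fill it in. It is not taken from TopHat or Cufflinks.

```python
from collections import namedtuple

# Only the SAM columns needed for mate matching (1-based fields 1,2,3,4,7,8,9).
Rec = namedtuple("Rec", "qname flag rname pos rnext pnext tlen")

def parse_sam_line(line):
    f = line.rstrip("\n").split("\t")
    # "=" in field 7 means the mate is on the same reference as this record.
    rnext = f[2] if f[6] == "=" else f[6]
    return Rec(f[0], int(f[1]), f[2], int(f[3]), rnext, int(f[7]), int(f[8]))

def pair_up(records):
    """Pair first-in-pair (0x40) with second-in-pair (0x80) records whose
    mate fields point at each other. Hypothetical helper for illustration."""
    firsts = [r for r in records if r.flag & 0x40]
    seconds = [r for r in records if r.flag & 0x80]
    pairs = []
    for a in firsts:
        for b in seconds:
            if (a.qname == b.qname
                    and a.rnext == b.rname and a.pnext == b.pos
                    and b.rnext == a.rname and b.pnext == a.pos):
                pairs.append((a, b))
    return pairs

# Invented records: A->P1 (100), B->P2 (300), A->P3 (5000), B->P4 (5200).
SAM_LINES = [
    "frag1\t99\tchr1\t100\t1\t4M\t=\t300\t0\tACGT\tIIII",
    "frag1\t147\tchr1\t300\t1\t4M\t=\t100\t0\tACGT\tIIII",
    "frag1\t99\tchr1\t5000\t1\t4M\t=\t5200\t0\tACGT\tIIII",
    "frag1\t147\tchr1\t5200\t1\t4M\t=\t5000\t0\tACGT\tIIII",
]

records = [parse_sam_line(line) for line in SAM_LINES]
pairs = pair_up(records)
for a, b in pairs:
    print(f"{a.qname}: {a.rname}:{a.pos} <-> {b.rname}:{b.pos}")
```

With these four invented lines, the record at position 100 is matched only with the one at 300, and 5000 only with 5200, because the mate fields of the other combinations do not point at each other.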
Old 05-28-2010, 08:07 AM   #3
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46

Thanks Thomas.

Perhaps there is still a certain amount of ambiguity in some cases, e.g.:

A->P1
B->P2

A->P3
B->P2 (B is the same in both)

In this scenario, presumably TopHat would output two B->P2 SAM records, each with a different field 8?

If this is the case, field 8 of the A->P1 SAM record is now ambiguous (since it refers equally well to both B->P2s)?
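
To make the scenario concrete, here is a small Python sketch. It assumes, as guessed above, that two B->P2 records are written, each with a different field 8; whether TopHat really does this is not confirmed here. The positions and the two helper functions are invented for the example. Following field 8 from A->P1 alone does land on both B->P2 records, but also requiring that the B record's field 8 points back at A's position leaves a single candidate.

```python
# Hypothetical records for one fragment, all on the same chromosome.
# "pnext" stands for SAM field 8 (position of the mate).
records = [
    {"mate": "A", "pos": 100, "pnext": 300},  # A->P1
    {"mate": "A", "pos": 150, "pnext": 300},  # A->P3
    {"mate": "B", "pos": 300, "pnext": 100},  # B->P2, written for the pair with A->P1
    {"mate": "B", "pos": 300, "pnext": 150},  # B->P2, written for the pair with A->P3
]

def forward_candidates(rec, records):
    """Records of the other end located where rec's field 8 points."""
    return [r for r in records
            if r["mate"] != rec["mate"] and r["pos"] == rec["pnext"]]

def reciprocal_candidates(rec, records):
    """Forward candidates whose own field 8 points back at rec."""
    return [r for r in forward_candidates(rec, records)
            if r["pnext"] == rec["pos"]]

a_p1 = records[0]
one_way = forward_candidates(a_p1, records)
two_way = reciprocal_candidates(a_p1, records)

print("following field 8 only:", len(one_way), "candidate(s)")
print("with reciprocal check:", len(two_way), "candidate(s)")
```

Under these assumptions the ambiguity only exists when field 8 is followed in one direction; it would come back if the two B->P2 records carried the same field 8.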

I wonder how Cufflinks etc. process these kinds of scenarios - perhaps they don't need to deconvolute pairs?

Thanks
Tags
multireads, paired end, sam, tophat

All times are GMT -8.