#1 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
Hi,
I've been struggling to understand exactly how TopHat represents multi-reads in its output SAM file, especially in the context of paired-end reads. I've done some background reading, but I haven't been able to clear things up. Hopefully someone can help.

Let's say TopHat is considering a pair with ends A and B, and it finds two alignment combinations that make sense (i.e. where both ends map to opposite strands of the same chromosome, at an expected distance apart):

A->P1 B->P2
A->P3 B->P4

How are these represented in the SAM? The way I understand it at the moment, there will be 4 lines in the SAM, one for each alignment. However, I can't see how the knowledge that A->P1 and B->P2 are associated with each other in an important way is represented. Put another way, I can't see how you could take the SAM record A->P1 and "find" the corresponding sibling B->P2 while recognizing that B->P4 is not the correct sibling.

If this information is in fact lost, does this mean that SAM is not expressive enough to capture ambiguous alignments for paired-end reads? And won't it mean that downstream processors, e.g. Cufflinks, will not have access to important information that was originally available?

Apologies for the long-winded question!

Thanks for your time,
#2 |
Senior Member
Location: University of Southern Denmark (SDU), Denmark Join Date: Apr 2009
Posts: 105
I believe the position of the mate is contained in field 8 (PNEXT) of the SAM format and the distance between mates in field 9 (TLEN), so SAM should be able to hold enough information to correctly match P1 with P2 and P3 with P4.

Briefly looking at my own SAM files produced by TopHat, it seems TopHat does not use field 9, but the mate position is reported in field 8 (if the mate is mapped).
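To make the field 8 idea concrete, here is a minimal Python sketch of how one might follow the mate pointer from each A record to its B record. The four SAM lines, the read name `pair1`, the positions standing in for P1..P4, and the helper names `parse_sam_line` and `find_mates` are all made up for illustration; this is not taken from TopHat output, and field 9 is set to 0 only to mirror the observation above.

```python
from collections import namedtuple

# Only the SAM columns needed for mate matching (1-based columns 1, 2, 3, 4, 7, 8).
Aln = namedtuple("Aln", "qname flag rname pos rnext pnext")

def parse_sam_line(line):
    f = line.rstrip("\n").split("\t")
    rname = f[2]
    # In SAM, "=" in column 7 means "same reference as this record".
    rnext = rname if f[6] == "=" else f[6]
    return Aln(f[0], int(f[1]), rname, int(f[3]), rnext, int(f[7]))

def find_mates(records):
    """For each first-in-pair record, look up the second-in-pair record
    that sits at the mate reference and position it points to."""
    by_locus = {}
    for r in records:
        by_locus.setdefault((r.qname, r.rname, r.pos), []).append(r)
    pairs = []
    for r in records:
        if r.flag & 0x40:  # 0x40 = first segment in the template
            for m in by_locus.get((r.qname, r.rnext, r.pnext), []):
                if m.flag & 0x80:  # 0x80 = last segment in the template
                    pairs.append((r.pos, m.pos))
    return pairs

# Hypothetical records: P1=1000, P2=1200, P3=5000, P4=5200 on chr1.
sam_lines = [
    "pair1\t99\tchr1\t1000\t3\t50M\t=\t1200\t0\t*\t*",   # A->P1, mate at P2
    "pair1\t147\tchr1\t1200\t3\t50M\t=\t1000\t0\t*\t*",  # B->P2, mate at P1
    "pair1\t355\tchr1\t5000\t3\t50M\t=\t5200\t0\t*\t*",  # A->P3, mate at P4
    "pair1\t403\tchr1\t5200\t3\t50M\t=\t5000\t0\t*\t*",  # B->P4, mate at P3
]
records = [parse_sam_line(line) for line in sam_lines]
print(find_mates(records))
```

With these made-up records, A at P1 is matched to B at P2 and A at P3 to B at P4, because each A record's mate position selects exactly one B record.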
#3 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
Thanks Thomas.
Perhaps there is still a certain amount of ambiguity in some cases, e.g.:

A->P1 B->P2
A->P3 B->P2 (B is the same in both)

In this scenario, presumably TopHat would output two B->P2 SAM records, each with a different field 8? If this is the case, field 8 of the A->P1 SAM record is now ambiguous (since it refers equally well to both B->P2s)?

I wonder how Cufflinks etc. process these kinds of scenarios. Perhaps they don't need to deconvolute pairs?

Thanks
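One way to reason about the shared-B scenario is a reciprocal check: accept a pairing only when A's mate position equals B's position and B's mate position equals A's position. The Python sketch below assumes TopHat really does emit two B->P2 records with different field 8 values, which is the guess made above and not something confirmed here. The tuples, positions, and the function name `reciprocal_pairs` are hypothetical, and everything is assumed to lie on one chromosome.

```python
# Hypothetical simplified records: (read_name, end, pos, mate_pos).
# P1=1000, P2=1200, P3=5000; both B records sit at P2.
records = [
    ("pair1", "A", 1000, 1200),  # A->P1, field 8 points at P2
    ("pair1", "B", 1200, 1000),  # B->P2, field 8 points at P1
    ("pair1", "A", 5000, 1200),  # A->P3, field 8 points at P2
    ("pair1", "B", 1200, 5000),  # B->P2 again, field 8 points at P3
]

def reciprocal_pairs(records):
    """Return index pairs (i, j) where record i (end A) and record j (end B)
    point at each other's positions."""
    pairs = []
    for i, a in enumerate(records):
        if a[1] != "A":
            continue
        for j, b in enumerate(records):
            same_read = a[0] == b[0] and b[1] == "B"
            a_points_to_b = a[3] == b[2]
            b_points_to_a = b[3] == a[2]
            if same_read and a_points_to_b and b_points_to_a:
                pairs.append((i, j))
    return pairs

print(reciprocal_pairs(records))
```

Under these assumptions, field 8 of A->P1 alone matches both B records, but only one B record points back at P1, so checking both directions separates the two pairings.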
Tags: multireads, paired end, sam, tophat