SEQanswers

10-23-2010, 08:09 PM   #1
kentnf
Member
 
Location: Ithaca

Join Date: Jan 2009
Posts: 26
Different results produced by tophat?

I used TopHat to map my strand-specific RNA-seq reads, and separately their reverse complements, to the genome, but the two results differ slightly.

For example, one sample has 5,700,000 reads after sequencing, and TopHat mapped 5,000,000 of them to the genome. For the reverse-complemented reads, about 4,998,000 were mapped.

Command I used:
$ tophat -o output1 --segment-mismatches 0 genome_index RNA-seq.fa

Then I compared the mapped reads in the two results. 50,000 reads in result A (5,000,000 mapped, original strand) are not present in result B (4,998,000 mapped, reverse-complemented strand). Another 50,000 reads are present in result B but not in result A.

Can anyone tell me why the results are different? Thanks.
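For anyone who wants to repeat this kind of comparison, here is a minimal sketch in Python. It assumes both runs produce SAM text and that the reverse-complemented reads keep the same read names as the originals; the function names and the toy records are made up for illustration and are not part of TopHat.

```python
def mapped_read_ids(sam_lines):
    """Collect the names of reads that have at least one mapped alignment."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:               # skip malformed or empty lines
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # SAM flag bit 0x4: read is unmapped
            continue
        ids.add(fields[0])
    return ids


def compare_runs(sam_a, sam_b):
    """Return read names unique to each run and the names shared by both."""
    a = mapped_read_ids(sam_a)
    b = mapped_read_ids(sam_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}


# Toy records (hypothetical), standing in for the two output files.
run_a = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
run_b = [
    "r1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII",
]

diff = compare_runs(run_a, run_b)
print(len(diff["only_a"]), len(diff["only_b"]), len(diff["both"]))
```

With real files, passing open file handles instead of lists should work the same way, since the functions only iterate over lines.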
10-27-2010, 09:39 AM   #2
Daehwan
Member
 
Location: College Park

Join Date: Oct 2010
Posts: 27

I think it's probably due to quality values: the quality of each base usually decreases from left to right along a read. So if we reverse-complement a read, the read will have low quality values in its first several bases, and it is therefore less likely to be mapped to the genome. You could check this by running Bowtie on the original reads and on the reverse-complemented reads, since TopHat is based on Bowtie.
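To make the point about quality order concrete, here is a small Python sketch of reverse-complementing a read together with its quality string. The function name and the example read are hypothetical; the only idea shown is that the quality string is reversed along with the sequence, so the low-quality tail ends up at the start of the read.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq, qual=None):
    """Reverse-complement a sequence; reverse its quality string if given."""
    rc_seq = seq.translate(_COMPLEMENT)[::-1]
    rc_qual = qual[::-1] if qual is not None else None
    return rc_seq, rc_qual


# Hypothetical read whose quality drops toward the right-hand end.
seq = "ACGGTTA"
qual = "IIIHG5#"

rc_seq, rc_qual = reverse_complement(seq, qual)
print(rc_seq, rc_qual)
```

For FASTA input there is no quality string, so `qual` can simply be left out.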

As I understand it, Bowtie allows 0 to 3 mismatches in the seed of a read, which is the first 28 bp by default, and allows additional mismatches in the rest of the read depending on the sum of the quality values.
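A rough Python sketch of that kind of policy is below. It is not Bowtie's code. The function name is made up, the defaults of 2 seed mismatches and a quality-sum cap of 70 are assumptions on my part, and summing the qualities over all mismatched positions (rather than only those outside the seed) is a simplification. It only illustrates why the same read can pass in one orientation and fail in the other: the seed is taken from the left end of the read as given.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def passes_policy(read, ref, quals, seed_len=28, max_seed_mm=2, max_qual_sum=70):
    """Check an ungapped alignment against a seed-mismatch / quality-sum rule.

    seed_len follows the 28 bp mentioned above; max_seed_mm and max_qual_sum
    are assumed values for illustration.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    mismatches = [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]
    seed_mm = sum(1 for i in mismatches if i < seed_len)
    qual_sum = sum(quals[i] for i in mismatches)
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum


# Hypothetical 40 bp read with three low-quality mismatches near its right end.
ref = "ACGT" * 10
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
bad_positions = (35, 37, 39)
read = "".join(swap[b] if i in bad_positions else b for i, b in enumerate(ref))
quals = [10 if i in bad_positions else 40 for i in range(len(ref))]

# Original orientation: the mismatches lie outside the 28 bp seed.
forward_ok = passes_policy(read, ref, quals)
# Reverse-complemented read: the same mismatches now fall inside the seed.
reverse_ok = passes_policy(revcomp(read), revcomp(ref), quals[::-1])
print(forward_ok, reverse_ok)
```

In this toy case the original orientation is accepted and the reverse-complemented one is rejected, purely because of where the seed sits.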


Regarding the 50,000-read difference in each direction between results A and B: TopHat divides unmapped reads into several segments to find novel junctions. Depending on the read length and the segment length, the set of segments from the original reads can differ from the set of segments from the reverse-complemented reads, which yields different sets of putative junctions.
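Here is a simplified Python sketch of why the segment sets can differ. It assumes segments of 25 bp are cut from the left end of the read as given, with any leftover bases folded into the last segment; the real splitting rules in TopHat may differ, and both function names are made up. Segment boundaries for the reverse-complemented read are mapped back to the coordinates of the original read so the two can be compared.

```python
def segment_bounds(read_length, segment_length=25):
    """Cut a read into segments from its left end; leftover goes to the last one."""
    n = max(1, read_length // segment_length)
    bounds = [(i * segment_length, (i + 1) * segment_length) for i in range(n)]
    bounds[-1] = (bounds[-1][0], read_length)   # absorb the remainder
    return bounds


def revcomp_bounds_in_original_coords(read_length, segment_length=25):
    """Segments of the reverse-complemented read, in original-read coordinates."""
    return sorted(
        (read_length - end, read_length - start)
        for start, end in segment_bounds(read_length, segment_length)
    )


for length in (50, 60):
    original = segment_bounds(length)
    flipped = revcomp_bounds_in_original_coords(length)
    print(length, original, flipped, original == flipped)
```

Under these assumptions a 50 bp read gives the same segments in both orientations, while a 60 bp read does not, because the leftover 10 bp end up attached to opposite ends.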

Apart from this, the new version of TopHat at <http://tophat.cbcb.umd.edu/index.html> supports strand-specific RNA-Seq, if you want to try it.

Thanks,
Daehwan

Last edited by Daehwan; 10-27-2010 at 10:06 AM.
10-29-2010, 09:35 AM   #3
kentnf
Member
 
Location: Ithaca

Join Date: Jan 2009
Posts: 26

Thank you Daehwan.

Contamination had already been removed before I mapped the reads to the genome with TopHat, so I used FASTA as the TopHat input. I then used Bowtie to map the forward and reverse reads to the genome, and the results were the same.

After checking the CIGAR field of the SAM file, I think you are right. All of the reads that differ between the forward and reverse results are unmapped reads. When I set the -a parameter to 4, the number of differing reads decreased.
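One way to do this kind of CIGAR check is sketched below in Python. It assumes the thing of interest is whether an alignment is spliced, which in SAM shows up as an N operation (a skipped region on the reference); what exactly was inspected in my files is not shown here, and the function names are made up.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs; '*' means none."""
    if cigar == "*":
        return []
    ops = _CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError("malformed CIGAR: " + cigar)
    return [(int(n), op) for n, op in ops]


def is_spliced(cigar):
    """True if the alignment skips a reference region (N), e.g. across an intron."""
    return any(op == "N" for _, op in parse_cigar(cigar))


for cigar in ("36M", "20M500N16M", "*"):
    print(cigar, is_spliced(cigar))
```

Applied to column 6 of each SAM record, this separates junction-spanning alignments from contiguous ones.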
Quote:
Originally Posted by Daehwan
Regarding each 50,000 seq difference between the results A and B, TopHat divides unmapped reads into several segments to find novel junctions, and a set of segments from the original reads can be different from that of segments from the reverse-complemented reads depending on the read length and the segment length, rendering different set of putative junctions.
I will also try the new version of TopHat. Thanks again.
All times are GMT -8.

