SEQanswers

kentnf 10-23-2010 08:09 PM

Different results produced by tophat?
 
I used TopHat to map my strand-specific RNA-seq reads, and separately their reverse complements, to the genome, but the two results differ slightly.

For example, one sample has 5,700,000 reads after sequencing, and TopHat mapped 5,000,000 of them to the genome. For the reverse-complemented reads, about 4,998,000 were mapped.

The command I used:
$ tophat -o output1 --segment-mismatches 0 genome_index RNA-seq.fa

Then I compared the mapped reads between the two results. There are 50,000 reads in result A (5,000,000 mapped, original strand) that are not present in result B (4,998,000 mapped, reverse-complemented strand). Another 50,000 reads are present in result B but not in result A.

Can anyone tell me why the results differ? Thanks.
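A comparison like the one described above can be sketched in Python as a set difference over read names. This is only an illustration: the SAM records below are made-up toy data, and `mapped_ids` is a hypothetical helper, not part of TopHat.

```python
def mapped_ids(sam_lines):
    """Return the names of reads with at least one mapped alignment."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # FLAG bit 0x4 means "read unmapped"
            continue
        ids.add(name)
    return ids

# Toy records standing in for result A (original reads) ...
result_a = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t255\t36M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
# ... and result B (reverse-complemented reads).
result_b = [
    "read1\t16\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*",
    "read3\t16\tchr1\t300\t255\t20M500N16M\t*\t0\t0\t*\t*",
]

ids_a, ids_b = mapped_ids(result_a), mapped_ids(result_b)
only_in_a = ids_a - ids_b   # mapped in A but not in B
only_in_b = ids_b - ids_a   # mapped in B but not in A
shared = ids_a & ids_b      # mapped in both runs

print(sorted(only_in_a), sorted(only_in_b), sorted(shared))
```

For real data the lists would be replaced by open file handles on the two SAM files; the read names must be identical in both inputs for the comparison to make sense.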

Daehwan 10-27-2010 09:39 AM

I think it's probably due to quality values: the quality of each base usually decreases from left to right along a read. If you reverse-complement a read, its first several bases will have low quality values, so the read is less likely to be mapped to the genome. Since TopHat is based on Bowtie, you could check this by running Bowtie on both the original reads and the reverse-complemented reads.

As I understand it, Bowtie allows 0 to 3 mismatches in the seed of a read, which is 28 bp by default, and allows additional mismatches in the rest of the read depending on the sum of the quality values.
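To make the effect on quality values concrete, here is a minimal Python sketch of reverse-complementing a read together with its qualities. The sequence and the Phred scores are invented for illustration, and `reverse_complement` is a hypothetical helper.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq, quals=None):
    """Reverse-complement a sequence; reverse the qualities if given."""
    rc_seq = seq.translate(COMPLEMENT)[::-1]
    rc_quals = quals[::-1] if quals is not None else None
    return rc_seq, rc_quals

# Invented read whose Phred qualities fall off toward the 3' end.
seq = "AACCGGTTAC"
quals = [40, 40, 38, 35, 30, 25, 20, 15, 10, 5]

rc_seq, rc_quals = reverse_complement(seq, quals)
print(rc_seq)     # GTAACCGGTT
print(rc_quals)   # [5, 10, 15, 20, 25, 30, 35, 38, 40, 40]
```

After the transformation the lowest-quality bases sit at the start of the read, which is the region an aligner with a 5' seed examines first.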
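The seed policy described above can be modelled roughly in Python. This is a simplified sketch, not Bowtie's implementation: the default values (seed length 28, at most 2 seed mismatches, quality sum of at most 70) are my recollection of Bowtie 1's -l, -n and -e options and should be treated as assumptions, and the rounding that Bowtie applies to quality values is ignored. The function `accepted` and the toy read are hypothetical.

```python
def accepted(mismatch_positions, quals, seed_len=28,
             max_seed_mismatches=2, max_qual_sum=70):
    """Simplified seed-based acceptance test for one alignment.

    mismatch_positions: 0-based read offsets that disagree with the genome.
    quals: per-base Phred qualities of the read, in read order.
    """
    seed_mismatches = sum(1 for p in mismatch_positions if p < seed_len)
    qual_sum = sum(quals[p] for p in mismatch_positions)
    return seed_mismatches <= max_seed_mismatches and qual_sum <= max_qual_sum

# Toy 36-base read: high quality in the first 28 bases, low in the last 8,
# with three mismatches near the 3' end.
read_len = 36
quals = [40] * 28 + [10] * 8
mismatches = [30, 32, 34]

# Reverse-complementing the read mirrors both positions and qualities.
rc_quals = quals[::-1]
rc_mismatches = sorted(read_len - 1 - p for p in mismatches)

forward_ok = accepted(mismatches, quals)        # mismatches lie outside the seed
reverse_ok = accepted(rc_mismatches, rc_quals)  # same mismatches now fall in the seed
print(forward_ok, reverse_ok)                   # True False
```

Under this model the same three mismatches are tolerated in the original orientation but rejected once they move into the seed of the reverse-complemented read.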


As for the 50,000 reads unique to each of results A and B: TopHat splits unmapped reads into several segments to find novel junctions. Depending on the read length and the segment length, the set of segments from the original reads can differ from the set of segments from the reverse-complemented reads, which yields a different set of putative junctions.

Apart from this, the new version of TopHat at <http://tophat.cbcb.umd.edu/index.html> supports strand-specific RNA-Seq, if you want to try it.
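The dependence on read length and segment length can be illustrated with a small Python sketch. It assumes a segment length of 25 and that leftover bases form a shorter final segment; how TopHat actually treats the remainder is an assumption here, and both helper functions are hypothetical.

```python
def segment_bounds(read_len, seg_len=25):
    """Half-open (start, end) intervals when cutting a read left to right."""
    return [(s, min(s + seg_len, read_len)) for s in range(0, read_len, seg_len)]

def bounds_on_original(read_len, seg_len=25):
    """Segments of the reverse-complemented read, in original-read coordinates."""
    mirrored = [(read_len - end, read_len - start)
                for start, end in segment_bounds(read_len, seg_len)]
    return sorted(mirrored)

# 60-base read: not a multiple of 25, so the cut points move.
forward_60 = segment_bounds(60)       # [(0, 25), (25, 50), (50, 60)]
reverse_60 = bounds_on_original(60)   # [(0, 10), (10, 35), (35, 60)]

# 50-base read: an exact multiple of 25, so both orientations agree.
forward_50 = segment_bounds(50)       # [(0, 25), (25, 50)]
reverse_50 = bounds_on_original(50)   # [(0, 25), (25, 50)]

print(forward_60 != reverse_60, forward_50 == reverse_50)
```

When the cut points differ, each orientation presents different subsequences to the aligner, so the segments that map, and the junctions inferred from them, need not be the same.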

Thanks,
Daehwan

kentnf 10-29-2010 09:35 AM

Thank you Daehwan.

Contamination had already been removed before I mapped the reads with TopHat, so I used FASTA as the TopHat input. I then used Bowtie to map the forward and reverse-complemented reads to the genome, and the results were the same.

After checking the CIGAR field of the SAM file, I think you are right. All of the reads that differ between the forward and reverse results are unmapped reads. When I set the -a parameter to 4, the number of differing reads went down.
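A CIGAR check of this kind might look like the following Python sketch, which flags alignments containing an N operation (a skipped region in the reference, which is how spliced alignments appear in SAM). The CIGAR strings are invented examples and `is_spliced` is a hypothetical helper.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_ELEMENT.findall(cigar))

examples = ["36M", "20M500N16M", "*"]
for cigar in examples:
    print(cigar, is_spliced(cigar))
```

Applied to the reads that appear in only one of the two results, this separates contiguous alignments from junction-spanning ones.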
I will also try the new version of TopHat. Thanks again.

