10-27-2010, 09:39 AM   #2
Daehwan
Member
 
Location: College Park

Join Date: Oct 2010
Posts: 27

I think it's probably due to the quality values: the quality of each base usually decreases from left to right along a read. So if we reverse-complement a read, it will have low quality values in its first several bases, and it is therefore less likely to be mapped to the genome. You could check this by running Bowtie on both the original reads and the reverse-complemented reads, since TopHat is based on Bowtie.
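
To make the effect concrete, here is a minimal Python sketch of what reverse-complementing a FASTQ record does to its qualities. The read and the quality string are made up for illustration, and I am assuming Phred+33 encoding; the function names are mine, not part of Bowtie or TopHat.

```python
# Complement table for the four bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement_read(seq, qual):
    """Reverse-complement the bases and reverse the per-base qualities."""
    return seq.translate(COMPLEMENT)[::-1], qual[::-1]

def phred33(qual):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]

# Made-up read: qualities are high on the left and drop off on the right.
seq = "ACGTTGCAAG"
qual = "IIIIIHHF5#"

rc_seq, rc_qual = reverse_complement_read(seq, qual)
print(seq, phred33(qual))        # low qualities at the end of the read
print(rc_seq, phred33(rc_qual))  # low qualities now at the start of the read
```

After the flip, the lowest-quality bases sit at the very start of the read, which is where the aligner's seed is taken from.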

As far as I understand, Bowtie allows 0 to 3 mismatches in the seed of a read (the first 28 bp by default) and allows additional mismatches in the rest of the read depending on the sum of the quality values at the mismatched positions.
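
A simplified sketch of that policy, in Python, shows why the orientation matters. This is not Bowtie's actual code: the function name is hypothetical, and the defaults (28 bp seed, 2 seed mismatches, quality sum of 70) are what I believe Bowtie's `-l`, `-n` and `-e` options default to, so treat them as assumptions. It also ignores details such as quality rounding.

```python
def passes_seed_policy(mismatch_positions, quals,
                       seed_len=28, max_seed_mm=2, max_qual_sum=70):
    """Simplified seed-based check (an illustration, not Bowtie itself).

    mismatch_positions: 0-based offsets in the read that disagree with the reference
    quals: per-base Phred scores of the read, same orientation as the positions
    """
    # Count mismatches that fall inside the seed (the first seed_len bases).
    seed_mm = sum(1 for p in mismatch_positions if p < seed_len)
    # Sum the quality values at every mismatched position.
    qual_sum = sum(quals[p] for p in mismatch_positions)
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum

# Made-up 50 bp read: 40 good bases followed by 10 poor ones.
read_len = 50
quals = [40] * 40 + [5] * 10
mismatches = [44, 46, 48]  # errors cluster in the low-quality tail

# The same read after reverse-complementing: positions and qualities flip.
rc_quals = quals[::-1]
rc_mismatches = sorted(read_len - 1 - p for p in mismatches)

print(passes_seed_policy(mismatches, quals))        # tail errors lie outside the seed
print(passes_seed_policy(rc_mismatches, rc_quals))  # the same errors now lie inside the seed
```

The three errors are harmless in the original orientation, but after reverse-complementing they all land in the seed and exceed the seed mismatch limit.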


As for the difference of about 50,000 sequences between results A and B: TopHat divides unmapped reads into several segments to find novel junctions. Depending on the read length and the segment length, the set of segments cut from the original reads can differ from the set cut from the reverse-complemented reads, which yields a different set of putative junctions.
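
Here is a small Python illustration of the segment point. I am assuming segments are cut left to right at a fixed length with a shorter leftover segment at the end; TopHat's real segmentation may handle the leftover differently, and the function names and the 60 bp / 25 bp numbers are just for the example.

```python
def segment_spans(read_len, seg_len):
    """Cut a read into consecutive (start, end) spans, left to right.

    Assumption for illustration: the last span is simply shorter when
    read_len is not a multiple of seg_len.
    """
    return [(s, min(s + seg_len, read_len)) for s in range(0, read_len, seg_len)]

def spans_in_original_coords(read_len, seg_len):
    """Segment the reverse-complemented read, then express each span
    in the coordinates of the original read."""
    return sorted((read_len - e, read_len - s)
                  for s, e in segment_spans(read_len, seg_len))

fwd = segment_spans(60, 25)
rev = spans_in_original_coords(60, 25)
print(fwd)  # segments cut from the original read
print(rev)  # segments cut from the reverse-complemented read

# When the read length is a multiple of the segment length, both agree.
print(segment_spans(50, 25) == spans_in_original_coords(50, 25))
```

With a 60 bp read and 25 bp segments, the two orientations cover the same bases with different segment boundaries, so each orientation hands a different set of pieces to the junction search.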

Apart from this, the new version of TopHat at <http://tophat.cbcb.umd.edu/index.html> supports strand-specific RNA-Seq, if you want to try it.

Thanks,
Daehwan

Last edited by Daehwan; 10-27-2010 at 10:06 AM.