  • kentnf
    Member
    • Jan 2009
    • 26

    Different results produced by tophat?

    I used TopHat to map my strand-specific RNA-seq reads and their reverse complements to the genome, but the two results differ slightly.

    For example, one sample has 5,700,000 reads after sequencing, and TopHat mapped 5,000,000 of them to the genome. For the reverse-complemented reads, about 4,998,000 were mapped.

    The command I used:
    $tophat -o output1 --segment-mismatches 0 genome_index RNA-seq.fa

    I then compared the mapped reads between the two results. 50,000 reads in result A (5,000,000 mapped, original strand) are not present in result B (4,998,000 mapped, reverse-complemented strand), and another 50,000 reads are present in result B but not in result A.

    Can anyone tell me why the results are different? Thanks.
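
A comparison like the one described above can be sketched in Python as a set difference on read names. This is only an illustration: the SAM records are made up, the function name `mapped_read_ids` is hypothetical, and it assumes the forward and reverse-complemented reads carry identical read names.

```python
def mapped_read_ids(sam_lines):
    """Return the set of read names that have at least one mapped alignment."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & 0x4:  # FLAG bit 0x4 means "read unmapped"
            ids.add(fields[0])
    return ids


# Made-up records for illustration (only the first six SAM columns).
result_a = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t255\t36M",
    "read2\t0\tchr1\t200\t255\t36M",
]
result_b = [
    "read2\t16\tchr1\t200\t255\t36M",
    "read3\t16\tchr2\t300\t255\t36M",
]

ids_a = mapped_read_ids(result_a)
ids_b = mapped_read_ids(result_b)
only_in_a = ids_a - ids_b  # mapped in A but not in B
only_in_b = ids_b - ids_a  # mapped in B but not in A
print(sorted(only_in_a), sorted(only_in_b))
```

For real data, an open SAM file object can be passed in place of the lists, since the function only iterates over lines.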
  • Daehwan
    Member
    • Oct 2010
    • 27

    #2
    I think it's probably due to quality values. The quality value of each base usually decreases from left to right, so if we reverse-complement a read, its first several bases will have low quality values, and the read is therefore less likely to be mapped to the genome. You could check this by running Bowtie with the original reads and the reverse-complemented reads, since TopHat is based on Bowtie.

    As I understand it, Bowtie allows 0 to 3 mismatches in the seed of a read, which is 28 bp by default, and allows additional mismatches in the rest of the read depending on the sum of the quality values at the mismatched positions.
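
To make the effect on quality values concrete, here is a minimal Python sketch of reverse-complementing a read together with its per-base qualities. The sequence and quality numbers are invented for illustration, and `reverse_complement` is a hypothetical helper, not part of TopHat or Bowtie.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq, quals=None):
    """Reverse-complement a sequence; reverse its quality list if one is given."""
    rc = seq.translate(COMPLEMENT)[::-1]
    rc_quals = None if quals is None else quals[::-1]
    return rc, rc_quals


# Invented read whose quality drops from left to right.
seq = "ACGGTTA"
quals = [40, 38, 35, 30, 20, 10, 5]

rc_seq, rc_quals = reverse_complement(seq, quals)
print(rc_seq)    # TAACCGT
print(rc_quals)  # [5, 10, 20, 30, 35, 38, 40]
```

After the operation, the low-quality bases sit at the left end of the read, which is the end an aligner would use for a seed taken from the start of the read.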
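
The policy described above can be approximated with a small Python check. This is a simplified sketch, not Bowtie's implementation: the function names are hypothetical, the defaults (seed length 28, at most 2 seed mismatches, quality sum limit 70) are assumed from Bowtie 1's documented options, and details such as internal quality rounding are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def passes_seed_policy(read, ref, quals, seed_len=28, max_seed_mm=2, max_qual_sum=70):
    """Simplified check: limit mismatches in the seed and the quality sum over all mismatches."""
    mismatches = [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]
    seed_mm = sum(1 for i in mismatches if i < seed_len)
    qual_sum = sum(quals[i] for i in mismatches)
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum


# Invented 36 bp example: three mismatches near the low-quality 3' end.
ref = "ACGT" * 9
read = list(ref)
for i in (30, 32, 34):
    read[i] = "A" if ref[i] != "A" else "C"
read = "".join(read)
quals = [40] * 28 + [10] * 8

forward_ok = passes_seed_policy(read, ref, quals)
# After reverse-complementing, the same mismatches fall inside the seed.
reverse_ok = passes_seed_policy(revcomp(read), revcomp(ref), quals[::-1])
print(forward_ok, reverse_ok)
```

In this example the original read passes because all three mismatches lie outside the seed, while the reverse-complemented read fails because the same three mismatches now fall within the first 28 bases.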


    Regarding the 50,000-read differences between results A and B: TopHat divides unmapped reads into several segments to find novel junctions. Depending on the read length and the segment length, the set of segments from the original reads can differ from the set of segments from the reverse-complemented reads, which yields a different set of putative junctions.
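
The dependence on read length and segment length can be illustrated with a short Python sketch. It assumes a read is cut into consecutive segments starting from its left end, with a shorter final segment when the length is not a multiple of the segment length. That rule, the function names, and the segment length of 25 are assumptions for illustration; TopHat's actual segmentation may differ.

```python
def segment_bounds(read_len, seg_len=25):
    """Cut [0, read_len) into consecutive segments from the left; the last may be shorter."""
    return [(s, min(s + seg_len, read_len)) for s in range(0, read_len, seg_len)]


def revcomp_bounds_in_original_coords(read_len, seg_len=25):
    """Segment the reverse-complemented read, then map the bounds back to the original read."""
    return sorted((read_len - e, read_len - s) for s, e in segment_bounds(read_len, seg_len))


fwd_60 = segment_bounds(60)
rev_60 = revcomp_bounds_in_original_coords(60)
print(fwd_60)  # [(0, 25), (25, 50), (50, 60)]
print(rev_60)  # [(0, 10), (10, 35), (35, 60)]

# When the read length is a multiple of the segment length, both agree.
fwd_50 = segment_bounds(50)
rev_50 = revcomp_bounds_in_original_coords(50)
print(fwd_50 == rev_50)  # True
```

Under these assumptions, a 60 bp read is cut at different positions depending on orientation, so the two runs search for junctions with different pieces of the same read, while a 50 bp read is cut identically in both orientations.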

    Other than this, the new version of TopHat at <http://tophat.cbcb.umd.edu/index.html> supports strand-specific RNA-Seq if you want to try it.

    Thanks,
    Daehwan
    Last edited by Daehwan; 10-27-2010, 10:06 AM.


    • kentnf
      Member
      • Jan 2009
      • 26

      #3
      Thank you Daehwan.

      Contamination had already been removed before I mapped the reads to the genome with TopHat, so I used FASTA as the input to TopHat. I then used Bowtie to map the forward and reverse-complemented reads to the genome, and the results were the same.

      After checking the CIGAR field of the SAM file, I think you are right. All of the reads that differ between the forward and reverse results are unmapped reads. When I set the -a parameter to 4, the number of differing reads decreased.
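
One way to inspect the CIGAR field, sketched in Python: classify each SAM record as unmapped, spliced (its CIGAR contains an `N` operation, a skipped region on the reference), or unspliced. The records are invented and `classify_alignment` is a hypothetical helper. Which category the differing reads fall into is something to check on the real files, not something this sketch asserts.

```python
def classify_alignment(sam_line):
    """Classify one SAM record by its FLAG (column 2) and CIGAR (column 6)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":  # unmapped bit set, or no CIGAR available
        return "unmapped"
    return "spliced" if "N" in cigar else "unspliced"


# Invented records (only the first six SAM columns).
records = [
    "read1\t0\tchr1\t100\t255\t36M",
    "read2\t0\tchr1\t200\t255\t20M500N16M",
    "read3\t4\t*\t0\t0\t*",
]
labels = [classify_alignment(r) for r in records]
print(labels)  # ['unspliced', 'spliced', 'unmapped']
```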
      Originally posted by Daehwan
      Regarding each 50,000 seq difference between the results A and B, TopHat divides unmapped reads into several segments to find novel junctions, and a set of segments from the original reads can be different from that of segments from the reverse-complemented reads depending on the read length and the segment length, rendering different set of putative junctions.
      I will also try the new version of TopHat. Thanks again.
