
  • Missing junctions in Tophat! (after providing known junctions and gene model files)

    I am using Tophat to align unpaired RNA-seq reads (36 bases each), and I have a question about junction calling in Tophat. I tried running Tophat with -j (a known junctions file from Ensembl) and with -G (a known gene model annotation file from Ensembl), and also ran Cufflinks on the results.

    In both of the runs, if I compare the results with the actual known transcripts from Ensembl, it seems like I am missing many known junctions.

    To illustrate: say Ensembl has a transcript with 5 exons. Cufflinks annotates the same region as containing 3 transcripts (exons 1-2 as transcript 1, exons 3-4 as transcript 2 and exon 5 as transcript 3).

    If I have understood the paper correctly, this is because, even though Tophat has the coordinates of the known junctions, it didn't find enough IUM reads spanning those particular junctions (under the given parameters) to call them as junctions. Is that correct?

    Under that assumption, I tried running Tophat with --min-anchor-length 5 (reduced from the default of 8) and --min-isoform-fraction 0.1 (reduced from the default of 0.15). But it didn't find any more junctions: the junctions.bed file has exactly the same number of lines.
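    To see what those two parameters act on, here is a simplified model of the two filters as the TopHat manual describes them: -a/--min-anchor-length (a new junction is reported only if at least one read has that many bases on each side of it; default 8) and -F/--min-isoform-fraction (a junction with S spanning reads is dropped if S / D is below the fraction, where D is the average depth of the more highly covered flanking exon; default 0.15, and 0 disables it). This is a sketch written from the documentation, not TopHat's own code, and the function names are hypothetical.

```python
# Simplified model of two TopHat junction filters, written from the manual's
# description of -a/--min-anchor-length and -F/--min-isoform-fraction.
# Illustration only; this is not TopHat's actual implementation.


def read_anchor_ok(left_anchor, right_anchor, min_anchor_length=8):
    """True if this read alone could support reporting a new junction:
    it needs at least min_anchor_length bases on each side (default 8)."""
    return min(left_anchor, right_anchor) >= min_anchor_length


def junction_fraction_ok(support, exon_a_depth, exon_b_depth,
                         min_isoform_fraction=0.15):
    """Keep a junction if support / D >= min_isoform_fraction, where D is the
    average depth of the more highly covered flanking exon.
    A fraction of 0 disables the filter (0.15 is the documented default)."""
    if min_isoform_fraction == 0:
        return True
    depth = max(exon_a_depth, exon_b_depth)
    if depth == 0:
        return True  # no flanking coverage to compare against
    return support / depth >= min_isoform_fraction


# A 36 bp read split 30/6 across a junction:
print(read_anchor_ok(30, 6, min_anchor_length=8))  # False
print(read_anchor_ok(30, 6, min_anchor_length=5))  # True

# A junction supported by 3 reads next to an exon covered at 40x (3/40 = 0.075):
print(junction_fraction_ok(3, 40.0, 12.0, min_isoform_fraction=0.15))  # False
print(junction_fraction_ok(3, 40.0, 12.0, min_isoform_fraction=0.05))  # True
```

    Under this model, lowering -F from 0.15 to 0.1 only rescues junctions whose support ratio falls between 0.1 and 0.15, so it is plausible that few or no extra junctions appear; junctions never reached by any spliced alignment are unaffected by either filter.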

    Does anyone have any suggestions on what else I can try to improve junction calling?

    Also, given the genomic coordinates of a splice junction, is there a way I can extract, from the Tophat output, the number of IUMs (initially unmapped reads) that Tophat mapped across that particular junction?

    thanks,

    Avinash

  • #2
    Hi Avinash,

    In both of the runs, if I compare the results with the actual known transcripts from Ensembl, it seems like I am missing many known junctions.
    For any one tissue type, a certain (possibly substantial) fraction of the known junctions will not be present simply due to tissue-specific expression of different isoforms. As such, I wouldn't worry about this part of your question.

    Also, given the genomic coordinates of a splice junction, is there a way I can extract, from the Tophat output, the number of IUMs (initially unmapped reads) that Tophat mapped across that particular junction?
    I doubt this is possible, unless you are MUCH better at hacking into the code and the tmp files than I am.
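    A rough alternative, if the total number of spanning reads (rather than the IUM count specifically) is good enough: according to the TopHat manual, junctions.bed stores each junction as a two-block BED12 record whose score column is the number of alignments spanning the junction. The sketch below looks up one junction by its intron coordinates; the function name and the example record are hypothetical, and it does not distinguish initially unmapped reads from any others.

```python
# Look up the read support for one junction in TopHat's junctions.bed.
# Assumes the layout described in the TopHat manual: each junction is a
# two-block BED12 record and the score (5th column) is the number of
# alignments spanning it. This counts ALL spanning alignments, not only IUMs.

EXAMPLE_BED = [
    'track name=junctions description="TopHat junctions"\n',
    "chr1\t1000\t2050\tJUNC00000001\t12\t+\t1000\t2050\t255,0,0\t2\t30,50,\t0,1000,\n",
]


def junction_support(bed_lines, chrom, intron_start, intron_end):
    """Return the score of the record whose intron is exactly
    chrom:[intron_start, intron_end) in 0-based, half-open BED coordinates,
    or 0 if no such record exists."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        chrom_start = int(fields[1])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # The intron runs from the end of the first block to the start of the second.
        start = chrom_start + block_sizes[0]
        end = chrom_start + block_starts[1]
        if (start, end) == (intron_start, intron_end):
            return int(fields[4])
    return 0


print(junction_support(EXAMPLE_BED, "chr1", 1030, 2000))  # 12
```

    With real output you would pass an open file handle instead of the example list (e.g. the junctions.bed in your Tophat output directory). If your junction coordinates are 1-based and inclusive (first intron base a, last intron base b), convert them with intron_start = a - 1 and intron_end = b.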

    Best of luck,

    Shurjo



    • #3
      You could try reducing --segment-length to 1/2 of your read length. I believe that you do not get any mapped splice junctions if your segment length is greater than 1/2 the read length.
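      For 36 bp reads the arithmetic is simple: TopHat's documented default --segment-length is 25, which is more than half of 36, so a read cannot be cut into two full segments; half the read length gives 18. The sketch below just builds the command line with that value. The index, read, GTF and junction file names are hypothetical placeholders, and whether very short segments behave well (speed, multi-mapping segments) may depend on the TopHat version, so treat 18 as something to test rather than a recommendation.

```python
# Rule of thumb from this post: set --segment-length to about half the read
# length so each read can be split into at least two segments.
# All file names below are hypothetical placeholders.


def suggested_segment_length(read_length):
    return read_length // 2


def tophat_args(read_length, index, reads, gtf=None, raw_juncs=None,
                out_dir="tophat_out"):
    """Build a TopHat argument list (for printing or subprocess.run)."""
    args = ["tophat", "-o", out_dir,
            "--segment-length", str(suggested_segment_length(read_length))]
    if gtf:
        args += ["-G", gtf]        # known gene models
    if raw_juncs:
        args += ["-j", raw_juncs]  # known junctions
    args += [index, reads]
    return args


print(" ".join(tophat_args(36, "genome_index", "reads.fq",
                           gtf="genes.gtf", raw_juncs="known.juncs")))
```

    Returning a list rather than a single string lets you pass it straight to subprocess.run without shell quoting issues.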
