  • Tophat2 "joining segment hits" does not complete

    Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost one week.
    Data: 454 sequencing reads up to 500 nucleotides long, probably containing a lot of exon-exon junctions.
    Tophat version 2.0.0
    Bowtie version: 2.0.0.6
    here is the command:
    tophat2 -p 12 -o tophat_out genome 454_data.fastq.gz

    Does anyone have an idea what could cause this? Or does someone know which parameter can be adjusted to reduce the time for this step?

    Thanks
    Last edited by Sniwells; 10-17-2012, 11:20 AM.

  • #2
    Hello Sniwells,

    I'm having the same problem here. I have 12 libraries from 2 runs (454) and Tophat gets stuck at that step. The most annoying thing is that for some libraries Tophat did a quick alignment (around 5-10 h), but for 3 of them it took 1 week to complete, and I'm still waiting for the last 4 to complete (more than 12 days).

    I've not found any topics related to this problem in this forum, nor have I received any answers from the authors of Tophat. Since the libraries have a very similar number of reads and 454 does not seem to be the most popular choice for RNA-seq, my thought is that this might have to do with the length of the reads (which in 454 data is much greater than in Illumina's).

    Anybody's got a clue?



    • #3
      Did you use "--no-coverage-search"? If not, it will take a very long time.



      • #4
        Right after your suggestion I started tophat with the following command:
        tophat2 --no-coverage-search -p 12 -o tophat_out genome 454_data.fastq.gz
        But it seems as if this parameter does not solve the problem, because tophat2 has been stuck at the same point since the day of your post.
        Maybe tophat is not designed for long 454 reads?



        • #5
          At which step?
          When Tophat is writing the segment and junction files, it can take a few days or even a week.

          Originally posted by Sniwells View Post
          Right after your suggestion I started tophat with the following command:
          tophat2 --no-coverage-search -p 12 -o tophat_out genome 454_data.fastq.gz
          But it seems as if this parameter does not solve the problem, because tophat2 has been stuck at the same point since the day of your post.
          Maybe tophat is not designed for long 454 reads?
          Last edited by HSV-1; 10-22-2012, 05:38 PM.



          • #6
            Originally posted by HSV-1 View Post
            At which step?
            The same step:
            "Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost one week."



            • #7
              Sort of normal.
              Make sure your queue allows this much run time, or the job will be killed before it finishes.

              Originally posted by Sniwells View Post
              The same step:
              "Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost one week."



              • #8
                Originally posted by HSV-1 View Post
                Sort of normal.
                Make sure your queue allows this much run time, or the job will be killed before it finishes.
                The process is still running (14 days). Let's see if there will be a happy ending.



                • #9
                  I stopped the process after it had been running for nearly a month.
                  Has anyone run Tophat with long 454 reads successfully?



                  • #10
                    I didn't know your reads were from Roche 454.
                    There is a special protocol for long reads.



                    • #11
                      Hello again!

                      I've finally managed to make Tophat2 work on my problematic 454 reads. What I did was split my original fastq file into several smaller ones and run Tophat separately on each of them. Then I took the sub-file that took longest to finish, split it into sub-sub-files, and ran Tophat again on each of them.
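
For anyone who wants to automate the splitting rounds described above, here is a minimal Python sketch of the halving search. It is only an illustration under some assumptions: the FASTQ file is uncompressed with exactly 4 lines per record, a single read is responsible for the slowdown, and `hangs` is a hypothetical function you supply yourself that decides whether a given set of reads makes the run stall. The names `read_fastq` and `find_culprit` are made up for this example.

```python
import itertools
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, ...]  # the 4 raw lines of one FASTQ record


def read_fastq(path: str) -> List[Record]:
    """Load an uncompressed FASTQ file, assuming 4 lines per record."""
    records = []
    with open(path) as handle:
        while True:
            chunk = tuple(itertools.islice(handle, 4))
            if len(chunk) < 4:  # end of file (or a truncated last record)
                break
            records.append(chunk)
    return records


def find_culprit(
    records: List[Record],
    hangs: Callable[[List[Record]], bool],
) -> Optional[Record]:
    """Halve the read set repeatedly, following the half that still hangs.

    Returns the single offending record, or None if the full set does not
    hang or if neither half hangs on its own.
    """
    if not records or not hangs(records):
        return None
    while len(records) > 1:
        mid = len(records) // 2
        first, second = records[:mid], records[mid:]
        if hangs(first):
            records = first
        elif hangs(second):
            records = second
        else:
            return None  # the slowdown seems to need reads from both halves
    return records[0]
```

In practice `hangs` could write the records to a temporary FASTQ file, launch the tophat2 command from this thread with `subprocess.run(..., timeout=...)`, and return True when `subprocess.TimeoutExpired` is raised. The timeout value is something you would have to choose from the run times of your well-behaved libraries.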

                      After several rounds, I came across a single read that, if deleted from the original fastq file, makes Tophat run smoothly and quickly.

                      I still don't know what makes those reads special, as they are not the longest, nor the shortest, nor of bad quality...
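
Once the offending read is known, removing it from the original file can be scripted rather than done by hand. The sketch below is one possible way, assuming an uncompressed FASTQ file with 4 lines per record; `drop_reads` and the file names in the usage note are hypothetical.

```python
import itertools
from typing import Iterable


def drop_reads(in_path: str, out_path: str, bad_ids: Iterable[str]) -> int:
    """Copy a FASTQ file, skipping records whose ID is in bad_ids.

    Returns the number of records that were dropped.
    """
    bad = set(bad_ids)
    dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = list(itertools.islice(src, 4))
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            # The read ID is the first whitespace-delimited token after '@'.
            tokens = record[0][1:].split()
            read_id = tokens[0] if tokens else ""
            if read_id in bad:
                dropped += 1
            else:
                dst.writelines(record)
    return dropped
```

For example, `drop_reads("454_data.fastq", "454_data.filtered.fastq", ["SOME_READ_ID"])` would write a copy without that read, where `SOME_READ_ID` stands in for whatever ID the search turned up. A gzipped input like the one in the original command would need to be decompressed first, or opened with `gzip.open(path, "rt")` instead of `open`.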

                      Anyway, hope it works.
                      Pablo
