  • Sniwells
    Junior Member
    • Sep 2012
    • 7

    Tophat2 "joining segment hits" does not complete

    Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost a week.
    data: 454-sequencing reads up to 500 nucleotides long probably containing a lot of exon-exon junctions.
    Tophat version 2.0.0
    Bowtie version: 2.0.0.6
    here is the command:
    tophat2 -p 12 -o tophat_out genome 454_data.fastq.gz

    Does anyone have an idea what could cause this? Or does someone know which parameter can be adjusted to reduce the time spent in this step?

    Thanks
    Last edited by Sniwells; 10-17-2012, 11:20 AM.
  • prios
    Junior Member
    • Jun 2012
    • 4

    #2
    Hello Sniwells,

    I'm having the same problem here. I have 12 libraries from 2 runs (454), and Tophat gets stuck at that step. The most annoying thing is that for some libraries Tophat finished the alignment quickly (around 5-10 h), but 3 of them took 1 week to complete, and I'm still waiting for the last 4 (more than 12 days so far).

    I haven't found any topics related to this problem in this forum, nor have I received any answers from the authors of Tophat. Since the libraries have a very similar number of reads, and 454 does not seem to be the most popular choice for RNA-seq, my guess is that this has to do with the length of the reads (which in 454 data are much longer than in Illumina data).

    Anybody's got a clue?
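
One way to test the read-length guess above is to compare the length distribution of a library that finished quickly with one that got stuck. The following is only a minimal Python sketch, assuming plain 4-line FASTQ records (no line-wrapped sequences); the function names `read_lengths` and `summarize` are made up for illustration and are not part of Tophat.

```python
import gzip


def read_lengths(path):
    """Yield the length of every read in a FASTQ file (plain or gzipped).

    Assumes each record is exactly 4 lines: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def summarize(path):
    """Return read count, min, median and max read length, or None if empty."""
    lengths = sorted(read_lengths(path))
    if not lengths:
        return None
    return {
        "reads": len(lengths),
        "min": lengths[0],
        "median": lengths[len(lengths) // 2],  # upper median for even counts
        "max": lengths[-1],
    }
```

Calling `summarize("454_data.fastq.gz")` on a fast library and on a slow one would show whether the slow libraries really contain longer reads, or whether something else is going on.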


    • HSV-1
      Member
      • Jul 2012
      • 38

      #3
      Did you use "--no-coverage-search"? Without it, the run will take a very long time.


      • Sniwells
        Junior Member
        • Sep 2012
        • 7

        #4
        Right after your suggestion I started tophat with the following command:
        tophat2 --no-coverage-search -p 12 -o tophat_out genome 454_data.fastq.gz
        But this parameter does not seem to solve the problem: tophat2 has been stuck at the same point since the day of your post.
        Maybe tophat is not designed for long 454 reads?


        • HSV-1
          Member
          • Jul 2012
          • 38

          #5
          At which step?
          When tophat is writing the segment and junction files, it can take a few days or even a week.

          Originally posted by Sniwells View Post
          Right after your suggestion I started tophat with the following command:
          tophat2 --no-coverage-search -p 12 -o tophat_out genome 454_data.fastq.gz
          But this parameter does not seem to solve the problem: tophat2 has been stuck at the same point since the day of your post.
          Maybe tophat is not designed for long 454 reads?
          Last edited by HSV-1; 10-22-2012, 05:38 PM.


          • Sniwells
            Junior Member
            • Sep 2012
            • 7

            #6
            Originally posted by HSV-1 View Post
            At which step?
            The same step:
            "Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost a week."


            • HSV-1
              Member
              • Jul 2012
              • 38

              #7
              That is sort of normal.
              Make sure your queue allows jobs to run this long, or the job will be killed before it finishes.

              Originally posted by Sniwells View Post
              The same step:
              "Tophat2 runs nicely (6 hours) up to the step "joining segment hits". But this step (a single-core process named "long_spanning_reads") has now been running for almost a week."


              • Sniwells
                Junior Member
                • Sep 2012
                • 7

                #8
                Originally posted by HSV-1 View Post
                That is sort of normal.
                Make sure your queue allows jobs to run this long, or the job will be killed before it finishes.
                The process is still running (14 days so far). Let's see if there will be a happy ending.


                • Sniwells
                  Junior Member
                  • Sep 2012
                  • 7

                  #9
                  I stopped the process after it had been running for nearly a month.
                  Has anyone run tophat successfully with long 454 reads?


                  • HSV-1
                    Member
                    • Jul 2012
                    • 38

                    #10
                    I didn't know your reads were from Roche 454.
                    There is a special protocol for long reads.


                    • prios
                      Junior Member
                      • Jun 2012
                      • 4

                      #11
                      Hello again!

                      I've finally managed to make Tophat2 work on my problematic 454 reads. What I did was split my original fastq file into several smaller ones and run Tophat separately on each of them. Then I took the sub-file that took longest to finish, split it into sub-sub-files, and ran Tophat again on each of those.

                      After several rounds, I came across a single read that, if removed from the original fastq file, makes Tophat run smoothly and quickly.

                      I still don't know what makes those reads special, as they are not the longest, nor the shortest, nor of bad quality...

                      Anyway, hope it works.
                      Pablo
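
The split-and-rerun procedure described in this post can be automated. Below is a rough Python sketch of one variant, not the exact steps used above: it halves the file each round instead of splitting it into several pieces, and it treats "Tophat did not finish within a time limit" as the sign of a problematic chunk. All function names, the one-hour limit, and the `bisect_tmp` working directory are assumptions made for illustration; the tophat2 options are the ones quoted earlier in this thread. It also assumes 4-line FASTQ records and exactly one problematic read.

```python
import gzip
import os
import subprocess


def read_fastq(path):
    """Load a FASTQ file (plain or gzipped) as a list of 4-line records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = handle.read().splitlines()
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def write_fastq(records, path):
    """Write 4-line records back out as an uncompressed FASTQ file."""
    with open(path, "w") as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")


def find_slow_read(records, is_slow):
    """Halve `records` until one record remains for which `is_slow` held.

    Assumption: exactly one read is responsible, so if the first half is
    not slow, the culprit must be in the second half.
    """
    while len(records) > 1:
        mid = len(records) // 2
        first, second = records[:mid], records[mid:]
        records = first if is_slow(first) else second
    return records[0]


def tophat_times_out(records, index="genome", limit_s=3600,
                     workdir="bisect_tmp"):
    """Hypothetical predicate: True if tophat2 exceeds `limit_s` seconds."""
    os.makedirs(workdir, exist_ok=True)
    chunk = os.path.join(workdir, "chunk.fastq")
    write_fastq(records, chunk)
    cmd = ["tophat2", "--no-coverage-search", "-p", "12",
           "-o", os.path.join(workdir, "out"), index, chunk]
    try:
        subprocess.run(cmd, check=True, timeout=limit_s)
    except subprocess.TimeoutExpired:
        # Only the tophat2 wrapper is killed here; helper processes it
        # started may have to be cleaned up separately.
        return True
    return False
```

A run would then look like `find_slow_read(read_fastq("454_data.fastq.gz"), tophat_times_out)`, which returns the suspect record so it can be inspected or removed. The time limit has to be well above what a healthy chunk needs; otherwise normal chunks are flagged as slow.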
