  • Tophat2 very slow when running over Bowtie2

    We recently completed our first RNA-seq runs on an Illumina HiSeq 1000. These were paired-end 100 bp reads, with ~30-50 million read pairs per sample. Some samples are from human cell lines, some from human patients, and some from rat.

    (This means I'm a newbie with tophat and bowtie.)

    I've been trying to align these reads with Tophat2. I was successful with the rat samples running Tophat 2.0.5 over Bowtie 0.12.8; however, for some of the human samples the run crashed at the long_spanning_reads step.

    I could fix this by switching to Bowtie 2.0.0.7, but with that version the runs took prohibitively long. Under Bowtie 0.12.8 the successful runs complete in ~5-7 hours; under Bowtie 2.0.0.7 the only run I have let go to completion took 40 hours. Since I have about 40 samples to process, this is not going to work. (I'm using Red Hat Linux with 40 processors at 2.00 GHz committed to the job. The machine has 128 GB of RAM and is not running out of physical memory.)
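    To make the "this is not going to work" arithmetic explicit, here is a minimal back-of-the-envelope sketch. It assumes the ~40 samples are run one after another on the same machine, and it uses the per-sample times quoted above (the 40-hour Bowtie2 figure comes from a single completed run, so it is only a rough estimate). The function name batch_hours is hypothetical.

```python
# Rough wall-clock estimate for a ~40-sample batch.
# Assumption: samples run sequentially, each one using the whole 40-core machine.

def batch_hours(n_samples: int, hours_per_sample: float) -> float:
    """Total wall-clock hours if samples are processed one at a time."""
    return n_samples * hours_per_sample

N_SAMPLES = 40  # "about 40 samples" from the post

# Per-sample run times reported in the post
scenarios = [
    ("Bowtie 0.12.8, fast case", 5.0),
    ("Bowtie 0.12.8, slow case", 7.0),
    ("Bowtie 2.0.0.7 (single observed run)", 40.0),
]

for label, hours in scenarios:
    total = batch_hours(N_SAMPLES, hours)
    print(f"{label}: {total:.0f} h (~{total / 24:.1f} days)")
```

    With these figures the Bowtie 0.12.8 route needs roughly 200 to 280 hours (about 8 to 12 days) for the whole batch, while the Bowtie2 route needs about 1600 hours, i.e. more than two months.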

    According to this thread (http://seqanswers.com/forums/showthread.php?t=22438), the "long_spanning_reads" error is fixed in the latest version of tophat (2.0.6), so for now my solution is to run Tophat 2.0.6 over Bowtie 0.12.8. Hopefully this will work, but I would really like to be running the most recent version of Bowtie.

    The only non-default options I'm providing to tophat are -p 40 for multithreading and --bowtie1 when I'm running against Bowtie 0.12.8. I am not providing annotations; I did try that, and it made no difference to either the speed or the long_spanning_reads error.
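    For comparing the two setups side by side, the invocation described above can be sketched as an argument list. This is a minimal sketch, assuming the TopHat executable is on PATH as `tophat` and uses the usual `tophat [options] <index> <reads_1> <reads_2>` form; only the two options named in the post (-p and --bowtie1) are used. The helper name tophat_cmd and the index/FASTQ names are hypothetical placeholders.

```python
import shlex

def tophat_cmd(index_prefix: str, reads_1: str, reads_2: str,
               threads: int = 40, use_bowtie1: bool = False) -> list[str]:
    """Build a TopHat argv list (hypothetical helper, paths are placeholders)."""
    cmd = ["tophat", "-p", str(threads)]  # -p: number of threads
    if use_bowtie1:
        # Use Bowtie 1 (e.g. 0.12.8) instead of Bowtie 2. The index prefix must
        # point at an index built by the matching tool (bowtie-build vs bowtie2-build).
        cmd.append("--bowtie1")
    # Positional arguments: index prefix, then mate-1 and mate-2 reads
    cmd += [index_prefix, reads_1, reads_2]
    return cmd

if __name__ == "__main__":
    for bowtie1 in (True, False):
        cmd = tophat_cmd("genome_index", "sample_R1.fastq", "sample_R2.fastq",
                         use_bowtie1=bowtie1)
        print(shlex.join(cmd))
        # To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

    Keeping everything else identical and toggling only use_bowtie1 makes it easier to attribute a runtime difference to the aligner rather than to some other option.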

    Has anyone experienced this drop in performance between bowtie 0.12.8 and bowtie 2.0.0.7 when using tophat2? Any suggestions?

  • #2
    I've also noticed that tophat (v.2.0.7) is extremely slow when run in conjunction with bowtie2. I was going to go through my options ( -r 100 -m 1 -p 1 --coverage-search --microexon-search --library-type fr-unstranded) to see if any of them is responsible. According to the program, the coverage search can be really slow, but I've also found that the alignment step itself is really slow. I multiplex several samples rather than putting many cores on one sample, so it takes more than one week per sample (human RNA, around 40-50 million reads).

    We have started looking into STAR, which is much faster (by more than a factor of 10). But since we haven't evaluated it yet, I'd really like to keep using tophat.

    /Petter

    • #3
      I ended up running Tophat2 over Bowtie 1; using the latest version of each resolved the crashing problems I was having. It would be nice to understand why Bowtie2 is so prohibitively slow, though.