10-26-2012, 10:28 AM   #1
jdenvir
Junior Member
 
Location: Huntington, WV USA

Join Date: Dec 2011
Posts: 6
Tophat2 very slow when running over Bowtie2

We recently completed our first RNA-seq runs on an Illumina HiSeq 1000. These were paired-end 100bp reads, with ~30-50 million paired reads per sample. Some samples are human cell lines, some are human patient samples, and some are rat.

(This means I'm a newbie for tophat and bowtie.)

I've been trying to run alignments with Tophat2 for these reads. I was successful with the rat samples running Tophat 2.0.5 over Bowtie 0.12.8; however, for some of the human samples the run crashed in long_spanning_reads.

I could fix this by switching to Bowtie 2.0.0.7, but the runs were taking prohibitively long using this version. Under Bowtie 0.12.8 the successful runs are completing in ~5-7 hours; under Bowtie 2.0.0.7 the only one I have let run to completion has taken 40 hours. Since I have about 40 samples to process, this is not going to work. (I'm using Red Hat Linux with 40 processors committed to the job @ 2.00GHz. The machine has 128GB of RAM and is not running out of physical memory.)
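To put the numbers above in perspective, here is a back-of-the-envelope sketch of the total wall-clock time for the whole batch. It assumes the 40 samples are run one after another on the same machine and that the single 40-hour Bowtie 2 run is representative, neither of which is guaranteed; the function name is just for illustration.

```python
# Rough estimate only: assumes samples run sequentially on one machine
# and that the per-sample timings quoted in the post are representative.

def total_hours(n_samples: int, hours_per_sample: float) -> float:
    """Total wall-clock hours for n_samples processed one after another."""
    return n_samples * hours_per_sample


N_SAMPLES = 40                 # "about 40 samples" from the post
BOWTIE1_HOURS = (5, 7)         # observed range with Bowtie 0.12.8
BOWTIE2_HOURS = 40             # the one completed run with Bowtie 2.0.0.7

bowtie1_total = tuple(total_hours(N_SAMPLES, h) for h in BOWTIE1_HOURS)
bowtie2_total = total_hours(N_SAMPLES, BOWTIE2_HOURS)

print(f"Bowtie 0.12.8 : {bowtie1_total[0]:.0f}-{bowtie1_total[1]:.0f} h "
      f"({bowtie1_total[0] / 24:.1f}-{bowtie1_total[1] / 24:.1f} days)")
print(f"Bowtie 2.0.0.7: {bowtie2_total:.0f} h ({bowtie2_total / 24:.1f} days)")
```

Under those assumptions the batch takes roughly 200 to 280 hours with Bowtie 0.12.8 versus about 1,600 hours (around 67 days) with Bowtie 2.0.0.7.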

According to this thread: http://seqanswers.com/forums/showthread.php?t=22438 the "long_spanning_reads" error is fixed in the latest version of tophat (2.0.6), so for now my solution is to run Tophat 2.0.6 over Bowtie 0.12.8. Hopefully this will work but I would really like to be running the most recent version of Bowtie.

The only non-default options I'm providing to tophat are -p 40 for the multithreading and --bowtie1 if I'm running against bowtie 0.12.8. I am not providing annotations, though I actually tried that and it made no difference either to the speed or to the long_spanning_reads error.
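For concreteness, the two invocations described above can be sketched as follows. This only builds and prints the command lines; it does not run tophat. The index names, FASTQ file names, and the helper function are hypothetical placeholders, while `-p`, `--bowtie1`, and `-G` are the tophat options mentioned in this post (threads, Bowtie 1 mode, and an optional annotation file).

```python
# Minimal sketch: assemble the tophat command line described in the post.
# Index and read file names below are hypothetical placeholders.
import shlex
from typing import List, Optional


def build_tophat_cmd(index: str, reads1: str, reads2: str,
                     threads: int = 40,
                     use_bowtie1: bool = False,
                     gtf: Optional[str] = None) -> List[str]:
    """Return the tophat argument list for one paired-end sample."""
    cmd = ["tophat", "-p", str(threads)]
    if use_bowtie1:
        cmd.append("--bowtie1")      # align with Bowtie 1 instead of Bowtie 2
    if gtf is not None:
        cmd += ["-G", gtf]           # optional annotation file
    # Options come first, then the index base name and the two read files.
    cmd += [index, reads1, reads2]
    return cmd


# Run over Bowtie 0.12.8 (needs a Bowtie 1 index).
bt1_cmd = build_tophat_cmd("hg19_bt1", "sample_R1.fastq", "sample_R2.fastq",
                           use_bowtie1=True)
# Run over Bowtie 2 (needs a Bowtie 2 index).
bt2_cmd = build_tophat_cmd("hg19_bt2", "sample_R1.fastq", "sample_R2.fastq")

print(shlex.join(bt1_cmd))
print(shlex.join(bt2_cmd))
```

The resulting lists can be passed to `subprocess.run` if you want to launch and time each variant from a script.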

Has anyone experienced this drop in performance between bowtie 0.12.8 and bowtie 2.0.0.7 when using tophat2? Any suggestions?