
jdenvir 10-26-2012 09:28 AM

Tophat2 very slow when running over Bowtie2
We recently completed our first RNA-seq runs on an Illumina Hi-Seq 1000. These were paired-end 100bp reads, with ~30-50 million paired reads per sample. Some are human cell line, some human patient sample, and some are rat.

(This means I'm a newbie for tophat and bowtie.)

I've been trying to run alignments with Tophat2 for these reads. I was successful with the rat samples running Tophat 2.0.5 over Bowtie 0.12.8; however for some of the human samples the run crashed on long_spanning_reads.

I could fix this by switching to Bowtie2, but the runs took prohibitively long with that version. Under Bowtie 0.12.8 the successful runs complete in ~5-7 hours; under Bowtie2 the only one I have let run to completion took 40 hours. Since I have about 40 samples to process, this is not going to work. (I'm using Red Hat Linux with 40 processors at 2.00GHz committed to the job. The machine has 128GB of RAM and is not running out of physical memory.)
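To put those timings in perspective, here is a rough back-of-the-envelope calculation. It assumes the 40 samples are processed one after another and that each takes about as long as the runs timed above; both are assumptions, and the single 40-hour Bowtie2 run may not be representative.

```python
# Rough wall-clock estimate for processing every sample sequentially.
# Assumption: per-sample times match the runs reported in the post.
SAMPLES = 40

def total_days(hours_per_sample, samples=SAMPLES):
    """Total wall-clock time in days for back-to-back runs."""
    return hours_per_sample * samples / 24.0

bowtie1_low = total_days(5)    # fastest Bowtie 0.12.8 runs
bowtie1_high = total_days(7)   # slowest Bowtie 0.12.8 runs
bowtie2 = total_days(40)       # the single completed Bowtie2 run

print(f"Bowtie 0.12.8: {bowtie1_low:.1f}-{bowtie1_high:.1f} days")
print(f"Bowtie2:       {bowtie2:.1f} days")
```

Under those assumptions the difference is between one to two weeks and more than two months of machine time.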

According to another thread, the "long_spanning_reads" error is fixed in the latest version of Tophat (2.0.6), so for now my solution is to run Tophat 2.0.6 over Bowtie 0.12.8. Hopefully this will work, but I would really like to be running the most recent version of Bowtie.

The only non-default options I'm providing to tophat are -p 40 for the multithreading and --bowtie1 if I'm running against bowtie 0.12.8. I am not providing annotations, though I actually tried that and it made no difference either to the speed or to the long_spanning_reads error.
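For anyone wanting to reproduce this setup, the invocation described above can be sketched roughly as follows. The index name and FASTQ file names are hypothetical placeholders, and the helper function is only an illustration; the `-p` and `--bowtie1` options are the ones mentioned in the post.

```python
import shlex

def build_tophat_cmd(index_base, reads_1, reads_2, threads=40, use_bowtie1=False):
    """Return the tophat argument list for one paired-end sample."""
    cmd = ["tophat", "-p", str(threads)]   # -p: number of threads
    if use_bowtie1:
        cmd.append("--bowtie1")            # align with Bowtie 1 instead of Bowtie2
    # Positional arguments: index base name, then the two mate files.
    cmd += [index_base, reads_1, reads_2]
    return cmd

# Hypothetical index and file names, for illustration only.
cmd = build_tophat_cmd("hg19", "sample_R1.fastq", "sample_R2.fastq", use_bowtie1=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`. Dropping `use_bowtie1=True` gives the Bowtie2 variant of the same run, which makes it easy to time the two back to back on one sample.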

Has anyone experienced this drop in performance between Bowtie 0.12.8 and Bowtie2 when using Tophat2? Any suggestions?

pettervikman 02-17-2013 11:39 PM

I've also noticed that tophat (v2.0.7) is extremely slow when run in conjunction with bowtie2. I was going to look through my options ( -r 100 -m 1 -p 1 --coverage-search --microexon-search --library-type fr-unstranded) to see if anything there explains it. According to the program, the coverage search can be really slow, but I've also found that the alignments themselves are really slow. I run several samples in parallel rather than using many cores on one sample, so it takes more than one week per sample (human RNA, around 40-50 million reads).
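The "several samples in parallel, one thread each" arrangement can be sketched like this. It is only an illustration: the sample names, index name and file-naming scheme are hypothetical, and the `-o` output directory is an addition not in the option list above, included so that concurrent runs do not write to the same default output directory.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Options quoted in the post above (one thread per tophat process).
TOPHAT_OPTS = ["-r", "100", "-m", "1", "-p", "1",
               "--coverage-search", "--microexon-search",
               "--library-type", "fr-unstranded"]

def tophat_cmd(sample, index_base):
    """Build the command for one sample; the file naming is hypothetical."""
    return (["tophat"] + TOPHAT_OPTS
            + ["-o", f"{sample}_tophat_out",   # separate output dir per sample
               index_base, f"{sample}_R1.fastq", f"{sample}_R2.fastq"])

def run_all(samples, index_base, max_parallel=4, runner=subprocess.run):
    """Run up to max_parallel single-threaded tophat jobs at the same time."""
    cmds = [tophat_cmd(s, index_base) for s in samples]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Each worker thread just waits on one external tophat process.
        return list(pool.map(runner, cmds))
```

Calling `run_all(["s1", "s2"], "hg19", runner=print)` prints the commands without launching anything, which is a cheap way to check them before committing to a week-long run. Note that this layout maximises throughput across samples but does nothing for the per-sample wall-clock time.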

We have started looking into STAR, which is much faster (by more than a factor of 10). But since we haven't evaluated it yet, I'd really like to keep using tophat.


jdenvir 02-18-2013 05:28 AM

I ended up running Tophat2 over Bowtie 1; the latest version of each resolved the problems I was having with crashing. It would be nice to understand why Bowtie2 is prohibitively slow, though.
