We recently completed our first RNA-seq runs on an Illumina HiSeq 1000. These were paired-end 100 bp reads, with ~30-50 million read pairs per sample. Some samples are human cell lines, some are human patient samples, and some are rat.
(This means I'm a newbie to TopHat and Bowtie.)
I've been trying to run alignments with Tophat2 for these reads. I was successful with the rat samples running Tophat 2.0.5 over Bowtie 0.12.8; however for some of the human samples the run crashed on long_spanning_reads.
I could fix this by switching to Bowtie 2.0.0.7, but the runs were taking prohibitively long using this version. Under Bowtie 0.12.8 the successful runs are completing in ~5-7 hours; under Bowtie 2.0.0.7 the only one I have let run to completion has taken 40 hours. Since I have about 40 samples to process, this is not going to work. (I'm using Red Hat Linux with 40 processors committed to the job @ 2.00GHz. The machine has 128GB of RAM and is not running out of physical memory.)
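To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the samples are processed one after another and that the single 40-hour Bowtie 2 run is representative of the rest; neither has been verified.

```python
# Rough total wall-clock estimate, assuming samples run sequentially
# and that per-sample timings observed so far are representative.

def total_hours(n_samples, hours_per_sample):
    """Total hours to process n_samples one after another."""
    return n_samples * hours_per_sample

N_SAMPLES = 40  # "about 40 samples"

bowtie1_low = total_hours(N_SAMPLES, 5)    # fastest Bowtie 0.12.8 runs
bowtie1_high = total_hours(N_SAMPLES, 7)   # slowest Bowtie 0.12.8 runs
bowtie2_total = total_hours(N_SAMPLES, 40) # the one completed Bowtie 2 run

print(f"Bowtie 0.12.8: {bowtie1_low}-{bowtie1_high} h "
      f"({bowtie1_low / 24:.1f}-{bowtie1_high / 24:.1f} days)")
print(f"Bowtie 2:      {bowtie2_total} h ({bowtie2_total / 24:.1f} days)")
```

Under those assumptions the full batch is a bit over a week with Bowtie 0.12.8, versus more than two months with Bowtie 2.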
According to this thread: http://seqanswers.com/forums/showthread.php?t=22438 the "long_spanning_reads" error is fixed in the latest version of TopHat (2.0.6), so for now my solution is to run TopHat 2.0.6 over Bowtie 0.12.8. Hopefully this will work, but I would really like to be running the most recent version of Bowtie.
The only non-default options I'm providing to tophat are -p 40 for the multithreading and --bowtie1 if I'm running against bowtie 0.12.8. I am not providing annotations, though I actually tried that and it made no difference either to the speed or to the long_spanning_reads error.
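For reference, the invocation amounts to something like the sketch below. The index name, FASTQ file names, and output directory are hypothetical placeholders, and `-o` is added here only so each sample gets its own output directory; it is not one of the options described above.

```python
import shlex

def build_tophat_command(index_base, reads_1, reads_2, output_dir,
                         threads=40, use_bowtie1=False):
    """Build a tophat argument list for one paired-end sample.

    index_base, reads_1, reads_2 and output_dir are placeholders;
    a Bowtie 1 index and a Bowtie 2 index are separate sets of files.
    """
    cmd = ["tophat", "-p", str(threads), "-o", output_dir]
    if use_bowtie1:
        # Ask TopHat2 to align with Bowtie 1 (e.g. 0.12.8) instead of Bowtie 2.
        cmd.append("--bowtie1")
    cmd += [index_base, reads_1, reads_2]
    return cmd

# Hypothetical sample; substitute real paths before running.
cmd = build_tophat_command("hg19", "sample_1.fastq", "sample_2.fastq",
                           "sample_out", use_bowtie1=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be handed directly to `subprocess.run(cmd, check=True)` if the runs are driven from a script, which avoids shell quoting problems with odd file names.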
Has anyone experienced this drop in performance between bowtie 0.12.8 and bowtie 2.0.0.7 when using tophat2? Any suggestions?