SEQanswers > Applications Forums > RNA Sequencing


Old 03-13-2013, 06:33 AM   #1
Junior Member
Location: CT

Join Date: Jan 2013
Posts: 2
Optimizing tophat running time

Hi friends,

I'm trying to align 50bp paired-end Illumina reads to the mm10 genome/transcriptome with tophat 2.0.8. I've done a few runs on our local desktop Mac Pros to get a feel for how the software behaves, and I'm now migrating the work to our local high-performance computing cluster in the hope of running the alignments faster (or at least in parallel rather than sequentially), as these are large data sets. Does anyone have advice on how much computational resource I should request per data set so the jobs run quickly without hogging the system, given that not every step in the tophat pipeline is multi-threaded?

From my pilot alignments and from reading these forums, I understand that the segment_juncs step is single-threaded and time-consuming. Will this step run more quickly if more memory is available to it? (That is, is it "fair" to request extra cores from the system just for their memory? Does the speed of this step scale at all? In my pilots the run time has been quite variable, and I haven't been able to correlate it with anything obvious.)

Empirically, I've also found that setting -p to less than the actual number of cores available is necessary to avoid problems when the tophat wrapper script invokes samtools or other processes during the alignment and output steps, but it's not clear to me what a good value is. Is there a rule of thumb here, something like p = (number of cores available) - (some constant I don't know)?
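As a concrete sketch of the kind of invocation I mean (the two-core headroom is just my guess, not a documented rule, and the index and FASTQ names are placeholders):

```shell
#!/bin/sh
# Sketch only: reserve a couple of cores for samtools and the tophat
# wrapper itself. The HEADROOM value of 2 is a guess; index and FASTQ
# names are placeholders for illustration.
TOTAL_CORES=8                    # cores requested from the scheduler
HEADROOM=2                       # leave room for samtools / the wrapper
P=$(( TOTAL_CORES - HEADROOM ))

# Print the command rather than run it, since tophat2 must be on PATH.
echo "tophat2 -p ${P} -o tophat_out \
  --transcriptome-index mm10_tx_index \
  mm10_bt2_index sample_R1.fastq.gz sample_R2.fastq.gz"
```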

Thanks in advance for any advice you can provide. The computational side of this is pretty intimidating to a bench biologist, and I've tried to RTFM as best I can understand it, I swear I have!
kevin9y9
Old 03-13-2013, 07:09 AM   #2
not just another member
Location: Belgium

Join Date: Aug 2010
Posts: 264

If you want speed, try STAR. It's much faster than tophat (and the results seem even better). The only drawback is that STAR uses much more RAM than tophat.

If you want to stick with tophat, don't give it too many threads. I think a maximum of 10 is fine.
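Something like this is what I mean — a sketch only, with the index path, FASTQ names, and thread count as placeholders, though the flags themselves are standard STAR options:

```shell
#!/bin/sh
# Hypothetical STAR invocation for 50bp paired-end reads against mm10.
# Printed rather than executed here, since STAR must be installed and an
# index built first; all paths and the thread count are placeholders.
star_cmd="STAR --runThreadN 8 \
  --genomeDir /path/to/mm10_star_index \
  --readFilesIn reads_1.fastq reads_2.fastq \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix sample1_"
echo "$star_cmd"
```

Note that you need to build the genome index once beforehand (STAR's --runMode genomeGenerate step), and that step is where most of the RAM goes for a mammalian genome.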
NicoBxl
