SEQanswers




Old 06-21-2010, 05:29 AM   #1
bassu
Junior Member
 
Location: India

Join Date: Jun 2010
Posts: 5
Question Tophat Run time

Dear all,

I am currently running TopHat to align Illumina paired-end data (two files of 17 GB each) to the human reference genome (hg19 Bowtie index). It has been almost 24 hours and it is still running. How much more time is it likely to take to finish? Concurrently, I am also running Maq on 75 bp single-end reads (15 GB) against a 30 MB reference (binary human genome file).

My system configuration is 64 GB of RAM with an 8-core processor (which I feel is among the best configurations available in industry). Do I need to upgrade my system for NGS data analysis? If so, please suggest a configuration.

I would also like to know how much processing time it would take if I ran TopHat and Maq separately.

Hoping for a speedy reply.

Thanks
Old 06-21-2010, 07:03 AM   #2
Rao
Member
 
Location: India

Join Date: Oct 2008
Posts: 36
Default

Try with Bowtie.
Is it RNA-seq data?
Old 06-21-2010, 07:09 AM   #3
lmf_bill
Member
 
Location: New Haven

Join Date: Jul 2008
Posts: 36
Default

TopHat takes a long time, especially for paired-end reads. In my opinion, your configuration is sufficient.
BTW, TopHat will produce huge tmp files.
Old 06-21-2010, 08:48 AM   #4
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88
Default

Did you run TopHat with multiple threads?

If you are running it with only one thread, 17 GB of reads will take several days to run.

What is your read length? Long reads also take much longer than short ones (for the same amount of data).

Running TopHat and Maq at the same time should not cause many problems (unless you ran TopHat with 8 threads).

Regarding your system configuration, 64Gb should be plenty of RAM for your amount of data.
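
To illustrate the multi-threading point, here is a minimal Python sketch that assembles a TopHat command line with the `-p` (number of threads) and `-o` (output directory) options. The helper name `build_tophat_command` and the file names are hypothetical, chosen only for illustration; the function builds the argument list and does not run anything.

```python
import shlex


def build_tophat_command(index_prefix, reads_1, reads_2=None,
                         threads=1, output_dir="tophat_out"):
    """Return a TopHat argument list (hypothetical helper; nothing is executed)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # -p sets the number of threads, -o the output directory.
    cmd = ["tophat", "-p", str(threads), "-o", output_dir,
           index_prefix, reads_1]
    # For paired-end data the second mate file follows the first.
    if reads_2 is not None:
        cmd.append(reads_2)
    return cmd


# File names below are placeholders, not taken from the thread.
cmd = build_tophat_command("hg19", "reads_1.fastq", "reads_2.fastq", threads=7)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where TopHat and the Bowtie index are installed.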
__________________
SpliceMap: De novo detection of splice junctions from RNA-seq
Old 06-22-2010, 12:36 AM   #5
bassu
Junior Member
 
Location: India

Join Date: Jun 2010
Posts: 5
Default

Thanks all,
@Rao: Yes, I am using RNA-seq data.

@lmf_bill: Thanks for your valuable comment. I was wondering whether my system configuration was right. You mentioned that TopHat will produce huge tmp files; they will be deleted automatically, right?

@john_mu: Thanks John. Currently I am running my reads in a single thread, and my read length is 50 bp.
Old 10-05-2010, 02:41 PM   #6
DineshCyanam
Compendia Bio
 
Location: Ann Arbor

Join Date: Oct 2010
Posts: 35
Default

@bassu: So how long did the run finally take to finish? Were you able to reduce the run time in any way? I am having the same problem here.
Old 10-06-2010, 08:44 AM   #7
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63
Default

Running with n-1 threads on an n-core machine (e.g. 7 threads on a machine with 8 cores) should speed things up. Bowtie has a --shmem option for using shared memory for all threads, so that shouldn't increase the memory footprint by much to use that many threads. I've observed roughly a linear speed up with the number of cores dedicated to bowtie; I suspect similar results for tophat.
I will sometimes run with as many threads as cores, but only if I don't intend to use the computer for anything else while the program runs (i.e. a compute node on our analysis cluster).
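
The rule of thumb above can be sketched in Python as follows. The function names are hypothetical, and the runtime estimate simply assumes the roughly linear speed-up described for Bowtie; it is a rough guide, not a measured property of TopHat.

```python
import os


def worker_threads(dedicated_node=False, cores=None):
    """Pick a thread count: all cores on a dedicated node, otherwise n - 1."""
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cores = os.cpu_count() or 1
    if dedicated_node:
        return max(1, cores)
    return max(1, cores - 1)


def naive_runtime_hours(single_thread_hours, threads):
    """Estimate runtime ASSUMING perfectly linear scaling with threads."""
    return single_thread_hours / threads


print(worker_threads(cores=8))                       # shared workstation
print(worker_threads(dedicated_node=True, cores=8))  # dedicated compute node
```

In practice the speed-up is usually less than linear, because some pipeline steps are single-threaded or limited by disk I/O.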
Old 09-17-2013, 10:34 AM   #8
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

I ran TopHat to map a 24 GB single-end RNA-seq FASTQ file to hg19 with the GTF from GENCODE.
I ran it on a cluster with 1 node, 8 processors, RAM = 3 GB.
It took 45 hours to finish.

Is there any way to speed it up? As a regular user of the cluster, the settings above are the maximum resources I can have.
Old 09-17-2013, 11:17 AM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by crazyhottommy View Post
I ran TopHat to map a 24 GB single-end RNA-seq FASTQ file to hg19 with the GTF from GENCODE.
I ran it on a cluster with 1 node, 8 processors, RAM = 3 GB.
It took 45 hours to finish.

Any way to speed it up? as a regular user of the cluster, the above setting is the max resource I can have.
Split the fastq file and use multiple nodes. If you had more RAM, you could run STAR.
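
As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes an uncompressed FASTQ file with exactly four lines per record; the function name `split_fastq` and the chunk file naming are hypothetical.

```python
def split_fastq(path, records_per_chunk, prefix="chunk"):
    """Split an uncompressed 4-line-per-record FASTQ into numbered chunk files.

    Returns the list of chunk paths in order. Lines are streamed, so the
    whole input is never held in memory.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    lines_per_chunk = 4 * records_per_chunk
    chunk_paths = []
    out = None
    try:
        with open(path) as handle:
            for line_number, line in enumerate(handle):
                # Start a new chunk file on every chunk boundary.
                if line_number % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    out_path = f"{prefix}_{len(chunk_paths):04d}.fastq"
                    out = open(out_path, "w")
                    chunk_paths.append(out_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each chunk can then be submitted as a separate cluster job and the resulting alignment files merged afterwards. For paired-end data, both mate files would need to be split with the same `records_per_chunk` so that mates stay in step.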
Old 09-17-2013, 02:54 PM   #10
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

You may try Subread, which is >10 times faster.
Old 09-17-2013, 05:47 PM   #11
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by shi View Post
You may try Subread, which is >10 times faster.
I will give it a shot. Thanks
Old 09-17-2013, 09:59 PM   #12
adamyao
Member
 
Location: Taiwan

Join Date: Feb 2011
Posts: 19
Default

20.8 G bases (104M reads, paired-end) on 1 node (16 cores, AMD 2.4 GHz, 96 GB RAM): 28 hours.
TopHat 2 runs best with 16 cores (single node) according to our tests; otherwise it takes longer.
STAR runs much faster (less than 40 minutes) but needs a lot more memory (64 cores, 128 GB RAM).

Tags
maq, maq memory, ngs data analysisi, tophat, tophat memory
