SEQanswers

Old 06-27-2013, 05:35 PM   #1
wilson90
Member
 
Location: Singapore

Join Date: May 2012
Posts: 48
Cufflinks runtime

Dear all,

I have a paired-end RNA sample (80 million reads in total) that was aligned with TopHat2, and I am now running Cufflinks on it. After one day, the program is still at the "Inspecting reads and determining fragment length distribution." phase. Is something wrong? Are there known cases where Cufflinks goes into an infinite loop?

I am using the -g option. My Cufflinks is not the latest version, and I am running it on a cluster.

Thank you.

Wilson
Old 06-27-2013, 05:36 PM   #2
wilson90
Member
 
Location: Singapore

Join Date: May 2012
Posts: 48

Additional information on command:
cufflinks -g refGenes.gtf -p 35 -u -N --total-hits-norm -b genome.fa -q accepted_hits.bam
Old 06-28-2013, 10:12 AM   #3
gesdys
Junior Member
 
Location: philadelphia

Join Date: Jun 2013
Posts: 5

I have the same problem: Cufflinks is very slow, and apparently it is not using multithreading (option -p) properly. I don't know whether this is a bug or whether it was designed this way.
Old 06-28-2013, 10:30 AM   #4
amarth
Member
 
Location: Mexico City

Join Date: Dec 2012
Posts: 14

Cufflinks takes a couple of hours with small samples. I've run 60 million reads and it took around 10 hours to complete on a Linux workstation with these specs:

Dual-Core Intel Xeon Processor 5150, 4 cores, 2.66 GHz, 16 GB DDR2 RAM, and 1 TB HDD


If you're wondering whether something is wrong, I recommend adding the verbose option (-v/--verbose) so that Cufflinks reports more about what it is doing.
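
As a rough illustration of that suggestion: the command in post #2 passes -q, which suppresses most messages, so a first step would be to swap it for -v. The helper name make_verbose below is made up for this example, and the snippet only prints the adjusted command rather than running Cufflinks.

```shell
# Hypothetical helper (not part of Cufflinks): replace the quiet flag
# (-q/--quiet) with the verbose flag (-v) in a command line.
make_verbose() {
    out=""
    for arg in "$@"; do
        case "$arg" in
            -q|--quiet) arg="-v" ;;
        esac
        # Append the (possibly replaced) argument, space-separated.
        out="${out:+$out }$arg"
    done
    printf '%s\n' "$out"
}

# Print, but do not run, the adjusted command from post #2.
make_verbose cufflinks -g refGenes.gtf -p 35 -u -N --total-hits-norm -b genome.fa -q accepted_hits.bam
```

With -v in place of -q, the log should at least show whether the run is still making progress or has stalled.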

Sample quality also plays an important role in the time it takes to analyze transcripts.

Regards
Old 06-28-2013, 01:16 PM   #5
gesdys
Junior Member
 
Location: philadelphia

Join Date: Jun 2013
Posts: 5

Did you check whether the multithreading is working correctly?
Old 06-28-2013, 10:18 PM   #6
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

I'm not certain about this, but I think there's a difference between multithreaded programming (using multiple cores on a single workstation) and parallel programming (for a computing cluster). Cufflinks is most certainly designed for multithreaded execution on a single workstation, but I don't think it's designed to spread its processing across many different nodes on a cluster. I could be completely wrong, but when running Cufflinks on a cluster I think you're limited to the resources of a single node.
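
To make the single-node point concrete, here is a small sketch that caps a requested thread count at the number of cores the machine reports. The helper name cap_threads is hypothetical, and it assumes getconf _NPROCESSORS_ONLN is available, as it is on typical Linux systems. On a cluster the scheduler may grant a job fewer cores than the node has, which this sketch does not account for.

```shell
# Hypothetical helper: limit a requested thread count to the cores available.
# Usage: cap_threads REQUESTED [AVAILABLE]
# AVAILABLE defaults to the online core count reported by getconf
# (falling back to 1 if getconf cannot report it).
cap_threads() {
    requested=$1
    available=${2:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Example: the command in post #2 asked for 35 threads.
threads=$(cap_threads 35)
echo "thread count to pass to cufflinks -p: $threads"
```

If the printed value is lower than 35, the extra threads requested with -p could not each have had a core of their own on that machine.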

For 80 million reads, however, it shouldn't take very long at all to get through that first stage of determining the fragment length distribution, and I'm not sure that part even runs more than a single thread anyway. Of course, this depends on the CPU of the node you're running it on. Imagine running 'samtools flagstat' on an alignment file: that's essentially what that step is, with a bit of extra code. The multithreaded part begins when it starts the assembly/quantification. If you're using the '-b' option, you're in for TWO trips through that process, which should also be the longest part of the run.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Old 06-29-2013, 08:07 AM   #7
gesdys
Junior Member
 
Location: philadelphia

Join Date: Jun 2013
Posts: 5

I'm using a single workstation, and I checked: I used the verbose option, so I was able to see almost everything.
When it was calculating the number of reads for each transcript (this is a multithreaded part), it used more than one thread for just a few transcripts, then it used only one thread for all the others (that's why it takes so long).
I also checked other programs such as Bowtie and TopHat, and both of them use multithreading perfectly.
So I really think this is a bug in the latest version of Cufflinks.
I hope they fix it; otherwise, for huge datasets it will take forever.
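
One rough way to check this kind of behaviour from outside the program, assuming a Linux machine: count the entries under /proc/PID/task, which holds one entry per thread. The helper name count_threads is made up for this sketch. It only shows how many threads exist, not how busy each one is.

```shell
# Hypothetical helper: count the threads of a process on Linux by listing
# /proc/PID/task (one entry per thread). Prints "unknown" if that
# directory does not exist (no such process, or not a Linux /proc).
count_threads() {
    pid=$1
    if [ -d "/proc/$pid/task" ]; then
        ls "/proc/$pid/task" | wc -l | tr -d ' '
    else
        echo "unknown"
    fi
}

# Demonstrated on the current shell. For a real check, pass the PID of
# the running cufflinks process instead (for example from `pgrep cufflinks`).
count_threads "$$"
```

To see per-thread CPU use rather than just the count, `top -H -p PID` lists one row per thread; a single busy row alongside many idle ones would match the behaviour described above.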
All times are GMT -8. The time now is 09:37 AM.

