SEQanswers

hlwright 10-08-2012 04:13 AM

Cufflinks timing out - computing power required?
 
I am analysing human transcriptome data (Illumina) via the Tophat -> Cufflinks pipeline (v2.0.2) using iGenomes references. My dataset comprises 14 patients and 6 controls, so I have 2 "conditions" to analyse with 14 and 6 biological replicates respectively.

Until now I have been bypassing the full cufflinks protocol and just running cuffdiff providing a GTF, as follows:

Code:

cuffdiff -p 8 -o ./cuffdiff_out -b genome.fa genes.gtf P1.bam,P2.bam,P3.bam,P4.bam,P5.bam,P6.bam,P7.bam,P8.bam,P9.bam,P10.bam,P11.bam,P12.bam,P13.bam,P14.bam C1.bam,C2.bam,C3.bam,C4.bam,C5.bam,C6.bam

This operation runs across 8 cores of our server (4GB per core) in 11-12h.

However, I have been trying to run the full cufflinks -> cuffmerge -> cuffdiff protocol (as per the Nature Protocols publication) but have not yet been able to complete the entire process. My IT support team have been very helpful, but the final cuffdiff job requires HUGE amounts of computing power and time, and I wonder what other people's experience of this is, or whether I am doing something wrong.

I have successfully run these operations:-

Cufflinks for each BAM file:
Code:

cufflinks -p 8 -o ./output_dir -b genome.fa -g genes.gtf P1.bam
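
Since the same cufflinks call is repeated for all 20 BAM files, it could be wrapped in a loop. This is only a minimal sketch: the function name `run_cufflinks_all`, the `DRY_RUN` switch and the per-sample `./cufflinks_out/<sample>` directories are assumptions made for illustration, not part of the original commands.

```shell
# run_cufflinks_all: run cufflinks once per BAM file passed as an argument.
# Hypothetical helper; each sample gets its own output directory
# (./cufflinks_out/<sample>) so the runs do not overwrite each other.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_cufflinks_all() {
    for bam in "$@"; do
        # Strip the directory and the .bam suffix to get the sample name
        sample=$(basename "$bam" .bam)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "cufflinks -p 8 -o ./cufflinks_out/$sample -b genome.fa -g genes.gtf $bam"
        else
            cufflinks -p 8 -o "./cufflinks_out/$sample" -b genome.fa -g genes.gtf "$bam"
        fi
    done
}
```

For example, `DRY_RUN=1; run_cufflinks_all P*.bam C*.bam` would list the 20 commands for checking before anything is submitted to the server.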

Then create assemblies.txt file:-
Code:

./path/to/P1.bam
./path/to/P2.bam
...
etc 
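
The manifest could also be generated rather than typed by hand. cuffmerge reads a text file listing GTF assemblies, one path per line, and cufflinks writes its assembly as transcripts.gtf in each output directory. The sketch below assumes one cufflinks output directory per sample under a common parent; the helper name `write_manifest` and that directory layout are hypothetical.

```shell
# write_manifest: collect every transcripts.gtf below a directory into a
# manifest file, one path per line, sorted for a reproducible order.
# $1 = parent directory of the per-sample cufflinks output (assumed layout)
# $2 = manifest file to write, e.g. assemblies.txt
write_manifest() {
    find "$1" -name transcripts.gtf | sort > "$2"
}
```

A call such as `write_manifest ./cufflinks_out assemblies.txt` would then produce the file that is handed to cuffmerge.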

Cuffmerge (this took 1h):
Code:

cuffmerge -p 8 -o ./cuffmerge_out -g genes.gtf -s genome.fa assemblies.txt

Cuffdiff:
Code:

cuffdiff -p 8 -o ./cuffdiff_out -b genome.fa -u merged.gtf P1.bam,P2.bam,P3.bam,P4.bam,P5.bam,P6.bam,P7.bam,P8.bam,P9.bam,P10.bam,P11.bam,P12.bam,P13.bam,P14.bam C1.bam,C2.bam,C3.bam,C4.bam,C5.bam,C6.bam
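
cuffdiff takes the replicates of each condition as one comma-separated list, with the conditions separated by spaces. With 14 and 6 replicates the lists are easy to mistype, so they could be built by a small helper. This is a sketch only: `replicate_list` is a hypothetical name, and it assumes the files follow the P1.bam ... P14.bam / C1.bam ... C6.bam naming used above.

```shell
# replicate_list: print PREFIX1.bam,PREFIX2.bam,...,PREFIXn.bam
# $1 = file name prefix (e.g. P or C), $2 = number of replicates
replicate_list() {
    prefix=$1
    count=$2
    list=""
    i=1
    while [ "$i" -le "$count" ]; do
        # Prepend a comma only when the list already has an entry
        list="${list:+$list,}${prefix}${i}.bam"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Possible use (not executed here):
#   cuffdiff -p 8 -o ./cuffdiff_out -b genome.fa -u merged.gtf \
#       "$(replicate_list P 14)" "$(replicate_list C 6)"
```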

The last time I tried to run the cuffdiff step I was allocated 160GB across 8 cores for 5 days. The job timed out at the "Testing for differential expression and regulation in locus" step. It also only ever used ~30GB of the 160GB allocated.

Can anyone offer any advice / suggestions / or even let me know how much computing power / time they use for their runs?

Much appreciated
Helen

hbt 10-10-2012 05:48 AM

Is this an issue only with the newest version of cufflinks (v2.0.2), or did it also occur with older versions?

mallela 03-15-2014 02:12 AM

Hi hlwright,

I am having the same problem. Could you please tell me how you solved it?

Thanks!

