SEQanswers

Old 10-08-2012, 04:13 AM   #1
hlwright
Member
 
Location: Liverpool, UK

Join Date: Feb 2011
Posts: 30
Question Cufflinks timing out - computing power required?

I am analysing human transcriptome data (Illumina) via the Tophat -> Cufflinks pipeline (v2.0.2) using iGenomes references. My dataset comprises 14 patients and 6 controls, so I have 2 "conditions" to analyse with 14 and 6 biological replicates respectively.

Until now I have been bypassing the full cufflinks protocol and just running cuffdiff providing a GTF, as follows:

PHP Code:
cuffdiff -p 8 -o ./cuffdiff_out -b genome.fa genes.gtf P1.bam,P2.bam,P3.bam,P4.bam,P5.bam,P6.bam,P7.bam,P8.bam,P9.bam,P10.bam,P11.bam,P12.bam,P13.bam,P14.bam C1.bam,C2.bam,C3.bam,C4.bam,C5.bam,C6.bam
This operation runs across 8 cores of our server (4GB per core) in 11-12h.

However, I have been trying to run the full cufflinks -> cuffmerge -> cuffdiff protocol (as per the Nature Protocols publication) but have not yet been able to complete the entire process. My IT support team have been very helpful, but the final cuffdiff job requires HUGE amounts of computing power and time, and I wonder what other people's experience of this is, or whether I am doing something wrong.

I have successfully run these operations:-

Cufflinks for each BAM file:
PHP Code:
cufflinks -p 8 -o ./output_dir -b genome.fa -g genes.gtf P1.bam
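Running the same command for every BAM file can be scripted. The sketch below is a dry run that only prints one cufflinks command per sample, so it can be checked before anything is submitted. The function name `print_cufflinks_cmds` and the per-sample output layout `./cufflinks_out/<sample>` are assumptions made for illustration, not part of the original setup.

```shell
# Dry-run sketch: print one cufflinks command per BAM file given as an argument.
# Assumed (hypothetical) layout: each sample gets its own directory under ./cufflinks_out
print_cufflinks_cmds() {
    for bam in "$@"; do
        # Derive the sample name from the file name, e.g. path/P1.bam -> P1
        sample=$(basename "$bam" .bam)
        printf 'cufflinks -p 8 -o ./cufflinks_out/%s -b genome.fa -g genes.gtf %s\n' "$sample" "$bam"
    done
}

# Example: print the commands for three of the samples
print_cufflinks_cmds P1.bam P2.bam C1.bam
```

Giving each sample its own output directory matters because cufflinks always names its assembly transcripts.gtf, so runs sharing one directory would overwrite each other.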
Then create assemblies.txt file:-
PHP Code:
./path/to/P1/transcripts.gtf
./path/to/P2/transcripts.gtf
...
etc
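The manifest passed to cuffmerge is a plain list of the transcripts.gtf assemblies that cufflinks writes, one path per line. A minimal sketch for generating it, assuming a hypothetical layout with one cufflinks output directory per sample under a common root (the function name `write_manifest` is also made up for this example):

```shell
# write_manifest ROOT MANIFEST
# Collect ROOT/<sample>/transcripts.gtf paths into MANIFEST, one per line.
# ROOT and the one-directory-per-sample layout are assumptions for illustration.
write_manifest() {
    root=$1
    manifest=$2
    # Start from an empty manifest
    : > "$manifest"
    for gtf in "$root"/*/transcripts.gtf; do
        # Skip the literal pattern if the glob matched nothing
        [ -f "$gtf" ] || continue
        printf '%s\n' "$gtf" >> "$manifest"
    done
}
```

It could then be called as `write_manifest ./cufflinks_out assemblies.txt` before running cuffmerge.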
Cuffmerge (this took 1h):
PHP Code:
cuffmerge -p 8 -o ./cuffmerge_out -g genes.gtf -s genome.fa assemblies.txt
Cuffdiff:
PHP Code:
cuffdiff -p 8 -o ./cuffdiff_out -b genome.fa -u merged.gtf P1.bam,P2.bam,P3.bam,P4.bam,P5.bam,P6.bam,P7.bam,P8.bam,P9.bam,P10.bam,P11.bam,P12.bam,P13.bam,P14.bam C1.bam,C2.bam,C3.bam,C4.bam,C5.bam,C6.bam
The last time I tried to run the cuffdiff step I was allocated 160GB across 8 cores for 5 days. The job timed out at the "Testing for differential expression and regulation in locus" step. It also only ever used ~30GB of the 160GB allocated.

Can anyone offer any advice / suggestions / or even let me know how much computing power / time they use for their runs?

Much appreciated
Helen
Old 10-10-2012, 05:48 AM   #2
hbt
Member
 
Location: UK

Join Date: Jan 2011
Posts: 20
Default

Is this an issue just with the newest version of cufflinks (v2.0.2), or did it also occur with older versions?
Old 03-15-2014, 02:12 AM   #3
mallela
Member
 
Location: Münster

Join Date: Apr 2013
Posts: 15
Default

Hi hlwright,

I am also having the same problem. Could you please tell me how you solved it?

Thanks!

Tags
cuffdiff, cufflinks 2.0.2, ram

All times are GMT -8.