**tgenahmet** · 07-05-2011, 09:42 AM

We're also having very similar issues. Our cluster gives us 96 hours to complete our jobs, but we sometimes hit this wall-time limit. Any tips for speeding up Cufflinks would be highly appreciated.

**DZhang** · 07-05-2011, 06:22 PM

At which step did you notice the program spending most of its time? Below is an entry from the Cufflinks FAQ:

I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.
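To illustrate the FAQ's point, here is a toy model (not Cufflinks code) of what happens when each bundle is processed by exactly one thread. The bundle costs are made-up numbers, and the scheduling rule (hand the next bundle to whichever worker frees up first) is an assumption for illustration, not a description of Cufflinks internals.

```python
import heapq

def wall_time(bundle_costs, threads):
    """Estimate wall time when each bundle runs on exactly one thread.

    Bundles are handed out in order to whichever worker frees up first.
    """
    workers = [0.0] * threads           # time at which each worker becomes free
    heapq.heapify(workers)
    for cost in bundle_costs:
        start = heapq.heappop(workers)  # earliest available worker
        heapq.heappush(workers, start + cost)
    return max(workers)

# Hypothetical costs in hours: one huge locus (e.g. rRNA) plus 1000 small ones.
costs = [90.0] + [0.01] * 1000

for p in (1, 4, 8):
    print(p, round(wall_time(costs, p), 2))
```

In this model the run takes about 100 hours on one thread and 90 hours on 4 or 8 threads: once a single bundle dominates, adding threads cannot push the wall time below the cost of that bundle, which is why masking the offending loci can help more than raising the thread count.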

**ksiowa** · 07-06-2011, 08:20 AM

Originally posted by DZhang

Which step did you notice the program spent most of the time? Below is one entry in the Cufflinks FAQ:

I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.

We noticed it hanging around 71% for a particularly long time one day, but since we had to leave it running over several nights, it's hard to say whether or not this was unusual.

Also, it seems to take about the same amount of time regardless of how many threads we tell it to use (we've tried 1, 4, 7, and 8). However, that answer from the FAQ makes it sound like cufflinks spawns threads automatically, so we're wondering if maybe we misunderstood the -p option?

**DZhang** · 07-06-2011, 08:31 AM

I believe the -p option at least works for bowtie.

Douglas

https://www.contigexpress.com

**dhiralphadke** · 09-01-2011, 05:15 AM

We have ~175 million aligned reads. I am running Cufflinks version 1.0.3. It has been running for the last 8 days and has not yet completed. The Cufflinks output files are only being updated every 1 or 2 days. Is this normal?

Here are the Cufflinks options I used:
cufflinks --GTF-guide refseq.gtf --frag-bias-correct /indexes --multi-read-correct -p 12

The machine specs are:
32 processors (2.4 GHz each)
512 GB RAM

Any thoughts would be greatly appreciated!

Thanks!

-Dhiral.

**thurisaz** · 09-01-2011, 05:59 AM

I think you should consider using a mask file (see post #3 above). cufflinks was also taking a long time to run on my data; when I had a look at the region where it was stalling, I could see that very many reads were aligning there. Creating a GFF file to mask these regions (with -M) solved the problem in my case.

**dhiralphadke** · 09-01-2011, 08:14 AM

Originally posted by thurisaz

I think you should consider using a mask file (see post #3 above). cufflinks was also taking a long time to run on my data; when I had a look at the region where it was stalling, I could see that very many reads were aligning there. Creating a GFF file to mask these regions (with -M) solved the problem in my case.

You created a mask file with the regions that it was stalling at? There could be valid transcripts in those regions if several reads were aligning there. Or did you mask out specific ribosomal RNA and mitochondrial RNA regions?

**thurisaz** · 09-02-2011, 02:45 AM

Yes, I created a mask file to exclude the regions it was stalling at, since they were hugely over-represented and the analysis wouldn't finish otherwise. Comparing them now, however, I see that they do cover the annotated rRNA as well as some extra regions:

Code:

Problem areas in my run:
Chr2    TAIR10  exon    1900    10200   .       .       .       ID=Chr2_problem_area
Chr3    TAIR10  exon    14143000        14145000        .       .       .       ID=Chr3_problem_area1
Chr3    TAIR10  exon    14195800        14204100        .       .       .       ID=Chr3_problem_area2

Annotated rRNA:
Chr2  TAIR10  rRNA  5782  5945  . + . ID=AT2G01020.1;Parent=AT2G01020;Name=AT2G01020.1;Index=1                                                                                       
Chr3  TAIR10  rRNA  14197677  14199484  . + . ID=AT3G41768.1;Parent=AT3G41768;Name=AT3G41768.1;Index=1                                                                               
Chr3  TAIR10  rRNA  14199753  14199916  . + . ID=AT3G41979.1;Parent=AT3G41979;Name=AT3G41979.1;Index=1
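The comparison above can also be done programmatically. The following is a minimal sketch that uses the coordinates from the listing and treats them as 1-based, inclusive intervals (the usual GFF convention); the function names are hypothetical helpers, not part of any Cufflinks tooling.

```python
# Masked regions and annotated rRNA from the listing above: (chrom, start, end, id).
problem_areas = [
    ("Chr2", 1900, 10200, "Chr2_problem_area"),
    ("Chr3", 14143000, 14145000, "Chr3_problem_area1"),
    ("Chr3", 14195800, 14204100, "Chr3_problem_area2"),
]
rrna = [
    ("Chr2", 5782, 5945, "AT2G01020.1"),
    ("Chr3", 14197677, 14199484, "AT3G41768.1"),
    ("Chr3", 14199753, 14199916, "AT3G41979.1"),
]

def overlaps(a, b):
    """True if two features on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def rrna_hits(areas, annotations):
    """Map each masked area ID to the IDs of the annotations it overlaps."""
    return {
        area[3]: [ann[3] for ann in annotations if overlaps(area, ann)]
        for area in areas
    }

hits = rrna_hits(problem_areas, rrna)
for area_id, ids in hits.items():
    print(area_id, ids or "no annotated rRNA")
```

With these coordinates, Chr2_problem_area and Chr3_problem_area2 contain annotated rRNA, while Chr3_problem_area1 does not overlap any of the three rRNA entries listed, so it is one of the "extra regions" being masked.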

**zorph** · 04-05-2012, 01:44 PM

Did anyone find a way to get Cufflinks to work faster on a large file without masking transcripts or letting Cufflinks run for a longer period of time?

***I wish I could just divide the file in half and then figure out a way to merge the FPKMs***

**sudders** · 04-27-2012, 05:52 AM

Originally posted by zorph

did anyone find a way around getting Cufflinks to work faster on a large file without masking transcripts or making cufflinks run for a longer period of time?

***I wish I could just divide the file in half and then figure out a way to merge the FPKMs***

You could in theory divide the input BAMs by the chromosome the reads map to and then run a separate Cufflinks process for each chromosome. You'd have to find some way to renormalise the FPKMs afterwards.
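One way the renormalisation might look, as a rough sketch only: it assumes each per-chromosome run computed FPKM as 1e9 × fragments / (transcript length × fragments mapped on that chromosome), so that rescaling by the chromosome's share of all mapped fragments swaps in a whole-sample denominator. Cufflinks' actual normalisation (and the effects of bias correction and multi-read correction) may differ, so treat this as an approximation to be checked against a whole-genome run. The function name, sample names and numbers are all hypothetical.

```python
def renormalise_fpkm(fpkm_by_chrom, mapped_by_chrom):
    """Rescale per-chromosome FPKMs to a whole-sample denominator.

    Assumes each per-chromosome run used
        FPKM = 1e9 * fragments / (transcript_length * mapped_on_that_chromosome)
    so multiplying by mapped_on_that_chromosome / total_mapped swaps the denominator.
    """
    total = sum(mapped_by_chrom.values())
    return {
        chrom: {
            tid: fpkm * mapped_by_chrom[chrom] / total
            for tid, fpkm in transcripts.items()
        }
        for chrom, transcripts in fpkm_by_chrom.items()
    }

# Hypothetical numbers for illustration only.
mapped = {"chr1": 6_000_000, "chr2": 4_000_000}
fpkm = {"chr1": {"tx_a": 50.0}, "chr2": {"tx_b": 50.0}}
rescaled = renormalise_fpkm(fpkm, mapped)
print(rescaled)
```

Here two transcripts with the same per-chromosome FPKM end up with different whole-sample values (30 and 20) because their chromosomes carry different shares of the mapped fragments.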


Cufflinks Runtime
