**dpryan** · 03-31-2014, 01:51 AM

Since tophat isn't written to use MPI, the instances run on each node will be blind to each other. There is no way around this without rewriting the program (actually, you'd need to rewrite bowtie as well). If you really want to use tophat, then just split your fastq files into small enough chunks and push a large number of tophat jobs onto the cluster (normally one would create a script to do this). If you're going that route anyway, just switch to STAR (you might have enough RAM, I'm not sure) and you'll get your results in a fraction of the time. The last alternative here would be Myrna, which seems to be designed with this sort of scenario in mind.
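For the splitting route, a minimal sketch of chunking a FASTQ file might look like the following. The function name `split_fastq`, the chunk naming scheme, and the assumption of uncompressed input with exactly four lines per record are illustrative choices, not something tophat itself requires.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk):
    """Split an uncompressed FASTQ file into chunks of at most reads_per_chunk reads.

    Assumes the common 4-lines-per-record layout (no wrapped sequence lines).
    Returns the list of chunk paths that were written, in order.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(fastq_path).stem
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Pull the next block of whole records (4 lines each) from the file.
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError("truncated FASTQ record at end of file")
            chunk_path = out_dir / f"{stem}.part{len(chunk_paths):04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

For paired-end data, splitting both mate files with the same `reads_per_chunk` keeps the chunks in sync, provided the two files list the reads in the same order. Each chunk can then be submitted as its own job.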

You'll have the same problems with cufflinks, namely that increasing the number of instances with MPI won't decrease runtime since the instances won't talk to each other. Your best bet there would be eXpress or to just use count-based methods.

**GenoMax** · 03-31-2014, 03:25 AM

@Marco: Only 6 h of walltime per job, odd indeed (hope there is some logic behind that restriction). Not sure what kind of queuing system your cluster uses but perhaps you can make a case for a separate queue for your jobs with a walltime of at least 24 h?

**kitinje** · 03-31-2014, 05:37 AM

Thank you for the quick reply, it makes things much more clear to me.
The cluster uses the PBS queuing system, and yes, there is another queue with a 24h walltime that lets me use a more limited number of nodes (not a restriction at this point, at least for tophat/cufflinks). But I couldn't finish a tophat run in 24h on a single machine. I can split the FASTQ files to partially solve this first problem. But then I face the same problem with cufflinks.
What happens if I split the bam files by chromosome? And then run 23 cufflinks jobs? Will I face serious problems in the quantification of the isoforms/normalization?

**NicoBxl** · 03-31-2014, 06:11 AM

Like dpryan said, if you are time-limited, use STAR (very, very fast, with the same or even better results than tophat).

**kitinje** · 03-31-2014, 06:20 AM

Yes, thanks. I contacted the sysadmins to have a STAR module installed on the server, and I'll see how it works.

**dpryan** · 03-31-2014, 06:22 AM

You don't need them to install anything, you can just do that yourself (just install it into your home directory).

**GenoMax** · 03-31-2014, 06:36 AM

On a shared cluster it is good practice (like Marco is doing) to ask the admins to install software. Under the "modules" system (which Marco's cluster is using) admins will automatically account for dependencies/conflicts with libraries etc. Software like STAR is widely useful, so having a single central install is preferable to having everyone install a local copy. Keeping genome indexes in a central location also saves on disk space.

That said, temporarily running STAR from your own directory (while the admins install a central copy) may be an option for the impatient :-)

**kitinje** · 03-31-2014, 08:14 AM

The STAR module was installed, and I'm downloading hg19 + annotations from their ftp server. If it works, in my understanding, I'll have to convert the output to BAM, sort it and then index it to get a sorted, indexed BAM file.
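The convert, sort and index steps described above could be scripted roughly as follows. This is a sketch that assumes samtools 1.3 or later is on the `PATH` (older 0.1.x releases used a different `sort` syntax) and that the input is STAR's default `Aligned.out.sam`; the helper names `bam_commands` and `run_all` are made up for illustration.

```python
import subprocess
from pathlib import Path


def bam_commands(sam_path, threads=1):
    """Return the samtools commands for SAM -> BAM -> sorted BAM -> index.

    The commands are returned as argument lists so they can be inspected
    (or written into a job script) before anything is executed.
    """
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_suffix(".sorted.bam")
    return [
        # Convert SAM to BAM (-b) and write to the given output file (-o).
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Coordinate-sort the BAM; -@ sets the number of additional threads.
        ["samtools", "sort", "-@", str(threads), "-o", str(sorted_bam), str(bam)],
        # Build the index next to the sorted BAM.
        ["samtools", "index", str(sorted_bam)],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(bam_commands("Aligned.out.sam", threads=12))` would leave `Aligned.out.sorted.bam` and its index in the same directory.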

At this stage I still face a problem with cufflinks. I will check if I can get the job done using a single node (12 CPUs/48 GB) in 24h (max walltime).
If not, I might split the bam file by chromosome and run 23 different jobs in as many nodes.
In that case, are there any available options in order to renormalize the FPKMs afterwards?
(I'm looking at eXpress in the meantime)
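On the renormalization question, a rough sketch: since FPKM = 10^9 · C / (N · L), with C the fragments assigned to a transcript, L its length and N the total mapped fragments, a value computed against one chromosome's fragment count N_chr can be put on a genome-wide scale by multiplying it by N_chr / N_total. The code below assumes each per-chromosome run used the number of fragments mapped in that chromosome's BAM as its denominator, and that those per-chromosome counts are known (for example from the split BAM files). The function name and data layout are hypothetical, and this does not account for multi-mapped reads shared between chromosomes, so it is not a substitute for a single whole-genome run.

```python
def rescale_fpkm(fpkm_by_chrom, fragments_by_chrom):
    """Rescale per-chromosome FPKM values to a genome-wide denominator.

    fpkm_by_chrom:      {chrom: {transcript_id: fpkm}} from the split runs
    fragments_by_chrom: {chrom: mapped fragment count used in that run}
    """
    total = sum(fragments_by_chrom.values())
    if total <= 0:
        raise ValueError("total mapped fragment count must be positive")
    rescaled = {}
    for chrom, fpkms in fpkm_by_chrom.items():
        # FPKM_global = FPKM_chrom * N_chrom / N_total
        factor = fragments_by_chrom[chrom] / total
        rescaled[chrom] = {tid: value * factor for tid, value in fpkms.items()}
    return rescaled
```

With 3 million fragments on one chromosome and 1 million on another, for instance, values from the first are scaled by 0.75 and values from the second by 0.25.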

**GenoMax** · 03-31-2014, 08:36 AM

Is 24h the longest time slot you have available?

**kitinje** · 03-31-2014, 08:48 AM

Yes,
I have lots of available nodes but unfortunately I only have 3 possible queues: debug (walltime 30 min), parallel (6h), longpar (24h).
I will ask the sysadmins if they can create a 96h queue for me, but I fear this is unlikely to happen. The cluster is being used for other kinds of computations not related to bioinformatics, and I think they don't want to reserve nodes for more than 24h.
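Given the three queues listed above, per-job PBS scripts could be generated with a small helper like this one. It is only a sketch: the `nodes=1:ppn=...` resource line follows Torque-style PBS and may need adapting (PBS Pro uses a `select=` form), the 12 cores per node come from the node description earlier in the thread, and `pbs_script` is an invented name.

```python
# Walltime limits in hours for the queues named in the post above.
QUEUE_LIMIT_HOURS = {"debug": 0.5, "parallel": 6, "longpar": 24}


def pbs_script(job_name, command, queue="longpar", hours=24, ppn=12):
    """Return the text of a single-node PBS job script for `command`.

    Raises ValueError if the requested walltime exceeds the queue limit.
    """
    limit = QUEUE_LIMIT_HOURS[queue]
    if hours > limit:
        raise ValueError(f"queue {queue!r} allows at most {limit} h of walltime")
    total_minutes = round(hours * 60)
    walltime = f"{total_minutes // 60:02d}:{total_minutes % 60:02d}:00"
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -q {queue}",
        f"#PBS -l nodes=1:ppn={ppn}",  # Torque-style; adjust for other PBS flavours
        f"#PBS -l walltime={walltime}",
        'cd "$PBS_O_WORKDIR"',  # start in the directory qsub was called from
        command,
    ]
    return "\n".join(lines) + "\n"
```

Writing one such script per FASTQ chunk or per chromosome and submitting each with `qsub` keeps every job inside the 6h or 24h limit.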

**kitinje** · 03-31-2014, 03:33 PM

STAR worked flawlessly and completed the job in 30 minutes. Now I "only" need to solve my walltime problem with the quantification of the transcripts.
In the meantime, thank you all!

**GenoMax** · 03-31-2014, 05:42 PM

Look at featureCounts as an option for the quantification: http://bioinf.wehi.edu.au/featureCounts/

Running Tophat/Cufflinks on a cluster with Multiple nodes
