SEQanswers

04-25-2014, 01:55 AM   #1
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 32
Cluster Flow: A pipelining tool to automate and standardise bioinformatics analyses

Hi all,

We've just released a new piece of software from the Babraham Bioinformatics group called Cluster Flow.

Cluster Flow is a command-line program which uses GRIDEngine or LSF cluster environments to run analysis pipelines.
  • Routine analyses are very quick to run, for example: cf --genome GRCh37 fastq_bowtie *fq.gz
  • Pipelines use identical parameters, standardising analysis and making results more reproducible
  • Integrated parallelisation tools help prevent your cluster becoming overloaded
  • All commands and output are logged in files for future reference
  • Intuitive commands and a comprehensive manual make Cluster Flow easy to use
  • Works out of the box (almost - see the YouTube tutorial)

How Cluster Flow differs from other pipeline tools:
  • Very lightweight and flexible
  • Pipelines and configurations can easily be generated on a project-specific basis if required
  • New modules and pipelines are very easy to write (see video tutorial)

We have been using Cluster Flow on our GRIDEngine cluster for some months and it's working well. In fact, I think it's fair to say that most of our bioinformatics group use it on an almost daily basis now. There has been limited testing on LSF systems with the help of a friend at the EBI, where it seems to work OK.

At the time of writing, Cluster Flow comes bundled with pipelines and modules to run the following programs:
It comes with typical pipelines to process data using these modules, some with additional parameters (e.g. for miRNA alignment or RRBS methylation data).

We've written these pipelines as we've needed them - Cluster Flow comes with an example module which you can use to help you write your own. If you do use Cluster Flow and write any new modules or pipelines, please let us know as we're keen to expand the number of available analyses that it can run.

Cluster Flow is released under a GPL v3 licence and can be downloaded from the Babraham Bioinformatics website: http://www.bioinformatics.babraham.a...s/clusterflow/
05-30-2014, 12:54 PM   #2
ewels
Phil Ewels
 

Hi all,

I've just released version 0.2 of Cluster Flow. The main update is that it now supports SLURM clusters, plus it's much easier to customise the job submission commands to be tailored to your environment.
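
To give a feel for what customising submission commands per environment can mean, here is a minimal Python sketch. It is not Cluster Flow's own code or configuration syntax. The template strings, the placeholder names (cores, mem, job_name, script) and the function build_submit_command are all hypothetical, and the scheduler flags shown (including the "smp" parallel environment for GRIDEngine) are common defaults that your site may need to change.

```python
import shlex

# Hypothetical templates: the placeholder names are invented for this sketch,
# and the scheduler flags are common ones that a given site may need to adjust.
TEMPLATES = {
    "gridengine": "qsub -cwd -V -pe smp {cores} -l h_vmem={mem} -N {job_name} {script}",
    "slurm": "sbatch -c {cores} --mem={mem} -J {job_name} {script}",
}


def build_submit_command(environment, cores, mem, job_name, script):
    """Fill in the template for one cluster environment and return the command string."""
    template = TEMPLATES[environment]
    return template.format(
        cores=cores,
        mem=mem,
        # Quote user-supplied values so spaces or shell characters stay literal.
        job_name=shlex.quote(job_name),
        script=shlex.quote(script),
    )


if __name__ == "__main__":
    for env in TEMPLATES:
        print(build_submit_command(env, 4, "8G", "cf_bowtie", "run_bowtie.sh"))
```

Keeping the whole command line in one editable template per environment means that supporting a new scheduler, or a site with unusual queue options, only needs a new string rather than a code change.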

Cluster Flow now has its own website for documentation: http://ewels.github.io/clusterflow/

It's now hosted on GitHub - you can download v0.2 from the tagged releases page.

Cheers,

Phil
07-11-2014, 03:22 AM   #3
ewels
Phil Ewels
 

Version 0.3 of Cluster Flow has just been pushed live.

This one has been brewing for a few months now and is a big update. The main highlights:
  • Report log files are now handled in a clever way to keep their order consistent, even when jobs are running in parallel.
  • E-mails are fancier and flag any errors or warnings, plus they can be given custom text strings to search for in the logs and highlight or flag as warnings.
  • Environment module loading has been tidied up and now needs less configuration and works more robustly. Environment modules can now be given aliases for better compatibility and version specification.
  • Cluster compatibility has been developed heavily and now allows almost complete configuration of the job submission commands via the configuration file.
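
As a rough illustration of the log-searching idea in the e-mail bullet above, here is a small Python sketch. It is not how Cluster Flow itself does it. The function name scan_log, its parameters and the default warning strings are all assumptions made for the example.

```python
import re

# Hypothetical defaults; a real setup would read these from a configuration file.
DEFAULT_WARNING_STRINGS = ("error", "warning")


def _compile(strings):
    """Build one case-insensitive pattern matching any of the literal strings."""
    if not strings:
        return None
    return re.compile("|".join(re.escape(s) for s in strings), re.IGNORECASE)


def scan_log(lines, highlight_strings=(), warning_strings=DEFAULT_WARNING_STRINGS):
    """Return (highlights, warnings), each a list of (line_number, text) tuples."""
    highlight_re = _compile(highlight_strings)
    warning_re = _compile(warning_strings)
    highlights, warnings = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Warnings take priority, so a line is never reported in both lists.
        if warning_re and warning_re.search(text):
            warnings.append((number, text))
        elif highlight_re and highlight_re.search(text):
            highlights.append((number, text))
    return highlights, warnings


if __name__ == "__main__":
    log = [
        "Started bowtie\n",
        "95.2% overall alignment rate\n",
        "Error: file not found\n",
    ]
    found, flagged = scan_log(log, highlight_strings=["alignment rate"])
    print("Highlights:", found)
    print("Warnings:", flagged)
```

The custom strings are escaped before matching, so a search term containing characters such as "%" or "." is treated literally rather than as a regular expression.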

You can download v0.3 of Cluster Flow here: https://github.com/ewels/clusterflow/releases/tag/v0.3

Documentation and new demonstrations can be seen on the docs homepage: http://ewels.github.io/clusterflow/

Much of this development has been the result of me moving and wanting to run Cluster Flow on a different cluster. I'd like to thank those who have helped out with testing and development, notably the chaps back at Babraham who have had to put up with all of my buggy pre-releases.

Phil

Tags
babraham, cluster computing, cluster flow, pipeline, pipeline development
