  • Cluster Flow: A pipelining tool to automate and standardise bioinformatics analyses

    Hi all,

    We've just released a new piece of software from the Babraham Bioinformatics group called Cluster Flow.

    Cluster Flow is a command-line program which uses GRIDEngine or LSF cluster environments to run analysis pipelines.
    • Routine analyses are very quick to run, for example: cf --genome GRCh37 fastq_bowtie *fq.gz
    • Pipelines use identical parameters, standardising analysis and making results more reproducible
    • Integrated parallelisation tools help prevent your cluster becoming overloaded
    • All commands and output are logged to files for future reference
    • Intuitive commands and a comprehensive manual make Cluster Flow easy to use
    • Works out of the box (almost - see the YouTube tutorial)


    How Cluster Flow differs from other pipeline tools:
    • Very lightweight and flexible
    • Pipelines and configurations can easily be generated on a project-specific basis if required
    • New modules and pipelines are very easy to write (see video tutorial)


    We have been using Cluster Flow on our GRIDEngine cluster for some months and it's working well. In fact, I think it's fair to say that most of our bioinformatics group use it on an almost daily basis now. There has been limited testing on LSF systems with the help of a friend at the EBI, where it seems to work OK.

    At the time of writing, Cluster Flow comes bundled with pipelines and modules to run the following programs:

    It comes with typical pipelines to process data using these modules, some with additional parameters (e.g. for miRNA alignment or RRBS methylation data).

    We've written these pipelines as we've needed them - Cluster Flow comes with an example module which you can use to help you write your own. If you do use Cluster Flow and write any new modules or pipelines, please let us know as we're keen to expand the number of available analyses that it can run.

    Cluster Flow is released with a GPL v3 licence and can be downloaded from the Babraham Bioinformatics website: http://www.bioinformatics.babraham.a...s/clusterflow/

  • #2
    Hi all,

    I've just released version 0.2 of Cluster Flow. The main update is that it now supports SLURM clusters, plus it's much easier to customise the job submission commands to be tailored to your environment.

    Cluster Flow now has its own website for documentation: http://ewels.github.io/clusterflow/

    It's now hosted on GitHub - you can download v0.2 from the tagged releases page.

    Cheers,

    Phil


    • #3
      Version 0.3 of Cluster Flow has just been pushed live.

      This one has been brewing for a few months now and is a big update. The main highlights:
      • Report log files are now handled in a clever way to keep their order consistent, even when jobs are running in parallel.
      • E-mails are fancier and flag any errors or warnings, plus they can be given custom text strings to search for in the logs and highlight or flag as warnings.
      • Environment module loading has been tidied up and now needs less configuration and works more robustly. Environment modules can now be given aliases for better compatibility and version specification.
      • Cluster compatibility has been developed heavily and now allows almost complete configuration of the job submission commands via the configuration file.
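
For anyone wondering what the custom search strings amount to, here is a rough Python sketch of the general idea: scan a log for user-supplied strings and collect the lines that should be flagged. This is not Cluster Flow's own code. The function name `flag_log_lines`, the case-insensitive matching and the example log lines are all made up for illustration, so check the documentation for how the real option behaves.

```python
import re
from typing import Iterable, List, Tuple


def flag_log_lines(
    log_lines: Iterable[str], patterns: Iterable[str]
) -> List[Tuple[int, str, str]]:
    """Return (line_number, pattern, line) for each log line containing a pattern.

    Hypothetical helper: matching is a case-insensitive substring search,
    which is an assumption and not necessarily what Cluster Flow does.
    """
    compiled = [(p, re.compile(re.escape(p), re.IGNORECASE)) for p in patterns]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for pattern, regex in compiled:
            if regex.search(line):
                hits.append((number, pattern, line.rstrip("\n")))
                break  # report each line once, under the first matching pattern
    return hits


# Invented log content, purely for demonstration.
example_log = [
    "Module bowtie started\n",
    "Warning: 12 reads skipped\n",
    "Alignment finished\n",
    "ERROR: index not found\n",
]

flags = flag_log_lines(example_log, ["warning", "index not found"])
for number, pattern, line in flags:
    print(f"line {number} matched '{pattern}': {line}")
```

The flagged lines could then be highlighted in a summary e-mail; how the e-mails are actually built is not shown here.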


      You can download v0.3 of Cluster Flow here: https://github.com/ewels/clusterflow/releases/tag/v0.3

      Documentation and new demonstrations can be seen on the docs homepage: http://ewels.github.io/clusterflow/

      Much of this development has been the result of me moving and wanting to run Cluster Flow on a different cluster. I'd like to thank those who have helped out with testing and development, notably the chaps back at Babraham who have had to put up with all of my buggy pre-releases.

      Phil
