Old 01-17-2014, 02:56 AM   #1
Phil Ewels
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 32
Labrador: a web-based tool to manage and automate the processing of sequence datasets

We've just released the first version of a new tool called Labrador: http://www.bioinformatics.babraham.a...ects/labrador/

What it's for
We found that a lot of people were asking us to download and process data from public repositories. Typically, we reprocess this data from raw reads in house. An increasing amount of our time was being spent digging out download URLs and running the same analyses on different datasets.

Labrador was written to streamline this process. It can track projects and notify users, it automates the retrieval of metadata from public sequence data repositories and it can generate scripts based on this data.

What it does
Labrador has two target groups: end-users (researchers) and bioinformaticians.

Researchers can use Labrador to:
  • Browse and search previously processed datasets
  • View processing and analysis reports in their web browser
  • Download data through their web browser
  • Request new datasets, with required information automatically retrieved from accession numbers
Bioinformaticians can use Labrador to:
  • Speed up retrieval of project information from repositories
  • Catalogue processing and analysis
  • Create automated analysis bash scripts
  • Customise templates for analysis script generation
Video tutorials
We've made three video tutorials to help you get started with Labrador.

How it works
Labrador is a web-based tool designed to run on a local intranet, written in PHP with a MySQL back end. Labrador communicates with the GEO, NCBI SRA, EBI ENA and the DDBJ to automatically retrieve metadata and accession numbers.

Bioinformaticians can manage the processing of datasets and use the retrieved metadata to generate processing bash scripts or download files. This is entirely customisable, allowing you to make Labrador fit in with your existing environment. Labrador comes with a number of common processing pipelines. Once a dataset has been processed, the reports associated with its project can be viewed in-page in the browser.
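
As a rough illustration of template-based script generation (not Labrador's actual code, which is PHP), the Python sketch below fills a bash template once per dataset. The template text, the field names `accession` and `url`, and the `wget` step are all assumptions made for the example.

```python
from string import Template

# Hypothetical per-dataset template; a real pipeline would add processing steps.
DOWNLOAD_TEMPLATE = Template(
    "# $accession\n"
    "wget -O ${accession}.sra '$url'\n"
)


def build_script(datasets):
    """Return a bash script that downloads every dataset in `datasets`.

    `datasets` is a list of dicts with 'accession' and 'url' keys
    (field names assumed for this sketch).
    """
    lines = ["#!/bin/bash", "set -euo pipefail", ""]
    for record in datasets:
        lines.append(DOWNLOAD_TEMPLATE.substitute(record))
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        {"accession": "SRR0000001", "url": "https://example.org/SRR0000001.sra"},
    ]
    print(build_script(example))
```

Keeping the template separate from the metadata is what makes this kind of generation customisable: swapping the template changes the pipeline without touching the records.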

Where to get it
Labrador is free software under the GPLv3. It is written in PHP with a MySQL back end, and we run it on an Apache server. You can download Labrador here.

We have been using Labrador at the Babraham Institute for several months and have found it helpful. We're keen to hear any feedback about bugs or suggestions for improvements.

Hopefully Labrador can be a useful tool for some others in these forums!


Last edited by ewels; 01-17-2014 at 03:12 AM.
Old 11-13-2015, 01:26 PM   #2
Location: Ithaca, NY

Join Date: Jun 2012
Posts: 38

Hi @tallphil,

Labrador looks like a useful tool and I'll give it a try for managing datasets for a collaboration I am starting. However, I'm interested in getting your advice, given your expertise with the backend of NCBI. I've spent the past hour navigating their search tools to pull datasets based on geographic location (polar), sample type (amplicon) and environment (marine). I have not found a satisfactory method and I wonder if I'm missing something. Do you have any advice for how to comprehensively and efficiently download data based on the aforementioned criteria? I thought it would be more straightforward than what I've experienced thus far.

roliwilhelm
Old 11-14-2015, 01:51 AM   #3
Phil Ewels
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 32

Hi Roli,

I'm glad you like the look of Labrador - it'd be good to hear how you get on. Note that it's a bit different to standard LIMS systems as it's deliberately public to all users. But if your installation is only visible to collaborators then that shouldn't be a problem.

As for the NCBI - I'm afraid I don't have an easy solution for you. The inner workings of their search tools are a dark and mysterious place. The API is powerful but not very intuitive (and has fun gotchas such as HTML-encoded XML as flat strings inside other XML tags, yay!).
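
To illustrate that gotcha: the escaped payload has to be parsed a second time after the outer document has been read. The Python sketch below uses a made-up response (the tag and attribute names are placeholders, not a verbatim NCBI record) and wraps the inner fragment in a dummy root, on the assumption that such fragments can have several top-level elements.

```python
import xml.etree.ElementTree as ET

# Made-up response in the style described above: the inner XML arrives
# as an escaped string inside an outer element.
SAMPLE = (
    '<DocSum><Item Name="ExpXml">'
    '&lt;Summary&gt;&lt;Title&gt;Arctic 16S amplicons&lt;/Title&gt;&lt;/Summary&gt;'
    '&lt;Organism name="marine metagenome"/&gt;'
    '</Item></DocSum>'
)


def parse_embedded_xml(outer_xml, item_name):
    """Parse the escaped XML fragment stored in the named <Item>."""
    outer = ET.fromstring(outer_xml)
    item = outer.find(f"./Item[@Name='{item_name}']")
    if item is None or item.text is None:
        raise ValueError(f"no text found for item {item_name!r}")
    # The first parse has already decoded the entities (&lt; became <),
    # so item.text is plain markup that simply needs parsing again.
    # The fragment may have several top-level elements, so add a root.
    return ET.fromstring(f"<root>{item.text}</root>")


if __name__ == "__main__":
    inner = parse_embedded_xml(SAMPLE, "ExpXml")
    print(inner.find("./Summary/Title").text)
```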

Anyway, I've also spent many hours searching for data like you have, and my take-home is that there isn't really a quick way to do it. Typically I start by searching the GEO, which usually has the best metadata. Then I move to the SRA, as it has some datasets which aren't described in the GEO. This rapidly becomes frustrating, but on a good day you might be lucky. Probably my favourite way to find datasets is to use Google / PubMed to find relevant papers, and then backtrack from each paper to track down its data. Review papers can be a goldmine for these.

Labrador uses the NCBI in the other direction - a user supplies a database accession, then it uses the NCBI API tools to pull down what data it can find. Pulling data like this is far easier than searching for unknown data.
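
A minimal sketch of that accession-first lookup, in Python rather than Labrador's PHP. It builds a request for the NCBI E-utilities `esearch` endpoint; the choice of the `sra` database and the example accession are assumptions, and none of this is taken from Labrador's source.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession, db="sra"):
    """Build an esearch URL that looks up one accession in `db`."""
    params = {"db": db, "term": accession, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def find_record_ids(accession, db="sra"):
    """Return the Entrez record IDs matching the accession (needs network)."""
    with urlopen(esearch_url(accession, db)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Example accession chosen only for illustration.
    print(esearch_url("SRP000001"))
```

The returned IDs can then be passed to the `esummary` or `efetch` endpoints to pull down the metadata itself.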

Sorry that I don't have a better solution - good luck!

Old 01-05-2016, 07:14 AM   #4
Junior Member
Location: UK

Join Date: Jan 2016
Posts: 1

Hi Roli,

It seems you would like to search and download polar marine amplicon sequence data available in the public nucleotide sequence databases.

One option would be to use the Advanced search functionality of the EBI ENA browser. Briefly:

1. Go to the webpage here:

2. Select the 'Environmental' domain

3. Use the 'Geographical location' option, where you can specify a bounding box or a radius, or directly enter the latitude and longitude of the southwest and northeast points. There are plenty of other options to refine your search further. When you are happy with your search query, click the 'Search' button.

4. The search results will provide a list of sample accession numbers matching the given specification. Clicking on a sample accession will take you to the page with the associated sequence data.
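
For anyone scripting the same idea on metadata they have already downloaded, a bounding box is just a pair of corner points. The Python sketch below filters sample records by southwest and northeast corners; the record layout (`accession`, `lat`, `lon`) is hypothetical and the code does not call ENA.

```python
def in_bounding_box(lat, lon, south_west, north_east):
    """True if (lat, lon) lies inside the box given by its SW and NE corners.

    Corners are (latitude, longitude) tuples in decimal degrees. A box whose
    western edge has a larger longitude than its eastern edge is treated as
    crossing the antimeridian.
    """
    sw_lat, sw_lon = south_west
    ne_lat, ne_lon = north_east
    if not sw_lat <= lat <= ne_lat:
        return False
    if sw_lon <= ne_lon:
        return sw_lon <= lon <= ne_lon
    return lon >= sw_lon or lon <= ne_lon


def filter_samples(samples, south_west, north_east):
    """Keep the sample dicts whose coordinates fall inside the box."""
    return [
        s for s in samples
        if in_bounding_box(s["lat"], s["lon"], south_west, north_east)
    ]


if __name__ == "__main__":
    samples = [
        {"accession": "SAMPLE_A", "lat": 78.2, "lon": 15.6},   # Arctic
        {"accession": "SAMPLE_B", "lat": 42.4, "lon": -76.5},  # temperate
    ]
    # Everything north of 66.5 degrees latitude, all longitudes.
    arctic = filter_samples(samples, (66.5, -180.0), (90.0, 180.0))
    print([s["accession"] for s in arctic])
```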

Further details on how to programmatically retrieve data are available here:

More info on downloading read data is here:

Hope this helps.

curator
