  • Labrador: a web based tool to manage and automate the processing of sequence datasets

    We've just released the first version of a new tool called Labrador: http://www.bioinformatics.babraham.a...ects/labrador/

    What it's for
    We found that many people were asking us to download and process data from public repositories. Typically, we reprocess this data from raw reads in house. An increasing amount of our time was being spent digging out download URLs and running the same analyses on different datasets.

    Labrador was written to streamline this process. It can track projects and notify users, it automates the retrieval of metadata from public sequence data repositories and it can generate scripts based on this data.

    What it does
    Labrador has two target groups: end-users (researchers) and bioinformaticians.

    Researchers can use Labrador to:
    • Browse and search previously processed datasets
    • View processing and analysis reports in their web browser
    • Download data through their web browser
    • Request new datasets, with required information automatically retrieved from accession numbers

    Bioinformaticians can use Labrador to:
    • Speed up retrieval of project information from repositories
    • Catalogue processing and analysis
    • Create automated analysis bash scripts
    • Customise templates for analysis script generation

    Video tutorials
    We've made three video tutorials to help you get started with Labrador.

    How it works
    Labrador is a web-based tool designed to run on a local intranet, written in PHP with a MySQL back end. Labrador communicates with the GEO, NCBI SRA, EBI ENA and DDBJ to automatically retrieve metadata and accession numbers.

    Bioinformaticians can manage the processing of datasets and use the retrieved metadata to generate processing bash scripts or download files. This is entirely customisable, allowing you to make Labrador fit in with your existing environment. Labrador comes with a number of common processing pipelines. Once a dataset has been processed, its reports are associated with the project and can be viewed in-page in the browser.
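
To illustrate the idea of generating a processing script from retrieved metadata, here is a minimal Python sketch. It is not Labrador's code (Labrador is written in PHP and its real templates are not shown in this thread); the template text, the `build_script` helper, the run accession and the download URL are all hypothetical.

```python
from string import Template

# Hypothetical template, standing in for a customisable script template.
SCRIPT_TEMPLATE = Template(
    "#!/bin/bash\n"
    "# Project: $project\n"
    "$commands\n"
)


def build_script(project, runs):
    """Render a bash download script.

    runs is a list of (accession, download_url) tuples, i.e. the kind of
    metadata that would be retrieved from a public repository.
    """
    commands = "\n".join(
        f"wget -nv -O {accession}.fastq.gz '{url}'" for accession, url in runs
    )
    return SCRIPT_TEMPLATE.substitute(project=project, commands=commands)


# Placeholder accession and URL, for illustration only.
script = build_script(
    "demo_project",
    [("RUN0001", "ftp://example.org/RUN0001.fastq.gz")],
)
print(script)
```

Keeping the template separate from the metadata is what makes this kind of generation customisable: swapping the template changes the pipeline without touching the retrieval step.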

    Where to get it
    Labrador is free software under the GPLv3. It is written in PHP and MySQL and we run it on an Apache system. You can download Labrador here.

    We have been using Labrador at the Babraham Institute for several months and have found it helpful. We're keen to hear any feedback about bugs or suggestions for improvements.

    Hopefully Labrador can be a useful tool for some others in these forums!

    Phil
    Last edited by ewels; 01-17-2014, 04:12 AM.

  • #2
    Hi @tallphil,

    Labrador looks like a useful tool and I'll give it a try for managing datasets for a collaboration I am starting. However, I'm interested in getting your advice, given your expertise with the backend of NCBI. I've spent the past hour navigating their search tools to pull datasets based on their geographic location (polar), sample type (amplicon) and environment (marine). I have not found a satisfactory method and I wonder if I'm missing something. Do you have any advice on how to comprehensively and efficiently download data based on these criteria? I thought it would be more straightforward than what I've experienced thus far.

    Thanks,
    Roli


    • #3
      Hi Roli,

      I'm glad you like the look of Labrador - it'd be good to hear how you get on. Note that it's a bit different to standard LIMS systems as it's deliberately public to all users. But if your installation is only visible to collaborators then that shouldn't be a problem.

      As for the NCBI - I'm afraid I don't have an easy solution for you. The inner workings of their search tools are a dark and mysterious place. The API is powerful but not very intuitive (and has fun gotchas such as HTML-encoded XML as flat strings inside other XML tags, yay!).
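
A minimal Python sketch of how that gotcha can be handled: parse the outer XML, then parse the text of the inner element a second time. The response below is made up for illustration and its element names are assumptions, not a verbatim NCBI response.

```python
import xml.etree.ElementTree as ET

# Made-up response: the inner XML is stored as an escaped flat string.
RESPONSE = (
    "<DocSum>"
    '<Item Name="ExpXml">'
    "&lt;Summary&gt;&lt;Title&gt;Example run&lt;/Title&gt;&lt;/Summary&gt;"
    "</Item>"
    "</DocSum>"
)


def parse_embedded_xml(outer_xml, item_name):
    """Return the XML fragment embedded as text inside a named Item element."""
    outer = ET.fromstring(outer_xml)
    item = outer.find(f"./Item[@Name='{item_name}']")
    if item is None or item.text is None:
        raise ValueError(f"no embedded XML found in item {item_name!r}")
    # The first parse has already turned &lt; and &gt; back into < and >,
    # so item.text is now plain XML. Wrap it in a root element in case the
    # fragment holds several top-level elements.
    return ET.fromstring(f"<root>{item.text}</root>")


inner = parse_embedded_xml(RESPONSE, "ExpXml")
title = inner.findtext("./Summary/Title")
print(title)
```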

      Anyway, I've also spent lots of hours searching for data like you have, and my take-home is that there isn't really a quick way to do it. Typically I start by searching the GEO, which usually has the best metadata. Then I move to the SRA, as it has some datasets which aren't described in the GEO. This rapidly becomes frustrating, but on a good day you might be lucky. Probably my favourite way to find datasets is to use Google / PubMed to find relevant papers, and then backtrack from each paper to track down its data. Review papers can be a goldmine for these.

      Labrador uses the NCBI in the other direction - a user supplies a database accession, then it uses the NCBI API tools to pull down what data it can find. Pulling data like this is far easier than searching for unknown data.
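
As a rough sketch of the accession-first direction, the snippet below builds an NCBI E-utilities `esearch` URL for a given accession. This is not Labrador's code; the `esearch_url` helper is hypothetical and the accession is only a placeholder. No request is sent here.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, accession):
    """Build an esearch URL that looks up an accession in an Entrez database."""
    params = {"db": db, "term": accession}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Placeholder accession, for illustration only.
url = esearch_url("sra", "SRP000001")
print(url)
```

The IDs returned by a search like this would then be passed to a follow-up call such as `esummary` or `efetch` to pull down the actual metadata.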

      Sorry that I don't have a better solution - good luck!

      Phil


      • #4
        Hi Roli,

        It seems you would like to search and download polar marine amplicon sequence data available in the public nucleotide sequence databases.

        One option would be to use the Advanced search functionality of the EBI ENA browser. Briefly:

        1. Go to the webpage here:


        2. Select the 'Environmental' domain

        3. Use the 'Geographical location' option, where you can specify a bounding box or a radius, or directly enter the latitude and longitude of the southwest and northeast points. There are plenty of other options to refine your search further. When you are happy with your search query, click the 'Search' button.

        4. The search results will list the sample accession numbers matching the given specification. Clicking on a sample accession takes you to the page with the associated sequence data.
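
For anyone who ends up with a table of sample coordinates and wants to apply the same kind of filter locally, here is a small Python sketch of a bounding-box test defined by southwest and northeast points. It is only an illustration of the idea, not how the ENA browser implements its search; the function name and the sample records are hypothetical.

```python
def in_bounding_box(lat, lon, south_west, north_east):
    """Return True if (lat, lon) lies inside the box given by two corners.

    south_west and north_east are (latitude, longitude) tuples in degrees.
    """
    sw_lat, sw_lon = south_west
    ne_lat, ne_lon = north_east
    if not (sw_lat <= lat <= ne_lat):
        return False
    if sw_lon <= ne_lon:
        return sw_lon <= lon <= ne_lon
    # The box crosses the antimeridian (e.g. from 170 to -170 degrees).
    return lon >= sw_lon or lon <= ne_lon


# Made-up sample records, for illustration only.
samples = [
    {"accession": "SAMPLE_A", "lat": 78.2, "lon": 15.6},
    {"accession": "SAMPLE_B", "lat": 51.5, "lon": -0.1},
]

# Everything north of 66.5 degrees, at any longitude.
arctic = [
    s["accession"]
    for s in samples
    if in_bounding_box(s["lat"], s["lon"], (66.5, -180.0), (90.0, 180.0))
]
print(arctic)
```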

        Further details on how to programmatically retrieve data are available here:


        More info on downloading read data is here:


        Hope this helps.

        Petra
