
**roliwilhelm** · 11-13-2015, 02:26 PM

Hi @tallphil,

Labrador looks like a useful tool and I'll give it a try for managing datasets for a collaboration I am starting. However, I'm interested in getting your advice, given your expertise with the backend of NCBI. I've spent the past hour navigating their search tools to pull datasets based on their geographic location (polar), sample type (amplicon) and environment (marine). I have not found a satisfactory method and I wonder if I'm missing something. Do you have any advice for how to comprehensively and efficiently download data based on the aforementioned criteria? I thought it would be more straightforward than what I've experienced thus far.

Thanks,
Roli

**ewels** · 11-14-2015, 02:51 AM

Hi Roli,

I'm glad you like the look of Labrador - it'd be good to hear how you get on. Note that it's a bit different to standard LIMS systems as it's deliberately public to all users. But if your installation is only visible to collaborators then that shouldn't be a problem.

As for the NCBI - I'm afraid I don't have an easy solution for you. The inner workings of their search tools are a dark and mysterious place. The API is powerful but not very intuitive (and has fun gotchas such as HTML-encoded XML as flat strings inside other XML tags, yay!).
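To illustrate the nested-XML gotcha, here is a minimal Python sketch. The payload below is made up for illustration (the `Item`/`ExpXml` names and the record contents are assumptions, not a captured NCBI response). The idea is that the outer parser decodes the escaped entities into a plain string, which then has to be parsed a second time, wrapped in a synthetic root in case the fragment has several top-level elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: the inner record is XML that was HTML-escaped
# and stored as flat text inside an outer tag.
OUTER = (
    "<Result><Item Name='ExpXml'>"
    "&lt;Summary&gt;&lt;Title&gt;Arctic 16S amplicons&lt;/Title&gt;&lt;/Summary&gt;"
    "&lt;Organism name='marine metagenome'/&gt;"
    "</Item></Result>"
)


def parse_embedded_xml(outer_xml, item_name):
    """Return the escaped XML inside the named Item as a parsed element."""
    root = ET.fromstring(outer_xml)
    item = root.find(f".//Item[@Name='{item_name}']")
    if item is None or item.text is None:
        return None
    # item.text is now an unescaped XML *string*, not a tree. It may hold
    # several sibling elements, so wrap it in a synthetic root before parsing.
    return ET.fromstring("<root>" + item.text + "</root>")


inner = parse_embedded_xml(OUTER, "ExpXml")
print(inner.findtext("Summary/Title"))
print(inner.find("Organism").get("name"))
```

The second parse is the step that is easy to miss: without it, the metadata is only reachable with string searches.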

Anyway, I've also spent lots of hours searching for data like you have, and my take home is that there isn't really a quick way to do it. Typically I start by searching the GEO which usually has the best metadata. Then I move to the SRA as it has some datasets which aren't described in the GEO. This becomes rapidly frustrating, but on a good day you might be lucky. Probably my favourite way to find datasets is to use Google / PubMed to find relevant papers, and then backtrack from each paper to track down its data. Review papers can be a goldmine for these.

Labrador uses the NCBI in the other direction - a user supplies a database accession, then it uses the NCBI API tools to pull down what data it can find. Pulling data like this is far easier than searching for unknown data.
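As a rough idea of what the accession-first direction looks like, here is a minimal Python sketch that builds an E-utilities `esearch` request for a known accession. This is not Labrador's actual code; the function name and the accession `SRP000001` are placeholders chosen for illustration. The sketch only builds the URL and leaves the HTTP request to the caller.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(accession, db="sra"):
    """Build an esearch URL that looks up one accession in one database."""
    params = {
        "db": db,            # Entrez database to search
        "term": accession,   # the accession is used directly as the query
        "retmode": "json",   # ask for JSON instead of the default XML
    }
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# Placeholder accession, for illustration only.
print(esearch_url("SRP000001"))
```

The search returns internal record IDs, which can then be passed to `esummary.fcgi` or `efetch.fcgi` to pull down the metadata for that record.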

Sorry that I don't have a better solution - good luck!

Phil

**curator** · 01-05-2016, 08:14 AM

Hi Roli,

It seems you would like to search and download polar marine amplicon sequence data available in the public nucleotide sequence databases.

One option would be to use the Advanced search functionality of the EBI ENA browser. Briefly:

1. Go to the ENA Browser:

http://www.ebi.ac.uk/ena/data/warehouse/search

2. Select the 'Environmental' domain

3. Use the 'Geographical location' option, where you can specify a bounding box or a radius, or directly enter the latitude and longitude of the southwest and northeast points. There are plenty of other options to narrow your search further. When you are happy with your search query, click on the 'Search' button.

4. The search results will provide a list of sample accession numbers matching the given specification. By clicking on a sample accession you will be directed to the page with the associated sequence data.
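If you already have sample metadata downloaded, the same southwest/northeast bounding-box logic from step 3 can be applied locally. Below is a minimal Python sketch; it does not call the ENA service, and the sample records and field names are invented for illustration. It assumes coordinates in decimal degrees and treats a box whose western edge is numerically larger than its eastern edge as crossing the 180° meridian.

```python
def in_bounding_box(lat, lon, sw, ne):
    """Return True if (lat, lon) lies inside the box given by its
    southwest and northeast corners, each a (lat, lon) tuple."""
    sw_lat, sw_lon = sw
    ne_lat, ne_lon = ne
    if not (sw_lat <= lat <= ne_lat):
        return False
    if sw_lon <= ne_lon:
        return sw_lon <= lon <= ne_lon
    # Box crosses the 180 degree meridian, e.g. from 170E to 170W.
    return lon >= sw_lon or lon <= ne_lon


# Hypothetical sample records, for illustration only.
samples = [
    {"accession": "sample_A", "lat": 78.2, "lon": 15.6},
    {"accession": "sample_B", "lat": 45.0, "lon": -63.0},
    {"accession": "sample_C", "lat": -70.5, "lon": 175.0},
]

# Everything north of the Arctic Circle (roughly 66.5 degrees N).
arctic_sw, arctic_ne = (66.5, -180.0), (90.0, 180.0)
arctic = [s["accession"] for s in samples
          if in_bounding_box(s["lat"], s["lon"], arctic_sw, arctic_ne)]
print(arctic)
```

A second box with southwest corner (-90.0, -180.0) and northeast corner (-66.5, 180.0) would cover the Antarctic side in the same way.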

Further details on how to programmatically retrieve data are available here:

How to Access ENA Programmatically — ENA Training Modules 1 documentation

http://www.ebi.ac.uk/ena/browse/data-retrieval-rest

More info on downloading read data is here:

How to Download Data Files — ENA Training Modules 1 documentation

http://www.ebi.ac.uk/ena/browse/read-download

Hope this helps.

Petra

Labrador: a web based tool to manage and automate the processing of sequence datasets

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News