SEQanswers

Old 01-25-2014, 06:51 AM   #1
berthubert
Junior Member
 
Location: Nootdorp, The Netherlands

Join Date: Jan 2014
Posts: 4
Surprisingly hard: Going from Genbank Accession number to Genome Name

Hi everybody,

I wrote an automated FASTQ-based 16S rRNA searcher: you give it your FASTQ files and it tells you which 16S sequence matches them best. Although most people should know which genome they sequenced, I enjoy having my computer tell me what I did ;-) It may also help you spot contaminants. The code is at https://github.com/beaumontlab/antonie

However, the Greengenes database (at http://greengenes.secondgenome.com/downloads ) gives me only a GenBank accession number, like this:
Best current guess: Genbank GU198115.1

But I'd like to show my user "Pseudomonas fluorescens strain LMG 7207 16S ribosomal RNA gene, partial sequence."

I have found several e-utils queries that work, like http://eutils.ncbi.nlm.nih.gov/entre...ta&retmode=xml

But these often deliver the entire genome, which I really don't need! Is there a way to send a limited query to only get TSeq_defline or TSeq_orgname?

Or alternatively, is there a database of accession numbers/names that I can download somewhere?

Thanks!
Old 01-25-2014, 08:19 AM   #2
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

gi2taxid
Code:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=???
taxid2data
Code:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=???
You'll probably need to convert those accessions to GIs first.
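For readers who want to script the two steps above, here is a minimal Python sketch that only builds the two query URLs. The helper names `elink_taxonomy_url` and `efetch_taxonomy_url` are made up for illustration, and the `https` scheme is an assumption based on NCBI's later move to HTTPS (the URLs in this post use `http`). The query parameters are the ones shown above, with the `???` placeholder replaced by the ID you pass in.

```python
from urllib.parse import urlencode

# Assumption: HTTPS base URL; the post itself shows the older http:// form.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_taxonomy_url(gi: str) -> str:
    """Build the 'gi2taxid' URL: link a nucleotide record to the taxonomy db."""
    query = urlencode({"dbfrom": "nucleotide", "db": "taxonomy", "id": gi})
    return f"{EUTILS_BASE}/elink.fcgi?{query}"


def efetch_taxonomy_url(taxid: str) -> str:
    """Build the 'taxid2data' URL: fetch the taxonomy record for a taxid."""
    query = urlencode({"db": "taxonomy", "id": taxid})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"
```

Calling `elink_taxonomy_url` with a GI gives the first URL; the taxid found in its response is then passed to `efetch_taxonomy_url`. Parsing the responses is left out here because it depends on which fields you need.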
__________________
savetherhino.org
Old 01-25-2014, 09:27 AM   #3
berthubert

Hi,

Thanks, this works (once I've figured out the GI!). This URL appears to deliver the GI based on the accession number: http://eutils.ncbi.nlm.nih.gov/entre...somedomain.com

And it also delivers a human-friendly name!

EDIT: oops, for some accession numbers (like AM181176.4) it still delivers far too much data ;-(

Thanks!

Last edited by berthubert; 01-25-2014 at 09:50 AM. Reason: oops, some urls still too much data
Old 01-25-2014, 09:48 AM   #4
berthubert

Output now looks like this, but for *some* accession numbers, the URL still returns a huge amount of data ;-(

$ ./16ssearcher gg_13_5.fasta P1-1-35_S5_L001_R*fastq
-> Best current guess: 1111855 (2024) -> Genbank HQ911364.1
ORGANISM Pseudomonas fluorescens
2 potentials out of 240 candidates -> Best current guess: 1111132 (3456) -> Genbank GU437272.1
ORGANISM uncultured bacterium
10 potentials out of 2390 candidates -> Best current guess: 1105115 (3457) -> Genbank JF262574.1
ORGANISM Pseudomonas sp. UYSO19
158 potentials out of 186282 candidates -> Best current guess: 790134 (3652) -> Genbank HM190225.1
ORGANISM Pseudomonas marginalis pv. marginalis
363 potentials out of 264000 candidates -> Best current guess: 589242 (3946) -> Genbank GU198113.1
ORGANISM Pseudomonas fluorescens
364 potentials out of 264069 candidates -> Best current guess: 588382 (4362) -> Genbank GU198112.1
ORGANISM Pseudomonas fluorescens
368 potentials out of 267000 candidates -> Best current guess: 585665 (4362) -> Genbank GU198115.1
ORGANISM Pseudomonas fluorescens
2353 potentials out of 683524 candidates -> Best current guess: 16810 (4644) -> Genbank AF336349.1
ORGANISM Pseudomonas fluorescens
3049 potentials out of 999000 candidates -> Best current guess: 3860764 (4980) -> Genbank NC_012660.1
ORGANISM Pseudomonas fluorescens SBW25
3431 potentials out of 1166308 candidates -> Best current guess: 4408488 (4980) -> Genbank AM181176.4
ORGANISM Pseudomonas fluorescens SBW25


Forging on...
Old 01-25-2014, 11:17 AM   #5
berthubert

And we have a winner, thanks to John Eargle on #bioinformatics:
http://eutils.ncbi.nlm.nih.gov/entre...um&id=AE000520

Where AE000520 is the Accession Number. Thanks!
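The URL above is truncated in this copy of the thread, so its exact endpoint and parameters cannot be recovered. As a rough illustration of the same idea (asking for a short summary by accession number instead of the full record), here is a Python sketch that assumes the ESummary endpoint with `db=nuccore` and `retmode=json`. That choice may differ from the URL in the post. The function names are hypothetical, and the JSON layout (a `result` object holding a `uids` list plus one entry per uid with a `title` field) is an assumption about the response shape that should be checked against a live reply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: ESummary over HTTPS; not necessarily the endpoint used in the post.
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(accession: str) -> str:
    """Build a summary-only query for one accession number."""
    query = urlencode({"db": "nuccore", "id": accession, "retmode": "json"})
    return f"{ESUMMARY}?{query}"


def extract_titles(payload: dict) -> dict:
    """Map each uid in an ESummary-style JSON payload to its title.

    Assumed layout: payload["result"]["uids"] lists the uids, and
    payload["result"][uid]["title"] holds the human-readable name.
    """
    result = payload.get("result", {})
    return {
        uid: result[uid].get("title", "")
        for uid in result.get("uids", [])
        if uid in result
    }


def fetch_titles(accession: str, timeout: float = 30.0) -> dict:
    """Download the summary for one accession and return its titles."""
    with urlopen(esummary_url(accession), timeout=timeout) as response:
        return extract_titles(json.load(response))
```

Calling `fetch_titles("AE000520")` performs one small request. Since the summary does not include the sequence itself, this should avoid the "entire genome" problem for large records such as AM181176.4.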

Last edited by berthubert; 01-25-2014 at 11:52 AM. Reason: typo

Tags
16s rrna, genbank
