SEQanswers

03-15-2014, 11:07 PM   #1
jpearl01
Member
 
Location: Philadelphia PA

Join Date: Dec 2010
Posts: 17
Obtaining nucleotide gene sequence of bacteria using eutils

Hi All,

I'm trying to write a quick program to download the DNA sequences of a bunch of bacterial genes. I don't have IDs for the genes, just their protein names (like lacZ or what-have-you). So I use esearch to look for, say, the bacterial lacZ gene in the gene database:

http://eutils.ncbi.nlm.nih.gov/entre...p+AND+bacteria[filter]

Then I grab the first ID in the list that is returned and convert it from a gene ID to nucleotide IDs using elink:

http://eutils.ncbi.nlm.nih.gov/entre...tide&id=945006

This gives back a list of different IDs. Fetching most of them with efetch returns the entire genomic sequence of the organism the gene was found in. *Sometimes* one of the IDs gives back the actual gene nucleotide sequence, but not always. For example, the first two IDs from the above elink result give the whole genomic sequence:

http://eutils.ncbi.nlm.nih.gov/entre...&rettype=fasta

The third id gives the gene sequence:

http://eutils.ncbi.nlm.nih.gov/entre...&rettype=fasta
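For anyone following along, the three requests described above can be sketched roughly as below. This only builds the URLs and does not contact NCBI. The function names and the example search term are my own placeholders (the exact term in my link above is cut off), and the `nucleotide` database name is taken from the elink link.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="gene"):
    """URL that searches `db` for `term` and returns a list of matching IDs."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(gene_id, dbfrom="gene", db="nucleotide"):
    """URL that asks for records in `db` linked to one record in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": gene_id}
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)


def efetch_url(seq_id, db="nucleotide", rettype="fasta"):
    """URL that downloads one record from `db` in the requested format."""
    params = {"db": db, "id": seq_id, "rettype": rettype}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


# Hypothetical search term, for illustration only.
search = esearch_url("lacZ AND bacteria[filter]")
# 945006 is the gene ID from the elink link in this post.
link = elink_url("945006")
```

The ID passed to `efetch_url` would be whichever nucleotide ID comes back from the elink step, which is exactly where the whole-genome versus gene ambiguity shows up.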

So, what gives? Is there a way to tell which ones are whole genome sequences, and which are gene sequences? Maybe an elink parameter?

~josh
03-17-2014, 06:51 AM   #2
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329

Try downloading the sequences in GenBank format and parsing the descriptions.
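A minimal sketch of that idea, assuming the record text has already been fetched: it reads the sequence length from the LOCUS line and the description from the DEFINITION line, then guesses whether the record is a whole genome. The two example records are made up for illustration, and the "complete genome" phrase and the 20,000 bp cutoff are assumptions you would want to tune.

```python
import re


def summarize_genbank_header(record_text):
    """Return (length_in_bp, definition) read from a GenBank record's header."""
    length = None
    definition = []
    in_definition = False
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            break  # the header fields we need all come before the feature table
        if line.startswith("LOCUS"):
            match = re.search(r"(\d+) bp", line)
            if match:
                length = int(match.group(1))
        elif line.startswith("DEFINITION"):
            definition.append(line[len("DEFINITION"):].strip())
            in_definition = True
        elif in_definition and line.startswith(" "):
            definition.append(line.strip())  # continuation of a wrapped DEFINITION
        else:
            in_definition = False
    return length, " ".join(definition)


def looks_like_whole_genome(length, definition, max_gene_bp=20000):
    """Heuristic only: flag records described as genomes or far longer than a gene."""
    if "complete genome" in definition.lower():
        return True
    return length is not None and length > max_gene_bp


# Made-up records in GenBank flat-file layout, not real NCBI entries.
GENOME_EXAMPLE = "\n".join([
    "LOCUS       EXAMPLE_1            4600000 bp    DNA     circular BCT 01-JAN-2014",
    "DEFINITION  Example bacterium strain X chromosome, complete genome.",
    "ACCESSION   EXAMPLE_1",
])
GENE_EXAMPLE = "\n".join([
    "LOCUS       EXAMPLE_2               3075 bp    DNA     linear   BCT 01-JAN-2014",
    "DEFINITION  Example bacterium lacZ gene for beta-galactosidase, complete",
    "            cds.",
    "ACCESSION   EXAMPLE_2",
])

genome_flag = looks_like_whole_genome(*summarize_genbank_header(GENOME_EXAMPLE))
gene_flag = looks_like_whole_genome(*summarize_genbank_header(GENE_EXAMPLE))
```

Checking the length as well as the description helps with draft genomes or plasmids whose DEFINITION line does not say "complete genome".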
03-17-2014, 11:05 AM   #3
jpearl01
Member
 
Location: Philadelphia PA

Join Date: Dec 2010
Posts: 17

Well, that is a possibility. However, I would really like it if I didn't have any intermediary steps. What is frustrating is that if I do the same search, but then link to the *protein* database, I get the correct AA sequence almost every time. I'm not sure why I can do that for the protein sequence but not for the nucleotide. Really, I'm wondering if I'm doing something wrong.
03-17-2014, 01:04 PM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Note that NCBI has now released Entrez Direct, which provides pipeable command-line tools for downloading via Entrez. There is likely a specific query that can be fed into Entrez Direct for the bacterial genomes.
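As a rough illustration of what "pipeable" means here, the sketch below composes an Entrez Direct pipeline as a shell command string from Python. It assumes the Entrez Direct tools `esearch`, `elink`, and `efetch` are installed and on the PATH. The function name and the query are placeholders, and this is not the specific bacterial-genome query mentioned above. Nothing is executed.

```python
import shlex


def edirect_pipeline(query, source_db="gene", target_db="nuccore", fmt="fasta"):
    """Compose (but do not run) an esearch | elink | efetch shell pipeline."""
    stages = [
        ["esearch", "-db", source_db, "-query", query],  # find records by name
        ["elink", "-target", target_db],                 # follow links to sequences
        ["efetch", "-format", fmt],                      # download in the given format
    ]
    # Quote every argument so the query survives the shell intact.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in stage) for stage in stages
    )


# Hypothetical query, for illustration only.
command = edirect_pipeline("lacZ AND bacteria[filter]")
```

Because this follows the same gene-to-nucleotide links as the URL approach, it would probably return the same mix of genome and gene records unless the query or link step is narrowed.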