

Old 03-15-2014, 11:07 PM   #1
Location: Philadelphia PA

Join Date: Dec 2010
Posts: 17
Obtaining nucleotide gene sequence of bacteria using eutils

Hi All,

I'm trying to write a quick program to download the DNA sequences of a bunch of bacterial genes. I don't have IDs for the genes, just their protein names (like lacZ or what-have-you). So I use esearch to look for, say, the bacterial lacZ gene in the gene database:[filter]

Then I grab the first ID in the returned list and convert it from a gene ID to a nucleotide ID using elink:

This gives back a list of different IDs. Most of them (when passed to efetch to get the sequence) return the entire genomic sequence of the organism the gene was found in. *Sometimes* one of the IDs returns the actual gene's nucleotide sequence, but not always. For example, the first two IDs from the above elink result give the whole genomic sequence:

The third ID gives the gene sequence:

So, what gives? Is there a way to tell which ones are whole genome sequences, and which are gene sequences? Maybe an elink parameter?
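
For readers following along, the three calls described above (esearch, then elink, then efetch) can be sketched in Python roughly as follows. This is only an illustration, not the exact requests from the post: the helper names (`esearch_url`, `elink_url`, `efetch_url`, `first_id`) are made up for this example, and the search term shown in the usage note is an assumption, since the original query was not preserved.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public NCBI E-utilities base URL.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="gene"):
    """URL that searches `db` for `term`; the response is XML with an IdList."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(gene_id, dbfrom="gene", db="nuccore"):
    """URL that asks for records in `db` linked to a single Gene ID."""
    params = {"dbfrom": dbfrom, "db": db, "id": gene_id}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def efetch_url(seq_id, db="nuccore", rettype="fasta"):
    """URL that downloads one record as plain text (FASTA by default)."""
    params = {"db": db, "id": seq_id, "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def first_id(esearch_xml):
    """Return the first <Id> in an esearch XML response, or None if empty."""
    node = ET.fromstring(esearch_xml).find("./IdList/Id")
    return node.text if node is not None else None
```

A call such as `esearch_url("lacZ[Gene Name] AND bacteria[Organism]")` (an assumed query, adjust as needed) produces the search URL; the response body can be fetched with `urllib.request.urlopen` and passed to `first_id`, whose result then goes to `elink_url`. Note that nothing in this chain tells you whether a linked nucleotide ID is a whole genome or a single gene, which is exactly the problem described above.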

jpearl01 is offline   Reply With Quote
Old 03-17-2014, 06:51 AM   #2
Senior Member
Location: Budapest

Join Date: Mar 2010
Posts: 329

Try downloading the sequences in GenBank format and parsing the descriptions.
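
A minimal sketch of that idea, assuming the record was fetched as GenBank flat-file text (for example with efetch and `rettype=gb`): read the sequence length from the LOCUS line and the text of the DEFINITION line, then apply a simple heuristic. The function names and the 100,000 bp cutoff are assumptions chosen for illustration, not anything NCBI defines.

```python
import re

# Assumed cutoff: records longer than this are treated as genome-scale.
GENOME_SIZE_CUTOFF = 100_000


def summarize_genbank(record_text):
    """Extract locus name, length in bp, and DEFINITION text from one record."""
    locus = re.search(r"^LOCUS\s+(\S+)\s+(\d+)\s+bp", record_text, re.MULTILINE)
    definition_lines = []
    in_definition = False
    for line in record_text.splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            definition_lines.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" "):
            # Continuation lines of a wrapped DEFINITION are indented.
            definition_lines.append(line.strip())
        elif in_definition:
            break  # next keyword (e.g. ACCESSION) ends the definition
    return {
        "name": locus.group(1) if locus else None,
        "length": int(locus.group(2)) if locus else None,
        "definition": " ".join(definition_lines),
    }


def looks_like_whole_genome(summary):
    """Heuristic only: flag by definition wording or by sequence length."""
    text = summary["definition"].lower()
    length = summary["length"] or 0
    return "complete genome" in text or length > GENOME_SIZE_CUTOFF
```

Running `looks_like_whole_genome(summarize_genbank(text))` on each ID returned by elink would let the program skip genome-scale records and keep the short, gene-sized ones. The wording of DEFINITION lines varies between submitters, so the length check is there as a fallback.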
TiborNagy is offline   Reply With Quote
Old 03-17-2014, 11:05 AM   #3
Location: Philadelphia PA

Join Date: Dec 2010
Posts: 17

Well, that is a possibility. However, I would really like it if I didn't have any intermediary steps. What is frustrating is that if I do the same search, but then link to the *protein* database, I get the correct AA sequence almost every time. I'm not sure why I can do that for the protein sequence but not for the nucleotide. Really, I'm wondering if I'm doing something wrong.
jpearl01 is offline   Reply With Quote
Old 03-17-2014, 01:04 PM   #4
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Note that NCBI has now released Entrez Direct, which provides pipeable command-line functionality for downloading via Entrez. There is likely a specific query that can be fed into Entrez Direct for the bacterial genomes.
gringer is offline   Reply With Quote
