#1
Member
Location: Philadelphia PA Join Date: Dec 2010
Posts: 17
Hi All,
I'm trying to write a quick program to download the DNA for a bunch of bacterial genes. I don't have IDs for the genes, just their protein names (like lacZ or what-have-you). So I use esearch to look for, say, the bacterial lacZ gene in the gene database: http://eutils.ncbi.nlm.nih.gov/entre...p+AND+bacteria[filter]

Then I grab the first ID in the returned list and convert it from a gene ID to a nucleotide ID using elink: http://eutils.ncbi.nlm.nih.gov/entre...tide&id=945006

This gives back a list of different IDs. When I pass them to efetch to get the sequence, most of them return the entire genomic sequence of the organism the gene was found in. *Sometimes* one of the IDs returns the actual gene nucleotide sequence, but not always. For example, the first two IDs from the above elink result give the whole genomic sequence: http://eutils.ncbi.nlm.nih.gov/entre...&rettype=fasta

The third ID gives the gene sequence: http://eutils.ncbi.nlm.nih.gov/entre...&rettype=fasta

So, what gives? Is there a way to tell which ones are whole genome sequences and which are gene sequences? Maybe an elink parameter?

~josh
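For anyone following along, the three-step chain described above (esearch, then elink, then efetch) can be sketched as plain URL construction. This is only a rough sketch, not the poster's actual program: the helper names are made up for illustration, the search term in the usage note is a placeholder (the real URLs in the post are truncated), and `nuccore` as the target database is an assumption.

```python
from urllib.parse import urlencode

# Base path of the NCBI E-utilities endpoints.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build a URL for one E-utility (hypothetical helper, not an NCBI API)."""
    return f"{EUTILS}/{tool}.fcgi?{urlencode(params)}"


def esearch_url(term, db="gene"):
    # Step 1: search the gene database for a term.
    return eutils_url("esearch", db=db, term=term)


def elink_url(gene_id, dbfrom="gene", db="nuccore"):
    # Step 2: map a gene ID to linked nucleotide records.
    # "nuccore" is an assumption; the original URL is truncated.
    return eutils_url("elink", dbfrom=dbfrom, db=db, id=gene_id)


def efetch_url(seq_id, db="nuccore", rettype="fasta"):
    # Step 3: fetch one linked record as FASTA text.
    return eutils_url("efetch", db=db, id=seq_id, rettype=rettype, retmode="text")
```

For example, `elink_url("945006")` builds the link request for the gene ID mentioned in the post, and `esearch_url("lacZ[gene] AND bacteria[filter]")` builds a search with an illustrative term. Nothing here touches the network; the URLs still have to be fetched and the XML responses parsed.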
#2
Senior Member
Location: Budapest Join Date: Mar 2010
Posts: 329
Try downloading the sequences in GenBank format and parsing the descriptions.
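A minimal sketch of that idea, assuming the record has already been fetched as GenBank flat-file text: pull the length from the LOCUS line and the text from the DEFINITION line, then apply a crude test. The function names, the keyword list and the 100,000 bp cutoff are all assumptions chosen for illustration, not anything NCBI defines, and only the first record in the text is read.

```python
import re


def parse_genbank_header(text):
    """Return (definition, length_in_bp) for the first record in `text`."""
    length = None
    parts = []
    in_definition = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The LOCUS line carries the sequence length as "<digits> bp".
            match = re.search(r"(\d+)\s+bp", line)
            if match:
                length = int(match.group(1))
        elif line.startswith("DEFINITION"):
            parts.append(line[len("DEFINITION"):].strip())
            in_definition = True
        elif in_definition:
            if line.startswith(" "):
                # Indented lines continue a wrapped DEFINITION.
                parts.append(line.strip())
            else:
                break  # next keyword reached; header fields are done
    return " ".join(parts), length


def looks_like_whole_genome(definition, length, max_gene_bp=100_000):
    """Heuristic only: keyword match first, then a length cutoff (assumed)."""
    lowered = definition.lower()
    if "complete genome" in lowered or "whole genome shotgun" in lowered:
        return True
    return length is not None and length > max_gene_bp
```

The records that fail the check are the candidates for actual gene sequences. This still means an extra fetch per ID, so it does not remove the intermediate step; it only makes the filtering automatic.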
#3
Member
Location: Philadelphia PA Join Date: Dec 2010
Posts: 17
Well, that is a possibility. However, I would really like it if I didn't have any intermediary steps. What is frustrating is that if I do the same search, but then link to the *protein* database, I get the correct AA sequence almost every time. I'm not sure why I can do that for the protein sequence but not for the nucleotide. Really, I'm wondering if I'm doing something wrong.
#4
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 838
Note that NCBI has now released Entrez Direct, which provides pipeable command-line tools for downloading via Entrez. There is likely a specific query that can be fed into Entrez Direct for the bacterial genomes.