Hi All,
I'm trying to write a quick program to download the DNA for a bunch of genes from bacteria. I don't have IDs for the genes, just their protein names (like lacZ or what-have-you). So I use esearch to look for, say, the bacterial lacZ gene in the gene database:
http://eutils.ncbi.nlm.nih.gov/entre...p+AND+bacteria[filter]
Then I grab the first ID in the list that is returned, and convert it from a gene ID to nucleotide IDs using elink.
That gives back a list of different IDs. For most of them, efetch returns the entire genomic sequence of the organism the gene was found in. *Sometimes* one of the IDs gives back the actual gene nucleotide sequence, but not always. For example, the first two IDs from the above elink result give the whole genomic sequence:
The third ID gives the gene sequence:
So, what gives? Is there a way to tell which ones are whole genome sequences, and which are gene sequences? Maybe an elink parameter?
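In case it helps to see the three calls side by side, here is a rough sketch of how I build the request URLs. It only constructs the URLs and does not contact NCBI. The helper names, the example search term, and the ID 12345 are placeholders made up for illustration; the `db`, `dbfrom`, `term`, `id`, `rettype` and `retmode` parameters are the standard E-utilities ones.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities (HTTPS form of the host used above).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="gene"):
    """Step 1: search the gene database for a text term."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(gene_id, dbfrom="gene", db="nucleotide"):
    """Step 2: map a gene ID to linked nucleotide IDs."""
    params = {"dbfrom": dbfrom, "db": db, "id": gene_id}
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)


def efetch_url(nuc_id, db="nucleotide"):
    """Step 3: fetch the sequence for a nucleotide ID as FASTA text."""
    params = {"db": db, "id": nuc_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder term and ID, not the real values from my queries.
    print(esearch_url("lacZ AND bacteria[filter]"))
    print(elink_url("12345"))
    print(efetch_url("12345"))
```

The problem is in step 3: depending on which linked ID I pass to efetch, I get either a whole genome or just the gene.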
~josh