**LeightonP** · 01-04-2013, 03:56 PM

An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55 (note the rettype="fasta" argument). You can then treat the result like any other FASTA stream (i.e. as if it were a file).

**umnklang** · 01-05-2013, 12:59 PM

That should work perfectly.

Can Biopython convert IDs? For example, from Entrez GeneIDs to protein accession numbers?

**maubp** · 01-06-2013, 07:22 AM

The NCBI can convert gene IDs to protein IDs; try Entrez link (elink). See also:

[Biopython] Finding protein ID using Entrez.efetch

http://lists.open-bio.org/pipermail/biopython/2009-August/005472.html
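As a rough illustration of the elink suggestion above, here is a minimal sketch, not a tested recipe. It assumes the result parsed by Entrez.read() is a list of link sets, each with a "LinkSetDb" list whose entries carry a "LinkName" and a list of "Link" items with an "Id". The function names, the optional linkname filter, and the link name mentioned in the comment are assumptions for illustration; check the NCBI link descriptions for the names that apply to your query.

```python
def protein_ids_from_linksets(linksets, linkname=None):
    """Collect linked IDs from a parsed elink result (hypothetical helper).

    linksets: the list returned by Entrez.read() on an elink handle.
    linkname: optional filter, e.g. a name such as "gene_protein_refseq"
              (assumed; verify against the NCBI list of link names).
    """
    ids = []
    for linkset in linksets:
        for linksetdb in linkset.get("LinkSetDb", []):
            if linkname is not None and linksetdb.get("LinkName") != linkname:
                continue
            for link in linksetdb.get("Link", []):
                ids.append(str(link["Id"]))
    # Drop duplicates while keeping the order in which IDs first appeared.
    return list(dict.fromkeys(ids))


def gene_to_protein_ids(gene_id, email, linkname=None):
    """Ask NCBI elink for the protein IDs linked to one gene ID (needs network)."""
    # Imported here so the pure helper above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_id)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return protein_ids_from_linksets(linksets, linkname)
```

Calling gene_to_protein_ids() with one of your GeneIDs and your email address should give a list of protein IDs that can then be passed to Entrez.efetch() with db="protein".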

**umnklang** · 01-08-2013, 09:15 AM

Ok so using the tutorial, I developed the following code (using trial and error):

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "my_name@my_website.com"
id_list = set(open('pids_test.csv', 'rU'))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
id=id_list)
for seq_record in SeqIO.parse(handle, "fasta"):
    print ">" + seq_record.id, seq_record.description
    print seq_record.seq
handle.close()

This prints exactly what I want. I have two questions:

1) How can I write the results to a text file, rather than printing them to my output?

2) How can I let the user specify the input file (command line is fine)?

K

**maubp** · 01-08-2013, 09:30 AM

To save the NCBI FASTA formatted data to a file, try something like this:

Code:

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "my_name@my_website.com"
id_list = set(open('pids_test.csv', 'rU'))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
id=id_list)	
out_handle = open("saved.fasta", "w")
for line in handle:
    out_handle.write(line)
out_handle.close()
handle.close()

P.S. There is a very similar example in the Biopython Tutorial in the section "EFetch: Downloading full records from Entrez"

Biopython Tutorial and Cookbook

http://biopython.org/DIST/docs/tutorial/Tutorial.html

If you want to take the filename from the command line, learn about sys.argv; to prompt the user instead, try the input function (raw_input in Python 2) or similar. Any good introduction to Python should cover this.
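To make the two options concrete, here is a small sketch in Python 3 that takes the filenames from the command line and falls back to prompting for whichever one is missing. The function name resolve_paths and the prompt wording are made up for illustration and are not part of Biopython.

```python
import sys


def resolve_paths(argv, prompt=input):
    """Return (input_path, output_path), hypothetical helper.

    argv:   a list shaped like sys.argv (program name first).
    prompt: function used to ask the user; defaults to the built-in input().
    """
    args = list(argv[1:])
    # Use the command-line value when present, otherwise ask the user.
    in_path = args[0] if len(args) > 0 else prompt("Input file of IDs: ")
    out_path = args[1] if len(args) > 1 else prompt("Output FASTA file: ")
    return in_path.strip(), out_path.strip()


# Typical use inside a script:
#     in_path, out_path = resolve_paths(sys.argv)
```

Passing the prompt function as an argument is only a convenience so the logic can be exercised without a terminal.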

**umnklang** · 01-08-2013, 10:00 AM

Worked great, added sys.argv to allow user to specify file input and output:

import sys
from Bio import Entrez
from Bio import SeqIO
Entrez.email = "xxxxxXXXXXxxxxx"
id_list = set(open(sys.argv[1], 'rU'))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
id=id_list)
out_handle = open(sys.argv[2], 'w')

for line in handle:
    out_handle.write(line)
out_handle.close()

handle.close()
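One thing to watch in the script above: set(open(...)) keeps the trailing newline on every ID and any blank lines in the file. A possible Python 3 variant is sketched below, assuming one ID per line. It strips whitespace, drops blanks and duplicates, and sends the IDs to efetch in batches. The helper names and the batch size of 200 are arbitrary assumptions, not NCBI or Biopython recommendations.

```python
def read_ids(path):
    """Read one ID per line, dropping surrounding whitespace, blanks and duplicates."""
    with open(path) as fh:
        ids = [line.strip() for line in fh]
    # dict.fromkeys keeps the first occurrence of each ID, in file order.
    return list(dict.fromkeys(i for i in ids if i))


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_fasta(id_path, out_path, email, batch_size=200):
    """Download FASTA records for the IDs in id_path into out_path (needs network)."""
    # Imported here so read_ids() and batches() work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    with open(out_path, "w") as out_handle:
        for chunk in batches(read_ids(id_path), batch_size):
            handle = Entrez.efetch(db="protein", rettype="fasta",
                                   retmode="text", id=",".join(chunk))
            try:
                out_handle.write(handle.read())
            finally:
                handle.close()
```

In a script this could be called as fetch_fasta(sys.argv[1], sys.argv[2], "your_email_here"), mirroring the argument order used above.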

Biopython - want to get a batch of amino acid fastas from list of entrez gene_ids
