**LeightonP** · 01-04-2013, 03:56 PM

An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55 (note the rettype="fasta" argument). You can then treat the result like any other FASTA stream, i.e. as if it were an open file.

**umnklang** · 01-05-2013, 12:59 PM

That should work perfectly.

Can Biopython convert IDs? For example, from Entrez GeneIDs to protein accession numbers?

**maubp** · 01-06-2013, 07:22 AM

The NCBI can convert gene IDs to protein IDs; try Entrez link (ELink), which Biopython exposes as Entrez.elink(). See also:

[Biopython] Finding protein ID using Entrez.efetch

http://lists.open-bio.org/pipermail/biopython/2009-August/005472.html
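As a rough illustration of the ELink suggestion above, here is a minimal sketch of looking up the protein records linked to one Entrez Gene ID. The function names `protein_ids_from_linksets` and `gene_to_protein_ids` are made up for this example, and the sketch assumes the parsed ELink result has the usual `LinkSetDb` / `Link` / `Id` nesting; check the output for your own IDs before relying on it.

```python
def protein_ids_from_linksets(linksets):
    """Collect linked UIDs from a parsed ELink result (a list of link sets)."""
    protein_ids = []
    for linkset in linksets:
        # A gene with no linked proteins may have an empty or missing LinkSetDb
        for linkset_db in linkset.get("LinkSetDb", []):
            for link in linkset_db.get("Link", []):
                protein_ids.append(str(link["Id"]))
    return protein_ids


def gene_to_protein_ids(gene_id, email):
    """Ask NCBI ELink for protein records linked to one Entrez Gene ID."""
    # Imported here so the parsing helper above can be used without Biopython
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_id)
    linksets = Entrez.read(handle)
    handle.close()
    return protein_ids_from_linksets(linksets)


# Hypothetical usage (needs network access; substitute a real Gene ID and email):
# protein_ids = gene_to_protein_ids("<gene_id>", "my_name@my_website.com")
```

The values returned are NCBI UIDs rather than accession strings; they can be passed as the id argument to Entrez.efetch(db="protein", ...) to download the corresponding FASTA records.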

**umnklang** · 01-08-2013, 09:15 AM

Ok so using the tutorial, I developed the following code (using trial and error):

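from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my_name@my_website.com"  # NCBI asks for a contact address
# Assumes one ID per line; strip newlines, drop blank lines and duplicates
with open('pids_test.csv') as id_file:
    id_list = sorted(set(line.strip() for line in id_file if line.strip()))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# Parse the returned text as an ordinary FASTA stream
for seq_record in SeqIO.parse(handle, "fasta"):
    print(">" + seq_record.id, seq_record.description)
    print(seq_record.seq)
handle.close()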

This prints exactly what I want. I have two questions:

1) How can I get the results into a text file, rather than printing them to the screen?

2) How can I let the user specify the input file (command line is fine)?

K

**maubp** · 01-08-2013, 09:30 AM

To save the NCBI FASTA formatted data to a file, try something like this:

Code:

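from Bio import Entrez

Entrez.email = "my_name@my_website.com"
# Assumes one ID per line; strip newlines, drop blank lines and duplicates
with open('pids_test.csv') as id_file:
    id_list = sorted(set(line.strip() for line in id_file if line.strip()))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# Copy the FASTA text returned by NCBI straight into the output file
with open("saved.fasta", "w") as out_handle:
    for line in handle:
        out_handle.write(line)
handle.close()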

P.S. There is a very similar example in the Biopython Tutorial in the section "EFetch: Downloading full records from Entrez"

Biopython Tutorial and Cookbook

http://biopython.org/DIST/docs/tutorial/Tutorial.html

If you want to take the filename from the command line, learn about sys.argv; to prompt the user instead, try the input function or similar. Any good introduction to Python should cover this.
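To make the sys.argv and input suggestion concrete, here is a small sketch that takes the two filenames from the command line and falls back to prompting for whichever is missing. The helper name `get_paths` and the prompt wording are invented for this example.

```python
def get_paths(argv, prompt=input):
    """Return (input_path, output_path), prompting for any not given in argv."""
    args = list(argv[1:])  # argv[0] is the script name itself
    in_path = args[0] if len(args) > 0 else prompt("File with protein IDs: ")
    out_path = args[1] if len(args) > 1 else prompt("Output FASTA file: ")
    return in_path, out_path
```

In a script this would be called as `in_path, out_path = get_paths(sys.argv)` after `import sys`. Passing the prompt function as a parameter is only a convenience so the helper can be exercised without a keyboard.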

**umnklang** · 01-08-2013, 10:00 AM

Worked great. I added sys.argv to let the user specify the input and output files:

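import sys
from Bio import Entrez

Entrez.email = "xxxxxXXXXXxxxxx"
# sys.argv[1] is the ID file (one ID per line), sys.argv[2] is the output FASTA file
with open(sys.argv[1]) as id_file:
    id_list = sorted(set(line.strip() for line in id_file if line.strip()))
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# Copy the FASTA text returned by NCBI straight into the output file
with open(sys.argv[2], 'w') as out_handle:
    for line in handle:
        out_handle.write(line)
handle.close()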

Biopython - want to get a batch of amino acid fastas from list of entrez gene_ids
