SEQanswers

Old 01-04-2013, 11:43 AM   #1
umnklang
Member
 
Location: Minnesota, United States

Join Date: Oct 2011
Posts: 16
Default Biopython - want to get a batch of amino acid fastas from list of entrez gene_ids

I have a list of about 100 Entrez Gene IDs, and I would like to obtain the amino acid FASTA sequence for each and combine them into a multi-FASTA file.

I'm trying to do this using the Entrez.efetch function in Biopython, but I'm not sure how to retrieve the amino acid sequence from the gene record.

Any ideas?
Old 01-04-2013, 02:56 PM   #2
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29
Default

An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55 - note the rettype="fasta" argument. You can then treat the returned handle like any other FASTA stream (i.e. as if it were an open file).
Old 01-05-2013, 11:59 AM   #3
umnklang
Member
 
Location: Minnesota, United States

Join Date: Oct 2011
Posts: 16
Default

That should work perfectly.

Can Biopython convert IDs? For example, from Entrez Gene IDs to protein accession numbers?
Old 01-06-2013, 06:22 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

NCBI can convert the gene IDs to protein IDs; try Entrez link (ELink), available in Biopython as Entrez.elink(). See also:
http://lists.open-bio.org/pipermail/...st/005472.html
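
As a rough illustration of the ELink suggestion, here is a minimal Python 3 sketch, not code from this thread. It assumes the link name "gene_protein_refseq" is the one you want (NCBI also offers "gene_protein"), and the function names protein_ids_from_linksets and fetch_protein_ids are made up for this example. Whether you get one link set per gene ID or a single combined one may depend on your Biopython version, so the helper simply keys results by whatever IDs each link set reports.

```python
def protein_ids_from_linksets(linksets, link_name="gene_protein_refseq"):
    """Collect protein UIDs from parsed ELink output, keyed by source gene ID(s).

    `linksets` is the list returned by Entrez.read() on an ELink handle.
    `link_name` is an assumption; pick the link type that suits your data.
    """
    mapping = {}
    for linkset in linksets:
        # Each link set reports the gene ID(s) it was computed for.
        gene_key = ",".join(linkset.get("IdList", []))
        found = mapping.setdefault(gene_key, [])
        for linksetdb in linkset.get("LinkSetDb", []):
            if linksetdb.get("LinkName") == link_name:
                found.extend(link["Id"] for link in linksetdb.get("Link", []))
    return mapping


def fetch_protein_ids(gene_ids, email):
    """Ask NCBI ELink for proteins linked to the given Entrez Gene IDs."""
    # Imported here so the parsing helper above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_ids)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return protein_ids_from_linksets(linksets)
```

Calling fetch_protein_ids(["7157"], "you@example.org") would contact NCBI and return a dictionary of protein UIDs, which can then be passed to Entrez.efetch(db="protein", ...) as in the rest of this thread. The gene ID and e-mail address here are placeholders.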
Old 01-08-2013, 08:15 AM   #5
umnklang
Member
 
Location: Minnesota, United States

Join Date: Oct 2011
Posts: 16
Default

OK, so using the tutorial and some trial and error, I developed the following code:

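from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my_name@my_website.com"  # NCBI asks for a contact address
# Read one protein ID per line, dropping newlines, blanks and duplicates
with open("pids_test.csv") as id_file:
    id_list = sorted({line.strip() for line in id_file if line.strip()})
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# Parse the returned FASTA text and print each record
for seq_record in SeqIO.parse(handle, "fasta"):
    print(">" + seq_record.id, seq_record.description)
    print(seq_record.seq)
handle.close()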

This prints exactly what I want. I have two questions:

1) How can I write the results to a text file rather than printing them to the screen?

2) How can I let the user specify the input file (command line is fine)?

K
Old 01-08-2013, 08:30 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

To save the FASTA-formatted data from NCBI to a file, try something like this:

Code:
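from Bio import Entrez

Entrez.email = "my_name@my_website.com"  # NCBI asks for a contact address
# Read one protein ID per line, dropping newlines, blanks and duplicates
with open("pids_test.csv") as id_file:
    id_list = sorted({line.strip() for line in id_file if line.strip()})
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# Copy the FASTA text from NCBI straight into the output file
with open("saved.fasta", "w") as out_handle:
    for line in handle:
        out_handle.write(line)
handle.close()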
P.S. There is a very similar example in the Biopython Tutorial in the section "EFetch: Downloading full records from Entrez":
http://biopython.org/DIST/docs/tutorial/Tutorial.html

If you want to take the filename from the command line, learn about sys.argv; to prompt the user instead, try the input function or similar. Any good introduction to Python should cover this.
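
To make the two options concrete, here is a small Python 3 sketch, offered as one possible arrangement rather than the only one. The function name get_paths and the prompt texts are invented for this example: it uses the command-line arguments when they are present and falls back to prompting for whatever is missing.

```python
import sys


def get_paths(argv, ask=input):
    """Return (input_path, output_path).

    Uses argv[1] and argv[2] when given; otherwise prompts the user
    through `ask` (the built-in input function by default).
    """
    args = list(argv[1:])  # argv[0] is the script name
    in_path = args[0] if len(args) > 0 else ask("File with protein IDs: ")
    out_path = args[1] if len(args) > 1 else ask("Output FASTA file: ")
    return in_path, out_path
```

In a script you would call get_paths(sys.argv) once at the top and pass the two results to open(). Taking the prompt function as a parameter is only there to make the helper easy to try out without typing at a console.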

Last edited by maubp; 01-08-2013 at 08:34 AM. Reason: Added link
Old 01-08-2013, 09:00 AM   #7
umnklang
Member
 
Location: Minnesota, United States

Join Date: Oct 2011
Posts: 16
Default

Worked great. I added sys.argv so the user can specify the input and output files:


import sys
from Bio import Entrez

Entrez.email = "xxxxxXXXXXxxxxx"
# sys.argv[1] is the input file of protein IDs, one per line
with open(sys.argv[1]) as id_file:
    id_list = sorted({line.strip() for line in id_file if line.strip()})
handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                       id=id_list)
# sys.argv[2] is the output file that receives the FASTA text
with open(sys.argv[2], "w") as out_handle:
    for line in handle:
        out_handle.write(line)
handle.close()