#1 |
Member
Location: Minnesota, United States Join Date: Oct 2011
Posts: 16
I have a list of about 100 Entrez Gene IDs, and I would like to obtain the amino acid FASTA sequence for each one and combine them into a multi-FASTA file.
I'm trying to do this using the Entrez.efetch function in Biopython, but I'm not sure how to retrieve the amino acid sequence from the gene file. Any ideas?
#2 |
Member
Location: Scotland Join Date: Feb 2011
Posts: 29
An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55 (note the rettype="fasta" argument). You can then treat the returned handle as any other FASTA stream (i.e. as if it were a file).
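For illustration, here is a minimal sketch of that approach, assuming Python 3 and a recent Biopython (the thread itself predates both). The function names `join_ids` and `fetch_protein_records` are hypothetical, and the IDs passed in are assumed to be protein IDs, not Entrez Gene IDs. The handle returned by `Entrez.efetch` is passed straight to `SeqIO.parse`, exactly as a file handle would be.

```python
def join_ids(ids):
    """Turn an iterable of IDs into the comma-separated string efetch accepts."""
    return ",".join(str(i).strip() for i in ids)


def fetch_protein_records(ids, email):
    """Fetch protein sequences as FASTA and return them as a list of SeqRecords.

    Needs Biopython and network access. The import is done lazily so that
    join_ids() can be used on its own without Biopython installed.
    """
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", id=join_ids(ids),
                           rettype="fasta", retmode="text")
    try:
        # The handle behaves like an open FASTA file.
        return list(SeqIO.parse(handle, "fasta"))
    finally:
        handle.close()
```

Each returned record exposes `record.id`, `record.description` and `record.seq`, so it can be written back out with `SeqIO.write(records, "out.fasta", "fasta")` if a file is wanted.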
#3 |
Member
Location: Minnesota, United States Join Date: Oct 2011
Posts: 16
That should work perfectly.
Can Biopython convert IDs? For example, from Entrez Gene IDs to protein accession numbers?
#4 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
The NCBI can convert the gene IDs to protein IDs; try Entrez link (ELink), which Biopython exposes as Entrez.elink(). See also:
http://lists.open-bio.org/pipermail/...st/005472.html
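As a rough sketch of the ELink route, assuming Python 3 and a recent Biopython: `Entrez.elink` is queried with `dbfrom="gene"` and `db="protein"`, and the result is parsed with `Entrez.read`. The function names `linked_ids` and `gene_to_protein_ids` are hypothetical, and the nested `LinkSetDb` / `Link` / `Id` layout is my understanding of what `Entrez.read` returns for ELink, so check it against a real response before relying on it.

```python
def linked_ids(link_sets, link_name=None):
    """Collect target IDs from a parsed ELink result, in first-seen order.

    link_sets: the list returned by Entrez.read() on an elink handle.
    link_name: if given, only LinkSetDb entries with this LinkName are used.
    """
    ids = []
    for link_set in link_sets:
        for link_db in link_set.get("LinkSetDb", []):
            if link_name is not None and link_db.get("LinkName") != link_name:
                continue
            for link in link_db.get("Link", []):
                if link["Id"] not in ids:
                    ids.append(link["Id"])
    return ids


def gene_to_protein_ids(gene_id, email, link_name=None):
    """Ask NCBI which protein IDs are linked to one Entrez Gene ID.

    Needs Biopython and network access (imported lazily).
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="gene", db="protein", id=str(gene_id))
    try:
        return linked_ids(Entrez.read(handle), link_name)
    finally:
        handle.close()
```

A gene can link to several proteins (isoforms, RefSeq and non-RefSeq entries), so expect more than one ID per gene and filter with `link_name` if only one kind of link is wanted.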
#5 |
Member
Location: Minnesota, United States Join Date: Oct 2011
Posts: 16
Ok so using the tutorial, I developed the following code (using trial and error):
Code:
    from Bio import Entrez
    from Bio import SeqIO

    Entrez.email = "my_name@my_website.com"
    id_list = set(open('pids_test.csv', 'rU'))
    handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
                           id=id_list)
    for seq_record in SeqIO.parse(handle, "fasta"):
        print ">" + seq_record.id, seq_record.description
        print seq_record.seq
    handle.close()

This prints exactly what I want. I have two questions:
1) How can I get the results into a text file, rather than printing them in my output?
2) How can I let the user specify the input file (command line is fine)?
K
#6 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
To save the NCBI FASTA formatted data to a file, try something like this:
Code:
    from Bio import Entrez
    from Bio import SeqIO

    Entrez.email = "my_name@my_website.com"
    id_list = set(open('pids_test.csv', 'rU'))
    handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
                           id=id_list)
    out_handle = open("saved.fasta", "w")
    for line in handle:
        out_handle.write(line)
    out_handle.close()
    handle.close()

http://biopython.org/DIST/docs/tutorial/Tutorial.html
If you want to take the filename from the command line, learn about sys.argv, while to prompt the user, try the input function or similar. Any good introduction to Python should cover this.
Last edited by maubp; 01-08-2013 at 09:34 AM. Reason: Added link
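To make the sys.argv and input suggestions concrete, here is a small sketch, assuming Python 3 (where `input` returns the typed text as a string; in Python 2 the equivalent is `raw_input`). The function name `get_paths` and the prompt wording are hypothetical. It takes the input and output filenames from the command line when they are given and prompts for whichever is missing.

```python
def get_paths(argv, prompt=input):
    """Return (input_path, output_path).

    argv:   the argument list, normally sys.argv (argv[0] is the script name).
    prompt: function used to ask for a missing value; defaults to input().
    """
    args = list(argv[1:3])  # at most the first two real arguments
    if len(args) < 1:
        args.append(prompt("Input file of IDs: "))
    if len(args) < 2:
        args.append(prompt("Output FASTA file: "))
    return args[0], args[1]
```

In a script this would be called as `in_path, out_path = get_paths(sys.argv)` after `import sys`. Passing `prompt` as a parameter is only there so the function can be exercised without a keyboard.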
#7 |
Member
Location: Minnesota, United States Join Date: Oct 2011
Posts: 16
Worked great, added sys.argv to allow user to specify file input and output:
Code:
    import sys
    from Bio import Entrez
    from Bio import SeqIO

    Entrez.email = "xxxxxXXXXXxxxxx"
    id_list = set(open(sys.argv[1], 'rU'))
    handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", \
                           id=id_list)
    out_handle = open(sys.argv[2], 'w')
    for line in handle:
        out_handle.write(line)
    out_handle.close()
    handle.close()
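For anyone running this today, here is a sketch of the same script for Python 3, where the print statement and the 'rU' file mode used earlier in the thread are no longer available. It assumes the input file holds one protein ID per line; the helper names `read_ids`, `copy_lines` and `fetch_to_file` are hypothetical. Unlike the original, it strips the trailing newline from each ID and uses `with` blocks so the files are closed even if an error occurs.

```python
def read_ids(path):
    """Read one ID per line, skipping blank lines and duplicates."""
    ids = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and line not in ids:
                ids.append(line)
    return ids


def copy_lines(source, out_path):
    """Write every line from an open text handle to out_path; return the line count."""
    count = 0
    with open(out_path, "w") as out_handle:
        for line in source:
            out_handle.write(line)
            count += 1
    return count


def fetch_to_file(in_path, out_path, email):
    """Fetch FASTA for the IDs in in_path and save it to out_path.

    Needs Biopython and network access (imported lazily).
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                           id=",".join(read_ids(in_path)))
    try:
        return copy_lines(handle, out_path)
    finally:
        handle.close()
```

Calling `fetch_to_file(sys.argv[1], sys.argv[2], "you@example.com")` reproduces the command-line behaviour of the script above; the email address shown is a placeholder.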
Tags |
biopython |