  • Biopython - want to get a batch of amino acid fastas from list of entrez gene_ids

    I have a list of Entrez Gene IDs (~100) and I would like to obtain the amino acid fastas of each and create a multi-fasta file.

    I'm trying to do this using the Entrez.efetch function in Biopython, but I'm not sure how to retrieve the amino acid sequence from the gene record.

    Any ideas?

  • #2
    An easy way to get the sequence is to ask Entrez.efetch() to return a FASTA-formatted sequence, as described in the Biopython tutorial at http://biopython.org/DIST/docs/tutor...al.html#htoc55 - note the rettype="fasta" argument. You can then treat the returned handle like any other FASTA stream (i.e. as if it were a file).
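A rough sketch of the approach described above, assuming Biopython is installed, the IDs are protein identifiers, and the machine has network access. The function names `fetch_fasta_text` and `fasta_headers` are made up for illustration; only `Entrez.efetch` and its `db`, `id`, `rettype` and `retmode` arguments come from Biopython. The second function shows the "treat it like a file" point: it works on any line-iterable stream, whether that is the efetch handle or a local file.

```python
import io


def fetch_fasta_text(protein_ids, email):
    """Download FASTA text for a list of protein IDs (needs network + Biopython)."""
    # Imported lazily so the helper below can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", id=",".join(protein_ids),
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def fasta_headers(stream):
    """Yield FASTA header lines (without the leading '>') from a file-like stream."""
    for line in stream:
        if line.startswith(">"):
            yield line[1:].strip()


# Any file-like object works, e.g. an in-memory stream standing in for the handle.
demo = io.StringIO(">P1 first protein\nMKV\n>P2 second protein\nMAA\n")
headers = list(fasta_headers(demo))
```

For full parsing into sequence records, passing the same handle to `SeqIO.parse(handle, "fasta")` is the usual Biopython route.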



    • #3
      That should work perfectly.

      Can Biopython convert IDs? For example, from Entrez Gene IDs to protein accession numbers?



      • #4
        The NCBI can convert the gene IDs to protein IDs; try Entrez link (elink).
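A minimal sketch of the elink route, assuming Biopython is installed and network access is available. `gene_to_protein_ids` and `protein_ids_from_linksets` are hypothetical helper names; `Entrez.elink` and `Entrez.read` are the Biopython calls. The parsing helper assumes the usual parsed elink layout (a list of link sets, each with a `LinkSetDb` list holding `Link` entries with an `Id`), and it removes duplicates because a gene can be linked to the same protein through more than one link set.

```python
def protein_ids_from_linksets(linksets):
    """Collect linked IDs from a parsed elink result, keeping first-seen order."""
    ids = []
    for linkset in linksets:
        for linksetdb in linkset.get("LinkSetDb", []):
            for link in linksetdb.get("Link", []):
                ids.append(str(link["Id"]))
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(ids))


def gene_to_protein_ids(gene_id, email):
    """Ask NCBI which protein records are linked to one Entrez Gene ID."""
    # Imported lazily so the parsing helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_id)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return protein_ids_from_linksets(linksets)
```

The returned IDs can then be passed to Entrez.efetch with db="protein" and rettype="fasta". A single gene often maps to several proteins (isoforms, RefSeq and non-RefSeq records), so some filtering may be needed before building the multi-FASTA file.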



        • #5
          Ok so using the tutorial, I developed the following code (using trial and error):

          from Bio import Entrez
          from Bio import SeqIO
          Entrez.email = "my_name@my_website.com"
          # One protein ID per line; strip newlines and skip blank lines
          with open('pids_test.csv') as id_file:
              id_list = set(line.strip() for line in id_file if line.strip())
          handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                                 id=id_list)
          # Parse the FASTA stream and print each record
          for seq_record in SeqIO.parse(handle, "fasta"):
              print(">" + seq_record.id, seq_record.description)
              print(seq_record.seq)
          handle.close()

          This prints exactly what I want. I have two questions:

          1) How can I get the results into a text file, rather than printing them to the screen?

          2) How can I let the user specify the input file (command line is fine)?

          K



          • #6
            To save the NCBI FASTA formatted data to a file, try something like this:

            Code:
            from Bio import Entrez
            Entrez.email = "my_name@my_website.com"
            # One protein ID per line; strip newlines and skip blank lines
            with open('pids_test.csv') as id_file:
                id_list = set(line.strip() for line in id_file if line.strip())
            handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                                   id=id_list)
            # Copy the FASTA text from NCBI straight into the output file
            with open("saved.fasta", "w") as out_handle:
                for line in handle:
                    out_handle.write(line)
            handle.close()

            P.S. There is a very similar example in the Biopython Tutorial in the section "EFetch: Downloading full records from Entrez".


            If you want to take the filename from the command line, learn about sys.argv; to prompt the user instead, try the input function or similar. Any good introduction to Python should cover this.
            Last edited by maubp; 01-08-2013, 09:34 AM. Reason: Added link
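As a small illustration of the sys.argv suggestion above, here is one way it could look. The function name `parse_paths` and the usage message are invented for this sketch; the only thing taken from Python itself is that `sys.argv[0]` is the script name and the remaining items are the command-line arguments.

```python
import sys


def parse_paths(argv):
    """Return (input_path, output_path) from an argv-style list.

    Exits with a usage message if the two filenames are not both given.
    """
    if len(argv) != 3:
        sys.exit("Usage: python %s <id_file> <output_fasta>" % argv[0])
    return argv[1], argv[2]
```

In a script this would be called as `in_path, out_path = parse_paths(sys.argv)`. To prompt the user instead, `in_path = input("ID file: ")` does the same job interactively in Python 3 (`raw_input` in Python 2).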



            • #7
              Worked great, added sys.argv to allow user to specify file input and output:


              import sys
              from Bio import Entrez

              Entrez.email = "xxxxxXXXXXxxxxx"
              # sys.argv[1] is the ID file (one ID per line), sys.argv[2] the output FASTA
              with open(sys.argv[1]) as id_file:
                  id_list = set(line.strip() for line in id_file if line.strip())
              handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                                     id=id_list)
              with open(sys.argv[2], 'w') as out_handle:
                  for line in handle:
                      out_handle.write(line)
              handle.close()
