  • Slicing GenBank file using Biopython [problem]

    Hello, I'd like to slice multiple GenBank files using Biopython, each at a different location:
    that is, slice genome 1 at location 1, genome 2 at location 2, and so on. I have already coded this:
    >>> ident = 'AE009948', 'AE009947'
    >>> coor = '1256617:1311411', '1973169:2005648'
    >>> for i in ident:
    ...     from Bio import Entrez, SeqIO
    ...     Entrez.email = "[email protected]"
    ...     handle = Entrez.efetch(db="nucleotide", id=i, rettype="gb")
    ...     record = SeqIO.read(handle, "gb")
    >>> for j in coor:
    ...     sub_record = record[j]
    The problem is that I get this error: ValueError: Invalid index
    or TypeError: 'SeqRecord' object is not callable if I try sub_record = record(j) instead.
    Can someone help me?
    Thanks in advance

  • #2
    You have a couple of problems. Firstly, the values inside "coor" are strings, not ranges, so trying to use them directly as ranges won't work. You could try:

    Code:
    coors = [[1256617, 1311411], [1973169, 2005648]]
    for bounds in coors :
        sub_record = record[bounds[0]:bounds[1]]
    and that would likely work. Of course, then you run into the problem that the coordinates you gave are beyond the end of the sequence you retrieved. Also, you ask for two records and then overwrite the first with the second. I presume you want the "for j in coor:" loop inside the "for i in ident:" loop.
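
If the coordinates have to stay as "start:end" strings (say, because they are read from a file), a minimal sketch of turning them into something sliceable could look like the following. The helper name parse_range is made up for illustration, and a short plain string stands in for the record; a SeqRecord accepts the same slice object.

```python
def parse_range(text):
    """Turn a 'start:end' string into a slice object (hypothetical helper)."""
    start, end = text.split(":")
    return slice(int(start), int(end))


# A plain string stands in for the fetched record here.
sequence = "ACGTACGTAC"
coor = ["2:6", "0:3"]

# Each string becomes a slice, which is then applied to the sequence.
pieces = [sequence[parse_range(c)] for c in coor]
print(pieces)
```

Note that int() raises a ValueError on anything that is not a whole number, so malformed coordinates fail loudly instead of silently giving a wrong slice.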

    • #3
      Yeah, you are right, I want to loop "for bounds in coors" inside the "for i in ident" loop. I'll try your suggestion and then tell you if it works. Anyway, thank you for your time!

      Edit: I want to slice the first genome with the first location, then slice the second genome with the second location (later it will be 100 genomes and 100 locations).

      Anyway, I'll try your suggestion and come back later!
      Last edited by CrLs; 01-20-2014, 04:58 AM.

      • #4
        Ah, then just get rid of the "for j in coor" loop, since you're already setting the index for coor if you nest that within the "for i in ident" loop.
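
One way to pair each genome with its own coordinates without a second loop is zip, which walks both lists in step. A rough sketch, with the download left out: the fake_records dictionary and the toy coordinates are made up so the example runs offline.

```python
ident = ["AE009948", "AE009947"]
coors = [(2, 6), (0, 3)]  # toy coordinates, not the real ones

# Hypothetical stand-in for the records that Entrez.efetch would return.
fake_records = {"AE009948": "ACGTACGTAC", "AE009947": "TTGACCA"}

sub_records = {}
# zip yields (first id, first bounds), then (second id, second bounds), ...
for i, (start, end) in zip(ident, coors):
    record = fake_records[i]  # real code would fetch and parse the record here
    sub_records[i] = record[start:end]

print(sub_records)
```

zip stops at the shorter of the two lists, so with 100 genomes and 100 locations it is worth checking that both lists really have the same length.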

        • #5
          Well, at the moment it gives me back this error:
          TypeError: slice indices must be integers or None or have an __index__ method
          Should I change my coor to something else?
          And about removing the "for j in coor" loop, how can I nest that within the "for i in ident" loop? Something like "for i in ident and for bounds in coors:"?

          Again, thanks a lot for your answer!

          Edit: I changed my coor. I forgot to put the '[ ]', my bad!

          • #6
            Watch out for different counting conventions when you do the slicing...

            Also, you could ask the NCBI to pre-slice the records when you call Entrez.efetch by including the optional seq_start and seq_stop arguments, see: http://www.ncbi.nlm.nih.gov/books/NB...hapter4.EFetch
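
To make the counting issue concrete: GenBank-style coordinates count from 1 and include the end base, while Python slices count from 0 and exclude the end. A small sketch of the conversion, where the helper name genbank_to_slice is invented for illustration and a plain string stands in for the sequence:

```python
def genbank_to_slice(start, end):
    """Convert 1-based inclusive coordinates to a Python slice (hypothetical helper)."""
    # Shift the start down by one; the end stays the same because
    # Python's exclusive end cancels out the one-based offset.
    return slice(start - 1, end)


sequence = "ACGTACGTAC"

# Bases 3 to 6 in one-based counting are G, T, A, C.
region = sequence[genbank_to_slice(3, 6)]
print(region)
```

The length of the result is end - start + 1, which is a quick sanity check when comparing a local slice against a region fetched with seq_start and seq_stop.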

            • #7
              Hello
              Yep, thank you, I'll check it!
              Hmm, to use the optional arguments, should I write 3 loops: one for the genomes, one for the starts and one for the stops? (I want to slice the first genome with the first location, the second genome with the second location, etc.)

              • #8
                I would use ONE loop, something like this:

                Code:
                from Bio import Entrez, SeqIO
                Entrez.email = "[email protected]"
                for i, start, end in [('AE009948', 1256617, 1311411),
                                      ('AE009947', 1973169, 2005648)]:
                    print("Fetching %s:%i-%i now..." % (i, start, end))
                    #code here using Entrez.efetch(...)
                Last edited by maubp; 01-20-2014, 09:00 AM. Reason: typo

                • #9
                  Well, thank you for your answer. I'll try your way and the old way, and I'll keep the faster one! (I don't know if one takes more memory than the other.)
                  Anyway, thanks a lot! I'll come back with working code when I'm done with it.

                  • #10
                    Ok Peter and Ryan, thank you for your help!
                    This is the working code. It gets you all the products (or anything else you need) between the locations you want.
                    Code:
                    from Bio import Entrez, SeqIO
                    Entrez.email = "[email protected]"
                    for i, start, end in [('AE009948', 1256617, 1311411),
                                          ('AE009948', 1973169, 2005648)]:
                        # Ask the NCBI for just this region of the record
                        handle = Entrez.efetch(db="nucleotide", id=i, seq_start=start,
                                               seq_stop=end, rettype="gb")
                        # Append, so every region adds a line to the same file
                        with open('resultsRegion_note.csv', 'a') as results2:
                            for seq_record in SeqIO.parse(handle, "gb"):
                                results2.write('\n')
                                for feature in seq_record.features:
                                    if feature.type == "CDS":
                                        # Qualifier values are lists; [1:-1] strips the brackets
                                        results2.write(str(feature.qualifiers.get('product', []))[1:-1])
                        handle.close()
                    Feel free to use it (even though I think a lot of people could do the same).
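
Since Biopython stores each qualifier value as a list of strings, another option is to join the list and let the csv module handle commas and quoting. This is only a sketch: the cds_qualifiers data and the product_names helper are made up, plain dictionaries stand in for feature.qualifiers, and an in-memory buffer replaces the output file.

```python
import csv
import io

# Toy stand-ins for feature.qualifiers of three CDS features (made-up data).
cds_qualifiers = [
    {"product": ["hypothetical protein"]},
    {"product": ["DNA polymerase III, beta subunit"]},
    {},  # a CDS without a product qualifier
]


def product_names(qualifier_dicts):
    """Return one product string per CDS, empty if the qualifier is missing."""
    return ["; ".join(q.get("product", [])) for q in qualifier_dicts]


# A real script would open a file with newline="" instead of using a buffer.
buffer = io.StringIO()
csv.writer(buffer).writerow(product_names(cds_qualifiers))
print(buffer.getvalue())
```

The writer quotes any product name that contains a comma, so the columns stay aligned when the file is opened in a spreadsheet.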
                    Last edited by CrLs; 01-20-2014, 09:14 AM.
