SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Partitioning or slicing up a BAM file nimrod337 General 4 01-03-2014 09:26 AM
merging multigenbank file of contigs in one genbank file-input to Artemis J12 Bioinformatics 2 05-10-2013 11:29 AM
Biopython, entrez.efetch, how to get results file umnklang Bioinformatics 1 06-15-2012 12:54 AM
Extracting (Slicing) a BAM file ashkot Bioinformatics 4 12-13-2011 11:02 AM
Can Biopython parse fastq file? ardmore Bioinformatics 2 11-29-2011 02:43 PM

Reply
 
Thread Tools
Old 01-20-2014, 01:45 AM   #1
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default Slicing genbank file using biopython [problem]

Hello, i'd like to slice multiple genbank files using biopython at different location:
slice genome 1 at location 1. I have already coded this :
>>> ident = 'AE009948','AE009947'
>>> coor = '1256617:1311411','1973169:2005648'
>>> for i in ident:
from Bio import Entrez, SeqIO
Entrez.email = "anothermail@mail.com"
handle = Entrez.efetch(db="nucleotide", id=i, rettype="gb")
record = SeqIO.read(handle, "gb")
>>> for j in coor:
sub_record = record[j]
Problem is : i get this error : ValueError: Invalid index
or TypeError: 'SeqRecord' object is not callable if i try with : sub_record = record(j)
Can someone help me?
Thanks by advance
CrLs is offline   Reply With Quote
Old 01-20-2014, 02:57 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You have a couple problems. Firstly the values inside "coor" are strings, not ranges. So trying to use them directly as ranges won't work. You could try:

Code:
coors = [[1256617, 1311411], [1973169, 2005648]]
for bounds in coors :
    sub_record = record[bounds[0]:bounds[1]]
and that would likely work. Of course, then you run into the problem that the coordinates you gave are beyond the end of the sequence you retrieved. Also, you ask for two records and then overwrite the first with the second. I presume you want the "foo j in coor :" loop inside the "for i in indent :" loop.
dpryan is offline   Reply With Quote
Old 01-20-2014, 03:55 AM   #3
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default

Yeah you are right, i want to loop "for bounds in coors" inside the "for i in ident", i ll try your suggestion then tell you if it works. Anyway, i thank you for you time !

Edit : i want to slice the first genome with locations one, then slice second genome with the second locations ( later it will be 100 genome and 100 locations )

Anyway i'll try with your suggestion and come back later !

Last edited by CrLs; 01-20-2014 at 03:58 AM.
CrLs is offline   Reply With Quote
Old 01-20-2014, 04:06 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Ah, then just get rid of the "for j in coor" loop, since you're already setting the index for coor if you nest that within the "for i in indent" loop.
dpryan is offline   Reply With Quote
Old 01-20-2014, 04:16 AM   #5
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default

well, at the moment it give me back this error :
TypeError: slice indices must be integers or None or have an __index__ method
Should i change my coor for something else ?
And about to remove the "for j in coor loop" , how can i nest that with the "for i in ident" loop ? something like for i in ident and for bounds in coors: ?

Again, thanks a lot for your answer !

Edit : i changed my coor, i forgot to put the '[ ]', my bad !
CrLs is offline   Reply With Quote
Old 01-20-2014, 06:28 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Watch out for different counting conventions when you do the slicing...

Also, you could ask the NCBI to pre-slice the records when you call Entrez.efetch by including the optional seq_start and seq_end arguments, see: http://www.ncbi.nlm.nih.gov/books/NB...hapter4.EFetch
maubp is offline   Reply With Quote
Old 01-20-2014, 06:56 AM   #7
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default

Hello
Yep thank you, i ll check it !
Hmm, to use optionals arguments , i should put 3 loops ? one with genome, one with start and one with stop right ?( i want to slice the first genome with the first location, second genome with 2 location ect )
CrLs is offline   Reply With Quote
Old 01-20-2014, 07:01 AM   #8
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

I would use ONE loop, something like this:

Code:
from Bio import Entrez, SeqIO
Entrez.email = "anothermail@mail.com"
for i, start, end in [('AE009948', 1256617, 1311411),
                      ('AE009947', 1973169, 2005648)]:
    print("Fetching %s:%i-%i now..." % (i, start, end))
    #code here using Entrez.efetch(...)

Last edited by maubp; 01-20-2014 at 08:00 AM. Reason: typo
maubp is offline   Reply With Quote
Old 01-20-2014, 07:09 AM   #9
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default

Well, thanks you for your answer, i'll try your way and the old way, i ll keep the faster ! ( i dont know if one take more memory than the other )
Anyway, Thanks you a lot ! I come back with a working code when i'm done with it
CrLs is offline   Reply With Quote
Old 01-20-2014, 07:52 AM   #10
CrLs
Junior Member
 
Location: France

Join Date: Jan 2014
Posts: 6
Default

Ok Peter and Ryan thanks you for your help !
This is the working code, get you all the product ( or everything else you need ) between the location you want
Code:
>>> for i, start, end in [('AE009948', 1256617, 1311411),
                      ('AE009948', 1973169, 2005648)]:
	handle = Entrez.efetch(db="nucleotide", id=i, seq_start=start,
            seq_stop=end, rettype="gb")
	results2 = open('resultsRegion_note.csv', 'a')
	for seq_record in SeqIO.parse(handle, "gb"):
		results2.write('\n')
	for feature in seq_record.features:
			if feature.type=="CDS":
				results2.write(str(feature.qualifiers.get('product'))[1:-1])
	results2.close()
feel free to use (even if i think a lot of people can do the same )

Last edited by CrLs; 01-20-2014 at 08:14 AM.
CrLs is offline   Reply With Quote
Reply

Tags
biopython, genbank, python, slicing

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:11 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO