SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
OrthoMCL duplicate entry error flipwell Bioinformatics 6 09-24-2013 06:48 AM
Tophat - reporting only best hit given multi-reads/multi-maps. john_nl Bioinformatics 1 07-05-2012 12:15 PM
Looking for an Entry Level position bnfoguy Academic/Non-Profit Jobs 0 06-30-2012 06:46 PM
Multi-Genome Analysis Industry Session Features Don Gregory of GenomeQuest GenomeQuest Vendor Forum 0 09-20-2010 05:23 AM
Cuffdiff multi-protein vs multi-promoter RockChalkJayhawk Bioinformatics 2 03-26-2010 10:26 AM

Reply
 
Thread Tools
Old 11-10-2013, 07:29 PM   #1
yekwah
Junior Member
 
Location: Melbourne, Victoria

Join Date: Apr 2013
Posts: 6
Default Adding features to a multi entry genbank

I'm attempting to add features using BioPython to a multi-entry Genbank. Each entry in the genbank is a contig, and I'd like to annotate some BLAST hits.

I've performed a BLAST of my query against a multi-entry fasta, so I know which hits belong to which contig and what the contig is called. I'm able to parse my Genbank and find which contig I'd like to add the feature to. I then append the feature, but I'm problems at the final stage where I attempt to then write out the entire Genbank with the new feature added to one of the entries.

This is my code so far:


Code:
>>> handle = open("D13_multi.gbk", "rU")
>>> genbank = SeqIO.parse(handle, "genbank")
>>> for record in genbank:
...     if record.id == node:
                record.features.append(SeqFeature.SeqFeature(SeqFeature.FeatureLocation(start, end), type = "misc_feature"))
...             print record.features
...
[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(35), strand=1), type='misc_feature'), SeqFeature(FeatureLocation(ExactPosition(652), ExactPosition(145)), type='misc_feature')]
After this for loop, SeqIO.write() writes 0 features to the new file I specify.
If I do this - SeqIO.write(genbank, "D13_multi_edited.gbk", "genbank"), it writes out all the entries after the the contig that I've just append a feature to.

I've also tried this:


Code:
>>> for record in genbank:
...     if record.id != node:
...             SeqIO.write(genbank, "D13_test.gbk", "genbank")
...     if record.id == node:
...             record.features.append(SeqFeature.SeqFeature(SeqFeature.FeatureLocation(start, end), type = "misc_feature"))
...             SeqIO.write(record, "D13_test.gbk", "genbank")
...
109
In this case it misses the first entry (there should be 110 entries) and writes out the entry that I've added a feature to but without the feature being added.

What am I doing wrong?

Last edited by yekwah; 11-11-2013 at 12:37 PM. Reason: Attempted to fix indenting for code
yekwah is offline   Reply With Quote
Old 11-11-2013, 03:10 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,961
Default

When editing a post choose "Edit" --> "Go Advanced" button at the bottom of the edit window. In the subsequent window that opens up use the mose to highlight the text you to designate as "code" then use the "#" button in the edit menu to add
Code:
$ code_example
to properly format your code.

For example:
Code:
>>> handle = open("D13_multi.gbk", "rU")
>>> genbank = SeqIO.parse(handle, "genbank")
>>> for record in genbank:
... if record.id == node:
record.features.append(SeqFeature.SeqFeature(SeqFeature.FeatureLocation(start, end), type = "misc_feature"))
... print record.features
...
[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(35), strand=1), type='misc_feature'), SeqFeature(FeatureLocation(ExactPosition(652), ExactPosition(145)), type='misc_feature')]
Try it on the post above.

Last edited by GenoMax; 11-11-2013 at 03:12 AM.
GenoMax is offline   Reply With Quote
Old 11-11-2013, 09:29 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

IF you can add the [ code ] ... [ /code ] tags it would help, one possibility is you keep calling SeqIO.write and each time over-write it with a singe record?

Last edited by maubp; 11-11-2013 at 09:31 AM.
maubp is offline   Reply With Quote
Old 11-11-2013, 12:40 PM   #4
yekwah
Junior Member
 
Location: Melbourne, Victoria

Join Date: Apr 2013
Posts: 6
Default

Thanks. I've fixed the indentation for the code now, so hopefully that will help.

I'm pretty certain that if I keep calling SeqIO.write that I overwrite whatever I had already. It seems like when I open the multi entry genbank with SeqIO.parse, it gets to the end of the file and then doesn't go back to the start. So in the for loop, I loop over all the records, get to the record I want and then add the feature, then it loops over the rest of the records. Once the for loop is done, I call SeqIO.write, but there isn't anything to write because its gotten to the end of the file.

So I'm just not sure how to write out my edited genbank with the features I want added in.
yekwah is offline   Reply With Quote
Old 11-11-2013, 12:54 PM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

OK, keeping this simple (but not memory efficient), load all the records into memory as a list, modify the list, save list of records:

Code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
node = ...
records = list(SeqIO.parse("D13_multi.gbk", "gb"))
for record in records:
    if record.id == node:
        record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feature"))
SeqIO.write(records, "D13_edited.gbk", "gb")
Getting a bit more clever, avoid loading all the records into memory, loop over the file once - and writing out the modified file one record at a time in the main loop:

Code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
node = ...
handle = open("D13_edited.gbk", "w")
for record in SeqIO.parse("D13_multi.gbk", "gb"):
    if record.id == node:
        record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feature"))
    SeqIO.write(record, handle, "gb")
handle.close()
If you want to do this in a memory efficient way (like the second version) but also making one call to SeqIO.write, you could define a generator function, or use a generator expression with a custom function to annotate an individual SeqRecord. But I'm guessing that might be a little too complicated for now?
maubp is offline   Reply With Quote
Old 11-11-2013, 01:56 PM   #6
yekwah
Junior Member
 
Location: Melbourne, Victoria

Join Date: Apr 2013
Posts: 6
Default

Thank you so much for this!

The second option seems to be the best, though I would like to be as memory efficient as possible. I'm not entirely sure what a generator function is, so that might be a little complicated for the moment.
yekwah is offline   Reply With Quote
Reply

Tags
annotation, biopython, genbank

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:55 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO