SEQanswers

07-28-2021, 10:30 AM   #1
mdom88
Junior Member
 
Location: San Francisco

Join Date: Jul 2021
Posts: 1
Adding Features to a GenBank File - Any Help?

Hi all!

I'm currently working with a multi-entry GenBank file and two dataframes, and I want to add new qualifiers to the GenBank file via its key/value system.

Each entry is a contig, and so far I am able to add new keys on top of the already existing "locus_tag" and "translation" for each entry. However, I am having difficulty adding values from the dataframes to each contig.

Each dataframe has 3 columns and 6000+ rows. I am able to insert one specific column into the GenBank file, but all 6000 rows are printed for every contig.

I've tried using a for loop, but the 6000 lines still print, and I am not sure what else to do. Any help would be greatly appreciated!
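
A likely cause of the repeated output, sketched here with a toy table: selecting a column with `df2.loc[:, "besthit"]` (as the code does) returns the whole column as a pandas Series, so formatting it into a string embeds every row. Looking up one value per identifier avoids that. The data and the assumption that "orf" is a unique identifier column are mine, not taken from the real CSV files.

```python
import pandas as pd

# Toy stand-in for the real table; only the column names follow the post.
df2 = pd.DataFrame({
    "orf": ["orf_1", "orf_2", "orf_3"],
    "besthit": ["kinase", "transporter", "hypothetical protein"],
})

# Formatting the column itself puts every row into the resulting string.
whole_column = "%s_%s" % (df2.loc[:, "besthit"], 1)

# ASSUMPTION: "orf" holds unique identifiers, so it can serve as the index.
besthit_by_orf = df2.set_index("orf")["besthit"]

# Picking a single label returns one value instead of the whole column.
single_value = "%s_%s" % (besthit_by_orf["orf_2"], 1)

print(whole_column)
print(single_value)
```

With 6000+ rows the first string is what ends up in every qualifier, which would match the symptom described above.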


This is the code I am working with:

import os
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio import SeqIO
import pandas as pd

df1 = pd.read_csv("-f_besthit.csv")
df2 = pd.read_csv("-f_filtered.csv")
annotation_handle = open("META.gbk","r")
recs = [rec for rec in SeqIO.parse("META.gbk", "genbank")]
my_start_pos = (2)
my_end_pos = (6)
my_feature_location = FeatureLocation(my_start_pos,my_end_pos)

for rec in recs:
    my_feature_type = "CDS"
    full_product={"full_product":"df_best[full_product]","complement":"(1..423)", "locus_tag":"contig", "besthit":"df_fil[besthit]"}
    my_feature = SeqFeature(my_feature_location,type=my_feature_type, qualifiers=full_product)
    besthit={"besthit":"df_fil[besthit]"}
    my_feature_one = SeqFeature(my_feature_location,type=my_feature_type, qualifiers=besthit)

    rec.features.append(my_feature)
    feats = [feat for feat in rec.features if feat.type == "CDS"]
    for feat in feats:
        print(feat)

for record in SeqIO.parse(annotation_handle,"genbank"):
    a = len(record.features)

    for_rast = open("META.gbk","w")
    x = 0
    final_features = []

    for f in record.features:
        if f.type == "CDS":
            f.qualifiers["full_product"] = "%s_%s" % (df2.loc[:,"besthit"], x+1)
            x += 1
    for f in record.features:
        if f.qualifiers["full_product"] == df2.loc[:"orf"]:
            final_features.append(f)
        else:
            pass

    record.features = final_features
    with open("META.gbk","w") as for_rast:
        SeqIO.write(record, for_rast, "genbank")
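
One possible way to restructure this, offered as a minimal sketch and not a tested fix for these exact files: build a lookup from the dataframe once, then give each CDS feature only the value that matches its own locus_tag. The column names "orf" and "besthit" come from the code above, but the idea that "orf" values match the locus_tag qualifiers is an assumption, and the function names and the separate output path are hypothetical. Writing to a separate file also matters because opening META.gbk in "w" mode empties it while it is still being read.

```python
import pandas as pd


def add_besthit_qualifiers(records, besthit_by_orf):
    """Add a 'besthit' qualifier to each CDS whose locus_tag has a match.

    Returns the number of CDS features that were annotated.
    """
    annotated = 0
    for rec in records:
        for feat in rec.features:
            if feat.type != "CDS":
                continue
            # Qualifiers parsed from GenBank are lists of strings.
            for tag in feat.qualifiers.get("locus_tag", []):
                if tag in besthit_by_orf:
                    # One value for this feature, not the whole column.
                    feat.qualifiers["besthit"] = [str(besthit_by_orf[tag])]
                    annotated += 1
                    break
    return annotated


def annotate_file(gbk_in, csv_path, gbk_out):
    """Read gbk_in, annotate CDS features from csv_path, write gbk_out."""
    # Biopython is only needed for file I/O, so it is imported here.
    from Bio import SeqIO

    df = pd.read_csv(csv_path)
    # ASSUMPTION: 'orf' values correspond to locus_tag qualifiers.
    besthit_by_orf = dict(zip(df["orf"], df["besthit"]))

    # Load every contig before writing anything.
    records = list(SeqIO.parse(gbk_in, "genbank"))
    annotated = add_besthit_qualifiers(records, besthit_by_orf)

    # ASSUMPTION: gbk_out is a different path from gbk_in.
    SeqIO.write(records, gbk_out, "genbank")
    return annotated
```

A call might look like `annotate_file("META.gbk", "-f_filtered.csv", "META_annotated.gbk")`, where the output name is only an example. If the identifiers in the dataframe do not match the locus_tag values exactly, the lookup key would need to be adapted.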
Tags
biopython, dataframe, genbank, python

All times are GMT -8.

