SEQanswers

11-05-2012, 04:24 AM   #1
thedamian
Member

Location: Barcelona

Join Date: Feb 2012
Posts: 49
GenBank files. Pairing mRNA with CDS.

Hi,

I have a GenBank file that contains several mRNA and CDS features.
I'd like to extract matching mRNA/CDS pairs from that file.

For example, take the .gb file for the NF1 gene:
http://www.ncbi.nlm.nih.gov/nuccore/...report=genbank
I know that the mRNA with ID NM_000267.3 has the corresponding CDS with protein ID NP_000258.1.

I know this because the mRNA's /product qualifier, /product="neurofibromin 1, transcript variant 2",
corresponds to the CDS's /product qualifier, /product="neurofibromin isoform 2".
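The matching rule described above (the label after "transcript variant" on the mRNA equals the label after "isoform" on the CDS) can be sketched as a small Python helper. This is a hypothetical illustration, not part of Bio::SeqIO or any other library. It assumes the /transcript_id, /protein_id and /product values have already been pulled out of the record, and it relies on a naming convention that NCBI does not guarantee: several transcript variants can encode the same isoform, so the numbers need not line up. Treat any pairing it produces as a guess to be checked.

```python
import re

# Hypothetical heuristic: pair features whose /product labels match,
# e.g. "transcript variant 2" (mRNA) <-> "isoform 2" (CDS).
# This naming convention is NOT guaranteed to hold in GenBank records.
VARIANT_RE = re.compile(r"transcript variant (\w+)")
ISOFORM_RE = re.compile(r"isoform (\w+)")


def pair_by_product(mrnas, cdss):
    """Pair mRNA and CDS features by their variant/isoform labels.

    mrnas: list of (transcript_id, product) tuples
    cdss:  list of (protein_id, product) tuples
    Returns a list of (transcript_id, protein_id) tuples. Features whose
    /product has no recognisable label are skipped.
    """
    # Index CDS features by the label that follows "isoform".
    isoform_to_proteins = {}
    for protein_id, product in cdss:
        match = ISOFORM_RE.search(product)
        if match:
            isoform_to_proteins.setdefault(match.group(1), []).append(protein_id)

    # Look up each mRNA's "transcript variant" label in that index.
    pairs = []
    for transcript_id, product in mrnas:
        match = VARIANT_RE.search(product)
        if match:
            for protein_id in isoform_to_proteins.get(match.group(1), []):
                pairs.append((transcript_id, protein_id))
    return pairs
```

With the NF1 example, `pair_by_product([("NM_000267.3", "neurofibromin 1, transcript variant 2")], [("NP_000258.1", "neurofibromin isoform 2")])` pairs NM_000267.3 with NP_000258.1, but a "transcript variant 3" that encodes "isoform 1" would be silently missed.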

I use Perl's Bio::SeqIO to parse .gb files. I can pull out all the primary tags such as mRNA and CDS, but I don't see any way to combine them into pairs.

Thanks for any suggestions.

11-05-2012, 08:32 AM   #2
maubp
Peter (Biopython etc)

Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

The pairing is implicit in GenBank format by virtue of the order of the features. There may be a BioPerl option to deduce this, since IIRC BioPerl can do GenBank to GFF3 conversion.

P.S. If you used GFF3, the parent/child relationship would be explicit. Note that the NCBI have fixed their GFF3 since I wrote this: http://blastedbio.blogspot.co.uk/201...ll-broken.html
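A minimal sketch of pairing by feature order, assuming each mRNA is immediately followed by its own CDS. Features are given as hypothetical (type, identifier) tuples in file order; reading them from the .gb file is not shown. Each CDS is attached to the most recent unpaired mRNA, and a CDS with no such mRNA is reported rather than guessed.

```python
def pair_by_order(features):
    """Pair CDS features with mRNA features using file order only.

    features: iterable of (feature_type, identifier) tuples, in the order
    they appear in the GenBank record (e.g. ("mRNA", "tx1")).
    Returns (pairs, unpaired_cds), where pairs is a list of
    (mrna_id, cds_id) tuples.
    """
    pairs = []
    unpaired_cds = []
    pending_mrna = None  # most recent mRNA not yet matched to a CDS
    for feature_type, identifier in features:
        if feature_type == "mRNA":
            # A newer mRNA replaces any earlier unmatched one.
            pending_mrna = identifier
        elif feature_type == "CDS":
            if pending_mrna is None:
                unpaired_cds.append(identifier)
            else:
                pairs.append((pending_mrna, identifier))
                pending_mrna = None
        # Other feature types (gene, exon, ...) are ignored.
    return pairs, unpaired_cds
```

For a single-transcript layout such as gene, mRNA, CDS this gives the expected pair. With several transcripts listed before their CDS features, e.g. `[("mRNA", "tx1"), ("mRNA", "tx2"), ("CDS", "prot1"), ("CDS", "prot2")]`, it returns `[("tx2", "prot1")]` with `"prot2"` unpaired, which is almost certainly not the intended pairing, so order alone is only safe for simple records.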

11-06-2012, 06:54 AM   #3
thedamian
Member

Location: Barcelona

Join Date: Feb 2012
Posts: 49

Heh, I don't see this order. For example, here:
http://www.ncbi.nlm.nih.gov/nuccore/NG_017013.1

The mRNA with /transcript_id="NM_001126118.1" is the 5th mRNA from the top, but the corresponding CDS is the 9th from the top.

Where is the order here?

11-06-2012, 06:59 AM   #4
maubp
Peter (Biopython etc)

Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by thedamian
heh, I don't see this order
Oh, you've got multiple transcripts. That does complicate life.

11-06-2012, 08:25 AM   #5
Richard Finney
Senior Member

Location: bethesda

Join Date: Feb 2009
Posts: 700

I'm not sure whether this will help...

The gene2accession file from NCBI contains both the mRNA and the protein accessions. You can fetch it with wget like this and then gunzip it:
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

E.g.:
grep NP_000258 gene2accession | grep GRCh37
9606 4763 REVIEWED NM_000267.3 270132515 NP_000258.1 4557793 NC_000017.10 224589808 29421944 29704694 + Reference GRCh37.p9 Primary Assembly
9606 4763 REVIEWED NM_000267.3 270132515 NP_000258.1 4557793 NT_010799.15 224514948 4158938 4441688 + Reference GRCh37.p9 Primary Assembly
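Building on the gene2accession suggestion, here is a sketch (Python, standard library only) that streams the gzipped file and builds an mRNA-to-protein lookup. It assumes the file is tab-separated with a "#" header line and "-" for missing values, and it takes the column positions from the sample lines above (4th column = RNA accession, 6th column = protein accession). The function names are hypothetical. Because the same pair appears once per genomic placement (NC_ and NT_ in the output above), accessions are collected into sets.

```python
import gzip
from collections import defaultdict

# 0-based column positions, taken from the sample gene2accession lines:
# 0 = tax_id, 3 = RNA accession.version, 5 = protein accession.version
TAX_ID, RNA_ACC, PROT_ACC = 0, 3, 5


def rna_to_protein(lines, tax_id=None):
    """Map each mRNA accession to the set of protein accessions paired with it.

    `lines` is any iterable of tab-separated gene2accession lines. Header
    lines starting with '#' and rows where either accession is '-' are
    skipped. If `tax_id` is given (e.g. "9606" for human), other species
    are ignored.
    """
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= PROT_ACC:
            continue  # malformed or truncated row
        if tax_id is not None and fields[TAX_ID] != tax_id:
            continue
        rna, protein = fields[RNA_ACC], fields[PROT_ACC]
        if rna != "-" and protein != "-":
            # The same pair repeats once per genomic placement; a set de-duplicates it.
            mapping[rna].add(protein)
    return dict(mapping)


def load_gene2accession(path, tax_id="9606"):
    """Stream the gzipped file directly, without unpacking it to disk first."""
    with gzip.open(path, "rt") as handle:
        return rna_to_protein(handle, tax_id=tax_id)
```

Going by the grep output above, `load_gene2accession("gene2accession.gz")["NM_000267.3"]` should contain "NP_000258.1". Filtering on one tax_id keeps memory use down, since the full file covers all species; only columns 1, 4 and 6 are read, so any extra trailing columns in newer versions of the file should not matter.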

11-08-2012, 12:56 AM   #6
thedamian
Member

Location: Barcelona

Join Date: Feb 2012
Posts: 49

Thank you Richard!
That seems to be what I need!

All times are GMT -8.