SEQanswers

05-09-2014, 12:31 AM   #1
beeman
Member
 
Location: Australia

Join Date: May 2012
Posts: 20
How does NCBI populate data for gene entries? I want to get all RefSeq mRNA

Hi All,

I'm wondering if anyone can shed light on how to obtain the latest annotations for a given organism from NCBI, and more specifically how to get all of the current transcript variants that are listed as RefSeq entries.

I'm working on honeybees and I've grabbed the latest GFF flat-file annotations from ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/GFF/, but they don't contain all of the current RefSeq transcripts.

For example, the gene cort (http://www.ncbi.nlm.nih.gov/gene/726912) has three transcript variants listed as RefSeq entries: XM_006557348.1, XM_001122629.3 and XM_006557349.1. In the GFF annotations, however, the only transcript ID is XM_001122629.2.

Is there any way to build a current set of annotations from the data NCBI uses to populate transcripts for gene records?


Thanks
05-09-2014, 03:19 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

You can do this in a couple of different ways. One would be to get the invertebrate RefSeq data files (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate/) and parse out Apis mellifera entries.
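A rough sketch of the "parse out Apis mellifera entries" step, assuming the downloaded release file is nucleotide FASTA and that the organism name appears in each record's header line (worth checking against the actual files). The function name filter_fasta_by_organism is made up for this example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def filter_fasta_by_organism(
    lines: Iterable[str], organism: str = "Apis mellifera"
) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for FASTA records whose header mentions `organism`.

    Assumption: the organism name is spelled out in the header line.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it matched.
            if header is not None and organism in header:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the final record, which has no following ">" line to trigger it.
    if header is not None and organism in header:
        yield header, "".join(chunks)
```

To run it over a downloaded file, pass an open text handle, for example gzip.open(path, "rt"), where path is whichever FASTA file you fetched from the directory above. Because it streams line by line, it should cope with large release files without loading them into memory.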

A simpler way would be to run a search: http://www.ncbi.nlm.nih.gov/nuccore/...D+srcdb_refseq. Change the "Display settings" to the format you want (GenBank, FASTA, etc.) and then use "Send to" to save the results to a File. I currently see ~40,500 entries for the search above.

Keep in mind that the above list will likely include all of these RefSeq types (http://www.ncbi.nlm.nih.gov/books/NB...ort=objectonly).
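The same kind of search can also be scripted through NCBI's E-utilities instead of the web page. The sketch below only builds an esearch URL and does not send any request. The search link above is truncated, so the query term used here (organism name plus srcdb_refseq[PROP], with an optional biomol_mrna[PROP] filter to narrow the RefSeq types to mRNA) is an assumption, not a reconstruction of the original link. The function name build_refseq_search_url is made up for this example.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_search_url(
    organism: str, mrna_only: bool = False, retmax: int = 0
) -> str:
    """Build an esearch URL for RefSeq nucleotide records of one organism.

    Assumption: the query term below is a guess at a suitable search,
    since the link in the thread is truncated.
    """
    term = f'"{organism}"[Organism] AND srcdb_refseq[PROP]'
    if mrna_only:
        # Assumed filter to restrict the results to mRNA records.
        term += " AND biomol_mrna[PROP]"
    params = {
        "db": "nuccore",
        "term": term,
        "retmax": retmax,     # 0 returns just the hit count
        "usehistory": "y",    # keep results on the server for a later efetch
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_refseq_search_url("Apis mellifera", mrna_only=True))
```

Opening the printed URL in a browser returns XML with a hit count, which can be compared against the number shown on the web search page before downloading anything.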

Last edited by GenoMax; 05-09-2014 at 03:21 AM.
05-09-2014, 03:25 AM   #3
Ohad
Member
 
Location: Israel TA

Join Date: Jul 2013
Posts: 28

I downloaded ref_Amel_4.5_scaffolds.gff3, and I see XM_006557348.1, XM_001122629.3 and XM_006557349.1 inside.
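A check like this can be scripted so that an outdated file shows up immediately. The sketch below assumes a standard nine-column, tab-separated GFF3 file in which transcript accessions (NM_, NR_, XM_, XR_ with a version suffix) appear somewhere in the attributes column. The function names are made up for this example.

```python
import re
from typing import Dict, Iterable, Set

# RefSeq transcript accessions with a version suffix, e.g. XM_001122629.3
ACCESSION = re.compile(r"\b([NX][MR]_\d+)\.(\d+)\b")


def collect_transcript_versions(gff_lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each accession (without version) to the versions seen in column 9."""
    versions: Dict[str, Set[int]] = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        for base, version in ACCESSION.findall(columns[8]):
            versions.setdefault(base, set()).add(int(version))
    return versions


def check_expected(
    expected: Iterable[str], versions: Dict[str, Set[int]]
) -> Dict[str, str]:
    """Report whether each expected accession.version is present in the file."""
    report: Dict[str, str] = {}
    for accession in expected:
        base, _, version = accession.partition(".")
        found = versions.get(base, set())
        if version.isdigit() and int(version) in found:
            report[accession] = "present"
        elif found:
            # Same accession, different version: a hint that the file is
            # from a different annotation release.
            listed = ",".join(f"{base}.{v}" for v in sorted(found))
            report[accession] = "other version: " + listed
        else:
            report[accession] = "absent"
    return report
```

Passing the three accessions from the gene page as expected, together with the lines of the GFF3 file, gives one status per accession. An "other version" result points to a file from a different release rather than a genuinely missing transcript.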
05-11-2014, 04:36 PM   #4
beeman
Member
 
Location: Australia

Join Date: May 2012
Posts: 20

Thanks for your replies.

Definitely operator error.

I need to start appending version dates, as I've been looking at the wrong flat file.

Tags
annotation, ncbi

All times are GMT -8. The time now is 10:51 PM.