11-01-2016, 07:34 AM #2
Richard Finney

Get this file: ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

These commands will download it and decompress it (the grep below reads the uncompressed file):

wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
gunzip gene_info.gz

NCBI used to promote the term "Entrez" in names like "Entrez Gene ID", but they apparently no longer emphasize it. "Entrez" refers to the search and retrieval system used to access NCBI databases.

"Gene ID" or "GeneID" is the numeric identifier NCBI assigns to each gene; it is column 2 of the "gene_info" file mentioned earlier.

The official full name is in the "Full_name_from_nomenclature_authority" field (column 12); the official symbol is in "Symbol_from_nomenclature_authority" (column 11).

Example for the human TP53 gene (tax_id 9606 is human):

grep -P "\tTP53\t" gene_info | grep -P "^9606\t" | cut -f1-13
9606 7157 TP53 - BCC7|LFS1|P53|TRP53 MIM:191170|HGNC:HGNC:11998|Ensembl:ENSG00000141510|HPRD:01859|Vega:OTTHUMG00000162125 17 17p13.1 tumor protein p53 protein-coding TP53 tumor protein p53

The NCBI GeneID is 7157 and the official (HUGO) symbol is TP53: https://www.ncbi.nlm.nih.gov/gene/?term=7157
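The grep pipeline above matches "TP53" between tabs anywhere on the line. A column-based alternative is sketched below, assuming gene_info has already been gunzipped into the current directory. It uses awk to compare tax_id (column 1) and Symbol (column 3) exactly and prints the GeneID, official symbol and official full name (columns 2, 11 and 12). The function name lookup_gene is hypothetical, not an NCBI tool.

```shell
# lookup_gene TAX_ID SYMBOL [FILE]  (hypothetical helper, not an NCBI tool)
# Prints GeneID, official symbol and official full name for rows of gene_info
# whose tax_id (column 1) and Symbol (column 3) match exactly.
# Assumes tab-separated columns: 2 = GeneID, 11 = Symbol_from_nomenclature_authority,
# 12 = Full_name_from_nomenclature_authority. FILE defaults to ./gene_info.
lookup_gene() {
  taxid="$1"
  symbol="$2"
  file="${3:-gene_info}"
  # Compare whole fields so a symbol appearing in another column is not matched.
  awk -F'\t' -v t="$taxid" -v s="$symbol" \
    'BEGIN { OFS = "\t" } $1 == t && $3 == s { print $2, $11, $12 }' "$file"
}
```

For the TP53 row shown above, `lookup_gene 9606 TP53` would print 7157, TP53 and tumor protein p53, tab-separated.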

"NM_" identifiers (the "RNA_nucleotide_accession.version" column) are in the file "gene2accession", available from the same place:
wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
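A similar sketch for gene2accession, assuming GeneID is column 2 and RNA_nucleotide_accession.version is column 4 (check the header line of your copy). It prints the distinct NM_ accessions for one GeneID. The function name nm_accessions is hypothetical.

```shell
# nm_accessions GENEID [FILE]  (hypothetical helper, not an NCBI tool)
# Prints the distinct NM_ accessions for one GeneID from gene2accession.
# Assumes tab-separated columns: 2 = GeneID, 4 = RNA_nucleotide_accession.version.
# FILE defaults to ./gene2accession; pass "-" to make awk read standard input.
nm_accessions() {
  geneid="$1"
  file="${2:-gene2accession}"
  # Keep only rows for this GeneID whose RNA accession starts with NM_, then dedupe.
  awk -F'\t' -v g="$geneid" '$2 == g && $4 ~ /^NM_/ { print $4 }' "$file" | sort -u
}
```

To avoid unpacking the file first, you could stream it: `gzip -dc gene2accession.gz | nm_accessions 7157 -`. The file is large, so expect either form to take a while.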

Last edited by Richard Finney; 11-01-2016 at 07:53 AM.