Hey guys,
I have the proteome set of C. elegans, retrieved from UniProt as an EMBL-style flat file, like this:
EMBL_FLAT_FILE_CELEGANS
Code:
ID   14331_CAEEL             Reviewed;         248 AA.
AC   P41932; Q21537;
DT   01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
DT   22-JUL-2008, sequence version 2.
DT   28-NOV-2012, entry version 95.
DE   RecName: Full=14-3-3-like protein 1;
DE   AltName: Full=Partitioning defective protein 5;
GN   Name=par-5; Synonyms=ftt-1; ORFNames=M117.2;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida;
OC   Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis.
DR   GO; GO:0005938; C:cell cortex; IDA:WormBase.
DR   GO; GO:0005634; C:nucleus; IDA:WormBase.
DR   GO; GO:0045167; P:asymmetric protein localization involved in cell fate determination; IMP:WormBase.
DR   GO; GO:0001708; P:cell fate specification; IMP:WormBase.
DR   GO; GO:0043053; P:dauer entry; IMP:WormBase.
DR   GO; GO:0008340; P:determination of adult lifespan; IMP:WormBase.
DR   GO; GO:0009792; P:embryo development ending in birth or egg hatching; IMP:WormBase.
DR   GO; GO:0000132; P:establishment of mitotic spindle orientation; IMP:WormBase.
DR   GO; GO:0030590; P:first cell cycle pseudocleavage; IMP:WormBase.
DR   GO; GO:0035188; P:hatching; IMP:WormBase.
DR   GO; GO:0007126; P:meiosis; IMP:WormBase.
DR   GO; GO:0002009; P:morphogenesis of an epithelium; IMP:WormBase.
DR   GO; GO:0009949; P:polarity specification of anterior/posterior axis; IMP:WormBase.
DR   GO; GO:0035046; P:pronuclear migration; IMP:WormBase.
DR   GO; GO:0006898; P:receptor-mediated endocytosis; IMP:WormBase.
DR   GO; GO:0007346; P:regulation of mitotic cell cycle; IMP:WormBase.
DR   GO; GO:0010070; P:zygote asymmetric cell division; IMP:WormBase.
SQ   SEQUENCE   248 AA;  28191 MW;  ABBE0DA27D9341AF CRC64;
     MSDTVEELVQ RAKLAEQAER YDDMAAAMKK VTEQGQELSN EERNLLSVAY KNVVGARRSS
     WRVISSIEQK TEGSEKKQQL AKEYRVKVEQ ELNDICQDVL KLLDEFLIVK AGAAESKVFY
     LKMKGDYYRY LAEVASEDRA AVVEKSQKAY QEALDIAKDK MQPTHPIRLG LALNFSVFYY
     EILNTPEHAC QLAKQAFDDA IAELDTLNED SYKDSTLIMQ LLRDNLTLWT SDVGAEDQEQ
     EGNQEAGN
//
NOTE: the file shown here is shortened.
I also have another file with many gene full names (one per line), and I would like to extract the GO information for these genes from the EMBL flat file. In other words, does anyone here have a script that reads my file of gene full names, finds each name in the flat file, and extracts its GO annotations? The desired output is the gene full name followed by all of its gene ontology entries, separated by commas.
OUTPUT
Code:
GENE_A,GO; GO:0001708; P:cell fate specification; IMP:WormBase,GO; GO:0043053; P:dauer entry; IMP:WormBase,GO; GO:0008340; P:determination of adult lifespan; IMP:WormBase,GO; GO:0009792; P:embryo development ending in birth or egg hatching; IMP:WormBase,GO; GO:0000132; P:establishment of mitotic spindle orientation; IMP:WormBase
If you have other ideas, they would be welcome too!
Cheers.
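For reference, here is a minimal Python sketch of the kind of script being asked for. It rests on assumptions that may not hold for the real data: it treats a "gene full name" as the value of the `DE   RecName: Full=...;` line, it assumes the real flat file has one record line per text line with entries terminated by `//`, and the function names (`parse_entries`, `go_lookup`, `format_row`) and file names are made up for illustration.

```python
import re
import sys

# Assumption: the "gene full name" is the value of "DE   RecName: Full=...;".
# The optional {...} part skips evidence tags that some releases append.
FULL_NAME = re.compile(r"^DE\s+RecName:\s+Full=(.+?)(?:\s*\{[^}]*\})?;\s*$")
# Captures a "DR   GO; ..." cross-reference without its trailing period.
GO_LINE = re.compile(r"^DR\s+(GO;.+?)\.?\s*$")


def parse_entries(lines):
    """Yield (full_name, [go_annotation, ...]) for each entry ending in '//'."""
    name, gos = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if name is not None:
                yield name, gos
            name, gos = None, []
            continue
        match = FULL_NAME.match(line)
        if match and name is None:
            name = match.group(1)
            continue
        match = GO_LINE.match(line)
        if match:
            gos.append(match.group(1))


def go_lookup(flat_lines, wanted_names):
    """Map each wanted full name found in the flat file to its GO lines."""
    wanted = {n.strip() for n in wanted_names if n.strip()}
    return {name: gos for name, gos in parse_entries(flat_lines) if name in wanted}


def format_row(name, gos):
    """Full name followed by every GO annotation, comma-separated."""
    return ",".join([name] + gos)


# Hypothetical usage: python extract_go.py proteome.dat names.txt > go.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2]) as names_file:
        names = [n.strip() for n in names_file if n.strip()]
    with open(sys.argv[1]) as flat_file:
        found = go_lookup(flat_file, names)
    for name in names:  # keep the order of the names file
        if name in found:
            print(format_row(name, found[name]))
        else:
            print(f"not found: {name}", file=sys.stderr)
```

If the names in the list are really gene symbols such as par-5, the same loop could match on the `GN   Name=` line instead. Names that contain commas themselves would need quoting or a different separator in the output.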