SEQanswers

Old 02-21-2013, 09:38 PM   #1
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108
GO analysis followup

Guys,
I have a specific question about Gene Ontology analysis. I read a paper claiming that, say, the immune response pathway is enriched among the top 500 most differentially expressed genes. I looked it up and found that its GO ID is GO:0006955. My question: where can I download all the genes annotated to a term, along with their annotations, so that I can process them easily?

I found a place for mouse, ftp://ftp.informatics.jax.org/pub/reports/index.html#go, and the gene_association.mgi file there seems to be the one I'm looking for.

Could anyone share their experience with this?

Thanks!
Old 02-21-2013, 11:49 PM   #2
mudshark
Senior Member
 
Location: Munich

Join Date: Jan 2009
Posts: 138

I usually fetch the latest annotations here: http://www.geneontology.org/GO.downl...otations.shtml
Old 02-22-2013, 05:05 AM   #3
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108

Quote:
Originally Posted by mudshark View Post
I usually fetch the latest annotations here: http://www.geneontology.org/GO.downl...otations.shtml
I found the gene_association.mgi file at the link you mentioned as well. I downloaded it, and it is almost the same as the one on the MGI site, with slight differences. I compared a few of the differences between the two files and found that the MGI site file has duplicated entries. I'm not sure how that happened. Anyway, your site seems to be the one to use. Thanks!
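
For processing a downloaded association file, here is a minimal Python sketch. It assumes the file follows the tab-separated GAF layout (comment lines starting with "!", object ID in column 2, symbol in column 3, qualifier in column 4, GO ID in column 5). The function names and the toy rows are made up for illustration and are not taken from the MGI or GO files.

```python
from collections import Counter


def _data_rows(lines):
    """Yield non-empty, non-comment rows with the trailing newline removed."""
    for line in lines:
        row = line.rstrip("\n")
        if row and not row.startswith("!"):
            yield row


def genes_for_term(lines, go_id):
    """Return (object ID, symbol) pairs annotated directly to go_id.

    Rows whose qualifier column contains NOT are skipped, because they
    state that the gene product is not associated with the term.
    """
    genes = set()
    for row in _data_rows(lines):
        cols = row.split("\t")
        if len(cols) < 5:
            continue  # malformed or truncated row
        qualifiers = cols[3].split("|")
        if cols[4] == go_id and "NOT" not in qualifiers:
            genes.add((cols[1], cols[2]))
    return genes


def duplicated_rows(lines):
    """Return {row: count} for rows that occur more than once."""
    counts = Counter(_data_rows(lines))
    return {row: n for row, n in counts.items() if n > 1}


def toy_row(obj_id, symbol, qualifier, go_id):
    """Build a shortened, made-up GAF-like row (first five columns only)."""
    return "\t".join(["MGI", obj_id, symbol, qualifier, go_id]) + "\n"


# Hypothetical records, for illustration only.
TOY_GAF = [
    "!gaf-version: 2.0\n",
    toy_row("MGI:0000001", "GeneA", "", "GO:0006955"),
    toy_row("MGI:0000001", "GeneA", "", "GO:0006955"),  # exact duplicate
    toy_row("MGI:0000002", "GeneB", "NOT", "GO:0006955"),
    toy_row("MGI:0000003", "GeneC", "", "GO:0008150"),
]

if __name__ == "__main__":
    print(sorted(genes_for_term(TOY_GAF, "GO:0006955")))
    print(duplicated_rows(TOY_GAF))
```

On a real file you would pass an open file handle instead of the toy list, for example `with open("gene_association.mgi") as fh: genes_for_term(fh, "GO:0006955")`. This only collects direct annotations to the term, not annotations to its descendant terms.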
Old 02-22-2013, 05:53 AM   #6
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108

Quote:
Originally Posted by Richard Finney View Post
You can grab the flat files from NCBI and ungzip them ...
You'll be interested in gene2go and gene_info.

This script does the trick ...
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2sts.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2unigene.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_history.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA...otkb_collab.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/go_process.xml
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_group.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2vega.gz
Do these files cover human, mouse, and other species together? It looks that way to me. I guess it's much easier to use the GO annotation files than to process two files and piece things together for one species.
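
If you do go the NCBI route, picking out one species is a short filter. The sketch below assumes gene2go is tab-separated, has a header line starting with "#", and that its first three columns are tax_id, GeneID, and GO_ID; please check that against the header of the file you actually download. 10090 is the NCBI Taxonomy ID for mouse. The function names and the toy rows are hypothetical.

```python
import gzip

MOUSE_TAX_ID = "10090"  # NCBI Taxonomy ID for Mus musculus


def read_gene2go(path):
    """Yield rows (lists of column strings) from a gzipped gene2go file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip the header and blank lines
            yield line.rstrip("\n").split("\t")


def gene_ids_for_term(rows, tax_id, go_id):
    """Return the GeneIDs annotated to go_id for a single species."""
    return {
        row[1]
        for row in rows
        if len(row) >= 3 and row[0] == tax_id and row[2] == go_id
    }


# Made-up rows (tax_id, GeneID, GO_ID), for illustration only.
TOY_ROWS = [
    ["10090", "111", "GO:0006955"],
    ["9606", "222", "GO:0006955"],
    ["10090", "333", "GO:0008150"],
]

if __name__ == "__main__":
    print(gene_ids_for_term(TOY_ROWS, MOUSE_TAX_ID, "GO:0006955"))
```

With the real download this would be called as `gene_ids_for_term(read_gene2go("gene2go.gz"), MOUSE_TAX_ID, "GO:0006955")`. Mapping the resulting GeneIDs to symbols would still need gene_info, which is the extra step mentioned above.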
Old 02-22-2013, 09:16 AM   #7
girlwithglasses
Member
 
Location: USA

Join Date: Feb 2013
Posts: 17

Quote:
Originally Posted by gene_x View Post
I found the gene_association.mgi file at the link you mentioned as well. I downloaded it, and it is almost the same as the one on the MGI site, with slight differences. I compared a few of the differences between the two files and found that the MGI site file has duplicated entries. I'm not sure how that happened. Anyway, your site seems to be the one to use. Thanks!
It will be almost identical - the file on the GO website is a slightly filtered version of the one available at MGI.

Have you tried using AmiGO to download all the annotations to your GO term and its descendants? Search for the GO term at amigo.geneontology.org, view the annotations, and then filter so that you're only viewing annotations for your species of interest. You should be able to download the resulting annotation set.
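
To illustrate what "the GO term and its descendants" means if you process the files yourself, here is a rough Python sketch. It assumes you already have a child-to-parents mapping parsed from the ontology; the term IDs, gene names, and function names below are all made up, and a real ontology has several relationship types that you would need to decide how to handle. It is not how AmiGO does it internally.

```python
from collections import defaultdict, deque


def descendants(term, parents_of):
    """Return every term reachable from `term` by following child links.

    parents_of maps each term ID to the set of its direct parents.
    """
    # Invert the child -> parents map into parent -> children.
    children_of = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].add(child)

    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children_of[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_under(term, parents_of, annotations):
    """Return genes annotated to `term` or to any of its descendants.

    annotations is an iterable of (gene, term ID) pairs.
    """
    wanted = descendants(term, parents_of) | {term}
    return {gene for gene, annotated_term in annotations if annotated_term in wanted}


# Hypothetical toy ontology: D has two parents, E is unrelated to A.
TOY_PARENTS = {
    "TERM:B": {"TERM:A"},
    "TERM:C": {"TERM:A"},
    "TERM:D": {"TERM:B", "TERM:C"},
    "TERM:E": {"TERM:X"},
}
TOY_ANNOTATIONS = [
    ("gene1", "TERM:A"),
    ("gene2", "TERM:D"),
    ("gene3", "TERM:E"),
]

if __name__ == "__main__":
    print(sorted(descendants("TERM:A", TOY_PARENTS)))
    print(sorted(genes_under("TERM:A", TOY_PARENTS, TOY_ANNOTATIONS)))
```

The point of the sketch is that a gene annotated only to a more specific child term (gene2 here) is missed if you match on the parent's GO ID alone.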
Old 02-22-2013, 09:29 AM   #8
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108

Quote:
Originally Posted by girlwithglasses View Post
It will be almost identical - the file on the GO website is a slightly filtered version of the one available at MGI.

Have you tried using AmiGO to download all the annotations to your GO term and its descendants? Search for the GO term at amigo.geneontology.org, view the annotations, and then filter so that you're only viewing annotations for your species of interest. You should be able to download the resulting annotation set.
I just tried that, finally got the filter working, and saw the output. I compared the output against the annotation file (grepping for those terms), and the two are basically the same, with different annotation dates and slight differences in format.

I think it's easier to stick to one source, and I think the file from the annotation download site is the best one in terms of consistency among term annotations.
Old 02-26-2013, 08:46 AM   #9
rnavon
Junior Member
 
Location: Israel

Join Date: Feb 2013
Posts: 6

Why calculate the enrichment yourself instead of using an external tool?
GOrilla http://cbl-gorilla.cs.technion.ac.il/ for example can calculate the enriched GO terms for you.
It has 2 modes:
1. Rank all your genes (not just your differentially expressed ones) according to differential expression and GOrilla will find GO terms enriched at the top of your list.
2. Compare a target set (e.g. top 500 genes) to a background set (e.g. all your genes).
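
For anyone who wants to see what a target-versus-background comparison boils down to, here is a minimal sketch of a one-sided hypergeometric test in plain Python. This is the generic textbook test, written under the assumption that every gene in the target set is also in the background set; it is not a reproduction of GOrilla's own code, and the function and parameter names are made up.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, annotated_in_universe,
                           target_size, annotated_in_target):
    """Return P(X >= annotated_in_target) under the hypergeometric model.

    universe_size:         genes in the background set (N)
    annotated_in_universe: background genes annotated to the term (K)
    target_size:           genes in the target set, e.g. top 500 (n)
    annotated_in_target:   target genes annotated to the term (k)
    """
    N, K, n, k = (universe_size, annotated_in_universe,
                  target_size, annotated_in_target)
    total = comb(N, n)  # number of ways to draw the target set
    upper = min(K, n)   # largest possible overlap
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 20000 background genes, 400 annotated to the term,
    # a target set of 500 genes, 30 of which carry the annotation.
    print(hypergeom_enrichment_p(20000, 400, 500, 30))
```

Exact integer arithmetic is used on purpose, so the result does not suffer from floating-point underflow in the individual terms. A tool that tests many GO terms at once will also correct the resulting p-values for multiple testing.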
Old 02-26-2013, 08:48 AM   #10
gene_x
Senior Member
 
Location: MO

Join Date: May 2010
Posts: 108

Quote:
Originally Posted by rnavon View Post
Why calculate the enrichment yourself instead of using an external tool?
GOrilla http://cbl-gorilla.cs.technion.ac.il/ for example can calculate the enriched GO terms for you.
It has 2 modes:
1. Rank all your genes (not just your differentially expressed ones) according to differential expression and GOrilla will find GO terms enriched at the top of your list.
2. Compare a target set (e.g. top 500 genes) to a background set (e.g. all your genes).
Thanks for the note. GOrilla is a really catchy name.
Old 02-26-2013, 08:59 AM   #11
mudshark
Senior Member
 
Location: Munich

Join Date: Jan 2009
Posts: 138

Quote:
Originally Posted by rnavon View Post
2. Compare a target set (e.g. top 500 genes) to a background set (e.g. all your genes).
"ALL genes" is probably not a good background set. "ALL expressed genes" should be the universe.
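
To make the effect of the universe concrete, here is a small sketch of how the four counts that go into an enrichment test change once everything is restricted to expressed genes. The expression cutoff (`min_count=10`), the function names, and the toy numbers are assumptions for illustration only; what counts as "expressed" depends on your data and pipeline.

```python
def expressed_genes(mean_counts, min_count=10):
    """Return genes whose mean count reaches a (hypothetical) cutoff."""
    return {gene for gene, count in mean_counts.items() if count >= min_count}


def enrichment_counts(term_genes, target_genes, universe):
    """Return (N, K, n, k) after restricting every set to the universe.

    N: universe size
    K: term-annotated genes inside the universe
    n: target genes inside the universe
    k: target genes inside the universe that are annotated to the term
    """
    universe = set(universe)
    target = set(target_genes) & universe
    annotated = set(term_genes) & universe
    return len(universe), len(annotated), len(target), len(target & annotated)


# Made-up data: g5 and g6 are annotated to the term but not expressed.
TOY_MEAN_COUNTS = {"g1": 250, "g2": 80, "g3": 40, "g4": 15, "g5": 0, "g6": 2}
TOY_TERM_GENES = {"g1", "g5", "g6"}
TOY_TARGET = {"g1", "g2"}

if __name__ == "__main__":
    all_genes = set(TOY_MEAN_COUNTS)
    print(enrichment_counts(TOY_TERM_GENES, TOY_TARGET, all_genes))
    print(enrichment_counts(TOY_TERM_GENES, TOY_TARGET,
                            expressed_genes(TOY_MEAN_COUNTS)))
```

With all genes as the universe the term has three annotated genes out of six; with only expressed genes it has one out of four, so the same overlap is judged against a different expectation.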
Old 02-26-2013, 10:57 AM   #12
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

"ALL expressed genes" should be the universe.
Is that legit? I mean, of course, with respect to multiple-testing correction (FDR or Bonferroni)?
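
For reference, the two corrections named here can be sketched in a few lines of Python. This assumes one p-value per GO term tested, so the number of tests is the number of terms, not the number of genes. The function names and the example p-values are made up, and the sketch does not settle whether a particular choice of universe is appropriate.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy_pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up values
    print(bonferroni(toy_pvalues))
    print(benjamini_hochberg(toy_pvalues))
```

Both functions return the adjusted values in the same order as the input, so they can be zipped back onto the list of GO terms.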
Old 02-26-2013, 11:36 AM   #13
rnavon
Junior Member
 
Location: Israel

Join Date: Feb 2013
Posts: 6

Quote:
Originally Posted by mudshark View Post
"ALL genes" is probably not a good background set. "ALL expressed genes" should be the universe.
If a gene is not expressed in both groups, doesn't that mean it is measured as not differentially expressed between the two groups, and should therefore be in the background set?
Old 02-26-2013, 02:29 PM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Quote:
Originally Posted by rnavon View Post
If a gene is not expressed in both groups, doesn't that mean it is measured as not differentially expressed between the two groups, and should therefore be in the background set?
Not necessarily. That will depend on the number of biological replicates, the degree of expression in the group where the gene is found, how variance is calculated, and the particulars of the test used for differential expression analysis.