SEQanswers

Old 06-16-2014, 05:49 PM   #1
millerma1
Junior Member
 
Location: USA

Join Date: Jun 2014
Posts: 6
Arrow Directly analyze GO numbers

I want to perform a gene ontology analysis on a list of significant genes using the GO numbers directly and not the sequences/gene IDs.

I started using Blast2GO, but the program first runs all of your sequences through NCBI BLAST, which is a very time-consuming process. This seemed unnecessary to me because I already know my gene IDs. I decided to pull GO numbers directly from an online database, and achieved this in 1 day as opposed to 7+...

I now have gene ontology information (GO:0005515, GO:0009540, etc.) for several thousand genes but cannot find a tool to analyze the meaning & distribution of these numbers directly. It shouldn't be too hard, because all I have really done is skip the first, and most time-intensive, part of the Blast2GO process.
Something that could provide graphical output would obviously be ideal but it's not absolutely necessary.

Any help is much appreciated!
Old 06-16-2014, 11:53 PM   #2
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

There are a couple of tools available, but the best (in my opinion) are:
- GOrilla for human, mouse, rat, C. elegans (http://cbl-gorilla.cs.technion.ac.il/)
- AgriGO for all plant stuff (http://bioinfo.cau.edu.cn/agriGO/analysis.php)
- GOeast for all non-model stuff (http://omicslab.genetics.ac.cn/GOEAS...microarray.php)

All of them are easy to handle and produce nice visual outputs. With AgriGO and GOeast you may also create your own database, so you're not restricted to the available datasets (if this is relevant for you).
Old 06-17-2014, 02:12 PM   #3
millerma1
Junior Member
 
Location: USA

Join Date: Jun 2014
Posts: 6
Arrow

Thanks, I am working with tomato so I used agriGO and the results seem good, but I'm not quite there yet.

Do you have experience with agriGO? I would be interested to hear how you have used it in the past because I am having a little bit of trouble interpreting the output. I like that this tool provides very specific GO annotations but the bar graphs that it gives leave a lot to be desired. The stats that it gives seem a little strange to me too...

I also used WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl), which, despite providing pretty-looking graphs, has disappointed me so far. It only provides very vague categories and doesn't really allow for much customization of the x-axis (even though they claim that it does).

Thanks in advance to anyone who can help me out or suggest a tool to use.
Old 06-18-2014, 03:18 AM   #4
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Concerning the settings:
I have used AgriGO for analyzing Arabidopsis and fungal microarray data.
For Arabidopsis I ran the Parametric Analysis of Gene Set Enrichment (PAGE) with "Hochberg (FDR)" for adjustment. For Singular Enrichment Analysis (SEA) I had to use a customized annotation reference, as I designed the chip based on the TAIR10 release, which is not available yet. For the statistical methods I used "hypergeometric" and "Hochberg (FDR)", but this may be different for you depending on the size of your dataset.
For the fungal data I only ran SEA with a custom annotation reference and the same settings as for Arabidopsis.
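
For readers who want to see what an FDR adjustment does to a list of raw p-values, here is a minimal Python sketch. It assumes that the "Hochberg (FDR)" option corresponds to the Benjamini-Hochberg procedure; that is an assumption, not something confirmed in this thread, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms (not real agriGO output).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw p-value is scaled by (number of tests / rank), which is why the appropriate settings can depend on how many GO terms are tested.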

Concerning the output:
I never used the bar chart as I like the other one better. The stats displayed in the individual boxes are as follows (I just use data from the provided example for SEA):
Quote:
GO:0050896(1.52e-05) <- This is your FDR-corrected p-value. If it is below your set threshold (default: 0.05) the box will be coloured. The lower the value, the higher the significance (box becoming more red). These values should be equal to the values in the "Detail information" table
response to stimulus <- the GO name
49/168 | 3107/22479 <- the two values to the right of the forward slashes "/" are the number of genes in your input (168) and the number of genes in the background reference (22479 <- all A. thaliana genes). These values never change between boxes. The two values on the left are the number of genes annotated with this specific GO term (in this case "response to stimulus") in your input (49) and in the reference (3107).
I hope this helps. If you have other questions, please provide some example data and use it to explain what exactly "seems a little strange".
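
To make the four counts in a box more concrete, the following Python sketch computes a raw hypergeometric enrichment p-value from them. It is only an illustration of the "hypergeometric" setting mentioned above: the function name is made up, and the result is the uncorrected p-value, so it is not expected to equal the FDR-corrected value shown in the box.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the ways of seeing k, k+1, ... annotated genes among the n drawn.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / total


# Counts from the example box: 49/168 in the input, 3107/22479 in the reference.
p_raw = hypergeom_upper_tail(49, 168, 3107, 22479)
print(f"raw (uncorrected) p-value: {p_raw:.3g}")
```

About 23 annotated genes would be expected by chance (168 * 3107 / 22479), so observing 49 gives a small p-value before any multiple-testing correction.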
Old 06-20-2014, 10:55 AM   #5
cacti
Member
 
Location: Massachusetts

Join Date: Jan 2014
Posts: 12
Default All genes under a GO Term

I have a similar issue... I have some specific GO categories in mind and I would like to get a list of my genes that match those categories. I have about 35,000 expressed genes (in the form of Entrez Gene ID numbers). Any simple solutions for mapping these genes against a single or small set of selected GO terms (i.e. which genes are involved in GO:0006950 response to stress and GO:0007568 aging)?
Old 06-23-2014, 01:08 AM   #6
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

@cacti:
Where are these 35k genes stored and how do you access them? Because if they are in the NCBI database and you can select them via the NCBI search bar, you could simply add a GO-name field (e.g. "response to stress"[GO]) to your search.
If you have a txt file of your genes with associated GO numbers/names, a simple "grep" (assuming you're familiar with Linux command-line programs) would be the fastest solution.
Otherwise please provide an example of your desired input and output first.
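
As an alternative to grep, the same lookup can be sketched in Python. The file layout here (one gene per line, a tab, then comma-separated GO IDs) is an assumption for illustration, as are the gene names; adjust the parsing to whatever your text file actually looks like.

```python
def genes_for_go_terms(lines, go_ids):
    """Return {GO ID: [gene IDs]} for lines of the form 'gene<TAB>GO:..,GO:..'."""
    hits = {go: [] for go in go_ids}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gene, _, go_field = line.partition("\t")
        gene_terms = {term.strip() for term in go_field.split(",")}
        for go in go_ids:
            if go in gene_terms:
                hits[go].append(gene)
    return hits


# Hypothetical annotation lines; a real file would be read with open(path).
example = [
    "gene1\tGO:0006950,GO:0005515",
    "gene2\tGO:0007568",
    "gene3\tGO:0005515",
]
result = genes_for_go_terms(example, ["GO:0006950", "GO:0007568"])
print(result)
```

Unlike a plain grep, this matches whole GO IDs and groups the genes per term.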
Old 06-23-2014, 08:04 AM   #7
cacti
Member
 
Location: Massachusetts

Join Date: Jan 2014
Posts: 12
Default

@WhatsOEver:

Thanks... they are in a text file of results parsed from a blastx against the NCBI database; I then used a mapping file to get Entrez Gene IDs from the associated gi numbers. My text file has columns for:
(1) contig # from de novo transcriptome assembly
(2) Entrez Gene ID
(3) annotation (i.e. sodium channel, actin-binding protein, etc)
(4) raw reads, etc

I don't have GO terms for each gene. I used GOseq to find overrepresented GO categories, but since this is a non-model species with no published genome, I couldn't figure out a good way to reverse map to find which of my genes are in each category.

So now I have some GO categories of interest and I want to find which of my genes are involved.
Old 06-24-2014, 08:44 AM   #8
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

sry, but if you used GOseq you had to provide GO mappings to the method?!

Quote:
Originally Posted by GOseq Manual
goseq obtains length data from UCSC and GO mappings from the organism packages (see getgo and getlength for details). If your data is in an unsupported format you will need to obtain the GO category mapping and supply them to the goseq function using the gene2cat argument.
How did you calculate overrepresentation without a GO mapping?
Old 06-24-2014, 12:22 PM   #9
cacti
Member
 
Location: Massachusetts

Join Date: Jan 2014
Posts: 12
Default

I supplied my length data and used the database for the most closely related organism. It gave me some interesting leads for biological processes to look into... now I want to work with my whole set of genes.
Old 06-25-2014, 12:18 AM   #10
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Ah, OK, so you're actually working with the annotations of a reference - and your Entrez Gene IDs are those of the "closest relative"?!

There are then 2 possibilities:
1) (the easy way): Here is a link to a post in the bioconductor forum (https://stat.ethz.ch/pipermail/bioco...er/041019.html) which I used some time ago to do what you want (I was, however, working with an organism from the species package). What you are looking for in particular is the reversemapping function:
Quote:
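# gene -> GO terms; 'hg19' and 'ensGene' are the genome and gene ID type, replace with your own
genes2go=getgo(names(YourGeneData),'hg19','ensGene')
# GO term -> genes; reversemapping is an internal goseq helper, hence the ':::'
go2genes=goseq:::reversemapping(genes2go)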
2) (the more accurate way): Although option 1 will get you where you want, I would suggest running a complete analysis of your gene set using Blast2GO (http://www.blast2go.com/b2ghome). You just need your genes in fasta format. Within the program you then perform (a) blastx vs ncbi, (b) go mapping, (c) go annotation, (d) interpro scan, (e) merging of interpro go's to existing ones. The drawback of the method is that it may take you up to 2 weeks to finish everything with 35k genes. It is possible to speed it up by splitting your data and running multiple Blast2GO instances in parallel (the individual Blast2GO projects can afterwards be merged in the program). You should, however, not be too greedy, because if the blast server gets too many requests from the same IP you may be blocked for some time.

The reason I favour 2 over 1 is that you're so far only working on proteins which have homologs in your relative. Running a complete analysis of your genes would give you a more complete list.
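
For anyone working outside R, the idea behind the reverse mapping in option 1 can be sketched in a few lines of Python. This is not the goseq implementation, just an illustration of inverting a gene-to-GO dictionary; the gene names are hypothetical.

```python
from collections import defaultdict


def reverse_mapping(genes2go):
    """Invert {gene: [GO IDs]} into {GO ID: [genes]}."""
    go2genes = defaultdict(list)
    for gene, go_ids in genes2go.items():
        # Genes without any annotation may be stored as None; skip them.
        for go_id in go_ids or []:
            go2genes[go_id].append(gene)
    return dict(go2genes)


# Hypothetical gene -> GO mapping.
genes2go = {
    "geneA": ["GO:0006950", "GO:0007568"],
    "geneB": ["GO:0006950"],
    "geneC": None,
}
go2genes = reverse_mapping(genes2go)
print(go2genes["GO:0006950"])
```

Once inverted, looking up all genes for a category of interest is a single dictionary access.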
Old 07-20-2015, 11:50 PM   #11
kurban910
Member
 
Location: urumqi

Join Date: Jul 2014
Posts: 58
Default

Hello to all and @WhatsOEver,
The sample I have been working with is also a non-model species. After finding differentially expressed unigenes in its transcriptome data, I used Blast2GO and did exactly what @WhatsOEver described above [(a) blastx vs ncbi, (b) go mapping, (c) go annotation, (d) interpro scan, (e) merging of interpro GOs]. Then I went into the Blast2GO charts menu to get results such as which biological processes are most enriched for the up-regulated transcripts. What I want to know is which transcripts/unigenes (by their IDs in the fasta file) go into which biological process. How can I do that? Any advice?
Old 07-22-2015, 11:09 AM   #12
cacti
Member
 
Location: Massachusetts

Join Date: Jan 2014
Posts: 12
Default

It sounds like you're looking to reverse map from your GO category to the list of your genes that are in that category. How did you do your GO mapping initially? Can you just do a grep search to pull out your GO categories (and the associated genes) of interest?

If you don't have a category mapping file BUT you do have a common gene ID (like Ensembl, or Entrez gene ID), you can make one by using a flat file from NCBI (and merge functions in R or a simple python script) to link genes to their GO categories. And then do the search to pull out only your categories of interest.

HTH
Old 07-22-2015, 11:38 AM   #13
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Note that there are a couple of large flat files, if you are programmatically inclined.

You can use wget and gunzip to get "gene2go" and "gene_info" from NCBI.

wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
gunzip gene2go.gz gene_info.gz

head -1 gene2go
#Format: tax_id GeneID GO_ID Evidence Qualifier GO_term PubMed Category (tab is used as a separator, pound sign - start of a comment)
head -1 gene_info
#Format: tax_id GeneID Symbol LocusTag Synonyms dbXrefs chromosome map_location description type_of_gene Symbol_from_nomenclature_authority Full_name_from_nomenclature_authority Nomenclature_status Other_designations Modification_date (tab is used as a separator, pound sign - start of a comment)
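
Given the gene2go column layout above, a small Python sketch for pulling out your own genes under selected GO IDs could look like this. The function name, the sample rows and the gene IDs are made up for illustration; a real run would pass the opened gene2go file instead of the in-memory sample.

```python
import csv
import io


def go_hits(gene2go_handle, my_gene_ids, go_ids):
    """Collect my genes that are annotated with the selected GO IDs."""
    hits = {go: set() for go in go_ids}
    reader = csv.reader(gene2go_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip the '#Format' header line and blank lines
        gene_id, go_id = row[1], row[2]  # columns: tax_id, GeneID, GO_ID, ...
        if gene_id in my_gene_ids and go_id in hits:
            hits[go_id].add(gene_id)
    return hits


# Hypothetical rows in the gene2go layout; for the real file use
# open("gene2go", newline="") instead of this in-memory sample.
sample = io.StringIO(
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1111\tGO:0006950\tIEA\t-\tresponse to stress\t-\tProcess\n"
    "9606\t2222\tGO:0007568\tIEA\t-\taging\t-\tProcess\n"
    "9606\t3333\tGO:0006950\tIEA\t-\tresponse to stress\t-\tProcess\n"
)
found = go_hits(sample, {"1111", "2222"}, ["GO:0006950", "GO:0007568"])
print(found)
```

This only matches rows that name the GO ID explicitly; genes annotated solely with more specific child terms would need the ontology graph as well.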
Old 08-03-2015, 02:13 AM   #14
kurban910
Member
 
Location: urumqi

Join Date: Jul 2014
Posts: 58
Default

Hi @cacti and @Richard Finney,
What I have is only multiple transcript sequences (which came from an RNA-seq assembly), so I used the Blast2GO software: I blasted them against NCBI nr, ran the InterPro scan (which assigns the sequences to their corresponding GO terms based on their domains), and merged the GOs. I did all this on a Win7 machine and I only get the graphical results of the GO enrichment, but I want to know which transcripts go into which biological process. Here I only have sequences, no GO mapping and no Ensembl or Entrez gene IDs. Could you please give me some advice on how I could do that?

Last edited by kurban910; 08-03-2015 at 05:04 AM.
Old 08-03-2015, 11:18 AM   #15
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Is there a manual for Blast2GO?

Have you tried the command line version:
Old 09-23-2015, 07:59 AM   #16
sgoetz
Member
 
Location: Spain

Join Date: May 2010
Posts: 11
Default

If you want to know which sequences (transcripts/unigenes/proteins) belong to which biological function/gene ontology term with Blast2GO, you can use the "Combined Graph Function" and then export the graph as a .txt file (from the side panel). This gives you a spreadsheet with a list of all sequence IDs for each GO term (one GO per row) for all GO levels and categories. This means direct and indirect functional annotations for MF, CC and BP.

Last edited by sgoetz; 09-23-2015 at 08:08 AM.
Old 09-23-2015, 08:04 AM   #17
sgoetz
Member
 
Location: Spain

Join Date: May 2010
Posts: 11
Default

A Blast2GO User Manual can be found here: https://www.blast2go.com/images/b2g_...ser_manual.pdf
Old 12-27-2019, 07:01 PM   #18
jsong02
Junior Member
 
Location: Virginia

Join Date: Mar 2018
Posts: 1
Default

I have the same question five years later! Can you let me know how you approached the problem in the end? Thanks!

Tags
blast2go, gene ontology, go analysis


All times are GMT -8.

