Greetings. I am looking to do a GO-term enrichment analysis on a set of genes from an annotated genome whose GO terms are not loaded into the standard libraries. I have an EMBL-formatted file with the GO terms for each locus; I just need an idea of how to do this "enrichment." All advice and ideas are appreciated.
-
Hi Genomics101,
I'm not sure I understand your problem. Is it that you don't know how to perform the enrichment analysis, or that you don't know how to convert your data into a format that existing programs can use to calculate GO enrichment?
For those cases in which no pre-formatted database is available, I always use GOEAST. http://omicslab.genetics.ac.cn/GOEAS...microarray.php
It states that it is designed for microarray data, but since no expression data needs to be supplied, it works for other purposes as well. The only tricky part may be converting your GO table to their required format (look here: http://omicslab.genetics.ac.cn/GOEAS...e/example1.txt). If this IS your actual problem, please give an example line of your EMBL-formatted GOs and I will tell you how to convert it.
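In case the question is about the calculation itself: over-representation of a GO term is commonly tested with a hypergeometric test (equivalent to a one-sided Fisher's exact test), comparing how often the term occurs in your gene set with how often it occurs in the whole genome. Below is a minimal sketch in awk, not what any particular tool does internally. The function name `hypergeom_p` and its argument order are my own choices for illustration.

```shell
# hypergeom_p N K n k   (hypothetical helper, for illustration only)
#   N = annotated genes in the background (whole genome)
#   K = background genes carrying the GO term
#   n = genes in the study set
#   k = study-set genes carrying the GO term
# Prints P(X >= k), the one-sided over-representation p-value.
hypergeom_p() {
    awk -v N="$1" -v K="$2" -v n="$3" -v k="$4" '
        # log of the binomial coefficient C(a, b), from a table of log factorials
        function lchoose(a, b) { return lfact[a] - lfact[b] - lfact[a - b] }
        BEGIN {
            # force numeric interpretation of the command-line values
            N += 0; K += 0; n += 0; k += 0
            lfact[0] = 0
            for (i = 1; i <= N; i++) lfact[i] = lfact[i - 1] + log(i)
            upper = (n < K) ? n : K
            p = 0
            for (i = k; i <= upper; i++) {
                # skip draws that need more non-term genes than exist
                if (n - i > N - K) continue
                p += exp(lchoose(K, i) + lchoose(N - K, n - i) - lchoose(N, n))
            }
            printf "%.6g\n", p
        }
    '
}
```

With made-up counts, `hypergeom_p 5000 40 200 8` would give the p-value for seeing at least 8 genes with the term in a set of 200, when 40 of 5000 background genes carry it. As one such test is run per GO term, the resulting p-values still need a multiple-testing correction.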
-
@WhatsOEver
Thanks! Here is an example of a record for one CDS:
Code:
FT   CDS             110938..112365
FT                   /colour=7
FT                   /ortholog="PFIT_0602100 PFIT_0602100;
FT                   cluster_name=Plasmodium:ORTHOMCL397; program=OrthoMCL;
FT                   rank=0"
FT                   /GO="aspect=F;GOid=GO:0017111;term=nucleoside-triphosphatase
FT                   activity;date=20111226;evidence=IEA;autocomment=From
FT                   iprscan"
FT                   /product="bcs1-like protein, putative"
FT                   /locus_tag="PYYM_0103000"
-
@Wallysb01: I'm not really familiar with DAVID, but from a quick look at it, you would also have to supply a "background" file (containing all GOs for all genes) for non-standard genomes such as the one Genomics101 has, wouldn't you?
@Genomics: As I don't know what you would like to use as the identifier, I suggest the following:
1) Get a list of identifiers:
awk -F\" '/locus_tag/{print $2}' ./yourEsembleGOfile > outputFile1.tdt
In your example this will output PYYM_0103000. If you want to use something else and you're not familiar with awk, just write again.
2) Get a list of GOs in the GOEAST required format:
grep -iE "(locus_tag|GOid)" ./yourEsembleGOfile | awk '{if($0!~"locus_tag"){printf "%s", " // "substr($0, match($0, "GOid")+5, 10)}else{print ""}}' | cut -d "/" -f 3- > ./outputFile2.tdt
This will generate a list of GOs, separated by "//" if more than one GO is listed per CDS.
3) You just need to copy the GO list (2) next to the identifier column (1). awk would work here too, but as it's a one-time job, anything else will also do (Excel or equivalent).
In my limited example everything worked fine, but if you have problems, just ask.
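If you would rather do all three steps in one go, something like the following should work. It is only a sketch: the function name `embl_to_go_table` is made up, and it assumes the standard EMBL layout (feature key starting in column 6, i.e. "FT" plus three spaces), one /locus_tag per CDS, and each GOid=GO:... token sitting on a single line, as in the example record.

```shell
# embl_to_go_table FILE   (hypothetical helper, for illustration only)
# Prints one line per CDS: locus_tag <TAB> GO IDs joined by " // ".
embl_to_go_table() {
    awk '
        # Print the CDS collected so far (if any) and reset the buffers.
        function flush() {
            if (in_cds && tag != "") print tag "\t" gos
            tag = ""; gos = ""
        }
        # A feature key in column 6 starts a new feature.
        /^FT   [^ ]/ { flush(); in_cds = ($2 == "CDS"); next }
        # Collect every GO ID found on a line inside a CDS feature.
        in_cds && /GOid=GO:/ {
            line = $0
            while (match(line, /GOid=GO:[0-9]+/)) {
                id = substr(line, RSTART + 5, RLENGTH - 5)
                gos = (gos == "" ? id : gos " // " id)
                line = substr(line, RSTART + RLENGTH)
            }
        }
        # The identifier is the text between the double quotes.
        in_cds && /\/locus_tag="/ {
            split($0, parts, "\"")
            tag = parts[2]
        }
        END { flush() }
    ' "$1"
}
```

Call it as `embl_to_go_table ./yourEsembleGOfile > combined.tdt` (the output name is arbitrary). Unlike the two-file approach, this does not depend on /locus_tag coming after the /GO qualifiers within a CDS. Check the GOEAST example file for the exact column separator it expects before uploading.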