**WhatsOEver** · 04-25-2014, 04:56 AM

Hi Genomics101,
I'm not sure I understand your problem. Is it that you don't know how to perform the enrichment analysis, or that you don't know how to convert your data into a format that existing GO enrichment programs can read?

For cases where no pre-formatted database is available, I always use GOEAST: http://omicslab.genetics.ac.cn/GOEAS...microarray.php
It states that it is designed for microarray data, but since no expression data needs to be supplied, it also works for other purposes. The only tricky part may be converting your GO table to their format requirements (see http://omicslab.genetics.ac.cn/GOEAS...e/example1.txt). If this IS your actual problem, please post an example line of your EMBL-formatted GOs and I will tell you how to convert it.

**Genomics101** · 04-25-2014, 07:49 AM

@WhatsOEver

Thanks! Here is an example of a record for one CDS:

Code:

FT   CDS             110938..112365
FT                   /colour=7
FT                   /ortholog="PFIT_0602100 PFIT_0602100;
FT                   cluster_name=Plasmodium:ORTHOMCL397; program=OrthoMCL;
FT                   rank=0"
FT                   /GO="aspect=F;GOid=GO:0017111;term=nucleoside-triphosphatase
FT                   activity;date=20111226;evidence=IEA;autocomment=From
FT                   iprscan"
FT                   /product="bcs1-like protein, putative"
FT                   /locus_tag="PYYM_0103000"

**Wallysb01** · 04-25-2014, 08:16 AM

If you have gene short names, why not just feed those into DAVID?

**WhatsOEver** · 04-28-2014, 08:04 AM

@Wallysb01: I'm not really familiar with DAVID, but from a quick glance it looks like you would also have to supply a kind of "background" file (containing all GOs for all genes) for "non-standard" genomes (which Genomics101 said he has), wouldn't you?

@Genomics101: As I don't know what you would like to use as the identifier, I suggest the following:

1) Get a list of identifiers:
awk -F\" '/locus_tag/{print $2}' ./yourEsembleGOfile > outputFile1.tdt

In your example this will output PYYM_0103000. If you want to use something else and you're not familiar with awk, just ask again.

2) Get a list of GOs in the GOEAST required format:
grep -iE "(locus_tag|GOid)" ./yourEsembleGOfile | awk '{if($0!~"locus_tag"){printf "%s", " // "substr($0, match($0, "GOid")+5, 10)}else{print ""}}' | cut -d "/" -f 3- > ./outputFile2.tdt

This generates the list of GOs, separated by "//" when more than one GO is listed per CDS.

3) Copy the GO list (2) next to the identifier column (1). awk would work here too, but since it's a one-time job, anything else will do (Excel or equivalent).
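
For step 3 on the command line, `paste` is probably the simplest option. The sketch below reruns steps 1 and 2 on the example record from this thread and then joins the two files line by line. The output name goeast_input.tdt is made up for illustration, and the tab between the two columns is an assumption; check it against the GOEAST example file before uploading.

```shell
# Scratch copy of the example record posted earlier in this thread
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > yourEsembleGOfile <<'EOF'
FT   CDS             110938..112365
FT                   /colour=7
FT                   /ortholog="PFIT_0602100 PFIT_0602100;
FT                   cluster_name=Plasmodium:ORTHOMCL397; program=OrthoMCL;
FT                   rank=0"
FT                   /GO="aspect=F;GOid=GO:0017111;term=nucleoside-triphosphatase
FT                   activity;date=20111226;evidence=IEA;autocomment=From
FT                   iprscan"
FT                   /product="bcs1-like protein, putative"
FT                   /locus_tag="PYYM_0103000"
EOF

# Steps 1 and 2, exactly as written above
awk -F\" '/locus_tag/{print $2}' ./yourEsembleGOfile > outputFile1.tdt
grep -iE "(locus_tag|GOid)" ./yourEsembleGOfile | awk '{if($0!~"locus_tag"){printf "%s", " // "substr($0, match($0, "GOid")+5, 10)}else{print ""}}' | cut -d "/" -f 3- > ./outputFile2.tdt

# Step 3: join the two files line by line with a tab, then trim the
# leading space that step 2 leaves in front of the first GO id
# (goeast_input.tdt is a hypothetical output name)
paste outputFile1.tdt outputFile2.tdt \
  | awk -F'\t' -v OFS='\t' '{sub(/^ +/, "", $2); print}' > goeast_input.tdt
cat goeast_input.tdt
```

The rows only line up if every CDS has exactly one locus_tag and its GO lines come before it, as in the example record.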

In my limited example everything worked fine, but if you have problems, just ask.
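
If you would rather not juggle intermediate files, the same table can probably be built in a single awk pass. This is only a sketch under the same assumptions as before: each CDS has one locus_tag, and its /GO lines appear before it. The second CDS in the test input (EXAMPLE_0000001 with two GO lines) is made up to show the "//" separator and is not from the real file.

```shell
# Scratch input: the record from this thread plus one made-up CDS
# (EXAMPLE_0000001 and its two GO lines are hypothetical test data)
embl=$(mktemp)
cat > "$embl" <<'EOF'
FT   CDS             110938..112365
FT                   /GO="aspect=F;GOid=GO:0017111;term=nucleoside-triphosphatase
FT                   activity;date=20111226;evidence=IEA;autocomment=From
FT                   iprscan"
FT                   /product="bcs1-like protein, putative"
FT                   /locus_tag="PYYM_0103000"
FT   CDS             1..99
FT                   /GO="aspect=F;GOid=GO:0005524;term=ATP binding"
FT                   /GO="aspect=F;GOid=GO:0016887;term=ATPase activity"
FT                   /locus_tag="EXAMPLE_0000001"
EOF

# Collect every GOid seen since the previous locus_tag, then print
# "locus_tag<TAB>GO // GO ..." once the locus_tag line is reached
table=$(awk '
    match($0, /GOid=GO:[0-9]+/) {
        id = substr($0, RSTART + 5, RLENGTH - 5)   # drop the "GOid=" prefix
        gos = (gos == "" ? id : gos " // " id)
    }
    /\/locus_tag="/ {
        split($0, q, "\"")                         # q[2] is the quoted value
        print q[2] "\t" gos
        gos = ""
    }
' "$embl")
printf '%s\n' "$table"
```

A CDS without any GO annotation still gets a row, just with an empty second column, so you can spot and drop those lines afterwards if the tool complains.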


GO term enrichment with non-model organism (I have the GO terms for every locus)
