  • GO term enrichment with non-model organism (I have the GO terms for every locus)

    Greetings. I would like to run a GO-term enrichment analysis on a set of genes from an annotated genome whose GO terms are not loaded into the standard libraries. I have an EMBL-formatted file with the GO terms for each locus; I just need an idea of how to do the "enrichment" itself. All advice and ideas are appreciated.
    Last edited by Genomics101; 04-23-2014, 01:09 PM. Reason: spelling

  • #2
    Hi Genomics101,
    I'm not sure I understand your problem. Is it that you don't know how to perform the enrichment analysis, or that you don't know how to convert your data to a format that existing GO enrichment programs accept?

    When no pre-formatted database is available, I always use GOEAST: http://omicslab.genetics.ac.cn/GOEAS...microarray.php
    It says it is designed for microarray data, but since no expression data needs to be supplied, it works for other purposes as well. The only tricky part may be converting your GO table to their required format (see: http://omicslab.genetics.ac.cn/GOEAS...e/example1.txt). If this IS your actual problem, please post an example line of your EMBL-formatted GOs and I will tell you how to convert it.



    • #3
      @WhatsOEver

      Thanks! Here is an example of a record for one CDS:

      Code:
      FT   CDS             110938..112365
      FT                   /colour=7
      FT                   /ortholog="PFIT_0602100 PFIT_0602100;
      FT                   cluster_name=Plasmodium:ORTHOMCL397; program=OrthoMCL;
      FT                   rank=0"
      FT                   /GO="aspect=F;GOid=GO:0017111;term=nucleoside-triphosphatase
      FT                   activity;date=20111226;evidence=IEA;autocomment=From
      FT                   iprscan"
      FT                   /product="bcs1-like protein, putative"
      FT                   /locus_tag="PYYM_0103000"



      • #4
        If you have gene short names, why not just feed those into DAVID?



        • #5
          @Wallysb01: I'm not really familiar with DAVID, but from a quick look at it, you would also have to supply a "background" file (containing all GOs for all genes) for non-standard genomes (such as the one Genomics described), wouldn't you?


          @Genomics: Since I don't know which identifier you would like to use, I suggest the following:

          1) Get a list of identifiers:
          awk -F\" '/locus_tag/{print $2}' ./yourEsembleGOfile > outputFile1.tdt

          For your example this prints PYYM_0103000. If you want to use a different identifier and you're not familiar with awk, just ask.

          2) Get a list of GOs in the GOEAST required format:
          grep -iE "(locus_tag|GOid)" ./yourEsembleGOfile | awk '{if($0!~"locus_tag"){printf "%s", " // "substr($0, match($0, "GOid")+5, 10)}else{print ""}}' | cut -d "/" -f 3- > ./outputFile2.tdt

          This generates one line of GO IDs per locus_tag; if more than one GO is listed for a CDS, they are separated by "//".

          3) Then just paste the GO list (2) next to the identifier column (1). awk would work here too, but since it's a one-time job, anything else (Excel or an equivalent) will also do.

          Everything worked fine on my limited example, but if you run into problems, just ask.
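
The three steps above could also be folded into a single pass. The following is only a sketch under a few assumptions, not a tested replacement for the commands above: the function name `embl_to_goeast` is made up for illustration; a tab is assumed as the separator between the identifier and the GO column (check this against the GOEAST example file); each `GOid=GO:...` is assumed to sit on one line rather than being wrapped; and features with a locus_tag but no GO term are skipped, which is a choice and not something GOEAST is known to require.

```shell
# Hypothetical helper: print "locus_tag<TAB>GO:a // GO:b" for every feature
# in an EMBL feature table that has both a /locus_tag and at least one GOid.
embl_to_goeast() {
  awk '
    # Write out the feature collected so far, then reset the state.
    function emit() {
      if (tag != "" && gos != "") print tag "\t" gos
      tag = ""; gos = ""
    }
    # A feature key (e.g. CDS) starts right after "FT" plus three spaces;
    # qualifier lines are indented further, so this only matches new features.
    /^FT   [^ ]/ { emit() }
    # Collect every GO id on the current line.
    {
      line = $0
      while (match(line, /GOid=GO:[0-9]+/)) {
        id = substr(line, RSTART + 5, RLENGTH - 5)   # drop the "GOid=" prefix
        gos = (gos == "" ? id : gos " // " id)
        line = substr(line, RSTART + RLENGTH)
      }
    }
    # Remember the identifier: the text between the first pair of quotes.
    /\/locus_tag="/ { split($0, q, "\""); tag = q[2] }
    END { emit() }
  ' "$1"
}
```

It would be called as `embl_to_goeast ./yourEsembleGOfile > goeast_input.txt`, where the output file name is arbitrary. Unlike the grep-based version, it does not depend on the GO lines appearing before the locus_tag line within a feature, because the row is only written when the next feature starts.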
