
  • Trouble with OrthoMCL Clusters

    I recently ran OrthoMCL on more than 100 genomes from the same phylum. When the results finally came back a couple of weeks later, I found that several clusters appeared to share the same function. In one instance, 11 clusters were all likely fructose-2,6-bisphosphatase. I experimented with the inflation value a little, but changing it produced clusters that looked too mixed in function while still leaving several repeated-function clusters. This could clearly be the result of bad annotation, but I wanted to see whether anyone else has had similar problems with OrthoMCL cluster prediction.

    Thanks!
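
One way to put a number on the repeated-function problem described above is to count how many clusters share the same dominant annotation. The sketch below assumes the cluster file uses a `group_id: taxon|gene taxon|gene ...` layout, one group per line; the `annotation` dictionary, the gene IDs, and the function names `parse_groups` and `repeated_functions` are all made up for illustration and are not part of OrthoMCL.

```python
from collections import Counter, defaultdict

def parse_groups(lines):
    """Parse lines of the form 'group_id: taxon|gene taxon|gene ...'."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = members.split()
    return groups

def repeated_functions(groups, annotation):
    """Return {function: [group_ids]} for functions that dominate more than one cluster."""
    by_function = defaultdict(list)
    for group_id, members in groups.items():
        if not members:
            continue
        # Label each cluster with its most common member annotation.
        labels = Counter(annotation.get(m, "unannotated") for m in members)
        top_label, _ = labels.most_common(1)[0]
        by_function[top_label].append(group_id)
    return {f: ids for f, ids in by_function.items() if len(ids) > 1}

# Hypothetical toy input, not real data.
group_lines = [
    "grp1: A|g1 B|g2",
    "grp2: A|g3 C|g4",
    "grp3: B|g5 C|g6",
]
annotation = {
    "A|g1": "fructose-2,6-bisphosphatase",
    "B|g2": "fructose-2,6-bisphosphatase",
    "A|g3": "fructose-2,6-bisphosphatase",
    "C|g4": "fructose-2,6-bisphosphatase",
    "B|g5": "hexokinase",
    "C|g6": "hexokinase",
}

repeats = repeated_functions(parse_groups(group_lines), annotation)
print(repeats)  # {'fructose-2,6-bisphosphatase': ['grp1', 'grp2']}
```

A long list of clusters under one function does not say whether the cause is the clustering or the annotation, but it does show where to look first.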

  • #2
    Well, I will answer my own question. One inherent issue that I believe caused this trouble was my BLAST parameters. I had 100 genomes and allowed only 250 hits because I was concerned about disk space and run time. OrthoMCL actually recommends not limiting this (https://docs.google.com/document/d/1...xqDAMjyP_w/pub).

    I figured that 2.5 orthologs per genome for each gene (250 hits spread across 100 genomes) would be enough, but maybe not. I am currently rerunning the process with a larger number of allowed hits, and hopefully this will fix it.
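
The arithmetic behind that 2.5 figure, and why it may be too tight, can be sketched as follows. The 250-hit cap and the 100 genomes come from the post; the value of 5 homologs per genome is purely a hypothetical example of a gene family with paralogs, and both function names are invented for this illustration.

```python
def average_hits_per_genome(max_hits, n_genomes):
    """Average number of retained hits available per genome under a hit cap."""
    return max_hits / n_genomes

def fraction_of_hits_retained(max_hits, n_genomes, homologs_per_genome):
    """Best-case fraction of a family's potential hits that fit under the cap."""
    potential_hits = n_genomes * homologs_per_genome
    return min(1.0, max_hits / potential_hits)

print(average_hits_per_genome(250, 100))       # 2.5
# Hypothetical family with 5 homologs in every genome:
print(fraction_of_hits_retained(250, 100, 5))  # 0.5
```

Under that assumption, half of the family's hits are dropped before clustering even starts, which could plausibly split one family into several clusters.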



    • #3
      Following up...

      I'm interested to see how this has worked out for you now. Did it solve the problem?



      • #4
        Depending on your research question, it might be a better idea to cluster your proteins based on HMMER searches against Pfam. This can take minutes on a 4-core laptop (with, say, 100 bacterial genomes) versus days on a 100-core cluster (when going with something BLAST-based).
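
As a rough illustration of what Pfam-based clustering could look like, the sketch below groups proteins by their ordered list of domains (their domain architecture). This is only one possible reading of the suggestion above. It assumes the domain hits have already been extracted from the HMMER output into `(protein_id, pfam_id, start)` tuples; the protein IDs, the placeholder family names `PFAM_A` and `PFAM_B`, and the function name `cluster_by_architecture` are hypothetical.

```python
from collections import defaultdict

def cluster_by_architecture(hits):
    """Group proteins that share the same ordered tuple of Pfam domains.

    hits: iterable of (protein_id, pfam_id, start) tuples, one per domain hit.
    """
    domains = defaultdict(list)
    for protein_id, pfam_id, start in hits:
        domains[protein_id].append((start, pfam_id))

    clusters = defaultdict(list)
    for protein_id, doms in domains.items():
        # Order domains by their start position along the protein.
        architecture = tuple(pfam for _, pfam in sorted(doms))
        clusters[architecture].append(protein_id)
    return {arch: sorted(members) for arch, members in clusters.items()}

# Hypothetical toy hits, not real search results.
hits = [
    ("genome1|p1", "PFAM_B", 210),
    ("genome1|p1", "PFAM_A", 15),
    ("genome2|p7", "PFAM_A", 12),
    ("genome2|p7", "PFAM_B", 198),
    ("genome1|p2", "PFAM_A", 30),
]

for architecture, members in cluster_by_architecture(hits).items():
    print(architecture, members)
```

Proteins without any Pfam hit never appear in the input, so they end up unclustered; whether that is acceptable depends on the research question.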
        savetherhino.org

