  • orthology search-Biomart or RBH

    Hi all,

    I have a set of genes of interest (as Ensembl gene IDs) that I want to compare across five closely related species in Ensembl, to find the orthologues of each gene in all of them. I plan to use either reciprocal best BLAST hits (RBH) or Ensembl BioMart. Which is the better method for finding orthologues?

  • #2
    You have not said what species you are working with but NCBI may have some pre-computed results that may be of interest to you: http://www.ncbi.nlm.nih.gov/homologene

    Reciprocal blast searches can locate the orthologs easily since the species are closely related.


    • #3
      Ensembl.

      To say that two sequences are orthologous is to make a statement about their evolutionary history: specifically, that their divergence is due to a speciation event.

      Ensembl's orthologue-finding method (Compara) can establish the existence of such a relationship, within the limits of the supplied data. See http://www.ensembl.org/info/docs/com...gy_method.html for a description of the method.

      Reciprocal best hits cannot, by themselves, establish such a relationship. Although it is typically true that a pair of orthologues must also be reciprocal best hits, the converse is not true. See http://armchairbiology.blogspot.co.u...last-hits.html for a bit more detail.

      It might be worth remembering that just because two sequences are orthologous, it is not necessarily the case that their function is conserved. Orthology is a statement about evolutionary history, not biochemical or biological function.
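To make the reciprocal-best-hit criterion concrete, here is a minimal sketch in Python. It assumes two BLAST result sets in the default 12-column tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject and column 12 the bit score. The function names and the toy gene IDs are made up for illustration, and ranking hits by bit score alone is an assumption; real pipelines usually add E-value and coverage filters.

```python
import csv
import io


def best_hits(blast_tsv):
    """Map each query ID to its highest-bit-score subject (BLAST -outfmt 6 text)."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def _row(query, subject, bitscore):
    # Hypothetical toy record: only columns 1, 2 and 12 are read by the code above.
    return "\t".join([query, subject, "90.0", "100", "0", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Toy data: geneA2 looks like a paralogue whose best hit is geneB1,
# but geneB1's best hit is geneA1, so geneA2 gets no reciprocal partner.
A_VS_B = "\n".join([_row("geneA1", "geneB1", 200), _row("geneA1", "geneB2", 150),
                    _row("geneA2", "geneB1", 180)])
B_VS_A = "\n".join([_row("geneB1", "geneA1", 200), _row("geneB1", "geneA2", 180),
                    _row("geneB2", "geneA1", 150)])

print(reciprocal_best_hits(A_VS_B, B_VS_A))
```

Note that the pair this reports is only a candidate orthologue pair: as said above, passing the reciprocal test does not by itself establish that the divergence was due to speciation.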


      • #4
        You may want to look at the orthoMCL database which also has precomputed orthologs for many species. The software for computing the ortholog/paralog gene relationships between species is freely available if your species are not already in the database.


        • #5
          It's worth noting though that, despite the name, OrthoMCL isn't necessarily the most reliable way to determine putative orthologues. From the OrthoMCL site:

          OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
          i.e. there is MCL-based clustering on the basis of sequence similarity, but not detailed reconstruction of evolutionary relationships.
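To show what "clustering on the basis of sequence similarity" means in practice, here is a toy sketch of the MCL expansion/inflation loop in plain Python. It is not OrthoMCL's code and not the `mcl` program itself: the function name, the default inflation value of 2.0, the self-loop weighting and the way clusters are read off the final matrix are all simplifying assumptions, and OrthoMCL's weight normalisation step is left out.

```python
def mcl_clusters(weights, inflation=2.0, max_iter=100, eps=1e-6):
    """Cluster an undirected weighted graph given as {(node_a, node_b): weight}."""
    nodes = sorted({node for pair in weights for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # Symmetric similarity matrix, plus a self-loop on every node.
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = float(w)
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0

    def normalise(mat):
        # Rescale each column to sum to 1 (column-stochastic matrix).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalise(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalise, favouring strong links.
        inflated = normalise([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < eps:
            break

    # Rows with a non-zero diagonal act as attractors; their non-zero columns form a cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
                for i in range(n) if m[i][i] > eps}
    return sorted(sorted(cluster) for cluster in clusters)


# Toy similarity graph with two unconnected groups of proteins (hypothetical IDs).
toy_weights = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0, ("d", "e"): 1.0}
print(mcl_clusters(toy_weights))
```

Nothing in this loop knows about speciation or duplication events: proteins end up in the same cluster purely because their similarity scores connect them.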

          OrthoMCL was compared with a (clustered) RBH approach in this paper, which notes that

          simple algorithms, like CRBH, may be better ortholog predictors than more complex ones (e.g., ORTHOMCL and MULTIPARANOID) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
          As with any bioinformatic method, it is useful to understand both the method itself and how it answers the well-formed question that you set for it.


          • #6
            Originally posted by LeightonP
            As with any bioinformatic method, it is useful to understand both the method itself and how it answers the well-formed question that you set for it.
            Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs. It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved. That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them. This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone. So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.


            • #7
              Originally posted by SES
              Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs.
              Not so much that it's only useful for detecting single-copy orthologues, but that

              simpler algorithms, like CRBH and CRSD, might be better choices for many downstream evolutionary analyses than more complex ones in cases where the objective is to identify orthogroups [...] the trend of several studies toward using more complex ortholog prediction strategies is not always justified. One of the criteria used in our selection of algorithms was for ones whose orthogroup predictions would be appropriate for use in phylogenetic analyses. Thus, we did not evaluate tree-based or hybrid-based algorithms.
              (from further into the paper)

              Originally posted by SES
              It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved.
              Clearly they mean "all algorithms tested in this study", not "all possible algorithms". If you're interested in gene family evolution, you're probably interested in more than just orthology, anyway.

              Originally posted by SES
              That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them.
              Yes, but once you introduce gene duplication you're looking at a more complicated situation than orthology: immediately there's paralogy, and possibly in-paralogy, neither of which is orthology.

              Regarding the limitations of the yeast-only study, maybe a study that included bacteria, fungi and other eukaryotes would be more convincing? They also explore more than one definition of 'orthology', eliding it into 'functional equivalence'.

              Originally posted by SES
              This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone.
              You pays your money, and you takes your choice, as they say. This is a figure indicating that greater accuracy (and lower FDR) was found in orthologue prediction for RBH than OrthoMCL. I wouldn't be surprised to see a study that showed the opposite, given the importance of dataset choice for benchmarking.

              Also

              we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests.
              So, no easy answers. Ensembl isn't always better than RBH.

              Originally posted by SES
              So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.
              Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.


              • #8
                Originally posted by LeightonP
                Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.
                We seem to agree that investigating orthology and paralogy requires care and attention, and that the question will necessarily determine the path. So there is no need to belabor the obvious fact that no approach is always right, or to lecture on what has been stated numerous times.

                As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice. I'll add that I appreciate you posting the links to those papers, but I don't think arguing for one approach is constructive when it has not been established what the system and specific questions are.


                • #9
                  Originally posted by SES
                  As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice.
                  I'm sure the OP finds your criticism of my writing style most helpful.


                  • #10
                    Originally posted by LeightonP
                    OrthoMCL was compared with a (clustered) RBH approach in this paper
                    How is this approach implemented? It is not clear how repeatable this method is if everyone has to derive their own interpretation when trying to apply the approach.


                    • #11
                      Originally posted by SES
                      How is this approach implemented? It is not clear how repeatable this method is if everyone has to derive their own interpretation when trying to apply the approach.
                      If you mean the cRBH algorithm, it's in the methods section.


                      • #12
                        Originally posted by LeightonP
                        If you mean the cRBH algorithm, it's in the methods section.
                        Obviously, there is an analytical description of the algorithm in the methods section. My question was: how is it implemented? That is, how did the authors carry out the described procedure, and how are others supposed to apply it to their own data?

                        This is just a practical consideration, but it should be taken into account when deciding which method to use in your study. It may not be worthwhile to try to implement a series of algorithms based on one study when other methods, more or less accurate depending on the system and questions, already exist. In this case, the authors found an alternative method for ortholog detection that did make a difference (relative to existing clustering methods), but it may or may not be easy to repeat, and may or may not make a big difference in another application.

                        @LeightonP, I was not trying to criticize your writing style, but did not see the point in a long-winded dissection of previous comments with the same basic information. It appeared to be argumentative in nature, but perhaps I misunderstood. Apologies if I took that incorrectly because I felt like we were saying much the same thing.
