
**GenoMax** · 08-31-2012, 11:36 AM

You have not said what species you are working with but NCBI may have some pre-computed results that may be of interest to you: http://www.ncbi.nlm.nih.gov/homologene

Reciprocal BLAST searches can locate the orthologs easily, since the species are closely related.
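For anyone new to it, here is a minimal sketch of the reciprocal best hit step. It assumes you have already run BLAST in both directions (A vs B and B vs A) with tabular output (`-outfmt 6`), where column 1 is the query, column 2 the subject and column 12 the bit score. The function names and file names are illustrative assumptions, not part of any existing pipeline.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Best subject per query, ranked by bit score (column 12 of -outfmt 6).

    Ties are resolved by keeping the first hit seen.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[List[str]],
                         b_vs_a: Iterable[List[str]]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def read_blast_tab(path: str) -> List[List[str]]:
    """Read a BLAST tabular file, skipping blank and comment lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")
                if row and not row[0].startswith("#")]
```

Usage would be something like `reciprocal_best_hits(read_blast_tab("A_vs_B.tsv"), read_blast_tab("B_vs_A.tsv"))`, with the file names being whatever you called your hypothetical BLAST outputs.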

**LeightonP** · 08-31-2012, 11:52 PM

Ensembl.

To say that two sequences are orthologous is to make a statement about their evolutionary history. Specifically, that their divergence is due to a speciation event.
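To make the definition concrete: if you already have a rooted gene tree whose internal nodes are labelled as speciation or duplication events (for example by some reconciliation step), two genes are orthologous when their last common ancestor is a speciation node. The `Node` class and helper names below are hypothetical, just enough to express that rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        """Attach children to this node and return it, so trees can be built inline."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def path_to_root(node: Node) -> List[Node]:
    path: List[Node] = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def last_common_ancestor(a: Node, b: Node) -> Node:
    above_a = set(path_to_root(a))
    for candidate in path_to_root(b):
        if candidate in above_a:
            return candidate
    raise ValueError("genes are not in the same tree")


def are_orthologous(a: Node, b: Node) -> bool:
    """Orthologues: the two genes last diverged at a speciation event."""
    return last_common_ancestor(a, b).event == "speciation"
```

With a duplication followed by speciation, giving genes A1, B1 (copy 1) and A2, B2 (copy 2), A1/B1 are orthologues but A1/B2 are paralogues. If A2 and B1 were later lost, A1 and B2 could easily come out as reciprocal best hits while still being paralogues under this definition.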

Ensembl's orthologue-finding method (Compara) can establish the existence of such a relationship, within the limits of the supplied data. See http://www.ensembl.org/info/docs/com...gy_method.html for a description of the method.
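Compara builds gene trees and reconciles them with a species tree, which is far beyond a forum snippet. A much simpler, related heuristic for labelling the nodes of a gene tree is the species-overlap rule: if the two child subtrees of a node contain genes from a shared species, call the node a duplication, otherwise a speciation. This sketch is that heuristic, not Compara's implementation; the nested-tuple tree encoding and the `species|gene` leaf names are assumptions.

```python
from typing import FrozenSet, List, Tuple, Union

# A gene tree as nested pairs: a leaf is a "species|gene" string,
# an internal node is a (left, right) tuple. This encoding is an assumption.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    return leaf.split("|", 1)[0]


def label_events(tree: Tree) -> List[Tuple[FrozenSet[str], str]]:
    """Label every internal node with the species-overlap rule.

    Returns (genes below the node, event) pairs in post-order.
    """
    events: List[Tuple[FrozenSet[str], str]] = []

    def walk(node: Tree) -> Tuple[FrozenSet[str], FrozenSet[str]]:
        if isinstance(node, str):
            return frozenset([node]), frozenset([species_of(node)])
        left_genes, left_species = walk(node[0])
        right_genes, right_species = walk(node[1])
        # A species present on both sides is taken as evidence of a duplication
        event = "duplication" if left_species & right_species else "speciation"
        genes = left_genes | right_genes
        events.append((genes, event))
        return genes, left_species | right_species

    walk(tree)
    return events
```

The rule is only as good as the gene tree it is given; a misplaced branch will produce a spurious duplication.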

Reciprocal best hits cannot, by themselves, establish such a relationship. Although a pair of orthologues is typically also a pair of reciprocal best hits, the converse is not true: a reciprocal best hit pair need not be orthologous. See http://armchairbiology.blogspot.co.u...last-hits.html for a bit more detail.

It might be worth remembering that just because two sequences are orthologous, it is not necessarily the case that their function is conserved. Orthology is a statement about evolutionary history, not biochemical or biological function.

**SES** · 09-02-2012, 07:07 AM

You may want to look at the orthoMCL database which also has precomputed orthologs for many species. The software for computing the ortholog/paralog gene relationships between species is freely available if your species are not already in the database.

**LeightonP** · 09-02-2012, 11:38 AM

It's worth noting though that, despite the name, OrthoMCL isn't necessarily the most reliable way to determine putative orthologues. From the OrthoMCL site:

OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
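The quoted description doesn't say how the weights are normalized. One plausible scheme, assumed here and not taken from OrthoMCL's code: use -log10(E-value) as the edge weight, then divide each edge by the mean weight of all edges joining the same pair of species, so that distant species pairs aren't drowned out by close ones. The cap of 180 and the `species|gene` naming are arbitrary assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]


def species_of(gene: str) -> str:
    # Assumed gene naming convention: "<species>|<gene id>"
    return gene.split("|", 1)[0]


def edge_weight(evalue: float, cap: float = 180.0) -> float:
    """-log10(E-value), floored at 0 and capped so that E = 0 stays finite."""
    if evalue <= 0:
        return cap
    return min(cap, max(0.0, -math.log10(evalue)))


def species_pair(edge: Edge) -> FrozenSet[str]:
    return frozenset(species_of(gene) for gene in edge)


def normalize(edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """Divide each weight by the mean weight of all edges joining the same species pair."""
    totals: Dict[FrozenSet[str], float] = defaultdict(float)
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for edge, weight in edges.items():
        totals[species_pair(edge)] += weight
        counts[species_pair(edge)] += 1
    normalized: Dict[Edge, float] = {}
    for edge, weight in edges.items():
        mean = totals[species_pair(edge)] / counts[species_pair(edge)]
        normalized[edge] = weight / mean if mean > 0 else 0.0
    return normalized
```

After this step a human/fly edge with modest raw similarity can end up with a weight comparable to a human/mouse edge, which is the point of the correction.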

That is, OrthoMCL clusters proteins with MCL on the basis of sequence similarity, but does not perform a detailed reconstruction of evolutionary relationships.
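For a feel of what the MCL step does, here is a toy dense version of Markov Clustering: alternate expansion (matrix power, letting flow spread along the graph) and inflation (element-wise power plus column renormalization, strengthening strong flows) until the matrix stops changing, then read clusters off what is left. This is only a sketch for small matrices; the real `mcl` program uses sparse matrices and pruning, and the inflation value of 2.0 here is an arbitrary choice (it controls how fine the clusters are).

```python
from typing import List, Set

import numpy as np


def mcl(adjacency: np.ndarray, inflation: float = 2.0, expansion: int = 2,
        max_iter: int = 100, tol: float = 1e-9, threshold: float = 1e-6) -> List[Set[int]]:
    """Toy Markov Clustering on a small, dense, symmetric, non-negative weight matrix."""
    n = adjacency.shape[0]
    m = adjacency.astype(float) + np.eye(n)  # add self-loops
    m /= m.sum(axis=0, keepdims=True)  # column-stochastic: each column is a random-walk step
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: let flow spread
        m = m ** inflation  # inflation: boost strong flows, starve weak ones
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    # Read clusters as connected components of the entries that survived
    linked = (m > threshold) | (m > threshold).T
    clusters: List[Set[int]] = []
    assigned: Set[int] = set()
    for start in range(n):
        if start in assigned:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(np.nonzero(linked[node])[0].tolist())
        assigned |= component
        clusters.append(component)
    return clusters
```

Two triangles of strongly linked proteins joined by one weak edge come out as two clusters, which is the "split mega-clusters" behaviour described in the quote. Note that nothing here looks at a tree: the input is just pairwise weights.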

OrthoMCL was compared with a (clustered) RBH approach in this paper, which notes that

simple algorithms, like CRBH, may be better ortholog predictors than more complex ones (e.g., ORTHOMCL and MULTIPARANOID) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.

As with any bioinformatic method, it is useful to understand both the method itself, and how it answers the well-formed question that you set for it.
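As a rough illustration of what a "clustered RBH" approach means in practice: take reciprocal best hits from every pair of species, merge them into connected components, and keep only components with exactly one gene per species (the single-copy case the quote is about). This is a generic stand-in assumed for illustration; the paper's own CRBH procedure is described in its methods and may differ. Species identifiers such as `scer` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def species_of(gene: str) -> str:
    # Assumed gene naming convention: "<species>|<gene id>"
    return gene.split("|", 1)[0]


def single_copy_orthogroups(rbh_pairs: Iterable[Tuple[str, str]],
                            n_species: int) -> List[Set[str]]:
    """Merge pairwise RBHs into connected components and keep those with
    exactly one gene from each of the n_species species."""
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for a, b in rbh_pairs:
        parent[find(a)] = find(b)  # union the two components

    components: Dict[str, Set[str]] = defaultdict(set)
    for gene in list(parent):
        components[find(gene)].add(gene)

    groups = []
    for members in components.values():
        species = {species_of(g) for g in members}
        if len(members) == n_species and len(species) == n_species:
            groups.append(members)
    return groups
```

A component that picks up two genes from the same species (e.g. through a recent duplication) is simply discarded, which is one way of seeing why such methods struggle "when paralogy is rampant".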

**SES** · 09-02-2012, 01:03 PM

Originally posted by LeightonP

As with any bioinformatic method, it is useful to understand both the method itself, and how it answers the well-formed question that you set for it

Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs. It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved. That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them. This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone. So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.

**LeightonP** · 09-02-2012, 02:35 PM

Originally posted by SES

Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs.

Not so much that it's only useful for detecting single-copy orthologues, but that

simpler algorithms, like CRBH and CRSD, might be better choices for many downstream evolutionary analyses than more complex ones in cases where the objective is to identify orthogroups [...] the trend of several studies toward using more complex ortholog prediction strategies is not always justified. One of the criteria used in our selection of algorithms was for ones whose orthogroup predictions would be appropriate for use in phylogenetic analyses. Thus, we did not evaluate tree-based or hybrid-based algorithms.

(from further into the paper)
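For CRSD, the underlying RSD idea, as I understand it, is to replace BLAST scores with evolutionary distances estimated from alignments, and then require the pair to be each other's closest match. Estimating the distances is assumed to have been done elsewhere; this sketch only shows the reciprocal selection, with a hypothetical `max_distance` cutoff.

```python
from typing import Dict, Set, Tuple

Distances = Dict[Tuple[str, str], float]  # (gene in A, gene in B) -> evolutionary distance


def reciprocal_smallest_distance(dist: Distances, max_distance: float) -> Set[Tuple[str, str]]:
    """Pairs (a, b) that are each other's closest match, within max_distance."""
    closest_in_b: Dict[str, Tuple[str, float]] = {}
    closest_in_a: Dict[str, Tuple[str, float]] = {}
    for (a, b), d in dist.items():
        if a not in closest_in_b or d < closest_in_b[a][1]:
            closest_in_b[a] = (b, d)
        if b not in closest_in_a or d < closest_in_a[b][1]:
            closest_in_a[b] = (a, d)
    return {
        (a, b)
        for a, (b, d) in closest_in_b.items()
        if d <= max_distance and closest_in_a[b][0] == a
    }
```

Structurally it is the same reciprocal test as RBH; what changes is the measure used to rank candidates.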

Originally posted by SES

It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved.

Clearly they mean "all algorithms tested in this study", not "all possible algorithms". If you're interested in gene family evolution, you're probably interested in more than just orthology, anyway.

Originally posted by SES

That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them.

Yes, but once you introduce gene duplication you're looking at a more complicated situation than orthology - immediately there's paralogy, and possible inparalogy, neither of which is orthology.
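For inparalogs specifically, a simplified InParanoid-style rule (assumed here, without InParanoid's confidence scores or overlap checks): given a seed orthologue pair (a, b), any other gene in species A that is more similar to a than a is to b is treated as an inparalog of a, i.e. the product of a duplication after the speciation. Scores are assumed symmetric; the gene names are hypothetical.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]


def score(scores: Scores, x: str, y: str) -> float:
    # Similarity scores assumed symmetric; a missing pair counts as no similarity
    return scores.get((x, y), scores.get((y, x), 0.0))


def inparalogs(seed_a: str, seed_b: str, genes_a: Set[str], scores: Scores) -> Set[str]:
    """Genes of species A closer to seed_a than seed_a is to its orthologue seed_b in B."""
    cutoff = score(scores, seed_a, seed_b)
    return {g for g in genes_a if g != seed_a and score(scores, seed_a, g) > cutoff}
```

Each gene returned is a paralogue of the seed, not an orthologue of anything in B, even though all of them are co-orthologous to b as a group.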

Regarding the limitations of the yeast-only study, maybe a study that included bacteria, fungi and other eukaryotes would be more convincing?

They also explore more than one definition of 'orthology', eliding it into 'functional equivalence'.

Originally posted by SES

This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone.

You pays your money, and you takes your choice, as they say. This is a figure indicating that greater accuracy (and lower FDR) was found in orthologue prediction for RBH than OrthoMCL. I wouldn't be surprised to see a study that showed the opposite, given the importance of dataset choice for benchmarking.

Also

we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests.

So, no easy answers. Ensembl isn't always better than RBH.
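Accuracy and FDR figures in benchmarks like these come from comparing predicted ortholog pairs with some reference set. A minimal sketch of the usual metrics, treating pairs as unordered; the exact definitions used in any particular paper may differ.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def benchmark(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Dict[str, float]:
    """Precision, FDR and sensitivity of predicted ortholog pairs against a reference set."""
    pred = {frozenset(p) for p in predicted}  # unordered: (a, b) == (b, a)
    ref = {frozenset(p) for p in reference}
    tp = len(pred & ref)
    fp = len(pred - ref)
    fn = len(ref - pred)
    return {
        "precision": tp / (tp + fp) if pred else 0.0,
        "fdr": fp / (tp + fp) if pred else 0.0,
        "sensitivity": tp / (tp + fn) if ref else 0.0,
    }
```

Every predicted pair missing from the reference counts as a false positive, so the numbers are only as trustworthy as the reference set's completeness, which is one reason the choice of benchmark dataset matters so much.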

Originally posted by SES

So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.

Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.

**SES** · 09-02-2012, 04:35 PM

Originally posted by LeightonP

Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.

We seem to agree that investigating orthology and paralogy requires careful attention and that the question will necessarily determine the path. So, there is no need to belabor the obvious fact that no approach is always right, or to lecture on what has been stated numerous times.

As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice. I'll add that I appreciate you posting the links to those papers, but I don't think arguing for one approach is constructive when it has not been established what the system and specific questions are.

**LeightonP** · 09-02-2012, 11:31 PM

Originally posted by SES

As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice.

I'm sure the OP finds your criticism of my writing style most helpful.

**SES** · 09-03-2012, 06:07 AM

Originally posted by LeightonP

OrthoMCL was compared with a (clustered) RBH approach in this paper

How is this approach implemented? It is not clear how repeatable this method is if everyone is to derive their own interpretation when trying to apply the approach.

**LeightonP** · 09-03-2012, 08:36 AM

Originally posted by SES

How is this approach implemented? It is not clear how repeatable this method is if everyone is to derive their own interpretation when trying to apply the approach.

If you mean the cRBH algorithm, it's in the methods section.

**SES** · 09-03-2012, 10:22 AM

Originally posted by LeightonP

If you mean the cRBH algorithm, it's in the methods section.

Obviously, there is an analytical description of the algorithm in the methods section. My question was how is it implemented? Meaning, how did the authors carry out the described procedure and how are others supposed to apply it to their own data?

This is just a practical consideration, but it should be considered when deciding which method you choose to use in your study. It may not be worthwhile to try and implement a series of algorithms based on one study when other methods, more or less accurate depending on the system and questions, already exist. In this case, the authors found an alternative method for ortholog detection that did make a difference (relative to existing clustering methods), but it may or may not be easy to repeat, and may or may not make a big difference in another application.

@LeightonP, I was not trying to criticize your writing style, but did not see the point in a long-winded dissection of previous comments with the same basic information. It appeared to be argumentative in nature, but perhaps I misunderstood. Apologies if I took that incorrectly because I felt like we were saying much the same thing.


orthology search-Biomart or RBH
