SEQanswers




08-31-2012, 12:07 PM #1
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80
orthology search-Biomart or RBH

Hi all,

I have a set of genes of interest (as Ensembl gene IDs) that I want to compare against five closely related species in Ensembl, to find the orthologues of these genes across all of them. I plan to use either reciprocal BLAST (reciprocal best hits, RBH) or Ensembl BioMart. Which is the better method for finding orthologues?
08-31-2012, 12:36 PM #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,091

You have not said what species you are working with but NCBI may have some pre-computed results that may be of interest to you: http://www.ncbi.nlm.nih.gov/homologene

Reciprocal blast searches can locate the orthologs easily since the species are closely related.
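If you go the RBH route, the core step is small once the two all-vs-all BLAST+ searches (species A vs B and B vs A) have been run with tabular output (`-outfmt 6`, where column 1 is the query ID, column 2 the subject ID and column 12 the bit score). A minimal Python sketch, assuming one such file per direction and taking the highest bit score as the best hit (ties go to the first hit seen):

```python
def best_hits(blast_lines):
    """Map each query to its highest-bitscore subject from BLAST+ -outfmt 6 lines."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Strictly greater: on ties the first hit seen is kept
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    a_to_b = best_hits(a_vs_b_lines)
    b_to_a = best_hits(b_vs_a_lines)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Usage (file names are placeholders):
# with open("A_vs_B.tsv") as ab, open("B_vs_A.tsv") as ba:
#     pairs = reciprocal_best_hits(ab, ba)
```

Real pipelines usually also apply e-value or alignment-coverage cut-offs before this step; those thresholds are a choice, not part of the RBH definition itself.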
09-01-2012, 12:52 AM #3
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29

Ensembl.

To say that two sequences are orthologous is to make a statement about their evolutionary history: specifically, that their divergence is due to a speciation event.
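That definition can be made operational on a gene tree: two genes are orthologues if the node where their lineages meet is a speciation, and paralogues if it is a duplication. A minimal Python sketch using the simple species-overlap rule (a node counts as a duplication if any species appears under more than one of its children), assuming leaves are written as `species:gene` strings and internal nodes as tuples. Compara's own procedure is more involved, so treat this as an illustration of the definition, not of Compara:

```python
from itertools import combinations


def leaves(node):
    """All leaf labels under a node; a leaf is a 'species:gene' string."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species_of(leaf):
    return leaf.split(":", 1)[0]


def orthologous_pairs(node, pairs=None):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    child_leaves = [leaves(child) for child in node]
    child_species = [{species_of(leaf) for leaf in group} for group in child_leaves]
    # Species-overlap rule: a species shared by two children => duplication
    is_duplication = any(s1 & s2 for s1, s2 in combinations(child_species, 2))
    if not is_duplication:
        # Speciation node: every cross-child pair of genes is orthologous
        for group1, group2 in combinations(child_leaves, 2):
            for x in group1:
                for y in group2:
                    pairs.add(tuple(sorted((x, y))))
    for child in node:
        orthologous_pairs(child, pairs)
    return pairs


# Hypothetical tree: a duplication (A1/A2) that predates the human-mouse split
tree = (("human:A1", "mouse:A1"), ("human:A2", "mouse:A2"))
# orthologous_pairs(tree) -> {("human:A1", "mouse:A1"), ("human:A2", "mouse:A2")}
```

Note that human:A1 and mouse:A2 are never paired: their lineages meet at the duplication node, so they are paralogues even if their sequences happen to be very similar.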

Ensembl's orthologue-finding method (Compara) can establish the existence of such a relationship, within the limits of the supplied data. See http://www.ensembl.org/info/docs/com...gy_method.html for a description of the method.
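For the set-up in the original question (a gene list checked against five species), one practical way to use Compara's calls is to export the orthologue table from BioMart and keep only genes that have a one-to-one orthologue in every target species. Ensembl labels homology types as `ortholog_one2one`, `ortholog_one2many` and `ortholog_many2many`; the record layout below (gene, target species, orthologue ID, homology type) is an assumption about how you flatten the export, not BioMart's actual column names. A minimal Python sketch:

```python
from collections import defaultdict


def one2one_in_all(records, target_species):
    """Genes with exactly one 'ortholog_one2one' partner in every target species.

    records: iterable of (gene_id, target_species, ortholog_id, homology_type)
    tuples flattened from a BioMart export (layout assumed, see text).
    """
    partners = defaultdict(lambda: defaultdict(set))
    for gene_id, species, ortholog_id, homology_type in records:
        # Keep only one-to-one calls with a non-empty orthologue ID
        if homology_type == "ortholog_one2one" and ortholog_id:
            partners[gene_id][species].add(ortholog_id)
    result = {}
    for gene_id, per_species in partners.items():
        if all(len(per_species.get(sp, ())) == 1 for sp in target_species):
            result[gene_id] = {sp: next(iter(per_species[sp])) for sp in target_species}
    return result
```

Genes dropped by this filter are not necessarily without orthologues; they may simply have one-to-many or many-to-many relationships that need a closer look.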

Reciprocal best hits cannot, by themselves, establish such a relationship. Although a pair of orthologues will typically also be reciprocal best hits, the converse is not true: a reciprocal best hit is not necessarily an orthologue. See http://armchairbiology.blogspot.co.u...last-hits.html for a bit more detail.
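A toy case shows why. Suppose an ancestral gene duplicated into copies X1 and X2 before species A and B split, and then A lost X2 while B lost X1 (differential gene loss). The surviving genes a1 (an X1 copy) and b2 (an X2 copy) are paralogues, yet each is the other's only, and therefore best, hit. A minimal Python sketch with hypothetical bit scores:

```python
def reciprocal_best_hits(scores):
    """scores maps (query, subject) -> bit score, with hits in both directions."""
    best = {}
    for (query, subject), bits in scores.items():
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    pairs = set()
    for query, (subject, _) in best.items():
        # Reciprocal: the subject's own best hit points back at the query
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Hypothetical scores: A retains only a1 (X1 copy), B retains only b2 (X2 copy)
scores = {("A:a1", "B:b2"): 150.0, ("B:b2", "A:a1"): 150.0}
# reciprocal_best_hits(scores) -> [("A:a1", "B:b2")], even though the two
# genes diverged at a duplication, i.e. they are paralogues, not orthologues.
```

Nothing in the similarity scores alone can reveal the lost copies; only the evolutionary history does.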

It might be worth remembering that just because two sequences are orthologous, it is not necessarily the case that their function is conserved. Orthology is a statement about evolutionary history, not biochemical or biological function.
09-02-2012, 08:07 AM #4
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

You may want to look at the orthoMCL database which also has precomputed orthologs for many species. The software for computing the ortholog/paralog gene relationships between species is freely available if your species are not already in the database.
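If you do run OrthoMCL, its final groups file is, as far as I know, one line per group of the form `GROUP_ID: taxon|protein taxon|protein ...` (check your version's output, since the group prefix and taxon codes come from your configuration). A minimal Python sketch, assuming that format, that pulls out single-copy groups with exactly one member from each of your species:

```python
def parse_groups(lines):
    """Parse 'GROUP_ID: taxon|protein taxon|protein ...' lines (format assumed)."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = members.split()
    return groups


def single_copy_groups(groups, taxa):
    """Groups containing exactly one protein from each taxon and no other taxa."""
    taxa = set(taxa)
    selected = {}
    for group_id, members in groups.items():
        counts = {}
        for member in members:
            taxon = member.split("|", 1)[0]
            counts[taxon] = counts.get(taxon, 0) + 1
        if set(counts) == taxa and all(n == 1 for n in counts.values()):
            selected[group_id] = members
    return selected
```

Groups that fail this filter are exactly the ones where paralogy (more than one member per taxon) or gene loss (missing taxa) needs a closer look.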
09-02-2012, 12:38 PM #5
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29

It's worth noting though that, despite the name, OrthoMCL isn't necessarily the most reliable way to determine putative orthologues. From the OrthoMCL site:

Quote:
OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
i.e. there is MCL-based clustering on the basis of sequence similarity, but not detailed reconstruction of evolutionary relationships.
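To make the clustering step less of a black box: MCL simulates random walks on the similarity graph by alternating expansion (a matrix power, which spreads flow) and inflation (an element-wise power followed by column re-normalisation, which strengthens strong links and weakens weak ones) until the matrix stops changing. A minimal NumPy sketch of plain MCL, assuming a symmetric non-negative similarity matrix; it omits OrthoMCL's own weight normalisation and is not the `mcl` program itself:

```python
import numpy as np


def mcl(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-9):
    """Plain Markov clustering on a symmetric, non-negative similarity matrix."""
    m = np.asarray(similarity, dtype=float) + np.eye(len(similarity))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)  # make columns sum to 1 (column-stochastic)
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: let flow spread
        m = m ** inflation  # inflation: boost strong flows, damp weak ones
        m[m < prune] = 0.0  # drop negligible entries
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster
    clusters = []
    for row in m:
        members = tuple(int(i) for i in np.nonzero(row)[0])
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Higher inflation values generally give smaller, tighter clusters, which is one reason results depend on parameter choice as well as on the input similarities.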

OrthoMCL was compared with a (clustered) RBH approach in this paper, which notes that

Quote:
simple algorithms, like CRBH, may be better ortholog predictors than more complex ones (e.g., ORTHOMCL and MULTIPARANOID) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
As with any bioinformatic method, it is useful to understand both the method itself, and how it answers the well-formed question that you set for it.
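On the clustered-RBH idea: one simple way to turn pairwise reciprocal best hits from several species into groups is to treat each RBH pair as an edge and take connected components. A minimal Python sketch using union-find, assuming `species:gene` labels; this illustrates the general approach and is not necessarily the exact CRBH procedure given in the paper's methods section:

```python
def cluster_pairs(pairs):
    """Group genes into connected components of the RBH graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


def is_single_copy(group):
    """True if no species contributes more than one gene to the group."""
    species = [gene.split(":", 1)[0] for gene in group]
    return len(species) == len(set(species))
```

Single-linkage grouping like this can chain unrelated genes together through one spurious hit, so groups that fail `is_single_copy` are worth inspecting rather than accepting as orthogroups.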
09-02-2012, 02:03 PM #6
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

Quote:
Originally Posted by LeightonP
As with any bioinformatic method, it is useful to understand both the method itself, and how it answers the well-formed question that you set for it.
Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs. It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved. That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them. This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone. So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.
09-02-2012, 03:35 PM #7
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29

Quote:
Originally Posted by SES
Yes, and as the introduction of the paper you link to says, that method is useful only for detecting single-copy orthologs.
Not so much that it's only useful for detecting single-copy orthologues, but that

Quote:
simpler algorithms, like CRBH and CRSD, might be better choices for many downstream evolutionary analyses than more complex ones in cases where the objective is to identify orthogroups [...] the trend of several studies toward using more complex ortholog prediction strategies is not always justified. One of the criteria used in our selection of algorithms was for ones whose orthogroup predictions would be appropriate for use in phylogenetic analyses. Thus, we did not evaluate tree-based or hybrid-based algorithms.
(from further into the paper)

Quote:
Originally Posted by SES
It is not clear that this approach will inform you about gene family evolution within a species given that they say "all" algorithms will fail when paralogy is involved.
Clearly they mean "all algorithms tested in this study", not "all possible algorithms". If you're interested in gene family evolution, you're probably interested in more than just orthology, anyway.

Quote:
Originally Posted by SES
That is a bit of an overstatement that may have been included because they did not address this issue. Also, the study focused on six closely related yeast species and I'm unsure the same results would be achieved with more distantly related species with a history of duplication events between them.
Yes, but once you introduce gene duplication you're looking at a more complicated situation than orthology - immediately there's paralogy, and possible inparalogy, neither of which is orthology.
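For anyone wanting to handle in-paralogues explicitly, the rule popularised by InParanoid is a useful starting point: given a cross-species orthologue pair (a, b) with similarity score S, genes in a's own genome that score higher than S against a are treated as in-paralogues of a, i.e. duplicates that arose after the speciation. A minimal Python sketch of just that rule, with a hypothetical score layout; InParanoid itself adds further steps that are not shown here:

```python
def inparalogs(seed, ortholog_score, within_scores):
    """Same-genome genes that score higher against seed than seed's cross-species orthologue.

    within_scores: {gene: {other_gene_in_same_genome: score}} (hypothetical layout).
    """
    hits = within_scores.get(seed, {})
    return sorted(gene for gene, score in hits.items() if gene != seed and score > ortholog_score)
```

For example, with `within_scores = {"a1": {"a1": 500.0, "a2": 300.0, "a3": 120.0}}` and an orthologue score of 200.0, only a2 qualifies: it is closer to a1 than a1 is to its partner in the other species, while a3 is an older (out-)paralogue.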

Regarding the limitations of the yeast-only study, maybe a study that included bacteria, fungi and other eukaryotes would be more convincing? They also explore more than one definition of 'orthology', eliding it into 'functional equivalence'.

Quote:
Originally Posted by SES
This is especially true given that there are other publications demonstrating that methods like orthoMCL are more accurate than using RBH alone.
You pays your money, and you takes your choice, as they say. This is a figure indicating that greater accuracy (and lower FDR) was found in orthologue prediction for RBH than OrthoMCL. I wouldn't be surprised to see a study that showed the opposite, given the importance of dataset choice for benchmarking.
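Since so much hinges on benchmarking, it may help to see what such comparisons usually compute: predicted orthologue pairs are scored against a reference set. A minimal Python sketch of pair-level precision, recall and FDR, assuming pairs are unordered and the reference set is treated as complete, which is exactly where the choice of dataset bites:

```python
def evaluate(predicted, reference):
    """Pair-level precision, recall and false discovery rate (FDR = 1 - precision)."""

    def normalise(pairs):
        # Treat (a, b) and (b, a) as the same unordered pair
        return {tuple(sorted(pair)) for pair in pairs}

    pred, ref = normalise(predicted), normalise(reference)
    tp = len(pred & ref)
    fp = len(pred - ref)
    fn = len(ref - pred)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / (tp + fp) if pred else 0.0,
        "recall": tp / (tp + fn) if ref else 0.0,
        "fdr": fp / (tp + fp) if pred else 0.0,
    }
```

A true orthologue missing from the reference is counted as a false positive here, so an incomplete reference set inflates the FDR of whichever method finds more genuine pairs.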

Also

Quote:
we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests.
So, no easy answers. Ensembl isn't always better than RBH.

Quote:
Originally Posted by SES
So, it is best to consider what is most appropriate for the nature of the species and questions being investigated.
Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.
09-02-2012, 05:35 PM #8
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

Quote:
Originally Posted by LeightonP
Indeed. There are many important factors here: relationships between organisms, nature of the genes/gene family, and whether you really want orthologues, or paralogues, or instead functional equivalents, or if you would be happy with identifying 'family members'.
We seem to agree that investigating orthology and paralogy requires careful attention, and that the question will necessarily determine the path. So there is no need to belabor the obvious fact that no approach is always right, or to lecture on what has been stated numerous times.

As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice. I'll add that I appreciate you posting the links to those papers, but I don't think arguing for one approach is constructive when it has not been established what the system and specific questions are.
09-03-2012, 12:31 AM #9
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29

Quote:
Originally Posted by SES
As a side note: please don't try to get prosy and repost links to the same material in order to prove a point. Of course, you can do that, but it may distract from the point of this thread, which is helping the OP and providing some sound advice.
I'm sure the OP finds your criticism of my writing style most helpful.
09-03-2012, 07:07 AM #10
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

Quote:
Originally Posted by LeightonP
OrthoMCL was compared with a (clustered) RBH approach in this paper
How is this approach implemented? It is not clear how repeatable this method is if everyone is to derive their own interpretation when trying to apply the approach.
09-03-2012, 09:36 AM #11
LeightonP
Member
 
Location: Scotland

Join Date: Feb 2011
Posts: 29

Quote:
Originally Posted by SES
How is this approach implemented? It is not clear how repeatable this method is if everyone is to derive their own interpretation when trying to apply the approach.
If you mean the cRBH algorithm, it's in the methods section.
09-03-2012, 11:22 AM #12
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

Quote:
Originally Posted by LeightonP
If you mean the cRBH algorithm, it's in the methods section.
Obviously, there is an analytical description of the algorithm in the methods section. My question was how is it implemented? Meaning, how did the authors carry out the described procedure and how are others supposed to apply it to their own data?

This is just a practical consideration, but it should be considered when deciding which method you choose to use in your study. It may not be worthwhile to try and implement a series of algorithms based on one study when other methods, more or less accurate depending on the system and questions, already exist. In this case, the authors found an alternative method for ortholog detection that did make a difference (relative to existing clustering methods), but it may or may not be easy to repeat, and may or may not make a big difference in another application.

@LeightonP, I was not trying to criticize your writing style, but did not see the point in a long-winded dissection of previous comments with the same basic information. It appeared to be argumentative in nature, but perhaps I misunderstood. Apologies if I took that incorrectly because I felt like we were saying much the same thing.

Tags
bioinformatics, comparative genomics






All times are GMT -8.

