SEQanswers

Old 08-24-2014, 10:02 PM   #1
zhaopeihua
Member
 
Location: china

Join Date: Aug 2013
Posts: 18
How to measure similarity between the genomes of different species?

Hi,

How can I calculate the genome similarity between humans and other animals?
For example, chimpanzees are said to be 96% genetically similar to humans.
Old 08-25-2014, 05:14 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,031

I am not sure if you are merely interested in a numeric % value (not trivial to calculate) but the two main genome browsers do the following.

UCSC - e.g. 46-way conservation track for Vertebrates

Quote:
This track shows multiple alignments of 46 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track.

PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites).

Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1.

Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data, and both were run with the same parameters for each species set (vertebrates, placental mammals, and primates). Thus, in regions in which only primates appear in the alignment, all three sets of scores will be the same, but in regions in which additional species are available, the mammalian and/or vertebrate scores may differ from the primate scores. The alternative plots help to identify sequences that are under different evolutionary pressures in, say, primates and non-primates, or mammals and non-mammals.
Ensembl uses these methods:

http://www.ensembl.org/info/genome/compara/index.html
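
To make the phyloP part of the quoted track description concrete: the sign of a score gives the direction (positive for conserved, negative for fast-evolving) and the absolute value is a -log p-value. The sketch below assumes the logarithm is base 10, which the quoted text does not state, and `phylop_to_pvalue` is a name made up for this example, not part of the PHAST package.

```python
from typing import Tuple


def phylop_to_pvalue(score: float) -> Tuple[str, float]:
    """Interpret one phyloP score as (direction, p-value).

    Assumption: scores are -log10 p-values under the neutral model.
    """
    if score > 0:
        direction = "conserved"      # slower than expected evolution
    elif score < 0:
        direction = "accelerated"    # faster than expected evolution
    else:
        direction = "neutral"
    # The absolute value carries the significance; the sign only the direction.
    p_value = 10 ** (-abs(score))
    return direction, p_value


# Toy scores, not taken from any real track.
for s in (2.0, -3.0, 0.0):
    print(s, phylop_to_pvalue(s))
```

Note that this only reinterprets per-site scores. It does not turn a conservation track into a single genome-wide similarity percentage.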
Old 08-25-2014, 05:49 AM   #3
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

Quote:
Originally Posted by zhaopeihua View Post
Hi,

How can I calculate the genome similarity between humans and other animals?
For example, chimpanzees are said to be 96% genetically similar to humans.
By one group's overall bulk estimate, yes. Since that number is based on an overall genome alignment, and since there are large tracts of the genome that simply do not have a single unambiguous optimal alignment, anyone else computing a single overall similarity may get a somewhat different value. That 96% value includes a lot of highly repetitive elements covering large regions of the genome.

Of the 4% difference in that one estimate, barely 1.2% was actual single nucleotide polymorphisms in known coding regions. So the 96% similarity estimate doesn't really tell you much about which differences are actually important.

As far as I know, there is no single method or algorithm for computing such similarity scores, as the first thing you need is a single overall optimal genomic alignment. And there will always be some subjectivity, for at least some regions, in such an alignment of two complete mammalian genomes. A single such number also fails to tell you anything about how the differences are distributed in the genome. For example, they are not at all uniformly distributed across homologous chromosomes, with chromosomes 4, 9 and 12 being quite distinctive from the others.

http://www.nature.com/nature/journal...ture04072.html
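
To illustrate how much the single number depends on counting conventions, here is a minimal sketch that computes percent identity from one pairwise alignment in two ways: over aligned (non-gap) columns only, and over all columns including gaps. The function name and the toy sequences are invented for this example, and it assumes the alignment has already been produced by some aligner, with `-` as the gap character.

```python
from typing import Dict


def identity_stats(aln_a: str, aln_b: str) -> Dict[str, float]:
    """Percent identity of a pairwise alignment under two conventions."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gap_columns = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue                 # column carries no information
        if a == "-" or b == "-":
            gap_columns += 1         # insertion/deletion in one sequence
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    aligned = matches + mismatches
    total = aligned + gap_columns
    if aligned == 0:
        raise ValueError("no aligned (non-gap) columns")
    return {
        "identity_aligned_only": 100.0 * matches / aligned,
        "identity_with_gaps": 100.0 * matches / total,
    }


# Toy alignment: one substitution and one 2-base insertion/deletion.
stats = identity_stats("ACGTACGT--ACGT", "ACGTTCGTGGACGT")
print(stats)
```

The same alignment gives two different "similarity" values depending on whether indel columns count as differences, before even considering regions that cannot be aligned unambiguously.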
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.

Last edited by mbblack; 08-25-2014 at 05:54 AM.
Old 08-25-2014, 05:49 PM   #4
zhaopeihua
Member
 
Location: china

Join Date: Aug 2013
Posts: 18

I want to use the proportion of the animal genome that can be aligned to the human genome to represent similarity. Is this approach reasonable?
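
A rough sketch of that idea, for illustration only: given the intervals of one sequence that some aligner reported as aligned to the other genome, merge overlapping intervals and divide the covered length by the sequence length. The function name, the half-open `(start, end)` coordinate convention and the single-sequence simplification are all assumptions of this example. The result will also depend on which aligner and thresholds produced the intervals.

```python
from typing import Iterable, Tuple


def aligned_fraction(intervals: Iterable[Tuple[int, int]], seq_length: int) -> float:
    """Fraction of a sequence covered by aligned intervals.

    Intervals are half-open (start, end) on a single sequence and may overlap.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_length


# Toy intervals on a 1000-base sequence; the first two overlap.
print(aligned_fraction([(0, 100), (50, 150), (300, 400)], 1000))
```

Merging first matters because overlapping alignments (for example from repeats) would otherwise be counted more than once.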
Old 08-25-2014, 05:54 PM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,031

What exactly are you trying to do? Identify orthologs/paralogs or longer syntenic regions?
Old 08-25-2014, 05:59 PM   #6
zhaopeihua
Member
 
Location: china

Join Date: Aug 2013
Posts: 18

Quote:
Originally Posted by GenoMax View Post
I am not sure if you are merely interested in a numeric % value (not trivial to calculate) but the two main genome browsers do the following.

UCSC - e.g. 46-way conservation track for Vertebrates



Ensembl uses these methods:

http://www.ensembl.org/info/genome/compara/index.html
Quote:
Originally Posted by GenoMax View Post
What exactly are you trying to do? Identify orthologs/paralogs or longer syntenic regions?
I just need an indicator that reflects genome similarity.
Old 08-25-2014, 07:25 PM   #7
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Quote:
I just need an indicator that reflects genome similarity.
But what do you mean by similarity? This is not an easy question to answer, and the result can be quite subjective depending on the measure or method. Here are some that I can think of off the top of my head:
  • Proportion of SNPs that have the same major (or reference) allele
  • Proportion of the genome that matches using a BLAST-like search with default options
  • Median percent identity (or similarity) for the 100 most abundant proteins
  • Proportion of genes with homologous genes in the other species
  • Number of large-scale chromosomal rearrangement events (doesn't translate well to a percentage)

And if your answer is "yes, any of those will do", then you're probably better off sticking with "some random people say we are 99%/96%/50% similar to bananas/chimpanzees/our siblings", and not caring about the specifics of the number or the method.
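
To show that the measures listed above really are different numbers, here is a toy sketch of two of them: the median percent identity over a set of protein pairs, and the proportion of genes with a homolog in the other species. Everything in it is made up for illustration: the gene names, the sequences, and the assumption that each protein pair is already aligned without gaps.

```python
from statistics import median
from typing import Dict, Iterable, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, gap-free aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * same / len(seq_a)


def median_protein_identity(pairs: Dict[str, Tuple[str, str]]) -> float:
    """Median percent identity across protein pairs keyed by gene name."""
    return median(percent_identity(a, b) for a, b in pairs.values())


def homolog_fraction(genes: Iterable[str], with_homolog: Iterable[str]) -> float:
    """Proportion of genes that have a homolog in the other species."""
    genes = list(genes)
    found = set(with_homolog)
    return sum(1 for g in genes if g in found) / len(genes)


# Hypothetical data: geneD has no homolog, so it has no protein pair.
protein_pairs = {
    "geneA": ("MKTAYIAK", "MKTAYIAK"),
    "geneB": ("MKTAYIAK", "MKTGYIAR"),
    "geneC": ("MSLLTEVE", "MSLLTEVD"),
}
all_genes = ["geneA", "geneB", "geneC", "geneD"]

print(median_protein_identity(protein_pairs))
print(homolog_fraction(all_genes, protein_pairs))
```

On the same toy data the two functions give different values, which is the point: a "similarity" figure only means something once the measure is fixed.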