
  • How to measure similarity between species' genomes?

    hi,

    How to calculate similarity between humans and animals?
    For example, Chimpanzees are 96% genetically similar to humans.

  • #2
    I am not sure if you are merely interested in a numeric % value (not trivial to calculate) but the two main genome browsers do the following.

    UCSC - e.g. 46-way conservation track for Vertebrates

    This track shows multiple alignments of 46 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track.

    PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites).

    Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1.
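
As a rough illustration of the scoring convention described above, the sketch below turns a phyloP score back into a direction and a p-value. It assumes the logarithm is base 10, which the quoted track description does not state, and `phylop_to_pvalue` is a hypothetical helper name, not part of the PHAST package.

```python
from typing import Tuple


def phylop_to_pvalue(score: float) -> Tuple[str, float]:
    """Interpret a phyloP score as (direction, p-value).

    Assumption: |score| = -log10(p) under the neutral null model.
    Positive scores mean conservation, negative scores mean acceleration.
    """
    if score > 0:
        direction = "conserved"
    elif score < 0:
        direction = "accelerated"
    else:
        direction = "neutral"
    # Undo the -log10 transform on the absolute value of the score.
    p_value = 10 ** (-abs(score))
    return direction, p_value


if __name__ == "__main__":
    for s in (2.0, -3.0, 0.0):
        print(s, phylop_to_pvalue(s))
```

A phastCons score needs no such conversion, since it is already a probability between 0 and 1.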

    Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data, and both were run with the same parameters for each species set (vertebrates, placental mammals, and primates). Thus, in regions in which only primates appear in the alignment, all three sets of scores will be the same, but in regions in which additional species are available, the mammalian and/or vertebrate scores may differ from the primate scores. The alternative plots help to identify sequences that are under different evolutionary pressures in, say, primates and non-primates, or mammals and non-mammals.
    Ensembl uses these methods:

    http://www.ensembl.org/info/genome/compara/index.html


    • #3
      Originally posted by zhaopeihua
      hi,

      How to calculate similarity between humans and animals?
      For example, Chimpanzees are 96% genetically similar to humans.

      By one group's overall bulk estimate, yes. Since that number is based on an overall genome alignment, and since there are large tracts of the genome that simply do not have a single unambiguous optimal alignment, anyone else computing a single overall similarity may get a somewhat different value. That 96% value includes a lot of highly repetitive elements covering large regions of the genome.

      Of the 4% difference in that one estimate, barely 1.2% was actual single nucleotide polymorphisms in known coding regions. So the 96% similarity estimate doesn't really tell you much about which differences are actually important.

      As far as I know, there is no single method or algorithm for computing such similarity scores, as the first thing you need is a single overall optimal genomic alignment. And there will always be some subjectivity, for at least some regions, in such an alignment of two complete mammalian genomes. A single such number also fails to tell you anything about how the differences are distributed in the genome. For example, they are not at all uniformly distributed across homologous chromosomes, with chromosomes 4, 9 and 12 being quite distinctive from the others.

      Last edited by mbblack; 08-25-2014, 05:54 AM.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.



      • #4
        I want to use the proportion of the animal genome that can be aligned to the human genome as a measure of similarity. Is this approach reasonable?
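
One way to compute the proportion proposed in this post is sketched below. It assumes the alignment blocks are already available as half-open `(start, end)` coordinates on the query genome. The function name `aligned_fraction` and the toy coordinates are illustrative only, not output from any real aligner.

```python
from typing import Iterable, Tuple


def aligned_fraction(blocks: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of a genome covered by at least one alignment block.

    blocks: half-open (start, end) intervals on the query genome (assumed input).
    Overlapping blocks are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(blocks):
        if cur_end is None or start > cur_end:
            # Gap before this block: close the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent block: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


if __name__ == "__main__":
    toy_blocks = [(0, 50), (40, 80), (90, 100)]  # made-up coordinates
    print(aligned_fraction(toy_blocks, 200))
```

The result depends heavily on the aligner and its thresholds, and it measures alignability rather than sequence identity within the aligned blocks.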



        • #5
          What exactly are you trying to do? Identify orthologs/paralogs or longer syntenic regions?



          • #6
            Originally posted by GenoMax
            I am not sure if you are merely interested in a numeric % value (not trivial to calculate) but the two main genome browsers do the following.

            UCSC - e.g. 46-way conservation track for Vertebrates

            Ensembl uses these methods:

            http://www.ensembl.org/info/genome/compara/index.html

            Originally posted by GenoMax
            What exactly are you trying to do? Identify orthologs/paralogs or longer syntenic regions?

            I just need an indicator that reflects genome similarity.



            • #7
              Just need an indicator that reflects genome similarity

              But what do you mean by similarity? This is not an easy thing to answer, and can be quite subjective depending on the measure / method. Here are some that I can think of off the top of my head:
              • Proportion of SNPs that have the same major (or reference) allele
              • Proportion of the genome that matches using a BLAST-like search with default options
              • Median percent identity (or similarity) for the 100 most abundant proteins
              • Proportion of genes with homologous genes in the other species
              • Number of large-scale chromosomal rearrangement events (doesn't translate well to a percentage)


              And if your answer is "yes, any of those will do", then you're probably better off sticking with "some random people say we are 99%/96%/50% similar to bananas/chimpanzees/our siblings", and not caring about the specifics of the number or the method.
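
To show how much the choice of measure matters, here is a minimal sketch of percent identity over a pairwise alignment that is already gapped (`-` marks a gap). The helper `percent_identity` and the toy sequences are hypothetical. The only point is that counting or ignoring gap columns changes the answer for the same alignment.

```python
def percent_identity(seq_a: str, seq_b: str, count_gaps: bool = False) -> float:
    """Percent identity between two rows of a pairwise alignment.

    count_gaps=False: only columns where both rows have a base are compared.
    count_gaps=True: columns with a gap in one row count as mismatches.
    Columns that are gaps in both rows are always skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        if a == "-" or b == "-":
            if count_gaps:
                compared += 1
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    row_a = "ACGTACGT--AC"  # toy data, not real genomic sequence
    row_b = "ACGAACGTTTAC"
    print(percent_identity(row_a, row_b))                   # gaps ignored
    print(percent_identity(row_a, row_b, count_gaps=True))  # gaps as mismatches
```

On this toy alignment the two conventions give 90% and 75%, which is the same kind of gap that separates differently computed whole-genome figures.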
