  • Assessing quality and accuracy of de novo genome assembly

    All,
    I am curious whether anyone out there has a method for assessing the quality and accuracy of de novo genome assemblies? I am currently doing in silico simulations of de novo genome assembly from a previously sequenced genome to determine the best assembly parameters (K-mer size, coverage cutoff etc) and optimal dataset (mate pair library size, coverage etc). The ultimate goal will be to use these parameters to assemble the genome of a related species, de novo.

    However, the difficulty is that after simulating the data and making a de novo assembly, I don't know of any statistics or methods for comparing the assembled contigs back to the original sequence they were simulated from. This requires two steps:
    (1) align assembled contigs to reference genome
    (2) assess the fit

    People often optimize N50, assembly size, contig number and other length-based measurements, but this only makes for bigger and bigger contigs and gives little information about whether those contigs are accurate. I have been using BLAST to compare the contigs to the reference and asking how well they fit, how long the alignments are and how many mis-assembled contigs there are. If anyone has ideas or methods for assessing the accuracy (or the overall similarity of an assembly and a genome), I would be grateful to hear about it. - Rob
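For reference, the N50 mentioned above can be computed from contig lengths alone, which is exactly why it says nothing about accuracy. A minimal sketch, assuming the common definition (the contig length at which the running sum of lengths, sorted longest first, reaches half the assembly size); the function name `n50` and the example lengths are illustrative only:

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted longest first; the N50 is the length of the contig
    at which the running total first reaches half of the assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative contig lengths (not real data).
contig_lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(contig_lengths))
```

Note that joining two contigs incorrectly would raise this number, so a higher N50 is not by itself evidence of a better assembly.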

  • #2
    Good questions - I don't think there is a simple or single answer at this point.

    This paper suggests a potential metric
    An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms


    You might be interested in the recently announced Assemblathon
    An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly....



    • #3
      To check assemblies...

      ... you need an external truth. We are using 'optical mapping' from OpGen, for example. Any sort of physical map or some kinds of PCR can be used, however, depending on the size of the genomes. You can't rely on other genomes.

      For some microbes, things like skew can help you get a sense of whether your assemblies seem wrong, but they aren't necessarily a solid confirmation one way or the other.
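As a rough illustration of the skew check mentioned above: GC skew is usually computed per window as (G - C) / (G + C), and in many bacterial chromosomes its sign flips near the replication origin and terminus, so extra sign changes in an assembled sequence can hint at a rearrangement or misjoin. A minimal sketch, assuming non-overlapping windows; the function names and window size are illustrative, and as noted above this is a sanity check rather than confirmation:

```python
def gc_skew(seq, window=1000):
    """Return GC skew, (G - C) / (G + C), for each non-overlapping window."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C get a skew of 0 to avoid dividing by zero.
        values.append((g - c) / (g + c) if (g + c) else 0.0)
    return values


def cumulative(values):
    """Return the running sum of a list of per-window skew values."""
    total = 0.0
    out = []
    for value in values:
        total += value
        out.append(total)
    return out


# Toy sequence: one G-rich window followed by one C-rich window.
skew = gc_skew("GGGGCCCC", window=4)
print(skew, cumulative(skew))
```

Plotting the cumulative values along the assembled sequence is the usual way to look for unexpected extra peaks or troughs.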



      • #4
        Originally posted by bckirkup
        ... you need an external truth. We are using 'optical mapping' from OpGen, for example. Any sort of physical map or some kinds of PCR can be used, however, depending on the size of the genomes. You can't rely on other genomes.

        For some microbes, things like skew can help you get a sense of whether your assemblies seem wrong, but they aren't necessarily a solid confirmation one way or the other.
        Sorry, I think I was unclear. What I am doing is simulating short-read data from a species with a sequenced genome, then doing a de novo assembly of that simulated data. I then want to compare my assemblies to the original genome to see how well the assemblies performed. It is similar to testing the quality of short-read genome assemblers. However, my ultimate goal is to apply what I learn in this species to the de novo assembly of another, related species with NO previously sequenced genome. I am trying to find a good method/metric to assess the quality of those simulated assemblies.
        Nickloman's comments are helpful, but I haven't yet been able to read the first paper that was recommended.
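To make the simulation step concrete, here is a toy sketch of drawing error-free, single-end reads uniformly from a known sequence at a target coverage. It is only an illustration under those assumptions: the function name `simulate_reads` and its parameters are hypothetical, and a realistic simulation would also model sequencing errors, both strands and mate-pair insert sizes.

```python
import random


def simulate_reads(genome, read_length, coverage, seed=0):
    """Sample fixed-length, error-free reads uniformly from `genome`.

    The number of reads is chosen so that
    reads * read_length is about coverage * len(genome).
    """
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    n_reads = int(coverage * len(genome) / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        reads.append(genome[start:start + read_length])
    return reads


# Toy 1,000 bp "genome" (not real data), 50 bp reads at 10x coverage.
toy_genome = "ACGT" * 250
toy_reads = simulate_reads(toy_genome, read_length=50, coverage=10)
print(len(toy_reads))
```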



        • #5
          This paper (an interesting read otherwise) describes some metrics. I guess there will be other papers with yet more ways of getting quality metrics. It feels like we should develop a consensus...



          • #6
            I don't think there is any one metric for genome assembly quality. Obviously, size matters, but so do representation (how much of the genome sequence is actually covered), mismatch rate, indel rate, and misassembly rate. All these quality metrics are derived relatively easily for a known standard genome when you use simulated reads extracted from it. Check out www.plantagora.org for more information on the whole question. That is its focus: simulated read assembly to evaluate a lot of different sequencing and assembly approaches. The project uses a long list of metrics for evaluating the assemblies (so you can decide which are most important to you).
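A rough sketch of how three of the metrics listed above could be tallied once contigs have been aligned to the known genome. The `Alignment` record and its fields are hypothetical stand-ins for whatever your aligner reports (this is not any particular tool's output format), and misassembly rate is left out because it needs per-contig breakpoint analysis:

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one contig-to-reference alignment."""
    ref_start: int   # 0-based, inclusive
    ref_end: int     # exclusive
    mismatches: int
    indels: int


def covered_bases(alignments):
    """Count reference bases covered by at least one alignment."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((a.ref_start, a.ref_end) for a in alignments):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def summarize(alignments, genome_length):
    """Return representation plus per-aligned-base mismatch and indel rates."""
    aligned = sum(a.ref_end - a.ref_start for a in alignments)
    return {
        "representation": covered_bases(alignments) / genome_length,
        "mismatch_rate": sum(a.mismatches for a in alignments) / aligned if aligned else 0.0,
        "indel_rate": sum(a.indels for a in alignments) / aligned if aligned else 0.0,
    }


# Illustrative alignments against a 1,000 bp reference (not real data).
example = [Alignment(0, 100, 1, 0), Alignment(50, 150, 1, 1), Alignment(300, 400, 0, 1)]
print(summarize(example, genome_length=1000))
```

Overlapping alignments are merged before computing representation so that redundant contigs do not inflate the covered fraction.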

