  • Coverage requirement for de novo sequencing

    Is there any study exploring the relationship between read length, raw read accuracy, coverage, genome size and the resulting number of contigs in a de novo assembly? I am particularly interested in the human genome. For example, with 10 kb reads, would we need less total throughput to achieve an assembly of the same quality?

    I'd appreciate any pointers. Simulation results are fine too.

  • #2
    A place to start would be Lander-Waterman statistics.

    However, these don't incorporate the actual structure of the human genome, notably the repeats, which will favor long reads even more.
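    Since simulation was mentioned as acceptable, here is a minimal sketch of the idealised Lander-Waterman setting: reads of a fixed length dropped uniformly at random on a linear genome, counting contigs as maximal runs of overlapping reads. It assumes error-free reads, no repeats and no sequencing bias, which is exactly what the formula ignores; the function names are hypothetical, and the genome is kept small so it runs quickly.

```python
import math
import random


def simulate_contigs(genome_size, read_length, coverage, seed=0):
    """Place reads uniformly at random on a linear genome and count contigs.

    Assumes error-free reads, no repeats and no sequencing bias (the
    idealised Lander-Waterman model); a contig is a maximal run of reads
    that overlap by at least one base.
    """
    rng = random.Random(seed)
    n_reads = int(round(coverage * genome_size / read_length))
    # Read start positions, sorted so contigs can be built in one left-to-right pass.
    starts = sorted(rng.randrange(genome_size - read_length + 1) for _ in range(n_reads))
    contigs = 0
    current_end = -1  # exclusive end coordinate of the contig being extended
    for s in starts:
        if s >= current_end:  # no overlap with the current contig: open a new one
            contigs += 1
        current_end = max(current_end, s + read_length)
    return contigs


def lander_waterman_contigs(genome_size, read_length, coverage):
    """Expected contig count N * exp(-c), assuming zero required overlap."""
    n_reads = coverage * genome_size / read_length
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    G, L = 1_000_000, 100  # small illustrative genome, not human-scale
    for c in (2, 4, 6, 8):
        print(c, simulate_contigs(G, L, c), round(lander_waterman_contigs(G, L, c), 1))
```

    In this idealised setting the simulated counts track the formula closely; adding repeats (reads that cannot be placed uniquely) or a GC-dependent sampling weight on the start positions is where a real genome would start to diverge from it.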



    • #3
      Thanks for the pointer. It's a very useful set of notes. From the NIH estimate of human genome sequencing cost:

      The following 'sequence coverage' values were used in calculating the cost per genome:

      Sanger-based sequencing (average read length=500-600 bases): 6-fold coverage
      454 sequencing (average read length=300-400 bases): 10-fold coverage
      Illumina and SOLiD sequencing (average read length=50-100 bases): 30-fold coverage

      (http://www.genome.gov/sequencingcosts/)

      These coverage values differ from those estimated with Lander-Waterman statistics. Using a genome size of 3 Gb, a target of 1 contig, and read lengths of 100, 400 and 600 bp, the formula gives coverages of 20.2, 18.7 and 18.3 respectively, whereas the NIH values for those read lengths are 30, 10 and 6. Any idea why they are so far off, especially for Sanger?
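      The calculation described above can be sketched as follows, assuming the classic Lander-Waterman expression for the expected number of contigs, N * exp(-c * sigma), with N = c * G / L reads and sigma = 1 - T / L for a minimum detectable overlap T (T = 0 by default here). "1 contig" is interpreted as an expected contig count of one; the function names are hypothetical. Bisection is used because the expected count rises until c = 1/sigma and then falls monotonically.

```python
import math


def expected_contigs(coverage, genome_size, read_length, min_overlap=0):
    """Lander-Waterman expected contig count N * exp(-c * sigma).

    N = c * G / L is the number of reads; sigma = 1 - T / L accounts for
    a minimum overlap T needed to detect that two reads join.
    """
    n_reads = coverage * genome_size / read_length
    sigma = 1.0 - min_overlap / read_length
    return n_reads * math.exp(-coverage * sigma)


def coverage_for_contigs(target_contigs, genome_size, read_length, min_overlap=0):
    """Find the coverage at which the expected contig count falls to the target.

    The count peaks at c = 1 / sigma and decreases afterwards, so bisect
    on [1 / sigma, 1000], where the count is above and below the target.
    """
    sigma = 1.0 - min_overlap / read_length
    lo, hi = 1.0 / sigma, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_contigs(mid, genome_size, read_length, min_overlap) > target_contigs:
            lo = mid  # still too many contigs: need more coverage
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for read_length in (100, 400, 600):
        c = coverage_for_contigs(1, 3e9, read_length)
        print(f"{read_length} bp: {c:.2f}x")
```

    This roughly reproduces the 18 to 20x figures quoted in the post. Note that the model has no notion of repeats, base-call errors or coverage bias, and raising `min_overlap` only pushes the required coverage up, so it cannot by itself explain why the NIH figures are lower for the longer reads.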



      • #4
        I would assume the NIH estimates take into account repeats and error rates as well as sequencing biases.

        Different sequencing technologies favor different GC content, so to reach 99% coverage some technologies must sequence more deeply to cover the GC extremes.

        Repeats also complicate matters: short reads that fall within repeats cannot be anchored unambiguously, which requires higher coverage.

        Finally, error rates cause some loss of usable coverage, again depending on the technology.

        So in summary, yes, a formula would be great, but guidelines based on experimental results are probably going to be more accurate, since several factors besides read length strongly affect the results.
