  • draft genome quality

    Hi everyone.
    I'm just trying to deal with the work that was started for one of the master's students in my new lab. I'm now dealing with a set of 64 bacterial draft genomes, and the first thing I need to understand is whether the quality of the assemblies is good or not.
    I have a long list of statistics that I assume were obtained when Velvet was used for genome assembly. These statistics include the following:
    - Read length
    - Total read length (bp)
    - Total read length (Mb)
    - Estimated coverage
    - Velvet hash value
    - Total number of sequences
    - Total number of contigs
    - N50
    - Length of longest contig
    - Total bases in contigs
    - Number of contigs > 1K
    - Total bases in contigs > 1K

    Can anyone explain what each statistic means, and which ones I should rely on most to evaluate the quality of the resulting draft genomes?
    I'm completely lost!

    Thank you very much in advance

  • #2
    The problem with these statistics is that none of them tells you what you really want to know: how accurate the assembly is.

    The N50 is probably the most widely used statistic, but the other statistics also give important information. Have a look at this paper:
    De novo genome assembly: What every biologist should know.
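
    To make the contig-based entries in the list above concrete, here is a minimal Python sketch that computes them from a plain list of contig lengths. The function name assembly_stats and the min_len parameter are made up for illustration and are not part of Velvet; it uses the usual definition of N50 (the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembled bases). None of these numbers measures accuracy, only contiguity and size.

```python
def assembly_stats(contig_lengths, min_len=1000):
    """Summarise contiguity from contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: walk from the longest contig down until half of all bases are covered.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # Contigs strictly longer than the threshold (1 kb by default).
    long_contigs = [n for n in lengths if n > min_len]

    return {
        "num_contigs": len(lengths),
        "total_bases": total,
        "longest_contig": lengths[0] if lengths else 0,
        "n50": n50,
        "num_contigs_gt_min": len(long_contigs),
        "bases_in_contigs_gt_min": sum(long_contigs),
    }


if __name__ == "__main__":
    # Toy lengths, not real data.
    print(assembly_stats([100, 200, 300, 400, 500, 1500, 2000]))
```

    For bacterial genomes, comparing "total bases in contigs" with the expected genome size of the species is often a more useful sanity check than N50 alone.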



    If the assemblies were done using Velvet, you might also want to have a look at the Velvet manual:


    If you need to know more about de novo assembly, have a look at this article which used to be on the SEQanswers SeqWiki
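
    As for the read-based entries in the original list (total read length, estimated coverage, Velvet hash value), here is a rough sketch of how they relate. It assumes "estimated coverage" means total read bases divided by the expected genome size, which I can't confirm for this particular spreadsheet. The k-mer coverage relation Ck = C * (L - k + 1) / L is the one given in the Velvet manual, where k is the hash value. The function names and the example numbers are made up for illustration.

```python
def estimated_coverage(total_read_bases, genome_size):
    """Average base coverage, assuming coverage = read bases / genome size."""
    return total_read_bases / genome_size


def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage from base coverage: Ck = C * (L - k + 1) / L."""
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical numbers: 500 Mb of 100 bp reads on a 5 Mb genome, k = 31.
    c = estimated_coverage(500_000_000, 5_000_000)
    print(c, kmer_coverage(c, read_length=100, k=31))
```

    Very low coverage tends to give fragmented assemblies, so it is worth checking this number first when one of the 64 genomes looks much worse than the others.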
