  • Velvet: choice of hash length k

    Hi,
    While choosing the hash length k, I found a tip in the Velvet manual.

    It gives the relation between k-mer coverage C(k) and standard (nucleotide-wise) coverage C as C(k) = C*(L-k+1)/L, where k is the hash length and L is the read length. It recommends that C(k) be above 10 to start getting decent results.

    As a test, I wanted to use one lane of Solexa data with a read length of 101 and a total read count of 69,084,522, which corresponds to a standard (nucleotide-wise) coverage C of 2.46.

    With C = 2.46 and L = 101, I tried to find the hash length k that gives C(k) = 10. However, the calculated k is about -308, which is a meaningless negative number.
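
As a rough check of the calculation above, the manual's relation can be rearranged to k = L + 1 - C(k)*L/C. The sketch below only applies that rearrangement; the function name `solve_hash_length` is made up for illustration and is not part of Velvet.

```python
def solve_hash_length(c, read_len, target_ck):
    """Solve C(k) = C * (L - k + 1) / L for k.

    c         : standard (nucleotide-wise) coverage C
    read_len  : read length L
    target_ck : desired k-mer coverage C(k)
    """
    return read_len + 1 - target_ck * read_len / c


# Values from the post: C = 2.46, L = 101, target C(k) = 10.
k = solve_hash_length(c=2.46, read_len=101, target_ck=10)
print(round(k, 1))  # -308.6

# A hash length only makes sense for 1 <= k <= L, so a negative result
# means the target k-mer coverage cannot be reached at this depth.
print(1 <= k <= 101)  # False
```

Since (L-k+1)/L is never greater than 1, C(k) can never exceed C, so no choice of k reaches C(k) = 10 when C is 2.46.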

    I have also seen that if I increase C, I obtain a reasonable k; e.g., when C is 14.23, k is about 31.
    Does that mean I should increase C to get a reasonable hash length k?
    To do so, I think I would have to pool multiple lanes.
    However, that may cause memory problems.
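
The pooling idea can be put into numbers with a small sketch. It assumes, purely for illustration, that every additional lane contributes the same 2.46x as the first one; the helper name `required_coverage` is hypothetical.

```python
import math


def required_coverage(target_ck, read_len, k):
    """Nucleotide coverage C needed so that C(k) = C * (L - k + 1) / L hits target_ck."""
    return target_ck * read_len / (read_len - k + 1)


# Target C(k) = 10 with L = 101 and k = 31.
needed_c = required_coverage(target_ck=10, read_len=101, k=31)
print(round(needed_c, 2))  # 14.23

# Assumption: each lane yields about 2.46x, as in the single-lane test.
coverage_per_lane = 2.46
lanes = math.ceil(needed_c / coverage_per_lane)
print(lanes)  # 6
```

Under that assumption, about six lanes would be needed to reach a k-mer coverage of 10 at k = 31.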

    Is my approach correct, or does anybody have a different idea?
    Please let me know. Thanks in advance.

    Won-Chul.

  • #2
    C(k) simply refers to "k-mer coverage", i.e. coverage depth measured in read k-mers per contig k-mer.

    If your coverage is 2x, your read length is 101, and your k-mer length is 31, then your k-mer coverage is about 1.4:
    C(k) = C*(L-k+1)/L = 2*(101-31+1)/101 ≈ 1.4
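
A minimal sketch of the same formula in Python, useful for trying out different k values. The function name `kmer_coverage` is illustrative only.

```python
def kmer_coverage(c, read_len, k):
    """k-mer coverage C(k) = C * (L - k + 1) / L."""
    return c * (read_len - k + 1) / read_len


# The example above: C = 2, L = 101, k = 31.
print(round(kmer_coverage(2, 101, 31), 2))  # 1.41

# The original poster's coverage of 2.46 with the same L and k.
print(round(kmer_coverage(2.46, 101, 31), 2))  # 1.73
```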

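    You should choose the k-mer length based on the specificity/sensitivity trade-off: a longer k is more specific but lowers the k-mer coverage, while a shorter k is more sensitive.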
    --
    Jeremy Leipzig
    Bioinformatics Programmer


    • #3
      Thanks for your answer.

      As you said, the k-mer coverage is about 1.4 under the conditions I have.
      However, the Velvet manual says that k-mer coverage should be above 10 to start getting decent results.
      If so, I think increasing C is the only way to get the k-mer coverage above 10, if I do not want to change the other values such as L and k.
      Is that right?

      Won-Chul



      • #4
        Yes, of course: the only way to actually get deeper coverage is to generate more sequence. Assemblies improve rapidly as you go from 10x to 20x to 30x, and slowly from there on.
        --
        Jeremy Leipzig
        Bioinformatics Programmer


        • #5
          Originally posted by wclee47:
          Hi,
          In choosing the hash length k, I found some tip from the Velvet manual.
          Won-Chul.
          BTW, do you want to assemble a genome de novo? Your standard (nucleotide-wise) coverage C is very low at 2.46. Is it a big genome?
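
The numbers in the first post hint at the answer. Assuming the usual definition C = N*L/G (N reads of length L over a genome of size G), which the thread does not state explicitly, the implied genome size can be sketched as follows. The function name is hypothetical.

```python
def implied_genome_size(read_count, read_len, coverage):
    """Genome size G implied by C = N * L / G (assumed definition of coverage)."""
    return read_count * read_len / coverage


# Values from the first post: 69,084,522 reads of length 101 at C = 2.46.
genome_bp = implied_genome_size(69_084_522, 101, 2.46)
print(round(genome_bp / 1e9, 2))  # 2.84 (Gbp)
```

Under that assumption, the figures quoted in the first post correspond to a genome of roughly 2.8 Gbp.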
