
  • Velvet: choice of hash length k

    Hi,
    While choosing the hash length k, I found a tip in the Velvet manual.

    It gives the relation between k-mer coverage C(k) and standard (nucleotide-wise) coverage C as C(k) = C*(L-k+1)/L, where k is the hash length and L is the read length. It recommends that C(k) be above 10 to start getting decent results.

    As a test, I wanted to use one lane of Solexa data with a read length of 101 and a total read count of 69,084,522, which corresponds to a standard (nucleotide-wise) coverage C of 2.46.

    With C = 2.46 and L = 101, I tried to solve for the hash length k that gives C(k) = 10. However, the calculated k is about -308, which is a meaningless negative number.
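The negative result can be reproduced by rearranging the manual's relation to k = L + 1 - C(k)*L/C. The following is only a small sketch of that arithmetic; the function name `hash_length_for_target` is made up for illustration and is not part of Velvet.

```python
def hash_length_for_target(c, read_len, target_ck):
    """Solve C(k) = C * (L - k + 1) / L for k.

    The result is a real number and may be negative, which means that
    no valid hash length reaches the requested k-mer coverage.
    """
    return read_len + 1 - target_ck * read_len / c


# Values from the post: C = 2.46, L = 101, target C(k) = 10.
k = hash_length_for_target(2.46, 101, 10)
print(round(k, 1))  # about -308.6
```

Because (L-k+1)/L is at most 1 for any k of 1 or more, C(k) can never exceed C. With C = 2.46 a target of C(k) = 10 is therefore out of reach for every positive k, which is why the formula returns a negative value.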

    I've also seen that if I increase C, I can obtain a reasonable k; e.g., when C is 14.23, k comes out at about 31.
    Does that mean I should increase C to get a reasonable hash length k?
    To do so, I think I would have to pool multiple lanes.
    However, that may cause memory problems.

    Is my approach correct? Or does anybody have a different idea?
    Please let me know. Thanks in advance.

    Won-Chul.

  • #2
    C(k) simply refers to "kmer-coverage" or coverage depth measured in read-kmers per contig kmer.

    If your coverage is 2x, your read length is 101, and your kmer is 31, then your kmer-coverage is about 1.4:
    C(k) = C*(L-k+1)/L
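As a quick check of the numbers above, the relation can be evaluated directly. This is just an illustrative sketch; `kmer_coverage` is a hypothetical helper name, and the list of k values is an arbitrary choice for the example.

```python
def kmer_coverage(c, read_len, k):
    """Return C(k) = C * (L - k + 1) / L for nucleotide coverage c."""
    return c * (read_len - k + 1) / read_len


# 2x coverage, 101 bp reads, k = 31, as in the example above.
print(round(kmer_coverage(2.0, 101, 31), 1))  # 1.4

# Larger k leaves fewer k-mers per read, so C(k) falls as k grows.
for k in (21, 31, 41, 51):
    print(k, round(kmer_coverage(2.0, 101, k), 2))
```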

    You should choose the kmer length based on specificity/sensitivity judgements.
    --
    Jeremy Leipzig
    Bioinformatics Programmer


    • #3
      Thanks for your answer.

      As you said, kmer-coverage is 1.4 with the conditions I have.
      However, the Velvet manual says that kmer-coverage should be above 10 to start getting decent results.
      If so, I think increasing C is the only way to get kmer-coverage above 10 if I do not want to change the other values, L and k.
      Is that right?

      Won-Chul
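With L and k held fixed, the relation can be inverted to give the nucleotide coverage needed for a target kmer-coverage: C = C(k)*L/(L-k+1). The sketch below does that and also estimates how many lanes would have to be pooled, under the assumption (not stated in the thread) that every lane contributes about the same 2.46x as the one described. The names `required_coverage` and `lanes_needed` are made up for this example.

```python
import math


def required_coverage(target_ck, read_len, k):
    """Nucleotide coverage C needed so that C(k) reaches target_ck."""
    return target_ck * read_len / (read_len - k + 1)


def lanes_needed(target_ck, read_len, k, coverage_per_lane):
    """Lanes to pool, assuming each lane adds coverage_per_lane (an assumption)."""
    return math.ceil(required_coverage(target_ck, read_len, k) / coverage_per_lane)


# Target C(k) = 10 with L = 101 and k = 31.
print(round(required_coverage(10, 101, 31), 2))  # 14.23
print(lanes_needed(10, 101, 31, 2.46))           # 6
```

The 14.23 here matches the C value quoted in the first post for k of about 31.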



      • #4
        Yes, of course; the only way to actually get deeper coverage is to generate more sequence. Assemblies improve rapidly as you go from 10x to 20x to 30x, and slowly from there on.
        --
        Jeremy Leipzig
        Bioinformatics Programmer


        • #5
          Originally posted by wclee47
          Hi,
          In choosing the hash length k, I found some tip from the Velvet manual.
          Won-Chul.
          BTW, do you want to assemble a genome de novo? Your standard (nucleotide-wise) coverage C is very low, 2.46. Is it a big genome?
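The genome size is not given in the thread, but it can be estimated from the posted numbers if one assumes the usual definition C = N*L/G (N reads of length L over a genome of G bases). That definition and the helper name `implied_genome_size` are assumptions for this sketch, not something stated by the original poster.

```python
def implied_genome_size(read_count, read_len, coverage):
    """Estimate genome size G from C = N * L / G (assumed definition)."""
    return read_count * read_len / coverage


# Numbers from the first post: 69,084,522 reads of 101 bp at C = 2.46.
size_bp = implied_genome_size(69_084_522, 101, 2.46)
print(f"{size_bp / 1e9:.2f} Gbp")  # 2.84 Gbp
```

Under that assumption the posted figures imply a genome of roughly 2.8 Gbp, which would explain why a single lane yields such low coverage.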
