  • Choosing MAXKMERLENGTH in Velvet

    Hi everyone,
    I am a newbie in genome sequencing and assembly and I need some help.

    I had to sequence several bacterial genomes (the biggest is about 5.5 Mb) with Illumina MiSeq as part of my PhD project.

    The drafts were assembled by the sequencing company, but I would like to learn the whole process on my own.

    Currently, I am learning to use Velvet. I've read the manual and the associated papers but I still cannot figure out how to choose the proper MAXKMERLENGTH value. My reads are 150 bp long (paired-end).

    Could you give me some advice?

    Thank you in advance for your time.

  • #2
    MAXKMERLENGTH is the longest k-mer size you can use for your analysis, and you have to set it when you compile Velvet (the default used to be 31). Check the manual; I think setting a higher MAXKMERLENGTH means that Velvet uses more memory.

    This has nothing to do with which k-mer size is best for your assembly.



    • #3
      You may want to look into newer assemblers as well.

      See:


      and:
      De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria as for instance computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also by the requirement on the computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determines which assembler suits best. 
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.



      • #4
        @mastal: Thank you for your answer. Maybe I should have phrased my question better. Let me try again, please.

        As far as I understand from the papers, the k-mer length has a big impact on assembly quality, so it is common practice to use VelvetOptimiser to identify the best hash length and cut-off values. Is that correct?

        However, when I install Velvet, the default MAXKMERLENGTH value is 31. I guess this is because the manual was written in 2008, when Illumina reads were much shorter.

        My question is: which MAXKMERLENGTH value should I enter when I compile Velvet, given that my reads are 150 bp long, and what is the principle for choosing it?

        @ fanli: Thank you for the references. I am going to check them out.



        • #5
          There are two separate concepts here: the k-mer lengths you try for your assembly, and the maximum k-mer length you compile Velvet with, which determines how much memory Velvet sets aside for the calculations.

          MAXKMERLENGTH:
          If you have 150 bp reads, then in principle you can use any k-mer size up to 149 (the k-mer size has to be an odd number). But Velvet will only let you use values up to the MAXKMERLENGTH it was compiled with (the default is 31), so if you want to try larger values you will have to recompile Velvet with a larger MAXKMERLENGTH. This setting has no effect on your assembly, except that you won't be able to try k-mer values higher than MAXKMERLENGTH, and it affects the amount of memory that Velvet uses.
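As a rough sketch of the rule above, the k-mer sizes you can actually try are the odd values that fit both the read length and the compiled MAXKMERLENGTH. The function name `usable_kmer_sizes` and the starting value of 21 are assumptions made for illustration; they are not part of Velvet. As far as I recall, the Velvet manual sets the limit at compile time with something like `make 'MAXKMERLENGTH=151'`, so check the manual for the exact syntax of your version.

```python
def usable_kmer_sizes(read_length, max_kmer_length, smallest=21):
    """List the odd k-mer sizes that can be tried for a given setup.

    read_length     -- length of the reads in bp (150 in this thread)
    max_kmer_length -- the MAXKMERLENGTH Velvet was compiled with
    smallest        -- smallest k to consider (hypothetical default)
    """
    # k must be shorter than the read and no larger than MAXKMERLENGTH.
    upper = min(read_length - 1, max_kmer_length)
    # Start from an odd value so that stepping by 2 keeps every k odd.
    first = smallest if smallest % 2 == 1 else smallest + 1
    return list(range(first, upper + 1, 2))


# With the default build, nothing above 31 is available.
default_build = usable_kmer_sizes(150, 31)
# After recompiling with a larger limit, values up to 149 become available.
larger_build = usable_kmer_sizes(150, 151)

print(default_build[-1], larger_build[-1])
```

Recompiling only widens the range you are allowed to try; it does not say which value in that range gives the best assembly.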

          kmer length and assembly:
          As to which k-mer lengths to try for your assembly, programs like VelvetK and VelvetOptimiser try to give you an idea of which values to try. Look at some of the examples in the Velvet manual and try several values; you will see which ones give better results for your assembly, in terms of N50, number of contigs, etc., in the log file Velvet produces at the end of each run. You can also calculate the k-mer coverage: the higher the k-mer length you choose, the lower the k-mer coverage. Velvet actually doesn't do so well with very high coverage. You don't have to worry about choosing the best value to start off with; you will need to run Velvet many times to optimise the assembly.
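To illustrate the coverage point, here is a small sketch using the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the nucleotide coverage, L the read length and k the k-mer length. The function names and the read count in the example are hypothetical; only the 150 bp read length and the 5.5 Mb genome size come from this thread.

```python
def nucleotide_coverage(num_reads, read_length, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return num_reads * read_length / genome_size


def kmer_coverage(nucleotide_cov, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot be longer than the read length")
    return nucleotide_cov * (read_length - k + 1) / read_length


# Hypothetical run: 4 million reads of 150 bp on a 5.5 Mb genome.
cov = nucleotide_coverage(4_000_000, 150, 5_500_000)

# The k-mer coverage drops as k grows, as described above.
for k in (31, 61, 91, 121):
    print(k, round(kmer_coverage(cov, 150, k), 1))
```

This only shows how k-mer coverage falls as k increases; which k gives the best N50 and contig count still has to be found by running the assembly with several values.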

          Hope this helps.



          • #6
            That's exactly the answer I needed, thanks a lot!
