
  • #16
    Originally posted by genetics_jo
    My question is this, is this length of time normal for velvetg for assembling such a large dataset or has velvetg just run into a continuous loop and it will never run to completion?
    I worked with Velvet several years ago and discussed its code and parameters on our blog many times back then.

    e.g.

    Appropriate choice of the ‘exp_cov’ (expected coverage) parameter in Velvet is very important to get an assembly right. In the following figure, we show results from taking a set of reads from a 3 kb region of a genome and reassembling them with varying exp_cov values. The x-axis of the chart shows exp_cov and the y-axis shows the size of the largest scaffold assembled by Velvet.


    If you have used the Velvet genome assembler, you have possibly noticed a file named ‘Roadmaps’ being created by the ‘velveth’ program. Here is a brief explanation of the ‘Roadmaps’ file format by Daniel Zerbino, the author of Velvet.


    About a year back, we briefly explained the format of the Roadmaps file generated by the Velvet assembly program. Our explanation was brief and not very helpful for understanding all entries in the Roadmaps file. Reader SRB asked us to provide more details by working through an example.


    The problem with Velvet (especially velvetg) is that it is not at all optimized for large genomes, and the time it takes to produce output can be unpredictable. Moreover, you can trust its contigs, but not its scaffolds. Contigs like those Velvet produces can easily be generated by SOAPdenovo2 or Minia in much less time. For example, with the hardware you are describing, SOAPdenovo2 will give you output in hours, not days.

    I know this is not the answer you asked for, and you already mentioned using other assemblers.
    Last edited by samanta; 04-11-2014, 03:38 PM.
    http://homolog.us



    • #17
      Originally posted by genetics_jo
      One other question...I've seen some folks say the paired end fastq files need to be merged together into a single file for "shortPaired" use in Velvet...and seen some say that the two paired end files need to be kept separate and let velvet read and coordinate reads. Which one is it? For example if I have files Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, should these two files be merged together or kept separately for velvet to work properly?
      In the version I used two years ago, when trying to assemble a large genome (~600 MB in size), the paired reads had to be merged into one file, with the two mates of each pair on consecutive records:

      FASTA Line 1-2 (read1 left)
      FASTA Line 3-4 (read1 right)

      etc.
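The merged layout described above can be produced by interleaving the two mate files. Below is a minimal Python sketch, not part of Velvet. It assumes FASTQ input with exactly four lines per record (the layout above describes the two-line FASTA case) and mates listed in the same order in both files. The function names and the output file name are hypothetical.

```python
from itertools import islice, zip_longest


def read_fastq(path):
    """Yield FASTQ records as lists of 4 lines (assumes 4 lines per record)."""
    with open(path) as handle:
        while True:
            # Normalize line endings so every line ends with a single newline.
            record = [line.rstrip("\r\n") + "\n" for line in islice(handle, 4)]
            if not record:
                return
            if len(record) != 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            yield record


def interleave_fastq(left_path, right_path, out_path):
    """Write left and right mates alternately; return the number of pairs."""
    pairs = 0
    with open(out_path, "w") as out:
        mates = zip_longest(read_fastq(left_path), read_fastq(right_path))
        for left_rec, right_rec in mates:
            # zip_longest pads the shorter file with None, which exposes
            # files that do not hold the same number of reads.
            if left_rec is None or right_rec is None:
                raise ValueError("the two files hold different numbers of reads")
            out.writelines(left_rec)
            out.writelines(right_rec)
            pairs += 1
    return pairs
```

For the files named in the question, the call would look like `interleave_fastq("Humulus_lane1_read1_1.fastq", "Humulus_lane1_read1_2.fastq", "Humulus_lane1_interleaved.fastq")`. The merged file could then be given to velveth with the `-shortPaired -fastq` options. The sketch does not check that read names match between the two files, so the inputs must already be in the same order.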

      The difficulty I faced was that the scaffolds changed unpredictably with small changes in the input parameters (exp_cov). You cannot just run everything once, press a button and trust the output.

      That led me to move on to other assemblers. Also, I make sure I understand the code/algorithm of any assembler I use.
      Last edited by samanta; 04-11-2014, 03:41 PM.
      http://homolog.us



      • #18
        Originally posted by genetics_jo
        That's also what I've observed with the previous runs of velvet. The program is still "running" but RAM use and % of processor have remained the same now for several days. Would have thought if it wasn't going to work it would have crashed?

        A large part of the work is in removing 'tips' and 'bubbles' and then simplifying the graph. During that phase you do not see any input/output.
        http://homolog.us
