  • samanta
    Senior Member
    • Feb 2010
    • 108

    #16
    Originally posted by genetics_jo
    My question is this: is this length of time normal for velvetg when assembling such a large dataset, or has velvetg run into an endless loop that will never finish?
    I worked with Velvet several years ago and discussed its code and parameters on our blog many times in those days.

    e.g.

    Choosing the ‘exp_cov’ (expected coverage) parameter appropriately is very important for getting a Velvet assembly right. In the following figure, we show data from a calculation in which we took a set of reads from a 3 kb region of a genome and reassembled them with varying exp_cov values. The x-axis of the chart shows exp_cov and the y-axis shows the size of the largest scaffold assembled by Velvet.


    If you have used the Velvet genome assembler, you may have noticed a file named ‘Roadmaps’ created by the ‘velveth’ program. Here is a brief explanation of the ‘Roadmaps’ file format, as explained by Daniel Zerbino, the author of Velvet.


    About a year ago, we briefly explained the format of the Roadmaps file generated by the Velvet assembly program. That explanation was too short to make every entry in the Roadmaps file understandable. Reader SRB asked us to provide more details by working through an example.


    The problem with Velvet (especially velvetg) is that it is not at all optimized for large genomes, and the time it takes to produce output can be unpredictable. Moreover, you can trust its contig step, but not its scaffolding. However, contigs like the ones Velvet produces can easily be generated by SOAPdenovo2 or Minia in much less time. For example, with the hardware you describe, SOAPdenovo2 will give you output in hours, not days.

    I know this is not the answer you asked for, and you already mentioned using other assemblers.
    Last edited by samanta; 04-11-2014, 03:38 PM.
    http://homolog.us


    • samanta
      Senior Member
      • Feb 2010
      • 108

      #17
      Originally posted by genetics_jo
      One other question...I've seen some folks say the paired end fastq files need to be merged together into a single file for "shortPaired" use in Velvet...and seen some say that the two paired end files need to be kept separate and let velvet read and coordinate reads. Which one is it? For example if I have files Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, should these two files be merged together or kept separately for velvet to work properly?
      In the version I worked with two years ago, when trying to assemble a large genome (~600 Mb), the paired reads had to be merged into one file.

      FASTA Line 1-2 (read1 left)
      FASTA Line 3-4 (read1 right)

      etc.
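A minimal sketch of that merge step in Python, assuming FASTQ input (four lines per read rather than the two FASTA lines shown above) and assuming both files list the mates in the same order. The function names are made up for this illustration and are not part of Velvet.

```python
from itertools import islice
from typing import Iterator, List, TextIO


def read_fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = [line.rstrip("\n") + "\n" for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave_fastq(left: TextIO, right: TextIO, out: TextIO) -> int:
    """Write left/right mates alternately to `out`; return the number of pairs.

    Assumes (not checked here) that record i in `left` is the mate of
    record i in `right`.
    """
    right_records = read_fastq_records(right)
    pairs = 0
    for left_record in read_fastq_records(left):
        right_record = next(right_records, None)
        if right_record is None:
            raise ValueError("right file has fewer reads than left file")
        out.writelines(left_record)   # read i, left mate
        out.writelines(right_record)  # read i, right mate
        pairs += 1
    if next(right_records, None) is not None:
        raise ValueError("right file has more reads than left file")
    return pairs
```

With the files from the question, this would be called on open handles for Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, writing to a third file whose name is up to you. Reading one record at a time keeps memory use flat, which matters for a dataset this size.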

      The difficulty I faced was that the scaffolds changed completely unpredictably with small changes in input parameters such as exp_cov. It is not as if you can run everything once, press a button, and trust the output.
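One way to see that sensitivity for yourself is to rerun velvetg over a range of exp_cov values and compare the scaffolds. The sketch below only builds the command lines and does not run anything. The directory name, the exp_cov values, and the insert length are placeholders, and `velvetg_commands` is a hypothetical helper, not something shipped with Velvet.

```python
from typing import List, Sequence


def velvetg_commands(
    assembly_dir: str,
    exp_cov_values: Sequence[float],
    ins_length: int,
    cov_cutoff: str = "auto",
) -> List[List[str]]:
    """Build one velvetg argument list per exp_cov value.

    Note: velvetg writes its results (e.g. contigs.fa) into `assembly_dir`,
    so copy them elsewhere between runs or later runs will overwrite them.
    """
    commands = []
    for exp_cov in exp_cov_values:
        commands.append([
            "velvetg", assembly_dir,
            "-exp_cov", str(exp_cov),
            "-ins_length", str(ins_length),
            "-cov_cutoff", cov_cutoff,
        ])
    return commands


# Placeholder values for illustration only.
for cmd in velvetg_commands("asm_k31", [10, 20, 30], ins_length=300):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` once you are ready to execute it. Keeping every other option fixed makes it easier to attribute differences in the scaffolds to exp_cov alone.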

      That led me to move on to other assemblers. Also, I make sure I understand the code/algorithm of any assembler I use.
      Last edited by samanta; 04-11-2014, 03:41 PM.
      http://homolog.us


      • samanta
        Senior Member
        • Feb 2010
        • 108

        #18
        Originally posted by genetics_jo
        That's also what I've observed with the previous runs of velvet. The program is still "running" but RAM use and % of processor have remained the same now for several days. Would have thought if it wasn't going to work it would have crashed?

        A large part of the work is in removing 'tips' and 'bubbles' and then simplifying the graph. During that stage you will not see any input/output activity, even though the program is still working.
        http://homolog.us
