  • CLC Genomics Workbench slow in de novo assembly

    Hi!

    I have paired-end Illumina genomic data in 4 libraries with insert sizes of 180 bp, 500 bp, 800 bp and 2 kbp. All the libraries are from one sample; they have been trimmed and quality-filtered by the sequencing company and are of very high quality.

    We got CLC Genomics Workbench 7 for our computer and are trying to assemble these libraries together into contigs without a reference sequence. Parameters other than the defaults:

    Word size: 64
    Bubble size: 133

    Mapping back to contigs

    Perform scaffolding


    However, the assembly halts for days in the mapping phase. Is this normal? Mapping back to contigs is expected to be slow, but how slow should it be? The data is over 10 GB per library.

    Thank you for all the help!

  • #2
    You probably should contact CLC Tech Support for help with this question (http://www.clcbio.com/support/contact/).



    • #3
      [QUOTE]However, the assembly halts for days in the mapping phase. Is this normal? Mapping back to contigs is expected to be slow, but how slow should it be? The data is over 10 GB per library. Thank you for all the help![/QUOTE]

      It is advisable to contact CLC, as suggested by @Genomax. In my experience, de novo assembly in CLC Genomics Workbench takes ~2 h for 3 GB of data (a total of 5-7 different samples). I suspect something is going wrong; try letting the software detect the bubble size and word size automatically and see whether that makes a difference.
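
For a rough sense of scale, the ~2 h per 3 GB figure above can be extrapolated to the original poster's data. This is only a back-of-the-envelope sketch: it assumes runtime grows linearly with input size (assembly and mapping generally do not scale that neatly, and hardware differs), and it treats "over 10 GB per library" as exactly 10 GB for each of the four libraries. The function name is made up for illustration.

```python
def extrapolate_hours(ref_hours, ref_gb, target_gb):
    """Scale a reference runtime linearly with input size (a crude assumption)."""
    if ref_gb <= 0:
        raise ValueError("ref_gb must be positive")
    return ref_hours * target_gb / ref_gb


# Figures taken from this thread: ~2 h for 3 GB; four libraries of ~10 GB each.
total_gb = 4 * 10
estimate = extrapolate_hours(2.0, 3.0, total_gb)
print(f"Linear estimate for {total_gb} GB: {estimate:.1f} h")
```

Under those assumptions the estimate comes out at a bit over a day, so a run that sits in the mapping phase for several days does look unusual, though the real number could differ considerably.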



      • #4
        Thank you for the advice!

        I noticed that I had made a simple mistake: I imported the libraries with R1 and R2 separately, because the sequencing company did not tell us the minimum and maximum distances for the paired ends. Could it simply be that the assembly gets stuck because the unpaired reads from all the libraries are being mixed together?

        We also contacted CLC, and they advised us to do the mapping back to contigs separately, use all libraries for the assembly, and increase the word size.
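
Before re-importing R1 and R2 as pairs, it can be worth confirming that the two files of each library are still in sync (same reads, same order). Below is a minimal sketch of such a check in plain Python. It is not part of CLC; the function names are hypothetical, and it assumes standard 4-line FASTQ records (no wrapped sequences) with read IDs that match once a trailing `/1` or `/2` and any comment after the first whitespace are dropped.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record (plain text or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            # Header lines are every 4th line, starting with the first.
            if line_no % 4 == 0 and line.strip():
                fields = line.strip()[1:].split()  # drop '@' and any comment
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):    # drop old-style mate suffix
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return the 1-based index of the first out-of-sync record, else None."""
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:  # also triggers when one file is shorter
            return index
    return None
```

Calling `first_mismatch("lib180_R1.fastq.gz", "lib180_R2.fastq.gz")` (file names invented here) returns `None` when the pair is consistent. This only checks pairing; it says nothing about the insert-size distances, which still have to be supplied or estimated at import.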



        • #5
          To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE; just because software is licensed doesn't mean it's the best.



          • #6
            SPAdes is very good as well. In my assemblies (BAC pools), sometimes CLC performed better, sometimes SPAdes. In most cases I got the best results by assembling in CLC reads that had been error-corrected by SPAdes.

            CLC is the least demanding (with regards to the input data) assembler I have encountered so far; it almost always produces a reasonable assembly no matter which types of data are available. In one of our projects Allpaths completely refused to assemble certain parts of a (heterozygous) genome - CLC did (with the libraries being tailored for Allpaths LG).
            In my limited experience there are always many different factors at play which influence the assembly metrics - among them hitting the right amount of input sequence data. CLC is comparatively tolerant in this regard as well.

            Btw, I always use the maximum word-size now in CLC.

            Originally posted by lucio89
            To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE; just because software is licensed doesn't mean it's the best.
            Last edited by luc; 10-02-2014, 09:43 PM.



            • #7
              CLC may be fast, but the stats that I have gotten back, even N50 (which I rarely rely on), are better! (I don't rely on N50 because it can be undermined when proper error correction isn't employed.) It depends on the genome you are assembling and on the computational power you have (server or desktop computer), but I would always go for something developed by someone who is trying to work out the problem rather than by a company that is trying to make money!
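
For readers unfamiliar with the metric, here is a minimal sketch of how N50 is usually computed from a list of contig lengths. The function name and the example lengths are made up for illustration; they are not results from this thread.

```python
def n50(contig_lengths):
    """Return N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Invented example: seven contigs, 300 bases in total.
contigs = [100, 80, 50, 30, 20, 10, 10]
print(n50(contigs))       # 80

# Joining everything into one contig, correctly or not, raises N50.
print(n50([sum(contigs)]))  # 300
```

The second call shows why the number should be read with care: N50 looks only at contig lengths, so an assembly with wrongly joined contigs can score higher than a more fragmented but more accurate one.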

