SEQanswers

Old 05-07-2014, 03:58 AM   #1
jyuems
Junior Member
 
Location: Finland

Join Date: May 2014
Posts: 6
CLC Genomics Workbench slow in de novo assembly

Hi!

I have paired-end Illumina genomic data in 4 libraries with insert sizes of 180 bp, 500 bp, 800 bp and 2 kbp. All the libraries are from one sample. They were trimmed and quality-filtered by the sequencing company and are of very high quality.

We installed CLC Genomics Workbench 7 on our computer and are trying to assemble these libraries together into contigs without a reference sequence. Parameters other than the defaults:

Word size: 64
Bubble size: 133
Mapping back to contigs
Perform scaffolding

However, the assembly has been stuck in the mapping phase for days. Is this normal? Mapping back to contigs is expected to be slow, but how slow should it be? Each library is over 10 GB.

Thank you for all the help!
Old 05-07-2014, 08:36 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800

You probably should contact CLC Tech Support for help with this question (http://www.clcbio.com/support/contact/).
Old 05-08-2014, 11:48 PM   #3
Sergioo
Member
 
Location: Japan

Join Date: Oct 2013
Posts: 29

[QUOTE]However the assembly halts for days into the mapping-phase. Is this normal? The mapping back to contigs should be slow, but how slow should it be? The data is over 10 GB per library. Thank you for all the help![/QUOTE]

It is advisable to contact CLC, as suggested by @GenoMax. In my experience, de novo assembly with CLC Genomics Workbench takes ~2 h for 3 GB of data (for a total of 5-7 different samples). I suspect something is going wrong. Try letting the software detect the bubble size and word size automatically and see whether that makes a difference.
Old 05-13-2014, 05:36 AM   #4
jyuems
Junior Member
 
Location: Finland

Join Date: May 2014
Posts: 6

Thank you for the advice!

I noticed that I had made a simple mistake: I imported the R1 and R2 files of each library separately, because the sequencing company did not tell us the minimum and maximum distances for the paired ends. So could it be that the assembly gets stuck simply because the unpaired reads from all the libraries are being mixed together?

We also contacted CLC, and they advised us to run the mapping back to contigs as a separate step, use all libraries for the assembly, and increase the word size.
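
For anyone hitting the same R1/R2 problem: before re-importing the files as paired, it may be worth confirming that the two files of a library are still in sync, i.e. that record n in R1 and record n in R2 carry the same read ID. Below is a minimal Python sketch of such a check. It is not part of CLC; the function names are made up for illustration, and it assumes plain 4-line FASTQ records with either a legacy `/1` `/2` suffix or a space-separated mate field in the header.

```python
from itertools import zip_longest


def read_ids(lines):
    """Yield the read ID of each 4-line FASTQ record (hypothetical helper)."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only header lines carry the read ID
        header = line.strip()
        if not header:
            continue  # tolerate a trailing blank line
        # Drop the leading '@' and anything after the first whitespace.
        name = header[1:].split()[0]
        # Strip a legacy '/1' or '/2' mate suffix if present.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(r1_lines, r2_lines):
    """Return the 0-based index of the first record whose IDs differ,
    or None if both inputs are in sync and have the same number of records."""
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return n
    return None


# Tiny in-memory demo; real use would pass two open file handles instead.
r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n",
      "@read2/1\n", "TTGA\n", "+\n", "IIII\n"]
r2 = ["@read1/2\n", "CGTA\n", "+\n", "IIII\n",
      "@read2/2\n", "GGCA\n", "+\n", "IIII\n"]
print(first_mismatch(r1, r2))
```

With real data the two lists would be replaced by file objects, e.g. `open("lib180_R1.fastq")` and `open("lib180_R2.fastq")` (file names invented here). The sketch streams line by line, so memory use stays small even for files over 10 GB.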
Old 09-26-2014, 08:06 PM   #5
lucio89
Junior Member
 
Location: Ireland

Join Date: Jan 2013
Posts: 7

To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE. Just because it is licensed software doesn't mean it's the best.
Old 10-02-2014, 10:40 PM   #6
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 343

SPAdes is very good as well. In my assemblies (BAC pools), sometimes CLC performed better and sometimes SPAdes did. In most cases I got the best results by assembling in CLC reads that had been error-corrected by SPAdes.

CLC is the least demanding assembler (with regard to the input data) that I have encountered so far; it almost always produces a reasonable assembly no matter which types of data are available. In one of our projects ALLPATHS-LG completely refused to assemble certain parts of a (heterozygous) genome, whereas CLC did assemble them (even though the libraries had been tailored for ALLPATHS-LG).
In my limited experience there are always many different factors at play that influence the assembly metrics, among them hitting the right amount of input sequence data. CLC is comparatively tolerant in this regard as well.

Btw, I always use the maximum word-size now in CLC.

Quote:
Originally Posted by lucio89
To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE. Just because it is licensed software doesn't mean it's the best.

Last edited by luc; 10-02-2014 at 10:43 PM.
Old 10-03-2014, 04:50 PM   #7
lucio89
Junior Member
 
Location: Ireland

Join Date: Jan 2013
Posts: 7

CLC may be fast, but the stats that I have gotten back from SPAdes, even N50, which I rarely rely on, are better! (I don't rely on N50 because it can be negated when proper error correction isn't employed!) It depends on the genome you are assembling and also on the computational power you have (server or desktop computer), but I would always go for something that was developed by someone who is trying to work out the problem rather than by a company that is trying to make money!
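
Since N50 keeps coming up when comparing assemblers, here is a minimal Python sketch of how it is usually computed from a list of contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly size. The function name and the example lengths are invented for illustration; neither CLC nor SPAdes is involved.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths (in kbp) for a quick check.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70: 80 + 70 = 150 covers at least half of 290
```

Running the same function on the contig lengths from two assemblers gives a like-for-like comparison, although N50 on its own says nothing about misassemblies.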
All times are GMT -8.

