SEQanswers

Old 09-21-2013, 08:09 AM   #1
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
CLC workbench sucks ...

I took advantage of CLC's offer last spring to get a free 6-month trial of the full version of their Genomics Workbench. Now it is coming to an end and I need to decide whether it is worth paying for. So far it has been a mostly frustrating experience, but I wonder if others are more satisfied.

The main problem that keeps me away from commercial, supposedly more user-oriented and conveniently integrated software (that is what you are paying for, right?) is that in practice it is the total opposite. Once the juggernaut is assembled and sold to you, then you, the stupid end user, should not even try to question the programmers' logic behind its beautiful design, as you are simply not capable of grasping its beauty from your end-user point of view. That has always been my experience when I tried to ask: why this way and not that way? Can I do this? Why not, since the restriction makes little sense?

For example, why do I have to waste storage space importing gigabytes of my existing databases just to create yet another database in a program-specific format that no other software can read? For the convenience of associated metadata? OK, but why does this convenient format not allow a reference assembly against selected genome regions, when open-source software allows that with less convenient data formats? No, you have to use the entire genome - it will be slow, though. Nice.

Illumina software very rarely, but occasionally, throws bizarre characters into reads - @, F, or Q - which may appear in a few reads out of, say, 20 million. There is absolutely no justification for user-friendly software not handling that: upon import into its wonderfully convenient proprietary format, it labels the entire 20+ million read FASTA dataset as protein, leaving the end user no choice but to search the dataset manually and remove those 1-2 bizarre reads just to make it usable. No, you cannot question the logic behind that - a standard, however stupid, cannot be changed for an end user, but we do strive to meet expectations.
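For the record, stripping those reads beforehand takes only a few lines - a rough Python sketch (my own workaround, not anything CLC provides; it assumes well-formed 4-line FASTQ records and keeps only reads whose sequence is strictly A/C/G/T/N, and the filenames are placeholders):

```python
import re

# Only plain DNA letters (plus N) survive; a stray '@', 'F' or 'Q'
# in the sequence line disqualifies the whole record.
VALID = re.compile(r"^[ACGTNacgtn]+$")

def clean_fastq(lines):
    """Yield only 4-line FASTQ records whose sequence line is valid DNA."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        if VALID.match(seq.strip()):
            yield from (header, seq, plus, qual)

# usage:
# with open("reads.fastq") as fin, open("reads.clean.fastq", "w") as fout:
#     fout.writelines(clean_fastq(fin))
```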

Well, all right - at least you pay for well-documented and well-maintained software. Yep, the manual is intimidating: hundreds of pages. Not very useful, though, as the index at the end is way off, the illustrations and examples do not correspond to what you see on the screen, and selections you are told to make are missing - you have got to be kidding me.

The last thing I tried to do was simply import an annotated hg19. Not a big deal - you download hg19, then RefGene.txt, and then connect them to each other so you can see which specific genes your reference assembly matches, the way it is done simply and conveniently in 454 GS RefMapper. No way. After downloading hg19 overnight (again!) from somewhere, I ended up with nothing but sequences separated by chromosomes. Big help. Why I cannot just use my existing hg19.fna and RefGene.txt the manual does not say - at least I could not find it there, if anything can ever be found in that messed-up volume.
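And what I actually wanted is not rocket science: pair a position with a gene straight from the existing RefGene.txt. A sketch (assuming the standard UCSC refGene column layout - chrom in column 2, txStart/txEnd in columns 4/5, gene symbol in column 12; the example rows are made up):

```python
# Column layout assumed (standard UCSC refGene.txt):
#   f[2] = chrom, f[4] = txStart, f[5] = txEnd, f[12] = gene symbol (name2)
def load_refgene(lines):
    """Map chrom -> list of (txStart, txEnd, gene symbol)."""
    genes = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        genes.setdefault(f[2], []).append((int(f[4]), int(f[5]), f[12]))
    return genes

def genes_at(genes, chrom, pos):
    """Gene symbols whose transcript span covers a 0-based position."""
    return sorted({name for start, end, name in genes.get(chrom, ())
                   if start <= pos < end})

# Made-up example rows in the same column layout:
rows = ["0\tNM_0001\tchr1\t+\t100\t500\t150\t450\t3\t.\t.\t0\tGENE_A\t.\t.\t.",
        "0\tNM_0002\tchr1\t-\t400\t900\t420\t880\t2\t.\t.\t0\tGENE_B\t.\t.\t."]
```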

Sorry for the long post - this has been accumulating for six months, to the point where I cannot hold it in anymore. But if someone has had a better experience, please give me a few good reasons why this juggernaut is worth paying for.
Old 09-21-2013, 08:35 AM   #2
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

It isn't worth paying for. You can do everything and more with free software.
Old 09-22-2013, 07:11 PM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Commercial software has its place and applications (otherwise such companies would not exist). I know people who are extremely satisfied because CLC easily does what they need for their projects, but I also know others whose experiences are similar to yours.

No one can give you a good reason to pay for any commercial software without a complete understanding of your exact requirements/expectations (which would be hard to do via a forum). You need to make the decision to purchase (or not) based on the best data you have on hand.
Old 09-22-2013, 11:15 PM   #4
HenrivdGeest
Member
 
Location: Arnhem

Join Date: Feb 2012
Posts: 16

Like the posters before me said: you don't HAVE to buy it.

For me and some colleagues, CLC really is accelerating research. It's mostly the visualization part that helps. As a bioinformatician, I usually use it for quick-and-dirty testing, as well as for presenting data to other people within our organisation. With CLC we have (more than once) spotted problems in our data simply because everything is visualized. When you don't know where to look, that is quite handy.

But I also run into a lot of annoying things/bugs - though I think that because it's paid software, people tend to get angry about them more easily. To people (researchers, non-bioinformaticians) who don't know CLC yet, I always say: give it a try and you will see whether you like it or not.

And yes, everything could also be done with freeware/open-source software; personally I always choose between CLC and other tools. Sometimes CLC is handy, sometimes the command line.
Old 09-23-2013, 09:47 AM   #5
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

One thing that I've learned from all commercial bioinformatics software (not just CLC - they are probably the best of the bunch):

Quote:
It is really hard to build software that makes difficult things easier.
At some point a scientist is just going to have to learn how to code - there are just too many ways for an analysis to go off the beaten path and break a pre-packaged, canned, off-the-shelf suite.
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
Old 09-24-2013, 05:10 AM   #6
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

Quote:
Originally Posted by yaximik View Post
For example, why do I have to waste storage space importing gigabytes of my existing databases just to create yet another database in a program-specific format that no other software can read? For the convenience of associated metadata? OK, but why does this convenient format not allow a reference assembly against selected genome regions, when open-source software allows that with less convenient data formats? No, you have to use the entire genome - it will be slow, though. Nice.
Not that I want to defend CLC - I have no idea precisely why they do this either - but I am also guilty of using my own internal format for Gap5, requiring a full import. Why? Because BAM isn't a good format for editing.

Imagine a scenario where you are working on a de novo assembly and need to join two contigs containing 20 million sequences each. With BAM, the resulting contig requires updating the positions of 20 million sequences - and possibly reverse-complementing 20 million too, if that is what the match indicated. Or... you could use a format that stores all data in a recursive R-tree and requires only a handful of updates to achieve the same task. [1]
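The asymmetry is easy to see in miniature - a toy Python sketch of the flat case (a real store keeps reads on disk, of course, and these helper names are mine):

```python
def join_contigs_flat(contig_a, contig_b):
    """Append contig_b after contig_a; every read is (position, sequence).

    In a flat position-sorted store (BAM-like), this rewrites the position
    of EVERY read in contig_b -- 20 million updates for a 20M-read contig.
    A nested-range tree (Gap5's R-tree) records one offset at the parent
    node instead, so the join touches only a handful of nodes.
    """
    offset = max(pos + len(seq) for pos, seq in contig_a) if contig_a else 0
    return contig_a + [(pos + offset, seq) for pos, seq in contig_b]
```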

This is for Gap5, which does de novo editing work (and is pretty rubbish at reference-based annotation tasks). I don't know how that relates to CLC, but sometimes there is a good programmer's justification for doing something that seems daft.

James

[1] With hindsight I should have built an overlay system that uses BAM as the backend, with only an import and indexing layer on top to permit rapid movement and restructuring of data. But that's another level of complexity.
Old 09-24-2013, 06:09 AM   #7
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

We've been using CLC heavily for a few years now, and while it's nowhere near perfect, it does have its merits.

Firstly, having a GUI lets people who are new to bioinformatics start analyzing data quickly without having to wade into command-line processing. I know that using command-line programs isn't very difficult, but for most people it is very intimidating at first, and it can take quite a while before they fully understand some of the esoteric error messages or faults that can occur.

Also, there are a number of people who don't want to get that deeply involved in bioinformatics; they just want to analyze their data quickly so they can move their experiments along. For these people, CLC offers a convenient package that lets them run nearly all standard processing methods without getting bogged down in details. It's a valid argument - one I've made before - that if you really want to do good work you should have a good idea of how the program you use works, but the reality is that most people just want the end result and not to see how the sausage is made.

Second, configuring software for your particular system is often not trivial, and CLC provides a multitude of tools in one complete package that works pretty much without fail across all three major operating systems. For labs or research groups with a mix of computing systems, having one piece of software that looks and acts the same straight out of the box makes it easier to move data around and let people collaborate. Yes, there are non-commercial tools that can achieve the same thing - Galaxy, for instance - but generally they are less powerful or more complex to set up and use. Also, for labs that only use Windows, many command-line programs are unavailable or require far more configuration than on Mac or Linux. Since Windows is still the dominant OS, particularly because of Office, CLC offers a data-analysis solution that may not otherwise be available.

Third, because everything is in a single package, you can track how your data was manipulated and trace back from an analysis file to the original read data. Command-line programs don't offer this unless you take very good notes or keep your own processing logs of which files went into a program and what came out. This is particularly useful when you process something multiple ways to see how different options affect the result. The tracking also helps when someone who has left the lab kept their data in CLC and someone new has to take parts of it and do something else - which happens quite frequently in a lot of labs.
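For the command-line route, the poor man's version of that tracking is a hand-rolled provenance log - a sketch; the JSON-lines format and field names here are just my own choice, and the filenames in the usage comment are placeholders:

```python
import json, time

def log_step(logfile, command, inputs, outputs):
    """Append one JSON record describing a pipeline step to `logfile`."""
    record = {"time": time.strftime("%Y-%m-%d %H:%M:%S"),
              "command": command, "inputs": inputs, "outputs": outputs}
    with open(logfile, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

# usage (filenames are placeholders):
# log_step("pipeline.log", "bwa mem ref.fa reads.fq > out.sam",
#          ["ref.fa", "reads.fq"], ["out.sam"])
```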

Now, having said all that, I do have my fair share of complaints about CLC, and if it were just me I wouldn't consider it worth it. The only commercial software I have purchased with my own funds is Geneious, because it is much better than CLC at the simple sequence and genome editing I prefer to do with a GUI-based program (it's also a heck of a lot cheaper). Beyond that I mostly use command-line programs, as I prefer the greater level of control - but then again, I also have more experience with such things than everyone else in my lab, so while that works for me it doesn't work for them.

Bottom line: CLC has its merits, but based on your rant it seems you'd rather stick with command-line tools. If that's the case, that's fine - but no one is forcing you to buy CLC or any other commercial software package.
Old 09-24-2013, 10:38 AM   #8
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

I totally agree with jkbonfield and mcnelson.phd.

.. as a side note, I have never found any weird characters like @, F or Q in the sequence portion of my Illumina FASTQ files ...
Old 09-24-2013, 01:03 PM   #9
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

Hi

I have been using CLC for some time and was wondering whether anyone has compared metrics between CLC and other aligners.

I found something interesting and wanted to know if anybody else has observed it. We took some RNA-seq data (100 bp paired-end reads) and aligned it using the latest CLC and TopHat.

We then took a 10,000 bp region from both BAM files and looked at the number of reads aligned and the accuracy of the alignments.

So far, CLC aligns more reads to the same region than TopHat (11,200 vs. 3,500). Now, coming to the big question of accuracy: we found twice as many pairs in CLC as in TopHat (3,346 vs. 1,640 pairs). So the question is: how is CLC doing it? Mind you, it's only one region.
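Taking our counts at face value, the proportions are the interesting part (quick Python arithmetic, numbers copied from above):

```python
# Counts from the 10,000 bp region above, taken at face value
clc_reads, tophat_reads = 11200, 3500
clc_pairs, tophat_pairs = 3346, 1640

read_ratio = clc_reads / tophat_reads            # ~3.2x more reads in CLC
pair_ratio = clc_pairs / tophat_pairs            # ~2.0x more pairs in CLC
clc_pair_rate = clc_pairs / clc_reads            # pairs per mapped read, CLC
tophat_pair_rate = tophat_pairs / tophat_reads   # pairs per mapped read, TopHat
```

So CLC maps about 3.2x the reads but only about 2x the pairs: per mapped read, its pairing rate is lower (~0.30 vs ~0.47), which is exactly where I would start digging on accuracy.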

If anyone can suggest tests to run, that would be very helpful.

cheers
newbie
Old 09-24-2013, 02:43 PM   #10
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Quote:
Originally Posted by newbietonextgen View Post
Hi
So far, CLC aligns more reads to the same region than TopHat (11,200 vs. 3,500). Now, coming to the big question of accuracy: we found twice as many pairs in CLC as in TopHat (3,346 vs. 1,640 pairs). So the question is: how is CLC doing it? Mind you, it's only one region.
CLC put out a white paper not too long ago (in the past year, around when version 6 was released, if I remember correctly) detailing how their read mapper was more accurate and mapped more reads than Bowtie and BWA. I never delved into the details, but I can also attest that CLC maps more reads to a reference sequence than bowtie/bowtie2. In many cases I find this is because the reference is circular, and bowtie doesn't seem to handle that case very well. They may also have a greedier algorithm, although that doesn't appear to be the whole story. Either way, your findings are right that CLC maps more reads... the question remains whether they are all mapped accurately.
Old 09-24-2013, 03:49 PM   #11
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Quote:
Originally Posted by mcnelson.phd
Also, there are a number of people who don't want to get that deeply involved in bioinformatics; they just want to analyze their data quickly so they can move their experiments along. For these people, CLC offers a convenient package that lets them run nearly all standard processing methods without getting bogged down in details. It's a valid argument - one I've made before - that if you really want to do good work you should have a good idea of how the program you use works, but the reality is that most people just want the end result and not to see how the sausage is made.
There is a real danger of doing it wrong, particularly when you combine a "just give me the end result" attitude with easy-to-use software. A lot of people think that simply because they can do something in a program and get a result, the result must be right. Someone who is at least forced to learn something about the program may be pushed to think more critically about it, or to seek advice from those who do.
Old 09-24-2013, 06:18 PM   #12
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

Quote:
Originally Posted by mcnelson.phd View Post
CLC put out a white paper not too long ago (in the past year, around when version 6 was released, if I remember correctly) detailing how their read mapper was more accurate and mapped more reads than Bowtie and BWA.

I will look into the white paper. Is there a way to assess the accuracy of an alignment - any metrics, suite, or workflow that can be used?

Thanks
newbie
Old 09-25-2013, 04:21 AM   #13
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Quote:
Originally Posted by newbietonextgen View Post
I will look into the white paper. Is there a way to assess the accuracy of an alignment - any metrics, suite, or workflow that can be used?
To find the white paper, just Google "CLC read mapping white paper"; it should be the first hit.

I don't know of any good single metric for accuracy off the top of my head, because accuracy requires knowing where the reads should map. In most cases, the number of multiply-mapped reads and the number of differences between the reads and the consensus can be a decent quality indicator - but only if you know there are no repetitive elements and no sequence variants between the reads and the reference. Sequencing noise complicates things further: in some cases you would rather have noisy reads left unmapped than mapped, for instance if you are hunting low-frequency variants. It's a bit like judging an assembly by its N50: you can use the value, but it really doesn't tell you much and may be misleading...
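The one clean way to get an accuracy number is to control the truth: simulate reads from a known reference (tools like wgsim do this), map them, and score how many land where they came from. The scoring skeleton is tiny - a sketch, with the 5 bp tolerance an arbitrary choice of mine:

```python
def mapping_accuracy(read_placements, tol=5):
    """Fraction of simulated reads mapped within `tol` bases of their true origin.

    read_placements: list of (true_pos, mapped_pos); mapped_pos is None
    for unmapped reads, which count as incorrect.
    """
    correct = sum(1 for true_pos, mapped_pos in read_placements
                  if mapped_pos is not None and abs(true_pos - mapped_pos) <= tol)
    return correct / len(read_placements)
```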
Old 09-25-2013, 05:28 AM   #14
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

To be honest, when it comes to bioinformatics, I think all GUI-driven programs suck in comparison to command line alternatives (think e.g. parallelization, piping output from one program into another, and handling of million row tables). I understand the value of e.g. Geneious for people who can't be bothered to learn how to function at the command line, but then, those people aren't very serious bioinformaticians to begin with.
__________________
savetherhino.org
Old 09-25-2013, 05:43 AM   #15
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Quote:
Originally Posted by rhinoceros View Post
To be honest, when it comes to bioinformatics, I think all GUI-driven programs suck in comparison to command line alternatives (think e.g. parallelization, piping output from one program into another, and handling of million row tables). I understand the value of e.g. Geneious for people who can't be bothered to learn how to function at the command line, but then, those people aren't very serious bioinformaticians to begin with.
That's a very ignorant position to take. Simply having a GUI front end that makes working with and analyzing data easier doesn't make a tool less complex or powerful. Do you use a GUI-based operating system? If so, your comment can't be taken seriously, because it's the same difference. Command-line programs are great, but they're not automatically better just because they lack a GUI and are harder to use.

Further, would you say that something like IGV sucks because it provides a GUI for looking at mapping files? Where do you draw the line - if it's commercial software, it must be bad? As I said earlier, programs like CLC can make it too easy to do bad analyses, but that isn't the program's fault; plenty of good studies have been done with CLC. In fact, it's probably more likely that someone does bad science with command-line programs that aren't user-friendly and have incomplete or incomprehensible documentation. The fact is that high-throughput sequencing has become a standard tool, like Sanger sequencing before it, which means many more labs and people will be working with such data in the future. It's incumbent on those of us who are good bioinformaticians to help design and provide tools that let these newcomers analyze their data accurately and reliably - and that's what CLC tries to do. You don't blame a car manufacturer for bad drivers, so don't do the same with bioinformatics tools.
Old 09-25-2013, 07:40 AM   #16
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

I like CLC Bio (except for its price). It has tools for virtually any type of molecular biology analysis. The NGS tools are very powerful and work extremely well. The microarray analysis tools are weak, but I don't know of another package that does both.
Old 09-25-2013, 09:18 AM   #17
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 451

To the OP,

I guess you have to face the fact that CLC, too, has a learning curve.
Yes, you duplicate your data when importing it into CLC, and it would be very nice if one could carry out some simple manipulations - like generating a random subset - on data that is already loaded.
The latest version fortunately has added a batch export tool - which was very much needed.

As mentioned before, the ability to visualize your data immediately at any point of your analysis is the big advantage of tools like CLC and I guess Geneious - even for people comfortable with command line tools.

I have not run into the problems you mention and do not understand your question about the assembly.
I did benchmark the CLC aligner against the tools on the www.bioplanet.com/gcat website. With default settings, CLC aligns more reads than both BWA-MEM and Bowtie2, albeit with a slightly higher error rate; obviously it would be easy to use more stringent aligner settings. BTW, the standard CLC aligner is fast and maps long Moleculo reads just fine (their large-gap mapper, in contrast, is rather slow).

Last edited by luc; 09-25-2013 at 09:30 AM.
Old 09-25-2013, 12:16 PM   #18
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Quote:
Originally Posted by NextGenSeq View Post
The microarray analysis tools are weak, but I don't know of another package that does both.
GeneSpring and Partek Genomics Suite can analyze multiple types of genomic data. They started as microarray analysis tools but over the years have added the ability to analyze NGS data. (Small print: for GeneSpring the NGS module requires a separate license - there is even a proteomics module - whereas Partek has a single license covering the entire suite.)

Just to be clear: GeneSpring and Partek are not end-to-end solutions like CLC (there is a separate program, Partek Flow, which is more like CLC but needs a server to run). One has to do alignments independently with a suitable aligner; GeneSpring and Partek can then ingest the BAM files for further statistical analysis.

Last edited by GenoMax; 09-25-2013 at 12:31 PM.
Old 09-26-2013, 12:29 AM   #19
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

Quote:
Originally Posted by mcnelson.phd View Post
Further, would you say that something like IGV sucks because it provides a GUI interface for looking at mapping files? Where do you draw your limits,
rhinoceros is talking about "command line alternatives". IGV has no command-line alternative, so for (complex) visualization GUIs are more or less necessary. For mapping/assembly work you are usually more flexible with the command-line versions.
Old 09-26-2013, 06:25 AM   #20
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

Quote:
Originally Posted by GenoMax View Post
GeneSpring and Partek Genomics Suite can analyze multiple types of genomic data. They started as microarray analysis tools but over the years have added the ability to analyze NGS data. (Small print: for GeneSpring the NGS module requires a separate license - there is even a proteomics module - whereas Partek has a single license covering the entire suite.)

Just to be clear: GeneSpring and Partek are not end-to-end solutions like CLC (there is a separate program, Partek Flow, which is more like CLC but needs a server to run). One has to do alignments independently with a suitable aligner; GeneSpring and Partek can then ingest the BAM files for further statistical analysis.
Partek is pretty awful and GeneSpring costs as much as CLC Bio.