SEQanswers

Old 01-25-2011, 02:33 AM   #1
figure002
Member
 
Location: Rotterdam

Join Date: Jan 2011
Posts: 11
CLC Genomics Workbench - Windows vs. Linux

Hello everyone. I'm a bioinformatics student from Holland and my internship supervisor just told me he's thinking about ordering a license for CLC Genomics Workbench. He asked me if analyses would run much faster if he ran it on Linux. I know Linux can be much faster in some situations (e.g. web servers), but I have no idea when it comes to data analysis with tools like this.

Do any of you have experience with this? Does Linux have advantages / disadvantages over Windows when it comes to doing data analysis with CLC Genomics Workbench (or similar tools)? And if Linux were significantly faster, would that mean we could purchase a computer with less RAM to save costs?
Old 01-25-2011, 02:59 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

I have CLC for Linux and Windows, but have never benchmarked them on the same machine. My feeling is that Linux would not be that much faster, if at all.
Linux might be more efficient with memory and stress the machine a little less.

Linux has the huge advantage for bioinformatics in that most tools are written for it.
Old 01-25-2011, 03:02 AM   #3
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

Hoi figure002,

We run CLC (and more) on an Ubuntu 10.8 x86_64 server with 24 cores and some 47G ram. I am not a CLC user so I can't really give you more details on its performance. People here say that it has trouble with the HiSeq data they feed it because it's just too much, despite the server. They then try to align their reads on one chromosome instead of the entire reference, which I think introduces false positives.

I'd think that the amount of memory is more important than the OS.

Cheers
Old 01-25-2011, 06:38 AM   #4
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

Yes, the amount of RAM is the important thing. You need a minimum of 16GB and more is better.
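
If you want to see what a given machine actually has before sizing it against the numbers in this thread, here is a minimal sketch in Python. The function names and the 16 GiB default are only illustrative (the 16 comes from the post above), and os.sysconf exists only on Unix-like systems, so on Windows the RAM lookup simply returns None.

```python
import os


def cpu_count():
    """Number of logical CPUs visible to Python (at least 1)."""
    return os.cpu_count() or 1


def total_ram_bytes():
    """Physical RAM in bytes, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # No os.sysconf (e.g. Windows) or the names are unsupported here.
        return None


def meets_minimum(ram_bytes, min_gib=16):
    """True if ram_bytes is known and at least min_gib GiB."""
    return ram_bytes is not None and ram_bytes >= min_gib * 1024**3


if __name__ == "__main__":
    ram = total_ram_bytes()
    print("logical CPUs:", cpu_count())
    print("RAM (GiB):", "unknown" if ram is None else round(ram / 1024**3, 1))
    print("meets 16 GiB floor:", meets_minimum(ram))
```

A machine sold as "16 GB" may report slightly less than 16 GiB to the OS, so treat the threshold as approximate rather than a hard pass/fail.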
Old 01-25-2011, 07:49 AM   #5
figure002
Member
 
Location: Rotterdam

Join Date: Jan 2011
Posts: 11

Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

"The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

(This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)
Old 01-26-2011, 07:45 AM   #6
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

Quote:
Originally Posted by figure002 View Post
This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..
Welcome to the wonderful world of NGS!

If you don't have one of those and buying is no option, there are compute clusters out there. Check for example SARA or ask the NBIC for information. You're not the only one dealing with large quantities of data and expensive computations in our region :P

ps I'm also interested in your status as 'bioinformatics student': HBO or master's internship? Which uni? I've barely lost the bioinformatics student status myself...

Last edited by Bruins; 01-26-2011 at 07:47 AM.
Old 01-26-2011, 01:42 PM   #7
figure002
Member
 
Location: Rotterdam

Join Date: Jan 2011
Posts: 11

Ahh, computer clusters, that's probably one of the things I'll learn about in my specialisation "high throughput" which starts in about 2 weeks. I just finished my internship with an awesome grade.

PS. I'm a junior at the Leiden University of Applied Sciences (Hogeschool Leiden) and I'm working towards my bachelor's degree. Can't wait to finally get started and earn some money. Where did you study?
Old 01-26-2011, 05:25 PM   #8
DNASTAR
Registered Vendor
 
Location: Madison, WI

Join Date: Aug 2010
Posts: 48

Hi figure002,

I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).

If you are interested in learning more, you can check out our website, or message me and I can arrange for a free trial of the software.

Thanks,
Anne
Old 01-27-2011, 12:10 AM   #9
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

@figure002: I PMed you to avoid slow chat in this thread
Old 01-27-2011, 12:27 AM   #10
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

Quote:
Originally Posted by DNASTAR View Post
Hi figure002,

I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).
This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic denovo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).
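
To make the first step of such a hybrid concrete: in SAM output, bit 0x4 of the FLAG field marks a read as unmapped, so separating "the bits that map" from the rest is a simple filter. A rough sketch in Python, assuming plain-text SAM lines as input; the function name is made up for illustration, and the extension and de novo steps are not shown.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def split_by_mapping(sam_lines):
    """Split SAM alignment lines into (mapped, unmapped) lists.

    Header lines (starting with '@') and blank lines are skipped.
    """
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        target = unmapped if flag & UNMAPPED else mapped
        target.append(line.rstrip("\n"))
    return mapped, unmapped


if __name__ == "__main__":
    # Two toy records: r1 is mapped, r2 carries the unmapped flag.
    toy = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII",
    ]
    mapped, unmapped = split_by_mapping(toy)
    print(len(mapped), "mapped;", len(unmapped), "left over for assembly")
```

The unmapped list is what would then be handed to whatever extension or de novo step you use.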
Old 01-27-2011, 05:42 AM   #11
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Quote:
Originally Posted by figure002 View Post
Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

"The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

(This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)

Perhaps the OS doesn't matter too much with CLCBio, but I'd stick to Linux since many or even most of the programs in NGS are designed for and tested primarily on Linux.
Also the Linux command line allows easy access to sequence files, which Windows fails miserably at.
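
As an illustration of the kind of quick look at a sequence file meant here, a minimal sketch in Python that counts reads and bases in a FASTQ file. It assumes the common four-lines-per-record layout (no wrapped sequences); the function names are made up for this example.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file for reading, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_summary(path):
    """Return (read_count, total_bases) for a 4-line-per-record FASTQ file."""
    reads = bases = 0
    with open_maybe_gzip(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.strip())
    return reads, bases


if __name__ == "__main__":
    import sys

    # Usage: python fastq_summary.py reads.fastq[.gz]
    n_reads, n_bases = fastq_summary(sys.argv[1])
    print(f"{n_reads} reads, {n_bases} bases")
```

The file is streamed line by line, so memory use stays flat even for very large files.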
Old 01-27-2011, 09:33 AM   #12
DNASTAR
Registered Vendor
 
Location: Madison, WI

Join Date: Aug 2010
Posts: 48

Quote:
Originally Posted by jkbonfield View Post
This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic denovo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).
SeqMan NGen generates a fully gapped assembly. This benchmark time for human genome assembly also includes full SNP statistical analysis against the entire dbSNP database. The output from SeqMan NGen is a BAM file plus accessory files that provide SNP, coverage and feature information that are important for downstream analysis.
Old 01-27-2011, 11:03 AM   #13
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)
Old 01-27-2011, 12:28 PM   #14
DNASTAR
Registered Vendor
 
Location: Madison, WI

Join Date: Aug 2010
Posts: 48

Quote:
Originally Posted by kopi-o View Post
I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)
Yes, this is an alignment but is not simple mapping to a human genome reference sequence. Algorithms like Bowtie map reads to the reference genome and produce an ungapped BAM file, where the reference sequence cannot be gapped to accept variations. SeqMan NGen creates a gapped BAM file perfectly suitable for SNP variation analysis. Also the SeqMan BAM viewer can display the gapped alignment and easily navigate the genome and variation report. Other BAM viewers (like Tablet) do not display reference gaps, so insertions are missing from the alignment views, and are not suitable for variation analysis.
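
For readers unfamiliar with the term: in SAM/BAM, whether a particular alignment is gapped shows up in its CIGAR field, where I and D operations mark insertions and deletions relative to the reference. A small sketch in Python of checking that, using only the CIGAR operation letters from the SAM format; the function names are illustrative, and this says nothing about how any particular aligner or viewer works internally.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string like '5M2I3M' into [(5, 'M'), (2, 'I'), (3, 'M')]."""
    if cigar == "*":  # '*' means no alignment information
        return []
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def has_indel(cigar):
    """True if the alignment contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _, op in parse_cigar(cigar))


def reference_span(cigar):
    """Number of reference bases covered (operations M, D, N, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


if __name__ == "__main__":
    for c in ("36M", "5M2I3M1D4M"):
        print(c, "gapped" if has_indel(c) else "ungapped", reference_span(c))
```

An alignment with only M operations is ungapped; one with I or D is gapped, and a viewer has to pad the reference at each I to show the inserted bases.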
Old 01-28-2011, 12:23 AM   #15
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

Personally I'm happy for BAM to be used as an alignment output format too - it certainly makes sense and isn't only to be reserved for mapping. The logical approach to this is to use the contig consensus sequences in place of the references.

You're right that many mapped alignment viewers do a dismal job of displaying indels (even tview in some cases). For now this appears to be more in the domain of assembly editors. I'm biased of course, but gap5 can handle such things and no doubt CLC's and DNASTAR's own tools too.
Old 01-28-2011, 02:34 AM   #16
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

If you want to progress in bioinformatics, go for Linux.

There aren't a lot of bioinformatics tools for Windows.
Old 02-01-2011, 12:16 PM   #17
danny0085
Junior Member
 
Location: danny0085

Join Date: Feb 2011
Posts: 3
Best option

No doubt the best option is Ubuntu Linux, especially for servers.
Old 02-01-2011, 02:21 PM   #18
figure002
Member
 
Location: Rotterdam

Join Date: Jan 2011
Posts: 11

Quote:
Originally Posted by NicoBxl View Post
If you want to progress in bioinformatics, go for Linux.

There aren't a lot of bioinformatics tools for Windows.
Quote:
Originally Posted by danny0085 View Post
No doubt the best option is Ubuntu Linux, especially for servers.
I'm an Ubuntu user myself, and Linux would be my first choice as well. But my (former) internship supervisor is a Windows user, and he told me that many of the (bioinformatics) tools he uses are for Windows. But then again, there may be Linux versions or alternatives (I didn't ask which tools he uses exactly).

But if you are right about most bioinformatics tools being available for Linux, then maybe Linux would be a better choice after all.
Old 12-14-2012, 08:12 AM   #19
GopalJayaraj
Junior Member
 
Location: New Delhi

Join Date: Dec 2012
Posts: 1
Trouble installing CLC Genomics on Ubuntu 12.04

Dear all,

I am new to NGS analysis tools, being a biologist with very limited knowledge of programming. I was looking for NGS analysis tools that are beginner-friendly. I came across CLC Genomics Workbench and downloaded it onto an Ubuntu workstation (64-bit) with 16 GB RAM. However, the rpm package won't install (in spite of using Alien). The error message I get is the following:

~$ sudo alien CLCGenomicsWorkbench_5_5_2_64.rpm
error: incorrect format: unknown tag
Warning: Skipping conversion of scripts in package CLCGenomicsWorkbench: postinst prerm
Warning: Use the --scripts parameter to include the scripts.
cpio: premature end of file
CLCGenomicsWorkbench_5_5_2_64.rpm is for architecture i386 ; the package cannot be built on this system

I would appreciate it if someone could guide me through the installation.

Thanks and regards
Old 12-14-2012, 05:43 PM   #20
lethalfang
Member
 
Location: San Francisco, CA

Join Date: Aug 2011
Posts: 91

Quote:
Originally Posted by GopalJayaraj View Post
Dear all,

I am new to NGS analysis tools, being a biologist with very limited knowledge of programming. I was looking for NGS analysis tools that are beginner-friendly. I came across CLC Genomics Workbench and downloaded it onto an Ubuntu workstation (64-bit) with 16 GB RAM. However, the rpm package won't install (in spite of using Alien). The error message I get is the following:

~$ sudo alien CLCGenomicsWorkbench_5_5_2_64.rpm
error: incorrect format: unknown tag
Warning: Skipping conversion of scripts in package CLCGenomicsWorkbench: postinst prerm
Warning: Use the --scripts parameter to include the scripts.
cpio: premature end of file
CLCGenomicsWorkbench_5_5_2_64.rpm is for architecture i386 ; the package cannot be built on this system

I would appreciate it if someone could guide me through the installation.

Thanks and regards
.rpm packages are meant for Red Hat-based Linux distributions (RPM originally stood for Red Hat Package Manager).
Ubuntu is a Debian-based distribution. It needs .deb files, or the software needs to be built from source.

Edit: Sorry, just noticed that you were using "alien." No experience with that.
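
One cheap thing to rule out with errors like "unknown tag" or "premature end of file" is a damaged or mislabeled download. A minimal sketch in Python that only checks how the file starts: RPM files begin with the bytes ed ab ee db, and .deb files are ar archives beginning with "!<arch>". The function name is made up for this example, and the check cannot tell whether the rest of the file is complete or whether alien supports the package's payload format.

```python
RPM_MAGIC = b"\xed\xab\xee\xdb"  # first four bytes of an RPM lead
DEB_MAGIC = b"!<arch>\n"         # .deb files are ar archives


def package_kind(path):
    """Guess 'rpm', 'deb' or 'unknown' from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(RPM_MAGIC):
        return "rpm"
    if head.startswith(DEB_MAGIC):
        return "deb"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python package_kind.py CLCGenomicsWorkbench_5_5_2_64.rpm
    print(package_kind(sys.argv[1]))
```

If this prints "unknown" for a file named .rpm, the download is probably not what it claims to be (an HTML error page saved under that name is a common cause). If it prints "rpm", comparing the file size with the one listed on the download page is the next thing to try.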

Last edited by lethalfang; 12-14-2012 at 05:45 PM.

Tags
clc genomics workbench, linux, speed

All times are GMT -8.