  • CLC Genomics Workbench - Windows vs. Linux

    Hello everyone. I'm a bioinformatics student from Holland and my internship supervisor just told me he's thinking about ordering a license for CLC Genomics Workbench. He asked me if analyses would run much faster if he ran it on Linux. I know Linux can be much faster in some situations (e.g. web servers), but I have no idea when it comes to data analysis with tools like this.

    Do any of you have experience with this? Does Linux have advantages / disadvantages over Windows when it comes to doing data analysis with CLC Genomics Workbench (or similar tools)? And if Linux were significantly faster, would that mean we could purchase a computer with less RAM to save costs?

  • #2
    I have CLC for Linux and Windows, but have never benchmarked them on the same machine. My feeling is that Linux would not be that much faster, if at all.
    Linux might be more efficient with memory and stress the machine a little less.

    Linux has the huge advantage for bioinformatics in that most tools are written for it.

    • #3
      Hoi figure002,

      We run CLC (and more) on an Ubuntu 10.8 x86_64 server with 24 cores and some 47G ram. I am not a CLC user so I can't really give you more details on its performance. People here say that it has trouble with the HiSeq data they feed it because it's just too much, despite the server. They then try to align their reads on one chromosome instead of the entire reference, which I think introduces false positives.

      I'd think that the amount of memory is more important than the OS.

      Cheers

      • #4
        Yes, the amount of RAM is the important thing. You need a minimum of 16GB and more is better.

        • #5
          Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

          "The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

          So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

          (This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)

          • #6
            Originally posted by figure002
            This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..
            Welcome to the wonderful world of NGS!

            If you don't have one of those and buying is no option, there are compute clusters out there. Check for example SARA or ask the NBIC for information. You're not the only one dealing with large quantities of data and expensive computations in our region :P

            ps I'm also interested in your status as 'bioinformatics student': HBO or master's internship? Which uni? I've barely lost the bioinformatics student status myself...
            Last edited by Bruins; 01-26-2011, 08:47 AM.

            • #7
              Ahh, computer clusters, that's probably one of the things I'll learn about in my specialisation "high throughput" which starts in about 2 weeks. I just finished my internship with an awesome grade.

              PS. I'm a junior at the Leiden University of Applied Sciences (Hogeschool Leiden) and I'm working towards my bachelor's degree. Can't wait to finally get started and earn some money. Where did you study?

              • #8
                Hi figure002,

                I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).

                If you are interested in learning more, you can check out our website, or message me and I can arrange for a free trial of the software.

                Thanks,
                Anne

                • #9
                  @figure002: I PMed you to avoid slow chat in this thread

                  • #10
                    Originally posted by DNASTAR
                    Hi figure002,

                    I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).
                    This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

                    I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic de novo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).
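
A minimal sketch of the first step of such a hybrid workflow: separating the reads that mapped from those that did not, so the leftovers can be passed on to an extension or de novo step. It relies only on the SAM FLAG bit 0x4 (segment unmapped); the function name `split_by_mapping` and the example records are made up for illustration and do not come from any particular tool.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Partition SAM alignment lines into (mapped, unmapped) read names.

    Hypothetical helper for illustration; real pipelines would normally
    use a dedicated SAM/BAM library rather than splitting text by hand.
    """
    mapped, unmapped = [], []
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


# Invented example records (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...).
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t16\tchr1\t200\t60\t20M2I28M\t*\t0\t0\t*\t*",
]
mapped, unmapped = split_by_mapping(example)
print("mapped:", mapped)
print("unmapped (candidates for extension / de novo assembly):", unmapped)
```

The unmapped names would then be the input to whatever extension or assembly step follows; that part is not sketched here.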

                    • #11
                      Originally posted by figure002
                      Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

                      "The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

                      So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

                      (This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)

                      Perhaps the OS doesn't matter too much with CLCBio, but I'd stick to Linux since many or even most of the programs in NGS are designed for and tested primarily on Linux.
                      Also the Linux command line allows easy access to sequence files, which Windows fails miserably at.

                      • #12
                        Originally posted by jkbonfield
                        This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

                        I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic de novo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).
                        SeqMan NGen generates a fully gapped assembly. This benchmark time for human genome assembly also includes full statistical SNP analysis against the entire dbSNP database. The output from SeqMan NGen is a BAM file plus accessory files that provide SNP, coverage and feature information that are important for downstream analysis.

                        • #13
                          I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

                          I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)

                          • #14
                            Originally posted by kopi-o
                            I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

                            I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)
                            Yes, this is an alignment but is not simple mapping to a human genome reference sequence. Algorithms like Bowtie map reads to the reference genome and produce an ungapped BAM file, where the reference sequence cannot be gapped to accept variations. SeqMan NGen creates a gapped BAM file perfectly suitable for SNP variation analysis. Also the SeqMan BAM viewer can display the gapped alignment and easily navigate the genome and variation report. Other BAM viewers (like Tablet) do not display reference gaps, so insertions are missing from the alignment views, and are not suitable for variation analysis.
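
For readers unsure what "gapped" means at the level of a single BAM record, here is a small sketch. In the SAM/BAM format each alignment carries a CIGAR string, where `I` marks bases inserted relative to the reference and `D` marks reference bases deleted from the read. The helper name `cigar_summary` and the example CIGAR strings are invented for illustration; this says nothing about how SeqMan NGen, Bowtie or any viewer works internally.

```python
import re

# One CIGAR element is a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar):
    """Return (reference_span, inserted_bases, deleted_bases) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    ref_span = inserted = deleted = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MDN=X":  # operations that consume reference positions
            ref_span += n
        if op == "I":      # bases in the read that are absent from the reference
            inserted += n
        elif op == "D":    # reference bases that are absent from the read
            deleted += n
    return ref_span, inserted, deleted


print(cigar_summary("50M"))       # ungapped: (50, 0, 0)
print(cigar_summary("20M2I28M"))  # 2-base insertion: (48, 2, 0)
print(cigar_summary("30M3D20M"))  # 3-base deletion: (53, 0, 3)
```

Because an insertion consumes read bases but no reference positions, a viewer has to open up extra columns in the reference to display it, which is the display problem discussed in this thread.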

                            • #15
                              Personally I'm happy for BAM to be used as an alignment output format too - it certainly makes sense and isn't only to be reserved for mapping. The logical approach to this is to use the contig consensus sequences in place of the references.

                              You're right that many mapped alignment viewers do a dismal job of displaying indels (even tview in some cases). For now this appears to be more in the domain of assembly editors. I'm biased of course, but gap5 can handle such things and no doubt CLC's and DNASTAR's own tools too.
