
  • #16
    Dear Friends,

    Thank you very much for your valuable advice! Peng



    • #17
      Hardware configuration for Whole genome analyses

      Hi,

      To set up a sequencing data analysis lab (just a research lab) for whole-genome work (say human-sized genomes at 60-70X coverage, about 500 GB of data; de novo assembly followed by downstream analysis such as annotation and SNP calling), what minimum computer configuration would you suggest?

      Better server or cluster?

      thanks



      • #18
        "The 15-core chips are Ivy Bridge which is newer but doesn't perform significantly better in integer applications, which includes most bioinformatics programs."

        Hi Brian, All- I'm curious about this statement... Honestly I didn't even know that some CPUs could perform better than others in integer applications. What do you mean by integer applications? What are some bioinformatics tasks which make heavy use of it? And between a poorly performing CPU and a good one, how much difference (in time?) can we expect for real life jobs?

        Thanks!



        • #19
          What I meant by that is that the last few generations of Intel processors (since Sandy Bridge) have similar clock frequencies. Their primary architectural improvements, on a per-clock basis, are in their ability to handle wider floating-point vector instructions. This is useful for doing matrix multiplies, common in physics and 3D graphics, but not in bioinformatics (aside from perhaps structural modeling). Therefore, buying a Sandy Bridge, Ivy Bridge, or Haswell processor will not affect bioinformatics algorithms much unless they have different numbers of cores or different clock speeds. For reference, a floating-point operation uses numbers with decimals, like "3.1514*2.7182", while an integer instruction is anything else, such as "1+1" or "if(A>B)" or "if(A==B)" where both A and B are integers.

          Much of bioinformatics depends on instructions like "if(A==B)" or more specifically, "if(A[i]==B[j])", asking the question of whether the integer value (a base) at a given location in an array (a sequence) is equal to that in another array, for alignment. Assembly is another integer-based algorithm, whether using OLC assemblers or de Bruijn graph assemblers. Floating-point operations are rare in bioinformatics, except when translating quality values to probabilities.
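
          As a rough illustration of the distinction above, here is a minimal Python sketch. The function names and the example sequences are made up for this post; the quality conversion assumes standard Phred scores, where the error probability is 10^(-Q/10). The base comparison is pure integer work, while the quality conversion is one of the few places floating-point shows up.

```python
def count_matching_bases(a: bytes, b: bytes) -> int:
    """Count positions where two equal-length sequences hold the same base.

    Iterating over bytes yields integers, so each test is an integer compare,
    the same shape as if(A[i]==B[j]) in the text above.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def phred_to_error_probability(q: int) -> float:
    """Convert a Phred quality score to an error probability: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


# Hypothetical example data, not taken from any real read set.
read = b"ACGTACGT"
ref = b"ACGAACGA"

matches = count_matching_bases(read, ref)   # integer comparisons only
p_error = phred_to_error_probability(30)    # floating-point arithmetic
```

          Real aligners do far more than this, of course, but the inner loop has the same character: comparing small integers, not multiplying decimals.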

          So, for the last few generations of Intel processors, chips with the same number of cores and the same clock frequency will perform about the same. Generally, performance will be proportional to clock frequency (X GHz). However, in multithreaded programs (which include aligners, and sometimes tools for error-correction, normalization, quality-trimming, adapter-trimming, and occasionally assembly), the speed will also be roughly proportional to the number of cores, so a 12-core processor will be close to 3 times as fast as a 4-core processor. A lot of algorithms are not multithreaded (or not very well multithreaded), though, in which case the number of cores won't matter. SPAdes, for example, is a good assembler but does not really scale past 3 cores, while Megahit and Ray scale very well to however many cores you give them. Gzip is single-threaded and can only use one core; Pigz, which produces the same gzip format, can use all available cores and is thus a much better choice for compressing fastq files! So, for example, Gzip will take up to about 8 times as long as Pigz on an 8-core machine. I think both of them use only integer instructions.
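
          To make the Gzip versus Pigz point concrete, here is a simplified Python sketch of the general idea of compressing chunks on several threads. This is only an illustration and not how Pigz is implemented internally: it relies on the fact that concatenated gzip members form a valid gzip stream, and it assumes that CPython's zlib releases the GIL while compressing so the threads can actually use multiple cores. The function names, chunk size, worker count, and the fake fastq data are all hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_serial(data: bytes) -> bytes:
    """Compress everything as one gzip stream on a single thread."""
    return gzip.compress(data)


def compress_parallel(data: bytes, workers: int = 4,
                      chunk_size: int = 1 << 18) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the results.

    Each chunk becomes its own gzip member; the members are joined in the
    original order, so decompressing the result yields the original data.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map preserves order
    return b"".join(members)


# Hypothetical fastq-like payload, repeated to get a few megabytes of input.
fastq_like = b"@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n" * 50_000

serial = compress_serial(fastq_like)
parallel = compress_parallel(fastq_like)
roundtrip_ok = gzip.decompress(parallel) == fastq_like
```

          Compressing chunks independently usually costs a little compression ratio, since each chunk starts without the context of the previous one. The actual speedup depends on the machine and on I/O, so treat the "8 times" figure as an upper bound.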

          There is not much competition at the high end for Intel CPUs right now because they perform so much better than AMD CPUs in most applications. These are a couple articles from Tech Report, where I get a lot of comparative benchmarks:




          Unfortunately, a lot of those benchmarks use floating-point or vector instructions which makes them not very relevant to bioinformatics.

          To answer your question: for a single-threaded application, on the last few generations of Intel processors, a 3GHz processor will be 50% faster than a 2GHz processor. For a fully multithreaded application (most aligners), a 12-core processor will also be almost 3x as fast as a 4-core processor (though not quite). For a single-threaded process (like gzip, Trimmomatic, fastx, etc.) the number of cores does not matter, and a poorly multithreaded process (like SPAdes, and many versions of BLAST) will not go any faster after a few cores.
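
          The rule of thumb above can be written as a tiny back-of-the-envelope model. This is only a sketch under idealized assumptions (same processor generation, perfect scaling up to a thread limit, no memory or disk bottleneck); `relative_speed` and its `usable_threads` parameter are names invented here, not part of any tool.

```python
def relative_speed(clock_ghz: float, cores: int, usable_threads: int) -> float:
    """Idealized throughput: clock frequency times the cores a program can use.

    usable_threads is a hypothetical cap on how far the program scales:
    1 for a single-threaded tool, a small number for a poorly threaded one,
    and a large number for a well-threaded aligner.
    """
    return clock_ghz * min(cores, usable_threads)


# Single-threaded tool: 3 GHz versus 2 GHz, core count is irrelevant.
single_ratio = relative_speed(3.0, 12, 1) / relative_speed(2.0, 12, 1)

# Well-threaded aligner at the same clock: 12 cores versus 4 cores.
aligner_ratio = relative_speed(3.0, 12, 64) / relative_speed(3.0, 4, 64)

# Poorly threaded tool that stops scaling at 3 threads: 12 versus 4 cores.
capped_ratio = relative_speed(3.0, 12, 3) / relative_speed(3.0, 4, 3)
```

          In this model the single-threaded ratio is 1.5, the aligner ratio is 3.0 (an upper bound; real runs fall a bit short), and the capped ratio is 1.0, meaning the extra cores buy nothing.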
          Last edited by Brian Bushnell; 04-07-2015, 01:44 AM.



          • #20
            Hi Brian, thanks a lot for the thorough reply! I could understand how clock speed and number of cores could affect performance. What I wasn't aware of is the difference in performance between floating-point and integer operations, and how the two have improved at different rates over time (I guess the game industry is driving the floating-point interest...?). Very interesting, thanks for clarifying.

            Dario



            • #21
              Dario,

              CPUs process integer and floating-point calculations with different physical pieces of hardware. Floating-point operations are generally slower, and thus easier to improve simply by adding more transistors, but this has no impact on the integer performance (though improving integer performance DOES increase FP performance, as it's impossible for a program to be purely floating-point). For a long time, it appeared that games were generally driving the improvement in floating-point performance; but these days, games are more limited by video cards, not CPUs. So, I think the increase in FP performance is for two reasons:

              1) It's easier to improve than integer performance, partly because floating-point code is usually less branchy and easier to vectorize.
              2) Supercomputers are measured by floating-point throughput (flops), and it seems like those metrics trickle down everywhere in industry, even for systems that don't have floating-point workloads.
