  • recommendations on disk storage infrastructure

    Hi
    We are planning the IT for our sequencing project. We expect to sequence half a dozen dairy bulls (Bos taurus) a year at 30x coverage for the next few years. I plan to thoroughly investigate the read mapping and variant calling process so as not to lose too many SNPs for our gene discovery work. So we'll be consuming a couple of terabytes in reads and alignments per year. I can rent relatively expensive, high-quality Fibre Channel SAN disks for our compute cluster from the folks in IT, or I can try to buy much more cost-effective SATA disks in a network-attached storage box, e.g. a teeny tiny Isilon (it is NZ...), or even build it myself with a Supermicro 4U storage chassis, 24 1 TB SATA drives and a Linux CD. I expect to use either MosaikAligner or Bowtie a lot.

    Does anyone have any useful recommendations or experiences on:

    a) the value of FibreChannel/SAN disk for I/O performance?
    b) is NFS based storage for gzipped fastq read files good enough with 1 Gb networking i.e. compute node to SAN attached IO node or compute node to NAS node?
    c) Are 7200 rpm, 1 or 2 TB SATA drives a cost-effective way to store reads for alignments?

    If I don't have to rent expensive SAN disk we can sequence more animals!

    Any thoughts would be appreciated.
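For anyone wanting to sanity-check the "couple of terabytes per year" figure, here is a rough back-of-the-envelope sketch. The genome size (about 2.7 Gbp for Bos taurus), the bytes-per-base figures and the number of alignment copies kept are all assumptions for illustration, not measurements, and the function name is made up.

```python
def yearly_storage_tb(animals, genome_gbp, coverage,
                      reads_bytes_per_base, aln_bytes_per_base, aln_copies=1):
    """Rough yearly storage need in terabytes (1 TB = 1e12 bytes).

    reads_bytes_per_base and aln_bytes_per_base are assumed average on-disk
    costs per sequenced base for compressed reads and for one alignment file.
    aln_copies is how many alignment runs are kept (e.g. one per aligner).
    """
    sequenced_bases = animals * genome_gbp * 1e9 * coverage
    bytes_per_base = reads_bytes_per_base + aln_copies * aln_bytes_per_base
    return sequenced_bases * bytes_per_base / 1e12


if __name__ == "__main__":
    # Assumed inputs: 6 animals, ~2.7 Gbp genome, 30x coverage,
    # 0.5 bytes per base for reads and for each of 3 alignment runs.
    print(f"{yearly_storage_tb(6, 2.7, 30, 0.5, 0.5, aln_copies=3):.2f} TB/year")
```

The per-base costs depend heavily on read length, quality-score encoding and compression, so it is worth measuring them on a real lane before budgeting.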

  • #2
    Originally posted by mkeehan
    a) the value of FibreChannel/SAN disk for I/O performance?
    Zero, if you first move files to a local disk before analysis.
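A minimal sketch of that "move files to a local disk first" step, assuming the shared storage is mounted as an ordinary path on the compute node. The function names and the scratch directory are hypothetical, and the checksum step is an extra safety check that is not part of the advice above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large read files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_to_local(source, scratch_dir):
    """Copy one file from shared storage to local scratch and verify the copy."""
    source = Path(source)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    local_copy = scratch_dir / source.name
    shutil.copy2(source, local_copy)
    # Re-reading the source costs a second pass over the network;
    # drop this check if that is too expensive for your setup.
    if sha256_of(source) != sha256_of(local_copy):
        local_copy.unlink()
        raise OSError(f"checksum mismatch while staging {source}")
    return local_copy
```

The aligner is then pointed at the returned local path, so the shared storage only sees one sequential read per input file.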
    Originally posted by mkeehan
    b) is NFS based storage for gzipped fastq read files good enough with 1 Gb networking i.e. compute node to SAN attached IO node or compute node to NAS node?
    Yes, that will work.
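To put a number on "good enough", here is a rough estimate of how long a file takes to cross a link. The efficiency factor (the fraction of line rate actually achieved after protocol overhead) is an assumed value, and the function name is made up for this sketch.

```python
def transfer_seconds(size_gb, link_gbit_per_s, efficiency=0.8):
    """Estimate seconds needed to move size_gb gigabytes (1 GB = 1e9 bytes).

    efficiency is an assumed fraction of the nominal line rate that the
    file transfer really achieves; measure it on your own network.
    """
    if size_gb < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, 0 < efficiency <= 1")
    size_bits = size_gb * 8e9
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return size_bits / usable_bits_per_s


if __name__ == "__main__":
    # Hypothetical 100 GB of gzipped FASTQ over a 1 Gbit link.
    minutes = transfer_seconds(100, 1.0) / 60
    print(f"about {minutes:.0f} minutes")
```

If the aligner then runs for hours on that input, a copy measured in minutes is unlikely to be the bottleneck.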
    Originally posted by mkeehan
    c) Are 7200 rpm, 1 or 2 TB Sata drives a cost effective way to store reads
    for alignments.
    Sure. Do be careful with the 2 TB drives, though: they often have large block sizes, and formatting needs to be done differently for optimal performance.

    Now a word of caution: a big part of the expense of the SAN you currently use is in maintenance and backups, not the hard disks. The data you intend to store on this machine is incredibly valuable, so make sure you have a plan for backups and disaster recovery.
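One small piece of such a plan is checking that the backup really matches the primary copy. The sketch below assumes both copies are reachable as directory trees from one machine. All names are hypothetical, and it is no substitute for proper backup tooling.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root onto its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_backup(primary_root, backup_root):
    """Return (missing, mismatched) file lists for the backup tree."""
    primary = build_manifest(primary_root)
    backup = build_manifest(backup_root)
    missing = sorted(set(primary) - set(backup))
    mismatched = sorted(
        name for name in primary.keys() & backup.keys()
        if primary[name] != backup[name]
    )
    return missing, mismatched
```

Running something like this on a schedule, and alerting when either list is non-empty, catches silent copy failures before the primary disk dies.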

    • #3
      Hi,

      just going through this myself.

      Keep in mind

      *1TB SATA is apparently more efficient than 2TB for RAID striping

      *spatially separated backups are crucial (additional NAS servers come in quite cheap, have RAID, cost < 1000 Euros and don't need a fancy cooled server room).

      *1 Gbit should be good enough. 10 Gbit seems to be a bit tricky, with bad Linux drivers, no standards etc. Apparently most 1 Gbit networks can potentially handle ca. 3 Gbit.

      *I think security is more crucial than performance for these applications, especially if you're happy with bowtie.

      *If you want to use faster SAS hard disks maybe put them in your analysis server rather than the storage solution.
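On the RAID point in the list above, a rough sketch of how much usable space a box of drives gives at the common RAID levels. It uses the textbook capacity formulas and ignores filesystem overhead and hot spares. The function name is made up for illustration.

```python
def usable_capacity_tb(drives, drive_tb, level):
    """Usable capacity in TB for one array, before filesystem overhead."""
    if level == "raid0":
        data_drives = drives            # striping only, no redundancy
    elif level == "raid5":
        data_drives = drives - 1        # one drive's worth of parity
    elif level == "raid6":
        data_drives = drives - 2        # two drives' worth of parity
    elif level == "raid10":
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        data_drives = drives // 2       # every drive is mirrored
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return data_drives * drive_tb


if __name__ == "__main__":
    # The 24 x 1 TB chassis mentioned in the first post.
    for level in ("raid5", "raid6", "raid10"):
        print(level, usable_capacity_tb(24, 1.0, level), "TB")
```

In practice a 24-drive chassis is usually split into several smaller arrays, which lowers the usable total a little further.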

      Hope that helps.
