  • Long Term Data Storage

    I am in the process of setting up an NGS core facility. I will be starting with a single HiSeq 1000 with an IlluminaCompute Tier0 analysis server. In a past life I ran an NGS facility in which we had a "medium-term" storage server and a long-term tape backup system. File sizes have gotten so large that I'm not sure how practical it is to back up data on tape, or to deal with the hassle of putting data on tape and retrieving it again if it is needed in the future.

    A few questions for all of you:
    1. What data are you keeping?
    -- keeping BCL = 330 GB
    -- keeping BAM = 330 GB
    -- total = 660 GB per run (paired end, 2 x 101 bp)
    2. What long-term data storage media are you using?
    3. I am a geneticist/biologist, not an IT professional. What would be the easiest solution for me? (At some point I will be hiring an informaticist/computational biologist.)
    4. Would it be easier to store on external drives?
    5. Do any of you back up data and send it to another facility for storage, such as Iron Mountain?

    Any advice you can give would be appreciated.
    Thank you,
    Michael
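The per-run figures above (330 GB of BCL plus 330 GB of BAM) make the long-term capacity easy to estimate. A minimal Python sketch, assuming decimal units (1 TB = 1000 GB), a single copy of the data, and a hypothetical number of runs per year; `archive_size_tb` is an illustrative helper, not part of any Illumina tool:

```python
# Hypothetical retention-capacity estimate using the figures from the post:
# BCL (330 GB) + BAM (330 GB) = 660 GB per paired-end 2 x 101 bp run.
# runs_per_year and retention_years are assumptions, not from the post.

BCL_GB = 330
BAM_GB = 330


def archive_size_tb(runs_per_year: int, retention_years: int,
                    per_run_gb: float = BCL_GB + BAM_GB) -> float:
    """Total size in TB (decimal, 1 TB = 1000 GB) of one copy of the archive."""
    return runs_per_year * retention_years * per_run_gb / 1000


if __name__ == "__main__":
    # e.g. 20 runs a year, each kept for 10 years
    print(f"{archive_size_tb(20, 10):.1f} TB")  # 132.0 TB
```

Multiply the result by the number of copies you intend to keep (for example, two disks in two locations) to size the whole backup.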

  • #2
    In a very few years, you will save only the DNA libraries.

    • #3
      It seems a pragmatic solution to cost in a terabyte disk per sequencing run and use that as backup, assuming you have a place to store the disks.

      You might look into BaseSpace (Illumina's cloud solution), which I understand should be available for the HiSeq.

      • #4
        We decided against external drives because of
        a) space
        b) organisation
        c) lack of mirroring (RAID)

        The last point is the most critical, because we are required to save data for 10 years at the University. This can (hopefully) be guaranteed by tape and (maintained) RAID backups, but not by off-the-shelf external HDs.

        We also have tape and spatially separate hard drive backups in case the server room burns down.

        • #5
          Two (bare) disks, two separate locations?

          • #6
            Also, these kinds of blanket university data policies don't make sense in the context of sequencing. They should understand the problem first, then make a data retention policy.

            Comment


            • #7
              The new floppy disk ...

              These are the new keychain USBs for large data: >1TB portable hard drives.

              Just buy enough to make 2 or 3 backups. Keep the backups separate and verify them.

              This is labor intensive.
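The verification step is the part that can be scripted. A minimal sketch in Python, assuming each run folder is copied as-is onto each backup disk: it streams every file through SHA-256 (standard-library `hashlib`) and reports files that are missing, extra, or whose checksums differ. The script name and paths in the usage comment are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large BCL/BAM files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, backup: Path) -> list[str]:
    """Return a sorted list of problems; an empty list means the backup matches."""
    src, dst = build_manifest(original), build_manifest(backup)
    problems = [f"missing in backup: {name}" for name in src.keys() - dst.keys()]
    problems += [f"extra in backup: {name}" for name in dst.keys() - src.keys()]
    problems += [
        f"checksum mismatch: {name}"
        for name in src.keys() & dst.keys()
        if src[name] != dst[name]
    ]
    return sorted(problems)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths):
    #   python verify_backup.py /data/run_folder /media/usb_disk1/run_folder
    issues = compare_copies(Path(sys.argv[1]), Path(sys.argv[2]))
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

Run it once per backup disk; a non-zero exit status flags a copy that should be redone. Re-running it periodically on stored disks also catches silent corruption.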

              • #8
                Richard - not sure I understood your message. You seem to be suggesting these are USB flash solutions, but you actually linked to regular hard disks with USB interfaces. It is true that there are 1TB flash disks, but they are currently about $2000.

                • #9
                  Yep. The greater than 1TB portable hard drive is the new floppy disk.

                  • #10
                    Ah right, got confused by the term "keychain" which made me think of flash disks. But yes, I agree, and judicious use of USB disks is a very cost-effective storage solution in my opinion. I am certainly never going back to tape backup!

                    • #11
                      The nice thing about USB disks is that if your sequencer dumps out 1TB of data per run, then cost in 2 x 1TB USB disks per run and you have a resilient backup solution. Given that a HiSeq run might be $10,000 of consumables, $200 more for the disks can be absorbed easily.
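The same arithmetic extends to runs larger than one disk. A sketch assuming each disk holds its full nominal capacity (formatted capacity is somewhat lower in practice, so leave headroom), with the figures from the post ($100 per 1 TB disk, $10,000 of consumables per run); the function names are illustrative:

```python
import math


def disks_per_run(run_tb: float, disk_tb: float = 1.0, copies: int = 2) -> int:
    """Whole disks needed to hold `copies` independent copies of one run."""
    return copies * math.ceil(run_tb / disk_tb)


def disk_cost_share(run_tb: float, consumables_usd: float, disk_price_usd: float,
                    disk_tb: float = 1.0, copies: int = 2) -> float:
    """Backup-disk cost as a fraction of the run's consumables cost."""
    return disks_per_run(run_tb, disk_tb, copies) * disk_price_usd / consumables_usd


if __name__ == "__main__":
    # 1 TB run, two copies on 1 TB disks at $100 each, $10,000 of consumables
    print(disks_per_run(1.0))                          # 2
    print(f"{disk_cost_share(1.0, 10_000, 100):.1%}")  # 2.0%
```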

                      • #12
                        Contrast that with enterprise-grade solutions and you are talking more like $1000/TB plus all the administrative overhead of keeping these solutions going. Amazon S3 is another option but costs can mount up over time.
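To put those two price points side by side, a rough sketch using only the figures quoted in this thread (about $100 per 1 TB USB disk, about $1000/TB for enterprise storage, taken as quoted). Administrative overhead and Amazon S3's recurring charges are deliberately left out, and the 50 TB archive size is hypothetical:

```python
# Figures quoted in the thread; treat them as rough, dated estimates.
USB_DISK_PRICE_USD = 100
USB_DISK_TB = 1.0
ENTERPRISE_USD_PER_TB = 1000


def usb_usd_per_tb(copies: int = 2) -> float:
    """Up-front hardware cost of storing 1 TB on `copies` separate USB disks."""
    return copies * USB_DISK_PRICE_USD / USB_DISK_TB


def archive_cost_usd(archive_tb: float, usd_per_tb: float) -> float:
    """Hardware cost of an archive of a given size at a flat per-TB price."""
    return archive_tb * usd_per_tb


if __name__ == "__main__":
    for label, rate in [("USB, 2 copies", usb_usd_per_tb()),
                        ("enterprise", ENTERPRISE_USD_PER_TB)]:
        print(f"{label}: ${archive_cost_usd(50, rate):,.0f} for a 50 TB archive")
```

This only compares purchase prices; the labor of labeling, storing, and periodically verifying disks is the hidden cost on the USB side.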

                        • #13
                          With a 2.5" hard drive as your file backup, storing the samples may almost end up taking more room than storing the data.

                          I agree with the purchase of 2 hard drives for each run. The university then has a visual idea of how their 10-year policy is working out, and the hard drives won't use any power when they're not plugged into anything (unlike a dedicated network backup, which will consume power on the off chance that you'll want a 5 KB file from your 7-year-old sequencing data with a latency of less than a second).

                          • #14
                            What about the CIFs and corresponding files? There are situations where the data need to be re-basecalled externally with Bustard. With BCLs alone this is not possible.
                            Storing CIFs plus the corresponding files takes up to 3.5 TB per HiSeq flowcell ...

                            IMHO USB disks are not suited to such amounts of data (especially when you are running more than one machine).
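The scaling argument can be made concrete. A sketch assuming the 3.5 TB-per-flowcell upper figure from this post, two copies, whole disks per flowcell, and hypothetical instrument counts and flowcell throughput:

```python
import math

CIF_TB_PER_FLOWCELL = 3.5  # upper figure quoted in the post


def disks_per_year(machines: int, flowcells_per_machine_year: int,
                   disk_tb: float = 1.0, copies: int = 2,
                   tb_per_flowcell: float = CIF_TB_PER_FLOWCELL) -> int:
    """USB disks consumed per year if each flowcell's CIFs go onto whole disks."""
    disks_per_flowcell = copies * math.ceil(tb_per_flowcell / disk_tb)
    return machines * flowcells_per_machine_year * disks_per_flowcell


if __name__ == "__main__":
    # Hypothetical: 1 vs 3 instruments, 20 flowcells each per year, 1 TB disks
    for n in (1, 3):
        print(n, "machine(s):", disks_per_year(n, 20), "disks/year")
```

With 1 TB disks, each flowcell needs 4 disks per copy, so the disk count (and the labeling and verification work) grows linearly with the number of instruments.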

                            • #15
                              I've heard that the HiSeq automatically discards the image files; isn't that true?

                              Anyway, I don't think it makes sense to save both .bcl and .fastq files, as they can easily be converted (at least from BCL to FASTQ; I don't know about the other way round).

                              And 330 GB could easily be saved on a hard disk (that would be 3 runs per TB, right?)
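A quick check of that arithmetic, assuming decimal units (a nominal 1 TB disk = 1000 GB, ignoring filesystem overhead) and that a run is never split across disks. With both BCL and BAM kept, as in the original question, only one 660 GB run fits:

```python
def runs_per_disk(disk_gb: float, run_gb: float) -> int:
    """Whole runs that fit on one disk (runs are not split across disks)."""
    return int(disk_gb // run_gb)


print(runs_per_disk(1000, 330))  # 3  (BCL or BAM only)
print(runs_per_disk(1000, 660))  # 1  (BCL + BAM)
```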
