  • #16
    I'd have to double check, but I think we're storing CIF files too.



    • #17
      Originally posted by GenoMax
      Devon: There is a subtle but significant difference. Google's nearline storage supposedly offers access with just a 3-5 second delay (so you could compute on it via Google compute, Edit: Not 100% certain about this). Glacier is truly meant for long term storage.
      True, but that's more of an added convenience for us than a requirement. Realistically, we'd keep things locally for 3-6 months and then off-load them elsewhere, so the odds of needing to recompute would be quite small and it might prove more convenient to just move the random mucked up dataset back. Obviously as things grow this might change.



      • #18
        I am not sure why one would want to save the CIF files (perhaps only if the sample is irreplaceable). This may become a moot point as technology moves along.

        @Sven: Does Illumina even allow saving CIF files for V4 chemistry runs?



        • #19
          Originally posted by GenoMax
          I am not sure why one would want to save the CIF files (perhaps only if the sample is irreplaceable). This may become a moot point as technology moves along.

          @Sven: Does Illumina even allow saving CIF files for V4 chemistry runs?
          In the past we used the CIFs for re-basecalling single lanes. We don't do that anymore (there is no need to). It is just the definition of the term "rawdata": some of us are obligated to store the sequencing "rawdata", and this definition varies vastly ...

          The v4 chemistry does not allow saving CIFs through HCS, although AFAIK you can tweak the config to do so. But it makes no sense in my eyes (thinking about OLB/RTA development) and is not recommended (or supported) by Illumina. One should take particular care with v4, as much more data is produced in the same amount of time.

          But we haven't upgraded all HiSeqs :-)



          • #20
            We store the library tubes at -20 °C.
            Consistently storing all "raw" data is far more expensive than occasionally redoing a HiSeq lane.
            We have re-run 2-year-old libraries without any problem.



            • #21
              That approach seems prone to problems for analyses that consider things like technical replicates, batch effects, cross-contamination, and basically anything involving imperfections in the sequencing process. It would be fine if sequencing was perfect and unbiased, and the platforms and chemistry stable and unchanging, but that's not really the case.



              • #22
                Originally posted by Brian Bushnell
                That approach seems prone to problems for analyses that consider things like technical replicates, batch effects, cross-contamination, and basically anything involving imperfections in the sequencing process. It would be fine if sequencing was perfect and unbiased, and the platforms and chemistry stable and unchanging, but that's not really the case.
                Ah Brian,
                I think you need to face that DNA, the natural RAWDATA storage form, is superior to your crummy digital methodologies. Step out from in front of your computer screen, head down to the lab and take a look at what the real meaning of "high tech" is. Nanotechnology! Pfah! DNA encodes information at a sub-nanometer resolution.

                From the earliest days of automated Sanger machines, there were less-processed storage forms for the instruments that could be used to clog up your hard drives for as many years as you might keep them. (How many of you have tried to save the initial TIFF image of an ABI377 gel?)

                Better to let the instruments use their brittle embedded systems to convert that massive data glob into something approaching a durable storage format. For Sanger sequencers that ended up being the .ab1 file. For Illumina sequencers -- fastq. Heave everything else into the dumpster.

                Okay, to be fair, I'm a hypocrite. I still have the autorads from all 100+ 35S sequencing gels that I ran back in the day. I have them labelled and indexed. But I don't see myself going back to re-read them, ever.

                Seriously though, have you actually seen technical replicate differences sufficient to swamp biological replicate differences? I mean in cases that were not just the result of loading errors like overclustering?

                --
                Phillip



                • #23
                  Originally posted by pmiguel
                  I think you need to face that DNA, the natural RAWDATA storage form, is superior to your crummy digital methodologies.
                  I don't touch that stuff; it's dirty. Data is only real once it's in a computer.

                  As for how important these considerations are... hmmm, I don't know. I'm just tossing in something to worry about.
