  • Data retention

    Hi Folks -

    How about a survey to see what data people retain from sequencing experiments? We have completed ~10 runs on an Illumina GA I and have stored the complete raw data (images, raw intensities, etc.; the complete run folder) on external 1 TB disks. I don't think everyone stores images long term, and we may not be able to keep it up forever.

    Does anyone know if it is possible to retain images using the GA II?

  • #2
    It's possible to retain images on GA2s of course, but they are a lot bigger... In general it's reasonable to retain the raw intensity and noise files as reprocessing those with new basecallers may be of interest.

    If you look at the data retained by the NCBI in the Short Read Archive, they are currently storing raw intensity and noise files, processed intensities, the 4 quality scores and the fastq files (the SRFs also have the settings used to generate the data, I believe). I think the Short Read Archive only contains PF (purity filtered) data at the moment.

    Personally I think that's overkill. I'd store the raw intensities (PF and non-PF), 4 quality scores and a basecall. Bear in mind that it is possible to regenerate everything from a complete set of raw intensities.


    • #3
      Storing the "raw data" as DNA in the freezer is likely going to be a more cost-effective option...

      What would you expect or hope to be able to achieve by reanalysing the images?


      • #4
        Originally posted by Chipper
        What would you expect or hope to be able to achieve by reanalysing the images?
        My thought was that future improvements in the image segmentation algorithm or intensity extraction could give you different results later on. For example, improvements might be better able to discern individual clusters when density is very high.

        In our microarray experiments, we always save raw images rather than just intensities, since we see variation whenever you do gridding or intensity extraction.

        You're right - saving the sample to re-run later is a logical approach. Deleting what we see as raw data may simply be a mental hurdle to get over.


        • #5
          My group has archived all bzip2'd image files to 2 TB USB disks. Currently they are £200 from Western Digital. That seemed cheap in comparison to reagents/lab time.

          Part of our reasoning was that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

          dvh
          Last edited by dvh; 09-16-2008, 02:23 PM.
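
A minimal sketch of the kind of archiving step described in this post: compress every image file under a run folder with bzip2 and write the copies to an archive disk. The function name `compress_images`, the `*.tif` pattern and the directory layout are assumptions made for illustration only; the post does not say how the files were actually compressed.

```python
import bz2
import shutil
from pathlib import Path


def compress_images(run_folder, archive_dir, pattern="*.tif"):
    """Write a .bz2 copy of every matching file under run_folder into archive_dir.

    The "*.tif" default is an assumption about how the image files are named.
    """
    run_folder = Path(run_folder)
    archive_dir = Path(archive_dir)
    written = []
    # sorted() materialises the file list before any output is written.
    for src in sorted(run_folder.rglob(pattern)):
        # Mirror the run folder layout under the archive directory.
        dest = archive_dir / src.relative_to(run_folder)
        dest = dest.with_name(dest.name + ".bz2")
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Stream the file through bzip2 so large images are never held in memory.
        with open(src, "rb") as fin, bz2.open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dest)
    return written
```

The originals are left untouched, so the copies can be checked (for example by decompressing and comparing a sample) before anything is deleted from the instrument PC or server.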


          • #6
            Originally posted by Chipper
            What would you expect or hope to be able to achieve by reanalysing the images?
            Well, if I understand correctly, the new pipeline that's on the way is supposed to increase the data output by 15-30% just through image analysis improvements alone. There's a lot of room for improvement in that area, apparently.

            That being said, we're keeping the images on our server only until we have no more room, then they'll be deleted as space is required. Individuals can keep the raw data on external disks if they want it.

            Scott.


            • #7
              Originally posted by dvh
              My group has archived all bzip2'd image files to 2 TB USB disks. Currently they are £200 from Western Digital. That seemed cheap in comparison to reagents/lab time.

              Part of our reasoning was that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

              dvh
              With the GA2 and read lengths approaching 100 bp paired-end, this probably becomes infeasible. I think storing the raw intensities gives you the biggest bang for your storage buck. All the new basecallers work from raw intensities, not images.

              It's true there's a lot to be gained back using improved image analysis, perhaps 10 or 20%, but it's all a trade-off.
