
**dpryan** · 04-14-2015, 11:16 PM

I'd have to double check, but I think we're storing CIF files too.

**dpryan** · 04-14-2015, 11:29 PM

Originally posted by GenoMax

Devon: There is a subtle but significant difference. Google's nearline storage supposedly offers access with just a 3-5 second delay (so you could compute on it via Google compute, Edit: Not 100% certain about this). Glacier is truly meant for long term storage.

True, but that's more of an added convenience for us than a requirement. Realistically, we'd keep things locally for 3-6 months and then off-load them elsewhere, so the odds of needing to recompute would be quite small and it might prove more convenient to just move the random mucked up dataset back. Obviously as things grow this might change.
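A rough sketch of that kind of age-based policy, assuming each run lives in its own directory under one root and that directory modification time is a good enough proxy for run age. The function name `runs_to_offload` and the 180-day default are hypothetical, chosen only for illustration:

```python
import time
from pathlib import Path


def runs_to_offload(root, max_age_days=180, now=None):
    """Return run directories directly under `root` last modified more than
    `max_age_days` ago (assumption: directory mtime approximates run age)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.stat().st_mtime < cutoff
    )
```

The function only lists candidates; the actual transfer to Glacier, nearline, or anywhere else is left out on purpose, since that step depends on the provider.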

**GenoMax** · 04-15-2015, 01:22 AM

I am not sure why one would want to save the CIF files (perhaps only if the sample is irreplaceable). This may become a moot point as technology moves along.

@Sven: Does illumina even allow saving CIF files for V4 chemistry runs?

**sklages** · 04-15-2015, 02:07 AM

Originally posted by GenoMax

I am not sure why one would want to save the CIF files (perhaps only if the sample is irreplaceable). This may become a moot point as technology moves along.

@Sven: Does illumina even allow saving CIF files for V4 chemistry runs?

In the past we used the CIFs for re-basecalling single lanes. We don't do that anymore (there is no need to). It comes down to the definition of the term "rawdata": some of us are obligated to store the sequencing "rawdata", and this definition varies vastly ...

v4 chemistry does not allow for saving CIFs using HCS; you can AFAIK tweak the config to do so. But it makes no sense in my eyes (thinking about OLB/RTA development) and is not recommended (supported) by Illumina. One should especially take care with v4 as there is much more data produced in the same time.

But we haven't upgraded all HiSeqs :-)

**HeinKey** · 04-29-2015, 06:20 AM

We store the library tubes at -20 °C.
Storing all "raw" data consistently is far more expensive than occasionally redoing a HiSeq lane.
We have re-run 2-year-old libraries without any problem.
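To put numbers on that comparison for your own site, a minimal sketch follows. Every input is something you would have to supply yourself; the function name `storage_vs_rerun` and its parameters are hypothetical, and no real prices are implied:

```python
def storage_vs_rerun(tb_stored, cost_per_tb_year, years, rerun_cost, expected_reruns):
    """Compare the cost of keeping raw data with the cost of re-sequencing.

    All arguments are site-specific assumptions supplied by the caller.
    Returns (storage_cost, rerun_cost_total, break_even_reruns), where
    break_even_reruns is the number of re-runs at which both costs match.
    """
    storage_cost = tb_stored * cost_per_tb_year * years
    rerun_cost_total = rerun_cost * expected_reruns
    break_even_reruns = storage_cost / rerun_cost
    return storage_cost, rerun_cost_total, break_even_reruns
```

If the number of re-runs you realistically expect is well below `break_even_reruns`, keeping the libraries and discarding the raw data is the cheaper option under these assumptions.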

**Brian Bushnell** · 04-29-2015, 09:30 AM

That approach seems prone to problems for analyses that consider things like technical replicates, batch effects, cross-contamination, and basically anything involving imperfections in the sequencing process. It would be fine if sequencing was perfect and unbiased, and the platforms and chemistry stable and unchanging, but that's not really the case.

**pmiguel** · 05-05-2015, 11:02 AM

Originally posted by Brian Bushnell

That approach seems prone to problems for analyses that consider things like technical replicates, batch effects, cross-contamination, and basically anything involving imperfections in the sequencing process. It would be fine if sequencing was perfect and unbiased, and the platforms and chemistry stable and unchanging, but that's not really the case.

Ah Brian,
I think you need to face that DNA, the natural RAWDATA storage form, is superior to your crummy digital methodologies. Step out from in front of your computer screen, head down to the lab and take a look at what the real meaning of "high tech" is. Nanotechnology! Pfah! DNA encodes information at a sub-nanometer resolution.

From the earliest automated Sanger machine days there were less-processed storage forms for the instruments that could be used to clog up your hard drives for as many years as you might keep them. (How many of you have tried to save the initial TIFF image of an ABI377 gel?)

Better to let the instruments use their brittle embedded systems to convert that massive data glob into something approaching a durable storage format. For Sanger sequencers that ended up being the .ab1 file. For Illumina sequencers -- fastq. Heave everything else into the dumpster.

Okay, to be fair, I'm a hypocrite. I still have the autorads from all 100+ 35S sequencing gels that I ran back in the day. I have them labelled and indexed. But I don't see myself going back to re-read them, ever.

Seriously though, have you actually seen technical replicate differences sufficient to swamp biological replicate differences? I mean in cases that were not just the result of loading errors like over clustering?

--
Phillip

**Brian Bushnell** · 05-05-2015, 12:02 PM

Originally posted by pmiguel

I think you need to face that DNA, the natural RAWDATA storage form, is superior to your crummy digital methodologies.

I don't touch that stuff; it's dirty. Data is only real once it's in a computer.

As for how important these considerations are... hmmm, I don't know. I'm just tossing in something to worry about.
