Old 07-16-2015, 06:53 AM   #6
dgaston
Junior Member
 
Location: Halifax, NS Canada

Join Date: Dec 2012
Posts: 4

You should also consider what data you actually need to keep. If you set up your analyses well, with a software-defined pipeline of some sort that you version (along with all the software components it uses), then you can recreate downstream files at any time. That means you generally only need to keep/archive:

1) Raw input data. This could be the BCL files, but you may reasonably opt to keep just the de-multiplexed FASTQ files, which are generally quite a bit smaller than the complete run output from a MiSeq.

2) Detailed documentation of the workflow that was run on the data. You separately archive all your software, pipelines, databases, etc. (in a versioned manner).

3) Your final results (and even this isn't absolutely required, particularly for archiving, since they can be regenerated from 1 and 2).
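To make the above concrete, here is a minimal sketch of what "archiving in a versioned manner" might look like in practice: record a checksum for each raw FASTQ file plus the version of every pipeline component in a small manifest you archive alongside the data. The function names, the JSON layout, and the example tool versions are all hypothetical illustrations, not part of any specific pipeline tool.

```python
# Hypothetical sketch: write a manifest of input checksums and software
# versions, so an analysis can later be verified and recreated exactly.
import hashlib
import json
from pathlib import Path

def sha256sum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(fastq_dir, tool_versions, out_path):
    """Record raw-input checksums and pipeline component versions as JSON."""
    manifest = {
        "inputs": {p.name: sha256sum(p)
                   for p in sorted(Path(fastq_dir).glob("*.fastq*"))},
        # e.g. {"bwa": "0.7.17", "samtools": "1.9"} -- illustrative only
        "software": tool_versions,
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Checking the archived checksums against re-downloaded or restored FASTQ files, and pinning the recorded tool versions when you rerun the pipeline, is what lets you regenerate the downstream results instead of storing them.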

You should structure everything so you can recreate your analysis and all downstream result files, exactly, at any time. Granted, this is harder when you are using commercial software, since you have little control over version changes and updates and limited ability to keep old copies around. But you still want to strive for reproducibility.

Otherwise, everything you have set up seems on the right track. The exact specs of your workstation depend on the analyses you will do within CLC Workbench. I would go with at least a few TB of RAIDed storage on the workstation itself. If you haven't already bought it, Qiagen/CLC bio has a collaboration with PSSC Labs: PSSC builds a configurable workstation themselves, and I believe you can still order the whole thing as a turn-key solution from CLC bio.