11-27-2010, 03:29 PM   #1
Location: Toronto

Join Date: Aug 2008
Posts: 42
Data storage

In our centre, we have been completely overwhelmed by the amount of data that we keep on spinning disk. I'm curious to find out what others are doing about data storage.

For our GAIIx runs, we no longer analyze from images, and we keep the thumbnails for troubleshooting. We perform our analysis from intensities on both the GAIIx and the HiSeq instruments. We use Illumina's pipeline to generate QC results and sequences (FASTQ files), and it produces a considerable number of intermediate files. What do people keep, and what do they discard? Also, what do people back up to tape for safekeeping?
rdeborja
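
Before deciding what to keep and what to discard, it can help to see which parts of a run folder take the most space. The following is a minimal sketch using only `du` and `sort`; `run_folder_usage` is a hypothetical name, and the sizes are whatever `du -sk` reports, in kilobytes, for each top-level entry of the folder.

```shell
#!/bin/sh
# Hypothetical helper: list each top-level entry of a run folder with its
# disk usage in kilobytes, largest first (e.g. images vs. intensities).
run_folder_usage() {
  run_dir=$1
  for entry in "$run_dir"/*; do
    # Skip the literal glob if the folder is empty.
    [ -e "$entry" ] || continue
    # du -sk prints "<size in KB> <path>" for the whole subtree.
    du -sk "$entry"
  done | sort -rn
}
```

For example, `run_folder_usage /path/to/run_folder` prints one line per subdirectory or file, so the biggest consumers appear at the top.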
11-27-2010, 06:21 PM   #2
Senior Member
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323

Keep the whole Bustard output until you have validated the data. After that, keep just the BAM file with all the alignments (including the unmapped reads).
drio
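
This advice depends on the BAM really retaining the unmapped reads before the Bustard output is deleted. A minimal sketch of that check follows, assuming `samtools` is installed; the function name `bam_read_summary` is hypothetical. It counts primary mapped records with `-F 0x904` (excluding the unmapped, secondary and supplementary flag bits) and unmapped records with `-f 4`.

```shell
#!/bin/sh
# Hypothetical helper: confirm a BAM still holds unmapped reads before the
# intermediate base-call output is removed. Assumes samtools is on PATH.
bam_read_summary() {
  bam=$1
  if ! command -v samtools >/dev/null 2>&1; then
    echo "samtools not found" >&2
    return 1
  fi
  # -c counts records; -F 0x904 drops unmapped (0x4), secondary (0x100)
  # and supplementary (0x800) records, leaving primary mapped alignments.
  mapped=$(samtools view -c -F 0x904 "$bam") || return 1
  # -f 4 keeps only records flagged as unmapped.
  unmapped=$(samtools view -c -f 4 "$bam") || return 1
  echo "mapped=$mapped unmapped=$unmapped"
  # Zero unmapped records may mean they were filtered out upstream.
  if [ "$unmapped" -eq 0 ]; then
    echo "warning: $bam contains no unmapped reads" >&2
    return 2
  fi
}
```

A non-zero exit status (1 for a missing tool or unreadable file, 2 for no unmapped reads) is a signal to keep the intermediate files a little longer.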
11-28-2010, 01:46 AM   #3
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

We keep the whole of the run folder, minus the images. For quite a while now Illumina have put their temp files into folders called Temp, so it's easy enough to get rid of these when moving the run folder to the permanent backup.

tar cf - --exclude=Temp --exclude=Images /run_folder | (cd /backup && tar xf -)

Although we only really need the FASTQ files for the backup, we're still keeping the intensities until we're completely sure that no journal will require them when we publish results. Frankly, a statement from the sequence archives and the journals that only BAM/SAM/FASTQ files will be required would save ridiculous amounts of money at sequencing facilities around the world.
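
The tar pipeline above can be wrapped with a check that everything except the Temp and Images folders actually arrived in the backup. This is a minimal sketch assuming GNU tar; `backup_run` and its arguments are hypothetical names. It uses `-C` so member names stay relative, and it places `--exclude` before the path, since recent GNU tar releases treat exclusion options as positional and ignore them when they follow the file names.

```shell
#!/bin/sh
# Hypothetical helper: copy a run folder into a backup directory without its
# Temp and Images subdirectories, then check that every other file arrived.
# Assumes GNU tar.
backup_run() {
  run_dir=$1
  backup_dir=$2
  parent=$(dirname "$run_dir")
  name=$(basename "$run_dir")
  mkdir -p "$backup_dir" || return 1
  # --exclude must precede the path; -C keeps member names relative.
  tar -C "$parent" -cf - --exclude=Temp --exclude=Images "$name" |
    tar -C "$backup_dir" -xf - || return 1
  # Compare sorted file lists: source (minus pruned dirs) vs. backup copy.
  src_list=$(cd "$parent" && find "$name" \( -name Temp -o -name Images \) -prune -o -type f -print | sort)
  dst_list=$(cd "$backup_dir" && find "$name" -type f -print | sort)
  [ "$src_list" = "$dst_list" ]
}
```

The list comparison only checks that file names match; checksums (for example with `md5sum`) would be needed to confirm the contents as well.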
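
If only the FASTQ files are needed for the long-term backup, they can be copied out on their own and compressed. A minimal sketch follows; the function name `archive_fastq` and the `.fastq` suffix are assumptions, so adjust the `-name` pattern to whatever your pipeline writes. It preserves each file's path relative to the run folder and runs `gzip -t` on every archive to test its integrity after writing.

```shell
#!/bin/sh
# Hypothetical helper: copy only the FASTQ files of a run folder into a
# backup directory, gzip-compressed, keeping their relative paths.
# Assumes FASTQ files end in ".fastq".
archive_fastq() {
  run_dir=$1
  dest=$2
  ( cd "$run_dir" && find . -type f -name '*.fastq' ) | while IFS= read -r f; do
    mkdir -p "$dest/$(dirname "$f")" || exit 1
    gzip -c "$run_dir/$f" > "$dest/$f.gz" || exit 1
    # gzip -t verifies the compressed stream (CRC and length) after writing.
    gzip -t "$dest/$f.gz" || exit 1
  done
}
```

The `exit 1` calls leave the loop's subshell, so the function returns non-zero as soon as any file fails to compress or verify.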

All times are GMT -8.