SEQanswers





Old 07-29-2010, 03:55 AM   #1
rdeborja
Member
 
Location: Toronto

Join Date: Aug 2008
Posts: 42
Managing next gen data, HELP!

With higher-density runs, I'm curious to know how people are handling data. We have Illumina and AB instruments, and handling the data coming off the instruments is becoming very cumbersome. We currently have 1.2 PB of storage, which is constantly being pruned for space since we're ramping up production.

We analyze Illumina runs from Intensities (i.e. running Bustard from Intensities) and do not work from the on-instrument basecalls. No images are copied over other than thumbnails for troubleshooting purposes. Before backing up our data we perform a cleanup: we delete the on-instrument Basecall directory and leave the export, sequence and Summary.htm related files in the Gerald directory.

For AB data, we keep only the csfasta and qual files and the analyzed data output. All other intermediate files are deleted prior to backup.

Any advice from others on handling all the data?
Old 07-29-2010, 08:08 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Could you store CSFASTQ instead of CSFASTA and QUAL files? Before compression, that should save a lot of disk space (in fact, even CSFASTA + CSFASTQ should still take less space than CSFASTA + QUAL).

The reason is simply that QUAL needs 2 or 3 bytes per score (a separator plus one or two digits), while (colour space) FASTQ needs just 1 byte per score.

Assuming you are compressing the files anyway this may not make such a big difference, but it might be worth trying.
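
To make the byte counting concrete, here is a rough Python sketch. The scores are made up, the function names are just for illustration, and it assumes the FASTQ qualities use the Sanger-style offset of 33, so treat it as a back-of-the-envelope check rather than a converter.

```python
def qual_line(scores):
    """Format scores as a QUAL file does: space-separated decimal integers."""
    return " ".join(str(s) for s in scores)


def fastq_quality(scores, offset=33):
    """Format scores as a FASTQ quality string: one ASCII character per score.

    The offset of 33 (Sanger-style encoding) is an assumption here.
    """
    return "".join(chr(s + offset) for s in scores)


# Made-up quality scores for a 10-colour read, purely for illustration.
scores = [5, 12, 30, 31, 8, 27, 33, 9, 20, 4]

qual_bytes = len(qual_line(scores))       # digits plus separators
fastq_bytes = len(fastq_quality(scores))  # exactly one byte per score

print("QUAL bytes: ", qual_bytes)
print("FASTQ bytes:", fastq_bytes)
```

On real data the ratio depends on how many scores are single-digit, and as noted it will shrink once the files are compressed.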
All times are GMT -8.