SEQanswers

09-16-2008, 08:23 AM   #1
dsturgill (Junior Member; Washington, DC; joined May 2008; 4 posts)
Data retention

Hi Folks -

How about a survey to see what data people retain from sequencing experiments? We have completed ~ 10 runs on an Illumina GA I, and have stored complete raw data (images, raw intensities, etc... the complete run folder), on external 1TB disks. I don't think everyone stores images long term. We may not be able to keep it up forever.
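For runs archived to external disks like this, a quick pre-copy check can save a half-finished transfer. A minimal sketch (the paths are invented stand-ins; a real run folder and USB mount would go in their place, and here a tiny fake "run folder" is created so the snippet is self-contained):

```shell
RUN_DIR=$(mktemp -d)     # stand-in for the real run folder
ARCHIVE=/tmp             # stand-in for the external 1TB disk mount
# fake a tile image so there is something to measure
dd if=/dev/zero of="$RUN_DIR/s_1_0001_a.tif" bs=1024 count=64 2>/dev/null

run_kb=$(du -sk "$RUN_DIR" | cut -f1)                  # size of the run folder
free_kb=$(df -Pk "$ARCHIVE" | awk 'NR==2 {print $4}')  # free space on the disk

if [ "$run_kb" -lt "$free_kb" ]; then
    echo "fits: need ${run_kb} KB, ${free_kb} KB free"
else
    echo "does not fit: need another archive disk"
fi
```

The same two numbers also tell you roughly how many complete run folders a given disk will hold before you commit to a retention policy.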

Anyone know if it is possible to retain images using GA II?
09-16-2008, 11:40 AM   #2
new300 (Member; northern hemisphere; joined Mar 2008; 50 posts)

It's possible to retain images on GA2s of course, but they are a lot bigger... In general it's reasonable to retain the raw intensity and noise files as reprocessing those with new basecallers may be of interest.

If you look at the data retained by the NCBI in the Short Read Archive, they are currently storing raw intensity and noise files, processed intensities, the four quality scores and the FASTQ files (the SRFs also contain the settings used to generate the data, I believe). I think the Short Read Archive only contains PF (purity-filtered) data at the moment.

Personally I think that's overkill. I'd store the raw intensities (PF and non-PF), the four quality scores and a basecall. Bear in mind that it is possible to regenerate everything from a complete set of raw intensities.
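A retention policy like that can be expressed as a simple pruning script. This is only a sketch: the file patterns assume the usual Pipeline output names (*_int.txt intensities, *_nse.txt noise, *_seq.txt basecalls, *_prb.txt qualities), which vary between pipeline versions, and the run folder here is a fake one built on the fly so the snippet can run anywhere:

```shell
RUN=$(mktemp -d)    # stand-in for a real run folder
mkdir -p "$RUN/Images" "$RUN/Data"
# fake the files a run would contain: one tile image plus the text outputs
touch "$RUN/Images/s_1_0001_a.tif" \
      "$RUN/Data/s_1_0001_int.txt" "$RUN/Data/s_1_0001_nse.txt" \
      "$RUN/Data/s_1_0001_seq.txt" "$RUN/Data/s_1_0001_prb.txt"

rm -rf "$RUN/Images"    # drop the TIFFs, the bulk of the storage cost

# what survives: intensities, noise, basecalls and qualities
find "$RUN/Data" \( -name '*_int.txt' -o -name '*_nse.txt' \
                 -o -name '*_seq.txt' -o -name '*_prb.txt' \) -print
```

On a real run you would obviously compress what you keep before it goes to the archive disk.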
09-16-2008, 12:58 PM   #3
Chipper (Senior Member; Sweden; joined Mar 2008; 324 posts)

Storing the "raw data" as DNA in the freezer is likely going to be a more cost-effective option...

What would you expect or hope to be able to achieve by reanalysing the images?
09-16-2008, 01:14 PM   #4
dsturgill (Junior Member; Washington, DC; joined May 2008; 4 posts)

Quote:
Originally Posted by Chipper
What would you expect or hope to be able to achieve by reanalysing the images?
My thought was that future improvements in the image segmentation algorithm or intensity extraction could give you different results later on. For example, improvements might be better able to discern individual clusters when density is very high.

In our microarray experiments, we always save raw images rather than just intensities, since we see variation whenever we redo the gridding or intensity extraction.

You're right - saving a sample to re-run later is a logical approach. Deleting what we see as raw data may simply be a mental hurdle to get over.
09-16-2008, 01:58 PM   #5
dvh (Member; London, UK; joined Jul 2008; 35 posts)

My group has archived all bzip2'd image files to 2TB USB disks; currently they are 200 from Western Digital. It seemed cheap in comparison to reagents/lab time.
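A rough sketch of that kind of archiving loop, with a decompression check before the local copy is trusted (the directories and the tile filename here are invented stand-ins, created on the fly so the snippet is self-contained):

```shell
SRC=$(mktemp -d)     # stand-in for the Images/ directory of a run
DEST=$(mktemp -d)    # stand-in for the 2TB USB disk mount
printf 'fake tile image data\n' > "$SRC/s_1_1_0001_a.tif"

for img in "$SRC"/*.tif; do
    bzip2 -9 -k "$img"                          # -k keeps the original for now
    cp "$img.bz2" "$DEST/"
    # verify the archived copy decompresses cleanly before deleting anything
    bzip2 -t "$DEST/$(basename "$img").bz2" && echo "archived $(basename "$img")"
done
```

Only after the `bzip2 -t` integrity test passes would you remove the uncompressed original.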

Partly, we also thought that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

dvh

Last edited by dvh; 09-16-2008 at 02:23 PM.
09-16-2008, 03:16 PM   #6
ScottC (Senior Member; Monash University, Melbourne, Australia; joined Jan 2008; 246 posts)

Quote:
Originally Posted by Chipper
What would you expect or hope to be able to achieve by reanalysing the images?
Well, if I understand correctly, the new pipeline that's on the way is supposed to increase the data output by 15-30% through image-analysis improvements alone. There's a lot of room for improvement in that area, apparently.

That being said, we're keeping the images on our server only until we have no more room, then they'll be deleted as space is required. Individuals can keep the raw data on external disks if they want it.

Scott.
09-16-2008, 11:18 PM   #7
new300 (Member; northern hemisphere; joined Mar 2008; 50 posts)

Quote:
Originally Posted by dvh
My group has archived all bzip2'd image files to 2TB USB disks; currently they are 200 from Western Digital. It seemed cheap in comparison to reagents/lab time.

Partly, we also thought that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

dvh
With the GA2 and read lengths approaching 100bp paired-end, this probably becomes infeasible. Storing the raw intensities, I think, gives you the biggest bang for your storage buck. All the new basecallers work from raw intensities, not images.

It's true there's a lot to be gained from improved image analysis, perhaps 10 or 20%, but it's all a trade-off.