SEQanswers

Old 09-14-2011, 04:43 AM   #1
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Why is the growth rate of the SRA decreasing?

Growth of the short read archive at EMBL appears to be plateauing:



That is, the doubling time is trending upwards:



This growth is well below the pace set by raw megabases/$, which has a doubling time of around 6 months.

Is some other archive for raw data being used? Or is the raw data simply not being submitted to archives any longer?

--
Phillip
Old 09-14-2011, 04:52 AM   #2
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Default

the scale is logarithmic
Old 09-14-2011, 05:27 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

People are hitting ISP data caps trying to upload data
Old 09-14-2011, 06:25 AM   #4
mgogol
Senior Member
 
Location: Kansas City

Join Date: Mar 2008
Posts: 197
Default

I think people are slowing down a little on generating data after they realized how much it takes to analyze it.

Or maybe just not sharing.
Old 09-14-2011, 10:11 AM   #5
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by NicoBxl
the scale is logarithmic
Yeah, I know. But I would expect the doubling time to be similar to the doubling time for megabases/$, currently about 6 months. Instead it appears to be at 14 months and is trending upwards.

--
Phillip
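
For anyone who wants to check the arithmetic behind the 6-month versus 14-month comparison, here is a minimal sketch. It assumes plain exponential growth between two measurements (a straight line on a logarithmic plot). The function names and the archive sizes are hypothetical, made up for illustration, and are not read off the charts in this thread.

```python
import math


def doubling_time(size_start, size_end, elapsed_months):
    """Doubling time in months, assuming exponential growth between two points."""
    if size_start <= 0 or size_end <= size_start or elapsed_months <= 0:
        raise ValueError("need positive sizes, net growth, and a positive interval")
    return elapsed_months * math.log(2) / math.log(size_end / size_start)


def growth_factor(doubling_months, elapsed_months):
    """Fold increase over elapsed_months for a given doubling time."""
    return 2 ** (elapsed_months / doubling_months)


# Hypothetical archive sizes (arbitrary units), not real SRA numbers:
# an archive that goes from 100 to 200 units in 14 months doubles every 14 months.
print(doubling_time(100.0, 200.0, 14.0))

# Yearly growth implied by the two doubling times mentioned in the thread.
print(growth_factor(6, 12))   # megabases/$ pace: fourfold per year
print(growth_factor(14, 12))  # archive pace: a bit under twofold per year
```

On a logarithmic plot the slope is inversely proportional to the doubling time, so a curve that bends toward the horizontal is exactly what a lengthening doubling time looks like.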
Old 09-15-2011, 01:02 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by GenoMax
People are hitting ISP data caps trying to upload data
Or their Institute's bandwidth isn't up to it

Some of our local sequencing providers can submit directly to the ENA/SRA on your behalf - the only tricky bit is providing the metadata.
Old 09-15-2011, 04:57 AM   #7
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Perhaps people are realizing there isn't sufficient value in archiving every scrap of raw sequence data produced to justify the cost. I think there is an argument to be made that as the cost for each Gbp of sequence decreases, so does the value. Not long ago, when producing even a Mbp of sequence meant a substantial investment in both dollars and person-hours, you made sure that every bp of DNA you sequenced was meaningful, and you protected that investment by having your data safely stored for posterity. Now one can produce hundreds of Gbp for orders of magnitude less effort and money, so researchers are somewhat less choosy about what and how much they sequence.

Let's be honest, how much raw sequence is ever downloaded from the ENA or SRA for research purposes? I agree with the NCBI's current stance on submission of raw sequence to the SRA: they will accept submissions of raw sequence that are directly reported on in a publication or that correlate to an analyzed data set in some other repository at NCBI (e.g. GEO, Genome, etc.).
Old 09-15-2011, 11:34 AM   #8
james hadfield
Moderator
Cambridge, UK
Community Forum
 
Location: Cambridge, UK

Join Date: Feb 2008
Posts: 221
Default

We have been sequencing like this for three to four years (see the jump in 2008), and that's about as long as most PhDs and post-docs work on a project before moving on. Maybe everyone is enjoying a long summer after a crazy time in the lab and before writing all this data up!
Old 09-15-2011, 07:21 PM   #9
srasdk
Member
 
Location: Maryland, USA

Join Date: Jun 2011
Posts: 19
Default

A growing portion of sequencing capacity is occupied by human disease studies (cancer, diabetes, etc.) and private medical/pharma sequencing. The former is not exchanged between archives due to differences in privacy laws; the latter stays private.
Old 09-16-2011, 11:03 AM   #10
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

"differences in privacy laws".

How come there aren't many public cancer data sets?

Are there laws preventing people from making their genome public? I imagine the motivation to help others suffering from a disease that is killing them might be pretty strong.

If ethics is the problem, perhaps the ethics need to be overhauled. A thousand eyeballs looking at some of the problems in cancer might bring a lot of solutions, particularly if patients are willing to let their genomes out.
Old 09-16-2011, 02:25 PM   #11
srasdk
Member
 
Location: Maryland, USA

Join Date: Jun 2011
Posts: 19
Default

A researcher can get access to cancer data through an application process, in which he/she effectively promises not to use it for non-consented research. The data is public, but with consent-based limitations.
What I was pointing out is that the data is not exchanged between archives because of different application processes, which in turn are due to differences in privacy laws. So ENA does not count NCBI cancer data and vice versa. As a result, it is hard to calculate how much data is currently produced and archived.
Old 09-28-2011, 01:50 AM   #12
damiankao
Member
 
Location: UK

Join Date: Jan 2010
Posts: 49
Default

People just aren't submitting to the SRA because it's a pain in the ass, honestly. Sequences are being generated in such huge volumes and at such speed that I think it's hard for users to keep up with submissions.
Old 09-28-2011, 09:10 AM   #13
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

We saw the same trend (slowing down of SRA growth) -
http://www.homolog.us/CI/index.php/charts/growth_sra

There are three possibilities -

i) The exponential spike was due to US stimulus spending. Now we are seeing Tea Party decline.

ii) SRA scared people about shutting down early last year, and that may have forced some to change submission style.

iii) Everyone has too much data and they are down to analysis.
__________________
http://homolog.us