**GenoMax** · 06-05-2014, 10:17 AM

The following is speculation.

For a while many people were under the impression (including me) that SRA@NCBI was closing down due to lack of funding. That is NOT the case. Apparently various NIH Institute Directors got together and decided that SRA was important and had to be kept going.

This fact has not been widely publicized (unlike the original closure announcement, which was). Perhaps this is reflected in the numbers you are seeing.

**Bukowski** · 06-05-2014, 10:46 AM

The following is also speculation. The drive to deposit data in publicly available archives isn't nearly as strong for NGS data as it was for e.g. microarrays. And there probably isn't as much of an appetite to be the world's dumping ground for terabytes of poorly curated data.

See also: privacy concerns of having people's genetic data splurged all over the internet.

**ShaunMahony** · 06-06-2014, 07:32 PM

Hi Jamie,

I had a look at "SRP*" entries in the following metadata file:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
Maybe this isn't the same as what you called "top-level" studies, but the SRA project entries should give an idea of distinct project uploads.

Broken down by "received" date, the counts I saw are as follows:
2008| 378
2009| 1129
2010| 3618
2011| 4872
2012| 7697
2013| 17142
2014| 8124

So, an acceleration in submissions in 2013 rather than a drop-off!
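For anyone who wants to repeat this kind of count, here is a minimal Python sketch. It assumes the accessions file is tab-delimited with a header row containing `Accession` and `Received` columns, and that `Received` starts with a four-digit year; those column names and the date format are assumptions to check against your own download. The function name `count_studies_by_year` is made up for illustration.

```python
import csv
from collections import Counter


def count_studies_by_year(lines, prefix="SRP"):
    """Count accessions starting with `prefix`, grouped by the year of 'Received'.

    `lines` is any iterable of text lines (an open file works).
    Column names 'Accession' and 'Received' are assumptions about the file layout.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        accession = row.get("Accession") or ""
        received = row.get("Received") or ""
        year = received[:4]
        # Skip rows with a missing or placeholder date (e.g. "-").
        if accession.startswith(prefix) and len(year) == 4 and year.isdigit():
            counts[year] += 1
    return counts


# Usage (the path is a placeholder for wherever you saved the file):
# with open("SRA_Accessions.tab", newline="") as handle:
#     for year, n in sorted(count_studies_by_year(handle).items()):
#         print(f"{year}| {n}")
```

Rows without a parseable date are skipped rather than guessed at, so the yearly totals only cover entries that actually carry a received date.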

More speculation, but there could be a couple of things happening here:
- Are you sure that you had a recent update of the SQLite data dump from the SRAdb package?
- Do you know how often that SQLite file is updated by the SRAdb folks? Maybe you could look for the exact date of the last 2013 entry in your copy of the SQLite file?

Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.

**Bukowski** · 06-07-2014, 01:30 PM

Originally posted by ShaunMahony

Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.

I was being a little bit facetious I must admit, but I do wonder how that rate of deposition corresponds with deployment of machines and how divergent they are.

I don't work in academia, but I did when microarrays were at their peak. Pretty much every paper I'm credited on with arrays is in GEO. None of my NGS papers are in the SRA - but I'm in clinical genomics, so it might be a reflection on the privacy issues - but it is most certainly possible to publish, in high-quality journals, without releasing NGS data.

**ShaunMahony** · 06-07-2014, 05:15 PM

Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP? I guess I should have clarified my statement to say it's not *supposed* to be possible to publish without submitting any described sequence data to public repositories. I know of very few journals that don't explicitly stipulate exactly this in their author guides. Whether they always enforce the rule is another story.

**JamieWizard** · 06-08-2014, 09:17 AM

SRA study numbers from 2013 - Bioconductor response

Hi everyone,

Thank you all for your thoughts. My initial thought, based on another query, was that a large proportion of the undated records could potentially be from 2013 (in light of the increasing production of NGS data in spite of the SRA's potential closure a while back).

I posted the question to the Bioconductor forum and have just received this significant reply, which I am sharing below:

MESSAGE BELOW FORWARDED FROM BIOCONDUCTOR FORUM:

Hi all,

Regarding missing studies by submission_date for 2013 and 2014 in the
SRAdb SQLite database, I did some investigation and found the reason.
The metadata in the SRAdb is mainly parsed from the XML files of the
SRA submissions, and this is true of the submission table. But I see
quite a few submission XML files don't have a submission date, e.g.

ftp://ftp-trace.ncbi.nih.gov/sra/Sub...157/SRA157949/

SRA157949.experiment.xml
SRA157949.submission.xml

So it seems all the study and submission records are there, but some
submission records just don't have a submission date. I am looking into the
possibility of adding dates for those records.
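
To see how many records are affected in a local copy of the SQLite file, a check along these lines could help. This is only a sketch: the table name `submission` and the column name `submission_date` are taken from the wording above and are assumptions about the schema, the function name `summarize_submission_dates` is hypothetical, and dates are assumed to be stored as text beginning with a four-digit year (so they sort correctly as strings).

```python
import sqlite3


def summarize_submission_dates(conn, table="submission", date_col="submission_date"):
    """Return (per-year counts of dated rows, number of undated rows, latest date).

    `table` and `date_col` are assumed names; adjust them to the actual schema.
    """
    cur = conn.cursor()
    # Count dated rows per year, using the first four characters of the date.
    per_year = dict(cur.execute(
        f"SELECT substr({date_col}, 1, 4) AS year, COUNT(*) FROM {table} "
        f"WHERE {date_col} IS NOT NULL AND {date_col} != '' "
        f"GROUP BY year ORDER BY year"
    ).fetchall())
    # Rows with no usable date (NULL or empty string).
    undated = cur.execute(
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {date_col} IS NULL OR {date_col} = ''"
    ).fetchone()[0]
    # Most recent date present, which hints at how fresh the dump is.
    latest = cur.execute(f"SELECT MAX({date_col}) FROM {table}").fetchone()[0]
    return per_year, undated, latest


# Usage (the file name is a placeholder for your local copy):
# conn = sqlite3.connect("SRAmetadb.sqlite")
# print(summarize_submission_dates(conn))
```

A large undated count next to a low 2013 total would be consistent with the explanation above; the latest date also shows how recent the dump is.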

Jamie, thanks for the finding and I will keep you updated.

Jack

Thanks again,
Jamie

**Bukowski** · 06-09-2014, 12:16 AM

Originally posted by ShaunMahony

Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP?

That's an interesting point. I feel that is largely up to my collaborators, as they are the people who are effectively responsible for disseminating the data. I am sure that it ends up in HGMD, but I don't know about dbSNP; really, the variants should be in ClinVar.

SRA top level studies counts, why is 2013 so low?