  • SRA top level studies counts, why is 2013 so low?

    Hi all,

    I've been looking at the Sequence Read Archive (SRA) short-read metadata using the SQLite data extracted by the Bioconductor SRAdb package (available from
    http://www.bioconductor.org/packages...tml/SRAdb.html).

    One thing that is quite puzzling: out of all the top-level studies, why are there so few for 2013?

    SQL queries against the Bioconductor data extracted from the SRA as of December 2013 show the following top-level study counts:

    2005|64
    2006|38
    2007|94
    2008|269
    2009|893
    2010|2631
    2011|4077
    2012|5208
    2013|724

    One can see the increasing trend and then a fall-off after 2012. I'm wondering if anyone has any ideas why this might be?

    Best regards,
    Jamie
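
For anyone wanting to reproduce this kind of tally, here is a minimal sketch of a per-year count in Python. It runs against a tiny in-memory stand-in rather than the real SRAdb file. The table name `submission` and column `submission_date` follow the names mentioned later in this thread, but the exact schema, the ISO-style date strings, and the demo rows are all assumptions. Rows with no date are counted in their own "undated" group so they do not silently disappear.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table standing in for the real SQLite file (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submission (submission_accession TEXT, submission_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO submission VALUES (?, ?)",
        [
            ("SRA000001", "2012-05-01"),
            ("SRA000002", "2012-11-20"),
            ("SRA000003", "2013-02-14"),
            ("SRA000004", None),  # record with no date at all
            ("SRA000005", None),
        ],
    )
    return conn


def studies_per_year(conn):
    """Count rows per year; rows without a date are grouped under 'undated'."""
    query = """
        SELECT COALESCE(substr(submission_date, 1, 4), 'undated') AS year,
               COUNT(*) AS n
        FROM submission
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(studies_per_year(build_demo_db()))
    # [('2012', 2), ('2013', 1), ('undated', 2)]
```

If the "undated" group turns out to be large, a low count for a recent year may reflect missing dates rather than missing studies.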

  • #2
    The following is speculation.

    For a while many people were under the impression (including me) that SRA@NCBI was closing down due to lack of funding. That is NOT the case. Apparently various NIH Institute Directors got together and decided that SRA was important and had to be kept going.

    This fact has not been widely publicized (unlike the original closure announcement, which was). Perhaps this is reflected in the numbers you are seeing.



    • #3
      The following is also speculation. The drive to deposit data in publicly available archives isn't nearly as strong for NGS data as it was for e.g. microarrays. And there probably isn't as much of an appetite to be the world's dumping ground for terabytes of poorly curated data.

      See also: privacy concerns about having people's genetic data splurged all over the internet.



      • #4
        Hi Jamie,

        I had a look at "SRP*" entries in the following metadata file:
        ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
        Maybe this isn't the same as what you called "top-level" studies, but the SRA project entries should give an idea of distinct project uploads.

        Broken down by "received" date, the counts I saw are as follows:
        2008| 378
        2009| 1129
        2010| 3618
        2011| 4872
        2012| 7697
        2013| 17142
        2014| 8124

        So, an acceleration in submissions in 2013 rather than a drop-off!
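
A rough sketch of how such a breakdown could be computed in Python is below. It assumes the accessions file is tab-delimited with a header row containing `Accession` and `Received` columns, that dates start with a four-digit year, and that a missing date appears as `-` or an empty field; those details and the sample rows are assumptions for illustration, not the real file contents.

```python
import csv
import io
from collections import Counter

# Made-up sample in the assumed layout (tab-delimited, header row).
SAMPLE = (
    "Accession\tReceived\tType\n"
    "SRP000001\t2012-03-01T10:00:00Z\tSTUDY\n"
    "SRP000002\t2013-06-15T08:30:00Z\tSTUDY\n"
    "SRP000003\t2013-09-02T12:00:00Z\tSTUDY\n"
    "SRR000004\t2013-09-02T12:00:00Z\tRUN\n"
    "SRP000005\t-\tSTUDY\n"
)


def count_projects_by_year(lines):
    """Count accessions starting with 'SRP' by the year of their Received date."""
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if not row["Accession"].startswith("SRP"):
            continue  # skip runs, samples, experiments, etc.
        received = row["Received"]
        year = received[:4] if received and received != "-" else "undated"
        counts[year] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(count_projects_by_year(io.StringIO(SAMPLE)))
    # {'2012': 1, '2013': 2, 'undated': 1}
```

For a real file, passing an open file handle instead of `io.StringIO(SAMPLE)` should work the same way, since `csv.DictReader` reads line by line.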

        More speculation, but there could be a couple of things happening here:
        - Are you sure that you had a recent update of the SQLite data dump from the SRAdb package?
        - Do you know how often that SQLite file is updated by the SRAdb folks? Maybe you could look for the exact date of the last 2013 entry in your copy of the SQLite file?
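
Both checks could be sketched along these lines in Python: find the latest dated entry for a given year and count how many records carry no date at all. The `submission` table, the `submission_date` column, ISO-style date strings, and the demo rows are assumptions, so adjust them to whatever the actual SQLite file contains.

```python
import sqlite3


def date_coverage(conn, year="2013"):
    """Return (latest date within `year`, number of records with no date)."""
    latest = conn.execute(
        "SELECT MAX(submission_date) FROM submission "
        "WHERE substr(submission_date, 1, 4) = ?",
        (year,),
    ).fetchone()[0]
    undated = conn.execute(
        "SELECT COUNT(*) FROM submission "
        "WHERE submission_date IS NULL OR submission_date = ''"
    ).fetchone()[0]
    return latest, undated


def build_coverage_demo():
    """In-memory stand-in for the real database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submission (submission_date TEXT)")
    conn.executemany(
        "INSERT INTO submission VALUES (?)",
        [("2012-12-30",), ("2013-01-10",), ("2013-03-22",), (None,), ("",)],
    )
    return conn


if __name__ == "__main__":
    print(date_coverage(build_coverage_demo()))
    # ('2013-03-22', 2)
```

A latest date early in the year would point to a stale dump, whereas a late date combined with many undated records would point to missing dates instead.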


        Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.



        • #5
          Originally posted by ShaunMahony:
          Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.
          I was being a little bit facetious I must admit, but I do wonder how that rate of deposition corresponds with deployment of machines and how divergent they are.

          I don't work in academia, but I did when microarrays were at their peak. Pretty much every paper I'm credited on with arrays is in GEO. None of my NGS papers are in the SRA - but I'm in clinical genomics, so it might be a reflection on the privacy issues - but it is most certainly possible to publish, in high-quality journals, without releasing NGS data.



          • #6
            Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP? I guess I should have clarified my statement to say it's not *supposed* to be possible to publish without submitting any described sequence data to public repositories. I know of very few journals that don't explicitly stipulate exactly this in their author guides. Whether they always enforce the rule is another story.



            • #7
              SRA study numbers from 2013 - Bioconductor response

              Hi everyone,

              Thank you all for your thoughts. My initial thought, based on another query, was that a large proportion of the undated records could be from 2013 (in light of the increasing production of NGS data in spite of the SRA's potential closure a while back).

              I posted the question to the Bioconductor forum and have just received this significant reply, which I am sharing below:


              MESSAGE BELOW FORWARDED FROM BIOCONDUCTOR FORUM:

              Hi all,

              Regarding missing studies by submission_date for 2013 and 2014 in the
              SRAdb SQLite database, I did some investigation and found the reason.
              The metadata in SRAdb is mainly parsed from the XML files of the
              SRA submissions, and this is also true of the submission table. But I see
              that quite a few submission XML files don't have a submission date, e.g.

              ftp://ftp-trace.ncbi.nih.gov/sra/Sub...157/SRA157949/

              SRA157949.experiment.xml
              SRA157949.submission.xml

              So it seems all the study and submission records are there, but some
              submission records just don't have a submission date. I am looking into the
              possibility of adding dates for those records.

              Jamie, thanks for the finding and I will keep you updated.

              Jack


              Thanks again,
              Jamie
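
To illustrate the kind of check described in the forwarded message, here is a small Python sketch that reports whether a submission XML document carries a date. The element name `SUBMISSION`, the attribute name `submission_date`, and both sample documents are assumptions made up for this example; they are not copied from the real SRA157949 files.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: one with a date attribute, one without.
WITH_DATE = (
    '<SUBMISSION accession="SRA000001" '
    'submission_date="2012-03-01T10:00:00Z"/>'
)
WITHOUT_DATE = (
    '<SUBMISSION_SET><SUBMISSION accession="SRA000002"/></SUBMISSION_SET>'
)


def submission_date(xml_text):
    """Return the submission_date attribute, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # The SUBMISSION element may be the root itself or nested in a wrapper.
    node = root if root.tag == "SUBMISSION" else root.find(".//SUBMISSION")
    if node is None:
        return None
    return node.get("submission_date")


if __name__ == "__main__":
    for label, doc in (("with date", WITH_DATE), ("without date", WITHOUT_DATE)):
        print(label, "->", submission_date(doc))
```

A parser that copies this attribute straight into a database column would store an empty value for the second document, which would fit the undated records discussed above.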



              • #8
                Originally posted by ShaunMahony:
                Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP?
                That's an interesting point. I feel that is largely up to my collaborators, as they are the people who are effectively responsible for disseminating the data. I am sure that it ends up in HGMD, but I don't know about dbSNP; really they should be in ClinVar.

