  • JamieWizard
    Member
    • Sep 2013
    • 10

    SRA top-level study counts: why is 2013 so low?

    Hi all,

    I've been looking at the Sequence Read Archive (SRA) short-read metadata using the SQLite database extracted by the Bioconductor SRAdb package (available from
    http://www.bioconductor.org/packages...tml/SRAdb.html).

    One thing is quite puzzling: out of all the top-level studies, why are there so few for 2013?

    SQL queries against the Bioconductor data extracted from the SRA as of December 2013 show the following top-level study counts:

    2005|64
    2006|38
    2007|94
    2008|269
    2009|893
    2010|2631
    2011|4077
    2012|5208
    2013|724

    One can see the increasing trend up to 2012, followed by a sharp fall-off in 2013. Does anyone have any ideas why this might be?
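For anyone wanting to reproduce this kind of per-year tally, here is a minimal sketch using Python's built-in sqlite3 module. The table name `submission`, the column `submission_date`, and the sample rows are assumptions made up for illustration; the real SRAdb schema may differ, and the accessions below are not real records. It also puts rows without a date into their own "undated" bucket, so missing dates show up instead of silently disappearing from the yearly counts.

```python
import sqlite3

# Hypothetical miniature of a "submission" table; the real SRAdb schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (submission_accession TEXT, submission_date TEXT)"
)
conn.executemany(
    "INSERT INTO submission VALUES (?, ?)",
    [
        ("SRA000001", "2012-03-01"),  # made-up rows, not real SRA records
        ("SRA000002", "2012-11-15"),
        ("SRA000003", "2013-02-20"),
        ("SRA000004", None),          # no submission date recorded
        ("SRA000005", None),
    ],
)


def counts_by_year(connection):
    """Count rows per year, grouping rows with a NULL date as 'undated'."""
    query = """
        SELECT COALESCE(substr(submission_date, 1, 4), 'undated') AS year,
               COUNT(*) AS n
        FROM submission
        GROUP BY year
        ORDER BY year
    """
    return connection.execute(query).fetchall()


year_counts = counts_by_year(conn)
for year, n in year_counts:
    print(f"{year}|{n}")
```

To run it against a downloaded SQLite file, replace ":memory:" with the file path and drop the CREATE/INSERT statements, after checking the actual table and column names.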

    Best regards,
    Jamie
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    The following is speculation.

    For a while many people (including me) were under the impression that SRA@NCBI was closing down due to lack of funding. That is NOT the case. Apparently various NIH Institute Directors got together and decided that the SRA was important and had to be kept going.

    This fact has not been widely publicized (unlike the original closure announcement, which was). Perhaps this is reflected in the numbers you are seeing.

    • Bukowski
      Senior Member
      • Jan 2010
      • 388

      #3
      The following is also speculation. The drive to deposit data in publicly available archives isn't nearly as strong for NGS data as it was for e.g. microarrays. And there probably isn't as much of an appetite to be the world's dumping ground for terabytes of poorly curated data.

      See also: privacy concerns about having people's genetic data splurged all over the internet.

      • ShaunMahony
        Member
        • Apr 2008
        • 27

        #4
        Hi Jamie,

        I had a look at "SRP*" entries in the following metadata file:
        ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
        Maybe this isn't the same as what you called "top-level" studies, but the SRA project entries should give an idea of distinct project uploads.

        Broken down by "received" date, the counts I saw are as follows:
        2008| 378
        2009| 1129
        2010| 3618
        2011| 4872
        2012| 7697
        2013| 17142
        2014| 8124

        So, an acceleration in submissions in 2013 rather than a drop-off!
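A tally like the one above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names "Accession" and "Received" are assumed to match the header of the accessions file, and the inline sample rows are made up so the snippet runs on its own.

```python
import csv
import io
from collections import Counter

# Tiny made-up stand-in for the tab-delimited accessions file.
# Column names are assumed; check them against the real file header.
SAMPLE_TAB = (
    "Accession\tReceived\tType\n"
    "SRP000001\t2012-05-02T10:00:00Z\tSTUDY\n"
    "SRP000002\t2013-01-17T08:30:00Z\tSTUDY\n"
    "SRR000003\t2013-01-17T08:31:00Z\tRUN\n"
    "SRP000004\t2013-09-09T12:00:00Z\tSTUDY\n"
    "SRP000005\t-\tSTUDY\n"
)


def srp_counts_by_received_year(handle):
    """Count SRP* accessions per year of the 'Received' timestamp."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if not row["Accession"].startswith("SRP"):
            continue  # skip runs, experiments, samples, etc.
        prefix = row["Received"][:4]
        year = prefix if prefix.isdigit() else "undated"
        counts[year] += 1
    return counts


srp_counts = srp_counts_by_received_year(io.StringIO(SAMPLE_TAB))
for year in sorted(srp_counts):
    print(f"{year}|{srp_counts[year]}")
```

For the real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` wrapper; the file is large, and reading it row by row like this avoids loading it all into memory.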

        More speculation, but there could be a couple of things happening here:
        - Are you sure that you had a recent update of the SQLite data dump from the SRAdb package?
        - Do you know how often that SQLite file is updated by the SRAdb folks? Maybe you could look for the exact date of the last 2013 entry in your copy of the SQLite file?


        Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.

        • Bukowski
          Senior Member
          • Jan 2010
          • 388

          #5
          Originally posted by ShaunMahony:
          Finally, I disagree with Bukowski's rationale above... there is just as much of a drive to deposit NGS data as there was for microarrays; you can't publish in most journals without submitting data to the SRA. And poorly curated this type of data may sometimes be, but it's often useful.
          I was being a little bit facetious, I must admit, but I do wonder how that rate of deposition corresponds with the deployment of machines, and how divergent the two are.

          I don't work in academia, but I did when microarrays were at their peak. Pretty much every paper I'm credited on with arrays is in GEO. None of my NGS papers are in the SRA - but I'm in clinical genomics, so it might be a reflection on the privacy issues - but it is most certainly possible to publish, in high-quality journals, without releasing NGS data.

          • ShaunMahony
            Member
            • Apr 2008
            • 27

            #6
            Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP? I guess I should have clarified my statement to say it's not *supposed* to be possible to publish without submitting any described sequence data to public repositories. I know of very few journals that don't explicitly stipulate exactly this in their author guides. Whether they always enforce the rule is another story.

            • JamieWizard
              Member
              • Sep 2013
              • 10

              #7
              SRA study numbers from 2013 - Bioconductor response

              Hi everyone,

              Thank you all for your thoughts. My initial thought, based on another query, was that a large proportion of the undated records could potentially be from 2013 (given the increasing production of NGS data, in spite of the SRA's potential closure a while back).

              I posted the question to the Bioconductor forum and have just received this significant reply, which I am sharing below:


              MESSAGE BELOW FORWARDED FROM BIOCONDUCTOR FORUM:

              Hi all,

              Regarding missing studies by submission_date for 2013 and 2014 in the
              SRAdb SQLite database, I did some investigation and found the reason.
              The metadata in SRAdb is mainly parsed from the XML files of the
              SRA submissions, and this is true of the submission table. But I see
              that quite a few submission XML files don't have a submission date, e.g.

              ftp://ftp-trace.ncbi.nih.gov/sra/Sub...157/SRA157949/

              SRA157949.experiment.xml
              SRA157949.submission.xml

              So it seems all the study and submission records are there, but some
              submission records just don't have a submission date. I am looking into the
              possibility of adding dates for those records.

              Jamie, thanks for the finding and I will keep you updated.

              Jack
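The check described in the forwarded message, whether a submission XML file carries a date at all, could be sketched roughly as follows. This is an assumption-laden illustration: it assumes the date is stored as a `submission_date` attribute on a `SUBMISSION` element, and the XML snippets and accessions are made up for the example, not copied from the files linked above.

```python
import xml.etree.ElementTree as ET

# Made-up XML fragments; element and attribute names are assumptions.
WITH_DATE = (
    '<SUBMISSION accession="SRA000001" '
    'submission_date="2012-03-01T00:00:00Z"/>'
)
WITHOUT_DATE = '<SUBMISSION accession="SRA000002"/>'
WRAPPED = (
    '<SUBMISSION_SET><SUBMISSION accession="SRA000003" '
    'submission_date="2013-06-10T00:00:00Z"/></SUBMISSION_SET>'
)


def submission_date(xml_text):
    """Return the submission_date attribute, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # Accept either a bare SUBMISSION element or one nested in a wrapper.
    node = root if root.tag == "SUBMISSION" else root.find(".//SUBMISSION")
    if node is None:
        return None
    return node.get("submission_date")


for label, text in [("with", WITH_DATE), ("without", WITHOUT_DATE), ("wrapped", WRAPPED)]:
    print(label, submission_date(text))
```

Records for which this returns None would end up with no date in any table built from the XML, which matches the behaviour described in the reply.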


              Thanks again,
              Jamie

              • Bukowski
                Senior Member
                • Jan 2010
                • 388

                #8
                Originally posted by ShaunMahony:
                Ah, I guess privacy issues do complicate things for clinical sequencing data. But are you telling me that you don't even submit variations to dbSNP or dbGaP?
                That's an interesting point. I feel that is largely up to my collaborators, as they are the people who are effectively responsible for disseminating the data. I am sure that it ends up in HGMD, but I don't know about dbSNP; really, the variants should be in ClinVar.
