  • ersgupta
    Member
    • Jun 2011
    • 26

    SRA study metadata download

    Hi Guys,

    I am trying to download the metadata for all the studies submitted to SRA, but I cannot find a complete list. Can anybody help me out with this?

    I want the metadata (mainly abstract and description), preferably in XML format, for all the studies/samples in SRA to date.

    Thanks in advance.
  • laura
    Senior Member
    • Sep 2008
    • 151

    #2
    You might try

    ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab


    • ersgupta
      Member
      • Jun 2011
      • 26

      #3
      Thanks for the reply.

      I have checked it, but it doesn't contain the information I want, i.e. the study abstract and description. It only contains IDs, which I already have.

      I am trying a few other things; let's hope they work.


      • vadim
        Member
        • Sep 2009
        • 37

        #4
        Grab the full XML dump and parse it:
        ftp://ftp-trace.ncbi.nlm.nih.gov/sra...0111101.tar.gz
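
A minimal sketch of the "grab the dump and parse it" step, assuming the archive contains one folder per submission with files named `*.study.xml`, and that each file follows the usual SRA study layout (`STUDY` elements with a `DESCRIPTOR` child holding `STUDY_TITLE`, `STUDY_ABSTRACT` and `STUDY_DESCRIPTION`). The function name `iter_studies`, the helper `build_demo_archive` and the demo accessions are made up for illustration; check the element names against the schema before relying on them.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def iter_studies(fileobj):
    """Yield (accession, title, abstract, description) for each STUDY element
    found in the *.study.xml members of a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and the run/experiment/sample XML files.
            if not (member.isfile() and member.name.endswith(".study.xml")):
                continue
            root = ET.parse(archive.extractfile(member)).getroot()
            for study in root.iter("STUDY"):
                descriptor = study.find("DESCRIPTOR")
                if descriptor is None:
                    continue
                yield (
                    study.get("accession"),
                    descriptor.findtext("STUDY_TITLE", default=""),
                    descriptor.findtext("STUDY_ABSTRACT", default=""),
                    descriptor.findtext("STUDY_DESCRIPTION", default=""),
                )


# Hypothetical stand-in for one study file from the dump.
DEMO_XML = (
    b'<STUDY_SET><STUDY accession="SRP000001"><DESCRIPTOR>'
    b"<STUDY_TITLE>Demo title</STUDY_TITLE>"
    b"<STUDY_ABSTRACT>Demo abstract</STUDY_ABSTRACT>"
    b"<STUDY_DESCRIPTION>Demo description</STUDY_DESCRIPTION>"
    b"</DESCRIPTOR></STUDY></STUDY_SET>"
)


def build_demo_archive():
    """Build a tiny in-memory .tar.gz that mimics the assumed dump layout."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in (
            ("SRA000001/SRA000001.study.xml", DEMO_XML),
            ("SRA000001/SRA000001.run.xml", b"<RUN_SET/>"),
        ):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


for record in iter_studies(build_demo_archive()):
    print(record)
```

For the real dump, pass `open(path_to_tarball, "rb")` instead of the demo archive. Iterating over the archive members avoids unpacking the whole tarball to disk.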


        • ersgupta
          Member
          • Jun 2011
          • 26

          #5
          @vadim... thanks a lot...

          This is what I was looking for. One issue is that not every SRA ID has a study.xml, but that's OK. I can live with that.


          • vadim
            Member
            • Sep 2009
            • 37

            #6
            What do you mean by SRA ID? Each SRA run should be associated with a study through an SRA experiment. The XML schema might be useful:

            or


            Also have a look here for a complete XML dump (including EBI SRA):


            • ersgupta
              Member
              • Jun 2011
              • 26

              #7
              Both link to the same page.

              I understand SRA runs; their IDs start with SRR.

              For example, take this case:



              Out of these, only the SRA028059 folder in the SRA metadata has a *.study.xml.

              SRP = Study
              SRX = Experiment
              SRS = Sample
              SRR = Run

              But what is the SRA ID actually for? I am confused here.


              • ersgupta
                Member
                • Jun 2011
                • 26

                #8
                This is a reply I got from someone at SRA:

                The SRA number acts as a collector for the information. This means that when a center submits metadata or data, it creates a submission (SRAXXXXXX), but the data or metadata in that submission may link to another submission.

                This is OK with me, but I don't understand why the same study needs separate SRA IDs, even when more samples are submitted to it at a later stage.


                • vadim
                  Member
                  • Sep 2009
                  • 37

                  #9
                  SRA* accessions are NCBI submission accessions; similarly, ERA* accessions are EBI submission accessions and DRA* accessions are DDBJ submission accessions.
                  Sometimes a study is submitted before its run data. Since the metadata dumps are organized by submission accession, in such cases the run and study metadata end up in separate folders. To get the proper association, use the livelists:
                  NCBI: ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
                  EBI: ftp://ftp.sra.ebi.ac.uk/meta/list/livelist.gz

                  As for the submissions you are asking about, it appears that the run was submitted in SRA028225, the experiment in SRA028192 and the study in SRA028059.
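
A rough sketch of how an accession list can answer "which submission folder holds what" for a single run. It assumes a tab-delimited table with columns named `Accession`, `Submission`, `Type`, `Experiment` and `Study`; those column names, the helper names and every accession in the sample are assumptions made for illustration, so compare them with the header of the file you actually download.

```python
import csv
import io

# Made-up sample in the assumed layout: one study, one experiment and one run,
# each registered under a different submission accession.
SAMPLE_TAB = (
    "Accession\tSubmission\tType\tExperiment\tStudy\n"
    "SRP000001\tSRA000001\tSTUDY\t-\t-\n"
    "SRX000001\tSRA000002\tEXPERIMENT\t-\tSRP000001\n"
    "SRR000001\tSRA000003\tRUN\tSRX000001\tSRP000001\n"
)


def load_rows(handle):
    """Index the table by accession so related records can be looked up."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Accession"]: row for row in reader}


def metadata_folders(rows, run_accession):
    """Return the submission (folder) holding the run, its experiment and its study."""
    run = rows[run_accession]
    experiment = rows[run["Experiment"]]
    study = rows[run["Study"]]
    return {
        "run": run["Submission"],
        "experiment": experiment["Submission"],
        "study": study["Submission"],
    }


rows = load_rows(io.StringIO(SAMPLE_TAB))
print(metadata_folders(rows, "SRR000001"))
```

Loading the whole table into a dictionary is fine for a sketch; for a very large list it may be more practical to stream the file and keep only the rows of interest.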


                  • ersgupta
                    Member
                    • Jun 2011
                    • 26

                    #10
                    Yeah... got it... That *.tab file is very useful indeed... thanks...


                    • ersgupta
                      Member
                      • Jun 2011
                      • 26

                      #11
                      I guess I can conclude this:

                      "NCBI SRA doesn't have a single ID from which we can get everything related to it, i.e. study, run, experiment and sample. The only way to go about it is to use SRA_Accessions.tab: using the study ID (*RP*), get the *RA*(s), then using the *RA*(s), get the *RR*(s). There is no direct way of getting *RR* from *RP*."

                      RA = Submission accession
                      RR = Run
                      RP = Study
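
The RP to RA(s) to RR(s) lookup summarised above could be sketched as follows. This assumes each run row in the accession table names its study in a `Study` column and its submission in a `Submission` column; the column names, the function name `runs_by_submission` and all accessions in the sample rows are hypothetical.

```python
from collections import defaultdict

# Made-up rows in the assumed layout. SRA000002 is shared by two studies.
ROWS = [
    {"Accession": "SRP000001", "Submission": "SRA000001", "Type": "STUDY", "Study": "-"},
    {"Accession": "SRR000001", "Submission": "SRA000001", "Type": "RUN", "Study": "SRP000001"},
    {"Accession": "SRR000002", "Submission": "SRA000002", "Type": "RUN", "Study": "SRP000001"},
    {"Accession": "SRR000003", "Submission": "SRA000002", "Type": "RUN", "Study": "SRP000002"},
]


def runs_by_submission(rows, study_accession):
    """Map each submission accession to the runs it holds for one study."""
    grouped = defaultdict(list)
    for row in rows:
        # Filter on the study as well as the type, because one submission
        # can carry runs that belong to different studies.
        if row["Type"] == "RUN" and row["Study"] == study_accession:
            grouped[row["Submission"]].append(row["Accession"])
    return dict(grouped)


print(runs_by_submission(ROWS, "SRP000001"))
```

The keys of the result are the submission folders to open in the metadata dump, and the values are the runs of that study found in each one.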


                      • nickloman
                        Senior Member
                        • Jul 2009
                        • 355

                        #12
                        Maybe not relevant to this question, but I am finding the DNAnexus SRA interface much nicer than NCBI's: http://sra.dnanexus.com/


                        • ersgupta
                          Member
                          • Jun 2011
                          • 26

                          #13
                          Originally posted by nickloman
                          Maybe not relevant for this question but am finding the DNAnexus SRA interface much nicer than the NCBI's: http://sra.dnanexus.com/
                          @nickloman...

                          ok... will have a look at it as well...


                          • ersgupta
                            Member
                            • Jun 2011
                            • 26

                            #14
                            Can anybody explain this to me?

                            Many SRA (accession) IDs having the same SRP (study) ID is OK, as explained by an answer above.
                            But many SRP (study) IDs having the same SRA (accession) ID...?


                            • sdavis
                              Member
                              • Jan 2010
                              • 14

                              #15
                              You might take a look at:

                              The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms, including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Full-text search in the package makes querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Besides the ftp protocol, SRAdb has functions supporting the fasp protocol (ascp from Aspera Connect) for faster downloading of large data files over long distances. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.


                              While this is an R/Bioconductor package, the underlying data are stored in a SQLite database that can be downloaded separately and used directly or from any language with a SQLite driver (most languages). What is done to create the database is to download all the SRA XML files containing metadata, parse those files, and then load them into a relational database. This makes bulk operations on the data easier and more flexible since SQL can be used. Some full-text searching capabilities are also included since SQLite supports that in later versions.
                              Last edited by sdavis; 12-02-2011, 10:21 AM.
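
As a hedged illustration of using such a SQLite file "from any language with a SQLite driver", the sketch below searches study abstracts and descriptions from Python. The table name `study` and the columns `study_accession`, `study_title`, `study_abstract` and `study_description` are assumptions about the schema, and the demo database and its rows are invented so the example runs on its own; inspect the real file (for example with `.schema` in the sqlite3 shell) before adapting the query.

```python
import sqlite3


def find_studies(conn, keyword):
    """Return (accession, title) for studies whose abstract or description
    contains the keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT study_accession, study_title FROM study "
        "WHERE study_abstract LIKE ? OR study_description LIKE ? "
        "ORDER BY study_accession",
        (pattern, pattern),
    )
    return cursor.fetchall()


def demo_connection():
    """Create an in-memory database with the assumed table and two fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE study ("
        "study_accession TEXT PRIMARY KEY, study_title TEXT, "
        "study_abstract TEXT, study_description TEXT)"
    )
    conn.executemany(
        "INSERT INTO study VALUES (?, ?, ?, ?)",
        [
            ("SRP000001", "Example study A", "A gut metagenome survey.", ""),
            ("SRP000002", "Example study B", "", "Whole-genome resequencing."),
        ],
    )
    return conn


conn = demo_connection()
print(find_studies(conn, "metagenome"))
```

With a downloaded database, `sqlite3.connect(path_to_file)` would replace `demo_connection()`. Passing the keyword as a bound parameter keeps the query safe for arbitrary search strings.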
