  • SRA study metadata download

    Hi Guys,

    I am trying to download the metadata of all the studies submitted to SRA, but I cannot find a complete list. Can anybody help me out with this?

    I want the metadata (mainly abstract and description, preferably in XML format) of all the studies/samples in SRA to date.

    Thanks in advance.

  • #2
    You might try

    ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab



    • #3
      Thanks for the reply.

      I have checked it, but it doesn't contain the information I want, i.e. study abstract and description. It only contains IDs, which I already have.

      I am trying a few other things; let's hope they work.



      • #4
        Grab the full XML dump and parse it:
        ftp://ftp-trace.ncbi.nlm.nih.gov/sra...0111101.tar.gz
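A minimal Python sketch of the "parse it" step, for pulling the title, abstract and description out of every study file in the dump. It assumes the archive contains files named `*.study.xml` (as mentioned later in the thread) and that each `STUDY` element keeps its text under `DESCRIPTOR/STUDY_TITLE`, `DESCRIPTOR/STUDY_ABSTRACT` and `DESCRIPTOR/STUDY_DESCRIPTION`; check those element names against the XML schema before relying on them. The function names are made up for this example.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_study_records(tar):
    """Yield (accession, title, abstract, description) for every STUDY
    element found in the *.study.xml members of an open tarfile."""
    for member in tar:
        if not (member.isfile() and member.name.endswith(".study.xml")):
            continue
        handle = tar.extractfile(member)
        if handle is None:
            continue
        try:
            root = ET.parse(handle).getroot()
        except ET.ParseError:
            continue  # skip malformed files instead of aborting the whole dump
        # iter() also matches the root, so STUDY_SET and bare STUDY both work.
        for study in root.iter("STUDY"):
            yield (
                study.get("accession"),
                # Assumed element paths; findtext() returns None if absent.
                study.findtext("DESCRIPTOR/STUDY_TITLE"),
                study.findtext("DESCRIPTOR/STUDY_ABSTRACT"),
                study.findtext("DESCRIPTOR/STUDY_DESCRIPTION"),
            )


def dump_studies(path):
    """Print one tab-separated line per study in a local copy of the dump."""
    with tarfile.open(path, "r:gz") as tar:
        for record in iter_study_records(tar):
            print(*record, sep="\t")
```

Calling `dump_studies()` with the path of the downloaded tar.gz would print one line per study; fields that are missing from a file come out as `None`.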



        • #5
          @vadim... thanks a lot...

          This is what I was looking for. One issue is that not every SRA ID has a study.xml, but that's OK. I can live with that.



          • #6
            What do you mean by SRA ID? Each SRA run should be associated with a study through an SRA experiment. The XML schema might be useful:

            or


            Also have a look here for a complete XML dump (including EBI SRA):



            • #7
              Both links point to the same page.

              I understand SRA runs; they have IDs starting with SRR.

              For example, take this case.



              Of these, only the SRA028059 folder in the SRA metadata has a *.study.xml.

              SRP = Study
              SRX = Experiment
              SRS = Sample
              SRR = Run

              But what is the SRA accession actually for? I am confused here.



              • #8
                This is a reply I got from someone at SRA:

                The SRA number acts as a collector for the information. This means that when a center submits metadata or data they create a submission (SRAXXXXXX), but the data or metadata in the submission links to another submission.

                This is OK with me, but I don't understand why there are separate SRA IDs for the same study, even if one has to submit more samples to the same study at a later stage.



                • #9
                  SRA* accessions are NCBI submission accessions; similarly, ERA* accessions are EBI submission accessions and DRA* are DDBJ submission accessions.
                  Sometimes a study is submitted before the run data, but since the metadata dumps are organized by submission accession, in such cases the run and study metadata end up in separate folders. To get the proper association, use the livelists:
                  NCBI: ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
                  EBI: ftp://ftp.sra.ebi.ac.uk/meta/list/livelist.gz

                  As for the submissions you are asking about, it appears that the run was submitted in SRA028225, the experiment in SRA028192 and the study in SRA028059.
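A rough Python sketch of that lookup, in case it helps: given a run, find which submission folder holds the run, its experiment and its study. The column names (`Accession`, `Submission`, `Type`, `Experiment`, `Study`) are an assumption about the header of the NCBI table, and `SRR000001`, `SRX000001` and `SRP000001` are placeholder accessions; only the three SRA* numbers come from this thread.

```python
import csv
import io

# Hypothetical excerpt. The column names are an assumption about the file's
# header; SRR000001, SRX000001 and SRP000001 are placeholder accessions.
SAMPLE = (
    "Accession\tSubmission\tType\tExperiment\tStudy\n"
    "SRP000001\tSRA028059\tSTUDY\t-\t-\n"
    "SRX000001\tSRA028192\tEXPERIMENT\t-\tSRP000001\n"
    "SRR000001\tSRA028225\tRUN\tSRX000001\tSRP000001\n"
)


def load_accessions(lines):
    """Index the rows of the tab-separated table by their Accession column."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["Accession"]: row for row in reader}


def submissions_for_run(index, run_accession):
    """Return the submission holding the run, its experiment and its study."""
    run = index[run_accession]
    linked = {
        "run": run_accession,
        "experiment": run["Experiment"],
        "study": run["Study"],
    }
    # Links that do not resolve to a row (e.g. "-" placeholders) are skipped.
    return {
        kind: index[accession]["Submission"]
        for kind, accession in linked.items()
        if accession in index
    }


index = load_accessions(io.StringIO(SAMPLE))
print(submissions_for_run(index, "SRR000001"))
```

Loading the whole table into a dictionary keeps the sketch short; for the full file, a streaming pass or a database import is likely the better choice.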



                  • #10
                    Yeah... got it... That *.tab file is very useful indeed... thanks...



                    • #11
                      I guess I can conclude this:

                      "NCBI-SRA doesn't have a single ID with which we can get everything related to it, i.e. study, run, experiment, sample." The only way to go about it is to use SRA_Accessions.tab: using the study ID (*RP*), get the *RA*(s), then using the *RA*(s), get the *RR*(s). There is no direct way of getting *RR* from *RP*.

                      RA = SRA Accessions
                      RR = Run
                      RP = Study
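A small Python sketch of the *RP* to *RR* lookup against SRA_Accessions.tab. It assumes the table has `Accession`, `Type` and `Study` columns and that each RUN row carries its study accession in `Study`; if that holds for your copy of the file, the runs can be filtered in one pass without going through the *RA* accessions. `runs_for_study` is a made-up name for this example.

```python
import csv


def runs_for_study(lines, study_accession):
    """Collect RUN accessions whose Study column matches, in file order."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row["Accession"]
        for row in reader
        if row["Type"] == "RUN" and row["Study"] == study_accession
    ]
```

`lines` can be any iterable of text lines, so an open file handle works, e.g. `with open("SRA_Accessions.tab", newline="") as fh: runs = runs_for_study(fh, study_id)`. The file is read row by row, so it never has to fit in memory.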



                      • #12
                        Maybe not relevant for this question, but I am finding the DNAnexus SRA interface much nicer than NCBI's: http://sra.dnanexus.com/



                        • #13
                          Originally posted by nickloman
                          Maybe not relevant for this question, but I am finding the DNAnexus SRA interface much nicer than NCBI's: http://sra.dnanexus.com/
                          @nickloman...

                          OK... will have a look at it as well...



                          • #14
                            Can anybody explain this to me?

                            Many SRA (accession) IDs having the same SRP (study) ID is OK, as explained in an answer above.
                            But many SRP (study) IDs having the same SRA (accession) ID...?



                            • #15
                              You might take a look at:

                              The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package makes querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Besides the ftp protocol, SRAdb has functions supporting the fasp protocol (ascp from Aspera Connect) for faster downloading of large data files over long distances. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.


                              While this is an R/Bioconductor package, the underlying data are stored in a SQLite database that can be downloaded separately and used directly or from any language with a SQLite driver (most languages). What is done to create the database is to download all the SRA XML files containing metadata, parse those files, and then load them into a relational database. This makes bulk operations on the data easier and more flexible since SQL can be used. Some full-text searching capabilities are also included since SQLite supports that in later versions.
                              Last edited by sdavis; 12-02-2011, 10:21 AM.
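A minimal sketch of querying such a SQLite file from Python with the standard `sqlite3` module. The table name `study` and the columns `study_accession`, `study_title`, `study_abstract` and `study_description` are assumptions about the database layout, so inspect the real schema (for example with `.schema` in the sqlite3 shell) before relying on them. The function names are made up for this example.

```python
import sqlite3


def study_text(conn, study_accession):
    """Return (title, abstract, description) for one study, or None."""
    cur = conn.execute(
        "SELECT study_title, study_abstract, study_description "
        "FROM study WHERE study_accession = ?",  # assumed table and columns
        (study_accession,),
    )
    return cur.fetchone()


def search_abstracts(conn, keyword):
    """Return accessions of studies whose abstract contains the keyword."""
    cur = conn.execute(
        "SELECT study_accession FROM study "
        "WHERE study_abstract LIKE ? ORDER BY study_accession",
        (f"%{keyword}%",),
    )
    return [row[0] for row in cur]
```

With a downloaded database, `conn = sqlite3.connect(path_to_sqlite_file)` gives the connection to pass in. The plain `LIKE` scan is only a stand-in for the full-text search mentioned above; it is simple but slow on large tables.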
