  • SRA study metadata download

    Hi guys,

    I am trying to download the metadata for all the studies submitted to SRA, but I cannot find a complete list. Can anybody help me out with this?

    I want the metadata (mainly abstract and description, preferably in XML format) for all the studies/samples in SRA to date.

    Thanks in advance.

  • #2
    You might try

    ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab

    Comment


    • #3
      Thanks for the reply.

      I have checked it, but it doesn't contain the information I want, i.e. the study abstract and description. It only contains IDs, which I already have.

      I am trying a few other things; let's hope they work.

      Comment


      • #4
        Grab the full XML dump and parse it:
        ftp://ftp-trace.ncbi.nlm.nih.gov/sra...0111101.tar.gz
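A minimal sketch of the "parse it" step, assuming the tarball has already been downloaded and that each *.study.xml follows the SRA study schema, with STUDY elements holding a DESCRIPTOR that contains STUDY_TITLE, STUDY_ABSTRACT, and STUDY_DESCRIPTION. The element names and the function name iter_study_records are assumptions for illustration and should be checked against the schema.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_study_records(archive_path):
    """Yield (accession, title, abstract, description) for every STUDY element
    found in any *.study.xml member of a gzipped metadata tarball."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".study.xml")):
                continue
            try:
                root = ET.parse(archive.extractfile(member)).getroot()
            except ET.ParseError:
                continue  # skip a malformed file instead of aborting the run

            for study in root.iter("STUDY"):
                # Assumed layout: STUDY/DESCRIPTOR/{STUDY_TITLE, STUDY_ABSTRACT, ...}
                descriptor = study.find("DESCRIPTOR")

                def text(tag):
                    node = descriptor.find(tag) if descriptor is not None else None
                    return (node.text or "").strip() if node is not None else ""

                yield (
                    study.get("accession", ""),
                    text("STUDY_TITLE"),
                    text("STUDY_ABSTRACT"),
                    text("STUDY_DESCRIPTION"),
                )
```

Something like `for record in iter_study_records("metadata_dump.tar.gz"): print(record)` would then list the studies; the file name here is a placeholder for wherever the dump was saved.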

        Comment


        • #5
          @vadim... thanks a lot...

          This is what I was looking for. One issue is that not every SRA ID has a study.xml, but that's OK. I can live with that.

          Comment


          • #6
            What do you mean by SRA ID? Each SRA run should be associated with a study through an SRA experiment. The XML schema might be useful:

            or


            Also have a look here for a complete XML dump (including EBI SRA):

            Comment


            • #7
              Both links go to the same page.

              I understand SRA runs; their IDs start with SRR.

              For example, take this case:



              Out of these, only the SRA028059 folder in the SRA metadata has a *.study.xml.

              SRP = Study
              SRX = Experiment
              SRS = Sample
              SRR = Run

              But what is the SRA ID itself for? I am confused here.

              Comment


              • #8
                This is a reply I got from someone at SRA:

                The SRA number acts as a collector for the information. This means that when a center submits metadata or data they create a submission (SRAXXXXXX), but the data or metadata in the submission links to another submission.

                This is OK with me, but I don't understand why there are separate SRA IDs for the same study, even when one only has to submit more samples to that study at a later stage.

                Comment


                • #9
                  SRA* accessions are NCBI submission accessions; similarly, ERA* accessions are EBI submission accessions and DRA* are DDBJ submission accessions.
                  Sometimes a study is submitted before the run data. Since the metadata dumps are organized by submission accession, the run and study metadata then end up in separate folders. To get the proper association, use the livelists:
                  NCBI: ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
                  EBI: ftp://ftp.sra.ebi.ac.uk/meta/list/livelist.gz

                  As for the submissions you are asking about, it appears that the run was submitted in SRA028225, the experiment in SRA028192, and the study in SRA028059.
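A rough sketch of how that association could be read out of the accessions file, assuming it is tab-delimited with a header row containing columns named Accession, Submission, Experiment, and Study. These column names, and the function names load_accessions and trace_run, are assumptions; check them against the header of the file you downloaded. The whole file is loaded into memory here, which may not suit the full livelist.

```python
import csv


def load_accessions(path):
    """Index the rows of a tab-delimited accessions file by accession."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row["Accession"]: row for row in reader}


def trace_run(index, run_accession):
    """Return {object kind: (accession, submission)} for a run and its parents.

    Shows which submission folder holds the run, its experiment, and its
    study, since these can differ when objects were submitted separately.
    """
    run = index[run_accession]
    trail = {"run": (run_accession, run["Submission"])}
    for kind, column in (("experiment", "Experiment"), ("study", "Study")):
        parent = index.get(run[column])  # assumed: run rows name their parents
        if parent is not None:
            trail[kind] = (parent["Accession"], parent["Submission"])
    return trail
```

For a run like the one discussed above, the returned trail would name three different submission accessions, one each for the run, the experiment, and the study.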

                  Comment


                  • #10
                    Yeah... got it... That *.tab file is very useful indeed... thanks...

                    Comment


                    • #11
                      I guess I can conclude this:

                      "NCBI SRA doesn't have a single ID from which we can get everything related to it, i.e. study, run, experiment, and sample. The only way to go about it is to use SRA_Accessions.tab: using the study ID (*RP*), get the *RA*(s), then using the *RA*(s), get the *RR*(s). There is no direct way of getting *RR* from *RP*."

                      RA = Submission (SRA accession)
                      RR = Run
                      RP = Study
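The RP to RA to RR chain described above could be followed in one pass over SRA_Accessions.tab, sketched here under the assumption that the file has a header row with Accession, Submission, Type, and Study columns and that run rows are marked with Type "RUN". The column names, the "RUN" value, and the function name runs_by_submission are assumptions to verify against the real file.

```python
import csv


def runs_by_submission(path, study_accession):
    """Map each submission that touches a study to that study's runs in it.

    Submissions that only hold the study or experiment records appear with
    an empty list, so the study -> submission -> run chain stays visible.
    """
    grouped = {}
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Keep the study's own row plus every row that points at the study.
            if study_accession not in (row["Accession"], row["Study"]):
                continue
            runs = grouped.setdefault(row["Submission"], [])
            if row["Type"] == "RUN":
                runs.append(row["Accession"])
    return grouped
```

Filtering on the study rather than on the submission alone matters when one submission carries records for several studies, because otherwise runs from unrelated studies would be picked up.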

                      Comment


                      • #12
                        Maybe not relevant for this question but am finding the DNAnexus SRA interface much nicer than the NCBI's: http://sra.dnanexus.com/

                        Comment


                        • #13
                          Originally posted by nickloman
                          Maybe not relevant for this question but am finding the DNAnexus SRA interface much nicer than the NCBI's: http://sra.dnanexus.com/
                          @nickloman...

                          ok... will have a look at it as well...

                          Comment


                          • #14
                            Can anybody explain this to me?

                            Many SRA (accession) IDs having the same SRP (study) ID is OK, as explained in an answer above.
                            But many SRP (study) IDs having the same SRA (accession) ID?

                            Comment


                            • #15
                              You might take a look at:

                              The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Fulltext search in the package makes querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Besides the ftp protocol, SRAdb has functions supporting the fasp protocol (ascp from Aspera Connect) for faster downloading of large data files over long distances. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.


                              While this is an R/Bioconductor package, the underlying data are stored in a SQLite database that can be downloaded separately and used directly or from any language with a SQLite driver (most languages). What is done to create the database is to download all the SRA XML files containing metadata, parse those files, and then load them into a relational database. This makes bulk operations on the data easier and more flexible since SQL can be used. Some full-text searching capabilities are also included since SQLite supports that in later versions.
                              Last edited by sdavis; 12-02-2011, 10:21 AM.
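As an illustration of using the SQLite file directly, here is a small sketch that pulls the abstract and description for a list of studies. It assumes the database has a table named study with columns study_accession, study_abstract, and study_description; those names, and the function name fetch_study_text, are assumptions to confirm against the actual database (for example with `.schema` in the sqlite3 shell).

```python
import sqlite3
from contextlib import closing


def fetch_study_text(db_path, study_accessions):
    """Return (accession, abstract, description) rows for the given studies."""
    accessions = list(study_accessions)
    if not accessions:
        return []
    # One "?" placeholder per accession keeps the query parameterized.
    placeholders = ",".join("?" for _ in accessions)
    query = (
        "SELECT study_accession, study_abstract, study_description "
        "FROM study "  # assumed table and column names
        f"WHERE study_accession IN ({placeholders}) "
        "ORDER BY study_accession"
    )
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query, accessions).fetchall()
```

Because this is plain SQL, the same query could be run from any language with a SQLite driver, as noted above.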

                              Comment
