SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Downloading 'RunInfo Table' from SRA Run Selector (http://seqanswers.com/forums/showthread.php?t=87488)

roliwilhelm 02-02-2019 04:36 PM

Downloading 'RunInfo Table' from SRA Run Selector
 
Hello,

I would like to download the metadata for a given BioProject from the SRA. I can get exactly what I need by clicking the 'RunInfo Table' download button in the SRA Run Selector web interface (example). It should be relatively straightforward to do the same thing from the command line using "wget".

Clicking the 'RunInfo Table' button loads the following address, which is a stable link for downloading the information:

https://www.ncbi.nlm.nih.gov/Traces/...416ada018b1ea1

But I have no idea where the hash in that URL comes from. Can anyone help with that?

Alternatively, I've tried a series of efetch requests, but none of them gives me a '.tsv' (a '.csv' would also be fine) of the complete BioProject metadata.

This command provides only the information about sequencing:
wget -O PRJNA308986.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=PRJNA308986'

This command provides the full BioProject information I am after, but in XML format, which I haven't been able to parse:

wget -O PRJNA496337.xml 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=bioproject&term=PRJNA496337'

Thanks in advance,
Roli

vkkodali 02-04-2019 04:58 AM

In general, for downloading NCBI data from the Unix command line, I recommend using Entrez Direct.

Specifically, to download the runinfo table, you can use the following command:
Code:

esearch -db sra -query 'PRJNA308986' | efetch -format runinfo
This will produce a comma-separated table with the following fields (listed here with their position in the header and the values from the first record):
Code:

                  Run [  1]: SRR3108728
          ReleaseDate [  2]: 2017-02-16 00:00:00
             LoadDate [  3]: 2016-01-21 03:15:18
                spots [  4]: 98100
                bases [  5]: 49246200
     spots_with_mates [  6]: 98100
            avgLength [  7]: 502
              size_MB [  8]: 28
         AssemblyName [  9]:
        download_path [ 10]: https://sra-download.ncbi.nlm.nih.gov/traces/sra37/SRR/003035/SRR3108728
           Experiment [ 11]: SRX1537041
          LibraryName [ 12]: mdbk110
      LibraryStrategy [ 13]: AMPLICON
     LibrarySelection [ 14]: PCR
        LibrarySource [ 15]: METAGENOMIC
        LibraryLayout [ 16]: PAIRED
           InsertSize [ 17]: 0
            InsertDev [ 18]: 0
             Platform [ 19]: ILLUMINA
                Model [ 20]: Illumina MiSeq
             SRAStudy [ 21]: SRP068618
           BioProject [ 22]: PRJNA308986
      Study_Pubmed_id [ 23]:
            ProjectID [ 24]: 308986
               Sample [ 25]: SRS1253892
            BioSample [ 26]: SAMN04419133
           SampleType [ 27]: simple
                TaxID [ 28]: 410658
       ScientificName [ 29]: soil metagenome
           SampleName [ 30]: mdbk110
         g1k_pop_code [ 31]:
               source [ 32]:
   g1k_analysis_group [ 33]:
           Subject_ID [ 34]:
                  Sex [ 35]:
              Disease [ 36]:
                Tumor [ 37]: no
     Affection_Status [ 38]:
         Analyte_Type [ 39]:
    Histological_Type [ 40]:
            Body_Site [ 41]:
           CenterName [ 42]: UNIVERSITY OF MINNESOTA
           Submission [ 43]: SRA336468
dbgap_study_accession [ 44]:
              Consent [ 45]: public
              RunHash [ 46]: 4B63AAF2295927A2EAEB798FCF9FC7DA
             ReadHash [ 47]: FB1226CB8B5FEBC85B053718D4C1BBFA
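
If only a few of these fields are needed, and a tab-separated file is preferred, something like the following awk sketch might do. The function name runinfo_select is made up for illustration. The plain comma split assumes that no field contains an embedded comma or quoted text, which may not hold for every project, so treat it as a starting point rather than a robust CSV parser.

```shell
# Hypothetical helper: read a runinfo CSV on stdin and print the columns
# named in the arguments, tab-separated, with a header line.
# Assumption: fields never contain embedded commas or quotes.
runinfo_select() {
  awk -F',' -v want="$*" '
    BEGIN { OFS = "\t"; n = split(want, names, " ") }
    # Remember the position of every column named in the header line.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i }
    # Skip blank lines; print the requested columns for everything else.
    NF > 0 {
      line = ""
      for (j = 1; j <= n; j++) {
        if (NR == 1)               value = names[j]
        else if (names[j] in col)  value = $(col[names[j]])
        else                       value = ""   # unknown column name
        line = line (j > 1 ? OFS : "") value
      }
      print line
    }'
}

# Example (needs Entrez Direct and network access, so it is left commented out):
# esearch -db sra -query 'PRJNA308986' | efetch -format runinfo |
#   runinfo_select Run BioSample LibraryLayout > PRJNA308986.tsv
```

Selecting columns by header name rather than by position keeps the command working if the order of the fields ever changes.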

You can download the same table in XML format by making a small change as follows:
Code:

esearch -db sra -query 'PRJNA308986' | efetch -format runinfo -mode xml
You can then parse this XML using the command "xtract" that comes with the Entrez Direct tools to extract only specific columns of interest to you.
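
As a rough sketch of that last step, the pipeline can be wrapped in a small function. The name runinfo_xml_columns is hypothetical, and the sketch assumes that each record in the runinfo XML sits in a <Row> element whose child elements are named after the fields listed above; it is worth checking that against the actual XML before relying on it.

```shell
# Hypothetical wrapper: fetch the runinfo XML for a BioProject and print the
# requested fields, one record per line (xtract separates them with tabs).
# Requires Entrez Direct (esearch, efetch, xtract) on PATH and network access.
# Assumption: every record is wrapped in a <Row> element.
runinfo_xml_columns() {
  project="$1"   # BioProject accession, e.g. PRJNA308986
  shift          # the remaining arguments are the field names to extract
  esearch -db sra -query "$project" |
    efetch -format runinfo -mode xml |
    xtract -pattern Row -element "$@"
}

# Example (left commented out because it contacts NCBI):
# runinfo_xml_columns PRJNA308986 Run BioSample LibraryLayout > PRJNA308986.tsv
```

Because xtract addresses fields by element name, this route avoids the comma-splitting problems that can come up when the CSV output is cut apart by hand.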

