SEQanswers

02-02-2019, 04:36 PM   #1
roliwilhelm (Member; Ithaca, NY)
Downloading 'RunInfo Table' from SRA Run Selector

Hello,

I would like to download the metadata for a given BioProject from the SRA. I can get exactly what I need by clicking the 'RunInfo Table' download button in the SRA Run Selector web interface. It should be relatively straightforward to perform this action from the command line using "wget".

Clicking the 'RunInfo Table' button loads the following address, which is a stable link for downloading the information:

https://www.ncbi.nlm.nih.gov/Traces/...416ada018b1ea1

BUT, I have no idea where that hash information is coming from. Can anyone help there?

Alternatively, I've tried a series of efetch requests, but none of them gives me a '.tsv' (a '.csv' would also be fine) of the complete BioProject metadata.

This command provides only the information about sequencing:
wget -O PRJNA308986.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=PRJNA308986'

This command provides the full BioProject information I am after, but in XML format, which I haven't been able to parse:

wget -O PRJNA496337.xml 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=bioproject&term=PRJNA496337'

Thanks in advance,
Roli
02-04-2019, 04:58 AM   #2
vkkodali (Junior Member; MD, USA)

In general, for downloading NCBI data from the Unix command line, I recommend using Entrez Direct.

Specifically, to download the runinfo table, you can use the following command:
Code:
esearch -db sra -query 'PRJNA308986' | efetch -format runinfo
This will produce a comma-separated table. Its fields are listed below with their column index and the values from one record:
Code:
                  Run [  1]: SRR3108728
          ReleaseDate [  2]: 2017-02-16 00:00:00
             LoadDate [  3]: 2016-01-21 03:15:18
                spots [  4]: 98100
                bases [  5]: 49246200
     spots_with_mates [  6]: 98100
            avgLength [  7]: 502
              size_MB [  8]: 28
         AssemblyName [  9]: 
        download_path [ 10]: https://sra-download.ncbi.nlm.nih.gov/traces/sra37/SRR/003035/SRR3108728
           Experiment [ 11]: SRX1537041
          LibraryName [ 12]: mdbk110
      LibraryStrategy [ 13]: AMPLICON
     LibrarySelection [ 14]: PCR
        LibrarySource [ 15]: METAGENOMIC
        LibraryLayout [ 16]: PAIRED
           InsertSize [ 17]: 0
            InsertDev [ 18]: 0
             Platform [ 19]: ILLUMINA
                Model [ 20]: Illumina MiSeq
             SRAStudy [ 21]: SRP068618
           BioProject [ 22]: PRJNA308986
      Study_Pubmed_id [ 23]: 
            ProjectID [ 24]: 308986
               Sample [ 25]: SRS1253892
            BioSample [ 26]: SAMN04419133
           SampleType [ 27]: simple
                TaxID [ 28]: 410658
       ScientificName [ 29]: soil metagenome
           SampleName [ 30]: mdbk110
         g1k_pop_code [ 31]: 
               source [ 32]: 
   g1k_analysis_group [ 33]: 
           Subject_ID [ 34]: 
                  Sex [ 35]: 
              Disease [ 36]: 
                Tumor [ 37]: no
     Affection_Status [ 38]: 
         Analyte_Type [ 39]: 
    Histological_Type [ 40]: 
            Body_Site [ 41]: 
           CenterName [ 42]: UNIVERSITY OF MINNESOTA
           Submission [ 43]: SRA336468
dbgap_study_accession [ 44]: 
              Consent [ 45]: public
              RunHash [ 46]: 4B63AAF2295927A2EAEB798FCF9FC7DA
             ReadHash [ 47]: FB1226CB8B5FEBC85B053718D4C1BBFA
You can download the same table in XML format by making a small change as follows:
Code:
esearch -db sra -query 'PRJNA308986' | efetch -format runinfo -mode xml
You can then parse this XML with the "xtract" command, which comes with the Entrez Direct tools, to extract only the columns of interest to you.
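
If the comma-separated output is enough, specific columns can also be selected by header name with standard tools. The following is only a rough sketch: `runinfo_columns` is a hypothetical helper name, the sample is an abbreviated stand-in for real output (seven of the 47 columns, with values copied from the listing above), and it assumes that no field contains an embedded comma, which may not hold for every SRA record.

```shell
#!/bin/sh
# Hypothetical helper: print selected columns of a runinfo CSV as TSV.
# $1 is a comma-separated list of header names; the CSV is read from stdin.
# Assumption: fields never contain embedded commas or quotes.
runinfo_columns() {
    awk -F',' -v want="$1" '
        NF == 0 { next }                         # skip blank lines
        !seen_header {
            seen_header = 1
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) pos[$i] = i    # header name -> column index
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
            }
            is_header = 1
        }
        {
            line = ""
            for (j = 1; j <= n; j++) {
                field = is_header ? names[j] : $(pos[names[j]])
                line = line (j > 1 ? "\t" : "") field
            }
            print line
            is_header = 0
        }
    '
}

# Abbreviated sample: a subset of the runinfo columns, values from the listing above.
sample='Run,spots,bases,Experiment,LibraryLayout,BioProject,BioSample
SRR3108728,98100,49246200,SRX1537041,PAIRED,PRJNA308986,SAMN04419133'

result=$(printf '%s\n' "$sample" | runinfo_columns 'Run,BioSample,spots')
printf '%s\n' "$result"
```

With Entrez Direct installed, the same function could presumably be fed straight from the pipeline, for example `esearch -db sra -query 'PRJNA308986' | efetch -format runinfo | runinfo_columns 'Run,BioSample,spots'`. Selecting by header name rather than by column number keeps the command working if the column order ever changes.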

Tags
command line, csv, runtable info, sra

All times are GMT -8.