SEQanswers

Old 11-15-2011, 10:37 PM   #1
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
SRA study metadata download

Hi guys,

I am trying to download the metadata for all the studies submitted to SRA, but I cannot find a complete list. Can anybody help me out with this?

I want the metadata (mainly the abstract and description), preferably in XML format, for all the studies/samples in SRA to date.

Thanks in advance.
Old 11-16-2011, 10:12 AM   #2
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

You might try

ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
Old 11-16-2011, 08:26 PM   #3
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Thanks for the reply.

I have checked it, but it doesn't contain the information I want, i.e. the study abstract and description. It only contains IDs, which I already have.

I am trying a few other things; let's hope they work.
Old 11-17-2011, 01:36 AM   #4
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37
Default

Grab the full XML dump and parse it:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra...0111101.tar.gz
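
For anyone wanting to do the parsing step, here is a minimal Python sketch. It assumes the archive has already been unpacked to a local directory, that study metadata lives in files named *.study.xml, and that each file contains STUDY elements with an accession attribute and a DESCRIPTOR child holding STUDY_TITLE, STUDY_ABSTRACT and STUDY_DESCRIPTION. Those element names are my reading of the SRA 1.3 schema, so check them against the XSD before relying on this. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _child_text(parent, tag):
    """Return the stripped text of parent/tag, or '' if either is missing."""
    node = parent.find(tag) if parent is not None else None
    return (node.text or "").strip() if node is not None else ""


def study_records(root):
    """Yield (accession, title, abstract, description) per STUDY element.

    Element names are assumed from the SRA 1.3 schema.
    """
    for study in root.iter("STUDY"):
        descriptor = study.find("DESCRIPTOR")
        yield (
            study.get("accession", ""),
            _child_text(descriptor, "STUDY_TITLE"),
            _child_text(descriptor, "STUDY_ABSTRACT"),
            _child_text(descriptor, "STUDY_DESCRIPTION"),
        )


def walk_dump(dump_dir):
    """Scan an unpacked dump for *.study.xml files and yield their records."""
    for path in sorted(Path(dump_dir).rglob("*.study.xml")):
        yield from study_records(ET.parse(str(path)).getroot())
```

Calling walk_dump on the unpacked directory gives one tuple per study; submissions without a *.study.xml simply contribute nothing.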
Old 11-17-2011, 01:40 AM   #5
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

@vadim... thanks a lot.

This is what I was looking for. One issue remains: not every SRA ID has a study.xml, but that's OK. I can live with that.
Old 11-17-2011, 02:11 AM   #6
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37
Default

What do you mean by SRA ID? Each SRA run should be associated with a study through an SRA experiment. The XML schema might be useful:
http://www.ncbi.nlm.nih.gov/viewvc/v...a/doc/SRA_1-3/
or
http://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_3/

Also have a look here for a complete XML dump (including EBI SRA):
http://ftp.sra.ebi.ac.uk/meta/xml/xml.all.tar.gz
Old 11-17-2011, 02:36 AM   #7
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Both links lead to the same page.

I understand SRA runs; they have IDs starting with SRR.

For example, take this case:
http://www.ncbi.nlm.nih.gov/sra?term=SRA028225
http://www.ncbi.nlm.nih.gov/sra?term=SRA028192
http://www.ncbi.nlm.nih.gov/sra?term=SRA028059
Of these, only the SRA028059 folder in the SRA metadata has a *.study.xml.

SRP = Study
SRX = Experiment
SRS = Sample
SRR = Run

But what is an SRA ID actually for? I am confused here.
Old 11-17-2011, 02:55 AM   #8
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

This is a reply I got from someone at SRA:

The SRA number acts as a collector for the information. This means that when a center submits metadata or data they create a submission (SRAXXXXXX), but the data or metadata in the submission links to another submission.

That is fine with me, but I don't understand why the same study gets separate SRA IDs, even when someone has to submit more samples to that study at a later stage.
Old 11-17-2011, 03:14 AM   #9
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37
Default

SRA* accessions are NCBI submission accessions; similarly, ERA* accessions are EBI submission accessions and DRA* are DDBJ submission accessions.
Sometimes a study is submitted before the run data. Since the metadata dumps are organized by submission accession, in such cases the run and study metadata end up in separate folders. To get the proper association, use the livelists:
NCBI: ftp://ftp-trace.ncbi.nlm.nih.gov/sra...Accessions.tab
EBI: ftp://ftp.sra.ebi.ac.uk/meta/list/livelist.gz

As for the submissions you are asking about, it appears that the run was submitted in SRA028225, the experiment in SRA028192 and the study in SRA028059.
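
A rough Python sketch of how the NCBI file could be used to recover the run-to-study association across submissions. It assumes the file is tab-separated with a header row containing the columns Accession, Submission, Type and Study, and that run rows carry Type "RUN"; those column names are an assumption on my part, so check the header of your copy first. The function name is made up for illustration.

```python
import csv
from collections import defaultdict


def runs_by_study(lines):
    """Map study accession -> list of (run accession, submission accession).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Column names (Accession, Submission, Type, Study) are assumed.
    """
    # QUOTE_NONE so stray quote characters in free-text columns are kept as-is.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    result = defaultdict(list)
    for row in reader:
        if row.get("Type") == "RUN":
            result[row.get("Study", "")].append(
                (row["Accession"], row["Submission"])
            )
    return dict(result)
```

With a local copy this would be called as `with open("SRA_Accessions.tab", newline="") as fh: mapping = runs_by_study(fh)`, and each study then lists its runs together with the submission each run arrived in.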
Old 11-18-2011, 12:06 AM   #10
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Yeah, got it. That *.tab file is very useful indeed. Thanks.
Old 11-18-2011, 02:34 AM   #11
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

I guess I can conclude this:

NCBI SRA doesn't have a single ID that retrieves everything related to it, i.e. study, run, experiment and sample. The only way to go about it is to use SRA_Accessions.tab: use the study ID (*RP*) to get the *RA*(s), then use the *RA*(s) to get the *RR*(s). There is no direct way of getting *RR* from *RP*.

RA = SRA Accessions
RR = Run
RP = Study
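
To keep the prefixes straight when working through the accession lists, a small Python helper like the one below may be handy. It is only a sketch: the first letter is mapped to the archive (S = NCBI, E = EBI, D = DDBJ) and the third letter to the record type as listed in this thread, and it assumes the type letters mean the same thing for the E* and D* prefixes. The function and table names are made up for illustration.

```python
import re

# First letter: which archive issued the accession.
ARCHIVES = {"S": "NCBI", "E": "EBI", "D": "DDBJ"}
# Third letter: what kind of record the accession refers to.
KINDS = {
    "A": "submission",
    "P": "study",
    "X": "experiment",
    "S": "sample",
    "R": "run",
}
_PATTERN = re.compile(r"^([SED])R([APXSR])(\d+)$")


def classify(accession):
    """Return (archive, kind) for an accession, or None if it does not match."""
    match = _PATTERN.match(accession.strip().upper())
    if match is None:
        return None
    return ARCHIVES[match.group(1)], KINDS[match.group(2)]
```

For example, classify("SRA028059") gives ("NCBI", "submission"), which is a quick way to tell a submission accession apart from a study accession.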
Old 11-18-2011, 02:52 AM   #12
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

Maybe not relevant for this question but am finding the DNAnexus SRA interface much nicer than the NCBI's: http://sra.dnanexus.com/
Old 11-18-2011, 02:55 AM   #13
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Quote:
Originally Posted by nickloman View Post
Maybe not relevant for this question but am finding the DNAnexus SRA interface much nicer than the NCBI's: http://sra.dnanexus.com/
@nickloman...

ok... will have a look at it as well...
Old 11-18-2011, 10:47 PM   #14
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Can anybody explain this to me?

Many SRA (accession) IDs having the same SRP (study) ID is OK, as explained by an answer above.
But how can many SRP (study) IDs have the same SRA (accession) ID?
Old 12-02-2011, 09:17 AM   #15
sdavis
Member
 
Location: Maryland

Join Date: Jan 2010
Posts: 14
Default

You might take a look at:

http://www.bioconductor.org/packages...tml/SRAdb.html

While this is an R/Bioconductor package, the underlying data are stored in a SQLite database that can be downloaded separately and used directly, or from any language with a SQLite driver (most languages). The database is created by downloading all the SRA XML files containing metadata, parsing them, and loading the results into a relational database. This makes bulk operations on the data easier and more flexible, since SQL can be used. Some full-text searching capabilities are also included, since SQLite supports that in later versions.
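
As an illustration of using the database outside R, here is a minimal Python sketch with the standard sqlite3 module. It assumes a table named study with the columns study_accession, study_title, study_abstract and study_description; those names are an assumption, so inspect the schema of the downloaded file before relying on them. The function name and the file name in the usage note are also just examples.

```python
import sqlite3


def study_text(conn, study_accession):
    """Return (title, abstract, description) for one study, or None.

    Table and column names are assumed; verify them against the
    actual database schema.
    """
    cursor = conn.execute(
        "SELECT study_title, study_abstract, study_description "
        "FROM study WHERE study_accession = ?",
        (study_accession,),
    )
    return cursor.fetchone()
```

Usage would look like `conn = sqlite3.connect("SRAmetadb.sqlite")` followed by `study_text(conn, "SRP000001")`, where both the file name and the accession are placeholders.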

Last edited by sdavis; 12-02-2011 at 09:21 AM.
Old 12-05-2011, 12:53 AM   #16
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37
Default

Quote:
Originally Posted by ersgupta View Post
Can anybody explain this to me?

Many SRA (accession) IDs having the same SRP (study) ID is OK, as explained by an answer above.
But how can many SRP (study) IDs have the same SRA (accession) ID?
Could you please provide an example?