SEQanswers

01-29-2010, 11:29 AM #1
bair
Member
 
Location: London

Join Date: Jan 2010
Posts: 65
NCBI SRA database

Hello,

Is anyone familiar with the NCBI SRA database? http://www.ncbi.nlm.nih.gov/sites/entrez

Searching SRA for SRP000607 (a Korean genome study), I got 5 experiments.

What is the relationship between experiments, runs, and spots?

These 5 experiments were sampled from the same person and are all supposed to have paired reads, but SRX002757 does not have paired data.

Under SRX002761, the read files look strange to me:

06/11/2009 12:00AM 239 SRR016027.fastq.gz
06/11/2009 12:00AM 788,821,597 SRR016027_1.fastq.gz
06/11/2009 12:00AM 797,621,364 SRR016027_2.fastq.gz
06/11/2009 12:00AM 22,470 SRR016028.fastq.gz
06/11/2009 12:00AM 809,891,610 SRR016028_1.fastq.gz
06/11/2009 12:00AM 810,659,524 SRR016028_2.fastq.gz

SRR016027_1.fastq.gz mates with SRR016027_2.fastq.gz, but what about SRR016027.fastq.gz?

I want to play with these datasets. Can I just use all the paired files in these 5 experiments and ignore the unpaired files like SRR016027.fastq.gz?

There are lots of experts here, so any help will be appreciated!
01-29-2010, 08:03 PM #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

This link may be helpful to you (it really should be featured more prominently on the SRA):
http://www.ncbi.nlm.nih.gov/bookshel...cbi&part=Aug09

Excerpt below (I've added line breaks for clarity). One might think that in your case each Experiment had different instrument parameters or library characteristics, and that this would be documented somewhere, but as far as I can tell these were all 80x1 runs. Weird.
An Experiment describes specifically what was sequenced and the method used. It includes information about the source of the DNA, the Sample, the sequencing platform, and the processing of the data.

Each Experiment is made up of one or more instrument Runs.

A Run contains the results or reads from each spot in the instrument run.

In the future, some data will also have an associated Analysis. These Analyses may include assemblies of the short reads into genomic or transcript contigs and alignment to existing genomes or alignments with SRA data.

Records at each level have unique accession identifiers with a specific three letter prefix that indicates the type of record: ERP or SRP for Studies, SRS for samples, SRX for Experiments, and SRR for Runs.
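As a small illustration of the prefixes in that excerpt, here is a minimal Python sketch that maps an accession to its record type. Only the prefixes named in the excerpt are covered; the function `record_type` and its lookup table are illustrative, not an NCBI API, and other ENA/DDBJ prefixes (e.g. ERR or DRR) deliberately fall through to `None`.

```python
import re
from typing import Optional

# Prefix table copied from the NCBI excerpt above; anything else
# (e.g. ERX/ERR or DRR accessions) is treated as unknown.
PREFIX_TO_TYPE = {
    "SRP": "Study",
    "ERP": "Study",
    "SRS": "Sample",
    "SRX": "Experiment",
    "SRR": "Run",
}

# Three-letter prefix followed by digits, e.g. SRX002761.
ACCESSION_RE = re.compile(r"([A-Z]{3})(\d+)")


def record_type(accession: str) -> Optional[str]:
    """Return the SRA record type for an accession, or None if unrecognised."""
    match = ACCESSION_RE.fullmatch(accession.strip().upper())
    if match is None:
        return None  # e.g. a file stem such as "SRR016027_1"
    return PREFIX_TO_TYPE.get(match.group(1))


for acc in ["SRP000607", "SRX002761", "SRR016027", "SRR016027_1"]:
    print(acc, record_type(acc))
```

Applied to this thread: SRP000607 is the Study, the five SRX accessions are its Experiments, and each SRR accession (e.g. SRR016027) is one Run holding the reads from each spot of that instrument run.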
01-31-2010, 09:49 AM #3
bair
Member
 
Location: London

Join Date: Jan 2010
Posts: 65

Thank you, krobison

That information is quite helpful.
01-19-2012, 06:13 PM #4
dvanic
Member
 
Location: Sydney, Australia

Join Date: Jan 2012
Posts: 61

Quote:
SRR016027_1.fastq.gz mates with SRR016027_2.fastq.gz, but what about SRR016027.fastq.gz?
Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the much smaller, but still "properly" formatted and reasonably sized (~25 Mb in my case), SRR123456.fastq file?
Thanks in advance!
01-20-2012, 03:01 AM #5
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37

Quote:
Originally Posted by dvanic
Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the much smaller, but still "properly" formatted and reasonably sized (~25 Mb in my case), SRR123456.fastq file?
Thanks in advance!
I believe SRR123456.fastq contains the "leftovers": reads whose mates are missing (due to filtering, etc.).
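This matches how the SRA Toolkit's `fastq-dump --split-3` option lays out files: spots with both mates usable go to `_1`/`_2`, and spots with only one usable read go to the unsuffixed file. Below is a minimal Python sketch for checking that on your own files, assuming fastq-dump-style headers whose first token (e.g. `@SRR016027.1`, optionally ending in `/1` or `/2`) names the spot; the helper names are hypothetical. If the "leftovers" explanation holds, the `_1` and `_2` spot sets should be identical and the unsuffixed file should share no spots with them.

```python
import gzip
from typing import IO, Iterable, Iterator, Set, Tuple


def open_fastq(path: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip()
            next(it)  # '+' separator line
            qual = next(it).rstrip()
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def spot_name(header: str) -> str:
    """'@SRR016027.1 HWI-... length=80' -> 'SRR016027.1' (drops '/1' or '/2')."""
    token = header.lstrip("@").split()[0]
    return token[:-2] if token.endswith(("/1", "/2")) else token


def spot_names(lines: Iterable[str]) -> Set[str]:
    """Collect every spot name in a FASTQ stream (held in memory)."""
    return {spot_name(header) for header, _, _ in parse_fastq(lines)}


def check_split(prefix: str) -> None:
    """Compare the spot sets of <prefix>_1, <prefix>_2 and <prefix> .fastq.gz."""
    with open_fastq(f"{prefix}_1.fastq.gz") as f1, \
         open_fastq(f"{prefix}_2.fastq.gz") as f2, \
         open_fastq(f"{prefix}.fastq.gz") as f0:
        mate1, mate2, single = spot_names(f1), spot_names(f2), spot_names(f0)
    print("mates agree:", mate1 == mate2)
    print("singles overlapping pairs:", len(single & (mate1 | mate2)))
```

Usage would be `check_split("SRR016027")` in the download directory. Holding every spot name in a set is simple but memory-hungry for files of ~800 MB compressed.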
10-27-2015, 03:07 PM #6
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, ...)? Thanks in advance!
10-27-2015, 03:11 PM #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Quote:
Originally Posted by VC87
Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, ...)? Thanks in advance!
Not sure exactly what you are looking for, but have you tried the advanced search? http://www.ncbi.nlm.nih.gov/sra/advanced
10-27-2015, 04:19 PM #8
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

Yes, I have. I want to search all SRA files from bisulfite-seq libraries while fixing certain features such as organism, tissue, age, sex, etc. Thanks anyway for your reply!
10-27-2015, 04:39 PM #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

A search found this: http://sra.dbcls.jp/search

Project here: https://github.com/inutano/soylatte

R-solution: https://www.bioconductor.org/package...tml/SRAdb.html
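If you prefer scripting to the web form, NCBI's E-utilities `esearch` endpoint can query `db=sra` directly. The sketch below only builds the query and fetches matching SRA UIDs; the `[Strategy]` and `[Organism]` field tags and the value "bisulfite seq" are assumptions to confirm in the advanced search builder above. Attributes such as tissue, age or sex usually live in the sample metadata and may not be searchable as SRA fields, so you would likely have to filter on them afterwards.

```python
import json
import urllib.parse
import urllib.request
from typing import List

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(strategy: str, organism: str) -> str:
    """Combine two field-restricted terms; the field tags are assumptions."""
    return f'"{strategy}"[Strategy] AND "{organism}"[Organism]'


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an esearch URL against the SRA database that returns JSON."""
    params = {"db": "sra", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def search_sra(term: str, retmax: int = 100) -> List[str]:
    """Return SRA UIDs (numeric Entrez IDs, not SRX/SRR accessions)."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    query = build_sra_query("bisulfite seq", "homo sapiens")
    print(esearch_url(query, retmax=20))  # inspect before calling search_sra
```

The returned UIDs are Entrez's internal numbers, so a follow-up `esummary` or `efetch` call would still be needed to get the SRX/SRR accessions.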
10-28-2015, 03:42 AM #10
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

GenoMax, thanks for your reply! I'll check that out.
11-02-2015, 01:12 PM #11
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

Does anyone know how to get the raw SRA files associated with the samples that can be searched in the browser of NCBI's Epigenomics database? I suppose it should be possible to get them from the sample ID, but I don't know how...
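One possible sample-to-run step, sketched under assumptions: download a RunInfo-style CSV for the relevant SRA records (for example via `efetch` with `rettype=runinfo`) and group its `Run` column by its `Sample` column. The column names are assumed defaults you can override, the function `runs_by_sample` is hypothetical, and an Epigenomics sample ID would first need to be mapped to its SRA sample accession, which this sketch does not do.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def runs_by_sample(runinfo_csv: str, sample_col: str = "Sample",
                   run_col: str = "Run") -> Dict[str, List[str]]:
    """Group run accessions by sample accession from a RunInfo-style CSV.

    The default column names are an assumption; adjust them to match the
    header of the table you actually downloaded.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    # DictReader skips completely blank lines on its own.
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        run = (row.get(run_col) or "").strip()
        sample = (row.get(sample_col) or "").strip()
        # Skip incomplete rows and any header line repeated inside the file.
        if not run or not sample or run == run_col:
            continue
        grouped[sample].append(run)
    return dict(grouped)
```

Each run accession in the result can then be handed to the SRA Toolkit for download.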
11-02-2015, 02:07 PM #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Do you want the SRA files or the fastq files?
11-02-2015, 02:15 PM #13
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

SRA, for now
11-02-2015, 02:21 PM #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

SRAtoolkit makes it easy to download the actual fastq data since you would have to uncompress the SRA files locally anyway. The toolkit saves you a step. You are most likely going to use the "fastq-dump" program. Help here: http://www.ncbi.nlm.nih.gov/Traces/s...ew=toolkit_doc
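A minimal Python sketch for scripting that download over a list of runs, assuming the SRA Toolkit's `fastq-dump` is installed and on your PATH. It uses `--split-3` (per the toolkit help, mate pairs go to `_1`/`_2` files and single-mate spots to an unsuffixed file, the same three-file layout discussed earlier in this thread), `--gzip`, and `--outdir`; the wrapper function names are illustrative.

```python
import shutil
import subprocess
from typing import Iterable, List


def fastq_dump_command(accession: str, outdir: str = ".",
                       compress: bool = True) -> List[str]:
    """Build a fastq-dump command line for one run accession."""
    cmd = ["fastq-dump", "--split-3", "--outdir", outdir]
    if compress:
        cmd.append("--gzip")  # write .fastq.gz instead of plain .fastq
    cmd.append(accession)
    return cmd


def dump_runs(accessions: Iterable[str], outdir: str = ".") -> None:
    """Run fastq-dump for each accession, stopping at the first failure."""
    if shutil.which("fastq-dump") is None:
        raise FileNotFoundError("fastq-dump not found on PATH; install the SRA Toolkit")
    for acc in accessions:
        subprocess.run(fastq_dump_command(acc, outdir), check=True)


if __name__ == "__main__":
    # Prints: fastq-dump --split-3 --outdir reads --gzip SRR016027
    print(" ".join(fastq_dump_command("SRR016027", outdir="reads")))
```

Building the command as a list (rather than a shell string) avoids quoting problems with unusual output paths.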
11-03-2015, 01:38 PM #15
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

Thanks again. By the way, do you know if it is possible to convert wig to fasta (or SRA)?
11-03-2015, 02:06 PM #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Quote:
Originally Posted by VC87
Thanks again. By the way, do you know if it is possible to convert wig to fasta (or SRA)?
https://www.biostars.org/p/48165/
http://seqanswers.com/forums/showthread.php?t=21347

The SRA format is only for NCBI's internal use.

All times are GMT -8.