SEQanswers

03-29-2012, 07:56 PM   #1
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98
problem understanding NCBI SRA fastq files

I downloaded some SRA files from NCBI's Sequence Read Archive (SRA) and converted them to FASTQ format. The experiment is described as paired-end, so I expected to get two FASTQ files from each SRA file. Instead, I got only one FASTQ file from each. I then tried to work out which reads were read 1 and which were read 2, but I couldn't see anything in the file that marks a read as one or the other. Here are some lines from one of the files:


@SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
NACAAAGGTAATTGCAAGTCCCTTCGTGCCAAAACGTCCAGCCCTTCCAACCCTGTGCAAATAAGTATCAGCTGAGTCTGAATCTGCATTCATTCTGGAATGACTCAGGAAGAAAGGCTAACAAGATATAAGAACTTCAAGGAAGGCCACAAGAGAATTC
+SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
#)0+)**2,,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:3:::@@@22:<<:8@@:@@@@@@@IIHIIIIIIII?HHIIFIGIIIIIIEGIGHIIIIFAIBDIHHGEHDBEFIIB<IIHHI3EFEDFC@HH@F@2;8<>@0??

Each record has four lines: a header starting with @, the sequence, a line that repeats the header but starts with + instead of @, and the base quality scores.
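
For anyone who wants to check that four-line layout programmatically, here is a minimal Python sketch. It assumes every record occupies exactly four lines, as in the excerpt above (FASTQ in general allows wrapped sequence lines, which this does not handle). The names `FastqRecord` and `read_fastq` are made up for illustration and are not part of any toolkit.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # header text without the leading "@"
    sequence: str  # base calls
    quality: str   # one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines, assuming four lines per record."""
    stripped = iter(line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            sequence = next(stripped)
            separator = next(stripped)
            quality = next(stripped)
        except StopIteration:
            raise ValueError(f"truncated record after {header!r}") from None
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)
```

Because an open text file is an iterable of lines, `read_fastq(open("SRR254172.fastq"))` should work on a file like the one quoted above (the file name here is assumed from the record header).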

Does anyone understand this format and how I can get fastq files for both read1 and read2?

Thank you.

Eric
03-30-2012, 12:49 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

The basics of FASTQ are described here http://nar.oxfordjournals.org/conten...r.gkp1137.full and http://en.wikipedia.org/wiki/FASTQ_format

How did you do the conversion? I recall there are extra switches needed at the command line for paired end data...

Last edited by maubp; 03-30-2012 at 01:23 AM.
03-30-2012, 07:46 AM   #3
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98

Quote:
Originally Posted by maubp
How did you do the conversion? I recall there are extra switches needed at the command line for paired end data...

Hi Maubp,

I'll bet that that's where I'm making a mistake. I did the conversion in two ways, neither of which gave me the paired end reads I wanted:

fastq-dump *.sra

fastq-dump.2 *.sra
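
A note on what these two commands produce: run without any split option, fastq-dump writes each spot as a single record with its reads joined end to end, which is consistent with the length=160 in the header above. The Python sketch below cuts such a record back into two mates. It assumes you already know the length of read 1; an even 80/80 split for this run is only a guess and is not confirmed anywhere in this thread. `split_spot` is a hypothetical helper name.

```python
from typing import Tuple

Read = Tuple[str, str]  # (sequence, quality)


def split_spot(sequence: str, quality: str, read1_length: int) -> Tuple[Read, Read]:
    """Cut one concatenated spot into two mates at read1_length."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if not 0 < read1_length < len(sequence):
        raise ValueError("read1_length must fall strictly inside the spot")
    # Slice bases and qualities at the same position so they stay aligned.
    read1 = (sequence[:read1_length], quality[:read1_length])
    read2 = (sequence[read1_length:], quality[read1_length:])
    return read1, read2
```

In practice it is probably safer to let fastq-dump do the splitting, since the archive itself records the actual read lengths.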

Eric
03-30-2012, 08:15 AM   #4
jrm5100
Junior Member
 
Location: PA

Join Date: Mar 2012
Posts: 1

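You need to use the --split-3 option, which writes read 1 and read 2 to separate files (*_1.fastq and *_2.fastq) and puts any reads without a mate in a third file.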

fastq-dump --split-3 *.sra
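
If you would rather drive this from a script than rely on the shell glob, the following Python sketch runs the same command on each file and lists the output names you can expect. It assumes fastq-dump is on your PATH and that the .sra files sit in one directory; `--outdir` is the toolkit's output-directory option, while `split3_command`, `expected_outputs`, and `convert_all` are illustrative names only.

```python
import subprocess
from pathlib import Path
from typing import List


def split3_command(sra_file: Path, out_dir: Path) -> List[str]:
    """Build the fastq-dump call for one .sra file."""
    return ["fastq-dump", "--split-3", "--outdir", str(out_dir), str(sra_file)]


def expected_outputs(sra_file: Path, out_dir: Path) -> List[Path]:
    """Files --split-3 may write for a paired-end run.

    The _1 and _2 files hold the mates; the unsuffixed file only
    appears when some reads have no partner.
    """
    stem = sra_file.stem
    return [
        out_dir / f"{stem}_1.fastq",
        out_dir / f"{stem}_2.fastq",
        out_dir / f"{stem}.fastq",
    ]


def convert_all(sra_dir: Path, out_dir: Path) -> None:
    """Run fastq-dump --split-3 on every .sra file in sra_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for sra_file in sorted(sra_dir.glob("*.sra")):
        # check=True raises CalledProcessError if fastq-dump exits non-zero.
        subprocess.run(split3_command(sra_file, out_dir), check=True)
```

Calling `convert_all(Path("."), Path("fastq"))` would process every .sra file in the current directory and collect the results in a `fastq` subdirectory (the directory name is arbitrary).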
03-30-2012, 08:17 AM   #5
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98

Quote:
Originally Posted by jrm5100
You need to use the --split-3 option.

fastq-dump --split-3 *.sra

Thanks so very much!!

Eric
All times are GMT -8.