I downloaded some SRA files from NCBI's Sequence Read Archive and converted them to FASTQ format. The experiment is described as paired-end, so I expected to get two FASTQ files from each SRA file. Instead, I got only one FASTQ file from each. I then tried to work out which reads were read1 and which were read2, but I couldn't see anything in the file to indicate this. Here are the lines for one read from one of the files:
@SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
NACAAAGGTAATTGCAAGTCCCTTCGTGCCAAAACGTCCAGCCCTTCCAACCCTGTGCAAATAAGTATCAGCTGAGTCTGAATCTGCATTCATTCTGGAATGACTCAGGAAGAAAGGCTAACAAGATATAAGAACTTCAAGGAAGGCCACAAGAGAATTC
+SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
#)0+)**2,,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:3:::@@@22:<<:8@@:@@@@@@@IIHIIIIIIII?HHIIFIGIIIIIIEGIGHIIIIFAIBDIHHGEHDBEFIIB<IIHHI3EFEDFC@HH@F@2;8<>@0??
You get one line starting with @, then a line with the sequence, then a line essentially identical to the @ line except starting with + rather than @, and then a line with base quality scores.
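To make the layout concrete, here is a minimal Python sketch of how these four-line records could be read. The splitting step rests on an assumption I have not confirmed: that each 160-base entry is simply read1 followed by read2, with both mates the same length. The names `FastqRecord`, `read_fastq`, and `split_in_half` are my own and purely illustrative.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    header: str    # text after the leading '@'
    sequence: str  # base calls
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends parsing
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def split_in_half(record: FastqRecord) -> Tuple[FastqRecord, FastqRecord]:
    """ASSUMPTION: the entry is read1 + read2 concatenated, with equal lengths."""
    half = len(record.sequence) // 2
    name, _, rest = record.header.partition(" ")
    # Tag the read name with /1 and /2, keeping the rest of the header as-is.
    header1 = f"{name}/1 {rest}".strip()
    header2 = f"{name}/2 {rest}".strip()
    mate1 = FastqRecord(header1, record.sequence[:half], record.quality[:half])
    mate2 = FastqRecord(header2, record.sequence[half:], record.quality[half:])
    return mate1, mate2


if __name__ == "__main__":
    # Tiny synthetic record, not real data.
    demo = io.StringIO("@r1 desc\nACGTTGCA\n+r1 desc\nIIIIHHHH\n")
    for rec in read_fastq(demo):
        for mate in split_in_half(rec):
            print(f"@{mate.header}\n{mate.sequence}\n+\n{mate.quality}")
```

If the mates are not the same length, or if the file really contains only one mate per entry, halving like this would give wrong results, which is why I'd like to know what the format actually means.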
Does anyone understand this format and how I can get fastq files for both read1 and read2?
Thank you.
Eric