Old 12-23-2015, 02:23 AM   #1
vaibhavvsk
How to know whether it's Read1 (Forward) or Read2 (Reverse) from fastq contents

As per the fastq file description on Wikipedia (https://en.wikipedia.org/wiki/FASTQ_format), the Illumina sequence identifier formats are:
Case A. Standard Illumina Format
Read Identifier : @HWUSI-EAS100R:6:73:941:1973#0/1
/1 indicates it is R1 i.e. Forward Read and
/2 indicates it is R2 i.e. Reverse Read

Case B. Illumina with Casava 1.8
Read Identifier : @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
1:Y:18:ATCACG, i.e. the substring 1:, indicates it is R1
2:Y:18:ATCACG, i.e. the substring 2:, indicates it is R2

Case C: NCBI SRA fastq format
Read Identifier : @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
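For illustration, here is a minimal Python sketch of how the mate number could be read from the identifier line in Cases A and B. The regular expressions and the function name mate_from_header are assumptions made for this example, not part of any Illumina or SRA tool, and they only cover the three example identifiers quoted above.

```python
import re

# Case A (assumed pattern): older Illumina identifiers end in /1 or /2.
OLD_ILLUMINA = re.compile(r"^@\S+/([12])$")
# Case B (assumed pattern): Casava 1.8 identifiers have a second,
# space-separated field that starts with the read number, e.g. "1:Y:18:ATCACG".
CASAVA_18 = re.compile(r"^@\S+ ([12]):[YN]:\d+:\S*$")


def mate_from_header(header):
    """Return 1 or 2 if the identifier line names the mate, else None."""
    header = header.strip()
    for pattern in (OLD_ILLUMINA, CASAVA_18):
        match = pattern.match(header)
        if match:
            return int(match.group(1))
    # Neither pattern matched, as with the SRA-style identifier in Case C.
    return None


examples = [
    "@HWUSI-EAS100R:6:73:941:1973#0/1",
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
    "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36",
]
for line in examples:
    print(mate_from_header(line), line)
```

With these patterns the Case C identifier matches neither rule, so the function returns None for it.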

Below are 4 lines from each file of the paired-end data:

==> SRR1583191_1.fastq <==
@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
NATCCAGTAGCCTCCTCCCCATCATCTCCCATTTCTTCTACAGGGGGACTCCCCCAGGTCTGGTAGCCCAAAGCTGCTGCTACAGCCGCCATGGGGGGGTG
+SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
#1=DDFFFHHGHGIIIIIIIBFHCHIIIIIEHIIGIIGIIIIHIIIIGIIIIIIIIGHCHFEFFFCEEECBBCCCCCCCCCCCCCCCCBB9@ACABBCB09

==> SRR1583191_2.fastq <==
@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
TCCTGTTCTCCCTGCTTGGAGTCTTGGTTGCCTGTGGAAATATCAGGCATGTGAATGGGAAGGCAGGAGTAGACAGTGAATGTGGCCTACTTGATTTGAGG
+SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
CCCFFFFFGHHGHJJJJIICGFGHHGGHIIIIIGFCG9CGHEHIIJJJHIGHJIIIJJIHIIIJIJJIHCEEHCEFEF3@C@CCCDBDCDDDDCCCDDDDD

Here, from the Case C identifier, it is not clear which substring of the Read Identifier can be used to distinguish R1 from R2.
I looked into paired-end files from SRA but could not find any R1 or R2 identifier.
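To make the observation concrete, here is a small Python sketch comparing the two identifier lines pasted above. They are identical, so the only visible difference between the mates is the _1 / _2 suffix in the file names. The helper mate_from_filename is hypothetical, and treating _1 as R1 and _2 as R2 is an assumption of this example that is not confirmed anywhere in the file contents.

```python
import os
import re

# Identifier lines copied from SRR1583191_1.fastq and SRR1583191_2.fastq.
r1_header = "@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101"
r2_header = "@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101"

# The two mates carry exactly the same identifier text.
headers_identical = r1_header == r2_header


def mate_from_filename(path):
    """Hypothetical helper: read 1 or 2 from a trailing _1/_2 before .fastq."""
    name = os.path.basename(path)
    match = re.search(r"_([12])\.fastq(?:\.gz)?$", name)
    return int(match.group(1)) if match else None


print(headers_identical)
print(mate_from_filename("SRR1583191_1.fastq"))
print(mate_from_filename("SRR1583191_2.fastq"))
```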

I would like to know how to get R1/R2 information from the fastq file contents. Apart from these three cases, I would also like to know whether other fastq read identifier formats contain substrings that provide R1/R2 information.
__________________
Vaibhav Kulkarni

Last edited by vaibhavvsk; 12-23-2015 at 02:26 AM.