SEQanswers

Old 12-23-2015, 02:23 AM   #1
vaibhavvsk
Member
 
Location: Pune

Join Date: Sep 2011
Posts: 14
Question: How to tell whether a read is Read1 (forward) or Read2 (reverse) from FASTQ contents

According to the FASTQ format description on Wikipedia (https://en.wikipedia.org/wiki/FASTQ_format), the sequence identifier formats are:

Case A: Standard Illumina format
Read identifier: @HWUSI-EAS100R:6:73:941:1973#0/1
/1 indicates R1, i.e. the forward read, and
/2 indicates R2, i.e. the reverse read.

Case B: Illumina with Casava 1.8
Read identifier: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
1:Y:18:ATCACG, i.e. the leading substring 1: indicates R1
2:Y:18:ATCACG, i.e. the leading substring 2: indicates R2

Case C: NCBI Sequence Read Archive (SRA) FASTQ format
Read identifier: @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
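The three header conventions above can be sketched as a small parser. This is only an illustration: `read_number` is a hypothetical helper, and the regular expressions assume the headers look exactly like the examples in Cases A and B. For an SRA-style header (Case C) it returns None, since that header carries no mate field.

```python
import re
from typing import Optional

# Case A: older Illumina headers end in "/1" or "/2".
_OLD_ILLUMINA = re.compile(r"/([12])$")
# Case B: Casava 1.8 headers carry "<read>:<filtered>:<control>:<index>"
# after the first whitespace-separated field.
_CASAVA_18 = re.compile(r"^\S+\s+([12]):[YN]:\d+:\S*$")


def read_number(header: str) -> Optional[int]:
    """Return 1 or 2 if the header encodes the mate, otherwise None."""
    line = header.strip()
    for pattern in (_CASAVA_18, _OLD_ILLUMINA):
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    # Nothing in the header identifies the mate (e.g. SRA-style headers).
    return None


if __name__ == "__main__":
    for h in (
        "@HWUSI-EAS100R:6:73:941:1973#0/1",
        "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
        "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36",
    ):
        print(h, "->", read_number(h))
```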

Here are the first 4 lines from each file of the paired-end data:

==> SRR1583191_1.fastq <==
@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
NATCCAGTAGCCTCCTCCCCATCATCTCCCATTTCTTCTACAGGGGGACTCCCCCAGGTCTGGTAGCCCAAAGCTGCTGCTACAGCCGCCATGGGGGGGTG
+SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
#1=DDFFFHHGHGIIIIIIIBFHCHIII[email protected]ACABBCB09

==> SRR1583191_2.fastq <==
@SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
TCCTGTTCTCCCTGCTTGGAGTCTTGGTTGCCTGTGGAAATATCAGGCATGTGAATGGGAAGGCAGGAGTAGACAGTGAATGTGGCCTACTTGATTTGAGG
+SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
CCCFFFFFGHHGHJJJ[email protected][email protected]

From the Case C identifier, it is not clear which substring of the read identifier can be used to distinguish R1 from R2.
I looked into paired-end files from SRA but could not find an R1 or R2 identifier.

I would like to know how to get the R1/R2 information from the FASTQ file contents. Apart from these three cases, I would also like to know whether other FASTQ read identifier formats contain substrings that provide R1/R2 information.
__________________
Vaibhav Kulkarni

Last edited by vaibhavvsk; 12-23-2015 at 02:26 AM.
Old 12-23-2015, 06:50 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495

If you use the

-F | --origfmt Defline contains only original sequence name.

option when extracting the FASTQ files from SRA, you would potentially recover the original Illumina FASTQ header.
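As a rough sketch of how that option might be used, the snippet below builds a `fastq-dump` command line. The `--origfmt` flag is the one quoted above; adding `--split-files` and `--outdir`, and the helper names `fastq_dump_command` and `run_fastq_dump`, are assumptions made for illustration, with the accession taken from the example in the first post. Whether the original header is actually recovered depends on what the submitter deposited.

```python
import shutil
import subprocess
from typing import List


def fastq_dump_command(accession: str, outdir: str = ".") -> List[str]:
    """Build a fastq-dump call that keeps the original read names."""
    return [
        "fastq-dump",
        "--split-files",  # assumption: paired data, one file per mate
        "--origfmt",      # defline contains only the original sequence name
        "--outdir", outdir,
        accession,
    ]


def run_fastq_dump(accession: str, outdir: str = ".") -> None:
    """Run the command, but only if fastq-dump is installed."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump (SRA Toolkit) not found on PATH")
    subprocess.run(fastq_dump_command(accession, outdir), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is downloaded.
    print(" ".join(fastq_dump_command("SRR1583191")))
```

Calling `run_fastq_dump("SRR1583191")` would execute the extraction itself.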
Old 12-23-2015, 08:17 AM   #3
Jessica_L
Senior Member
 
Location: Washington, D.C. metro area

Join Date: Feb 2010
Posts: 116

None of the information in this string:

SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101

can be used as an identifier for R1 vs R2. The fields are things like the instrument serial number, flow cell ID, lane number, tile number and X/Y coordinates of the cluster.
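To illustrate why this string cannot separate the mates, here is a minimal sketch that splits the original read name into its fields. The field names, and reading the second field as a run number, follow the Casava 1.8 layout and are assumptions here; `parse_cluster_name` is a hypothetical helper.

```python
from typing import NamedTuple


class ClusterName(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int


def parse_cluster_name(name: str) -> ClusterName:
    """Split 'instrument:run:flowcell:lane:tile:x:y' into typed fields."""
    # Drop anything after the first whitespace, such as 'length=101'.
    parts = name.split()[0].split(":")
    if len(parts) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(parts)}")
    instrument, run, flowcell, lane, tile, x, y = parts
    return ClusterName(instrument, int(run), flowcell,
                       int(lane), int(tile), int(x), int(y))


if __name__ == "__main__":
    # The same name appears in both SRR1583191_1.fastq and SRR1583191_2.fastq.
    name = "SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101"
    print(parse_cluster_name(name))
```

Every field describes the cluster, which both mates share, so the parsed result is identical for the R1 and R2 records.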

GenoMax's suggestion to recover the original header would be the best way to get the data you're looking for.
Old 12-24-2015, 03:23 AM   #4
vaibhavvsk
Member
 
Location: Pune

Join Date: Sep 2011
Posts: 14

Quote:
Originally Posted by GenoMax
If you use the -F | --origfmt option when extracting the fastq files from SRA you would potentially recover original Illumina fastq header.
Hey GenoMax, it worked for me. Thanks to Jessica_L too!
__________________
Vaibhav Kulkarni
Tags
fastq, fastq files, fastq format, fastq read identifier, read identifier