  • How to tell whether a read is Read 1 (forward) or Read 2 (reverse) from FASTQ contents

    According to the FASTQ format description on Wikipedia (https://en.wikipedia.org/wiki/FASTQ_format), Illumina sequence identifiers take the following forms:
    Case A. Standard Illumina format
    Read identifier: @HWUSI-EAS100R:6:73:941:1973#0/1
    A trailing /1 indicates R1, i.e. the forward read, and
    a trailing /2 indicates R2, i.e. the reverse read.

    Case B. Illumina with Casava 1.8
    Read identifier: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
    In 1:Y:18:ATCACG, the leading substring 1: indicates R1.
    In 2:Y:18:ATCACG, the leading substring 2: indicates R2.

    Case C. NCBI SRA FASTQ format
    Read identifier: @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
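The rules for Cases A and B can be sketched in Python roughly as follows. This is only a minimal sketch: the function name `read_number` and both regular expressions are illustrative assumptions, not part of any Illumina or SRA tool, and headers that match neither pattern (such as Case C) simply yield `None`.

```python
import re

# Case A: older Illumina identifiers end in "/1" or "/2".
_OLD_STYLE = re.compile(r"/([12])$")
# Case B: Casava 1.8 identifiers carry "<read>:<filtered>:<control>:<index>"
# after the first space, e.g. "1:Y:18:ATCACG".
_CASAVA_18 = re.compile(r"^([12]):[YN]:\d+:")


def read_number(header):
    """Return 1 or 2 if the header encodes the mate, else None.

    Hypothetical helper for illustration only.
    """
    name, _, comment = header.strip().partition(" ")
    match = _CASAVA_18.match(comment)
    if match:
        return int(match.group(1))
    match = _OLD_STYLE.search(name)
    if match:
        return int(match.group(1))
    return None  # e.g. default SRA headers carry no mate marker
```

Calling `read_number` on the first line of each FASTQ record is enough; the sequence and quality lines are not needed.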

    Here are the first four lines from each file of the paired-end data:

    ==> SRR1583191_1.fastq <==
    @SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
    NATCCAGTAGCCTCCTCCCCATCATCTCCCATTTCTTCTACAGGGGGACTCCCCCAGGTCTGGTAGCCCAAAGCTGCTGCTACAGCCGCCATGGGGGGGTG
    +SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
    #1=DDFFFHHGHGIIIIIIIBFHCHIIIIIEHIIGIIGIIIIHIIIIGIIIIIIIIGHCHFEFFFCEEECBBCCCCCCCCCCCCCCCCBB9@ACABBCB09

    ==> SRR1583191_2.fastq <==
    @SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
    TCCTGTTCTCCCTGCTTGGAGTCTTGGTTGCCTGTGGAAATATCAGGCATGTGAATGGGAAGGCAGGAGTAGACAGTGAATGTGGCCTACTTGATTTGAGG
    +SRR1583191.1 SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101
    CCCFFFFFGHHGHJJJJIICGFGHHGGHIIIIIGFCG9CGHEHIIJJJHIGHJIIIJJIHIIIJIJJIHCEEHCEFEF3@C@CCCDBDCDDDDCCCDDDDD

    From the Case C identifier, it is not clear which substring of the read identifier can be used to distinguish R1 from R2.
    I looked at paired-end files from SRA but could not find any R1 or R2 marker.

    I would like to know how to get R1/R2 information from the FASTQ file contents. Beyond these three cases, are there substrings in other FASTQ read identifier formats that provide R1/R2 information?
    Last edited by vaibhavvsk; 12-23-2015, 03:26 AM.
    Vaibhav Kulkarni

  • #2
    If you use the

    -F | --origfmt Defline contains only original sequence name.

    option when extracting the FASTQ files from SRA, you can potentially recover the original Illumina FASTQ header.
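For anyone scripting the extraction, the suggestion could be wrapped in Python along these lines. This is a sketch under assumptions: it assumes the SRA Toolkit's `fastq-dump` is the extraction tool and is on the PATH, the function names are hypothetical, and `--split-files` (which writes each mate to its own file) is an addition not mentioned in the post above. Whether the original header is actually recovered depends on what the submitter deposited.

```python
import shutil
import subprocess


def build_fastq_dump_command(accession):
    """Build the argument list for fastq-dump (hypothetical helper)."""
    # --origfmt (-F): defline contains only the original sequence name.
    # --split-files: assumption, writes each read of a spot to its own file.
    return ["fastq-dump", "--origfmt", "--split-files", accession]


def dump_with_original_headers(accession):
    """Run fastq-dump if it is installed; raise otherwise."""
    cmd = build_fastq_dump_command(accession)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("fastq-dump not found on PATH")
    subprocess.run(cmd, check=True)
```

For the run in this thread, the call would be `dump_with_original_headers("SRR1583191")`.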


    • #3
      None of the information in this string:

      SN7001163:87:C1ME6ACXX:1:1101:1176:2038 length=101

      can be used as an identifier for R1 vs R2. The fields are things like the instrument serial number, flow cell ID, lane number, tile number and X/Y coordinates of the cluster.
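To make the point concrete, here is a small Python sketch that splits that string into its fields. The field order (instrument, run number, flow cell ID, lane, tile, X, Y) is assumed from the Casava 1.8 identifier layout, and the names `ClusterInfo` and `parse_cluster_info` are hypothetical.

```python
from typing import NamedTuple


class ClusterInfo(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int


def parse_cluster_info(text):
    """Parse 'instrument:run:flowcell:lane:tile:x:y' (hypothetical helper).

    Anything after the first whitespace (e.g. 'length=101') is ignored.
    """
    parts = text.split()[0].split(":")
    if len(parts) != 7:
        raise ValueError("expected 7 colon-separated fields, got %d" % len(parts))
    instrument, run, flowcell, lane, tile, x, y = parts
    return ClusterInfo(instrument, int(run), flowcell,
                       int(lane), int(tile), int(x), int(y))
```

Both mates of a pair come from the same cluster, so they parse to identical values, which is why none of these fields can separate R1 from R2.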

      GenoMax's suggestion to recover the original header is the best way to get the information you're looking for.


      • #4
        Originally posted by GenoMax
        If you use the

        -F | --origfmt Defline contains only original sequence name.

        option when extracting the FASTQ files from SRA, you can potentially recover the original Illumina FASTQ header.

        Hey GenoMax, it worked for me. Thanks, Jessica_L, too!
        Vaibhav Kulkarni
