  • Fastq file format for paired end sequences

    Hi,

    I have received my sequencing data from a sequencing core facility. It was generated with Illumina paired-end sequencing, but the read identifiers for the forward and reverse reads of a pair do not match at all. In addition, the second part of the identifier (the read number within the pair) is always 1.
    The other problem is with the indexes: they are sometimes not the same within an individual file.

    e.g.
    Read 1:
    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT
    @HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC
    @HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC

    Read 2:
    @HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC
    @HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC
    @HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC

    Could you please help me identify the format of my files?

    Thanks,
    Rozita

  • #2
    Contact your core facility to find out what they've done.

    The reads in both of those files are denoted as read 1. See http://en.wikipedia.org/wiki/FASTQ_format for the header description.

    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT
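
As a rough illustration of the header layout described on that page, the sketch below splits a header of this style into named fields. `IlluminaHeader` and `parse_header` are made-up names for this example, and the code assumes the CASAVA 1.8+ layout: seven colon-separated fields, a space, then read number, filter flag, control number and index sequence.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of a CASAVA 1.8+ style FASTQ header (hypothetical helper type)."""
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int       # 1 for the first read of a pair, 2 for the second
    filtered: str   # "Y" if the read failed the chastity filter, else "N"
    control: str
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split one header line into its named fields."""
    # The part before the space locates the cluster; the part after describes the read.
    name, comment = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered, control, index)


header = parse_header("@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT")
print(header.lane, header.read, header.index)  # prints: 1 1 GGCTAT
```

For a genuine read 2 file, the `read` field would be expected to be 2 rather than 1.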

    I would also recommend that they allow 0 mismatches when demultiplexing. The error rate should be low enough that indexes containing errors do not need to be included, unless there was a problem during the run (BMS during index read).
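
To see how far apart the index sequences in the first post are, one could tally the indexes and count mismatches roughly as follows. This is only a sketch: `hamming` and `index_counts` are hypothetical helpers, and treating GGCTAC as the expected index is an assumption based on it being the more common one in the examples.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def index_counts(headers):
    """Tally the index sequence (the last colon-separated field) of each header."""
    return Counter(h.strip().rsplit(":", 1)[-1] for h in headers)


# Headers copied from the "Read 1" examples in the first post.
headers = [
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC",
]

counts = index_counts(headers)
expected = "GGCTAC"  # assumption: the most common index is the intended one
for index, n in counts.items():
    print(index, n, "mismatches:", hamming(index, expected))
```

With 0 mismatches allowed, only reads whose index equals the expected sequence exactly would be assigned to the sample; the GGCTAT read differs at one position and would be left out.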


    • #3
      Thanks Tony. Yes, both files are marked as read 1, and they don't have the same identifiers. I just wanted to be sure that there isn't any other FASTQ format apart from the one you mentioned.


      • #4
        Something is up with that data. Your sequencing facility should be able to help you out.


        • #5
          Those two sets of reads come from different lanes: lane 1 and lane 2, as indicated by the fourth colon-separated field of the identifier ("H0A9YADXX:1:" in the first set, "H0A9YADXX:2:" in the second).
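
A quick way to check this across a whole file would be to tally the lane and read number of every header. The sketch below does that for the example headers from the first post; `lane_read_counts` is a hypothetical helper and assumes the CASAVA 1.8+ header layout, where the lane is the fourth colon-separated field and the read number is the first field after the space.

```python
from collections import Counter


def lane_read_counts(headers):
    """Count headers per (lane, read number) combination."""
    counts = Counter()
    for line in headers:
        name, comment = line.strip().split(" ", 1)
        lane = int(name.split(":")[3])      # fourth field before the space
        read = int(comment.split(":")[0])   # first field after the space
        counts[(lane, read)] += 1
    return counts


# Headers copied from the first post, labelled there "Read 1" and "Read 2".
first_set = [
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC",
]
second_set = [
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC",
]

print(dict(lane_read_counts(first_set)))   # prints: {(1, 1): 3}
print(dict(lane_read_counts(second_set)))  # prints: {(2, 1): 3}
```

Both sets report read number 1; only the lane differs.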


          • #6
            Yes, I see. Thanks. But they are in the same file, representing the 2 reads of one sequence. I should contact them and figure it out.


            • #7
              The R1 and R2 reads of a pair are usually in different files.

              How many files did you get from the sequence provider, and what were the files called?


              • #8
                Actually, I got one folder for each sample (e.g. "P424_101_index11"). Inside it there are two different folders ("130419_AH02WFADXX", "130423_AH0A9YADXX"), and according to the facility only one of them is the valid experiment (the red one). In the inner directory I can find two fastq files ("1_130423_AH0A9YADXX_P424_101_index11_1.fastq" and "2_130423_AH0A9YADXX_P424_101_index11_1.fastq"). Some lines from each file were shown earlier as examples.


                • #9
                  You need to contact the sequence provider and find out what they did.

                  If they ran a paired-end experiment, then you should have files with the R2 reads matching the R1 reads that you already have.

                  You appear to have two files for the same sample, run on lane 1 and lane 2, and from what you showed previously, both files are R1.

                  Running the samples in more than one lane would be expected if you wouldn't get enough reads from one lane of sequencing, or if you have several multiplexed samples and want to run each sample in the same lanes so as to avoid lane effects.

                  You need to find out whether the sequencing center performed a single-end or paired-end run with your samples and, if they did a paired-end run, what they have done with the R2 files.
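
Once the R2 files turn up, a simple sanity check is to confirm that the headers of the two files line up record by record. The sketch below shows one way to do that on lists of header lines (every fourth line of a FASTQ file, assuming unwrapped four-line records). `is_properly_paired` is a hypothetical helper, the check assumes CASAVA 1.8+ style headers, and the read 2 header in the second example is invented for illustration.

```python
from itertools import zip_longest


def is_properly_paired(r1_headers, r2_headers) -> bool:
    """True if headers match pairwise by cluster name and are marked read 1 / read 2."""
    for h1, h2 in zip_longest(r1_headers, r2_headers):
        if h1 is None or h2 is None:
            return False  # the two files hold different numbers of reads
        name1, comment1 = h1.strip().split(" ", 1)
        name2, comment2 = h2.strip().split(" ", 1)
        if name1 != name2:
            return False  # different cluster, so not the same fragment
        if (comment1.split(":")[0], comment2.split(":")[0]) != ("1", "2"):
            return False  # read numbers must be 1 and 2 respectively
    return True


# First header from each set in the first post: different lanes, both read 1.
print(is_properly_paired(
    ["@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT"],
    ["@HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC"],
))  # prints: False

# Invented read 2 header showing what a matching mate would look like.
print(is_properly_paired(
    ["@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT"],
    ["@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 2:N:0:GGCTAT"],
))  # prints: True
```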


                  • #10
                    Yeah, Thanks all.
