  • Fastq file format for paired end sequences

    Hi,

    I received my sequencing data from a sequencing core facility. It was generated with Illumina paired-end sequencing, but the read identifiers for the forward and reverse reads of a pair do not match at all. In addition, the second part of the identifier (the read number within the pair) is always 1.
    The other problem is with the indexes: they are sometimes not the same within an individual file.

    e.g.
    Read 1:
    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT

    @HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC




    Read 2:
    @HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC


    Could you please help me with identifying the format of my files?

    Thanks,
    Rozita

  • #2
    Contact your core facility to find out what they've done.

    Both of those files are denoted as being read 1. See http://en.wikipedia.org/wiki/FASTQ_format for the header description.

    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT
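For anyone who wants to check this on their own files, here is a minimal Python sketch that splits a header of this style into its fields. It assumes the layout described on the Wikipedia page linked above (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control bits, index). The names `IlluminaHeader` and `parse_header` are made up for this illustration and do not come from any library.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    # Field names are illustrative labels, not an official schema.
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int       # member of the pair: 1 or 2
    filtered: str   # "Y" if the read was filtered out, "N" otherwise
    control: int
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split one header line of the form shown in this thread."""
    name, comment = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered,
                          int(control), index)


header = parse_header("@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT")
print(header.lane, header.read, header.index)
```

The `read` field is the one that should be 1 in the forward file and 2 in the reverse file of a paired-end run.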

    I would also recommend that they allow 0 errors when demultiplexing. The error rate should be low enough that there is no need to include indexes with errors, unless there was a problem during the run (BMS during index read).
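To see how far an observed index is from the expected one, a simple mismatch count is enough. This is only a sketch: it assumes that GGCTAC is the intended index for this sample (it is the more frequent one in the excerpt, but only the core facility can confirm that), and `index_mismatches` is a hypothetical helper name.

```python
def index_mismatches(observed: str, expected: str) -> int:
    """Count positions where two equal-length index sequences differ."""
    if len(observed) != len(expected):
        raise ValueError("index sequences must have the same length")
    return sum(a != b for a, b in zip(observed, expected))


# Assumption: GGCTAC is the index assigned to this sample.
expected_index = "GGCTAC"
for observed in ("GGCTAT", "GGCTAC"):
    print(observed, index_mismatches(observed, expected_index))
```

With a setting of 0 allowed errors, a read whose index differs from the expected one at any position would not be assigned to the sample.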


    • #3
      Thanks Tony. Yes, both files are marked as read 1, and they don't have the same identifiers. I just wanted to be sure that there isn't any other FASTQ format apart from the one you mentioned.


      • #4
        Something is up with that data. Your sequencing facility should be able to help you out.


        • #5
          Originally posted by rozitaa
          Hi,

          I received my sequencing data from a sequencing core facility. It was generated with Illumina paired-end sequencing, but the read identifiers for the forward and reverse reads of a pair do not match at all. In addition, the second part of the identifier (the read number within the pair) is always 1.
          The other problem is with the indexes: they are sometimes not the same within an individual file.

          e.g.
          Read 1:
          @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT

          @HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC




          Read 2:
          @HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC


          Could you please help me with identifying the format of my files?

          Thanks,
          Rozita
          Those sets of reads come from two different lanes, lane 1 and lane 2, as indicated by the number shown in red (the lane number, which is the fourth colon-separated field of the identifier).


          • #6
            Yes, I see. Thanks. But they are in the same file, representing the 2 reads of one sequence. I should contact them and figure it out.


            • #7
              The R1 and R2 reads of a pair are usually in different files.

              How many files did you get from the sequence provider, and what were the files called?


              • #8
                Actually, I got one file for each sample (e.g. "P424_101_index11"). Inside that there are two different directories ("130419_AH02WFADXX", "130423_AH0A9YADXX"), and according to the facility only one of them is the valid experiment (the red one). In the inner directory I can find two fastq files ("1_130423_AH0A9YADXX_P424_101_index11_1.fastq" and "2_130423_AH0A9YADXX_P424_101_index11_1.fastq"). Some lines from each of these files were shown previously as examples.


                • #9
                  You need to contact the sequence provider and find out what they did.

                  If they ran a paired-end experiment, then you should have files with the R2 reads matching the R1 reads that you already have.
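As a rough way to test whether two files really hold mates, one can compare the headers record by record. The sketch below assumes plain 4-line FASTQ records and the header style shown earlier in this thread; `fastq_headers` and `is_mate_pair` are hypothetical helper names, and the second header in the last example is constructed purely for illustration.

```python
from itertools import islice


def fastq_headers(lines):
    """Return every fourth line (the header) of a 4-line-per-record FASTQ."""
    return islice(lines, 0, None, 4)


def is_mate_pair(header1: str, header2: str) -> bool:
    """True if both headers share the same read name and are marked 1 and 2."""
    name1, comment1 = header1.strip().split(" ", 1)
    name2, comment2 = header2.strip().split(" ", 1)
    read1 = comment1.split(":", 1)[0]
    read2 = comment2.split(":", 1)[0]
    return name1 == name2 and (read1, read2) == ("1", "2")


# First headers of the two files shown in this thread: not a pair.
print(is_mate_pair(
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC",
))

# What a matching R2 header would be expected to look like (made-up example).
print(is_mate_pair(
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 2:N:0:GGCTAC",
))
```

On real files, the two header streams from `fastq_headers(open(path))` could be zipped together and passed to `is_mate_pair` one record at a time.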

                  You appear to have two files for the same sample, run on lane 1 and lane 2, and from what you showed previously, both files are R1.

                  Running the samples in more than one lane would be expected if you wouldn't get enough reads from one lane of sequencing, or if you have several multiplexed samples and you want to run each sample in the same lanes so as to avoid lane effects.

                  You need to find out whether the sequencing center performed a single-end or paired-end run with your samples, and if they did do a paired-end run, what they have done with the R2 files.


                  • #10
                    Yeah, Thanks all.
