  • Bio.X2Y
    Member
    • Apr 2010
    • 46

    Why are Illumina paired-end SRA datasets made up of 3 FASTQ files?

    I'm looking at some NCBI SRA datasets for paired-end Illumina RNA-seq.

    In each case, the dataset is made up of 3 fastq files, even though I would only expect 2 (one for each end).

    Example:

    SRR018256.fastq (2,048,908 lines)
    SRR018256_1.fastq (50,313,152 lines)
    SRR018256_2.fastq (50,313,152 lines)

    All files look OK, and the _1 and _2 files have the same number of lines, as I would expect.

    Does anyone have any idea what the third file might be?

    Thanks.
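For reference, the line counts above can be turned into read counts. This is a minimal sketch, assuming each FASTQ record occupies exactly four unwrapped lines (header, sequence, `+`, qualities); the helper name `reads_from_lines` is made up for illustration.

```python
# Line counts as reported in the post above.
line_counts = {
    "SRR018256.fastq": 2_048_908,
    "SRR018256_1.fastq": 50_313_152,
    "SRR018256_2.fastq": 50_313_152,
}

LINES_PER_RECORD = 4  # assumption: unwrapped four-line FASTQ records


def reads_from_lines(n_lines: int) -> int:
    """Convert a FASTQ line count to a read count, rejecting partial records."""
    if n_lines % LINES_PER_RECORD:
        raise ValueError(f"{n_lines} is not a multiple of {LINES_PER_RECORD}")
    return n_lines // LINES_PER_RECORD


read_counts = {name: reads_from_lines(n) for name, n in line_counts.items()}
for name, n in read_counts.items():
    print(f"{name}: {n:,} reads")
```

Under that assumption the third file holds 512,227 reads against 12,578,288 in each mate file, so it is small relative to the paired data.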
  • Pepe
    Member
    • Mar 2009
    • 30

    #2
    What I do with my paired-end reads is filter out the ones that contain adapters or have bad quality. Then I take the mates of the removed reads and put them in a separate file, so the two paired-end files still have the same number of reads in the same order, but I can still use the 'pairless' reads in the analysis.
    Maybe they did the same?
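The filtering described above could look roughly like the sketch below. It assumes both mate lists are already in the same order; the `split_pairs` function, the `no_n` filter, and the toy reads are all hypothetical stand-ins, not what any submitter actually ran.

```python
from typing import Callable, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (read id, sequence, quality string)


def split_pairs(
    mates1: Iterable[Read],
    mates2: Iterable[Read],
    keep: Callable[[Read], bool],
) -> Tuple[List[Read], List[Read], List[Read]]:
    """Return (kept_1, kept_2, orphans) from two mate lists in the same order."""
    kept1, kept2, orphans = [], [], []
    for r1, r2 in zip(mates1, mates2):
        ok1, ok2 = keep(r1), keep(r2)
        if ok1 and ok2:
            # Both mates pass: keep them in lockstep so the files stay in sync.
            kept1.append(r1)
            kept2.append(r2)
        elif ok1:
            orphans.append(r1)  # mate 2 was removed, mate 1 becomes unpaired
        elif ok2:
            orphans.append(r2)  # mate 1 was removed, mate 2 becomes unpaired
        # If both mates fail the filter, the whole pair is dropped.
    return kept1, kept2, orphans


def no_n(read: Read) -> bool:
    """Hypothetical filter: reject reads containing any N base call."""
    return "N" not in read[1]


# Toy data, made up for illustration.
m1 = [("r1/1", "ACGT", "IIII"), ("r2/1", "ACNT", "II#I"), ("r3/1", "NNNN", "####")]
m2 = [("r1/2", "TTGA", "IIII"), ("r2/2", "GGCA", "IIII"), ("r3/2", "NNNN", "####")]
kept1, kept2, orphans = split_pairs(m1, m2, no_n)
print(len(kept1), len(kept2), len(orphans))
```

The two paired outputs always end up the same length, and the orphans go to a third collection, which matches the three-file layout in the question.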


    • Bio.X2Y
      Member
      • Apr 2010
      • 46

      #3
      Thanks Pepe, that makes sense.

      Does anyone else have other possible explanations? Cheers


      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #4
        Unpaired reads.


        • Bio.X2Y
          Member
          • Apr 2010
          • 46

          #5
          Hi, is it possible to get unpaired reads from a paired-end experiment? I'm not very familiar with the procedure.


          • GW_OK
            Senior Member
            • Sep 2009
            • 411

            #6
            I should think so. It is possible that something went wrong with one read or the other, leaving a lonely, unpaired read.


            • simonandrews
              Simon Andrews
              • May 2009
              • 870

              #7
              I'm not sure the Illumina pipeline can create unpaired reads. The basis for the sequencing is an initial identification of regions followed by tracking those regions to determine sequence. When you do a paired end read there is no separate cluster detection in the second read, meaning that you use exactly the same regions as the first read.

              For the output from the pipeline you get only two sequence files, one for each read, which always contain the same number of sequences and always come in the same order so you can match up pairs of sequences. If stuff goes wrong you'll just end up with a bunch of sequences full of poly-N.
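One way to check the "same number, same order" property on a pair of files is sketched below. It assumes four-line FASTQ records and read names that differ only by a trailing `/1` or `/2`; the function names are invented for this example, and real headers may follow other conventions.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def read_ids(handle: TextIO) -> Iterator[str]:
    """Yield the read ID from the header line of each four-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue
        fields = line[1:].split()  # drop the leading "@" and any description
        if not fields:
            continue
        name = fields[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # strip the mate suffix so IDs are comparable
        yield name


def mates_in_sync(fq1: TextIO, fq2: TextIO) -> bool:
    """True if both files list the same read IDs in the same order."""
    # zip_longest pads the shorter file with None, so unequal lengths fail.
    return all(a == b for a, b in zip_longest(read_ids(fq1), read_ids(fq2)))


# Toy in-memory files, made up for illustration.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCA\n+\nIIII\n")
fq2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n")
in_sync = mates_in_sync(fq1, fq2)

truncated = mates_in_sync(io.StringIO("@r1/1\nACGT\n+\nIIII\n"), io.StringIO(""))
print(in_sync, truncated)
```

With real data the same functions accept handles from `open(...)` on the `_1` and `_2` files.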

              If the file is for unpaired sequences then it must have been something which the researchers created from the original data, as the pipeline itself won't create this.

              Could it be a trial run before the main sequencing run? We do this routinely with our libraries - doing 10% of a lane with them to see if they look OK before going on to do a full run.


              • Bio.X2Y
                Member
                • Apr 2010
                • 46

                #8
                Hi Simon, thanks for this. Out of interest, when you say you do 10% of a lane, how is this done? I'm not very familiar with the sequencing procedure itself, but I imagined it was all-or-nothing, and you couldn't back out after 10%.

                Do you mean you are watching some results in real time (i.e. nothing to do with the GAPipeline), and making a decision to abandon if necessary after 10%? If you do abandon, does this mean the flowcell is effectively wasted?

                If that's what happened here (in the described experiment), would the authors have needed to run image analysis, etc. on partially complete reads? Wouldn't that mean that (a) they wouldn't have full-length reads, e.g. they would only have 5 bases per read out of 50 potential bases, and (b) the reads would still be paired?

                Thanks for your help!


                • simonandrews
                  Simon Andrews
                  • May 2009
                  • 870

                  #9
                  Originally posted by Bio.X2Y
                  Hi Simon, thanks for this. Out of interest, when you say you do 10% of a lane, how is this done? I'm not very familiar with the sequencing procedure itself, but I imagined it was an all-or-nothing, and you couldn't back out after 10%.
                  We still run a control lane on each flowcell because of the nature of many of our libraries. What we can therefore do is to mix in 10% of another sample alongside the PhiX and then extract out everything which doesn't map to PhiX at the end of the run to get a small scale view of the other library.
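The extraction step described above amounts to keeping every read that did not align to PhiX. A minimal sketch, assuming an aligner has already produced the set of read IDs that mapped to PhiX (the aligner run itself is not shown, and all names and reads here are hypothetical):

```python
from typing import Dict, Set


def non_phix_reads(reads: Dict[str, str], phix_ids: Set[str]) -> Dict[str, str]:
    """Keep reads whose IDs are not in the set of PhiX-mapped read IDs."""
    return {rid: seq for rid, seq in reads.items() if rid not in phix_ids}


# Toy data: IDs and sequences are made up for illustration.
lane_reads = {"r1": "ACGT", "r2": "GGCA", "r3": "TTGA", "r4": "CCAT"}
phix_ids = {"r1", "r3", "r4"}  # assumed output of an alignment against PhiX
sample_reads = non_phix_reads(lane_reads, phix_ids)
print(sorted(sample_reads))
```

Note that reads which fail to map for other reasons (low quality, for example) would also end up in the non-PhiX set with this approach.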


                  • spadejac
                    Junior Member
                    • Sep 2009
                    • 4

                    #10
                    No other explanations. Here is NCBI documentation about it:

                    SRR000001.fastq – Fragment library data, or unpaired mates from a paired library.
                    SRR000001_1.fastq – First mate sequence.
                    SRR000001_2.fastq – Second mate sequence in the submitted orientation.
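The naming convention quoted above can be applied mechanically. This is a small sketch only: the `describe` helper and its regular expression are my own assumptions about how such file names look (a run accession, optionally followed by `_1` or `_2`), not an NCBI tool.

```python
import re

# Assumed pattern: SRR/ERR/DRR run accession, optional _1 or _2, then .fastq
_FASTQ_NAME = re.compile(r"^(?P<run>[SED]RR\d+)(?:_(?P<mate>[12]))?\.fastq$")


def describe(filename: str) -> str:
    """Map a FASTQ file name to the role given in the quoted documentation."""
    match = _FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    mate = match.group("mate")
    if mate is None:
        return "fragment library data or unpaired mates"
    return "first mate" if mate == "1" else "second mate"


for name in ("SRR018256.fastq", "SRR018256_1.fastq", "SRR018256_2.fastq"):
    print(name, "->", describe(name))
```

Applied to the files in the original question, the file with no suffix is the one holding fragment data or unpaired mates.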
