  • Gazaldeep
    Junior Member
    • Nov 2016
    • 6

    Illumina paired end adapter contamination problem

    Hello everyone!

    I have Illumina paired-end RNA-seq reads and want to proceed with adapter trimming.
    I am confused about a few points:

    1. Does the 5' end of both the forward and reverse reads start at the first base of the insert, or could there also be adapter contamination at the 5' end?
    From what I have read online, there shouldn't be any adapter at the 5' end. But the data I am analyzing has around 75 reads (out of 7 million in the forward read file) with adapter at the 5' end. 75 sequences isn't much, but I want to know what causes this.

    2. For the forward reads, some 3' ends may contain the indexed adapter. Where this indexed adapter occurs within the sequence, I should delete the adapter and the sequence following it, right? Even if the indexed primer is present at the 5' end? In that case the whole read would be deleted (because there was no insert between the two adapters).

    3. Do the 5' ends of reverse reads contain barcode sequences or any part of the indexed adapter? I have 12,399 reads (out of 7 million) that have all or part of the indexed adapter at the 5' end, and a few that have it within the read.


    I am new to RNA-seq data analysis and have gone through lots of tutorials and explanations online, but everything seems really confusing at the moment.

    My main concern is where to expect adapters in Illumina forward and reverse reads respectively, and what to do on encountering unexpected adapters.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Originally posted by Gazaldeep:
    Hello everyone!

    I have Illumina paired-end RNA-seq reads and want to proceed with adapter trimming.
    I am confused about a few points:

    1. Does the 5' end of both the forward and reverse reads start at the first base of the insert, or could there also be adapter contamination at the 5' end?
    From what I have read online, there shouldn't be any adapter at the 5' end. But the data I am analyzing has around 75 reads (out of 7 million in the forward read file) with adapter at the 5' end. 75 sequences isn't much, but I want to know what causes this.
    There should be no contamination at the 5' end if you are using standard Illumina kits.

    2. For the forward reads, some 3' ends may contain the indexed adapter. Where this indexed adapter occurs within the sequence, I should delete the adapter and the sequence following it, right? Even if the indexed primer is present at the 5' end? In that case the whole read would be deleted (because there was no insert between the two adapters).
    Barcodes/tag reads are never part of the actual read in Illumina sequencing. If you have tags in your sequence then something is wrong. Any reads with no insert should be taken care of during trimming.

    My main concern is where to expect adapters in Illumina forward and reverse reads respectively, and what to do on encountering unexpected adapters.
    Use bbduk from the BBMap suite. Search for that thread here. It is straightforward to use, and @Brian ships a file with all commercially used adapters in the package. Just point bbduk to that file and scan/trim your data.
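For readers who want to try the BBDuk suggestion above, here is a minimal sketch that only assembles the command line and prints it. The FASTQ file names are placeholders, the adapter file path assumes the `resources/adapters.fa` file shipped with BBMap, and the parameter values (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) are the ones commonly shown for adapter trimming in the BBDuk guide. Treat them as a starting point and check them against your installed version rather than as settings recommended in this thread.

```python
import shlex


def build_bbduk_command(r1, r2, out1, out2, adapters="resources/adapters.fa"):
    """Assemble a BBDuk command for 3' adapter trimming of paired-end reads.

    All file names are hypothetical placeholders; nothing is executed here.
    """
    return [
        "bbduk.sh",
        f"in1={r1}",
        f"in2={r2}",
        f"out1={out1}",
        f"out2={out2}",
        f"ref={adapters}",  # adapter FASTA bundled with BBMap (path is an assumption)
        "ktrim=r",          # trim the adapter match and everything to its right (3' side)
        "k=23",             # k-mer length used for matching
        "mink=11",          # allow shorter k-mers at the very end of a read
        "hdist=1",          # tolerate one mismatch per k-mer
        "tpe",              # trim both reads of a pair to the same length
        "tbo",              # also trim adapters detected by pair overlap
    ]


cmd = build_bbduk_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` with `bbduk.sh` on the `PATH`.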


    • Gazaldeep
      Junior Member
      • Nov 2016
      • 6

      #3
      Thanks for your reply!!

      Originally posted by GenoMax:
      There should be no contamination on 5'-end if you are using standard Illumina kits.
      So, the 72 reads with 5' adapter contamination should be deleted, right?

      Originally posted by GenoMax:
      Barcodes/Tag reads are never part of the actual read in Illumina sequencing. If you have tags in your sequence then there is something wrong. If you have some reads with no inserts they should be taken care of during trimming.
      The paired-end data I am trying to analyze was downloaded from DDBJ.

      After searching online and reading your answer, I'm sure that I should delete the reads that have any adapter at the 5' end (be it the 5' adapter or the 3' adapter), and trim the reads that have adapter at the 3' end or within the read.

      But I'm actually a bit confused about the Illumina sequencing steps.

      Are the barcodes removed after the reads are sorted into different files according to their barcodes? If so, the files we get in the end cannot contain the barcodes, but can they contain the constant part of the indexed adapter (which occurs before/after the barcode), or are the constant parts removed along with the barcodes?
      I want to be clear about the process.


      • Gazaldeep
        Junior Member
        • Nov 2016
        • 6

        #4
        I could just use a tool for trimming, but before that, I want to be clear about what's happening. Maybe I've got it all wrong?


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by Gazaldeep:
          Thanks for your reply!!

          But I'm actually a bit confused about the Illumina sequencing steps.
          Check this video out for clarification: https://www.youtube.com/watch?v=HMyCqWhwB8E

          Are the barcodes removed after the reads are sorted into different files according to their barcodes? If so, the files we get in the end cannot contain the barcodes, but can they contain the constant part of the indexed adapter (which occurs before/after the barcode), or are the constant parts removed along with the barcodes?
          I want to be clear about the process.
          Illumina sequencing actually proceeds in four separate reads for dual-index (2D) barcodes, or three for single-index (1D) barcodes.

          Code:
          R1 --> R2 (index 1) --> R3 (index 2) --> R4.
          Illumina software keeps track of every cluster across R1 through R4. During base calling (conversion of BCL to FASTQ), the index sequences are extracted from R2 (and R3) and transferred to the header of the FASTQ record to complete demultiplexing. You thus end up with only R1/R2 files, which hold the first and last reads of the sequence above.

          It is possible to generate files with index reads in individual files so you end up with 4 files per sample. This is only needed for some applications (e.g. QIIME).
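To see where the index sequence ends up after demultiplexing, here is a small sketch that pulls it out of a FASTQ header. It assumes the Casava 1.8+ header layout, where the part after the space reads `read:filtered:control:index` and dual indexes are joined with `+`. The example header is made up, and headers of data downloaded from public archives may have been rewritten and may not carry this field at all.

```python
def parse_fastq_header(header):
    """Return (read_number, index_sequences) from a Casava 1.8+ style header.

    Assumed layout: '@<read id> <read>:<filtered>:<control>:<index>'.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    try:
        _read_id, comment = header[1:].strip().split(" ", 1)
    except ValueError:
        raise ValueError("no comment field after the read ID") from None
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError("expected read:filtered:control:index")
    read_number = int(fields[0])
    # Dual-index runs report both barcodes, joined with '+'.
    indexes = tuple(fields[3].split("+"))
    return read_number, indexes


# Hypothetical header for the second read of a dual-indexed pair.
header = "@INSTR:1:FLOWCELL:1:1101:1000:2000 2:N:0:ATCACG+TTAGGC"
print(parse_fastq_header(header))  # (2, ('ATCACG', 'TTAGGC'))
```

The barcode appears only in the header, not in the sequence line, which matches the point above that index reads are never part of the actual read.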


          • Gazaldeep
            Junior Member
            • Nov 2016
            • 6

            #6
            Thanks!!! Really helpful!!

            In my reads, 75 have the 5' adapter at the 5' end. Also, in about 12,000 reads out of 7 million, the 5' adapter is present in the read. What do you suggest? Should I delete those reads, or should I just trim the adapter and the sequence preceding it at the 5' end? I'm using Cutadapt at present, but with any adapter removal tool I will have to specify whether I want to trim these reads and in what way.

            Sorry if my questions are naive!
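The rule discussed earlier in this thread (trim the adapter and everything after it, and drop reads that have no usable insert left) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses exact string matching, the adapter is assumed to be the common TruSeq prefix `AGATCGGAAGAGC`, and the `min_length` cutoff of 20 is an arbitrary placeholder. Real trimmers such as Cutadapt or BBDuk also handle mismatches and partial adapters at the 3' end, so this is not a replacement for them.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: common prefix of Illumina TruSeq adapters


def trim_read(seq, adapter=ADAPTER, min_length=20):
    """Trim the adapter and everything 3' of it; return None to discard the read.

    Exact matching only. min_length is a hypothetical cutoff.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq  # no adapter found, keep the read unchanged
    trimmed = seq[:pos]
    # An adapter at position 0 means there was no insert, so nothing is left.
    if len(trimmed) < min_length:
        return None
    return trimmed


insert = "ACGTACGTACGTACGTACGTACGT"
print(trim_read(insert + ADAPTER + "TTTT"))  # adapter inside the read: keep the insert
print(trim_read(ADAPTER + "ACGTACGT"))       # adapter at the 5' end: read is discarded
```

For paired-end data, a read discarded this way should be removed together with its mate so the two files stay in sync. In Cutadapt, the corresponding options are `-a`/`-A` for the 3' adapters of the two reads and `--minimum-length` for the cutoff.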
