
  • removing adapter sequences

    Hello,

    I am working with Illumina HiSeq data (100 bp PE) and am trying to remove adapter sequences using Trimmomatic, with adapter sequences supplied by the sequencing core I used. However, when I check the trimmed reads with FastQC, some adapter sequences still remain. Any suggestions would be great. Thanks.

  • #2
    Can you give us an example of what you ran and what you're getting as output? We can't really help you without some background...



    • #3
      I am having essentially the same problem as originally posted above. I want to remove adapter sequences from Illumina 100 bp PE reads. I ran the following Trimmomatic command:

      java -classpath ~/Scripts/Trimmomatic-0.30/trimmomatic-0.30.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 -trimlog trim_2.log R1.fastq R2.fastq T1.fastq T1.unpaired.fastq T2.fastq T2.unpaired.fastq ILLUMINACLIP:adapters.fa:3:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:15

      My adapters.fa file includes the following (in addition to others):
      >DNA_primers_1
      AGGGAGGACGATGCGG
      >DNA_primers_2
      CCGCTGGAAGTGACTGACAC
      >RNA_linkers_2
      GTGTCAGTCACTTCCAGCGG

      I ran FastQC before trimming the adapters, and the overrepresented sequences include:
      Sequence            Count    Percentage
      CCGCTGGAAGTGAC...   115468   0.44
      AGGGAGGACGATGC...   112267   0.427
      CCGCTGGAAGTGAC...   109341   0.416
      AGGGAGGACGATGC...   105312   0.401
      CCGCTGGAAGTGAC...   etc.
      CCGCTGGAAGTGAC...

      After running Trimmomatic, I ran FastQC on the new FASTQ files and got the following overrepresented sequences:
      Sequence            Count    Percentage
      CCGCTGGAAGTGAC...   109151   0.426
      AGGGAGGACGATGC...   106128   0.414
      CCGCTGGAAGTGAC...   102985   0.402
      AGGGAGGACGATGC...   99644    0.389
      CCGCTGGAAGTGAC...   etc.
      CCGCTGGAAGTGAC...
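      A quick worked comparison of the counts in the two tables, as a sketch only: it assumes that row i of the "before" table and row i of the "after" table are the same sequence, which the truncated sequences shown here cannot confirm.

```python
# Counts copied from the FastQC "Overrepresented sequences" tables above.
# Assumption: row i before trimming and row i after trimming refer to the same
# (truncated) sequence; the full sequences are not shown, so this is unverified.
before = [115468, 112267, 109341, 105312]
after = [109151, 106128, 102985, 99644]


def percent_drop(old: int, new: int) -> float:
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    for old, new in zip(before, after):
        print(f"{old} -> {new}: {percent_drop(old, new):.1f}% fewer reads")
```

      Under that assumption, each of the four sequences drops by only about 5 to 6 percent, and each still makes up roughly 0.4% of reads after trimming, which fits the observation that the adapters were largely not removed.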

      Is this not surprising? I think many of the remaining adapter sequences are adapter dimers (adapters ligated to each other), so they are well represented in the first 10 bp.
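      One way to test the idea that the leftover adapters sit at the very start of the reads is to count reads whose first bases exactly match the start of one of the adapters. The sketch below is not part of Trimmomatic or FastQC; the function names, the script usage, and k = 10 (taken from the "first 10 bp" remark) are assumptions for illustration, and it expects an uncompressed FASTQ with exactly four lines per record.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator

# Adapter sequences quoted from adapters.fa earlier in this post.
ADAPTERS = {
    "DNA_primers_1": "AGGGAGGACGATGCGG",
    "DNA_primers_2": "CCGCTGGAAGTGACTGACAC",
    "RNA_linkers_2": "GTGTCAGTCACTTCCAGCGG",
}


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line of each 4-line FASTQ record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_adapter_starts(lines: Iterable[str], k: int = 10) -> Counter:
    """Count reads whose first k bases exactly equal an adapter's first k bases."""
    # Map prefix -> adapter name; the three 10 bp prefixes here are distinct.
    prefixes = {seq[:k]: name for name, seq in ADAPTERS.items()}
    counts: Counter = Counter()
    for seq in fastq_sequences(lines):
        name = prefixes.get(seq[:k])
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_adapter_starts.py T1.fastq
    with open(sys.argv[1]) as handle:
        for name, n in count_adapter_starts(handle).most_common():
            print(name, n)
```

      Exact prefix matching is deliberately crude: it ignores sequencing errors and adapters that start later in the read, so it gives a lower bound rather than a full adapter census.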

      - Andrew



      • #4
        Hi Andrew (ahnguyen),

        I think the adapter sequences you are using (for example the 16 bp DNA_primers_1 and the 20 bp DNA_primers_2) are not long enough for Trimmomatic to recognise a match, given the thresholds you are using
        (3:40:15, i.e. 40 for palindrome clipping and 15 for simple clipping).

        The Trimmomatic web page explains, in the last paragraph of the section titled 'The Adapter Fasta', that 'Each matching base adds just over 0.6' to the score. So even if a read matches your 20 bp adapter perfectly, it would score only about 20 × 0.6 = 12 (and about 16 × 0.6 = 9.6 for the 16 bp adapter).
        You have set the threshold for simple clipping to 15, a score that no read can reach against adapters this short, so Trimmomatic will not recognise any of the reads as containing the adapter sequences you want to trim.
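        The same arithmetic can be run for all three adapters quoted in post #3. This is a minimal sketch, assuming each perfectly matching base contributes log10(4) ≈ 0.602 (consistent with the "just over 0.6" quoted above) and ignoring mismatch penalties; it is not Trimmomatic's own code.

```python
import math

# Assumption: each perfectly matching base adds log10(4) ~= 0.602 to the score,
# matching the "just over 0.6" figure from the Trimmomatic web page.
PER_BASE_SCORE = math.log10(4)

# Adapter sequences quoted from adapters.fa in post #3.
ADAPTERS = {
    "DNA_primers_1": "AGGGAGGACGATGCGG",
    "DNA_primers_2": "CCGCTGGAAGTGACTGACAC",
    "RNA_linkers_2": "GTGTCAGTCACTTCCAGCGG",
}

SIMPLE_CLIP_THRESHOLD = 15  # third field of ILLUMINACLIP:adapters.fa:3:40:15


def max_simple_score(adapter: str) -> float:
    """Best possible score: every base of the adapter matches the read."""
    return len(adapter) * PER_BASE_SCORE


def min_matching_bases(threshold: float) -> int:
    """Fewest perfectly matching bases whose combined score reaches threshold."""
    return math.ceil(threshold / PER_BASE_SCORE)


if __name__ == "__main__":
    for name, seq in ADAPTERS.items():
        score = max_simple_score(seq)
        verdict = "can" if score >= SIMPLE_CLIP_THRESHOLD else "cannot"
        print(f"{name}: {len(seq)} bp, max score {score:.2f}, "
              f"{verdict} reach {SIMPLE_CLIP_THRESHOLD}")
    print("matching bases needed:", min_matching_bases(SIMPLE_CLIP_THRESHOLD))
```

        Under these assumptions an adapter match needs at least 25 perfectly matching bases to reach a simple clip threshold of 15, which none of these 16 to 20 bp sequences can provide.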



        • #5
          How do you create the adapters.fa file?
          I have the same problem.



          • #6
            More recent versions of Trimmomatic include adapter FASTA files for Illumina TruSeq v2 and v3.

            See the link to the Trimmomatic web page that I gave in the post above if you don't already have Trimmomatic installed on your computer.

            Have a look at those, and if you want to use other adapter sequences, you can either add them to one of those files or make your own file.
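            If you do make your own file, it is plain FASTA text: a ">" name line followed by the sequence. A minimal Python sketch, assuming you have the adapters as name/sequence pairs; the three sequences below are the ones quoted in post #3 and only stand in for your own, and the function name is hypothetical.

```python
from typing import Dict

# Hypothetical example records: replace with the adapter sequences supplied
# by your sequencing core.
ADAPTERS = {
    "DNA_primers_1": "AGGGAGGACGATGCGG",
    "DNA_primers_2": "CCGCTGGAAGTGACTGACAC",
    "RNA_linkers_2": "GTGTCAGTCACTTCCAGCGG",
}


def to_fasta(records: Dict[str, str]) -> str:
    """Format name -> sequence pairs as FASTA text, one sequence line each."""
    lines = []
    for name, seq in records.items():
        seq = seq.upper()
        # Reject anything that is not a plain nucleotide code.
        if set(seq) - set("ACGTN"):
            raise ValueError(f"{name}: unexpected characters in {seq!r}")
        lines.append(f">{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout so the output can be redirected into a file.
    print(to_fasta(ADAPTERS), end="")
```

            Run it as `python make_adapters.py > adapters.fa` (script name hypothetical) and give the resulting file as the first ILLUMINACLIP field, as in the command in post #3.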
