  • removing adapter sequences

    Hello,

    I am working with Illumina HiSeq data (100 bp PE) and am trying to remove adapter sequences with Trimmomatic, using the adapter sequences I got from my sequencing core. When I check the trimmed reads with FastQC, some of the adapter sequences are still there. Any suggestions would be great. Thanks.

  • #2
    Can you give us an example of what you ran and what you're getting as an output? We can't really help you unless we get some background....



    • #3
      I am having essentially the same problem as the one originally posted above. I want to remove adapter sequences from Illumina 100 bp PE reads. I run the following with Trimmomatic:

      java -classpath ~/Scripts/Trimmomatic-0.30/trimmomatic-0.30.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 -trimlog trim_2.log R1.fastq R2.fastq T1.fastq T1.unpaired.fastq T2.fastq T2.unpaired.fastq ILLUMINACLIP:adapters.fa:3:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:15

      My adapters.fa file includes the following (in addition to others):
      >DNA_primers_1
      AGGGAGGACGATGCGG
      >DNA_primers_2
      CCGCTGGAAGTGACTGACAC
      >RNA_linkers_2
      GTGTCAGTCACTTCCAGCGG

      I ran FastQC prior to trimming the adapters and the over-represented sequences include
      Sequence Count Percentage
      CCGCTGGAAGTGAC... 115468 0.44
      AGGGAGGACGATGC... 112267 0.427
      CCGCTGGAAGTGAC... 109341 0.416
      AGGGAGGACGATGC... 105312 0.401
      CCGCTGGAAGTGAC... etc etc
      CCGCTGGAAGTGAC...

      After running Trimmomatic, I ran FastQC on the new FASTQ files and got the following over-represented sequences:
      Sequence Count Percentage
      CCGCTGGAAGTGAC... 109151 0.426
      AGGGAGGACGATGC... 106128 0.414
      CCGCTGGAAGTGAC... 102985 0.402
      AGGGAGGACGATGC... 99644 0.389
      CCGCTGGAAGTGAC... etc etc
      CCGCTGGAAGTGAC...

      Isn't this surprising? I think a lot of the remaining adapter sequences are adapters linked to each other, which would explain why they are so well represented in the first 10 bp.

      - Andrew



      • #4
        Hi Andrew (ahnguyen),

        I think the adapter sequences you are using (for example the 16 bp DNA_primers_1 and the 20 bp DNA_primers_2) are not long enough for Trimmomatic to recognise a match, given the thresholds you are using (3:40:15, so 40 for palindrome clipping and 15 for simple clipping).

        If you look at the Trimmomatic web page, the last paragraph of the section titled 'The Adapter Fasta' explains that 'Each matching base adds just over 0.6' to the score. So even if a read matches your longest (20 bp) adapter sequence perfectly, it would score only about 20 × 0.6 = 12, and a perfect match to the 16 bp adapter would score less still.
        You have set the threshold for simple clipping to 15, a score which none of your reads can reach, so Trimmomatic will not recognise any of the reads as containing the adapter sequences you want to trim.
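
The arithmetic above can be checked with a rough sketch like the one below. It assumes the approximate figure of 0.6 per matching base quoted from the Trimmomatic page and ignores mismatch penalties, so it gives only an upper bound on the simple-clip score. The helper names (`parse_illuminaclip`, `max_simple_score`, `can_reach`) are made up for this example and are not part of Trimmomatic.

```python
# Rough upper bound on Trimmomatic's simple-clip score for each adapter.
# ASSUMPTION: ~0.6 per matching base (the approximate figure quoted above);
# mismatch penalties are ignored, so real scores can only be lower.
SCORE_PER_MATCH = 0.6

# Adapter sequences copied from post #3.
ADAPTERS = {
    "DNA_primers_1": "AGGGAGGACGATGCGG",
    "DNA_primers_2": "CCGCTGGAAGTGACTGACAC",
    "RNA_linkers_2": "GTGTCAGTCACTTCCAGCGG",
}


def parse_illuminaclip(step):
    """Split 'ILLUMINACLIP:file:seed:palindrome:simple' into its fields.

    Hypothetical helper; assumes the FASTA path contains no ':'.
    """
    _, fasta, seed_mismatches, palindrome_thr, simple_thr = step.split(":")
    return fasta, int(seed_mismatches), int(palindrome_thr), int(simple_thr)


def max_simple_score(seq, per_match=SCORE_PER_MATCH):
    """Best possible score: every base of the adapter matches the read."""
    return len(seq) * per_match


def can_reach(seq, threshold):
    """True if a perfect full-length match could meet the threshold."""
    return max_simple_score(seq) >= threshold


if __name__ == "__main__":
    _, _, _, simple_thr = parse_illuminaclip("ILLUMINACLIP:adapters.fa:3:40:15")
    for name, seq in ADAPTERS.items():
        verdict = "reachable" if can_reach(seq, simple_thr) else "NOT reachable"
        print(f"{name}: {len(seq)} bp, max score ~{max_simple_score(seq):.1f}, "
              f"threshold {simple_thr} is {verdict}")
```

With a simple-clip threshold of 15, none of the three adapters can reach the threshold under this approximation. A lower value (7 is used in the test purely as an illustration) would be within reach of all three.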



        • #5
          How do you create the adapters.fa file?
          I have the same problem.



          • #6
            The more recent versions of Trimmomatic include adapter FASTA files for Illumina TruSeq v2 and v3.

            See the link to the Trimmomatic web page that I gave in the post above if you don't already have Trimmomatic installed on your computer.

            Have a look at that, and then if you want to use other adapter sequences you can either add them to the file, or make your own file.
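
If you decide to make your own file, it is just a plain FASTA file: a `>name` header line followed by the sequence, as in the excerpt in post #3. Here is a minimal sketch of writing one from Python. The function name `write_adapter_fasta` and the output filename are made up for illustration, and the sequences are the ones from post #3, so substitute whatever your sequencing core gave you.

```python
from pathlib import Path


def write_adapter_fasta(adapters, path):
    """Write a {name: sequence} mapping as a simple FASTA file.

    Hypothetical helper for illustration; not part of Trimmomatic.
    """
    lines = []
    for name, seq in adapters.items():
        seq = seq.strip().upper()
        # Reject empty or non-nucleotide entries early.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"bad adapter sequence for {name!r}: {seq!r}")
        lines.append(f">{name}")
        lines.append(seq)
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


# Example adapters, copied from post #3.
EXAMPLE_ADAPTERS = {
    "DNA_primers_1": "AGGGAGGACGATGCGG",
    "DNA_primers_2": "CCGCTGGAAGTGACTGACAC",
    "RNA_linkers_2": "GTGTCAGTCACTTCCAGCGG",
}
```

Calling `write_adapter_fasta(EXAMPLE_ADAPTERS, "my_adapters.fa")` would produce a file you can then name in the ILLUMINACLIP step in place of adapters.fa.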
