  • Kmers remain even after trimmomatic trimming

    Hi Guys,

    I have Illumina NGS DNA 150 bp paired-end reads.
    My initial FastQC report indicated the presence of enriched k-mers towards the end of the reads, at positions 145-147.
    I used Trimmomatic to trim them off. I trimmed 5 bp from the start of the read and 5 bp from the end, which makes my read length 140 bp (and should remove the k-mers). However, when I looked at the FastQC report after filtering, it showed that the k-mers still exist but are now at positions 135-136. I have attached the pre- and post-filtering FastQC reports in case they help to visualize this.


    My trimmomatic trimming command was as follows:
    java -Xmx15g -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 4 -phred33 -trimlog trimmlog_log.txt input_R1.fastq input_R2.fastq output_R1.fq unpaired_output1.fq output_R2.fq unpaired_output2.fq HEADCROP:5 CROP:140 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:60

    I would really appreciate it if anyone could guide me through this issue, as I couldn't figure it out.
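
A quick way to sanity-check the trimming described in this post is to follow where the reported positions land. Trimmomatic applies its steps in the order given, so HEADCROP:5 shifts every position down by 5 and CROP:140 then keeps the first 140 bases. The helper below is a hypothetical illustration (it is not part of Trimmomatic) and assumes 1-based positions, as FastQC reports them.

```python
def position_after_trim(pos, headcrop, crop):
    """Map a 1-based position in the original read to its position after
    HEADCROP:<headcrop> followed by CROP:<crop>; return None if the base is removed."""
    new_pos = pos - headcrop          # HEADCROP shifts everything towards the start
    if 1 <= new_pos <= crop:          # CROP keeps only the first <crop> bases
        return new_pos
    return None


# Positions where the k-mers were first reported (taken from the post above).
for pos in (145, 146, 147):
    print(pos, "->", position_after_trim(pos, headcrop=5, crop=140))
```

Under these assumptions only original position 145 survives (as position 140), so a signal reported at 135-136 after trimming hints that the k-mers are not confined to a fixed window at the read end.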

  • #2
    You should do some sequence-specific adapter trimming using the ILLUMINACLIP command.



    • #3
      Dear Mastal,

      Thank you for your prompt reply.
      I honestly thought that the ILLUMINACLIP command was only for overrepresented sequences, e.g. Illumina adapters. I will try using it and see what I get.
      Thank you


      • #4
        It is only for the Illumina adapters.

        The adapters are not in the same position in each read, and they are not present in every read, only when the DNA insert is shorter than the length of one read, so that you read through into some of the adapter sequence.
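
The point about read-through can be sketched with a toy simulation. This is only an illustration under stated assumptions: the insert and filler bases are made up, `ADAPTER` is the 13-base start shared by the TruSeq adapters, and the function names are hypothetical. It shows that the adapter begins wherever the insert ends, so a fixed HEADCROP/CROP only shifts it rather than removing it.

```python
ADAPTER = "AGATCGGAAGAGC"  # shared start of the Illumina TruSeq adapters
READ_LEN = 150


def simulate_read(insert_len):
    """Build one toy read: insert bases first, then adapter read-through
    if the insert is shorter than the read. Insert and filler are made up."""
    insert = ("ACGT" * (insert_len // 4 + 1))[:insert_len]
    template = insert + ADAPTER + "A" * READ_LEN  # filler after the adapter is an assumption
    return template[:READ_LEN]


def headcrop_then_crop(read, headcrop, crop):
    """Mimic fixed-position trimming: HEADCROP:<headcrop> followed by CROP:<crop>."""
    return read[headcrop:][:crop]


def adapter_start(read):
    """0-based index where the adapter begins, or -1 if it is absent."""
    return read.find(ADAPTER)


# Columns: insert length, adapter start before trimming, adapter start after trimming.
for insert_len in (100, 120, 132, 200):
    read = simulate_read(insert_len)
    trimmed = headcrop_then_crop(read, 5, 140)
    print(insert_len, adapter_start(read), adapter_start(trimmed))
```

In this toy data the adapter is still present after cropping for every insert shorter than the read, just 5 bases earlier, while the 200-base insert never contains it. That is why sequence-specific clipping is the better fit here.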



        • #5
          The pre-filter FastQC k-mer plot shows that these k-mers are still about 50-fold enriched even at base 110 of your reads.
          As mentioned, it is better to try normal adapter trimming. Depending on the purpose of the experiment, the FastQC reports also indicate that the reads could use some quality trimming or filtering.



          • #6
            I have used the ILLUMINACLIP command to remove the kmers. My command was as follows:
            java -Xmx15g -classpath trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 4 -phred33 -trimlog trimmlog_log.txt input_R1.fastq input_R2.fastq output_R1.fq unpaired_output1.fq output_R2.fq unpaired_output2.fq ILLUMINACLIP:adapters3.fasta:2:30:10 HEADCROP:5 CROP:140 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:60

            I also attached the adapters3.fasta file in which I specified the k-mers.
            The program ran for a few minutes at first, but its progress has been stalled for six hours now even though it's running on 4 cores.

            Do you think that my command and adapter file format are correct?
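
For readers checking a command like the one in this post: per the Trimmomatic manual, the three numbers after the FASTA file name in the ILLUMINACLIP step are, in order, the seed mismatches, the palindrome clip threshold, and the simple clip threshold. The small parser below is a hypothetical helper (not part of Trimmomatic) that just names those fields; it assumes the file path itself contains no colon.

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split an ILLUMINACLIP step string into named fields.
    Extra trailing fields (accepted by later Trimmomatic versions) are ignored."""
    fields = step.split(":")
    if len(fields) < 5 or fields[0] != "ILLUMINACLIP":
        raise ValueError(f"not a complete ILLUMINACLIP step: {step!r}")
    _, fasta, seed, palindrome, simple = fields[:5]
    return IlluminaClip(fasta, int(seed), int(palindrome), int(simple))


# The step used in the post above.
print(parse_illuminaclip("ILLUMINACLIP:adapters3.fasta:2:30:10"))
```

Reading the step this way makes it easier to see that the settings themselves are ordinary; what matters most is which sequences the FASTA file contains.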



            • #7
              Your trimmomatic command looks OK, but it looks like you are using Illumina barcodes instead of adapter sequences.

              Just use the adapters.fasta file that comes with trimmomatic.

              Do you know what type of Illumina kit was used for library prep?



              • #8
                Dear Mastal,

                After following your advice, I am glad to say that it worked! I have attached the FastQC output.
                I basically used the file TruSeq3-PE-2.fa as the adapter sequences and it worked wonderfully. I have asked the people who did the library prep about the kit but haven't gotten a reply yet. If they reply, I will let you know. I am just glad it works now.

                Again, thank you so much for following up with me.


                • #9
                  Dear Luc,

                  Thank you for your advice. I had actually trimmed up to base 110 previously, but the k-mers persisted in the FastQC report. When I used the adapter set supplied with Trimmomatic, TruSeq3-PE-2.fa, it removed them completely. I attached the result of that run in my previous reply to mastal.

                  Again thank you so much


                  • #10
                    It didn't work for me

                    I tried to trim with the TruSeq3 FASTA file, and even after that I still got enriched k-mers in the first 9 bp. I tried to crop those bases, but afterwards I got overrepresented reads (which I didn't have before).

                    Any help is highly appreciated.



                    • #11
                      Perhaps you could post your FastQC results? The first 9 bp are unlikely to be related to adapters, but some library types (like Nextera) will have highly biased initial bases.
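
One way to look at biased initial bases directly is to tabulate the base composition at each of the first few read positions. The sketch below is a minimal illustration with made-up toy reads; the function name is hypothetical, and in practice the reads would come from your own FASTQ file.

```python
from collections import Counter


def base_composition(reads, n_positions):
    """Return, for each of the first n_positions, the fraction of A, C, G and T.
    Positions that no read reaches yield an empty dict."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    return [
        {b: c[b] / sum(c.values()) for b in "ACGT"} if c else {}
        for c in counts
    ]


# Toy reads (invented for illustration): position 1 is always G, later positions vary.
toy_reads = ["GACT", "GCCA", "GTGA", "GAAT"]
for pos, fractions in enumerate(base_composition(toy_reads, 4), start=1):
    print(pos, fractions)
```

A position where one base dominates across many reads is the kind of bias described here, and it is not by itself evidence of adapter contamination.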



                      • #12
                        It didn't work for me

                        Sure! Sorry about that.


                        • #13
                          You need to ask what kind of library this was. The spikiness at the beginning is similar to what Nextera libraries show (in which case you should not trim the first 9 bp), but I'm not quite sure.

                          If it is a Nextera library, then it makes sense that trimming TruSeq adapters will not change anything. I don't understand FastQC's relative enrichment graph, though, so I won't comment on that.

                          The Nextera adapter sequences are included with BBTools in the resources directory (nextera.fa.gz). But I recommend that you consult the source of the library to find out what adapters were used before simply trying that to see if it works.



                          • #14
                            This is RNA-seq data from wheat. Could it be an adapter that I don't have?



                            • #15
                              Yes. The organism and data source are generally independent of the adapter type, although there is an RNA-specific set of TruSeq adapters, which is also included with BBTools as truseq_rna.fa.gz.
                              Last edited by Brian Bushnell; 03-13-2015, 08:50 AM.
