  • Trimmomatic paired end- dropped reverse reads

    Hi,
    I am working on a re-sequencing project and have sequenced some whole genomes using Illumina HiSeq 2000 (150 bp paired-end reads), which I hope to later align to an existing reference genome. I would like to remove any possible adapter contamination with Trimmomatic, but have run into the problem that for 70-80% of my read pairs the reverse read is dropped and only the forward read survives. When I use the "keep both reads" parameter, both reads survive for about 97% of pairs. So my question is: does this mean that more than 70% of my reads have adapter read-through, or have I done something wrong in my adapter file?

    The adapters used were:

    P5 adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    P7 adapter: CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT


    The adapter file I created looks as follows:

    >PrefixPE/1
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >PrefixPE/2
    CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
    >P5
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >P7
    CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT


    An example of my script (using purposefully lenient quality control)...

    java -jar ~/bin/trimmomatic.jar PE -phred33 -trimlog ten_trimLog 10_R1.gz 10_R2.gz Ten_out_1P.fq.gz Ten_out_1U.fq.gz Ten_out_2P.fq.gz Ten_out_2U.fq.gz ILLUMINACLIP:P5_P7.fa:2:30:10 LEADING:2 TRAILING:2 MAXINFO:40:0.2 MINLEN:36

    ... and the resulting output:

    ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

    Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) Forward Only Surviving: 30355811 (73.73%) Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)
    TrimmomaticPE: Completed successfully


    I'm new to Trimmomatic, so I apologize in advance if this is something obvious!
    Thanks!
    Meli
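
A quick way to sanity-check a summary line like the one quoted above is to parse the counts and confirm that the four outcome categories add up to the input. This is only a minimal sketch: the `parse_summary` helper and its dictionary keys are made up for illustration, and the regular expression assumes the summary wording shown in this post.

```python
import re

# Summary line copied from the Trimmomatic output quoted in the post above.
SUMMARY = (
    "Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) "
    "Forward Only Surviving: 30355811 (73.73%) "
    "Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)"
)


def parse_summary(line):
    """Return the read-pair counts from a TrimmomaticPE summary line.

    Hypothetical helper: the labels below are taken from the output above.
    """
    labels = {
        "input": "Input Read Pairs",
        "both": "Both Surviving",
        "forward_only": "Forward Only Surviving",
        "reverse_only": "Reverse Only Surviving",
        "dropped": "Dropped",
    }
    counts = {}
    for key, label in labels.items():
        match = re.search(re.escape(label) + r": (\d+)", line)
        if match is None:
            raise ValueError("label not found: " + label)
        counts[key] = int(match.group(1))
    return counts


counts = parse_summary(SUMMARY)
# The four outcome categories should account for every input pair.
accounted = sum(
    counts[k] for k in ("both", "forward_only", "reverse_only", "dropped")
)
forward_only_pct = 100.0 * counts["forward_only"] / counts["input"]
print(accounted == counts["input"], round(forward_only_pct, 2))
```

The forward-only share is the number to watch here, since that is the category that grows when reverse reads are discarded.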

  • #2
    My guess is that the Ns in the adapter file could cause problems, since the program could match any base at those positions. But I am not sure about this.


    • #3
      @Meli: Why are you not using the adapter file provided with Trimmomatic?

      Did FastQC analysis show presence of adapter contamination in your data (indicative of shorter than expected inserts)?


      • #4
        I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).
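
The truncation suggested in this reply can be derived from the full P7 sequence rather than typed by hand. A small sketch, assuming the only Ns in the adapter are the barcode placeholder; `after_last_n_run` is a hypothetical helper name, not part of any tool.

```python
import re

# Full P7 adapter as given in the first post, with the barcode written as Ns.
P7_FULL = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def after_last_n_run(adapter):
    """Return the part of the adapter that follows its last run of N bases."""
    parts = re.split(r"N+", adapter)
    return parts[-1]


p7_trimmed = after_last_n_run(P7_FULL)
print(p7_trimmed)
```

The result should be the same 34-base sequence quoted in the reply, with no N left in it.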


        • #5
          Thanks all! Sorry I'm not seeing your replies until now, because I thought I'd get notifications and just assumed no one had answered. I also had my suspicions that it might be the N's, but wasn't sure if the "keep both reads" parameter forces the program to keep both reads regardless of the reason they've been cut (adapter contamination, low quality etc) or only if they are being thrown out because they are redundant to the forward read because of adapter read-through. I will try using the shortened adapter sequence as suggested... thanks!


          • #6
            keep both reads refers to adapter read through. In the earlier versions of trimmomatic the 'keep both reads' option didn't exist, and the default behaviour was to drop R2 entirely when adapters were trimmed because of read through, the reasoning being that in those cases R2 did not provide any additional information since the insert for that read pair was shorter than the length of one read. hope this helps.
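
The reasoning in this reply (R2 adds nothing when the insert is shorter than one read) can be illustrated with a toy model. This is a sketch under simplifying assumptions: no sequencing errors, a made-up 20 bp insert, and short placeholder adapter strings. It is not how Trimmomatic detects read-through internally.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def simulate_pair(insert, adapter_r1, adapter_r2, read_len):
    """Toy model: each read covers the insert, then runs into an adapter."""
    r1 = (insert + adapter_r1)[:read_len]
    r2 = (revcomp(insert) + adapter_r2)[:read_len]
    return r1, r2


# Hypothetical values chosen for illustration only.
INSERT = "ACGGTCATTGCAAGCTTAGC"   # 20 bp fragment, shorter than one read
ADAPTER_R1 = "AGATCGGAAGAGCACAC"  # placeholder for what R1 runs into
ADAPTER_R2 = "AGATCGGAAGAGCGTCG"  # placeholder for what R2 runs into
READ_LEN = 30

r1, r2 = simulate_pair(INSERT, ADAPTER_R1, ADAPTER_R2, READ_LEN)

# Once the adapter part is clipped, each read is left with just the insert.
r1_clipped = r1[:len(INSERT)]
r2_clipped = r2[:len(INSERT)]
print(r2_clipped == revcomp(r1_clipped))
```

In this model the clipped R2 is exactly the reverse complement of the clipped R1, which is the sense in which the reverse read is redundant after read-through.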


            • #7
              Originally posted by mastal
              keep both reads refers to adapter read through. In the earlier versions of trimmomatic the 'keep both reads' option didn't exist, and the default behaviour was to drop R2 entirely when adapters were trimmed because of read through, the reasoning being that in those cases R2 did not provide any additional information since the insert for that read pair was shorter than the length of one read. hope this helps.
              OK, that is also what I understood from the manual... but in that case, if 70% of my reverse reads are rescued when using this option, that must mean that the reason they were dropped in the first place was because of read through and NOT because of the N's in my adapter sequence if I understand correctly?


              • #8
                Originally posted by wdecoster
                I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).

                Using the truncated adapter sequence just led to even more reads being dropped:

                TrimmomaticPE: Started with arguments: -phred33 -trimlog five_trunc_Log R1_zcat.gz R2_zcat.gz 5_trunc_1P.fq.gz 5_trunc_1U.fq.gz 5_trunc_2P.fq.gz 5_trunc_2U.fq.gz ILLUMINACLIP:P5_P7_trunc.fa:2:30:10 LEADING:2 TRAILING:2 MAXINFO:40:0.2 MINLEN:36
                Multiple cores found: Using 16 threads
                Using PrefixPair: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
                Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
                Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
                ILLUMINACLIP: Using 1 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
                Input Read Pairs: 130953595 Both Surviving: 21891860 (16.72%) Forward Only Surviving: 105313891 (80.42%) Reverse Only Surviving: 5086 (0.00%) Dropped: 3742758 (2.86%)
                TrimmomaticPE: Completed successfully


                • #9
                  Do one run of trimmomatic using only the Illuminaclip trimming, and then you will know how many of the reads are being dropped because of adapters and not because of other quality issues.
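
One way to set up the clip-only run suggested here is to reuse the steps from the original command and keep just the ILLUMINACLIP one. The sketch below only builds the argument list and does not execute anything. The `trimmomatic_pe_args` helper, the `trimmomatic.jar` path and the `Ten_cliponly` output prefix are assumptions for illustration; the input file names and step strings are copied from the first post.

```python
def trimmomatic_pe_args(r1, r2, out_prefix, steps, jar="trimmomatic.jar"):
    """Build the argument list for a TrimmomaticPE run (nothing is executed)."""
    suffixes = ("_1P.fq.gz", "_1U.fq.gz", "_2P.fq.gz", "_2U.fq.gz")
    outputs = [out_prefix + suffix for suffix in suffixes]
    return ["java", "-jar", jar, "PE", "-phred33", r1, r2] + outputs + list(steps)


# Trimming steps from the command in the first post.
FULL_STEPS = [
    "ILLUMINACLIP:P5_P7.fa:2:30:10",
    "LEADING:2",
    "TRAILING:2",
    "MAXINFO:40:0.2",
    "MINLEN:36",
]

# Keep only the adapter-clipping step for the diagnostic run.
clip_only = [step for step in FULL_STEPS if step.startswith("ILLUMINACLIP:")]
args = trimmomatic_pe_args("10_R1.gz", "10_R2.gz", "Ten_cliponly", clip_only)
print(" ".join(args))
```

Comparing the summary line of this run with the full run shows how much of the forward-only fraction comes from adapter clipping alone.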


                  • #10
                    Originally posted by mastal
                    Do one run of trimmomatic using only the Illuminaclip trimming, and then you will know how many of the reads are being dropped because of adapters and not because of other quality issues.
                    I still got 83% forward-only surviving when I ran just the ILLUMINACLIP step.


                    • #11
                      OK, so it looks like a large percentage of your reads have adapter read-through.


                      • #12
                        Originally posted by wdecoster
                        I would just limit your P7 to GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. If Trimmomatic encounters this sequence it will clip the read, so no need to specify the rest, and the barcode with Ns (which might complicate things as suggested earlier).
                        Using the OTHER half of the adapter saved the reverse reads, but I'm not sure if it is at all correct to do so...


                        • #13
                          Not sure if this thread can be of use to anyone else, but just in case...

                          I finally figured out what was wrong with my adapter file by running the "identify adapters" tool in AdapterRemoval:

                          AdapterRemoval v2 - rapid adapter trimming, identification, and read merging (MikkelSchubert/adapterremoval)


                          Found that the adapters in my sequence were the reverse complement of what I'd been provided and also that they were on the opposite read (fwd<-->rev). Now trimmomatic seems to be running smoothly and isn't throwing away the reverse reads.

                          Thanks for all your help!
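
For anyone wanting to reproduce the reverse-complement step described in this post, here is a minimal sketch using the P5 sequence and the truncated P7 sequence from earlier in the thread. The `>P5_rc` and `>P7_rc` header names are placeholders, and the sketch does not decide which sequence belongs to which read; that assignment should be checked against the adapter-identification output and the adapter files shipped with Trimmomatic.

```python
# Complement table; N is mapped to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences copied from the thread (P7 is the truncated form without Ns).
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_TRUNC = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

p5_rc = revcomp(P5)
p7_rc = revcomp(P7_TRUNC)

# Placeholder FASTA headers, for illustration only.
print(">P5_rc")
print(p5_rc)
print(">P7_rc")
print(p7_rc)
```

Both reverse complements begin with AGATCGGAAGAGC, the shared stretch that shows up at the 3' end of reads when the insert is shorter than the read length.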


                          • #14
                            Yes, the FASTA adapter sequences in Trimmomatic are designed to work that way for paired-end mode.
