Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi GeoMax,

    Thank you so much for all the help.

    Yes, I saw that and for that reason I posted the percent surviving report.
    Yes, I ran FastQC before processing, but after processing FastQC was giving me lots of troubles. Anyway,
    Before Timming, total sequence was 3032230, and it didn’t pass per base sequence quality and the adapter content. From the graph, the adapter content (which was Nextera Transposase sequence) which was started at 40bp position and went little over 60% till 229 bp position. I am not sure how to interpret this without sending the graph. But I am not sure how to paste this graph. I tried printScrn, it is still not pasting.
    After trimming, total sequence is 1084634, but this sample passed both per base sequence quality and adapter content.

    How to check the adapter contamination? Is there a way?

    I am not clear about your last question: "How long were the reads to begin with (you have asked for minlength 36)?"
    Yes, I asked for minimum length 36, so it would drop the read if it is below 36. And my code at the end was:
    ILLUMINACLIP:/home/mydir/Trimmomatic-0.33/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    Output was :
    Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
    Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
    ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
    Input Read Pairs: 3032230 Both Surviving: 1084634 (35.77%) Forward Only Surviving: 1933475 (63.76%) Reverse Only Surviving: 606 (0.02%) Dropped: 13515 (0.45%)
    TrimmomaticPE: Completed successfully.

    Comment


    • Originally posted by Brian Bushnell View Post
      Hi all,

      I'm testing Trimmomatic's performance on adapter removal, and the results are mysteriously bad. So I'd like to make sure I'm not doing anything wrong. This is my command line (modified from the website):

      java -Xmx8g -jar trimmomatic-0.32.jar SE -phred33 dirty.fq tclean.fq ILLUMINACLIP:gruseq.fa:2:30:10

      ...where dirty.fq is a file containing reads with adapter sequences and gruseq.fa is a file containing the adapter sequences. The adapters are inserted synthetically and the reads are tagged, so I know precisely what the correct results should be, and what I'm getting is not really close. Any suggestions?
      Hello Brian,

      did you figure out what was the issue? I'm perplexed - I wanted to use Trimmomatic and BBduk and compare the results, however, Trimmomatic just would not remove the adapter that is absolutely certainly there and could be found using simply using grep. And they are precisely the case for the palindrome mode.

      So I have my test reads that are just two reads - one R1, one R2.

      Code:
      @E00513:47:HF757ALXX:1:1101:17208:2047 1:N:0:ATCACG
      CNAAAAAAAAGATTGCGACCTCGATGTTGGATTAAAATGAACTTTTGGCGCAAAAGTTAAAAGGGTTAGGTCTGTTCGACCTTTAAAATTTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCACGTATGCCGTCTTC
      +
      <#AAFAJJJJJFJFJFJJJJF<AFJJJJJJFFFJJJJJ<JF-FJJJJJFJA<-FAF-FFJAJFJFAFJJJAAF<<7<JJJ<JJJAJJJJJJJJJFJFJFFFJJFA<FJ<FJ7F<7--7F<JAFAAAFFFJJJ<-FA7FAF<J77AFJJ<-
      @E00513:47:HF757ALXX:1:1101:17208:2047 2:N:0:ATCACG
      AAAATTTTAAAGGTCGAACAGACCTAACCCTTTTAACTTTTGCGCCCAAAGTTCATTTTAATCCAACATCGAGGTCGCAATCTTTTTTTTCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGACGTATCAAT
      +
      AAF<FFAFAJJJJAJFJJFJJFJJJAJJJJJFJFFFJJJJJFAF<<-A<-<FFFJJJJFJJ<FFFFJ7<F-77A--7<<FJJ7-<AFF<F7AFJ-7--7--<-77-77-A)7-FJ<7A-7FA<-----77--A))7)7)<)))7<-----
      Then I take the following adapter file:
      Code:
      >PrefixPE/1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
      >PrefixPE/2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
      Both of these could be found in above reads - first one in R1, second one in R2. Both start at position 93.

      then I run

      Code:
      java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
      And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

      Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

      I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?

      Comment


      • Originally posted by tonybolger View Post
        Enjoy and happy trimming.
        Dear Tony,

        few small suggestions that can make the program a lot easier to use -

        1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
        2) make a simple wrapper around the jar, sort of like FastQC has. This would also allow to "install" the program and thus have the adapter fasta at a fixed location for the program.

        Comment


        • I've also found a strange issue with the "keepBothEnds" option.

          In the example I have described two posts above (but with bigger files, 25K reads each), the following command
          Code:
          java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10
          gives the following output:

          Code:
          Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
          ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
          Quality encoding detected as phred33
          Input Read Pairs: 25000 Both Surviving: 18039 (72.16%) Forward Only Surviving: 6948 (27.79%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
          TrimmomaticPE: Completed successfully
          while adding the option to keep both reads leads to failure:

          Code:
          java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
          output:

          Code:
          Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
          ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
          Quality encoding detected as phred33
          Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
          TrimmomaticPE: Completed successfully
          This should not happen. My trimmomatic version is 0.36.

          Comment


          • Originally posted by apredeus View Post
            then I run

            Code:
            java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
            And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

            Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

            I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?
            One point to note might be that the thresholds you are using in the ILLUMINACLIP command are 30 for the palindrome mode and 10 for the simple mode, and you are not allowing for any mismatches. So it's quite possible that in palindrome mode the score is not reaching the threshold of 30 that trimmomatic needs in order to identify the presence of the adapter, whereas the score of 10 requires fewer matching bases and is more easily reached. Each matching base contributes about 0.6 to the score, according to the trimmomatic manual.

            Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.


            Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:

            >PrefixPE/1
            TACACTCTTTCCCTACACGACGCTCTTCCGATCT
            >PrefixPE/2
            GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

            It's also possible that at least one of the 2 adapters is different for the RNA-Seq kits.
            Last edited by mastal; 04-29-2017, 03:14 PM.

            Comment


            • Originally posted by apredeus View Post
              Dear Tony,

              few small suggestions that can make the program a lot easier to use -

              1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
              Unless you are using a very old version of trimmomatic, you can use the -basein and -baseout options to specify the prefix of the input and output file names.

              Comment


              • Originally posted by apredeus View Post
                I've also found a strange issue with the "keepBothEnds" option.

                while adding the option to keep both reads leads to failure:

                Code:
                java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
                output:

                Code:
                Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
                ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
                Quality encoding detected as phred33
                Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
                TrimmomaticPE: Completed successfully
                This should not happen. My trimmomatic version is 0.36.
                Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
                This does not mean that trimmomatic has not trimmed the adapters.

                The trimlog tells you for each read how many bases have been trimmed. Check the trimlog entries for some of the reads that were ending up in the forward_unpaired file previously, and check whether adapters have been trimmed from both the forward and reverse reads, although both reads of the pair have been kept, rather than the reverse read being discarded.

                Comment


                • Code:
                  Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.
                  Ok, I understand now. And it's the rev-comp of /2 that is present in read 1, and vise versa - so that's why the adapters are "switched". Which I guess makes sense since R1 read-through would lead us to reading adapter ligated to R2.

                  However, it makes it very confusing to make your own file. It should definitely be switched to sequences matching the simple mode.

                  Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:
                  Yes, I know - I've written before that it works OK with the rev-comp adapters that are also switched. I was just not sure why that was the case.

                  One remaining question is why does it not work for with the extra option to keep the output paired-ended?

                  Comment


                  • Originally posted by mastal View Post
                    Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
                    This does not mean that trimmomatic has not trimmed the adapters.
                    You are right, thank you! The message confused me as well.

                    Comment


                    • I assume that Trimmomatic processes the reads sequentially, so I don't understand why a run with paired-end fastq input could take >10GB of memory easily. Does anyone know what I am missing?

                      I'm using trimmomatic-0.36 with 4 threads for a job. The input fastq files are about 4GB in size each.

                      Thank you!
                      yip

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Techniques and Challenges in Conservation Genomics
                        by seqadmin



                        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                        Avian Conservation
                        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                        03-08-2024, 10:41 AM
                      • seqadmin
                        The Impact of AI in Genomic Medicine
                        by seqadmin



                        Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                        02-26-2024, 02:07 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 03-14-2024, 06:13 AM
                      0 responses
                      33 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-08-2024, 08:03 AM
                      0 responses
                      72 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-07-2024, 08:13 AM
                      0 responses
                      81 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-06-2024, 09:51 AM
                      0 responses
                      68 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X