
  • Problem with trimmomatic

    I ran Trimmomatic on my 100 bp paired-end Illumina HiSeq dataset using the command below, but the output files were all empty when unzipped (4 KB when zipped). The program ran for around 8 hours (each input FASTQ file is around 33 GB) and the trimlog file was populated (11.19 GB total), but every line appears to report a sequence length of 0 for its read (e.g. HS4_80:8:2308:9999:99196/2 0 0 0 0). I don't think the problem is with the input files, since they pass QC.

    Any ideas on what's going wrong?

    nohup java -classpath trimmomatic-0.17.jar org.usadellab.trimmomatic.TrimmomaticPE -trimlog trimlog C08DRACXX.8_1.fastq C08DRACXX.8_2.fastq forward_paired.fq.gz forward_unpaired.fq.gz reverse_paired.fq.gz reverse_unpaired.fq.gz ILLUMINACLIP:adapters+RC_NOindexes_for_trimmomatic.fastq:2:40:15 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20 MINLEN:70 &

  • #2
    Hum. That is indeed strange. The '0 0 0 0' indicates a surviving sequence length of 0 yet nothing was trimmed.

    Four suggestions:

    First, perhaps you encountered a system error (out of memory, out of time, etc.). Try running with a reduced input. Say a 'head --lines=40000' from each of the files.

    Second, perhaps you have malformed PE reads? Try running Trimmomatic as single-end on one of your input files.

    Third, just to make sure that you have proper looking input files please respond with the output of a 'head --lines=8' from one of your input files.

    And fourth, when you say that they pass QC, what quality control measurement are you using?
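The first suggestion (testing on a reduced input) can also be scripted when `head` is not convenient. The following is a minimal sketch, assuming unwrapped FASTQ with exactly four lines per record; the function names `take_reads` and `subset_fastq` are hypothetical and not part of Trimmomatic.

```python
from itertools import islice


def take_reads(lines, n_reads=10000):
    """Return an iterator over the lines of the first n_reads FASTQ records.

    Assumes every record occupies exactly four lines (no wrapped sequences).
    """
    return islice(lines, 4 * n_reads)


def subset_fastq(src_path, dst_path, n_reads=10000):
    """Write the first n_reads records of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(take_reads(src, n_reads))
```

For paired-end data, the same `n_reads` should be used for both input files so that the mates stay in step. The default of 10000 reads corresponds to the 40000 lines suggested above.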



    • #3
      Third, just to make sure that you have proper looking input files please respond with the output of a 'head --lines=8' from one of your input files.

      And fourth, when you say that they pass QC, what quality control measurement are you using?
      Thanks for the input. I received another tip that, since my data was generated on a HiSeq, the quality scores are probably phred33, whereas the Trimmomatic default is phred64. I will specify phred33 and run the command again; if that doesn't work, I will follow your first two suggestions.

      Below are the first 8 lines of one of my input fastq files.

      I checked quality using fastqc, and also used picard commands CollectAlignmentSummaryMetrics and ValidateSamFile (our core provided our raw reads in a bam file).


      @HS4_80:8:1101:10000:100155/1
      CATTCGTGTGAAAATGATAGTGAACCTCTGATAAGCAGTACGGACTCCAAAGAAGTGAAAGATAATAAAAAAAATAGGAAAGCACTGGGGTGCATTAAAA
      +
      ?@@FFFFFFHFHHJIJIIJEFHHGHGGGIJIIIIIGIDFHGIIJJGIIIJHIJGIBF=F@FCAGGHGCDHGHCDDCDDDDDDCDDDDDDB59B>CDCC>A
      @HS4_80:8:1101:10000:101570/1
      TTGCGATCGGACGTCAAACATGAAGGTGTATTTATGACCATCGAGGGCACAGTCGACTTACAGATCAGCGCACAGAACGTAGGTGCTTTTGACGCTTTCT
      +
      @CCFFFFFDFHHHIIHJJJIJJJJJJ?DBGHGHEIJIJJJJIJBABGGEGGECEHBBBCCEEEECDDDDD@BBDCDDDD?BDDCCDDDDCCDDBD>BDCD



      • #4
        I received another tip that since my data was generated on a Hiseq, quality scores are probably phred33, whereas the Trimmomatic default is phred64.
        That is a good point. I have my Trimmomatic script set up to use phred33 automatically, so I completely forgot that this is the most likely culprit.

        Your file looks fine. Good to hear that you ran fastqc and the other programs. I would still reduce your dataset to a smaller number of reads. This would allow you to test out trimmomatic quickly.



        • #5
          Originally posted by amango
          Any ideas on what's going wrong?
          The most likely explanation is a phred33 vs phred64 mismatch: HiSeq data is typically Phred-33, while Trimmomatic uses phred64 by default (which was the 'standard' when it was implemented). If you get this wrong in one direction, you get no trimming; in the other direction, everything gets trimmed. I really need to implement a warning for this, since it's relatively common.

          Incidentally, you can just 'head' a few thousand lines from the input files for test purposes. Normally you get a higher 'casualty' rate from the start/end of the file (since these tiles are at the edge of the flow cell), but some reads should still pass.
          Last edited by tonybolger; 02-04-2012, 03:03 AM.
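To see why the wrong offset empties the output, it helps to decode one of the quality strings from post #3 both ways. This is a minimal sketch: `decode_quality` only subtracts the ASCII offset, and `guess_offset` is a hypothetical heuristic written for illustration, not Trimmomatic's own logic.

```python
def decode_quality(qual, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(qual_strings):
    """Heuristic guess at the encoding from the lowest character seen.

    Characters below ';' (ASCII 59) are not used by the 64-based encodings,
    so seeing one implies Phred+33. Otherwise Phred+64 is assumed, which can
    be wrong if only a few high-quality Phred+33 reads were sampled.
    """
    lowest = min(ord(ch) for qual in qual_strings for ch in qual)
    return 33 if lowest < 59 else 64


# Start of the first quality string posted in #3.
qual = "?@@FFFFFFHFHHJIJ"
print(decode_quality(qual, 33))  # scores as Phred+33
print(decode_quality(qual, 64))  # the same characters read as Phred+64
```

Read as Phred+33, these bases score between 30 and 41. Read as Phred+64, the same characters score between -1 and 10, so no window reaches the SLIDINGWINDOW:4:20 threshold and the read is cut down to nothing.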



          • #6
            I did re-run Trimmomatic on my FASTQ files, this time specifying -phred33, and it seems to have worked.

            However, when I tried to assess the resulting output files using FastQC, I ran into errors. It seems the FASTQ files output by Trimmomatic are not what FastQC expects. Below are A) the first few lines of one output file, trimmed_forward_paired.fq, and B) the first and last few lines of the log file produced by FastQC, documenting the errors encountered when this same file was run. Errors similar to those described for lines 1 and 21 were reported for many lines throughout the file.

            I haven't seen this type of problem with FastQC before, although I have run it on FASTQ files before. And I don't know enough about the FASTQ format to tell from the files alone whether the problem here is with Trimmomatic or FastQC. Any pointers would be appreciated.

            A)
            @HS4_80:8:1101:10000:100155/1
            CATTCGTGTGAAAATGATAGTGAACCTCTGATAAGCAGTACGGACTCCAAAGAAGTGAAAGATAATAAAAAAAATAGGAAAGCACTGGGGTGCATTAAAA
            +
            ?@@FFFFFFHFHHJIJIIJEFHHGHGGGIJIIIIIGIDFHGIIJJGIIIJHIJGIBF=F@FCAGGHGCDHGHCDDCDDDDDDCDDDDDDB59B>CDCC>A
            @HS4_80:8:1101:10000:104061/1
            CGAGATTGTAGTGTCCACCGCATTTGCTGACACCAAGCCGGCAGATAAGAACGAGAAGAAAAGGGCCATTTTATCCAACCCATTATTCTCATTTGGAGCC
            +
            CCCFFFFFHHHHHJIJJJJIJJJJJGGIJIJJJIIJIJJJJJJJIJJJJJIIJHHGFFFFDEEDDDDDDDEEEDDDCDDDDBCCCDEEEDEEEEECDDCD
            @HS4_80:8:1101:10000:105586/1
            ATGGCTTTTTTCATCCAAGATGAGGACGATAAATGCCAACCAATCTGTGAAAATCCCCGATGGCATTGATGTCACAGTCAATAAGAGGATCATAGTTGTC
            +
            @CCFFFFFHHGHHJJJJJJIJEHHIJIJIJJIJJIJJIIJJGGJIJJIJJIJIJJJJJJGHFFFCEEEEDEEDEDDCDDDDFEDDDD@BDCDDDDDEDDD
            @HS4_80:8:1101:10000:107366/1
            GAGCAATGTTAAAGTTAGGTGTCTTAAAGAATGCAACCAAATATCATATTCGCAACACTTGTCTGCAGCCTGTTTAGATGCCACAGAAGTTATATTGTAC
            +
            CCCFFFFFHHHHHJJJJJJIIIIJJIJEHIIJJJJJJJJIJJJJHIJJJJJJIIJJJJIJJJJJJJIJJJJIHHHHHHFFFFFFEEEEECCDDDFEEEEE
            @HS4_80:8:1101:10000:107743/1
            ATGGCTTCTACTGGCCACTGCACCGGTTGCGTGCGGATCTGCTCGTGCACCGCCAGTACCGTATCCGCGGTGTACGGCAGCCGACCGTAGACAA
            +
            CCCFFFFFHHHHHJJIJJJJJIIJIJDHHGIFHGHJIJIIJIGGHEHHEFFFDD:BCCDDDBDDDDBDBD9@5ACBBBBDDB>>BD@52<<C:?
            @HS4_80:8:1101:10000:11073/1
            TTACGATCTTCACGTCCACGTCATCGTCCTGGACCAGAGATTCGTGGAAAGCACTATGAACGGCCGCCACGCTAAACATCTTAATATCGATATTATAATC
            +
            CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJJJJJJJIJJJJHHJHIJJJJJJJJJJJJHHFFDDDDDDDDDDDDDDDDDDEEEEEDDDDDEEEDDED


            B)
            Semicolon seems to be missing at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 1.
            Array found where operator expected at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 21, at end of line
            (Might be a runaway multi-line ?? string starting on line 4)
            (Missing semicolon on previous line?)
            Semicolon seems to be missing at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 21.
            [...]
            syntax error at trimmed_forward_paired.fq line 1, near "@HS4_80:"
            syntax error at trimmed_forward_paired.fq line 84, near "@ACCDC@>"
            syntax error at trimmed_forward_paired.fq line 200, near "F@GBH"
            syntax error at trimmed_forward_paired.fq line 212, near "?DDDDD<FBFFDFFAGA=FF9CEGFIFE;CF>GEEDGBBFABFFF<B?FF"
            syntax error at trimmed_forward_paired.fq line 252, near "@@<DD;D?FFF"
            syntax error at trimmed_forward_paired.fq line 252, near "?DD1DD?FGDHGFB"
            syntax error at trimmed_forward_paired.fq line 252, near "C>"
            BEGIN not safe after errors--compilation aborted at trimmed_forward_paired.fq line 300.
            Last edited by amango; 02-04-2012, 02:59 PM.



            • #7
              Originally posted by amango
              I did re-run trimmomatic on my fastq files, this time specifying -phred33, and it seems to have worked.
              Excellent.

              Originally posted by amango
              However, when I tried to assess the resulting output files using fastQC, I run into errors. It seems the fastq files outputted by trimmomatic are not what fastqc expects. Below are A) the first few lines of one output file, trimmed_forward_paired.fq, and B) the first and last few lines of the log file produced by fastqc, documenting the errors encountered when this same file was run. Similar errors as described for lines 1 and 21 were found for many lines throughout the file.
              Strange - I've just tested those 6 records on FastQC, and it seems relatively happy to parse them.

              BTW, the errors you show look very much like what would happen if Perl tried to parse a FASTQ file.
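One way to rule out the output file itself is a quick structural check of its records. The following is a minimal sketch, assuming unwrapped four-line FASTQ; `check_fastq` is a hypothetical helper, it reads everything it is given into memory, and so it is best applied to a `head` of the file rather than the whole thing.

```python
def check_fastq(lines):
    """Return (record_number, message) tuples for records that look malformed.

    Assumes unwrapped FASTQ: header, sequence, '+' line, quality per record.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    problems = []
    complete = len(rows) - len(rows) % 4  # lines that form whole records
    for start in range(0, complete, 4):
        header, seq, plus, qual = rows[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append((record, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((record, "third line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((record, "sequence and quality lengths differ"))
    if complete != len(rows):
        problems.append((complete // 4 + 1, "incomplete final record"))
    return problems
```

An empty result for the trimmed file would support the reading above: the records are fine and the error messages come from how the file was handed to the interpreter, not from Trimmomatic's output.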



              • #8
                Trimmomatic 0.32 error

                Hi
                I am running Trimmomatic 0.32 and I couldn't figure out why the trim log file shows the following for some reads in the reverse pair file:

                phred 33
                Illumina sequencing reads

                HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 1:N:0:ATCACG 98 0 98 3 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 2:N:0:ATCACG 0 0 0 0 => Read 2

                HWI-ST1122:289:C38D1ACXX:8:1101:4410:2059 1:N:0:ATCACG 101 0 101 0 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:4410:2059 2:N:0:ATCACG 0 0 0 0 => Read 2

                HWI-ST1122:289:C38D1ACXX:8:1101:4892:2178 1:N:0:ATCACG 101 0 101 0 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:4892:2178 2:N:0:ATCACG 0 0 0 0 => Read 2

                It looks to me as if nothing was trimmed from Read 2, yet nothing survived.



                • #9
                  In the excerpt from your log file, it looks like the Read2 reads were dropped by trimmomatic.

                  Did you have a look at those reads to see why? For example, they might have been very low quality, or shorter than a minimum length you specified.

                  What parameters did you run trimmomatic with?



                  • #10
                    Reply

                    Trimmomatic parameters:
                    -phred33
                    ILLUMINACLIP:/Users/mparida/Software/Trimmomatic-0.32/adapters/:2:40:12 SLIDINGWINDOW:5:20 LEADING:10 TRAILING:12 MINLEN:90

                    Thanks for your reply. I figured out what Trimmomatic is doing. After I changed the MINLEN parameter to 40, it started giving me different stats:
                    for example:
                    Read 1 101 0 101 0
                    Read 2 37 0 37 64
                    So when it trims a read and the remaining length doesn't pass the MINLEN threshold, it reports these odd-looking stats for that read:
                    Read 1 101 0 101 0
                    Read 2 0 0 0 0
                    This makes sense.
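The trim log lines discussed in this thread can be picked apart programmatically. According to the Trimmomatic documentation, each line holds the read name, the surviving sequence length, the location of the first surviving base (the amount trimmed from the start), the location of the last surviving base in the original read, and the amount trimmed from the end. The following is a minimal sketch; the names `TrimLogEntry` and `parse_trimlog_line` are hypothetical.

```python
from typing import NamedTuple


class TrimLogEntry(NamedTuple):
    name: str
    surviving_length: int
    trimmed_from_start: int
    last_surviving_pos: int
    trimmed_from_end: int

    @property
    def dropped(self):
        """True when the read was discarded, logged as '0 0 0 0'."""
        return self.surviving_length == 0


def parse_trimlog_line(line):
    """Parse one trim log line into a TrimLogEntry.

    The read name may itself contain spaces (e.g. '... 2:N:0:ATCACG'),
    so the four numeric fields are split off from the right.
    """
    name, *numbers = line.rstrip("\n").rsplit(None, 4)
    return TrimLogEntry(name, *map(int, numbers))


kept = parse_trimlog_line("read2 37 0 37 64")
# For a surviving read, the three lengths add up to the original read length.
print(kept.trimmed_from_start + kept.surviving_length + kept.trimmed_from_end)
```

For the `37 0 37 64` example above the sum is 101, matching the length of Read 1. For a dropped read every field is 0, so the log alone does not say how much was trimmed before the read failed MINLEN.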
