  • Trimmomatic error

    I have a file containing reads. It seems the adapters were already removed, so I do not need to trim them.

    When I run FastQC, I see overrepresented sequences in the first 7 positions, and I want to trim them. That's it.

    I wrote this command:

    java -jar /usr/local/bin/trimmomatic-0.33.jar SE -phred64 inputfile.fq.gz output.fq.gz HEADCROP:7

    Is it wrong?

    I get this error:

    Exception in thread "main" java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:154)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:75)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:85)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream$GZIPHelperInputStream.<init>(ConcatGZIPInputStream.java:109)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream$GZIPHelperInputStream.<init>(ConcatGZIPInputStream.java:105)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.nextGzipInputStream(ConcatGZIPInputStream.java:37)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.<init>(ConcatGZIPInputStream.java:16)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:132)
    at org.usadellab.trimmomatic.TrimmomaticSE.process(TrimmomaticSE.java:192)
    at org.usadellab.trimmomatic.TrimmomaticSE.run(TrimmomaticSE.java:285)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:40)

    How can I do it correctly?

    Thanks in advance

  • #2
    The error indicates that your input sequence file is not in GZIP format. Are you also sure the input sequences are in phred+64 format?
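
For anyone hitting the same exception: a file name ending in .gz says nothing about the contents, and a real gzip stream starts with the two bytes 1f 8b. Here is a minimal sketch of a check on those bytes, assuming a shell with head, od and tr available. The function name is_gzip is made up for illustration and is not part of Trimmomatic or gzip.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the file
# begins with the gzip magic bytes 1f 8b.
is_gzip() {
    # Read the first two bytes, print them as hex, strip spaces/newlines.
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Example use (the file name is a placeholder):
#   is_gzip inputfile.fq.gz && echo "gzip" || echo "not gzip"
```

Running gzip -t on the file is another option; it tests the whole compressed stream rather than just the header.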


    • #3
      In addition to what Genomax wrote, there are some library types (e.g., RNAseq) that commonly show bias at the beginning of the reads. This shouldn't necessarily be trimmed off.


      • #4
        I want to scream of happiness.. Thank you guyssss.

        Earlier, I had just renamed the file from .fq to .fq.gz :| :| :| . When GenoMax told me the file was not in GZIP format, I ran gzip inputfile.fq, and then I got the result.

        I'm not sure my input file is in phred+64 format or not. How should I understand it?

        dpryan, I know it's ok to have some bias at the beginning of reads and that it is because of adapters, but I looked for typical Illumina adapters in my read files using grep and none of the reads have those adapters. So I assumed the adapters had been trimmed before (someone else passed the files to me), I thought the abnormality must be due to contamination, and I decided to remove the first 7 bases.

        Thank you agaaaaain


        • #5
          The bias isn't due to primers, it's due to random hexamer priming being biased.


          • #6
            Originally posted by Saeideh
            I'm not sure my input file is in phred+64 format or not. How should I understand it?
            If your data is of recent vintage, it is unlikely to be in phred+64 (Solexa / early Illumina) format. That said, you can test the quality score encoding of your data with the testformat.sh utility from the BBMap suite, like this:

            Code:
            $ testformat.sh in=seq.fq
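
If BBMap is not at hand, the same question can be approached with a rough heuristic: any quality character below ASCII 59 can only occur in phred+33 data, while a file whose lowest quality character is 64 or above is probably phred+64. The sketch below assumes an uncompressed FASTQ with exactly four lines per record. The function name guess_phred is hypothetical, and this is not how testformat.sh or Trimmomatic decide.

```shell
# Hypothetical helper: guess the quality offset of an uncompressed,
# 4-line-per-record FASTQ file from its lowest quality character.
guess_phred() {
    awk 'BEGIN {
             # Lookup table: printable ASCII character -> code point.
             for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
             min = 127
         }
         NR % 4 == 0 {                      # every 4th line is a quality line
             n = length($0)
             for (i = 1; i <= n; i++) {
                 ch = substr($0, i, 1)
                 if ((ch in ord) && ord[ch] < min) min = ord[ch]
             }
         }
         END {
             if (min == 127)      print "unknown"    # no quality lines seen
             else if (min < 59)   print "phred33"
             else if (min >= 64)  print "phred64"
             else                 print "ambiguous"  # 59-63: cannot tell
         }' "$1"
}
```

A phred+33 file that contains only high-quality bases can still be reported as phred64 by this heuristic, so treat the answer as a hint rather than proof.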


            • #7
              You can also just run Trimmomatic without the phred switch, and it will try to autodetect the quality score encoding (since 0.32).

              Best Wishes
              björn
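
Putting the thread together, the command from the first post with the phred switch dropped is shown as a comment below. To make clear what HEADCROP:7 does to each record, there is also a small awk sketch that cuts the first N characters from every sequence and quality line. The function name headcrop is hypothetical; it assumes an uncompressed FASTQ with four lines per record and reads longer than N, and it is an illustration, not a replacement for Trimmomatic.

```shell
# Command from the first post without -phred64, so that Trimmomatic (0.32+)
# tries to detect the quality encoding itself (input must be real gzip):
#   java -jar /usr/local/bin/trimmomatic-0.33.jar SE inputfile.fq.gz output.fq.gz HEADCROP:7

# Hypothetical helper: remove the first N characters from the sequence
# line (2nd of each record) and the quality line (4th of each record).
headcrop() {
    awk -v n="$1" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
        { print }                      # header and "+" lines pass through
    ' "$2"
}

# Example use (file names are placeholders):
#   headcrop 7 inputfile.fq > output.fq
```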
