SEQanswers





Old 09-12-2015, 10:05 PM   #1
Saeideh
Member
 
Location: Iran

Join Date: Aug 2015
Posts: 25
Trimmomatic error

I have a file containing reads. The adapters appear to have been removed already, so I do not need to trim them.

When I run FastQC, it reports overrepresented sequences in the first 7 positions, and I want to trim those. That's it.

I wrote this command:

java -jar /usr/local/bin/trimmomatic-0.33.jar SE -phred64 inputfile.fq.gz output.fq.gz HEADCROP:7

Is it wrong?

I get this error:

Exception in thread "main" java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:154)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:75)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:85)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream$GZIPHelperInputStream.<init>(ConcatGZIPInputStream.java:109)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream$GZIPHelperInputStream.<init>(ConcatGZIPInputStream.java:105)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.nextGzipInputStream(ConcatGZIPInputStream.java:37)
    at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.<init>(ConcatGZIPInputStream.java:16)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:132)
    at org.usadellab.trimmomatic.TrimmomaticSE.process(TrimmomaticSE.java:192)
    at org.usadellab.trimmomatic.TrimmomaticSE.run(TrimmomaticSE.java:285)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:40)

How can I do it correctly?

Thanks in advance
Old 09-13-2015, 02:16 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,882

The error indicates that your input sequence file is not in GZIP format. Also, are you sure the quality scores in the input are in phred+64 format?
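
A file name ending in .gz does not make the content gzip-compressed, which is what the "Not in GZIP format" exception is complaining about. One quick way to check is to look at the first two bytes of the file, which are 0x1f 0x8b for any gzip stream. The sketch below is only an illustration, not part of Trimmomatic; the helper names looks_gzipped, open_fastq and demo are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_fastq(path):
    """Open a FASTQ file as text, using gzip only if the content is gzipped."""
    if looks_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "rt")


def demo():
    """Compare a plain file that was merely renamed to .gz with a real gzip file."""
    record = "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        renamed = os.path.join(tmp, "renamed.fq.gz")        # plain text, misleading name
        compressed = os.path.join(tmp, "compressed.fq.gz")  # actually compressed
        with open(renamed, "w") as handle:
            handle.write(record)
        with gzip.open(compressed, "wt") as handle:
            handle.write(record)
        results = (looks_gzipped(renamed), looks_gzipped(compressed))
        with open_fastq(renamed) as h1, open_fastq(compressed) as h2:
            same = h1.read() == h2.read()
    return results, same
```

The check depends only on the file content, so it gives the same answer whatever extension the file carries.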
Old 09-13-2015, 02:49 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

In addition to what GenoMax wrote, some library types (e.g., RNA-seq) commonly show sequence bias at the beginning of the reads. This shouldn't necessarily be trimmed off.
Old 09-13-2015, 03:13 AM   #4
Saeideh
Member
 
Location: Iran

Join Date: Aug 2015
Posts: 25

I want to scream with happiness. Thank you, guys!

Last time, I had just renamed the file from .fq to .fq.gz :| :| :| . When GenoMax told me the file was not in GZIP format, I ran: gzip inputfile.fq and then I got the result.

I'm not sure whether my input file is in phred+64 format. How can I find out?

dpryan, I know it's OK to have some bias at the beginning of reads, and I assumed it was because of adapters. But I searched my read files for typical Illumina adapters using grep, and none of the reads contain them. So I concluded that the adapters had probably been trimmed already (someone else passed the files to me), assumed the abnormality was due to contamination, and decided to remove the first 7 bases.

Thank you again!
Old 09-13-2015, 03:28 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The bias isn't due to primers; it's due to random hexamer priming being biased.
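
To look at this kind of start-of-read bias directly, one rough option is to tabulate base fractions per position over the first few bases, which is similar in spirit to a per-base sequence content plot. The sketch below is only an illustration under that assumption; the function name base_composition and its parameters are hypothetical, and it expects read sequences as plain strings.

```python
from collections import Counter


def base_composition(reads, n_positions=10):
    """Return, for each of the first n_positions, the fraction of A, C, G and T.

    Other symbols (e.g. N) count towards the total at a position but are not
    reported, so the four fractions may sum to less than 1.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base.upper()] += 1

    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append(
            {base: (counter[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions
```

With unbiased priming, the fractions at the first positions would be expected to resemble those further into the read; a skew limited to the first several bases is consistent with the priming bias described above rather than with leftover adapters.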
Old 09-13-2015, 06:50 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,882

Quote:
Originally Posted by Saeideh
I'm not sure whether my input file is in phred+64 format. How can I find out?

If your data is of recent vintage, it is unlikely to be in phred+64 (Solexa/older Illumina) format. That said, you can test the quality score format of your data with the testformat.sh utility from the BBMap suite, like this:

Code:
$ testformat.sh in=seq.fq
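
If BBMap is not at hand, the offset can often be guessed from the range of characters on the quality lines. The sketch below is a simple heuristic and not what testformat.sh or Trimmomatic actually do internally: characters below ASCII 59 only occur with phred+33, and characters above ASCII 74 are assumed to mean a +64 encoding (this assumes phred+33 qualities do not exceed Q41). The function names are hypothetical, and the FASTQ is assumed to use four lines per record.

```python
from itertools import islice


def quality_lines(handle):
    """Yield the quality string of each record from a 4-line-per-record FASTQ."""
    for line in islice(handle, 3, None, 4):
        yield line.rstrip("\r\n")


def guess_phred_offset(qualities):
    """Return 33, 64, or None when the observed range is ambiguous or empty."""
    codes = [ord(ch) for line in qualities for ch in line]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # such low characters cannot occur in a +64 encoding
    if max(codes) > 74:
        return 64  # higher than Q41 under phred+33, so assume +64
    return None    # every character is valid under both encodings
```

In practice, passing only the first few thousand quality lines is usually enough, since a single low-quality base settles the question.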
Old 09-13-2015, 07:10 AM   #7
usad
Member
 
Location: aachen

Join Date: Sep 2009
Posts: 53

You can also just run Trimmomatic without the phred switch, and it will try to autodetect the quality score encoding (since 0.32).

Best Wishes
björn
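
Putting the advice in this thread together, the original command can be run without -phred64 so that Trimmomatic autodetects the encoding. The sketch below only assembles the argument list and reuses the jar path and file names from the first post. The function name trimmomatic_se_command is hypothetical; the only Trimmomatic arguments it uses are those already shown in the thread (SE, -phred64, HEADCROP) plus the counterpart switch -phred33.

```python
def trimmomatic_se_command(jar, in_fastq, out_fastq, headcrop, phred=None):
    """Build a single-end Trimmomatic command that crops bases from the read start.

    phred=None omits the switch so Trimmomatic (0.32 or later) autodetects the
    encoding; pass 33 or 64 to force -phred33 or -phred64.
    """
    cmd = ["java", "-jar", jar, "SE"]
    if phred is not None:
        if phred not in (33, 64):
            raise ValueError("phred must be 33, 64, or None")
        cmd.append(f"-phred{phred}")
    cmd += [in_fastq, out_fastq, f"HEADCROP:{headcrop}"]
    return cmd


# Same inputs as the command in the first post, minus the phred switch.
command = trimmomatic_se_command(
    "/usr/local/bin/trimmomatic-0.33.jar", "inputfile.fq.gz", "output.fq.gz", 7
)
```

The list can be passed to subprocess.run(command, check=True), or joined with spaces to get the equivalent shell command line. The input must be genuinely gzip-compressed if it is named .gz.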