SEQanswers

Old 06-04-2013, 07:07 AM   #41
rmdoyle
Junior Member
 
Location: Indiana

Join Date: May 2013
Posts: 6
Default

Hi everyone,

I've recently used Trimmomatic on some Illumina HiSeq PE fastq files. I then attempted to run the post-Trimmomatic fastq files through fastqc. My original Illumina files run through fastqc just fine, but the post-Trimmomatic files get stuck, which makes me think I've corrupted the files somehow while using Trimmomatic.

When I run fastqc on my post-trimmomatic fastq files, I get the following output after inputting my sequences:

Exception in thread "Thread-4" java.lang.NullPointerException
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:141)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Unknown Source)

I also did get one error message after running trimmomatic. This error was:

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

My original trimmomatic code was:

TrimmomaticPE: -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1.fastq unpaired_output1.fastq paired_output2.fastq unpaired_output2.fastq ILLUMINACLIP:TruSeq3_PE.fa:2:30:10 LEADING:20 TRAILING:20 MINLEN:30

I'd appreciate any thoughts on where I went wrong...
Old 06-04-2013, 09:02 AM   #42
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by rmdoyle View Post
I'd appreciate any thoughts on where I went wrong...
Very strange indeed, and nothing I've seen before.

I would suspect something like a lack of disk space, or that something killed the trimmomatic process. It may also be a one-off glitch, so running it again and checking whether the output is still broken might help.
Old 06-05-2013, 05:51 AM   #43
rmdoyle
Junior Member
 
Location: Indiana

Join Date: May 2013
Posts: 6
Default

Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg
Old 06-05-2013, 07:53 AM   #44
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by rmdoyle View Post
Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg
Ah, that was a trimmomatic error. Normally a FASTQ record should have the same number of bases and quality scores, and for some reason, this read appears to have fewer quality scores, which trimmomatic considers invalid (AFAIK this is correct behaviour). At this point, trimmomatic gives up, and probably leaves a partial output file, which may cause other issues.

The question is why the record is invalid. Can you find that fastq record within the file?

Of course, trimmomatic should really log the name of the record as well, rather than just the data, but I haven't seen this happen before.
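
A minimal Python sketch of one way to locate such a record, assuming strict four-line FASTQ records with no wrapped sequence lines. The function name `find_length_mismatches` and the command-line usage are illustrative and not part of Trimmomatic or FastQC. It reports the name of each record whose sequence and quality lines differ in length, which is the detail the error message above leaves out.

```python
import sys
from itertools import islice
from typing import Iterator, TextIO, Tuple


def find_length_mismatches(handle: TextIO) -> Iterator[Tuple[int, str, int, int]]:
    """Yield (record_number, header, seq_len, qual_len) for suspicious records.

    Assumption: every record occupies exactly four lines.
    """
    record_number = 0
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            break
        record_number += 1
        # A file that ends mid-record is reported as well.
        truncated = len(lines) < 4
        lines += [""] * (4 - len(lines))
        header, seq, _plus, qual = lines
        if truncated or len(seq) != len(qual):
            yield record_number, header, len(seq), len(qual)


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_mismatches.py reads.fastq
    with open(sys.argv[1]) as fastq:
        for number, header, seq_len, qual_len in find_length_mismatches(fastq):
            print(f"record {number}: {header} seq={seq_len} qual={qual_len}")
```

If a record is broken because lines were lost or fused, every later record may be read out of frame, so the first hit is the one worth inspecting by hand.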
Old 06-05-2013, 09:34 AM   #45
rmdoyle
Junior Member
 
Location: Indiana

Join Date: May 2013
Posts: 6
Default

Yup, the complete record is:

@FCB01CWABXX:1:2205:1823:145892
GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC
+FCB01CWABXX:1:2205:1823:145892
ggggggggggggfeggggggggcgggeggggggggeggg18207:146312

I suppose I could just cut this record out?

Interestingly, if I leave out the ILLUMINACLIP:TruSeqForTrimmomatic.fna:2:30:10 option, and leave my code as:

trimmomatic paired-end -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1b.fastq unpaired_output1b.fastq paired_output2b.fastq unpaired_output2b.fastq LEADING:20 TRAILING:20 MINLEN:30

I get files that I CAN run through fastqc without any problems (the results don't look great, but I can run the files through). Does that set off any red flags?
Old 06-05-2013, 10:31 AM   #46
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by rmdoyle View Post
Yup, the complete record is:

@FCB01CWABXX:1:2205:1823:145892
GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC
+FCB01CWABXX:1:2205:1823:145892
ggggggggggggfeggggggggcgggeggggggggeggg18207:146312
This looks indeed like a dodgy record - the end of the quality score line is missing, and the numeric bit looks like the end of the name from another record.

Quote:
Originally Posted by rmdoyle View Post
I suppose I could just cut this record out?
Perhaps, but I would be rather concerned about where it came from. Furthermore, you would probably end up with additional missing records, and you would need to keep the other 'paired' file in the dataset in sync.

Quote:
Originally Posted by rmdoyle View Post
Interestingly, if I leave out the ILLUMINACLIP:TruSeqForTrimmomatic.fna:2:30:10 option, and leave my code as:

trimmomatic paired-end -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1b.fastq unpaired_output1b.fastq paired_output2b.fastq unpaired_output2b.fastq LEADING:20 TRAILING:20 MINLEN:30

I get files that I CAN run through fastqc without any problems (the results don't look great, but I can run the files through). Does that set off any red flags?
This is also very strange. Trimmomatic parses the FASTQ in the same way regardless of the trimming steps selected, so for some reason, it must have seen valid records this time (or at least records which aren't broken in this way).

I suggest running something like md5sum on the original files on this computer, and on a separate machine, a few times each, to see if there is any inconsistency.
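
For anyone without `md5sum` to hand, the same check can be sketched in Python with the standard `hashlib` module. The helper names `md5_of_file` and `repeated_md5` are made up for this illustration. The idea is only to hash the same file several times and see whether the digests ever disagree.

```python
import hashlib
import sys
from typing import List


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_md5(path: str, repeats: int = 3) -> List[str]:
    """Hash the same file several times; all digests should be identical."""
    return [md5_of_file(path) for _ in range(repeats)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_md5.py SRR522907_1.fastq
    for fastq_path in sys.argv[1:]:
        digests = repeated_md5(fastq_path)
        status = "consistent" if len(set(digests)) == 1 else "INCONSISTENT"
        print(fastq_path, digests[0], status)
```

Repeated reads on one machine may well be served from the operating system's cache rather than from disk, which is one reason to compare against a digest computed on a separate machine too.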
Old 06-19-2013, 02:43 PM   #47
rmdoyle
Junior Member
 
Location: Indiana

Join Date: May 2013
Posts: 6
Default

Hi tonybolger,

Thanks for your help with this. I ended up removing the offending record and the corresponding paired read. I don't know whether this was causing the original problem with corrupted files, but the issue seems to have resolved itself...
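
For reference, removing a read together with its mate can be sketched in Python as below. This is not what was actually run in this thread. It assumes both files list mates in the same order, that every record (including the damaged one) occupies exactly four lines, and that mate names match once any trailing `/1` or `/2` is stripped. The names `read_id` and `drop_pairs` are hypothetical.

```python
from itertools import islice
from typing import Set, TextIO


def read_id(header: str) -> str:
    """Read name without the leading '@', any comment, or a trailing /1 or /2."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def drop_pairs(fwd_in: TextIO, rev_in: TextIO,
               fwd_out: TextIO, rev_out: TextIO,
               bad_ids: Set[str]) -> int:
    """Copy both files, skipping every pair whose name is in bad_ids.

    Returns the number of pairs dropped.
    """
    dropped = 0
    while True:
        fwd = list(islice(fwd_in, 4))
        rev = list(islice(rev_in, 4))
        if not fwd and not rev:
            break
        if len(fwd) < 4 or len(rev) < 4:
            raise ValueError("inputs are truncated or differ in record count")
        fwd_id, rev_id = read_id(fwd[0]), read_id(rev[0])
        if fwd_id != rev_id:
            raise ValueError(f"files out of sync: {fwd_id} vs {rev_id}")
        if fwd_id in bad_ids:
            dropped += 1
            continue
        fwd_out.writelines(fwd)
        rev_out.writelines(rev)
    return dropped
```

Called with open handles for the two inputs and two new outputs plus `{"FCB01CWABXX:1:2205:1823:145892"}` as `bad_ids`, it keeps the two files in step. It does nothing to explain where the damaged record came from.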
Old 07-12-2013, 10:04 AM   #48
fishie
Junior Member
 
Location: Washington, DC

Join Date: Jul 2013
Posts: 3
Default

Hi. Sorry, I'm new to this forum and to NGS.

I'm looking at using trimmomatic to remove adapters and trim my sequences from MiSeq. I used the Nextera XT kit and their primers for my libraries, not the TruSeq kits. Does anyone know if a list of the Nextera adapter and primer sequences is readily available somewhere for use with ILLUMINACLIP?
Old 07-12-2013, 12:42 PM   #49
fishie
Junior Member
 
Location: Washington, DC

Join Date: Jul 2013
Posts: 3
Default

Per my question above... I did find the Nextera XT adapter sequences, and I understand how to use them for simple clipping in trimmomatic. However, I'm still unclear on exactly what the palindrome clipping is, and whether it is necessary for my MiSeq data (paired reads, 2x250; note that it is NOT a mate-pair library, though). If palindrome clipping is recommended, what should be used for the prefixes?
Old 07-18-2013, 08:27 AM   #50
htetre
Member
 
Location: US

Join Date: Jul 2013
Posts: 28
Default

Hello,

I am trying to use Trimmomatic for the first time and am getting the following error message:

TrimmomaticPE: Started with arguments: -phred33 /homes/htetre/Illumina06272013/ANN_ACAGTG_L004_R1_001.fastq /homes/htetre/Illumina06272013/ANN_ACAGTG_L004_R2_001.fastq ANN_1_PE.fastq ANN_1_UP.fastq ANN_2_PE.fastq ANN_2_UP.fastq ILLUMINACLIP:/homes/htetre/Trimmomatic-0.30/adapters/TruSeq2-PE.fa

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:53)
at org.usadellab.trimmomatic.trim.TrimmerFactory.makeTrimmer(TrimmerFactory.java:27)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:344)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:23)

Can someone help me with this error message?

Thank you so much for your help
Hannah
Old 07-18-2013, 12:40 PM   #51
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

I suspect something is wrong with your input files. Could you post the first and last 5 lines (head and tail) of each file to this thread?
Old 07-18-2013, 06:46 PM   #52
htetre
Member
 
Location: US

Join Date: Jul 2013
Posts: 28
Default

Below is the head and tail of my file.

Thanks for taking a look.

HEAD:
==> ANN_ACAGTG_L004_R1_001.fastq <==
@HWI-ST538:334:C225RACXX:4:1101:1202:1954 1:Y:0:ACAGTG
ACNTTATGATTTTTGGNNNGTNCNANGNNCAGNGCGGNGCGGGGNNNNNNNNGNAACATAATANNNNNNNNNNNCATGATAAAAATGNATAACACNCAAT
+
<<#25>?@<<@@????####################################################################################

TAIL:
<=7?A><@A+?AB<BBB3A<<@<ABB@BA<;?A7=A>B90?=:??A><4=*('--=(.8>BBA.7>AAA###############################
@HWI-ST538:334:C225RACXX:4:2316:21264:100116 1:Y:0:ACAGTG
TTGTTCTGTCGTAATCTTCAAACGAAGCAATTTGTTTTACCGGAATCCAATTTACCCATATCAACTCCTCGAACGCATTCAAAGTGCTCGACTGAAGCAA
+
(5;=?@<@.)@(2<5>8):=8:<35@=(<1@#####################################################################
Old 07-18-2013, 11:08 PM   #53
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by htetre View Post
Hello,

I am trying to use Trimmomatic for the first time and am getting the following error message:

TrimmomaticPE: Started with arguments: -phred33 /homes/htetre/Illumina06272013/ANN_ACAGTG_L004_R1_001.fastq /homes/htetre/Illumina06272013/ANN_ACAGTG_L004_R2_001.fastq ANN_1_PE.fastq ANN_1_UP.fastq ANN_2_PE.fastq ANN_2_UP.fastq ILLUMINACLIP:/homes/htetre/Trimmomatic-0.30/adapters/TruSeq2-PE.fa

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:53)
Hi Hannah,

It appears you're missing the numeric thresholds used with the ILLUMINACLIP step. If you use the suggested values, that step should look like:

ILLUMINACLIP:/homes/htetre/Trimmomatic-0.30/adapters/TruSeq2-PE.fa:2:30:12

I really need to improve the error reporting when such problems occur.

Tony.
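
Until the error reporting improves, a step string can be sanity-checked before launching a run. The sketch below is only an illustration: `check_illuminaclip` is a hypothetical helper, not part of Trimmomatic. It assumes the step has the form `ILLUMINACLIP:<adapter fasta>:<n>:<n>:<n>` with whole-number thresholds, as in the examples in this thread, and that the adapter path itself contains no colon. The labels for the three numbers follow the Trimmomatic manual and should be treated as an assumption here.

```python
from typing import List

# Names of the three numeric fields (assumed from the Trimmomatic manual).
THRESHOLD_LABELS = (
    "seed mismatches",
    "palindrome clip threshold",
    "simple clip threshold",
)


def check_illuminaclip(step: str) -> List[str]:
    """Return a list of problems found in an ILLUMINACLIP step (empty if none)."""
    parts = step.split(":")
    if parts[0] != "ILLUMINACLIP":
        return ["not an ILLUMINACLIP step"]
    if len(parts) < 5 or not parts[1]:
        return [
            "expected ILLUMINACLIP:<adapter fasta>:<n>:<n>:<n>, "
            f"got {len(parts) - 1} field(s) after the step name"
        ]
    problems = []
    for label, value in zip(THRESHOLD_LABELS, parts[2:5]):
        if not value.isdigit():
            problems.append(f"{label} should be a whole number, got {value!r}")
    return problems


if __name__ == "__main__":
    step = "ILLUMINACLIP:/homes/htetre/Trimmomatic-0.30/adapters/TruSeq2-PE.fa"
    for problem in check_illuminaclip(step):
        print(problem)
```

Splitting on `:` is the simplest approach, but it would misread a path that contains a colon, so this check is best suited to plain Unix-style paths like the ones above.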
Old 08-16-2013, 12:08 AM   #54
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41
Question regarding Paired/Unpaired Output

I have a question regarding the output files if trimming of paired-ends is chosen.
To recapitulate:
4 files are generated:
  • R1_Paired
  • R1_Unpaired
  • R2_Paired
  • R2_Unpaired

I am confused about what these files actually contain, and the manual does not help much.

E.g. I did a trimming run and analyzed the original R1 file and R1_Paired with fastqc. This shows that the quality improved, and I can see that certain bad reads were discarded.
  1. What does R1_Unpaired contain ?
  2. Do I use R1_unpaired as well for the assembly or only R1_paired?
  3. In the case when no R1_paired is generated, do I use directly R1_unpaired or does this indicate a serious problem ?

I hope somebody can help answer some of these questions.
Old 08-16-2013, 12:48 AM   #55
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by ebioman View Post
What does R1_Unpaired contain ?
The forward reads for which the corresponding reverse read was discarded, for whatever reason. The reason behind this is how most tools interpret paired files - tools often assume that both files contain corresponding reads in the same order. Thus you need somewhere else to put 'singleton' reads which have lost their mate.

Quote:
Originally Posted by ebioman View Post
Do I use R1_unpaired as well for the assembly or only R1_paired?
If the assembler (or other downstream tool) can handle a mix of paired and single end reads, you can use both.

Quote:
Originally Posted by ebioman View Post
In the case when no R1_paired is generated, do I use directly R1_unpaired or does this indicate a serious problem ?
This seems strange - 4 files should always be created (when running trimmomatic in paired-end mode).
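
The routing described in the answers above can be sketched in a few lines of Python. This is an illustration of the idea only, not Trimmomatic's code: `survives` stands in for whatever the trimming steps decide (here just a MINLEN-style length check), and the file labels are the ones used in the question.

```python
from typing import Optional, Tuple


def survives(length_after_trimming: int, minlen: int) -> bool:
    """Stand-in for the trimming outcome: keep a read if it is still long enough."""
    return length_after_trimming >= minlen


def route_pair(fwd_ok: bool, rev_ok: bool) -> Tuple[Optional[str], Optional[str]]:
    """Return the output file for the forward and for the reverse read.

    None means the read is discarded.
    """
    if fwd_ok and rev_ok:
        return "R1_Paired", "R2_Paired"      # both mates kept, order preserved
    if fwd_ok:
        return "R1_Unpaired", None           # forward read lost its mate
    if rev_ok:
        return None, "R2_Unpaired"           # reverse read lost its mate
    return None, None                        # both reads discarded


if __name__ == "__main__":
    # Trimmed lengths of three hypothetical pairs, with MINLEN:30.
    for fwd_len, rev_len in [(90, 85), (90, 12), (5, 8)]:
        print(fwd_len, rev_len,
              route_pair(survives(fwd_len, 30), survives(rev_len, 30)))
```

Because the two paired files only ever receive reads together, they stay in the same order and of the same length, which is what downstream tools usually expect.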
Old 08-16-2013, 12:52 AM   #56
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by ebioman View Post
  1. What does R1_Unpaired contain ?
  2. Do I use R1_unpaired as well for the assembly or only R1_paired?
  3. In the case when no R1_paired is generated, do I use directly R1_unpaired or does this indicate a serious problem ?
When paired-end reads are trimmed, it can happen that one read of the pair is of such low quality that it's discarded altogether, or that it's simply trimmed such that it's too short for further use (i.e., its length is below some threshold). In those cases, its mate is moved to the R1_Unpaired file (or R2_Unpaired, though that's less typical), since it lost its paired read. If all of your reads end up unpaired, that suggests that either something went wrong in sequencing or you specified parameters incorrectly. I would assume that single-end reads are still usable for assembly, but that's not something I'm versed in, so others would be better qualified to answer that.
Old 08-16-2013, 03:18 AM   #57
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41
Default

Thanks for the fast replies !

The sequence data that was bad enough to give only unpaired output was indeed not the best.
In this case I have several short paired Illumina reads and 2 long mate-pair reads, one of them being less good. I guess I won't process the mate-pair reads and will use them as they are, exclusively for the scaffolding process and not for contig assembly. That should not be too bad.

I wonder still whether I should just discard the unpaired information of the short reads or add them to the SOAPdenovo assembly as well ...
Old 10-17-2013, 12:44 PM   #58
debarryj
Junior Member
 
Location: Georgia

Join Date: Oct 2013
Posts: 2
Default

Greetings and thanks for the great tool!
I am attempting to run Trimmomatic on MiSeq data and the error I am getting is not showing up in internet searches etc. Any help would be appreciated.

I have used FASTQC to check the data and select Trimmomatic parameters. The process dies quickly with the following contents in the error file:

"TrimmomaticPE: Started with arguments: -threads 1 -phred33 -trimlog TRIMlogFILE Agar1943_S1_L001_R1_001-1.fastq Agar1943_S1_L001_R2_001-1.fastq R1_paired R1_unpaired R2_paired R2_unpaired ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:32 TRAILING:32 SLIDINGWINDOW:4:32 MINLEN:150
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception processing reads: M00313:59:000000000-A3E75:1:1101:16432:1530 1:N:0:1 and M00313:59:000000000-A3E75:1:1101:16432:1530 2:N:0:1
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 54
at org.usadellab.trimmomatic.fastq.trim.IlluminaClippingTrimmer.palindromeReadsCompare(IlluminaClippingTrimmer.java:383)
at org.usadellab.trimmomatic.fastq.trim.IlluminaClippingTrimmer.processRecords(IlluminaClippingTrimmer.java:184)
at org.usadellab.trimmomatic.TrimmomaticPE.processSingleThreaded(TrimmomaticPE.java:66)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:278)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:350)
at org.usadellab.trimmomatic.TrimmomaticPE.main(TrimmomaticPE.java:358)

real 0m1.568s
user 0m0.421s
sys 0m0.112s"

I have symbolic links to the adapter and input files.

Here are the first 5 lines of the input files:
$ head -n 5 Agar1943_S1_L001_R1_001-1.fastq
@M00313:59:000000000-A3E75:1:1101:17309:1456 1:N:0:1
CTCCGCTNCGCTCTGTAACTGTGAGGTTTGTGTTGCGGGAACTTAGTATTTTCCTCCTGCGTTTTTATTATGCCATGGAATGATCAGGTAATATTCCTCTGTGATGCTCTGGCCAGGGACTGCTATGAGTCCTTCGGCCATTAGAAAATTCTGTGGCATTTTAGGCAAT
+
?AAAAAA#>>>AF1FGBGGEDAFDFGABFAFGFGGBEACEGFFFHFAFGHH2DEAGGHFFA/BEFGHBGGHGBEGHGFHFHHGBGHHHEFHHHHHFFGHHHHHGHHGFFHGFHHGEHECFHFGHH2EEEF1GHHGAECG@GFHBFGHEHHHGGHHFFHBBD1FGBGFHG
@M00313:59:000000000-A3E75:1:1101:17278:1456 1:N:0:1

$ head -n 5 Agar1943_S1_L001_R2_001-1.fastq
@M00313:59:000000000-A3E75:1:1101:17309:1456 2:N:0:1
NTTGCCTAAAATGCCACAGAATTTTCTAATGGCCGAAGGACTCATAGCAGTCCCTGGCCAGAGCATCACAGAGGAATATTACCTGATCATTCCATGGCATAATAAAAACGCAGGAGGAAAATACTAAGTTCCCGCAACACAAACCTCACAGTTACAGAGCGGAGCGGAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGCGTAGATCTCGGTGGTCGCCGTATCATTACAAAAAAAACACACCTATTATCC
+
#>>>AAFFFFFFGGGGGGGG1FHE3FHGFHHGHFCGGGGHACEGHHFHHFA1FHFGAGEBAB/BGFH1FGAGHGGGHHB2GHHHHHHHHHHHBFGFHHHDHHFHHHHH?EGGG?FGGGFFHHHHHFH1>>FHH/E//BGFGAGHGHHHHHEFGHHBFBGB//@C@-CC-AEEFGHCFEGHHHGCCA9EE990E/.9.CF/:;E@FBFFF?E--;-9@@?@;99F/FFB/;BBFF@?--9----/9///9//
@M00313:59:000000000-A3E75:1:1101:17278:1456 2:N:0:1

Here are the 2 reads mentioned in the error output:
@M00313:59:000000000-A3E75:1:1101:16432:1530 1:N:0:1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###################################

@M00313:59:000000000-A3E75:1:1101:16432:1530 2:N:0:1
AGAACGGCAGACCGTCGAGTAGGGAAAGAGCGAAGATTTCGGTGGCCGCCGTATCATTAAAAAAAACTCAACAATCACATTCCCTTTTTATAGCCAGACTTCTCCTTCAACCTCCCCTCCTATTAATTATTCCATAATTATTTCATCTAACCCAATCCTGTTATGCTCAATCTCATGACACACATCACTTCATCCCTTAACTTTTCATCCTCTACATGCAACACACTACTCTAAAATATACATCACGCATT
+
Old 10-18-2013, 06:03 AM   #59
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by debarryj View Post
Greetings and thanks for the great tool!
I am attempting to run Trimmomatic on MiSeq data and the error I am getting is not showing up in internet searches etc. Any help would be appreciated.
This bug, which should be fixed in the current version, is triggered when a read pair has a big difference in length between the forward and reverse reads - this was rare, since typically the tool worked on untrimmed data, with equal forward and reverse read lengths.

Incidentally, it appears you have enabled the miseq built-in trimming - this prevents the correct detection of adapter read-through by trimmomatic, so I typically would not suggest using both.
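
One way to look for such pairs, and to see whether read lengths vary at all, is sketched below in Python. It assumes four-line FASTQ records and two files with mates in the same order. The helper names and the `max_diff` cut-off are hypothetical. Since untrimmed data is described above as having equal forward and reverse read lengths, a wide spread of lengths is at least a hint that some trimming happened upstream. It is a heuristic, not proof.

```python
from collections import Counter
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence length) for each four-line FASTQ record."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 2:
            break
        yield record[0].rstrip("\r\n"), len(record[1].rstrip("\r\n"))


def length_histogram(handle: TextIO) -> Counter:
    """Count how many reads have each length."""
    return Counter(length for _header, length in read_lengths(handle))


def uneven_pairs(fwd: TextIO, rev: TextIO,
                 max_diff: int = 0) -> Iterator[Tuple[str, int, int]]:
    """Yield (forward header, forward length, reverse length) for pairs whose
    lengths differ by more than max_diff bases."""
    for (header, fwd_len), (_rev_header, rev_len) in zip(read_lengths(fwd),
                                                         read_lengths(rev)):
        if abs(fwd_len - rev_len) > max_diff:
            yield header, fwd_len, rev_len
```

Running `uneven_pairs` over the two input files from the post above should list the pair named in the exception, where the forward read is a short run of N bases and the reverse read is full length.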
Old 10-18-2013, 09:23 AM   #60
debarryj
Junior Member
 
Location: Georgia

Join Date: Oct 2013
Posts: 2
Default

Quote:
Originally Posted by tonybolger View Post
This bug, which should be fixed in the current version, is triggered when a read pair has a big difference in length between the forward and reverse reads - this was rare, since typically the tool worked on untrimmed data, with equal forward and reverse read lengths.

Incidentally, it appears you have enabled the miseq built-in trimming - this prevents the correct detection of adapter read-through by trimmomatic, so I typically would not suggest using both.
Many Thanks! If I may ask one more noob question, what indicated to you that the miseq trimming was enabled? I have been handed this data with very little information and would like to know how to spot it.

Best