SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Trimmomatic quality trimming kga1978 Bioinformatics 26 11-24-2015 11:14 AM
Trimmomatic error while executing Irina Pulyakhina Bioinformatics 15 07-03-2015 05:44 AM
Problem with trimmomatic amango Bioinformatics 9 12-29-2013 09:43 AM
Introducing pBWA [Parallel BWA] dp05yk Bioinformatics 52 05-21-2013 11:27 PM
Introducing our Ion Torrent! nickloman Ion Torrent 34 05-26-2011 06:56 PM

Reply
 
Thread Tools
Old 11-02-2015, 12:24 PM   #181
mgbfx9
Member
 
Location: USA

Join Date: Sep 2015
Posts: 25
Default

Hi GeoMax,

Thank you so much for all the help.

Yes, I saw that and for that reason I posted the percent surviving report.
Yes, I ran FastQC before processing, but after processing FastQC was giving me lots of troubles. Anyway,
Before Timming, total sequence was 3032230, and it didn’t pass per base sequence quality and the adapter content. From the graph, the adapter content (which was Nextera Transposase sequence) which was started at 40bp position and went little over 60% till 229 bp position. I am not sure how to interpret this without sending the graph. But I am not sure how to paste this graph. I tried printScrn, it is still not pasting.
After trimming, total sequence is 1084634, but this sample passed both per base sequence quality and adapter content.

How to check the adapter contamination? Is there a way?

I am not clear about your last question: "How long were the reads to begin with (you have asked for minlength 36)?"
Yes, I asked for minimum length 36, so it would drop the read if it is below 36. And my code at the end was:
ILLUMINACLIP:/home/mydir/Trimmomatic-0.33/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Output was :
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 3032230 Both Surviving: 1084634 (35.77%) Forward Only Surviving: 1933475 (63.76%) Reverse Only Surviving: 606 (0.02%) Dropped: 13515 (0.45%)
TrimmomaticPE: Completed successfully.
mgbfx9 is offline   Reply With Quote
Old 04-29-2017, 01:58 PM   #182
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Quote:
Originally Posted by Brian Bushnell View Post
Hi all,

I'm testing Trimmomatic's performance on adapter removal, and the results are mysteriously bad. So I'd like to make sure I'm not doing anything wrong. This is my command line (modified from the website):

java -Xmx8g -jar trimmomatic-0.32.jar SE -phred33 dirty.fq tclean.fq ILLUMINACLIP:gruseq.fa:2:30:10

...where dirty.fq is a file containing reads with adapter sequences and gruseq.fa is a file containing the adapter sequences. The adapters are inserted synthetically and the reads are tagged, so I know precisely what the correct results should be, and what I'm getting is not really close. Any suggestions?
Hello Brian,

did you figure out what was the issue? I'm perplexed - I wanted to use Trimmomatic and BBduk and compare the results, however, Trimmomatic just would not remove the adapter that is absolutely certainly there and could be found using simply using grep. And they are precisely the case for the palindrome mode.

So I have my test reads that are just two reads - one R1, one R2.

Code:
@E00513:47:HF757ALXX:1:1101:17208:2047 1:N:0:ATCACG
CNAAAAAAAAGATTGCGACCTCGATGTTGGATTAAAATGAACTTTTGGCGCAAAAGTTAAAAGGGTTAGGTCTGTTCGACCTTTAAAATTTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCACGTATGCCGTCTTC
+
<#AAFAJJJJJFJFJFJJJJF<AFJJJJJJFFFJJJJJ<JF-FJJJJJFJA<-FAF-FFJAJFJFAFJJJAAF<<7<JJJ<JJJAJJJJJJJJJFJFJFFFJJFA<FJ<FJ7F<7--7F<JAFAAAFFFJJJ<-FA7FAF<J77AFJJ<-
@E00513:47:HF757ALXX:1:1101:17208:2047 2:N:0:ATCACG
AAAATTTTAAAGGTCGAACAGACCTAACCCTTTTAACTTTTGCGCCCAAAGTTCATTTTAATCCAACATCGAGGTCGCAATCTTTTTTTTCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGACGTATCAAT
+
AAF<FFAFAJJJJAJFJJFJJFJJJAJJJJJFJFFFJJJJJFAF<<-A<-<FFFJJJJFJJ<FFFFJ7<F-77A--7<<FJJ7-<AFF<F7AFJ-7--7--<-77-77-A)7-FJ<7A-7FA<-----77--A))7)7)<)))7<-----
Then I take the following adapter file:
Code:
>PrefixPE/1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>PrefixPE/2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
Both of these could be found in above reads - first one in R1, second one in R2. Both start at position 93.

then I run

Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?
apredeus is offline   Reply With Quote
Old 04-29-2017, 02:05 PM   #183
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Quote:
Originally Posted by tonybolger View Post
Enjoy and happy trimming.
Dear Tony,

few small suggestions that can make the program a lot easier to use -

1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
2) make a simple wrapper around the jar, sort of like FastQC has. This would also allow to "install" the program and thus have the adapter fasta at a fixed location for the program.
apredeus is offline   Reply With Quote
Old 04-29-2017, 02:15 PM   #184
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

I've also found a strange issue with the "keepBothEnds" option.

In the example I have described two posts above (but with bigger files, 25K reads each), the following command
Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10
gives the following output:

Code:
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 25000 Both Surviving: 18039 (72.16%) Forward Only Surviving: 6948 (27.79%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
TrimmomaticPE: Completed successfully
while adding the option to keep both reads leads to failure:

Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
output:

Code:
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
TrimmomaticPE: Completed successfully
This should not happen. My trimmomatic version is 0.36.
apredeus is offline   Reply With Quote
Old 04-29-2017, 02:52 PM   #185
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Quote:
Originally Posted by apredeus View Post
then I run

Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?
One point to note might be that the thresholds you are using in the ILLUMINACLIP command are 30 for the palindrome mode and 10 for the simple mode, and you are not allowing for any mismatches. So it's quite possible that in palindrome mode the score is not reaching the threshold of 30 that trimmomatic needs in order to identify the presence of the adapter, whereas the score of 10 requires fewer matching bases and is more easily reached. Each matching base contributes about 0.6 to the score, according to the trimmomatic manual.

Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.


Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:

>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

It's also possible that at least one of the 2 adapters is different for the RNA-Seq kits.

Last edited by mastal; 04-29-2017 at 04:14 PM.
mastal is offline   Reply With Quote
Old 04-29-2017, 02:59 PM   #186
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Quote:
Originally Posted by apredeus View Post
Dear Tony,

few small suggestions that can make the program a lot easier to use -

1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
Unless you are using a very old version of trimmomatic, you can use the -basein and -baseout options to specify the prefix of the input and output file names.
mastal is offline   Reply With Quote
Old 04-29-2017, 03:15 PM   #187
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Quote:
Originally Posted by apredeus View Post
I've also found a strange issue with the "keepBothEnds" option.

while adding the option to keep both reads leads to failure:

Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
output:

Code:
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
TrimmomaticPE: Completed successfully
This should not happen. My trimmomatic version is 0.36.
Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
This does not mean that trimmomatic has not trimmed the adapters.

The trimlog tells you for each read how many bases have been trimmed. Check the trimlog entries for some of the reads that were ending up in the forward_unpaired file previously, and check whether adapters have been trimmed from both the forward and reverse reads, although both reads of the pair have been kept, rather than the reverse read being discarded.
mastal is offline   Reply With Quote
Old 04-29-2017, 04:46 PM   #188
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Code:
Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.
Ok, I understand now. And it's the rev-comp of /2 that is present in read 1, and vise versa - so that's why the adapters are "switched". Which I guess makes sense since R1 read-through would lead us to reading adapter ligated to R2.

However, it makes it very confusing to make your own file. It should definitely be switched to sequences matching the simple mode.

Quote:
Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:
Yes, I know - I've written before that it works OK with the rev-comp adapters that are also switched. I was just not sure why that was the case.

One remaining question is why does it not work for with the extra option to keep the output paired-ended?
apredeus is offline   Reply With Quote
Old 04-29-2017, 05:36 PM   #189
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Quote:
Originally Posted by mastal View Post
Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
This does not mean that trimmomatic has not trimmed the adapters.
You are right, thank you! The message confused me as well.
apredeus is offline   Reply With Quote
Old 08-16-2018, 11:22 AM   #190
yip
Junior Member
 
Location: CA, USA

Join Date: Oct 2011
Posts: 5
Default

I assume that Trimmomatic processes the reads sequentially, so I don't understand why a run with paired-end fastq input could take >10GB of memory easily. Does anyone know what I am missing?

I'm using trimmomatic-0.36 with 4 threads for a job. The input fastq files are about 4GB in size each.

Thank you!
yip
yip is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:26 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO