Thread | Thread Starter | Forum | Replies | Last Post |
Trimmomatic quality trimming | kga1978 | Bioinformatics | 27 | 09-21-2020 04:00 PM |
Trimmomatic error while executing | Irina Pulyakhina | Bioinformatics | 15 | 07-03-2015 05:44 AM |
Problem with trimmomatic | amango | Bioinformatics | 9 | 12-29-2013 09:43 AM |
Introducing pBWA [Parallel BWA] | dp05yk | Bioinformatics | 52 | 05-21-2013 11:27 PM |
Introducing our Ion Torrent! | nickloman | Ion Torrent | 34 | 05-26-2011 06:56 PM |
#181
Member
Location: USA Join Date: Sep 2015
Posts: 25
Hi GeoMax,
Thank you so much for all the help. Yes, I saw that, and that is why I posted the percent-surviving report. Yes, I ran FastQC before processing, but after processing FastQC was giving me lots of trouble. Anyway, before trimming the total sequence count was 3032230, and the sample failed both the per-base sequence quality and the adapter content checks. In the adapter content graph, the adapter (Nextera Transposase sequence) started at around position 40 bp and rose to a little over 60% by position 229 bp. I am not sure how to interpret this without sending the graph, but I don't know how to paste it here; I tried PrintScreen and it still does not paste. After trimming, the total sequence count is 1084634, and this sample passed both per-base sequence quality and adapter content. How can I check for adapter contamination? Is there a way?
I am not clear about your last question: "How long were the reads to begin with (you have asked for minlength 36)?" Yes, I asked for a minimum length of 36, so a read is dropped if it is shorter than 36 bp. The trimming steps at the end of my command were:
ILLUMINACLIP:/home/mydir/Trimmomatic-0.33/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
The output was:
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 3032230 Both Surviving: 1084634 (35.77%) Forward Only Surviving: 1933475 (63.76%) Reverse Only Surviving: 606 (0.02%) Dropped: 13515 (0.45%)
TrimmomaticPE: Completed successfully.
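The percentages in this summary can be checked by hand: the four outcome categories should add up to the number of input pairs. Below is a minimal Python sketch (the helper names `parse_pe_summary` and `percentages` are hypothetical, not part of Trimmomatic) that parses the summary line quoted above and recomputes each share. It makes plain that most of the drop from 3032230 to 1084634 paired reads comes from the 1933475 pairs in which only the forward read survived, not from pairs dropped outright.

```python
import re
from typing import Dict

# Summary copied verbatim from the TrimmomaticPE output above.
SUMMARY = (
    "Input Read Pairs: 3032230 Both Surviving: 1084634 (35.77%) "
    "Forward Only Surviving: 1933475 (63.76%) "
    "Reverse Only Surviving: 606 (0.02%) Dropped: 13515 (0.45%)"
)

INPUT = "Input Read Pairs"
OUTCOMES = ["Both Surviving", "Forward Only Surviving",
            "Reverse Only Surviving", "Dropped"]


def parse_pe_summary(text: str) -> Dict[str, int]:
    """Return the integer count that follows each label in a TrimmomaticPE summary."""
    counts = {}
    for label in [INPUT] + OUTCOMES:
        match = re.search(re.escape(label) + r":\s*(\d+)", text)
        if match is None:
            raise ValueError(f"label not found: {label}")
        counts[label] = int(match.group(1))
    return counts


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each outcome as a percentage of the input pairs (2 decimal places)."""
    total = counts[INPUT]
    return {label: round(100 * counts[label] / total, 2) for label in OUTCOMES}


if __name__ == "__main__":
    counts = parse_pe_summary(SUMMARY)
    # Every input pair ends up in exactly one outcome category.
    assert sum(counts[label] for label in OUTCOMES) == counts[INPUT]
    print(percentages(counts))
```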
#182
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
Quote:
did you figure out what the issue was? I'm perplexed. I wanted to run both Trimmomatic and BBDuk and compare the results; however, Trimmomatic just would not remove an adapter that is certainly there and can be found simply by using grep. And these reads are precisely the case that palindrome mode is meant for. My test data are just two reads, one R1 and one R2.
Code:
@E00513:47:HF757ALXX:1:1101:17208:2047 1:N:0:ATCACG
CNAAAAAAAAGATTGCGACCTCGATGTTGGATTAAAATGAACTTTTGGCGCAAAAGTTAAAAGGGTTAGGTCTGTTCGACCTTTAAAATTTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCACGTATGCCGTCTTC
+
<#AAFAJJJJJFJFJFJJJJF<AFJJJJJJFFFJJJJJ<JF-FJJJJJFJA<-FAF-FFJAJFJFAFJJJAAF<<7<JJJ<JJJAJJJJJJJJJFJFJFFFJJFA<FJ<FJ7F<7--7F<JAFAAAFFFJJJ<-FA7FAF<J77AFJJ<-
@E00513:47:HF757ALXX:1:1101:17208:2047 2:N:0:ATCACG
AAAATTTTAAAGGTCGAACAGACCTAACCCTTTTAACTTTTGCGCCCAAAGTTCATTTTAATCCAACATCGAGGTCGCAATCTTTTTTTTCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGACGTATCAAT
+
AAF<FFAFAJJJJAJFJJFJJFJJJAJJJJJFJFFFJJJJJFAF<<-A<-<FFFJJJJFJJ<FFFFJ7<F-77A--7<<FJJ7-<AFF<F7AFJ-7--7--<-77-77-A)7-FJ<7A-7FA<-----77--A))7)7)<)))7<-----
The adapter file (test.fa) is:
Code:
>PrefixPE/1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>PrefixPE/2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
Then I run:
Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
Even more curiously, the adapters I've posted above are actually reverse complements of the sequences in TruSeq3-PE.fa. When I use the reverse complements, they are recognized! I think this might be due to peculiarities of RNA-seq library construction, but I am not sure. At any rate, that's strange behaviour for a trimmer, isn't it?
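The grep check mentioned in this post is also a quick answer to the earlier question of how to look for adapter contamination directly. Below is a minimal Python sketch, assuming unwrapped 4-line FASTQ records; the function names are hypothetical. It uses an exact match on the first 12 adapter bases, which is much cruder than Trimmomatic's seed-and-extend alignment. Both adapter sequences above begin with AGATCGGAAGAG, so a single 12-base probe catches either one.

```python
import gzip
from typing import Iterable, Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain text or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # the 2nd line of every record is the sequence
                yield line.strip()


def adapter_fraction(sequences: Iterable[str], adapter: str, probe_len: int = 12) -> float:
    """Fraction of reads containing an exact copy of the adapter's first probe_len bases."""
    probe = adapter[:probe_len]
    total = hits = 0
    for seq in sequences:
        total += 1
        if probe in seq:
            hits += 1
    return hits / total if total else 0.0


# The two test reads quoted above.
R1 = ("CNAAAAAAAAGATTGCGACCTCGATGTTGGATTAAAATGAACTTTTGGCGCAAAAGTTAAAAGGGTTAGG"
      "TCTGTTCGACCTTTAAAATTTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCACGTATGCCGTCTTC")
R2 = ("AAAATTTTAAAGGTCGAACAGACCTAACCCTTTTAACTTTTGCGCCCAAAGTTCATTTTAATCCAACATCG"
      "AGGTCGCAATCTTTTTTTTCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGACGTATCAAT")
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"

if __name__ == "__main__":
    # Offset of the full adapter in each read (-1 would mean "not found").
    print(R1.find(ADAPTER_R1), R2.find(ADAPTER_R2))
    print(adapter_fraction([R1, R2], ADAPTER_R1))
```

On a real file the same check would be something like `adapter_fraction(fastq_sequences("test.R1.fastq"), ADAPTER_R1)`.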
#183
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
Dear Tony,
A few small suggestions that could make the program a lot easier to use:
1) Require just one tag/prefix for the output and use it to construct the output file names. Four file names make the command unnecessarily long and unreadable.
2) Provide a simple wrapper script around the jar, as FastQC does. This would also make it possible to "install" the program and keep the adapter FASTA files at a fixed location.
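The first suggestion is easy to approximate on the user side. Below is a minimal Python sketch of such a wrapper. The function names and the `_R1_paired`-style suffixes are hypothetical, and the sketch only assembles the same `java -jar ... PE` call used in this thread. As noted further down the thread, newer Trimmomatic versions already provide `-basein`/`-baseout` for this purpose.

```python
import subprocess
from typing import List, Sequence

# Hypothetical suffixes for the four TrimmomaticPE outputs, in the order PE mode
# expects them: forward paired, forward unpaired, reverse paired, reverse unpaired.
SUFFIXES = ("R1_paired", "R1_unpaired", "R2_paired", "R2_unpaired")


def trimmomatic_pe_command(jar: str, r1: str, r2: str, prefix: str,
                           steps: Sequence[str], trimlog: str = "") -> List[str]:
    """Build a TrimmomaticPE argument list, deriving all four output names from one prefix."""
    cmd = ["java", "-jar", jar, "PE"]
    if trimlog:
        cmd += ["-trimlog", trimlog]
    cmd += [r1, r2]
    cmd += [f"{prefix}_{suffix}.fq.gz" for suffix in SUFFIXES]
    cmd += list(steps)
    return cmd


def run_trimmomatic_pe(*args, **kwargs) -> None:
    """Run the assembled command; raises CalledProcessError if Trimmomatic fails."""
    subprocess.run(trimmomatic_pe_command(*args, **kwargs), check=True)
```

For example, `trimmomatic_pe_command("trimmomatic.jar", "test.R1.fastq", "test.R2.fastq", "test", ["ILLUMINACLIP:TruSeq3-PE.fa:0:30:10"])` builds the thread's command with outputs `test_R1_paired.fq.gz` and so on.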
#184
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
I've also found a strange issue with the "keepBothReads" option.
In the example I described two posts above (but with bigger files, 25K reads each), the following command
Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10
gives
Code:
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 25000 Both Surviving: 18039 (72.16%) Forward Only Surviving: 6948 (27.79%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
TrimmomaticPE: Completed successfully
whereas with keepBothReads switched on
Code:
java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
it gives
Code:
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
TrimmomaticPE: Completed successfully
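The two commands differ only in the optional trailing ILLUMINACLIP fields. As I read the Trimmomatic manual, the step has the form `ILLUMINACLIP:<fasta>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>:<minAdapterLength>:<keepBothReads>`, with minAdapterLength defaulting to 8 and keepBothReads defaulting to false. Below is a small Python sketch (the class and function names are hypothetical) that splits the step into named fields. It also checks the arithmetic: the 6948 forward-only pairs from the first run account exactly for the extra pairs kept in the second, since 18039 + 6948 = 24987.

```python
from dataclasses import dataclass


@dataclass
class IlluminaClip:
    fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: int = 8     # default when the field is omitted
    keep_both_reads: bool = False   # default when the field is omitted


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split an ILLUMINACLIP step into named fields (assumes no ':' in the FASTA path)."""
    name, _, rest = step.partition(":")
    if name != "ILLUMINACLIP":
        raise ValueError(f"not an ILLUMINACLIP step: {step}")
    fields = rest.split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"unexpected number of ILLUMINACLIP fields: {step}")
    clip = IlluminaClip(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))
    if len(fields) > 4:
        clip.min_adapter_length = int(fields[4])
    if len(fields) > 5:
        clip.keep_both_reads = fields[5].lower() == "true"
    return clip


if __name__ == "__main__":
    print(parse_illuminaclip("ILLUMINACLIP:TruSeq3-PE.fa:0:30:10"))
    print(parse_illuminaclip("ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE"))
    # Run 1: 18039 pairs kept + 6948 forward-only; run 2 keeps all of them as pairs.
    assert 18039 + 6948 == 24987
```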
#185
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
Quote:
Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.
Edit: The adapter FASTA sequences you have above appear to be the reverse complements of the sequences in my version of the TruSeq3-PE adapter FASTA file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeq3-PE file is:
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
It's also possible that at least one of the two adapters is different for the RNA-Seq kits.
Last edited by mastal; 04-29-2017 at 04:14 PM.
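This observation is easy to verify. Below is a minimal Python sketch (the `revcomp` helper is written just for this example) that compares the sequences from the test.fa posted earlier with the TruSeq3-PE entries above. Each posted sequence turns out to be the reverse complement of the TruSeq3-PE entry with the *other* label. In other words, the posted file is in the orientation in which the adapter actually appears in the reads (simple-mode orientation), which is what the note about palindrome mode predicts.

```python
# Complement table for A, C, G, T and N (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# TruSeq3-PE.fa entries quoted above (palindrome-mode orientation).
TRUSEQ3_PE = {
    "PrefixPE/1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PrefixPE/2": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
}
# The test.fa posted earlier in the thread.
POSTED = {
    "PrefixPE/1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "PrefixPE/2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
}

if __name__ == "__main__":
    for name, seq in POSTED.items():
        matches = [key for key, value in TRUSEQ3_PE.items() if value == revcomp(seq)]
        print(name, "is the reverse complement of TruSeq3-PE", matches)
```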
#186
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
Unless you are using a very old version of trimmomatic, you can use the -basein and -baseout options to specify the prefix of the input and output file names.
#187
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
Quote:
This does not mean that Trimmomatic has not trimmed the adapters. The trimlog tells you, for each read, how many bases have been trimmed. Look at the trimlog entries for some of the reads that previously ended up in the forward_unpaired file, and check whether adapters have now been trimmed from both the forward and the reverse read, with both reads of the pair kept instead of the reverse read being discarded.
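To follow this suggestion for more than a handful of reads, the trimlog can be parsed. According to the Trimmomatic manual, each trimlog line holds the read name, the surviving sequence length, the location of the first surviving base (i.e. the amount trimmed from the start), the location of the last surviving base in the original read, and the amount trimmed from the end. Below is a minimal Python sketch that assumes whitespace-separated fields. The names are hypothetical, and the example line in the test is made up for illustration rather than taken from the logs in this thread.

```python
from typing import Iterable, NamedTuple


class TrimlogEntry(NamedTuple):
    read_name: str
    surviving_length: int
    first_surviving_base: int  # also the number of bases trimmed from the start
    last_surviving_base: int   # position in the original read
    trimmed_from_end: int


def parse_trimlog_line(line: str) -> TrimlogEntry:
    """Parse one trimlog line; the read name may contain spaces, so split from the right."""
    name, length, first, last, trimmed_end = line.rstrip("\n").rsplit(maxsplit=4)
    return TrimlogEntry(name, int(length), int(first), int(last), int(trimmed_end))


def count_end_trimmed(lines: Iterable[str]) -> int:
    """Number of reads with bases removed from the 3' end.

    When ILLUMINACLIP is the only step in the command, as in the runs above,
    any 3' trimming comes from adapter clipping.
    """
    return sum(1 for line in lines
               if line.strip() and parse_trimlog_line(line).trimmed_from_end > 0)
```

Typical use: `with open("trim.log") as log: print(count_end_trimmed(log))`. Running this separately on the lines for the /1 and /2 reads shows whether both mates of each pair were clipped.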
#188
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
Code:
Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.
However, this makes it very confusing to create your own adapter file. It should definitely be switched to sequences matching the simple mode.
Quote:
One remaining question is why it does not work with the extra option to keep the output paired-end.
#189
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
You are right, thank you! The message confused me as well.
#190
Junior Member
Location: CA, USA Join Date: Oct 2011
Posts: 5
I assume that Trimmomatic processes the reads sequentially, so I don't understand why a run with paired-end FASTQ input can easily take more than 10 GB of memory. Does anyone know what I am missing?
I'm running trimmomatic-0.36 with 4 threads. The input FASTQ files are about 4 GB each. Thank you!
yip