**Brian Bushnell** · 06-29-2016, 09:36 AM

The number of threads and the number of processes are different things. Also, that message does not display elapsed time but CPU time, which (for a fully multithreaded program) is the number of threads times the elapsed time, so it won't change as you adjust the number of threads. Your command looks fine to me; you are presumably using 16 worker threads. You can test that by comparing the elapsed time to the CPU time and seeing whether one is in fact 16 times the other (just add "time " before "java" in the command line), or by running top on the node where the process is running and looking at the CPU utilization. If you want it to go faster, though, you can try BBDuk instead.
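The CPU-time versus elapsed-time comparison can also be scripted. The following is only a minimal sketch, assuming a Unix-like system (on Windows the child CPU fields of `os.times()` are reported as zero); the function name `cpu_to_elapsed_ratio` and the stand-in workload are illustrative and not part of Trimmomatic or BBTools. If all worker threads are busy, the ratio should come out close to the thread count.

```python
import os
import subprocess
import sys
import time


def cpu_to_elapsed_ratio(command):
    """Run `command` and return (elapsed_seconds, cpu_seconds, cpu / elapsed)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    after = os.times()
    # children_user / children_system accumulate the CPU time of child
    # processes that have been waited for, which subprocess.run() does.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return elapsed, cpu, cpu / elapsed


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with the real
    # ["java", "-Xmx8g", "-jar", "trimmomatic-0.36.jar", ...] argument list.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    elapsed, cpu, ratio = cpu_to_elapsed_ratio(cmd)
    print(f"elapsed={elapsed:.2f}s cpu={cpu:.2f}s ratio={ratio:.1f}")
```

A ratio well below the requested thread count suggests the worker threads are waiting on something else, such as I/O or compression.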

**mslider** · 06-29-2016, 11:27 AM

Hi,

Thanks for your help, Brian. I don't know BBDuk; I'm going to test your program.
Do you think it's necessary to use Java garbage collector parameters such as -XX:ParallelGCThreads=4 -XX:+DoEscapeAnalysis to optimize the process?

Thank you.

**Brian Bushnell** · 06-29-2016, 12:35 PM

No. I never use those. Theoretically "-XX:+DoEscapeAnalysis" might increase speed a tiny amount, but it's the kind of thing that typically gives timing results within the margin of error, and varies substantially between Java versions (both in its effects and what the default setting is); it's entirely possible that it's already enabled by default. There are a lot of non-default Java flags you can add and they mostly just increase the length of your command line. And I don't really see any reason to cap the number of parallel GC threads at 4 in any case, unless you are running on a shared system and are only allowed to use 4 threads max.

Well, anyway, I thought I'd test it, as an example...

Code:

D:\temp\contam>java -ea -Xmx1g jgi.SplitPairsAndSingles rp in=ecc31.fq.gz unpigz
Executing jgi.SplitPairsAndSingles [rp, in=ecc31.fq.gz, unpigz]

Set INTERLEAVED to false
No output stream specified.  To write to stdout, please specify 'out=stdout.fq' or similar.

Input:                          12000000 reads          1797457160 bases.
Result:                         12000000 reads (100.00%)        1797457160 bases (100.00%)
Pairs:                          12000000 reads (100.00%)        1797457160 bases (100.00%)
Singletons:                     0 reads (0.00%)         0 bases (0.00%)

Time:                           20.177 seconds.
Reads Processed:      12000k    594.73k reads/sec
Bases Processed:       1797m    89.08m bases/sec

Code:

D:\temp\contam>java -XX:ParallelGCThreads=4 -XX:+DoEscapeAnalysis -ea -Xmx1g -cp D:\temp\BBTools_public\BBMap_35.92\bbmap\current jgi.SplitPairsAndSingles rp in=ecc31.fq.gz unpigz
Executing jgi.SplitPairsAndSingles [rp, in=ecc31.fq.gz, unpigz]

Set INTERLEAVED to false
No output stream specified.  To write to stdout, please specify 'out=stdout.fq' or similar.

Input:                          12000000 reads          1797457160 bases.
Result:                         12000000 reads (100.00%)        1797457160 bases (100.00%)
Pairs:                          12000000 reads (100.00%)        1797457160 bases (100.00%)
Singletons:                     0 reads (0.00%)         0 bases (0.00%)

Time:                           20.242 seconds.
Reads Processed:      12000k    592.83k reads/sec
Bases Processed:       1797m    88.80m bases/sec

That's a 1.7% speed difference (in favor of NOT using those flags), which is under the margin of error of around 2.5% that I see after running it multiple times.
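For anyone repeating this kind of comparison, the arithmetic can be sketched as below. This is an illustration only: the helper names are made up, and the "spread" measure (max minus min over the mean) is just one crude stand-in for a margin of error. Only one run per configuration is listed above, and for that single pair the difference works out to roughly 0.3%; the 1.7% figure presumably reflects the repeated runs, whose individual timings are not shown.

```python
from statistics import mean


def percent_difference(baseline_times, flagged_times):
    """Percent by which the flagged runs are slower (positive) or faster (negative)."""
    base = mean(baseline_times)
    return 100.0 * (mean(flagged_times) - base) / base


def percent_spread(times):
    """Run-to-run spread, (max - min) / mean in percent; a crude margin of error."""
    return 100.0 * (max(times) - min(times)) / mean(times)


if __name__ == "__main__":
    # The two "Time:" values listed in the thread, one run per configuration.
    without_flags = [20.177]
    with_flags = [20.242]
    diff = percent_difference(without_flags, with_flags)
    print(f"{diff:.2f}% slower with the extra flags")
```

With several timings per configuration in each list, a difference smaller than `percent_spread` of either list is not distinguishable from noise.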

**mslider** · 06-30-2016, 12:00 AM

Brian, thank you for this detailed answer and quick test.
I have installed BBMap and BBDuk to test them, but I can't find an easy way to convert my Trimmomatic command line below to a BBDuk command line:

java -Xmx8g -jar trimmomatic-0.36.jar PE -threads 16 -phred33 ERR532589_1.fastq.gz ERR532589_2.fastq.gz Out_ERR532589_1.fastq.gz Out_unpaired_ERR532589_1.fastq.gz Out_ERR532589_2.fastq.gz Out_unpaired_ERR532589_2.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:40:15:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Do you have some examples that could help me, please?

Thank you,
Laurent

**Brian Bushnell** · 06-30-2016, 09:25 AM

Hi Laurent,

There are examples in /bbmap/docs/guides/BBDukGuide.txt

But for reference, I recommend this command:

Code:

bbduk.sh -Xmx1g t=16 in=ERR532589_#.fastq.gz out=Out_ERR532589_#.fastq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe qtrim=rl trimq=12 minlen=36

**tonybolger** · 06-30-2016, 10:19 AM

Originally posted by mslider:

What is the best way to optimize my command line? It seems that whether I use 4, 6, ... or 16 threads, the behaviour and the elapsed time stay the same.

I have already given you the long answer by email, but for the benefit of the rest of the community, the key points are:

Use of compressed output is the typical bottleneck in Trimmomatic - this part is (currently) limited to one thread per output file, and in many cases, 2 - 4 worker threads are already enough to move the bottleneck to the output compression threads.
Assuming you want output compression, you probably want to run multiple datasets in parallel (as multiple Trimmomatic processes started from e.g. the shell, xargs or queuing system jobs) and use e.g. 4 worker threads each.
Input decompression is also one thread per file, but since decompression is much faster, use of compressed input files only matters if you are really pushing things, beyond e.g. 12 worker threads. And you will need a decent disk setup to avoid it being a bottleneck.
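The "several processes with a few worker threads each" approach could be sketched roughly as follows. This is an assumption-laden illustration, not an official Trimmomatic wrapper: the function names are invented, the command is the one posted earlier in the thread with the sample prefix substituted, and the sample names other than ERR532589 are placeholders. The `__main__` section only prints the commands; calling `run_all` would actually launch them.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def trimmomatic_command(sample, threads=4, jar="trimmomatic-0.36.jar"):
    """Build the paired-end command from the thread for one sample prefix."""
    return [
        "java", "-Xmx8g", "-jar", jar, "PE",
        "-threads", str(threads), "-phred33",
        f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz",
        f"Out_{sample}_1.fastq.gz", f"Out_unpaired_{sample}_1.fastq.gz",
        f"Out_{sample}_2.fastq.gz", f"Out_unpaired_{sample}_2.fastq.gz",
        "ILLUMINACLIP:TruSeq3-PE-2.fa:2:40:15:8:true",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


def run_all(samples, jobs=4, threads=4):
    """Run up to `jobs` Trimmomatic processes at once; return exit codes in order."""
    def run_one(sample):
        return subprocess.run(trimmomatic_command(sample, threads)).returncode

    # Threads here only wait on the child processes; the real work happens in java.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, samples))


if __name__ == "__main__":
    # ERR532589 comes from the thread; the other names are placeholders.
    for name in ["ERR532589", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D"]:
        print(shlex.join(trimmomatic_command(name)))
```

With 4 jobs of 4 worker threads each, a 16-core node stays busy while each process has its own output compression threads.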

Hope this helps

**mslider** · 06-30-2016, 11:12 AM

Okay, good.
Thank you Tony, I have received your message by email.
Brian -> thank you so much for the command line.

Mark.


trimmomatic-0.36 problem
