Originally posted by GenoMax
Code:
clumpify.sh in=file.fq out=deduped.fq dedupe addcount
clumpify.sh -Xmx44g in=$INPUT out=$OUTPUT.clumped.fq.gz reorder=t overwrite=t

Time: 3540.405 seconds.
[COLOR="red"]Reads Processed: 27914k 7.88k reads/sec[/COLOR]
Bases Processed: 1423m 0.40m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Total time: 3540.575 seconds.
clumpify.sh -Xmx48g in=$INPUT out=$OUTPUT.clumped.fq.gz reorder=t overwrite=t \
    dedupe=t addcount=t dupedist=40 optical=t

Time: 38030.384 seconds.
[COLOR="Red"]Reads Processed: 27914k 0.73k reads/sec[/COLOR]
Bases Processed: 1423m 0.04m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Duplicates Found: 27931
Total time: 41299.460 seconds.
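The highlighted rates in the two runs above follow from the read count and the elapsed time, so they are easy to sanity-check by hand. A minimal sketch, assuming awk is available; reads_per_sec_k is a hypothetical helper name, not part of BBTools:

```shell
# Hypothetical helper (not part of BBTools): throughput in thousands of reads/sec.
# $1 = number of reads, $2 = elapsed seconds.
reads_per_sec_k() {
    awk -v reads="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", reads / secs / 1000 }'
}

# Values copied from the two logs above.
reads_per_sec_k 27914336 3540.405    # reorder only
reads_per_sec_k 27914336 38030.384   # reorder + dedupe + optical
```

This prints 7.88 and 0.73, matching the logged figures, which puts the dedupe run at roughly a tenfold slowdown on the same input.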
dedupe=f optical=f (default)
Nothing happens with regard to duplicates.

dedupe=t optical=f
All duplicates are detected, whether optical or not. All copies except one are removed for each duplicate.

dedupe=f optical=t
Nothing happens.

dedupe=t optical=t
Only optical duplicates (those with an X or Y coordinate within dist) are detected. All copies except one are removed for each duplicate.

The allduplicates flag removes all copies of each duplicate, rather than leaving a single copy. But like optical, it has no effect unless dedupe=t.
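The four dedupe/optical combinations above reduce to a simple rule: optical only matters once dedupe is on. A small sketch of that decision logic, written as a hypothetical shell helper (describe_dedupe_mode is not a BBTools command, and it only accepts the literal values t and f as an assumption for illustration):

```shell
# Hypothetical helper mirroring the four cases listed above.
# $1 = dedupe value (t/f), $2 = optical value (t/f).
describe_dedupe_mode() {
    if [ "$1" != "t" ]; then
        # optical has no effect unless dedupe=t
        echo "no duplicate handling"
    elif [ "$2" = "t" ]; then
        echo "remove optical duplicates only"
    else
        echo "remove all duplicates"
    fi
}

describe_dedupe_mode f t
describe_dedupe_mode t t
```

The first call reports no duplicate handling, which is the easy case to get wrong: optical=t on its own does nothing.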
clumpify.sh in=reads.fq.gz out=clumped.fq.gz reorder

Time: 40.496 seconds.
Reads Processed: 18059k 445.97k reads/sec
Bases Processed: 1824m 45.04m bases/sec
Reads In: 18059968
Clumps Formed: 809674
Total time: 40.680 seconds.
clumpify.sh in=reads.fq.gz out=clumped2.fq.gz reorder dedupe optical dupedist=40 addcount

Time: 39.087 seconds.
Reads Processed: 18059k 462.04k reads/sec
Bases Processed: 1824m 46.67m bases/sec
Reads In: 18059968
Clumps Formed: 405473
Duplicates Found: 864
Total time: 39.135 seconds.
reformat.sh in=reads.fq.gz out=1pair.fq reads=1
cat 1pair.fq 1pair.fq 1pair.fq > 3pair.fq
clumpify.sh in=3pair.fq out=deduped_3pair.fq dedupe addcount
cat deduped_3pair.fq

@HISEQ13:296:CA8E8ANXX:7:1101:1432:1903 1:N:0:ACGGAACTGTTCCG copies=3
NTTGGATTGCATTAAGGAGCGAGGATGAGCATTCCATTCACCCGCTGGCCGGAAGAGTTTGCCCGTCGCTATCGGGAAAAAGGCTACTGGCAGGATTTGCC
+
!<=@=BFDGGGGGGBDDFGGGBGDGGFGGGGGGGGGGGGGGGGGGGGGBG@C/>EGGGGGGCG>DGGGGGGGCGGGGGGGGB>GGGGGGGGGGGGGCDGED
@HISEQ13:296:CA8E8ANXX:7:1101:1432:1903 2:N:0:ACGGAACTGTTCCG copies=3
TCGTTGAGCAGTTGCACCACGCGAATGGAGGAATGTTCTGTGACGAAAGTATTGAGGAAATCATCCCCGCTAAACAGCGCATGTTGGCGATCGGCAATCAG
+
BBBBBF0=1;@FGGDDGEGGDGD/EGGG:>E/BGGFDGGGGG@FGBEFGGEGGGGGDCGGCDGGCCCGGBGGDDGGEGGD>GGGGGGFD8?D<FGDG/EGC
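One way to check a deduplicated file made with addcount is to add up the copies= tags in the headers and compare the total with the number of reads that went in. A rough sketch, assuming 4-line FASTQ records and assuming that a header without a copies= tag stands for a single read; sum_copies is a hypothetical helper, not part of BBTools:

```shell
# Hypothetical helper: sum the copies=N tags on FASTQ header lines.
# Header lines are taken to be lines 1, 5, 9, ... (4-line records).
# Headers with no copies= tag are counted as 1 (an assumption).
sum_copies() {
    awk 'NR % 4 == 1 {
        n = 1
        for (i = 1; i <= NF; i++)
            if ($i ~ /^copies=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
        total += n
    }
    END { print total + 0 }' "$1"
}

# Usage (file name taken from the example above):
# sum_copies deduped_3pair.fq
```

For the two records shown above this would print 6, the same as the six reads (three pairs) that were concatenated into 3pair.fq.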
PR NI VIRT    RES    SHR   S %CPU  %MEM TIME+     COMMAND
20 0  56.006g 0.028t 13060 S 100.3 22.8 343:27.03 java
20 0  48424   1864   1324  S 0.0   0.0  0:00.00   pigz
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
java -ea -Xmx48g -Xms48g -cp ~/package/bbmap/current/ clump.Clumpify -Xmx48g in=in.fastq.gz out=out.fq.bz2 dedupe=t addcount=t dupedist=40 optical=t
Executing clump.Clumpify [-Xmx48g, in=in.fastq.gz, out=out.clumped.fq.bz2, dedupe=t, addcount=t, dupedist=40, optical=t]

Clumpify version 37.17
Read Estimate: 30447286
Memory Estimate: 13555 MB
Memory Available: 38586 MB
Set groups to 1
Executing clump.KmerSort [in1=in.fas in2=null, out1=out.clumped.fq.bz2, out2=null, groups=1, ecco=false, rename=false, shortname=f, [COLOR="Red"]unpair=false[/COLOR], repair=false, namesort=false, ow=true, -Xmx48g, dedupe=t, addcount=t, dupedist=40, optical=t]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 95.473 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.133 seconds.
Sorting.
Sort time: 36.426 seconds.
Making clumps.
Clump time: 6.788 seconds.
Deduping.
PR NI VIRT    RES    SHR   S %CPU %MEM TIME+    COMMAND
20 0  56.205g 0.023t 13008 S 99.7 18.7 10:00.83 java
20 0  188364  3932   2448  R 0.3  0.0  0:00.75  top
20 0  81452   1256   1012  S 0.0  0.0  0:00.00  pbzip2
Executing clump.KmerSplit [in1=in.fastq.gz, in2=null, out=out.fq.bz2, out2=null, groups=11, ecco=false, addname=f, shortname=f, [COLOR="Blue"]unpair=true[/COLOR], repair=f, namesort=f, ow=true, -Xmx16g, dedupe=t, addcount=t, dupedist=40, optical=t]

Input is being processed as unpaired
Made a comparator with k=31, seed=1, border=1, hashes=4
Time: 54.743 seconds.
Reads Processed: 27914k 509.91k reads/sec
Bases Processed: 1423m 26.01m bases/sec
Executing clump.KmerSort3 [in1=in.clumped_clumpify_p1_temp%_2e77f340ad809301.fq.bz2, in2=null, out=out.clumped.fq.bz2, out2=null, groups=11, ecco=f, addname=false, shortname=f, [COLOR="red"]unpair=f[/COLOR], repair=false, namesort=false, ow=true, -Xmx16g, dedupe=t, addcount=t, dupedist=40, optical=t]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Making 2 fetch threads.
Starting threads.
Fetching reads.
Exception in thread "Thread-57" *Control-C or similar caught [sig=15], quitting...
Exception in thread "Thread-58" Terminator thread: premature exit requested - quitting...
java.lang.RuntimeException: Duplicate process for file nutrinetat_vegb11_edi-0_r2y.clumped_clumpify_p1_temp0_2e77f340ad809301.fq.bz2
    at fileIO.ReadWrite.addProcess(ReadWrite.java:1599)
    at fileIO.ReadWrite.getInputStreamFromProcess(ReadWrite.java:1050)
    at fileIO.ReadWrite.getUnpbzip2Stream(ReadWrite.java:986)
    at fileIO.ReadWrite.getBZipInputStream2(ReadWrite.java:1086)
    at fileIO.ReadWrite.getBZipInputStream(ReadWrite.java:1066)
    at fileIO.ReadWrite.getInputStream(ReadWrite.java:802)
    at fileIO.ByteFile1.open(ByteFile1.java:261)
    at fileIO.ByteFile1.<init>(ByteFile1.java:96)
    at fileIO.ByteFile.makeByteFile(ByteFile.java:26)
    at stream.FastqReadInputStream.<init>(FastqReadInputStream.java:61)
    at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
    at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:55)
    at clump.KmerSort3$FetchThread.fetchNext(KmerSort3.java:853)
    at clump.KmerSort3$FetchThread.run(KmerSort3.java:825)
PR NI VIRT    RES    SHR   S %CPU %MEM TIME+   COMMAND
20 0  188332  3956   2452  R 0.3  0.0  0:04.02 top
20 0  23.961g 4.125g 12916 S 0.0  1.6  0:42.05 java
20 0  81452   1256   1012  S 0.0  0.0  0:00.01 pbzip2
20 0  1950332 173572 1056  S 0.0  0.1  0:03.36 pbzip2
20 0  1941568 173768 1052  S 0.0  0.1  0:03.49 pbzip2
bushnell@gpint209:/global/projectb/scratch/bushnell/chiayi$ clumpify.sh in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
java -ea -Xmx63g -Xms63g -cp /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/current/ clump.Clumpify in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g
Executing clump.Clumpify [in=chiayi.fq.gz, out=clumped.fq.gz, -Xmx63g]

Clumpify version 37.22
Read Estimate: 30447286
Memory Estimate: 13555 MB
Memory Available: 50656 MB
Set groups to 1
Executing clump.KmerSort [in1=chiayi.fq.gz, in2=null, out1=clumped.fq.gz, out2=null, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, -Xmx63g]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 18.757 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.127 seconds.
Sorting.
Sort time: 3.903 seconds.
Making clumps.
Clump time: 1.112 seconds.
Writing.
Waiting for writing to complete.
Write time: 15.535 seconds.
Done!
Time: 39.821 seconds.
Reads Processed: 27914k 701.00k reads/sec
Bases Processed: 1423m 35.75m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Total time: 40.089 seconds.
bushnell@gpint209:/global/projectb/scratch/bushnell/chiayi$ clumpify.sh in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g reorder
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
java -ea -Xmx63g -Xms63g -cp /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/current/ clump.Clumpify in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g reorder
Executing clump.Clumpify [in=chiayi.fq.gz, out=clumped.fq.gz, -Xmx63g, reorder]

Clumpify version 37.22
Read Estimate: 30447286
Memory Estimate: 13555 MB
Memory Available: 50656 MB
Set groups to 1
Executing clump.KmerSort [in1=chiayi.fq.gz, in2=null, out1=clumped.fq.gz, out2=null, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, -Xmx63g, reorder]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 18.471 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.170 seconds.
Sorting.
Sort time: 4.112 seconds.
Making clumps.
Clump time: 19.301 seconds.
Writing.
Waiting for writing to complete.
Write time: 13.423 seconds.
Done!
Time: 56.050 seconds.
Reads Processed: 27914k 498.02k reads/sec
Bases Processed: 1423m 25.40m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Total time: 56.125 seconds.
bushnell@gpint209:/global/projectb/scratch/bushnell/chiayi$ clumpify.sh in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g reorder dedupe
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
java -ea -Xmx63g -Xms63g -cp /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/current/ clump.Clumpify in=chiayi.fq.gz out=clumped.fq.gz -Xmx63g reorder dedupe
Executing clump.Clumpify [in=chiayi.fq.gz, out=clumped.fq.gz, -Xmx63g, reorder, dedupe]

Clumpify version 37.22
Read Estimate: 30447286
Memory Estimate: 13555 MB
Memory Available: 50656 MB
Set groups to 1
Executing clump.KmerSort [in1=chiayi.fq.gz, in2=null, out1=clumped.fq.gz, out2=null, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, -Xmx63g, reorder, dedupe]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 18.377 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.174 seconds.
Sorting.
Sort time: 4.421 seconds.
Making clumps.
Clump time: 19.694 seconds.
Deduping.
Dedupe time: 0.767 seconds.
Writing.
Waiting for writing to complete.
Write time: 5.675 seconds.
Done!
Time: 49.223 seconds.
Reads Processed: 27914k 567.09k reads/sec
Bases Processed: 1423m 28.92m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Duplicates Found: 20066115
Total time: 49.299 seconds.
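To put the Duplicates Found figure in proportion, it helps to express it as a share of Reads In. A minimal sketch that pulls both numbers out of a saved log, assuming the log has one statistic per line as Clumpify prints it to the terminal; duplicate_percent is a hypothetical helper, not part of BBTools:

```shell
# Hypothetical helper: percentage of input reads reported as duplicates.
# Reads a Clumpify log on stdin and takes the last field of the
# "Reads In:" and "Duplicates Found:" lines.
duplicate_percent() {
    awk '/^Reads In:/         { reads = $NF }
         /^Duplicates Found:/ { dups  = $NF }
         END {
             if (reads > 0) printf "%.2f\n", 100 * dups / reads
             else           print "NA"
         }'
}

# Figures from the dedupe run above.
printf 'Reads In: 27914336\nDuplicates Found: 20066115\n' | duplicate_percent
```

For this run it prints 71.88, meaning roughly 72% of the reads were flagged as duplicates when dedupe was used without optical.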
PID  USER     PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
9032 bushnell 20 0  66.1g 16g  11m  S 1474 13.5 1:16.19 java
9177 bushnell 20 0  666m  229m 1028 S 600  0.2  1:45.25 pbzip2
bushnell@gpint209:/global/projectb/scratch/bushnell/chiayi$ clumpify.sh in=chiayi.fq.bz2 out=clumped.fq.bz2 -Xmx63g reorder dedupe
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
java -ea -Xmx63g -Xms63g -cp /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/current/ clump.Clumpify in=chiayi.fq.bz2 out=clumped.fq.bz2 -Xmx63g reorder dedupe
Executing clump.Clumpify [in=chiayi.fq.bz2, out=clumped.fq.bz2, -Xmx63g, reorder, dedupe]

Clumpify version 37.22
Read Estimate: 36800962
Memory Estimate: 28076 MB
Memory Available: 50656 MB
Set groups to 1
Executing clump.KmerSort [in1=chiayi.fq.bz2, in2=null, out1=clumped.fq.bz2, out2=null, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, -Xmx63g, reorder, dedupe]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 18.779 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.153 seconds.
Sorting.
Sort time: 4.351 seconds.
Making clumps.
Clump time: 21.613 seconds.
Deduping.
Dedupe time: 0.795 seconds.
Writing.
Waiting for writing to complete.
Write time: 6.608 seconds.
Done!
Time: 52.520 seconds.
Reads Processed: 27914k 531.50k reads/sec
Bases Processed: 1423m 27.11m bases/sec
Reads In: 27914336
Clumps Formed: 2997579
Duplicates Found: 20066115
Total time: 52.575 seconds.
Executing clump.Clumpify [-Xmx16g, in=in.fastq.gz, out=out.fq.gz, dedupe, reorder]
PID   USER   PR NI VIRT    RES    SHR   S %CPU  %MEM TIME+    COMMAND
97955 cc5544 20 0  25.117g 9.998g 12388 S 100.0 4.0  36:38.45 java
98052 cc5544 20 0  1924368 17736  700   S 0.0   0.0  1:30.90  pigz