Thread | Thread Starter | Forum | Replies | Last Post |
Introducing BBDuk: Adapter/Quality Trimming and Filtering | Brian Bushnell | Bioinformatics | 335 | 10-29-2020 03:23 AM |
Primer filtering with bbduk / bbduk2.sh | Latrunculia | Illumina/Solexa | 0 | 10-07-2016 06:33 AM |
How do you specify error rate in BBduk adapter trimming? | antifolate | Bioinformatics | 11 | 07-07-2016 04:00 PM |
Illumina-Tag sequencing, filtering homopolymers/entropy based filtering | tonybert | Bioinformatics | 0 | 12-30-2014 02:23 PM |
AddOrReplaceReadGroups error (that is probably a more general system or Java error) | efoss | Bioinformatics | 4 | 12-24-2012 04:01 PM |
#1 |
Member
Location: South Africa Join Date: Sep 2013
Posts: 12
Hello,
I am getting a Java error when trying to use BBDuk to remove low-entropy sequences from a FASTQ file. The libraries were made using Ribo-Zero, so there are a number of poly-T sequences I would like to remove. I have previously used BBDuk on the same library to remove PhiX and adapter sequences with no problem. The file has ~135 million 100 bp SE reads. I am running on a node with 24 cores and 128 GiB RAM running CentOS Linux release 7.3.1611 and java version "1.7.0_131". I get this error with or without the -Xmx flag.
Code:
$ bbduk.sh in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx24052m -Xms24052m -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
Executing jgi.BBDukF [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
Version 37.90 [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]

Initial:
Memory: max=24170m, free=23665m, used=505m

Input is being processed as unpaired
Started output streams: 0.051 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
    at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
    at structures.EntropyTracker.passes(EntropyTracker.java:348)
    at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
Processing time: 0.384 seconds.

Input: 34841 reads 3436691 bases.
Low entropy discards: 2157 reads (6.19%) 215168 bases (6.26%)
Total Removed: 2181 reads (6.26%) 216121 bases (6.29%)
Result: 32660 reads (93.74%) 3220570 bases (93.71%)

Time: 0.459 seconds.
Reads Processed: 34841 75.88k reads/sec
Bases Processed: 3436k 7.48m bases/sec

Thank you.
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,082
Can you try with only "-Xmx24052m threads=12"? Don't use -Xms.
#3 |
Member
Location: South Africa Join Date: Sep 2013
Posts: 12
Hi,
Thank you for the suggestion. With "-Xmx24052m threads=12" it runs with 12 threads and the requested memory, but it still throws an ArrayIndexOutOfBoundsException in multiple threads...
Code:
Set threads to 12

Initial:
Memory: max=25224m, free=25194m, used=30m

Input is being processed as unpaired
Started output streams: 1.357 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
    at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
    at structures.EntropyTracker.passes(EntropyTracker.java:348)
    at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 30
    at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
    at structures.EntropyTracker.passes(EntropyTracker.java:348)
    at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-12" Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Processing time: 1.649 seconds.

Input: 19931 reads 1964689 bases.
Low entropy discards: 1237 reads (6.21%) 123216 bases (6.27%)
Total Removed: 1249 reads (6.27%) 123686 bases (6.30%)
Result: 18682 reads (93.73%) 1841003 bases (93.70%)

Time: 3.658 seconds.
Reads Processed: 19931 5.45k reads/sec
Bases Processed: 1964k 0.54m bases/sec

@D00278:496:CC4LRANXX:7:1109:7642:2397 1:N:0:1
AAGACGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFF

Thanks,
Dave
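For readers wondering why a poly-T read like the one posted here counts as "low entropy", the following is a rough Python sketch of a k-mer Shannon entropy score. It is not BBDuk's EntropyTracker implementation; the function name `kmer_entropy`, the choice of k=5, and the normalization to the 0..1 range are all assumptions made for illustration only.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=5):
    """Shannon entropy of the k-mer distribution in seq, scaled to 0..1.

    Illustrative only: this is NOT how BBDuk computes its entropy score.
    """
    n = len(seq) - k + 1  # number of k-mers in the read
    if n <= 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # The largest possible entropy is reached when every k-mer is distinct,
    # bounded by the number of possible DNA k-mers (4**k).
    h_max = math.log2(min(n, 4 ** k))
    return h / h_max


# A read shaped like the poly-T example: 8 mixed bases, then 92 T's.
poly_t = "AAGACGGG" + "T" * 92
# A hypothetical read with varied sequence content.
mixed = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATTACAGGCTTAACGGATCGATTGCA"

print("poly-T read:", round(kmer_entropy(poly_t), 3))
print("mixed read: ", round(kmer_entropy(mixed), 3))
```

The poly-T read scores far lower than the mixed read because almost all of its k-mers are the same TTTTT, which is the property an entropy filter keys on.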
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,082
You can use something like:
Code:
literal=TTTTTTTT k=5
#5 |
Member
Location: South Africa Join Date: Sep 2013
Posts: 12
Hi,
Thank you, I finally got it to work with literal, but I had to increase the length of the literal to 20 X T and increase the kmer to 25. With literal=TTTTTTTT k=5 I got 98% of the reads filtered out, even with mm=false and hdist=0...

OK, another question: how can I use bbduk to split my file into multiple files based on an inline barcode of 8 bp at the 5' end? I have 100 bp SE reads and they are multiplexed using 32 barcodes of 8 bases each. I have used sabre before, but it is on another machine and I would like to avoid transferring files between different servers if possible.

Cheers,
Dave
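A rough Python sketch of why a short k matches so many reads: with k=5, any read that contains a run of five T's anywhere shares a k-mer with the literal. This is a toy model, not BBDuk's code. The helper names are hypothetical, the reads are made up, and the `rcomp` switch reflects my assumption that reverse-complement k-mers (here AAAAA) are matched as well. The long literal of 25 T's with k=25 is also just an illustrative choice.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matches(read, literal, k, rcomp=True):
    """True if the read shares at least one exact k-mer with the literal.

    rcomp=True also accepts k-mers of the literal's reverse complement
    (an assumption about the matching behaviour, labeled as such).
    """
    ref = kmers(literal, k)
    if rcomp:
        ref |= kmers(revcomp(literal), k)
    return any(read[i:i + k] in ref for i in range(len(read) - k + 1))


# Hypothetical reads for illustration.
poly = "AAGACGGG" + "T" * 92                                   # true poly-T read
short_t = "ACGGATCCTTTTTGACCGTAGCATCGGATCAGCTAGCTAGGCATCG"   # just one TTTTT run
short_a = "ACGGATCCAAAAAGACCGTAGCATCGGATCAGCTAGCTAGGCATCG"   # just one AAAAA run

for name, read in [("poly", poly), ("short_t", short_t), ("short_a", short_a)]:
    print(name,
          "k=5:", matches(read, "T" * 8, 5),
          "k=25:", matches(read, "T" * 25, 25))
```

In this toy model, the short k flags all three reads, while the long k flags only the genuine poly-T read.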
#6 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,082
If you know the barcode sequences, you could run bbduk in "match" mode and require a strict 8 bp match at the 5' end of the read (restrictleft=7). You may have to try out a few command options to see what works best.
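If no demultiplexer is available on the machine, the same idea (exact match of a known 8 bp barcode at the 5' end) can be sketched in plain Python. To be clear, this does not use bbduk; it is a stand-in that only illustrates the logic. The barcode sequences, sample names, and function names are all hypothetical, and trimming the barcode off matched reads is a design choice you may not want.

```python
from collections import defaultdict


def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of FASTQ lines."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        yield tuple(stripped[i:i + 4])


def demultiplex(records, barcodes, barcode_len=8):
    """Group reads by an exact barcode match at the 5' end.

    barcodes maps barcode sequence -> sample name. Matched reads have the
    barcode (and its quality values) trimmed; the rest go to "unmatched".
    """
    bins = defaultdict(list)
    for header, seq, plus, qual in records:
        sample = barcodes.get(seq[:barcode_len])
        if sample is None:
            bins["unmatched"].append((header, seq, plus, qual))
        else:
            bins[sample].append(
                (header, seq[barcode_len:], plus, qual[barcode_len:]))
    return bins


# Hypothetical barcodes and reads for illustration.
barcodes = {"ACGTACGT": "sample_1", "TTGCAAGC": "sample_2"}
fastq_lines = [
    "@read1", "ACGTACGTGGGATCCA", "+", "FFFFFFFFFFFFFFFF",
    "@read2", "TTGCAAGCCCATTGGA", "+", "FFFFFFFFFFFFFFFF",
    "@read3", "GGGGGGGGCCATTGGA", "+", "FFFFFFFFFFFFFFFF",
]

bins = demultiplex(read_fastq(fastq_lines), barcodes)
for sample, reads in sorted(bins.items()):
    print(sample, [r[0] for r in reads])
```

For a real 135-million-read file you would stream from disk and write each bin to its own output file rather than hold everything in memory. This sketch also tolerates no mismatches in the barcode.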
Tags |
bbduk, illumina, java, read trimming |