SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   BBDuk java error when filtering using entropy? (http://seqanswers.com/forums/showthread.php?t=80696)

DrYak 02-15-2018 06:37 AM

BBDuk java error when filtering using entropy?
 
Hello,

I am having an issue with a java error when trying to use BBDuk to remove low entropy sequences from a fastq file. The libraries were made using ribozero so there are a number of polyT sequences I would like to remove.

I have previously used BBDuk on the same library to remove PhiX and adapter sequences with no problem.

The file has ~135 million 100bp SE reads.

I am running on a node with 24 cores and 128 GiB RAM running CentOS Linux release 7.3.1611 and java version "1.7.0_131".

I get this error with or without the -Xmx flag.

Code:

$ bbduk.sh in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx24052m -Xms24052m -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
Executing jgi.BBDukF [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
Version 37.90 [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]

Initial:
Memory: max=24170m, free=23665m, used=505m

Input is being processed as unpaired
Started output streams: 0.051 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
Processing time:                0.384 seconds.

Input:                          34841 reads            3436691 bases.
Low entropy discards:          2157 reads (6.19%)      215168 bases (6.26%)
Total Removed:                  2181 reads (6.26%)      216121 bases (6.29%)
Result:                        32660 reads (93.74%)    3220570 bases (93.71%)

Time:                          0.459 seconds.
Reads Processed:      34841    75.88k reads/sec
Bases Processed:      3436k    7.48m bases/sec

Any suggestions as to what might be the issue?

Thank you.

GenoMax 02-15-2018 06:52 AM

Can you try with only "-Xmx24052m threads=12"? Don't use -Xms.

DrYak 02-18-2018 11:08 PM

Quote:

Originally Posted by GenoMax (Post 214925)
Can you try with only "-Xmx24052m threads=12"? Don't use -Xms.

Hi,

Thank you for the suggestion.

With "-Xmx24052m threads=12" it runs with 12 threads and the requested memory, but it still throws an ArrayIndexOutOfBoundsException in multiple threads.
Code:

Set threads to 12
Initial:
Memory: max=25224m, free=25194m, used=30m

Input is being processed as unpaired
Started output streams: 1.357 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 30
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-12" Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Processing time:                1.649 seconds.

Input:                          19931 reads            1964689 bases.
Low entropy discards:          1237 reads (6.21%)      123216 bases (6.27%)
Total Removed:                  1249 reads (6.27%)      123686 bases (6.30%)
Result:                        18682 reads (93.73%)    1841003 bases (93.70%)

Time:                          3.658 seconds.
Reads Processed:      19931    5.45k reads/sec
Bases Processed:      1964k    0.54m bases/sec

Is it because there are so many low complexity reads? Are there any other ways of filtering these polyT tracts? They tend to have the 8 bp barcode followed by polyT, e.g.:

@D00278:496:CC4LRANXX:7:1109:7642:2397 1:N:0:1
AAGACGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFF

Thanks,
Dave

GenoMax 02-19-2018 04:30 AM

You can use something like
Code:

literal=TTTTTTTT k=5
with bbduk.sh to remove those reads.
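As a cross-check that does not depend on BBDuk, the same idea (drop any read that contains a long run of T) can be sketched with plain awk. This is only a sketch under assumptions: the function name, the file names and the run length of 20 are made up for illustration, and the input is assumed to be strict four-line FASTQ records.

```shell
# Sketch only: split a single-end FASTQ by presence of a long T run.
# Assumes strict 4-line FASTQ records; all names here are hypothetical.
set -eu

filter_polyt() {
    # $1 = input FASTQ, $2 = kept reads, $3 = removed reads, $4 = minimum T-run length
    : > "$2"; : > "$3"                        # make sure both outputs exist
    awk -v keep="$2" -v drop="$3" -v n="$4" '
        BEGIN { run = sprintf("%" n "s", ""); gsub(/ /, "T", run) }  # n copies of "T"
        {
            rec[NR % 4] = $0                  # buffer one 4-line record
            if (NR % 4 == 0) {
                out = (index(rec[2], run) > 0) ? drop : keep
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0] > out
            }
        }' "$1"
}

# Tiny demo: one barcode+polyT read and one ordinary read.
tmp=$(mktemp -d)
polyt=$(printf '%30s' '' | tr ' ' 'T')
{
    printf '@polyT\nAAGACGGG%s\n+\n%s\n' "$polyt" "$(printf '%38s' '' | tr ' ' 'F')"
    printf '@mixed\nACGTTTTTACGGATCCGATTACAGGCATTA\n+\n%s\n' "$(printf '%30s' '' | tr ' ' 'F')"
} > "$tmp/demo.fq"

filter_polyt "$tmp/demo.fq" "$tmp/kept.fq" "$tmp/polyT.fq" 20
wc -l "$tmp/kept.fq" "$tmp/polyT.fq"
```

In the demo the barcode+polyT read ends up in polyT.fq and the ordinary read, which only has a run of five T, stays in kept.fq.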

DrYak 03-02-2018 05:47 AM

worked finally - but had to increase kmer and literal..
 
Hi,

Thank you, I finally got it to work with literal, but I had to increase the length of the literal to 20 Ts and increase the k-mer length (k) to 25. With literal=TTTTTTTT k=5, 98% of the reads were filtered out, even with mm=false and hdist=0...
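A likely reason for the 98%, assuming BBDuk's default of also matching reverse complements applies here, is that with k=5 any read containing TTTTT or AAAAA anywhere counts as a hit. A rough way to see the effect of the run length, independent of BBDuk, is to count reads with a short versus a long homopolymer run. The function name and the demo reads below are made up for illustration.

```shell
# Sketch only: count reads that contain a homopolymer run of T or A of a given length.
set -eu

count_reads_with_run() {
    # $1 = FASTQ (strict 4-line records assumed), $2 = run length
    awk -v n="$2" '
        BEGIN {
            t = sprintf("%" n "s", ""); a = t
            gsub(/ /, "T", t); gsub(/ /, "A", a)     # n copies of T and of A
        }
        NR % 4 == 2 && (index($0, t) || index($0, a)) { hits++ }
        END { print hits + 0 }' "$1"
}

# Tiny demo: one polyT read, two ordinary reads with a run of five, one with none.
tmp=$(mktemp -d)
q=$(printf '%30s' '' | tr ' ' 'F')
{
    printf '@polyT\nAAGACGGG%s\n+\n%s\n' "$(printf '%30s' '' | tr ' ' 'T')" "$(printf '%38s' '' | tr ' ' 'F')"
    printf '@shortT\nACGTTTTTACGGATCCGATTACAGGCATTA\n+\n%s\n' "$q"
    printf '@shortA\nGGCAAAAAGCTAGCTAGGCTAACGTAGCTA\n+\n%s\n' "$q"
    printf '@none\nACGTACGTAGCTAGCTAGGATCCGATCGAT\n+\n%s\n' "$q"
} > "$tmp/demo.fq"

short_hits=$(count_reads_with_run "$tmp/demo.fq" 5)
long_hits=$(count_reads_with_run "$tmp/demo.fq" 20)
echo "reads with a run of 5: $short_hits; with a run of 20: $long_hits"
```

On the four demo reads a run length of 5 flags three reads while a run length of 20 flags only the real polyT read, which is the same direction as what I saw on the full file.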

Ok, another question - how can I use bbduk to split my file into multiple files based on an inline barcode of 8 bp at the 5' end?

I have 100 bp SE reads and they are multiplexed using 32 different 8-base barcodes. I have used sabre before but it is on another machine and I would like to avoid transferring files between different servers if possible.

Cheers,
Dave

GenoMax 03-02-2018 07:25 AM

If you know the barcode sequences then you could run bbduk in "match" mode and require a strict 8 bp match at the 5' end of the read (restrictleft=7). You may have to try a few command options out to see what works best.
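If no dedicated tool is available on that server, the splitting itself can also be sketched in plain awk. This is not the bbduk approach above, just a minimal exact-match sketch under assumptions: the barcode table format (name, tab, sequence), the function name and the output file names are all made up for illustration, all barcodes are assumed to have the same length, and the barcode is left on the read.

```shell
# Sketch only: split a single-end FASTQ by an inline 5' barcode (exact match).
# Assumed barcode table: one "name<TAB>sequence" per line, all barcodes the same length.
set -eu

demux_inline() {
    # $1 = barcode table, $2 = FASTQ (strict 4-line records), $3 = output directory
    mkdir -p "$3"
    awk -v dir="$3" '
        NR == FNR { name[$2] = $1; len = length($2); next }    # first file: load barcodes
        {
            rec[FNR % 4] = $0                                   # buffer one 4-line record
            if (FNR % 4 == 0) {
                bc = substr(rec[2], 1, len)                     # leading bases of the read
                out = dir "/" ((bc in name) ? name[bc] : "unmatched") ".fq"
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0] > out
            }
        }' "$1" "$2"
}

# Tiny demo: two known barcodes and one read that matches neither.
tmp=$(mktemp -d)
printf 'sampleA\tAAGACGGG\nsampleB\tCCTTGGAA\n' > "$tmp/barcodes.tsv"
q=$(printf '%20s' '' | tr ' ' 'F')
{
    printf '@r1\nAAGACGGGACGTACGTACGT\n+\n%s\n' "$q"
    printf '@r2\nCCTTGGAATTGCATGCATGC\n+\n%s\n' "$q"
    printf '@r3\nGGGGGGGGACGTACGTACGT\n+\n%s\n' "$q"
} > "$tmp/reads.fq"

demux_inline "$tmp/barcodes.tsv" "$tmp/reads.fq" "$tmp/demux"
ls "$tmp/demux"
```

Each output file is truncated on the first write of a run and then kept open, which should be fine for a few dozen barcodes. Reads with a sequencing error in the barcode go to unmatched.fq, since no mismatches are tolerated here.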


All times are GMT -8.