SEQanswers

Similar Threads
Introducing BBDuk: Adapter/Quality Trimming and Filtering (started by Brian Bushnell in Bioinformatics; 296 replies; last post 06-13-2018 04:56 AM)
Primer filtering with bbduk / bbduk2.sh (started by Latrunculia in Illumina/Solexa; 0 replies; last post 10-07-2016 05:33 AM)
How do you specify error rate in BBduk adapter trimming? (started by antifolate in Bioinformatics; 11 replies; last post 07-07-2016 03:00 PM)
Illumina-Tag sequencing, filtering homopolymers/entropy based filtering (started by tonybert in Bioinformatics; 0 replies; last post 12-30-2014 01:23 PM)
AddOrReplaceReadGroups error (that is probably a more general system or Java error) (started by efoss in Bioinformatics; 4 replies; last post 12-24-2012 03:01 PM)

02-15-2018, 05:37 AM   #1
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12
BBDuk Java error when filtering using entropy?

Hello,

I am getting a Java error when I try to use BBDuk to remove low-entropy sequences from a FASTQ file. The libraries were made using Ribo-Zero, so there are a number of poly-T sequences I would like to remove.

I have previously used BBDuk on the same library to remove PhiX and adapter sequences with no problem.

The file has ~135 million 100 bp single-end (SE) reads.

I am running on a node with 24 cores and 128 GiB RAM, running CentOS Linux release 7.3.1611 and Java version "1.7.0_131".

I get this error with or without the -Xmx flag.

Code:
$ bbduk.sh in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx24052m -Xms24052m -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
Executing jgi.BBDukF [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
Version 37.90 [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]

Initial:
Memory: max=24170m, free=23665m, used=505m

Input is being processed as unpaired
Started output streams: 0.051 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
Processing time:                0.384 seconds.

Input:                          34841 reads             3436691 bases.
Low entropy discards:           2157 reads (6.19%)      215168 bases (6.26%)
Total Removed:                  2181 reads (6.26%)      216121 bases (6.29%)
Result:                         32660 reads (93.74%)    3220570 bases (93.71%)

Time:                           0.459 seconds.
Reads Processed:       34841    75.88k reads/sec
Bases Processed:       3436k    7.48m bases/sec
Any suggestions as to what might be the issue?

Thank you.
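For context on what entropy=0.1 measures: an entropy filter scores each read by how varied its short k-mers are, so a long poly-T run scores close to 0 while typical sequence scores close to 1. The Python sketch below illustrates the idea with a normalized Shannon entropy over 5-mers, averaged across 50 bp sliding windows. The k, window size, normalization, and function names are assumptions for illustration; BBDuk's EntropyTracker may compute its score differently.

```python
import math
from collections import Counter

# Assumptions (illustrative only): k=5, 50 bp windows, entropy normalized to 0..1.
K = 5
WINDOW = 50


def window_entropy(seq: str, k: int = K) -> float:
    """Normalized Shannon entropy of the k-mer counts in one window."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    if n <= 1:
        return 0.0
    counts = Counter(kmers)
    # Sum of p * log2(1/p), written with counts to avoid a negative zero.
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    # Best case: every k-mer distinct, capped by the 4**k possible k-mers.
    h_max = math.log2(min(n, 4 ** k))
    return min(1.0, h / h_max)


def average_entropy(seq: str, window: int = WINDOW, k: int = K) -> float:
    """Mean normalized entropy over all sliding windows of the read."""
    if len(seq) <= window:
        return window_entropy(seq, k)
    scores = [window_entropy(seq[i:i + window], k)
              for i in range(len(seq) - window + 1)]
    return sum(scores) / len(scores)


def passes(seq: str, cutoff: float = 0.1) -> bool:
    """True if the read is kept under a cutoff analogous to entropy=0.1."""
    return average_entropy(seq) >= cutoff
```

With these assumptions, a 100 bp read made of an 8 bp barcode followed by 92 T's averages well below 0.1 and would be discarded.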
02-15-2018, 05:52 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,748

Can you try only "-Xmx24052m threads=12"? Don't use -Xms.
02-18-2018, 10:08 PM   #3
DrYak

Quote:
Originally Posted by GenoMax
Can you try only "-Xmx24052m threads=12"? Don't use -Xms.
Hi,

Thank you for the suggestion.

With "-Xmx24052m threads=12" it runs with 12 threads and the requested memory, but it still throws an ArrayIndexOutOfBoundsException in multiple threads:

Code:
Set threads to 12
Initial:
Memory: max=25224m, free=25194m, used=30m

Input is being processed as unpaired
Started output streams: 1.357 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 30
        at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
        at structures.EntropyTracker.passes(EntropyTracker.java:348)
        at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-12" Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Processing time:                1.649 seconds.

Input:                          19931 reads             1964689 bases.
Low entropy discards:           1237 reads (6.21%)      123216 bases (6.27%)
Total Removed:                  1249 reads (6.27%)      123686 bases (6.30%)
Result:                         18682 reads (93.73%)    1841003 bases (93.70%)

Time:                           3.658 seconds.
Reads Processed:       19931    5.45k reads/sec
Bases Processed:       1964k    0.54m bases/sec
Is it because there are so many low-complexity reads? Are there any other ways of filtering these poly-T tracts? They tend to have the 8 bp barcode followed by poly-T, for example:

@D00278:496:CC4LRANXX:7:1109:7642:2397 1:N:0:1
AAGACGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFF
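Since these reads share a fixed layout (an 8 bp barcode followed directly by a long T run), one alternative is an anchored pattern check rather than k-mer or entropy filtering. A minimal Python sketch, assuming uncompressed 4-line FASTQ and a hypothetical cutoff of at least 20 T's right after the barcode; the cutoff, constant names, and helper functions are illustrative, not part of BBDuk.

```python
import re
from typing import Iterable, Iterator, TextIO, Tuple

# Hypothetical rule: 8 bp inline barcode at the 5' end, then >= 20 T's.
BARCODE_LEN = 8
MIN_POLY_T = 20
POLY_T_AFTER_BARCODE = re.compile(r"[ACGTN]{%d}T{%d,}" % (BARCODE_LEN, MIN_POLY_T))

Record = Tuple[str, str, str, str]  # (header, sequence, plus, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from any iterable of lines (e.g. a file).

    Blank lines between records are skipped; a truncated final record
    raises RuntimeError (StopIteration inside a generator).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def is_barcode_poly_t(seq: str) -> bool:
    """True if the read starts with a barcode followed by a long T run."""
    # re.match anchors the pattern at the start of the read.
    return POLY_T_AFTER_BARCODE.match(seq) is not None


def split_poly_t(lines: Iterable[str], keep: TextIO, drop: TextIO) -> None:
    """Stream records, writing poly-T reads to `drop` and the rest to `keep`."""
    for record in read_fastq(lines):
        out = drop if is_barcode_poly_t(record[1]) else keep
        out.write("\n".join(record) + "\n")
```

To use it, pass an open FASTQ handle as `lines` and two writable handles for the kept and dropped reads. Anchoring the T run at position 9 means reads that merely contain a short internal T stretch are left alone.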

Thanks,
Dave
02-19-2018, 03:30 AM   #4
GenoMax

You can use something like
Code:
literal=TTTTTTTT k=5
with bbduk.sh to remove those reads.
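The literal approach works by k-mer matching: BBDuk breaks the literal into k-mers and flags any read that shares one of them. With literal=TTTTTTTT and k=5 the only k-mer is TTTTT, and assuming reverse-complement matching is enabled (BBDuk's rcomp flag, which defaults to true), AAAAA counts as well, so any read containing five consecutive T's or five consecutive A's would be caught. That may be part of why such a short k removes most reads. The Python sketch below models only exact k-mer matching with optional reverse complements; it ignores BBDuk's mismatch, hdist, and masking options, and the function names are illustrative.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reference_kmers(literal: str, k: int, rcomp: bool = True) -> Set[str]:
    """All k-mers of the literal, plus their reverse complements if rcomp."""
    kmers = {literal[i:i + k] for i in range(len(literal) - k + 1)}
    if rcomp:
        kmers |= {reverse_complement(km) for km in kmers}
    return kmers


def read_matches(read: str, kmers: Set[str], k: int) -> bool:
    """True if any k-mer of the read is in the reference set (exact match)."""
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))
```

For example, `reference_kmers("TTTTTTTT", 5)` returns the two k-mers TTTTT and AAAAA, so a read containing AAAAA anywhere would be flagged even though it has no poly-T at all.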
03-02-2018, 04:47 AM   #5
DrYak
Worked finally, but had to increase kmer and literal

Hi,

Thank you, I finally got it to work with literal, but I had to increase the literal to a run of 20 T's and increase k to 25. With literal=TTTTTTTT k=5, 98% of the reads were filtered out, even with mm=false and hdist=0.

OK, another question: how can I use BBDuk to split my file into multiple files based on an 8 bp inline barcode at the 5' end?

I have 100 bp SE reads multiplexed with 32 different 8-base barcodes. I have used sabre before, but it is on another machine and I would like to avoid transferring files between servers if possible.

Cheers,
Dave
03-02-2018, 06:25 AM   #6
GenoMax

If you know the barcode sequences, you could run BBDuk in "match" mode and require a strict match within the first 8 bp at the 5' end of the read (restrictleft=7). You may have to try a few command options to see what works best.
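The matching described above boils down to checking each read's first 8 bases against the known barcode list. As a plain-Python illustration of that logic (not BBDuk's implementation), the sketch below assigns each read to a sample by its 5' barcode, allowing at most one mismatch and refusing ambiguous matches, then trims the barcode. The barcode table, the one-mismatch tolerance, and the function names are hypothetical; with 32 barcodes it would be worth checking that no two are within two mismatches of each other before allowing one mismatch.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus, quality)
BARCODE_LEN = 8  # assumption: every barcode is 8 bp, inline at the 5' end


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(seq: str, barcodes: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches the read's first 8 bases.

    Exact matches win; otherwise a single barcode within max_mismatches
    is accepted. Ambiguous or missing matches return None.
    """
    prefix = seq[:BARCODE_LEN]
    if len(prefix) < BARCODE_LEN:
        return None
    if prefix in barcodes:
        return barcodes[prefix]
    hits = [sample for bc, sample in barcodes.items()
            if hamming(prefix, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(records: Iterable[Record], barcodes: Dict[str, str],
                max_mismatches: int = 1,
                trim: bool = True) -> Iterator[Tuple[str, Record]]:
    """Yield (sample, record); unmatched reads are labeled "unassigned".

    If trim is True, the barcode bases are removed from matched reads.
    """
    for header, seq, plus, qual in records:
        sample = assign_barcode(seq, barcodes, max_mismatches)
        if sample is None:
            yield "unassigned", (header, seq, plus, qual)
        elif trim:
            yield sample, (header, seq[BARCODE_LEN:], plus, qual[BARCODE_LEN:])
        else:
            yield sample, (header, seq, plus, qual)
```

Because `demultiplex` is a generator, each yielded record can be written straight to a per-sample output file, so the ~135 million reads never need to be held in memory at once.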

Tags
bbduk, illumina, java, read trimming

All times are GMT -8.

