  • DrYak
    Member
    • Sep 2013
    • 13

    BBDuk java error when filtering using entropy?

    Hello,

    I am having an issue with a java error when trying to use BBDuk to remove low entropy sequences from a fastq file. The libraries were made using ribozero so there are a number of polyT sequences I would like to remove.

    I have previously used BBDuk on the same library to remove PhiX and adapter sequences with no problem.

    The file has ~135 million 100bp SE reads.

    I am running on a node with 24 cores and 128 GiB RAM running CentOS Linux release 7.3.1611 and java version "1.7.0_131".

    I get this error with or without the -Xmx flag.

    Code:
    $ bbduk.sh in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
    java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx24052m -Xms24052m -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
    Executing jgi.BBDukF [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
    Version 37.90 [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
    
    Initial:
    Memory: max=24170m, free=23665m, used=505m
    
    Input is being processed as unpaired
    Started output streams: 0.051 seconds.
    Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
            at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
            at structures.EntropyTracker.passes(EntropyTracker.java:348)
            at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
    Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
    Processing time:                0.384 seconds.
    
    Input:                          34841 reads             3436691 bases.
    Low entropy discards:           2157 reads (6.19%)      215168 bases (6.26%)
    Total Removed:                  2181 reads (6.26%)      216121 bases (6.29%)
    Result:                         32660 reads (93.74%)    3220570 bases (93.71%)
    
    Time:                           0.459 seconds.
    Reads Processed:       34841    75.88k reads/sec
    Bases Processed:       3436k    7.48m bases/sec
    Any suggestions as to what might be the issue?

    Thank you.
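For readers unfamiliar with what entropy=0.1 is asking for: an entropy filter scores how varied the short k-mers inside a read are, and discards reads whose score falls below the cutoff. The Python sketch below illustrates the general idea with a sliding window. It is not BBDuk's EntropyTracker code; the function names are hypothetical, and the window of 50, k of 5, and the normalization are assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(seq, k=5):
    """Shannon entropy of the k-mer distribution in seq, scaled to [0, 1].

    Illustrative only: the scaling divides by the largest entropy that the
    number of k-mers in the window could produce.
    """
    n = len(seq) - k + 1  # number of k-mers in this window
    if n <= 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_h = math.log2(min(n, 4 ** k))
    return h / max_h


def average_entropy(seq, window=50, k=5):
    """Mean of window_entropy over every window position in the read."""
    if len(seq) <= window:
        return window_entropy(seq, k)
    scores = [window_entropy(seq[i:i + window], k)
              for i in range(len(seq) - window + 1)]
    return sum(scores) / len(scores)


def passes(seq, cutoff=0.1, window=50, k=5):
    """True if the read is complex enough to keep."""
    return average_entropy(seq, window, k) >= cutoff
```

Under these assumptions, a read made of an 8 bp barcode followed by a long run of T scores close to zero, because almost every window contains a single repeated k-mer, while an ordinary read scores close to one.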
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Can you try with only "-Xmx24052m threads=12"? Don't use -Xms.

    • DrYak
      Member
      • Sep 2013
      • 13

      #3
      Originally posted by GenoMax:
      Can you try with only "-Xmx24052m threads=12"? Don't use -Xms.
      Hi,

      Thank you for the suggestion.

      With the "-Xmx24052m threads=12" flags, it runs with 12 threads and the requested memory, but it still throws an ArrayIndexOutOfBoundsException in multiple threads...

      Code:
      Set threads to 12
      Initial:
      Memory: max=25224m, free=25194m, used=30m
      
      Input is being processed as unpaired
      Started output streams: 1.357 seconds.
      Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
              at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
              at structures.EntropyTracker.passes(EntropyTracker.java:348)
              at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
      Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 30
              at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
              at structures.EntropyTracker.passes(EntropyTracker.java:348)
              at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
      Exception in thread "Thread-12" Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
      java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
      Processing time:                1.649 seconds.
      
      Input:                          19931 reads             1964689 bases.
      Low entropy discards:           1237 reads (6.21%)      123216 bases (6.27%)
      Total Removed:                  1249 reads (6.27%)      123686 bases (6.30%)
      Result:                         18682 reads (93.73%)    1841003 bases (93.70%)
      
      Time:                           3.658 seconds.
      Reads Processed:       19931    5.45k reads/sec
      Bases Processed:       1964k    0.54m bases/sec
      Is it because there are so many low-complexity reads? Are there any other ways of filtering these polyT tracts? They tend to have the 8 bp barcode followed by polyT, for example:

      @D00278:496:CC4LRANXX:7:1109:7642:2397 1:N:0:1
      AAGACGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
      +
      BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFF

      Thanks,
      Dave
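One tool-independent way to pull out reads of the shape shown above (an 8 bp barcode followed by almost nothing but T) is to measure the T fraction after the barcode. The sketch below is a minimal illustration, not a BBDuk feature: the function names and the 0.9 threshold are hypothetical, and records are assumed to be already parsed into (header, sequence, plus, quality) tuples.

```python
def t_fraction_after_barcode(seq, barcode_len=8):
    """Fraction of T bases in the read once the inline barcode is skipped."""
    insert = seq[barcode_len:]
    if not insert:
        return 0.0
    return insert.count("T") / len(insert)


def split_polyT(records, barcode_len=8, min_t_fraction=0.9):
    """Separate FASTQ records into (kept, removed) by post-barcode T content.

    records: iterable of (header, seq, plus, qual) tuples.
    min_t_fraction is an assumed threshold; tune it on real data.
    """
    kept, removed = [], []
    for record in records:
        seq = record[1]
        if t_fraction_after_barcode(seq, barcode_len) >= min_t_fraction:
            removed.append(record)
        else:
            kept.append(record)
    return kept, removed
```

Writing the removed reads to their own file, as outm= does in the original command, makes it easy to check how aggressive the threshold is before committing to it.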

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        You can use something like
        Code:
        literal=TTTTTTTT k=5
        with bbduk.sh to remove those reads.

        • DrYak
          Member
          • Sep 2013
          • 13

          #5
          Worked finally, but had to increase kmer and literal

          Hi,

          Thank you, I finally got it to work with literal, but I had to increase the length of the literal to 20 Ts and increase the kmer length to 25. With literal=TTTTTTTT k=5, 98% of the reads were filtered out, even with mm=false and hdist=0...
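A likely explanation for the 98% figure is that k-mer matching flags a read as soon as any one of its k-mers is found in the reference. With k=5 the poly-T literal reduces to the single 5-mer TTTTT, which occurs by chance in most 100 bp reads. The Python sketch below imitates that behaviour under simplifying assumptions: exact matching only, reverse complements included when rcomp is true, and hypothetical function names. It is not BBDuk's code, and the k of 20 in the usage is illustrative rather than the poster's exact setting.

```python
def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matches(read, literal, k, rcomp=True):
    """True if any k-mer of read is also a k-mer of literal.

    With rcomp=True the reverse complement of each reference k-mer
    counts as a match too (an assumption about the matching mode).
    """
    ref = kmers(literal, k)
    if rcomp:
        ref |= {revcomp(km) for km in ref}
    return any(read[i:i + k] in ref for i in range(len(read) - k + 1))


# Usage: an ordinary read that happens to contain a short T run.
ordinary = "ACGGATCCTTTTTGACCATGGCATCGATCGGA"
poly_t = "AAGACGGG" + "T" * 92

short_k = (matches(ordinary, "T" * 8, k=5), matches(poly_t, "T" * 8, k=5))
long_k = (matches(ordinary, "T" * 20, k=20), matches(poly_t, "T" * 20, k=20))
```

In this sketch the short k flags both reads, while the longer literal and k flag only the genuine poly-T read, which is consistent with what was reported above.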

          OK, another question: how can I use bbduk to split my file into multiple files based on an inline barcode of 8 bp at the 5' end?

          I have 100 bp SE reads, multiplexed using 32 different 8-base barcodes. I have used sabre before, but it is on another machine and I would like to avoid transferring files between servers if possible.

          Cheers,
          Dave

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            If you know the barcode sequences, then you could run bbduk in "match" mode and require a strict 8 bp match at the 5' end of the read (restrictleft=7). You may have to try a few command options out to see what works best.
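If trying command options becomes tedious, splitting on an inline 5' barcode is simple enough to sketch directly. The Python below is a minimal illustration with exact matching only, not a replacement for sabre or BBDuk: the function name, the "unmatched" bin, and the sample names are hypothetical, and records are assumed to be already parsed into (header, sequence, plus, quality) tuples.

```python
def demultiplex(records, barcodes, barcode_len=8, trim=True):
    """Bin FASTQ records by the first barcode_len bases of each read.

    records:  iterable of (header, seq, plus, qual) tuples.
    barcodes: dict mapping barcode sequence -> sample name (hypothetical names).
    trim:     if True, remove the barcode and its quality scores from matched reads.
    Reads whose prefix is not a known barcode go to the "unmatched" bin.
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unmatched"] = []
    for header, seq, plus, qual in records:
        sample = barcodes.get(seq[:barcode_len], "unmatched")
        if trim and sample != "unmatched":
            seq, qual = seq[barcode_len:], qual[barcode_len:]
        bins[sample].append((header, seq, plus, qual))
    return bins
```

Each bin can then be written to its own file. Allowing one mismatch in the barcode would need an extra lookup step, which this sketch deliberately leaves out.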
