  • BBDuk java error when filtering using entropy?

    Hello,

    I am getting a Java error when trying to use BBDuk to remove low-entropy sequences from a FASTQ file. The libraries were made using Ribo-Zero, so there are a number of polyT sequences I would like to remove.

    I have previously used BBDuk on the same library to remove PhiX and adapter sequences with no problem.

    The file has ~135 million 100bp SE reads.

    I am running on a node with 24 cores and 128 GiB RAM running CentOS Linux release 7.3.1611 and java version "1.7.0_131".

    I get this error with or without the -Xmx flag.

    Code:
    $ bbduk.sh in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
    java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx24052m -Xms24052m -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
    Executing jgi.BBDukF [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
    Version 37.90 [in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
    
    Initial:
    Memory: max=24170m, free=23665m, used=505m
    
    Input is being processed as unpaired
    Started output streams: 0.051 seconds.
    Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
            at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
            at structures.EntropyTracker.passes(EntropyTracker.java:348)
            at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
    Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
    Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
    Processing time:                0.384 seconds.
    
    Input:                          34841 reads             3436691 bases.
    Low entropy discards:           2157 reads (6.19%)      215168 bases (6.26%)
    Total Removed:                  2181 reads (6.26%)      216121 bases (6.29%)
    Result:                         32660 reads (93.74%)    3220570 bases (93.71%)
    
    Time:                           0.459 seconds.
    Reads Processed:       34841    75.88k reads/sec
    Bases Processed:       3436k    7.48m bases/sec
    Any suggestions as to what might be the issue?

    Thank you.

  • #2
    Can you try using only "-Xmx24052m threads=12"? Don't use -Xms.


    • #3
      Originally posted by GenoMax:
      Can you try using only "-Xmx24052m threads=12"? Don't use -Xms.
      Hi,

      Thank you for the suggestion.

      With the "-Xmx24052m threads=12" flags it runs with 12 threads and the requested memory, but it still throws an ArrayIndexOutOfBoundsException in multiple threads...

      Code:
      Set threads to 12
      Initial:
      Memory: max=25224m, free=25194m, used=30m
      
      Input is being processed as unpaired
      Started output streams: 1.357 seconds.
      Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
              at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
              at structures.EntropyTracker.passes(EntropyTracker.java:348)
              at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
      Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 30
              at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
              at structures.EntropyTracker.passes(EntropyTracker.java:348)
              at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
      Exception in thread "Thread-12" Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
      java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
      Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
      Processing time:                1.649 seconds.
      
      Input:                          19931 reads             1964689 bases.
      Low entropy discards:           1237 reads (6.21%)      123216 bases (6.27%)
      Total Removed:                  1249 reads (6.27%)      123686 bases (6.30%)
      Result:                         18682 reads (93.73%)    1841003 bases (93.70%)
      
      Time:                           3.658 seconds.
      Reads Processed:       19931    5.45k reads/sec
      Bases Processed:       1964k    0.54m bases/sec
      Is it because there are so many low-complexity reads? Are there any other ways of filtering these polyT tracts? They tend to have the 8 bp barcode followed by polyT, e.g.:

      @D00278:496:CC4LRANXX:7:1109:7642:2397 1:N:0:1
      AAGACGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
      +
      BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFF

      Thanks,
      Dave
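To see why a read like the one above counts as low complexity, here is a minimal Python sketch of a k-mer Shannon entropy score scaled to the 0..1 range. It is an illustration only and not BBDuk's EntropyTracker: the function name `kmer_entropy`, the choice of k=5, and scoring the whole read instead of a sliding window are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def kmer_entropy(seq: str, k: int = 5) -> float:
    """Shannon entropy of the k-mer distribution in seq, scaled to 0..1.

    Simplified illustration; BBDuk's own calculation may differ.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total < 2:
        return 0.0
    counts = Counter(kmers)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Entropy is highest when every k-mer in the read is distinct,
    # so divide by log2(total) to scale the score to 0..1.
    return max(0.0, h / math.log2(total))


# An 8 bp barcode followed by polyT, modelled on the read shown above.
poly_t_read = "AAGACGGG" + "T" * 92

# A pseudo-random 100 bp read for comparison (fixed seed for repeatability).
rng = random.Random(0)
complex_read = "".join(rng.choice("ACGT") for _ in range(100))

print(f"polyT read:   {kmer_entropy(poly_t_read):.2f}")
print(f"complex read: {kmer_entropy(complex_read):.2f}")
```

The polyT read scores close to zero because almost all of its 5-mers are TTTTT, while the pseudo-random read scores close to one.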


      • #4
        You can use something like
        Code:
        literal=TTTTTTTT k=5
        with bbduk.sh to remove those reads.


        • #5
          worked finally - but had to increase kmer and literal..

          Hi,

          Thank you, I finally got it to work with literal, but I had to increase the length of the literal to 20 Ts and increase the k-mer length to 25. With literal=TTTTTTTT k=5, 98% of the reads were filtered out, even with mm=false and hdist=0...
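The 98% figure is consistent with how exact k-mer matching works in general: the only 5-mer in TTTTTTTT is TTTTT, so any read containing five consecutive Ts anywhere shares a k-mer with the literal. The Python sketch below illustrates that idea. It is not BBDuk's code: the helper names are hypothetical, it assumes a single shared k-mer is enough to call a match, it ignores reverse complements, and the long-k case uses a literal of 25 Ts with k=20 purely as an example.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(read: str, literal: str, k: int) -> bool:
    """True if read contains at least one exact k-mer from literal."""
    ref = kmer_set(literal, k)
    return any(read[i:i + k] in ref for i in range(len(read) - k + 1))


# A read with a short, ordinary run of Ts, and a barcode + polyT read.
normal_read = "ACGGATTTTTCAGGCTAACGTTAGCCATGCA"
poly_t_read = "AAGACGGG" + "T" * 92

print(kmer_set("TTTTTTTT", 5))                       # only one distinct 5-mer
print(shares_kmer(normal_read, "TTTTTTTT", 5))       # short T run already matches
print(shares_kmer(normal_read, "T" * 25, 20))        # longer k no longer matches
print(shares_kmer(poly_t_read, "T" * 25, 20))        # true polyT read still matches
```

A longer k makes the match specific to genuine polyT tracts, which fits the observation that a longer literal and k-mer length were needed.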

          OK, another question: how can I use bbduk to split my file into multiple files based on an inline barcode of 8 bp at the 5' end?

          I have 100 bp SE reads and they are multiplexed using 32 different 8-base barcodes. I have used sabre before, but it is on another machine and I would like to avoid transferring files between different servers if possible.

          Cheers,
          Dave


          • #6
            If you know the barcode sequences, you could run bbduk in "match" mode and require a strict 8 bp match at the 5' end of the read (restrictleft=7). You may have to try out a few command options to see what works best.
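Independently of bbduk, the splitting logic itself is small. The Python sketch below bins FASTQ records by an exact match of the first 8 bases against a table of known barcodes. Everything named here is an assumption for illustration: the helper names, the sample label `sample_01`, and the use of the barcode from the example read earlier in the thread. It does not trim the barcode and does not tolerate mismatches.

```python
import io
from collections import defaultdict


def parse_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ text."""
    it = (line.rstrip("\n") for line in lines if line.strip())
    # zip over the same iterator consumes four lines per record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def demultiplex(records, barcodes, barcode_len=8):
    """Group records by exact match of the leading bases to a known barcode."""
    bins = defaultdict(list)
    for header, seq, qual in records:
        sample = barcodes.get(seq[:barcode_len], "unmatched")
        bins[sample].append((header, seq, qual))
    return dict(bins)


# Hypothetical barcode table: barcode sequence -> sample label.
barcodes = {"AAGACGGG": "sample_01"}

fastq_text = (
    "@read1\nAAGACGGGTTTTACGT\n+\nFFFFFFFFFFFFFFFF\n"
    "@read2\nCCCCCCCCACGTACGT\n+\nFFFFFFFFFFFFFFFF\n"
)

bins = demultiplex(parse_fastq(io.StringIO(fastq_text)), barcodes)
for sample, recs in sorted(bins.items()):
    print(sample, [header for header, _, _ in recs])
```

For a real run, each bin would be written to its own output file instead of being held in memory, since a file with ~135 million reads will not fit comfortably in a dictionary of lists.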
