  • #16
    I don't know why they do that. I blame it on Illumina's demultiplexing software defaults. Like on NextSeq, which only has one physical lane (in terms of library separation), we still produce 8 files per library, which is really inefficient from a tracking and labor perspective.


    • #17
      Originally posted by Brian Bushnell
      Like on NextSeq, which only has one physical lane (in terms of library separation), we still produce 8 files per library, which is really inefficient from a tracking and labor perspective.
      bcl2fastq (v. 2.17.x) has an option for this (--no-lane-splitting). I assume your NextSeq data is processed off-line, so it may be worth looking into (I have not personally used this option).
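
A rough sketch of how that option might be passed, written as a dry run that only prints the command. The two directory paths are hypothetical placeholders, and this has not been run against real NextSeq data, so treat it as a starting point rather than a tested recipe.

```shell
# Dry-run sketch: print a bcl2fastq command that writes one FASTQ file per
# sample and read instead of one per lane.
# ASSUMPTION: both paths are hypothetical placeholders.
run_dir=/PATH/run_folder
out_dir=/PATH/fastq_out

bcl2fastq_cmd="bcl2fastq --runfolder-dir $run_dir --output-dir $out_dir --no-lane-splitting"

# Print only; run the printed command yourself once the paths are real.
echo "$bcl2fastq_cmd"
```

Printing first makes it easy to check the command before committing to a full demultiplexing run.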


      • #18
        Originally posted by Brian Bushnell
        BBDuk2 should work fine if you adjust the parameters as I indicated. That said - personally, I would use 2 passes of BBDuk because BBDuk2 is a bit more confusing and less flexible (you can't use different kmer lengths for left and right trimming, for example). I designed BBDuk2 for integration into pipelines that get written once and then run exactly the same way for years, to achieve maximal efficiency, since it can do all kmer operations in a single pass (filtering, left-trimming, right-trimming, and masking). But actually I never use it because I usually want different values of K and a different hamming distance for the different steps.

        The issue here is either that you are running OpenJDK, or version 1.6, and probably both combined. I only test with Oracle's JDK, and use version 1.7 and 1.8.

        Hi Brian,

        So I have decided to use BBDuk instead of BBDuk2 and run it twice. The following is my command:

        bash bbduk.sh in1=/PATH/M_R1_001.fastq.gz in2=/PATH/M_R2_001.fastq.gz out1=/PATH/M1_R1_001.fastq.gz out2=/PATH/M1_R2_001.fastq.gz ref="/truseq.fa.gz" ktrim=r k=13 mink=11 hdist=1 rcomp=t minlen=25 qtrim=rl trimq=10 tpe tbo

        For some of my samples, this works fine: I get complete output and a summary of the number of input reads and the reads left after trimming.

        However, for other samples I do not get this. Instead I get the following:

        BBDuk version 35.66
        maskMiddle was disabled because useShortKmers=true
        Initial:
        Memory: max=41160m, free=40086m, used=1074m

        Added 182 kmers; time: 0.032 seconds.
        Memory: max=41160m, free=39012m, used=2148m

        Input is being processed as paired
        Started output streams: 0.221 seconds.
        bbduk.sh: line 282: 888 Killed java -Djava.library.path=/PATH/bbmap/jni/ -ea -Xmx40g -Xms40g -cp /PATH/bbmap/current/ jgi.BBDukF -Xmx40g in1=/PATH/M_R1_001.fastq.gz in2=/PATH/M_R2_001.fastq.gz out1=/scratch/ea11g10/PATH/M1_R1_001.fastq.gz out2=/PATH/M1_R2_001.fastq.gz literal=GCTCTTCCGATCT ktrim=l k=13 mink=11 hdist=1 rcomp=t minlen=25 qtrim=rl trimq=10 tpe tbo


        Do you know why it would be getting killed? I am now using Java 1.8, as it has been upgraded:
        java version "1.8.0_51"
        Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
        Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

        Am I running out of memory, and is that why it is being killed?

        Thanks


        • #19
          I wouldn't think so since you are using 40G. Are you hitting a storage quota limit?


          • #20
            Nope, that's one of the first things I checked. I still have 100 GB of storage available.


            • #21
              You are doing this on a cluster, so I assume your job gets assigned to a random node, correct? Does a job that fails once run fine if you submit it a second (or third) time? Is the job dying right away?


              • #22
                As this is a short job, I am not submitting it to the scheduler; instead I am running it on one of the login nodes.

                I have tried running the same job more than once and it keeps failing for some reason. Sometimes a job that ran fine on the first pass of BBDuk fails when I run it a second time through.

                No, it is not dying right away; it gets killed at some point while it is running.


                • #23
                  I am now trying the same job on a different login node to see if it works there.


                  • #24
                    You may need to ask your sysadmins whether they can find any evidence in the system logs as to why the job fails.

                    So the job that fails is actually producing some output before it gets killed?


                    • #25
                      Yes, the job that fails does produce a file in the specified output directory, but it is not the expected size (probably around half of what it should be).

                      I have just tried running one of the files that failed previously on another login node, and it seems to have run OK and wasn't killed. So I am guessing it might be the node, but I am not sure what is causing it to fail.
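
One way to confirm that an undersized output really is incomplete, rather than just smaller than expected, is to test the gzip stream itself. The sketch below simulates a killed job on a tiny throwaway file; the file names and the `check_gz` helper are made up for illustration. It only catches files cut off mid-stream.

```shell
# Sketch: detect a truncated .gz output such as one left behind by a killed job.
# ASSUMPTION: file names and check_gz are hypothetical, for illustration only.
tmpdir=$(mktemp -d)

# A tiny but complete gzipped FASTQ record.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/complete.fastq.gz"

# Simulate a killed job by keeping only the first 10 bytes of the file.
head -c 10 "$tmpdir/complete.fastq.gz" > "$tmpdir/truncated.fastq.gz"

# gzip -t exits non-zero when the compressed stream is damaged or cut short.
check_gz() {
  if gzip -t "$1" 2>/dev/null; then
    echo "OK"
  else
    echo "TRUNCATED_OR_CORRUPT"
  fi
}

complete_status=$(check_gz "$tmpdir/complete.fastq.gz")
truncated_status=$(check_gz "$tmpdir/truncated.fastq.gz")

echo "complete.fastq.gz:  $complete_status"
echo "truncated.fastq.gz: $truncated_status"

rm -rf "$tmpdir"
```

Running the same `gzip -t` check on real BBDuk outputs before moving on to alignment would flag samples that need to be rerun.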


                      • #26
                        Running jobs on head nodes is generally frowned upon by most admins.

                        Unless you want to figure out what is different between those two nodes, just start using the job scheduler, since you would need to do that anyway for alignments with BBMap.


                        • #27
                          We have been told to use the login nodes for any jobs less than half an hour, which all the BBDuk jobs are.

                          And yes, I am already using the job scheduler for alignments and other jobs that require more processing power and time.

                          Thanks for the help


                          • #28
                            The most likely problem is memory. Adapter-trimming requires only a little memory, so use the flag -Xmx1g instead of -Xmx40g. Also, by default, BBDuk will try to spawn pigz processes to accelerate compression and decompression, if pigz is installed. This can be disabled with "pigz=f unpigz=f". I think it is the combination of the two things. Essentially, due to some weirdness in Linux, when a process that uses a lot of virtual memory spawns a subprocess, for a split second it looks like it's using twice as much virtual memory. Often clusters are configured to kill jobs that do that.

                            So, "-Xmx1g" OR "pigz=f unpigz=f" will fix it - you don't need both. Sorry about that! We changed our cluster's configuration specifically so that it would not kill jobs in this circumstance, but I know that, for example, Amazon instances do.
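
The two alternative fixes can be sketched as follows, reusing the trimming parameters from the command posted earlier in the thread. This is a dry run that only prints the commands. The paths are the same placeholders used above, and the variable names are made up for illustration.

```shell
# Dry-run sketch of the two alternative fixes; only one is needed.
# ASSUMPTION: paths are placeholders and variable names are hypothetical.
common="in1=/PATH/M_R1_001.fastq.gz in2=/PATH/M_R2_001.fastq.gz"
common="$common out1=/PATH/M1_R1_001.fastq.gz out2=/PATH/M1_R2_001.fastq.gz"
common="$common ref=/truseq.fa.gz ktrim=r k=13 mink=11 hdist=1 rcomp=t"
common="$common minlen=25 qtrim=rl trimq=10 tpe tbo"

# Fix 1: request a small Java heap, so spawning a subprocess does not make
# the job briefly appear to use a very large amount of virtual memory.
cmd_small_heap="bbduk.sh -Xmx1g $common"

# Fix 2: keep the large heap but stop BBDuk from spawning pigz subprocesses.
cmd_no_pigz="bbduk.sh -Xmx40g $common pigz=f unpigz=f"

# Print only; run whichever printed command suits your system.
echo "$cmd_small_heap"
echo "$cmd_no_pigz"
```

Since adapter trimming needs little memory, the small-heap variant is probably the simpler choice on shared login nodes.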


                            • #29
                              Hi Brian, I tried lowering the memory, and it worked. Thanks for your help. BBDuk has performed much better for me than cutadapt did previously.


                              • #30
                                You're welcome; sorry it took so much effort.
