  • Issue changing the predefined k values in SOAPdenovo2

    I am currently using SOAPdenovo2 on a supercomputer with the SLURM queuing system to perform a de novo assembly from paired-end FASTQ files of genomic DNA reads.

    When I use SOAPdenovo-63mer or SOAPdenovo-127mer, I have no problems: I run assemblies with k=63 and k=127, each finishing in a little over 16 hours on a node with 64 threads and 240 GB of memory, by submitting the following script.sh to the queue:

    #!/bin/sh
    #SBATCH --nodes=1
    #SBATCH --ntasks=64
    #SBATCH --mem=240000
    #SBATCH --time=3-00:00:00
    #SBATCH -e error_log.txt
    #SBATCH -o output_log.txt

    # --mem above is in MB; load SOAPdenovo2, then run one assembly per binary
    module load soapdenovo2
    SOAPdenovo-63mer all -s config_file.txt -o assemblies/k63_ -R -p 64
    SOAPdenovo-127mer all -s config_file.txt -o assemblies/k127_ -R -p 64
    The trouble starts when I use the -K parameter to choose a k value other than the predefined k=63 and k=127. For example, if I try to run an assembly with k=89 using this command:

    SOAPdenovo-127mer all -s config_file.txt -K89 -o assemblies/k89_ -R -p 64
    the job fails, and error_log.txt contains this line:

    slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2323585.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
    So I guess this is a memory issue, but it does not happen with the predefined values k=63 and k=127. Why does SOAPdenovo2 need more memory when I use other k values, and how can I get around this?
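    One detail worth checking: in the SOAPdenovo2 usage text we are aware of, -K defaults to 23, and the 63mer and 127mer binaries only set the largest k they accept. If that holds for your installation (running `SOAPdenovo-127mer all` with no further arguments should print the usage message), the runs above without -K may not have used k=63 or k=127 at all. The minimal sketch below, assuming the config_file.txt and assemblies/ paths from the post, always passes -K explicitly and picks a binary whose maximum k covers the requested value. pick_binary, run_k, DRY_RUN and THREADS are hypothetical names; with DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: one SOAPdenovo2 run per k, always passing -K explicitly.
# pick_binary, run_k, DRY_RUN and THREADS are hypothetical names.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = run them

# Pick a binary whose maximum k (63 or 127) covers the requested k.
pick_binary() {
  local k=$1
  if (( k <= 63 )); then
    echo SOAPdenovo-63mer
  elif (( k <= 127 )); then
    echo SOAPdenovo-127mer
  else
    echo "k=$k is above the 127mer maximum" >&2
    return 1
  fi
}

# Build (and optionally run) the "all" command for one k value,
# with a separate output prefix per k.
run_k() {
  local k=$1 bin
  bin=$(pick_binary "$k")
  local cmd=("$bin" all -s config_file.txt -K "$k" -o "assemblies/k${k}_" -R -p "${THREADS:-64}")
  if (( DRY_RUN )); then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for k in 63 89 127; do
  run_k "$k"
done
```

    Recording k in the output prefix makes it easy to confirm afterwards which value each assembly really used.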

  • #2
    ampsevilla, it is likely that the out-of-memory (OOM) error is due to SOAPdenovo2 requiring more memory to assemble the genome with a larger k-mer size.

    When you choose a k-mer size of 89, the assembly needs more memory, and that is what triggers the OOM error. A larger k-mer size increases the complexity of the assembly, so more memory is needed to store the assembly graph and related data structures.

    To overcome this issue, you can try increasing the amount of memory allocated to the job in the SLURM script. You can also try reducing the number of threads used in the assembly process. This may help reduce the memory requirements for the assembly and avoid the OOM error.

    Additionally, you can try reducing the size of the input data by filtering out low-quality reads or using a subset of the data for the assembly. This may also help reduce the memory requirements for the assembly process.

    Finally, you can consider using a different de novo assembly tool that is better suited for larger k-mer sizes and has lower memory requirements. Some popular alternatives to SOAPdenovo2 include SPAdes, ABySS, and IDBA-UD.
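    Both of the first two suggestions (more memory, fewer threads) live in the job header. A sketch, assuming the same file names as the original script: for a single multithreaded program, a common Slurm pattern is one task with several CPUs (--cpus-per-task) and the thread count taken from SLURM_CPUS_PER_TASK, so -p never exceeds the CPUs actually allocated. The values 32 and 240000 are placeholders to adjust to your cluster.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=240000
#SBATCH --time=3-00:00:00
#SBATCH -e error_log.txt
#SBATCH -o output_log.txt

# Sketch: one task with several CPUs; SOAPdenovo2's -p follows the
# allocation. 32 CPUs and 240000 MB are placeholders for your cluster.
set -eo pipefail

# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested;
# fall back to 1 thread when the script runs outside Slurm.
THREADS=${SLURM_CPUS_PER_TASK:-1}

cmd=(SOAPdenovo-127mer all -s config_file.txt -K 89 -o assemblies/k89_ -R -p "$THREADS")
echo "Running: ${cmd[*]}"

# Only load the module and start the assembly inside a Slurm job.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
  module load soapdenovo2
  "${cmd[@]}"
fi
```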

    • #3
      GenomicSeq first of all, I sincerely appreciate your quick response.

      Originally posted by GenomicSeq:
      ampsevilla it is likely that the out-of-memory (OOM) error is due to SOAPdenovo2 requiring more memory to assemble the genome with a larger k-mer size.

      When you choose a k-mer size of 89, the memory requirements for the assembly process are increased, which is causing the OOM error. This is because increasing the k-mer size also increases the complexity of the assembly, which requires more memory to store the assembly graph and related data structures.
      I don't understand why this is happening, because with the predefined k=127 SOAPdenovo2 works perfectly, and k=89 is much smaller than that.
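      A rough way to make this point concrete: if each k-mer is packed at 2 bits per base into 64-bit words (an assumption for illustration, not a description of SOAPdenovo2's internals), the storage per k-mer only grows with k. The sketch uses a hypothetical kmer_words helper.

```shell
#!/usr/bin/env bash
# Sketch: storage for one k-mer if bases are packed at 2 bits each into
# 64-bit words. This packing is an illustrative assumption, not a
# description of SOAPdenovo2's internals; kmer_words is hypothetical.
set -euo pipefail

# Number of 64-bit words needed for 2*k bits (integer ceiling division).
kmer_words() {
  local k=$1
  echo $(( (2 * k + 63) / 64 ))
}

for k in 63 89 127; do
  printf 'k=%d: %d bits, %d words\n' "$k" $(( 2 * k )) "$(kmer_words "$k")"
done
```

      Under that assumption k=63 fits in 2 words, k=89 needs 3 and k=127 needs 4, so k-mer width alone would not explain k=89 needing more memory than k=127.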

      I'll definitely try reducing the number of threads as you suggest; maybe it will help. Unfortunately, I can't reduce the size of the input data because the reads are already filtered; the problem is that the genome we want to assemble is very large and complex.

      We are also trying other tools such as SPAdes and ABySS, but we have had some trouble with them too. We'll try IDBA-UD; thank you so much for the advice!

      • #4
        ampsevilla that is odd...

        Now I'm wondering if it's something else. Let me know what you find and whether you're able to fix it!

        • #5
          GenomicSeq, I've tried reducing the number of threads and running only the pregraph step instead of all, with 247 GB of memory and a 3-day time limit, but I still got the same error message:
          Some of your processes may have been killed by the cgroup out-of-memory handler.
          I'm stuck with this issue.
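          When an OOM kill persists whatever the request, it can help to compare the memory a job asked for with the peak it actually reached. A sketch using sacct (-P for pipe-separated output, -n to drop the header) with hypothetical helpers to_mib and report_job; save it under any name and run it with the job ID as its only argument after the job has ended. MaxRSS is sampled periodically, so treat it as approximate. If the peak sits far below the requested amount when the job is killed, the limit being enforced is probably not the one in the header.

```shell
#!/usr/bin/env bash
# Sketch: compare requested memory (ReqMem) with the peak reached (MaxRSS)
# for a finished job. to_mib and report_job are hypothetical helpers.
set -euo pipefail

# Convert a Slurm memory string (e.g. 10485760K, 240000M, 9.77G) to MiB.
# Some Slurm versions append "n" (per node) or "c" (per CPU) to ReqMem;
# the suffix is dropped and a per-CPU figure is not multiplied out here.
# A value without a unit suffix is treated as bytes; an empty value gives 0.
to_mib() {
  awk -v v="$1" 'BEGIN {
    sub(/[nc]$/, "", v)
    unit = substr(v, length(v), 1)
    num = v + 0                  # leading numeric part, e.g. 9.77 from "9.77G"
    if (unit == "K") num /= 1024
    else if (unit == "G") num *= 1024
    else if (unit == "T") num *= 1024 * 1024
    else if (unit != "M") num /= 1024 * 1024
    printf "%.0f\n", num
  }'
}

# Print one line per job step: state, requested memory and peak RSS in MiB.
report_job() {
  sacct -j "$1" -P -n --format=JobID,State,ReqMem,MaxRSS |
    while IFS='|' read -r id state req rss; do
      printf '%-24s %-14s requested=%s MiB peak=%s MiB\n' \
        "$id" "$state" "$(to_mib "$req")" "$(to_mib "$rss")"
    done
}

# Only query Slurm when a job ID is given on the command line.
if [[ $# -gt 0 ]]; then
  report_job "$1"
fi
```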

          • #6
            ampsevilla sorry, I wish I had some more advice to give. I'm a little lost. I'll try and ask some friends that are more savvy with this kind of work and get back to you once I hear their opinions.

            • #7
              GenomicSeq, it finally worked: recent changes to the cluster configuration were limiting the available memory to less than what I had requested. Thank you so much for your assistance! 😄

              • #8
                ampsevilla that's great! So what exactly did you have to change? I wish I could have been more help on this.

                • #9
                  Originally posted by GenomicSeq:
                  ampsevilla that's great! So what exactly did you have to change? I wish I could have been more help on this.
                  Same code; the problem was in the header. #SBATCH --mem=240000 should give me 240 GB of RAM, but while the cluster was being reconfigured, the memory limit for all jobs had been temporarily capped at 10 GB, and it was driving me crazy.

                  Anyway, you have been helpful and I really appreciate it. Thank you so much!
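                  For anyone hitting the same problem: a job can print, at start-up, what Slurm says was requested next to the limit the kernel will actually enforce. A sketch assuming cgroup v2 mounted at /sys/fs/cgroup (layouts differ between clusters, and cgroup v1 is not handled); effective_limit is a hypothetical helper that walks from the job's cgroup up to the root and reports the smallest memory.max it finds. SLURM_MEM_PER_NODE is set by Slurm when --mem is used. With the body (minus the shebang) pasted right after the #SBATCH lines, a temporary cap like the 10 GB one described above should show up in output_log.txt before the assembly starts.

```shell
#!/usr/bin/env bash
# Sketch: report requested memory vs. the enforced cgroup v2 limit.
# Assumes cgroup v2 at /sys/fs/cgroup; effective_limit is a hypothetical helper.
set -euo pipefail

# Usage: effective_limit [cgroup_root] [cgroup_path]
# Prints the smallest memory.max between cgroup_path and the root, in bytes,
# "max" if none is set, or "unknown" if the cgroup path cannot be found.
effective_limit() {
  local root=${1:-/sys/fs/cgroup} rel=${2-} limit best=max
  if [[ -z $rel ]]; then
    # cgroup v2 entries in /proc/self/cgroup look like "0::/some/path".
    rel=$(awk -F: '$1 == "0" {print $3}' /proc/self/cgroup 2>/dev/null || true)
  fi
  if [[ -z $rel ]]; then
    echo unknown
    return
  fi
  while :; do
    if [[ -r $root$rel/memory.max ]]; then
      limit=$(<"$root$rel/memory.max")
      # Keep the smallest numeric limit seen so far ("max" means no limit).
      if [[ $limit != max ]] && { [[ $best == max ]] || (( limit < best )); }; then
        best=$limit
      fi
    fi
    [[ $rel == / || $rel == . ]] && break
    rel=$(dirname "$rel")
  done
  echo "$best"
}

requested=${SLURM_MEM_PER_NODE:-unset}
limit=$(effective_limit)
if [[ $limit =~ ^[0-9]+$ ]]; then
  limit="$(( limit / 1024 / 1024 )) MiB"
fi
echo "Requested --mem: $requested (MB); enforced cgroup limit: $limit"
```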
