
  • Oases pipeline fails on second kmer

    Hello all,

    I am hoping for help running oases_pipeline.py. I can run the pipeline on a subset of my data with four k-mer values using the following command:

    Code:
    python oases_pipeline.py -m 21 -M 27 -s 2 -o oases_test -d '-fastq -shortPaired -separate trimm_15_F_paired.fq trimm_15_R_paired.fq' -p '-ins_length 160'
    This runs successfully and produces output for K 21 through 27. However, when I try to run the following code on my full dataset (538769828 sequences) and for a larger range of K, it fails. This is my input code:

    Code:
    python oases_pipeline.py -m 21 -M 51 -s 2 -o oases_ALL -d '-fastq -shortPaired -separate ALL_F_paired.fq ALL_R_paired.fq' -p '-ins_length 160'
    This command runs successfully for K=21, but then crashes on K=23 with this output:

    Code:
    [5141.379366] Inputting sequence 66000000 / 538769828
    [5163.243577] Inputting sequence 67000000 / 538769828
    [5170.824776]  === Sequences loaded in 997.337692 s
    [5171.829179] Done inputting sequences
    [5171.829187] Destroying splay table
    [5173.870477] Splay table destroyed
    [5175.177294] Command failed!
    [5175.177304] rm -f oases_ALL_23/Sequences
    Hash failed
    I am at a loss for why it will run for a subset of data and for the first K, but crash on the second.

    Many thanks in advance for any input!

  • #2
    Is your computer running out of memory or disk space with the full data set?


    • #3
      Thanks for the reply. I am running it on a computing node that has 128 GB of RAM, so I thought there should be sufficient memory, but I suppose this could be the case. I haven't received an error about memory, though.

      I tried re-running it with the step size (-s) changed to 4. This runs through all the k-mers but never produces contigs or a merged assembly, and then dies mid-run. Could that be an indication that it is running out of memory?


      • #4
        It looks like it's either running out of memory, or running out of disk space to write the Sequences file to.

        How big are the fastq files with your reads, and if you are running this on a cluster, how much disk space are you allowed/have you requested?
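Both questions above can be answered with a quick check before launching a long run. The following is a minimal sketch, not part of Oases or Velvet: the helper name `space_report` is hypothetical, and the file names are taken from the command in the first post. Note that on a cluster a per-user quota can be smaller than the free space the filesystem reports, so the quota tool for your system remains the authority.

```python
import os
import shutil

GB = 1024 ** 3


def space_report(fastq_paths, out_dir="."):
    """Return the total size of the input files and the free space on the
    filesystem that holds out_dir, both in GB.

    Hypothetical helper for illustration; it does not account for quotas.
    """
    input_bytes = sum(os.path.getsize(p) for p in fastq_paths)
    free_bytes = shutil.disk_usage(out_dir).free
    return {"input_gb": input_bytes / GB, "free_gb": free_bytes / GB}


if __name__ == "__main__":
    # File names from the original post; skipped if they are not present.
    reads = ["ALL_F_paired.fq", "ALL_R_paired.fq"]
    existing = [p for p in reads if os.path.exists(p)]
    print(space_report(existing, out_dir="."))
```

Comparing `free_gb` with the size of one k-mer's output directory from a finished run gives a rough idea of how many k values will fit.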

        I'm not really familiar with Oases. When I've run velvet over multiple kmers, it just makes one Sequences file for the first kmer, and then uses symbolic links to the first Sequences file for the other kmers. The output you posted above looks like it was trying to write a Sequences file for k=23 and failed at that point.

        Velvet has an option to make a binary form of the Sequences file (see the manual). I'm not sure whether that works with Oases as well, but it would use less disk space.


        • #5
          Thanks! Yes, I think this is the case! I have 500 GB of storage, but I realized each value of K produces a Sequences file that is ~60 GB and a Roadmaps file that is ~40 GB, so I believe I am running out of space. I will try to output the data as binary or run it in batches. Thanks again!
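The arithmetic behind this conclusion can be sketched in a few lines. This is a rough lower bound under stated assumptions: the upper k bound is treated as inclusive (as the 21 through 27 test run in the first post suggests), every k writes its own ~60 GB Sequences and ~40 GB Roadmaps files, and graph and contig outputs are ignored. The function names are hypothetical.

```python
def kmer_values(k_min, k_max, step):
    """List the k values, assuming the upper bound is inclusive."""
    return list(range(k_min, k_max + 1, step))


def estimated_disk_gb(k_min, k_max, step, per_k_gb):
    """Lower-bound disk use if every k keeps its own Sequences and Roadmaps."""
    return len(kmer_values(k_min, k_max, step)) * per_k_gb


PER_K_GB = 60 + 40   # Sequences + Roadmaps sizes reported in the post
QUOTA_GB = 500       # storage reported in the post

print(estimated_disk_gb(21, 51, 2, PER_K_GB))  # 16 k values -> 1600
print(estimated_disk_gb(21, 51, 4, PER_K_GB))  # 8 k values -> 800
print(QUOTA_GB // PER_K_GB)                    # k values that fit at once -> 5
```

Under these assumptions both the step-2 and the step-4 runs exceed 500 GB, and roughly five k values fit at a time, which gives a starting point for sizing batches.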
