  • velvetg stops due to memory shortage

    Hi all,

    I have tried to use the velvetg for my metagenomic data assembly steps prior to MetaVelvet.

    My machine is about 250GiB RAM.

    The read length is about 90 for my dataset. The total number reads is about 20 million for a combined and inverleaved fastq file.

    However, when I try to run velvetg it it takes about two days before the program is killed.

    Can anyone give me recommendations on how much to increase my RAM for such a dataset? I have set the Kmer to 13 for my dataset and have turned on scaffolding, automatic coverage and automatic expected coverage since this is a metagenome.

  • #2
    Why is the program getting killed?

    Is it giving any error message?

    Is velvetg producing any output files before it stops?

    Why are you using such a short kmer size?

    Is velvet compiled with 'OPENMP=1'?


    • #3
      I assume it gets killed because I run out of RAM (it slowly fills up over time). There is no error message, and there are no output files before the program is killed. I am using a short k-mer size because, based on the histogram analysis of my sequences (generated with KmerGenie), this was the best k-mer size for the assembly. How could I find out if it was compiled with OPENMP=1?


      • #4
        If you just run velveth or velvetg without any parameters, you should get the help page, at the beginning of which it tells you the version of velvet and gives a list of 'Compilation settings:', such as MAXKMERLENGTH and CATEGORIES.


        • #5
          The output I get is:

          Copyright 2007, 2008 Daniel Zerbino ([email protected])
          This is free software; see the source for copying conditions. There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
          Compilation settings:
          CATEGORIES = 2
          MAXKMERLENGTH = 31
          OPENMP
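
A minimal sketch of checking that settings block programmatically. It only parses text shaped like the output quoted above; the function name `parse_compilation_settings` is hypothetical, and it assumes the block ends at the first blank line or at the first line that is neither `KEY = value` nor a bare flag.

```python
def parse_compilation_settings(help_text):
    """Collect the entries listed after 'Compilation settings:' into a dict."""
    settings = {}
    in_block = False
    for raw in help_text.splitlines():
        line = raw.strip()
        if line == "Compilation settings:":
            in_block = True
            continue
        if not in_block:
            continue
        if "=" in line:
            # Entries such as 'MAXKMERLENGTH = 31'
            key, value = (part.strip() for part in line.split("=", 1))
            settings[key] = value
        elif line and line.replace("_", "").isalnum():
            # Bare flags such as 'OPENMP'
            settings[line] = True
        else:
            # Assumption: a blank or differently shaped line ends the block
            break
    return settings


# Sample text copied from the output posted in the thread
sample = """Compilation settings:
CATEGORIES = 2
MAXKMERLENGTH = 31
OPENMP
"""

settings = parse_compilation_settings(sample)
print("OpenMP enabled:", settings.get("OPENMP", False))
print("k = 13 allowed:", 13 <= int(settings["MAXKMERLENGTH"]))
```

To use it on a real install, the text printed by running velvetg with no parameters would be passed in instead of `sample`.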


          • #6
            That looks OK, and doesn't explain why you're running out of memory.

            I have run larger datasets (read files about 100 Gb) on a computer with 128 Gb of memory. Velvetg uses about 60% of the memory, and runs in maybe a few (3-4) hours. But I've generally used much larger kmer sizes.

            You could try running an assembly using a k of 31, which is the default max kmer length for velvet, and see if it runs successfully.
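
As a rough illustration of why the k-mer size matters here, the sketch below compares k = 13 and k = 31 for the figures given in the first post (about 20 million reads of about 90 bases). It is plain counting, not a model of velvetg's memory use: it ignores reverse complements, sequencing errors and the real composition of the sample, and the function names are hypothetical.

```python
def kmers_per_read(read_length, k):
    """Number of overlapping k-mers in one read of the given length."""
    return max(read_length - k + 1, 0)


def kmer_summary(num_reads, read_length, k):
    """Compare k-mer occurrences in the reads with the 4**k possible k-mers."""
    possible = 4 ** k
    occurrences = num_reads * kmers_per_read(read_length, k)
    return {
        "k": k,
        "possible": possible,
        "occurrences": occurrences,
        "ratio": occurrences / possible,
    }


# Read count and read length as quoted in the first post
for k in (13, 31):
    s = kmer_summary(20_000_000, 90, k)
    print(f"k={s['k']}: {s['occurrences']:,} occurrences, "
          f"{s['possible']:,} possible k-mers, ratio {s['ratio']:.3g}")
```

With k = 13 the reads contain roughly 23 times more k-mer occurrences than there are distinct 13-mers, so the same 13-mer is likely to turn up in many unrelated places, which would tend to tangle the graph. With k = 31 the space of possible k-mers is vastly larger than the number of occurrences.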


            • #7
              But if the distribution of k-mers for my sequences shows that a k-mer size of 13 is best, why would I use a higher one? How will this affect the output in the end? What happens when you use a k-mer size that is higher than the best k-mer size for a set of sequences?


              • #8
                It was just a suggestion to try and see whether velvetg uses less memory or runs faster and manages to complete the run with a longer kmer size.

                Did you notice how much memory the machine was using while velvetg ran?

                Are you running this on a server or cluster where you have to specify the amount of time or memory allotted to the job?
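
One way to answer the memory question on a Linux machine is to read the resident set size that the kernel reports in `/proc/<pid>/status`. The sketch below is an assumption-laden helper, not part of velvet: the function names are hypothetical, the sample text is made up to show the `VmRSS` line format, and the PID of the running velvetg process has to be looked up separately.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: 'VmRSS:    104857600 kB'
            return int(line.split()[1])
    return None


def resident_memory_gib(pid):
    """Read the current resident memory of a process in GiB (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        kb = parse_vmrss_kb(handle.read())
    return None if kb is None else kb / (1024 ** 2)


# Illustrative status text, not taken from a real velvetg run
sample_status = "Name:\tvelvetg\nVmRSS:\t 104857600 kB\n"
sample_kb = parse_vmrss_kb(sample_status)
print(f"{sample_kb / (1024 ** 2):.1f} GiB resident")
```

Calling `resident_memory_gib(pid)` every few minutes and logging the result would show whether memory grows steadily until the process is killed.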


                • #9
                  Can you provide your whole velvetg command line submission so we can look at the specific flags and options you are using? Certain flags like -unused_reads yes will bump up the requirements for example...


                  • #10
                    The script I am running is:

                    ./velvetg /home/vanillasky/genomes/outdir -exp_cov auto -cov_cutoff auto -scaffolding yes -min_contig_lgth 250 -amos_file yes


                    • #11
                      In response to mastal: I used up all 250 GiB when I tried to run a sequence file of 2.3 GB, and the program was killed after two days. I am now trying a smaller sequence file (390 MB), and while the memory usage is at 40%, the program has been running for more than three days and has not completed. Also, while we have 16 cores available, only one core is being used, at 2-12% capacity. I am running this on our Linux server.


                      • #12
                        That's a good idea to try running a smaller file or just a subset of your reads.
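
A minimal sketch of taking a random subset of an interleaved FASTQ file while keeping read pairs together. It assumes plain 4-line FASTQ records (no wrapped sequence lines), so each pair occupies 8 consecutive lines; the function name and the tiny in-memory example are hypothetical.

```python
import itertools
import random


def subsample_pairs(lines, fraction, seed=0):
    """Yield the lines of a random subset of read pairs.

    `lines` is any iterable of FASTQ lines, for example an open file handle.
    Each pair is kept with probability `fraction`.
    """
    rng = random.Random(seed)
    iterator = iter(lines)
    while True:
        pair = list(itertools.islice(iterator, 8))  # two 4-line records
        if len(pair) < 8:
            break  # end of input, or a trailing incomplete pair
        if rng.random() < fraction:
            yield from pair


# Tiny made-up example: 50 pairs of 4-base reads
demo = []
for i in range(50):
    demo += [f"@r{i}/1\n", "ACGT\n", "+\n", "IIII\n",
             f"@r{i}/2\n", "TGCA\n", "+\n", "IIII\n"]

subset = list(subsample_pairs(demo, fraction=0.1, seed=1))
print(len(subset) // 8, "pairs kept out of", len(demo) // 8)
```

On a real file the same generator could be fed an open input handle and its output written with `writelines` to a new file, which is then passed to velveth.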

                        Not all the steps that velvetg does can be parallelised, so at some stages it uses just fractions of one processor, while at other stages it uses many processors.

                        Do you have enough space for the output files in your 'outdir' directory?
                        Has velvetg produced any output files so far?

                        You might want to leave out the -amos_file parameter for the time being. I don't know if it takes velvetg any more time or memory to generate it, but the .afg files produced tend to be larger than the rest of velvetg's output files.

                        As for the -scaffolding parameter, yes is velvetg's default behaviour, but maybe if you turn it off, '-scaffolding no' , it wouldn't have to try and join the contigs into scaffolds, so it might run a bit faster.
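
The trimmed-down run suggested here could be put together as follows. This is only a sketch: the flags are copied from the command posted earlier in the thread, the helper name `build_velvetg_command` is hypothetical, and the block prints the command rather than launching it.

```python
import shlex


def build_velvetg_command(outdir, scaffolding=True, amos_file=False,
                          min_contig_lgth=250):
    """Build a velvetg argument list using the flags from the posted command."""
    cmd = [
        "./velvetg", outdir,
        "-exp_cov", "auto",
        "-cov_cutoff", "auto",
        "-scaffolding", "yes" if scaffolding else "no",
        "-min_contig_lgth", str(min_contig_lgth),
    ]
    if amos_file:
        # Left out by default, since the .afg output tends to be large
        cmd += ["-amos_file", "yes"]
    return cmd


cmd = build_velvetg_command("/home/vanillasky/genomes/outdir",
                            scaffolding=False)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the run. Since MetaVelvet is the next step in this workflow, scaffolding may need to be switched back on once a test run completes.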


                        • #13
                          Hi mastal,

                          Thanks for the feedback. I have 1 TB of disk space, so I think the output dir should have enough room. I turned on scaffolding because I plan on using MetaVelvet next, and it requires scaffolding to build the bigger contigs. There isn't anything in the output dir yet, and the program is still running. I guess I'll just wait and see where it is by Monday next week. Hopefully done. By any chance, do you know which steps are parallelised?


                          • #14
                            It seems to use a lot of processors in one of the late stages, then goes back to one processor for a short while before printing the output files and finishing.

                            How did you choose the k-mer size? What software did you use?


                            • #15
                              I used KmerGenie: http://kmergenie.bx.psu.edu/
