  • vanillasky
    Member
    • Mar 2014
    • 42

    velvetg stops due to memory shortage

    Hi all,

    I have tried to use velvetg for the assembly step of my metagenomic data, prior to running MetaVelvet.

    My machine has about 250 GiB of RAM.

    The read length is about 90 bases for my dataset. The total number of reads is about 20 million, in a combined and interleaved FASTQ file.

    However, when I run velvetg it runs for about two days before the program is killed.

    Can anyone give me recommendations on how much to increase my RAM for such a dataset? I have set the kmer size to 13 for my dataset and have turned on scaffolding, automatic coverage cutoff and automatic expected coverage, since this is a metagenome.
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    Why is the program getting killed?

    Is it giving any error message?

    Is velvetg producing any output files before it stops?

    Why are you using such a short kmer size?

    Is velvet compiled with 'OPENMP=1'?

    • vanillasky
      Member
      • Mar 2014
      • 42

      #3
      I assume it gets killed because I run out of RAM (it slowly fills up over time). There is no error message, and there are no output files before the program is killed. I am using a short kmer size because, based on the histogram analysis of my sequences (generated with KmerGenie), this was the best kmer for the assembly. How can I find out whether it was compiled with OPENMP=1?

      • mastal
        Senior Member
        • Mar 2009
        • 666

        #4
        If you just run velveth or velvetg without any parameters, you should get the help page, at the beginning of which it tells you the version of velvet and gives a list of 'Compilation settings:', such as MAXKMERLENGTH and CATEGORIES.

        • vanillasky
          Member
          • Mar 2014
          • 42

          #5
          The output I get is:

          Copyright 2007, 2008 Daniel Zerbino ([email protected])
          This is free software; see the source for copying conditions. There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
          Compilation settings:
          CATEGORIES = 2
          MAXKMERLENGTH = 31
          OPENMP
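For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that parses a settings block like the one above. The function name `parse_compilation_settings` is made up for illustration, the sample text is copied from the post above, and the parser assumes the block is a run of `KEY = VALUE` lines and bare flags, which is all the thread shows.

```python
import re

# Settings block copied from the output quoted in the post above.
SAMPLE = """Compilation settings:
CATEGORIES = 2
MAXKMERLENGTH = 31
OPENMP
"""


def parse_compilation_settings(help_text):
    """Return the 'Compilation settings:' block as a dict.

    KEY = VALUE lines map to strings; bare flags such as OPENMP map to True.
    Parsing stops at the first line that fits neither shape.
    """
    settings = {}
    in_block = False
    for raw_line in help_text.splitlines():
        line = raw_line.strip()
        if line == "Compilation settings:":
            in_block = True
            continue
        if not in_block:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
        elif re.fullmatch(r"[A-Z0-9_]+", line):
            settings[line] = True
        else:
            break  # blank line or ordinary text: the block has ended
    return settings


settings = parse_compilation_settings(SAMPLE)
print("OpenMP build:", settings.get("OPENMP", False))
print("Largest usable k:", settings.get("MAXKMERLENGTH"))
```

If the `OPENMP` line is missing from your own help page, the binary was presumably built without it.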

          • mastal
            Senior Member
            • Mar 2009
            • 666

            #6
            That looks OK, and doesn't explain why you're running out of memory.

            I have run larger datasets (read files about 100 Gb) on a computer with 128 Gb of memory. Velvetg uses about 60% of the memory, and runs in maybe a few (3-4) hours. But I've generally used much larger kmer sizes.

            You could try running an assembly using a k of 31, which is the default max kmer length for velvet, and see if it runs successfully.
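To put rough numbers on why kmer size matters here, a back-of-the-envelope Python sketch using the figures from the first post (20 million reads of about 90 bases). It only counts kmers; it is not a model of velvetg's actual memory use, and the helper names are made up for illustration.

```python
def kmers_per_read(read_length, k):
    """Number of overlapping kmers in one read of the given length."""
    return max(read_length - k + 1, 0)


def possible_kmers(k):
    """Number of distinct kmers over the alphabet A, C, G, T."""
    return 4 ** k


READS = 20_000_000   # "about 20 million" reads, from the first post
READ_LENGTH = 90     # "about 90", from the first post

for k in (13, 31):
    occurrences = READS * kmers_per_read(READ_LENGTH, k)
    space = possible_kmers(k)
    print(f"k={k}: {occurrences:,} kmer occurrences in the reads, "
          f"{space:,} possible kmers, ratio {occurrences / space:.3g}")
```

With k=13 there are only about 67 million possible kmers against roughly 1.56 billion kmer occurrences in the reads, so the same kmers are bound to turn up in many unrelated places. With k=31 the space of possible kmers is vastly larger than the data, so a repeated kmer is much more likely to reflect real overlap.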

            • vanillasky
              Member
              • Mar 2014
              • 42

              #7
              But if the distribution of kmers for my sequences shows that a kmer of 13 is best, why would I use a higher kmer? How will this affect the output in the end? What happens when you use a kmer size that is higher than the best kmer for a set of sequences?

              • mastal
                Senior Member
                • Mar 2009
                • 666

                #8
                It was just a suggestion to try and see whether velvetg uses less memory or runs faster and manages to complete the run with a longer kmer size.

                Did you notice how much memory the machine was using while velvetg ran?

                Are you running this on a server or cluster where you have to specify the amount of time or memory allotted to the job?
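On the question of limits: a small Python sketch, assuming a Linux or other Unix machine, that prints the per-process memory limits of the current session. Programs started from the same shell inherit these limits, so running it from the shell used to launch velvetg shows whether a `ulimit`-style cap is in place. It does not see limits imposed by a job scheduler or by cgroups, and the helper names are made up for illustration.

```python
import resource  # Unix-only standard library module


def format_limit(value):
    """Render an rlimit value as 'unlimited' or as a size in GiB."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 1024 ** 3:.1f} GiB"


def memory_limits():
    """Return (soft, hard) limits for address space and data segment size."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (format_limit(soft), format_limit(hard))
    return limits


for name, (soft, hard) in memory_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

If both come back as unlimited, the kill is more likely down to the machine itself running out of memory than to a session limit.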

                • jpummil
                  Member
                  • Apr 2014
                  • 85

                  #9
                  Can you provide your whole velvetg command line so we can look at the specific flags and options you are using? Certain flags, like -unused_reads yes, will bump up the requirements, for example.

                  • vanillasky
                    Member
                    • Mar 2014
                    • 42

                    #10
                    The script I am running is:

                    ./velvetg /home/vanillasky/genomes/outdir -exp_cov auto -cov_cutoff auto -scaffolding yes -min_contig_lgth 250 -amos_file yes

                    • vanillasky
                      Member
                      • Mar 2014
                      • 42

                      #11
                      In response to mastal: I used up all 250 GiB when I tried to run a sequence file of 2.3 GB, and the program was killed after two days. I am now trying a smaller sequence file (390 MB); while the memory usage is at 40%, the program has been running for more than three days and has not completed. Also, although we have 16 cores available, only one core is being used, at 2-12% capacity. I am running this on our Linux server.
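One way to cut a smaller test file from an interleaved FASTQ without breaking up read pairs is sketched below in Python. It assumes every record is exactly four lines and that mates sit next to each other, so a pair is eight consecutive lines; the function name and the file names in the comment are hypothetical.

```python
import random


def subsample_interleaved(lines, fraction, seed=0):
    """Yield the lines of a random subset of read pairs.

    `lines` is any iterable of FASTQ lines (for example an open file).
    Each pair is kept with probability `fraction`; both mates are always
    kept or dropped together.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    pair = []
    for line in lines:
        pair.append(line)
        if len(pair) == 8:  # 2 mates x 4 lines per record
            if rng.random() < fraction:
                yield from pair
            pair = []
    if pair:
        raise ValueError("input ended in the middle of a read pair")


# Example use with hypothetical file names:
# with open("reads_interleaved.fastq") as src, open("subset.fastq", "w") as dst:
#     dst.writelines(subsample_interleaved(src, fraction=0.1))
```

Because the input is streamed, this needs very little memory even for a multi-gigabyte file.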

                      • mastal
                        Senior Member
                        • Mar 2009
                        • 666

                        #12
                        That's a good idea to try running a smaller file or just a subset of your reads.

                        Not all the steps that velvetg does can be parallelised, so at some stages it uses just fractions of one processor, while at other stages it uses many processors.

                        Do you have enough space for the output files in your 'outdir' directory?
                        Has velvetg produced any output files so far?

                        You might want to leave out the -amos_file parameter for the time being. I don't know if it takes velvetg any more time or memory to generate it, but the .afg files produced tend to be larger than the rest of velvetg's output files.

                        As for the -scaffolding parameter, yes is velvetg's default behaviour, but if you turn it off with '-scaffolding no', it wouldn't have to try to join the contigs into scaffolds, so it might run a bit faster.
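To keep the trial runs suggested above straight, here is a small Python sketch that builds the velvetg argument list with and without those options. It uses only the flags already quoted in this thread; the function name `velvetg_args` and its defaults are made up for illustration, and it only prints the commands rather than running them.

```python
def velvetg_args(outdir, scaffolding=True, amos_file=False,
                 min_contig_lgth=250, binary="./velvetg"):
    """Build a velvetg argument list from the options quoted in this thread."""
    args = [
        binary, outdir,
        "-exp_cov", "auto",
        "-cov_cutoff", "auto",
        "-scaffolding", "yes" if scaffolding else "no",
        "-min_contig_lgth", str(min_contig_lgth),
    ]
    if amos_file:
        args += ["-amos_file", "yes"]  # only added when explicitly requested
    return args


OUTDIR = "/home/vanillasky/genomes/outdir"  # path from the command in post #10

# The original command, then the two lighter variants suggested above.
print(" ".join(velvetg_args(OUTDIR, amos_file=True)))
print(" ".join(velvetg_args(OUTDIR)))
print(" ".join(velvetg_args(OUTDIR, scaffolding=False)))
```

Passing one of these lists to `subprocess.run(args, check=True)` would launch the run, assuming the velvetg binary is in the current directory as in the original command.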

                        • vanillasky
                          Member
                          • Mar 2014
                          • 42

                          #13
                          Hi mastal,

                          Thanks for the feedback. I have 1 TB of disk space, so I think the output dir should have enough room. I turned on scaffolding because I plan on using MetaVelvet next and it requires scaffolding to build the bigger contigs. There isn't anything yet in the output dir, and the program is still running. I guess I'll just wait and see where it is by Monday next week. Hopefully done. By any chance, do you know which steps are parallelised?

                          • mastal
                            Senior Member
                            • Mar 2009
                            • 666

                            #14
                            It seems to use a lot of processors in one of the late stages, then goes back to one processor for a short while before printing the output files and finishing.
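If you want to control how many threads the parallel stages may use, `OMP_NUM_THREADS` is the standard OpenMP environment variable for that, and an OpenMP build of velvet should respect it. Setting it will not make the single-threaded stages any faster. A minimal Python sketch, with a made-up helper name, that prepares an environment for launching a job:

```python
import os


def openmp_env(threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set.

    `base_env` defaults to the current process environment; the original
    mapping is never modified.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


env = openmp_env(16)  # 16 cores, as mentioned earlier in the thread
print("OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```

The resulting dict can be passed as the `env` argument of `subprocess.run` when starting velvetg.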

                            How did you choose the kmer size? What software did you use?

                            • vanillasky
                              Member
                              • Mar 2014
                              • 42

                              #15
                              I used KmerGenie: http://kmergenie.bx.psu.edu/
