SEQanswers

Old 05-07-2014, 04:36 AM   #1
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
velvetg stops due to memory shortage

Hi all,

I have tried to use velvetg for the assembly step of my metagenomic data, prior to running MetaVelvet.

My machine has about 250 GiB of RAM.

The read length is about 90 bp for my dataset. The total number of reads is about 20 million, in a combined and interleaved FASTQ file.

However, when I try to run velvetg, it takes about two days before the program is killed.

Can anyone give me recommendations on how much to increase my RAM for such a dataset? I have set the kmer size to 13 for my dataset and have turned on scaffolding, automatic coverage cutoff and automatic expected coverage, since this is a metagenome.
Old 05-07-2014, 05:17 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

Why is the program getting killed?

Is it giving any error message?

Is velvetg producing any output files before it stops?

Why are you using such a short kmer size?

Is velvet compiled with 'OPENMP=1'?
Old 05-07-2014, 06:07 AM   #3
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

I assume it gets killed because I run out of RAM (it slowly fills up over time). There is no error message, and there are no output files before the program is killed. I am using a short kmer size because, based on the histogram analysis of my sequences (generated with KmerGenie), this was the best kmer size for the assembly. How can I find out whether it was compiled with OPENMP=1?
Old 05-07-2014, 06:20 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

If you just run velveth or velvetg without any parameters, you should get the help page, at the beginning of which it tells you the version of velvet and gives a list of 'Compilation settings:', such as MAXKMERLENGTH and CATEGORIES.
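
A minimal sketch of that check, for anyone who wants to script it. The helper only filters the help text down to the 'Compilation settings:' section; the function name compile_settings and the five-line window are assumptions made for illustration, not part of Velvet.

```shell
# Filter velvetg/velveth help text (read from stdin) down to the
# "Compilation settings:" header and the lines that follow it.
# The function name and the 5-line window are illustrative assumptions.
compile_settings() {
    grep -A 5 'Compilation settings:'
}

# Intended use (not run here, since it needs a Velvet build):
#   ./velvetg 2>&1 | compile_settings
```

If a line reading OPENMP shows up in that section, the binary was presumably built with 'OPENMP=1'.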
Old 05-07-2014, 06:23 AM   #5
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

The output I get is:

Copyright 2007, 2008 Daniel Zerbino ([email protected])
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compilation settings:
CATEGORIES = 2
MAXKMERLENGTH = 31
OPENMP
Old 05-07-2014, 06:47 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

That looks OK, and doesn't explain why you're running out of memory.

I have run larger datasets (read files of about 100 GB) on a computer with 128 GB of memory. velvetg uses about 60% of the memory and runs in a few (3-4) hours. But I've generally used much larger kmer sizes.

You could try running an assembly using a k of 31, which is the default max kmer length for velvet, and see if it runs successfully.
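
Some rough arithmetic may help show why such a short kmer is worth questioning. There are only 4^k distinct kmers of length k, and a read of length L contains L - k + 1 of them. The sketch below compares the two numbers for the dataset described in this thread (about 20 million reads of about 90 bp); the function names are made up for illustration.

```shell
# Number of distinct kmers over the alphabet A, C, G, T: 4^k.
kmer_space() {
    n=1
    i=0
    while [ "$i" -lt "$1" ]; do
        n=$((n * 4))
        i=$((i + 1))
    done
    echo "$n"
}

# Total kmer occurrences in a read set: reads * (read_length - k + 1).
# Arguments: number of reads, read length, k.
kmers_in_reads() {
    echo $(( $1 * ($2 - $3 + 1) ))
}

# Figures taken from the first post: ~20 million reads of ~90 bp.
for k in 13 31; do
    echo "k=$k: $(kmer_space "$k") possible kmers, $(kmers_in_reads 20000000 90 "$k") kmer occurrences in the reads"
done
```

At k=13 the reads contain roughly 23 times more kmer occurrences than there are possible 13-mers, so identical kmers are bound to turn up in unrelated sequences purely by chance, which tends to tangle the graph. At k=31 the space of possible kmers is vastly larger than the data.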
Old 05-08-2014, 06:07 AM   #7
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

But if the distribution of kmers for my sequences shows that a kmer size of 13 is best, why would I use a higher one? How will this affect the output in the end? What happens when you use a kmer size that is higher than the estimated best one for a set of sequences?
Old 05-08-2014, 07:12 AM   #8
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

It was just a suggestion to try and see whether velvetg uses less memory or runs faster and manages to complete the run with a longer kmer size.

Did you notice how much memory the machine was using while velvetg ran?

Are you running this on a server or cluster where you have to specify the amount of time or memory allotted to the job?
Old 05-08-2014, 08:48 AM   #9
jpummil
Member
 
Location: Fayetteville, AR

Join Date: Apr 2014
Posts: 82
Default

Can you provide your whole velvetg command line submission so we can look at the specific flags and options you are using? Certain flags like -unused_reads yes will bump up the requirements for example...
Old 05-09-2014, 01:05 AM   #10
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

The script I am running is:

./velvetg /home/vanillasky/genomes/outdir -exp_cov auto -cov_cutoff auto -scaffolding yes -min_contig_lgth 250 -amos_file yes
Old 05-09-2014, 01:10 AM   #11
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

In response to mastal: I used up all 250 GiB when I tried to run a sequence file of 2.3 GB, and the program was killed after two days. I am now trying a smaller sequence file (390 MB), and while the memory usage is at 40%, the program has been running for more than three days and has not completed. Also, while we have 16 cores available, only one core is being used, at 2-12% capacity. I am running this on our Linux server.
Old 05-09-2014, 04:24 AM   #12
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

That's a good idea to try running a smaller file or just a subset of your reads.
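
One way to take such a subset from an interleaved FASTQ file without breaking up read pairs is sketched below. It assumes strict 4-line FASTQ records (so one pair is 8 consecutive lines); the function name, the default seed and the file names in the usage comment are hypothetical.

```shell
# Keep each read pair of an interleaved FASTQ with probability p.
# Assumes strict 4-line records, i.e. one pair = 8 consecutive lines.
# $1 = input file, $2 = fraction of pairs to keep (0..1), $3 = seed (optional)
subsample_pairs() {
    awk -v p="$2" -v seed="${3:-42}" '
        BEGIN { srand(seed) }
        NR % 8 == 1 { keep = (rand() < p) }   # decide once per pair
        keep                                  # print all 8 lines of a kept pair
    ' "$1"
}

# Hypothetical usage, keeping roughly 10% of the pairs:
#   subsample_pairs reads_interleaved.fastq 0.1 > subset.fastq
```

Sampling at random rather than taking the first N reads should give a subset that is more representative of the whole run, though the kept fraction is only approximately p.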

Not all the steps that velvetg does can be parallelised, so at some stages it uses just fractions of one processor, while at other stages it uses many processors.
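
On the thread count itself: the Velvet manual, as best recalled here, says that an OPENMP build reads the OMP_NUM_THREADS and OMP_THREAD_LIMIT environment variables and may use up to OMP_NUM_THREADS+1 threads. Treat that as an assumption to check against the manual for your version. A sketch for the 16-core server mentioned above:

```shell
# Core count taken from the thread (16 cores on the server).
cores=16

# Assumption: Velvet may start one thread more than OMP_NUM_THREADS,
# so leave one core spare and cap the total at the core count.
export OMP_NUM_THREADS=$((cores - 1))
export OMP_THREAD_LIMIT=$cores

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"
```

These would need to be set in the same shell session, before velvetg is launched.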

Do you have enough space for the output files in your 'outdir' directory?
Has velvetg produced any output files so far?

You might want to leave out the -amos_file parameter for the time being. I don't know if it takes velvetg any more time or memory to generate it, but the .afg files produced tend to be larger than the rest of velvetg's output files.

As for the -scaffolding parameter, 'yes' is velvetg's default behaviour, but if you turn it off with '-scaffolding no', it won't have to try to join the contigs into scaffolds, so it might run a bit faster.
Old 05-09-2014, 04:57 AM   #13
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

Hi mastal,

Thanks for the feedback. I have 1 TB of disk space, so I think the output dir should have enough room. I turned on scaffolding because I plan on using MetaVelvet next, and it requires scaffolding to build the bigger contigs. There isn't anything in the output dir yet, and the program is still running. I guess I'll just wait and see where it is by Monday next week. Hopefully done. By any chance, do you know which steps are parallelised?
Old 05-09-2014, 05:49 AM   #14
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

It seems to use a lot of processors in one of the late stages, then goes back to one processor for a short while before printing the output files and finishing.

How did you choose the kmer size, what software did you use?
Old 05-09-2014, 05:59 AM   #15
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

I used KmerGenie: http://kmergenie.bx.psu.edu/
Old 05-15-2014, 06:48 AM   #16
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 41
Default

So after one week of running, the output was really awful, with an N50 of 1, and the coverage was estimated as 6. Can you give me any input on how to better pick the kmer size? The value I get now from KmerGenie seems to be too low, but I don't know how much deviation from such histograms is allowable before the output becomes spurious.
Old 05-15-2014, 07:14 AM   #17
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662
Default

Try using velvetk.pl to calculate coverage first; you can get the script from:

http://www.vicbioinformatics.com/software.velvetk.shtml
Old 05-15-2014, 07:41 AM   #18
jpummil
Member
 
Location: Fayetteville, AR

Join Date: Apr 2014
Posts: 82
Default

In the following link, there is an explanation of evaluating kmer coverage using the stats.txt file generated by your first assembly:

http://en.wikibooks.org/wiki/Next_Ge...g_(NGS)/Velvet

Of course, I'm not certain how good the initial assembly has to be for this to be accurate.
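
For reference, the Velvet manual relates kmer coverage (Ck) to nucleotide coverage (C) as Ck = C * (L - k + 1) / L, where L is the read length. A small sketch of the conversion in both directions follows; the function names are made up, and it assumes the coverage of 6 reported earlier in the thread is a kmer coverage.

```shell
# Convert between kmer coverage (Ck) and nucleotide coverage (C)
# using Ck = C * (L - k + 1) / L, with L the read length.

# Arguments: C, L, k. Prints Ck to two decimals.
kmer_cov() {
    awk -v c="$1" -v l="$2" -v k="$3" 'BEGIN { printf "%.2f\n", c * (l - k + 1) / l }'
}

# Arguments: Ck, L, k. Prints C to two decimals.
nucl_cov() {
    awk -v ck="$1" -v l="$2" -v k="$3" 'BEGIN { printf "%.2f\n", ck * l / (l - k + 1) }'
}

# Assumed inputs from the thread: Ck = 6 at k = 13 with 90 bp reads.
nucl_cov 6 90 13
# The same nucleotide coverage expressed as kmer coverage at k = 31.
kmer_cov 6.92 90 31
```

Longer kmers lower the kmer coverage for the same data, which is one reason the choice of k and the expected coverage have to be considered together.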

Tags
metagenomes, metavelvet, velvet
