SEQanswers

Old 07-13-2009, 02:36 PM   #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default Memory requirements of the Velvet tool (de novo assembly)

Hi,

Does anyone have an idea of how much memory Velvet would require for a given input of short reads? And how would it scale with more / longer reads?

Also, are there any other 'large dataset' de novo assembly tools for Illumina reads? SOAP quotes ~100 GB of RAM for human-sized genomes; are there other options, and what would their memory requirements be?

Thanks for sharing!
Old 07-14-2009, 09:09 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

To answer part of this myself, here is a useful post from the velvet-users mailing list:
http://listserver.ebi.ac.uk/pipermai...ne/000359.html

The gist:

RAM required by velvetg (in KB) = -109635 + 18977*ReadSize + 86326*GenomeSize + 233353*NumReads - 51092*K

where:
ReadSize is the read length in bases,
GenomeSize is the genome size in megabases (Mb),
NumReads is the number of reads in millions,
K is the k-mer size used in velveth.
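As a quick sketch, the regression can be evaluated with plain shell arithmetic. The input values below are made-up examples of mine, not recommendations, and the coefficients are Simon Gladman's as quoted above:

```shell
# Estimate velvetg RAM from the regression above (result is in KB).
# All four input values below are arbitrary examples.
read_size=101     # read length in bases
genome_mb=5       # genome size in megabases
reads_m=10        # number of reads, in millions
k=31              # k-mer size passed to velveth
kb=$(( -109635 + 18977*read_size + 86326*genome_mb + 233353*reads_m - 51092*k ))
echo "Estimated velvetg RAM: ${kb} KB (~$(( kb / 1048576 )) GB)"
```

For these example inputs that works out to roughly 3 GB; real datasets (e.g. a human-sized genome with hundreds of millions of reads) push the NumReads term into the hundreds of gigabytes.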
Old 08-04-2009, 07:56 PM   #3
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

The above formula, derived by Simon Gladman, has the caveat that it only applies to Velvet compiled with the default MAXKMERSIZE=31. If you compiled with 63, for example, memory usage will be higher.
Old 01-05-2010, 09:24 PM   #4
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 203
Default

So what happens when the machine doesn't have enough RAM? Does it give an error, or just proceed very, very slowly?

Would having a large enough swap partition help?
Old 01-06-2010, 07:09 AM   #5
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

It will segfault, but sometimes it will lock up a machine so badly that you have to physically pull the plug.

I suggest using ulimit. For example, I have a 256 GB machine and run
ulimit -v 240000000
before every run.
Old 01-06-2010, 08:20 AM   #6
jgibbons1
Senior Member
 
Location: Worcester, MA

Join Date: Oct 2009
Posts: 133
Default

We've been needing approximately 30 GB of RAM for Velvet assembly, with a minimum of 24 GB depending on the k-mer length specified. This is with single-end 36 bp Illumina data.

Old 01-06-2010, 08:50 AM   #7
francesco.vezzi
Member
 
Location: Udine (Italy)

Join Date: Jan 2009
Posts: 50
Default

To assemble one lane of 75 bp paired reads, we used 120 GB with a k-mer size of 47. Obviously the amount of memory decreases with a smaller k-mer, but a shorter k-mer implies a higher chance of mistakes.

In my opinion, as read lengths increase, tools like Velvet will become too memory-consuming and impractical. With a read length of 150, an approach like PCAP, ARACHNE or EDENA, which builds an overlap graph rather than a de Bruijn graph, is the only feasible option.
Old 01-06-2010, 01:52 PM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Is it the human genome you are working on?

One approach is to map reads to a reference and assemble only the unmapped reads, though this can yield a pretty fragmented assembly that is hard to use eventually...

Do you usually do things like removing contaminants or low-quality reads, or taking only the unique set of reads? These can certainly reduce the run time, but last I looked, using a redundant set of reads gave a slightly different assembly than a non-redundant one.
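For illustration, the map-then-assemble idea might be sketched as follows. This is a rough sketch of mine, assuming bwa, samtools and Velvet are available; all file names and the k-mer value are placeholders, and `-f 4` is the SAM flag filter that keeps only unmapped reads:

```shell
# Rough sketch: align to a reference, keep only the unmapped reads,
# then hand those to Velvet. File names are placeholders.
bwa index ref.fa                                                   # one-off reference index
bwa mem ref.fa reads.fq | samtools view -b -f 4 -o unmapped.bam -  # keep unmapped reads only
samtools fastq unmapped.bam > unmapped.fq                          # convert back to FASTQ
velveth unmapped_asm/ 31 -fastq -short unmapped.fq                 # hash the leftover reads
velvetg unmapped_asm/                                              # assemble them
```

Since only the reads that fail to map reach Velvet, the de Bruijn graph (and thus peak RAM) is much smaller than for a whole-genome assembly.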
Old 01-06-2010, 02:03 PM   #9
jgibbons1
Senior Member
 
Location: Worcester, MA

Join Date: Oct 2009
Posts: 133
Default

Sorry...I replied to the wrong thread.

Old 04-09-2012, 09:50 PM   #10
nxtgenkid10
Member
 
Location: india

Join Date: Feb 2011
Posts: 16
Default How to regulate Velvet memory consumption

I'm using Velvet for assembly, and velvetg is consuming around 90% of my memory. Is there any way I can control this, say by threading or some other step?
Old 04-10-2012, 07:43 AM   #11
jgibbons1
Senior Member
 
Location: Worcester, MA

Join Date: Oct 2009
Posts: 133
Default

I've found that one of the best ways to reduce the memory requirements is to quality-filter your read set before assembly; low-quality reads directly inflate memory use. Trimmomatic and Quake are both very good for quality filtering.
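For example, a single-end Trimmomatic run might look like the following. This is a sketch only; the jar version, adapter file, and thresholds are my assumptions, not values recommended in this thread:

```shell
# Hypothetical quality-filtering step before velveth; all values are examples.
# Clips adapters, trims low-quality bases from both ends, applies a sliding-
# window quality cut, and drops reads shorter than 36 bp.
java -jar trimmomatic-0.39.jar SE -phred33 \
    reads.fastq reads.trimmed.fastq \
    ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```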
Old 04-18-2012, 07:36 PM   #12
jjjscuedu
Member
 
Location: NY

Join Date: Mar 2012
Posts: 35
Default velvet memory problem

Hi all,

I have tried to use Velvet to assemble my RNA-seq data.

My machine has about 40 GB of RAM.

The read length is about 101 bp, and there are about 60 million reads in each file of the paired-end set, so about 120 million reads in total.

However, when I try to assemble it, once the run gets past the 'Ghost' threads and begins threading through the reads, it has already occupied about 65% of my RAM, so I have to stop it.

Can anyone give me some suggestions on how to reduce the memory usage? I have set the k-mer to 75 for my dataset.

Jingjing
Old 04-19-2012, 04:26 AM   #13
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156
Default

1. Get a machine with more RAM
2. Use shorter k-mers
3. Try to reduce complexity in your reads by using Quake or something similar
4. Subsample your reads
5. Use a different assembler

Velvet is known to be memory-hungry, so 1 is the best choice. However, if that isn't an option, you should at least try 2 (75 sounds very, very high) or 3, with 4 as a last resort - unless you want to try a completely different assembler. CLC is very memory-efficient, but commercial...
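As a sketch of option 4, subsampling can be done with seqtk (my example tool choice, not one named in the thread). Using the same `-s` seed on both mate files keeps read pairing intact; file names and the 50% fraction are placeholders:

```shell
# Hypothetical subsampling: keep ~50% of read pairs before assembly.
# The identical -s seed on both files keeps mates in sync.
seqtk sample -s100 reads_1.fq 0.5 > sub_1.fq
seqtk sample -s100 reads_2.fq 0.5 > sub_2.fq
```

Halving the read count roughly halves the NumReads term in the memory formula quoted earlier in this thread, at the cost of coverage.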