|
|
#1 |
|
Senior Member
Join Date: Jan 2008
Location: USA
Posts: 363
|
Hi,
Does anyone have an idea of how much memory velvet requires for a given input of short reads, and how that scales with more or longer reads? Also, are there any other 'large dataset' de novo assembly tools for Illumina reads? SOAP says 100 GB of RAM for human-sized genomes. Are there other options, and what would their memory requirements be? Thanks for sharing.
|
|
|
|
|
#2 |
|
Senior Member
Join Date: Jan 2008
Location: USA
Posts: 363
|
To answer part of it myself, this is a useful source on the velvet mailing list:
http://listserver.ebi.ac.uk/pipermai...ne/000359.html
The gist is:
RAM required for velvetg (in kB) = -109635 + 18977*ReadSize + 86326*GenomeSize + 233353*NumReads - 51092*K
where
ReadSize is the read length in bases,
GenomeSize is the genome size in millions of bases (Mb),
NumReads is the number of reads in millions,
K is the kmer hash value used in velveth.
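The formula above is easy to wrap in a small helper. This is only a sketch: the coefficients are copied from the post, while the function name, argument names, and the example inputs (36 bp reads, a 5 Mb genome, 10 million reads, k=31) are made up for illustration and do not come from the thread.

```python
def velvetg_ram_kb(read_size, genome_size_mb, num_reads_millions, k):
    """Estimate velvetg RAM in kB using the empirical formula quoted above.

    read_size: read length in bases
    genome_size_mb: genome size in millions of bases
    num_reads_millions: number of reads in millions
    k: kmer hash value passed to velveth
    """
    return (-109635
            + 18977 * read_size
            + 86326 * genome_size_mb
            + 233353 * num_reads_millions
            - 51092 * k)


# Hypothetical example inputs, chosen only to show the arithmetic.
estimate_kb = velvetg_ram_kb(read_size=36, genome_size_mb=5,
                             num_reads_millions=10, k=31)
print(f"{estimate_kb} kB, about {estimate_kb / 1024**2:.2f} GB")
```

Since the formula is an empirical fit, it is probably best treated as a rough guide, especially for inputs far from the ones it was fitted on.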
|
|
|
|
|
#3 |
|
Senior Member
Join Date: Apr 2008
Location: Victorian Bioinformatics Consortium, AUSTRALIA
Posts: 135
|
The above formula, derived by Simon Gladman, comes with a caveat: it applies only to Velvet compiled with the default MAXKMERLENGTH=31. If you compiled with a larger value, 63 for example, the memory usage will increase.
|
|
|
|
|
|
#4 |
|
Senior Member
Join Date: Nov 2009
Location: SEA
Posts: 114
|
So what happens when the machine doesn't have enough RAM?
Does it give an error or just proceed very, very slowly? Would having a large enough swap partition help?
|
|
|
|
|
#5 |
|
Member
Join Date: May 2009
Location: Philadelphia, PA
Posts: 86
|
It will segfault, but sometimes it will lock up a machine so badly that you have to physically pull the plug.
I suggest using ulimit. For example, I have a 256 GB machine and run ulimit -v 240000000 before every run.
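For anyone launching Velvet from a script instead of a shell, roughly the same cap can be applied per process. The sketch below assumes a Unix system and uses Python's standard `resource` and `subprocess` modules; the helper names are hypothetical. Bash's `ulimit -v` takes its value in kilobytes, so 240000000 is converted to bytes here. Unlike a plain `ulimit -v`, this version only lowers the soft limit and leaves the hard limit untouched.

```python
import resource
import subprocess


def kb_to_bytes(limit_kb):
    """Convert a ulimit-style value in kilobytes to bytes."""
    return limit_kb * 1024


def run_with_memory_cap(cmd, limit_kb):
    """Run cmd with its virtual address space capped at limit_kb kilobytes."""
    limit_bytes = kb_to_bytes(limit_kb)

    def apply_limit():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and it is never set above the existing hard limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            new_soft = limit_bytes
        else:
            new_soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit)


# Usage (the directory name "assembly_dir" is a placeholder):
# run_with_memory_cap(["velvetg", "assembly_dir"], 240000000)
```

With the cap in place, a run that outgrows the limit should fail on allocation instead of dragging the whole machine into swap.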
__________________
-- Jeremy Leipzig, Bioinformatics Programmer --
Standardized Velvet Assembly Report - designed to help Velvet users identify the optimal kmer and cvCut parameters
|
|
|
|
|
#6 |
|
Member
Join Date: Oct 2009
Location: Nashville, TN
Posts: 36
|
We've been needing approximately 30 GB of RAM for velvet assembly, with a minimum of 24 GB, depending on the kmer length specified. This is with single-end 36 bp Illumina data.
Last edited by jgibbons1; 01-06-2010 at 07:52 AM. |
|
|
|
|
|
#7 |
|
Member
Join Date: Jan 2009
Location: Udine (Italy)
Posts: 48
|
To assemble a lane of paired reads of length 75, we used 120 GB with a k-mer size of 47.
Obviously the amount of data decreases with a smaller k-mer, but a shorter k-mer implies a higher chance of mistakes. In my opinion, as read lengths increase, tools like velvet will become too memory-hungry and therefore impractical. With a read length of 150, an approach like PCAP, ARACHNE, or EDENA, which builds an overlap graph rather than a de Bruijn graph, is the only feasible option.
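One way to get a feel for how k changes the size of the problem is to count the distinct k-mers in a read set for several values of k. The sketch below is a toy illustration only: the function name and the example read are made up, and the distinct k-mer count is at best a rough proxy for what a de Bruijn graph assembler has to track. Real memory use also depends on coverage, sequencing errors, and the assembler's own data structures.

```python
def distinct_kmers(reads, k):
    """Count the distinct k-mers (substrings of length k) across all reads."""
    kmers = set()
    for read in reads:
        # A read shorter than k contributes nothing (the range is empty).
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return len(kmers)


# Made-up example read, just to exercise the function.
toy_reads = ["ACGTACGT"]
for k in (3, 4, 5):
    print(k, distinct_kmers(toy_reads, k))
```

Running this on a subsample of real reads for a few candidate k values would give a relative comparison, not an absolute memory figure.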
|
|
|
|
|
#8 |
|
Senior Member
Join Date: Jan 2008
Location: USA
Posts: 363
|
Is it the human genome you are working on?
One approach is to map reads to a reference and assemble the unmapped reads, though this can yield a pretty fragmented assembly that is hard to use in the end. Do you usually do things like remove contaminants or low-quality reads, or take only the unique set of reads? These can certainly reduce the run time, but last I looked, using a redundant set of reads gave a slightly different assembly than a non-redundant one.
|
|
|
|
|
#9 |
|
Member
Join Date: Oct 2009
Location: Nashville, TN
Posts: 36
|
Sorry...I replied to the wrong thread.
Last edited by jgibbons1; 01-06-2010 at 02:05 PM. Reason: replied to the wrong thread |
|
|
|
Tags: assemble, de novo, memory, soap, velvet