03-01-2017, 02:47 PM #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

There are several possible approaches here. First, you can try other assemblers:

MEGAHIT - we use this routinely for metagenome assemblies because its resource requirements (time and memory) are much lower than those of SPAdes.

Disco - an overlap-based assembler designed for metagenomes, whose memory use is roughly equal to the size of the input data.

Second, you can reduce the memory footprint of the data through preprocessing: filtering and trimming the reads, and potentially error-correcting them and/or discarding reads whose coverage is either very high or too low to assemble. An example workflow is posted here (at least the first 5 steps are relevant). For a large metagenome, I also recommend removing human reads (just before error correction) as a way to reduce memory consumption.
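To illustrate why trimming and error correction shrink the memory footprint: a de Bruijn graph assembler such as MEGAHIT has to represent every distinct k-mer in the input, and a single substitution error can create up to k novel k-mers. The Python sketch below is illustrative only; the function name, the toy reads, and k=5 are made up, and unlike real assemblers it does not merge a k-mer with its reverse complement.

```python
def distinct_kmers(reads, k):
    """Return the set of distinct k-mers across all reads.

    Simplification: k-mers are not canonicalized (no reverse-complement merging).
    """
    kmers = set()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


k = 5
# Two copies of the same error-free read.
clean = ["ACGTACGGTCAGT", "ACGTACGGTCAGT"]
# Same pair, but the second copy has one substitution (G -> T at index 6).
noisy = ["ACGTACGGTCAGT", "ACGTACTGTCAGT"]

clean_count = len(distinct_kmers(clean, k))
noisy_count = len(distinct_kmers(noisy, k))
print(clean_count, noisy_count)
```

Here the single error adds k = 5 k-mers that occur nowhere else (9 distinct k-mers become 14). With realistic k values and millions of reads, such error k-mers can make up a large share of what the assembler must hold in memory, which is what trimming and error correction remove.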

Normalization can be done like this:

in1=./paired_1.fastq in2=./paired_2.fastq out=normalized.fq target=100 min=3
That will reduce coverage to a maximum of 100x and discard reads with coverage under 3x, which can greatly increase speed and reduce memory consumption. Sometimes it also produces a better assembly, but that depends on the data. Ideally, normalization should be done after error correction.
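The target/min behaviour described above can be sketched as a simplified form of digital normalization. This is a hypothetical Python illustration, not the implementation behind the command above. It estimates each read's coverage as the median count of its k-mers, discards reads below min_depth, and keeps reads above target with probability target/depth, so high-coverage regions are thinned to roughly target-fold. The function names, the median estimator, and the toy reads are all assumptions.

```python
import random
from collections import Counter
from statistics import median


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def normalize(reads, k, target, min_depth, seed=0):
    """Simplified normalization sketch: not the actual tool's algorithm."""
    counts = kmer_counts(reads, k)
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    kept = []
    for read in reads:
        depths = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
        if not depths:
            continue  # read shorter than k: no coverage estimate possible
        depth = median(depths)  # assumed coverage estimator
        if depth < min_depth:
            continue  # too little coverage to assemble: discard
        # At or below target: always keep. Above target: keep with
        # probability target/depth, thinning coverage toward target.
        if depth <= target or rng.random() < target / depth:
            kept.append(read)
    return kept


# Toy data: three reads with no shared 5-mers, at 500x, 50x and 2x.
reads = (["ACGTACGGTCAGT"] * 500
         + ["GATCGATCAAGCT"] * 50
         + ["TTTTGGGGCCCCA"] * 2)
kept = normalize(reads, k=5, target=100, min_depth=3, seed=1)
print(Counter(kept))
```

The 500x read is thinned to roughly 100 copies, the 50x read is kept in full, and the 2x read is dropped. The sketch treats reads as unpaired; with paired input such as in1/in2 above, mates would need to be kept or discarded together.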