Old 06-25-2014, 08:54 AM   #2
Brian Bushnell

You can either subsample or normalize; sometimes one gives a better assembly than the other. BBNorm will not lose pairing or quality information.

The BBMap package, which includes BBNorm, also contains Reformat, a tool that can do subsampling: reformat.sh in=reads.fq.gz out=sampled.fq.gz samplerate=0.04

...will subsample to 4%.
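
If you would rather pick the sample rate from a coverage you want to end up with, the arithmetic is target coverage times genome size, divided by the total bases in your reads. Here is a rough sketch of that, assuming you already have an estimate of the genome size and the total base count; the function name sample_rate and the example numbers are made up for illustration and are not part of the BBMap package.

```shell
#!/bin/sh
# Hypothetical helper (not part of BBMap): fraction of reads to keep
# in order to reach a target coverage.
# Usage: sample_rate TARGET_COVERAGE GENOME_SIZE_BP TOTAL_BASES_BP
sample_rate() {
    awk -v t="$1" -v g="$2" -v b="$3" 'BEGIN {
        r = t * g / b          # target coverage / current coverage
        if (r > 1) r = 1       # cannot keep more than all of the reads
        printf "%.4f\n", r
    }'
}

# Made-up example: 5 Mbp genome, 5 Gbp of reads (1000x), target of 40x.
sample_rate 40 5000000 5000000000   # prints 0.0400
```

The printed value is what you would pass as samplerate. Since subsampling is random, the coverage you actually get will only be approximately the target.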

With BBNorm, to normalize to 35x coverage: bbnorm.sh in=reads.fq.gz out=normalized.fq.gz target=35 -Xmx29g

You may or may not need to set the -Xmx flag, depending on your environment. If you do, then set it to about 85% of the machine's physical memory.
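
As a sketch of the 85% rule, something like the following could work out the flag for you. It assumes a Linux machine where /proc/meminfo reports MemTotal in kB; the function name xmx_flag is hypothetical, and the script only prints the BBNorm command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: turn total physical memory (in kB) into an -Xmx flag
# at roughly 85% of that memory, rounded down to whole gigabytes.
xmx_flag() {
    awk -v kb="$1" 'BEGIN { printf "-Xmx%dg\n", int(kb * 0.85 / 1048576) }'
}

# Assumption: Linux, where MemTotal in /proc/meminfo is given in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    # Print the command instead of executing it, so it can be checked first.
    echo "bbnorm.sh in=reads.fq.gz out=normalized.fq.gz target=35 $(xmx_flag "$total_kb")"
fi
```

Rounding down keeps the value on the safe side. On a machine with very little memory this would round to 0g, in which case you would want to set the flag by hand in megabytes instead.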

In my experience, normalization usually yields a better assembly than subsampling, particularly if you have variable coverage. The disadvantages of normalization compared to subsampling are that it is slower, uses more memory, and destroys information about repeat content, but I consider these disadvantages unimportant if the resulting assembly is better.

Also, I find that Velvet, with default parameters, yields the best assembly at around 35-40x coverage.