View Single Post
Old 09-05-2015, 10:28 PM   #21
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I'll read the article in a few days, and comment on it then. As Titus stated, you cannot do binning by depth after normalization - it destroys that information. Furthermore, MDA'd single cells cannot be individually binned for contaminants based on depth, as the depth is exponentially random across the genome.

I use BBNorm (with the settings target=100 min=2) to preprocess amplified single cells prior to assembly with Spades, as it vastly reduces the total runtime and memory use, meaning that the jobs are much less likely to crash or be killed. If you want to reduce contamination, though, I have a different tool called CrossBlock, which is designed to eliminate cross-contamination between multiplexed single-cell libraries. You need to first assemble all the libraries, then run CrossBlock with all of the libraries and their reads (raw, not normalized!) as input; it essentially removes contigs from assemblies that have greater coverage from another library than their own library. Incidentally, CrossBlock does in fact use BBNorm.

The latest version of Spades does not really have too much trouble with high-abundance kmers, unless they get extremely high or you have a limited amount of memory. So, you don't HAVE to normalize before running Spades, but it tends to give a comparable assemble with a small fraction of the resources - typically with slightly better continuity and slightly lower misassembly rates, with slightly lower genome recovery, but a slightly higher rate of long genes being called (according to Quast).

On the other hand, if you want to assemble MDA-amplified single-cell data with an assembler designed for isolate data, normalization is pretty much essential for a decent assembly.

Last edited by Brian Bushnell; 09-05-2015 at 10:32 PM.
Brian Bushnell is offline   Reply With Quote