05-05-2014, 01:24 PM   #4
Brian Bushnell

Originally Posted by cjfields
Worth mentioning that the partitioning that BBNorm appears to use is coverage-based. khmer uses a (simplified) de Bruijn graph-based partitioning (separating into disconnected partitions of the graph). That's a very important distinction between the two.
Thanks for pointing that out; I was unaware that khmer's partitioning was not coverage-based. Yes, BBNorm's partitioning is purely coverage-based, so it is only useful when the sample contains multiple organisms (or organelles, plasmids, etc.) with substantially different coverage, though that's typically the case in metagenomes.
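
To make the idea of coverage-based partitioning concrete, here is a toy Python sketch. It is not BBNorm's code or algorithm, just an illustration under simple assumptions: each read's depth is estimated as the median count of its k-mers, and reads are split at a fixed cutoff. The function names, k=4, the cutoff, and the example reads are all made up for illustration, and reverse complements are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def partition_by_coverage(reads, k=4, cutoff=3):
    """Split reads into (low, high) bins by estimated depth.

    Depth of a read = median count of its k-mers across all reads.
    Assumes every read is at least k bases long.
    """
    counts = Counter(km for r in reads for km in kmers(r, k))
    low, high = [], []
    for r in reads:
        depth = median(counts[km] for km in kmers(r, k))
        (high if depth >= cutoff else low).append(r)
    return low, high


# Hypothetical sample: one abundant sequence (5 copies) and one rare one.
reads = ["ACGTACGGTC"] * 5 + ["TTGCAATCGA"]
low, high = partition_by_coverage(reads)
print(len(low), "low-coverage read(s),", len(high), "high-coverage read(s)")
```

If the two sequences were sampled at similar depth, both would land in the same bin, which is why this kind of partitioning only helps when coverage differs strongly.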

That said, I've found partitioning by connectivity (overlap, de Bruijn graph, etc.) in metagenomes to be problematic with short (~100 bp) reads; the situation can easily devolve into a single cluster, because a single highly conserved element, such as a 16S subsequence, is enough to connect everything. With longer (~250 bp) reads it seems to work better.
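
The collapse into a single cluster can be shown with a toy Python sketch. It is not khmer's implementation; it just assumes that two reads are connected whenever they share an exact k-mer and uses union-find to count the connected partitions. The sequences, the "conserved" element, and k=4 are invented for illustration, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_partitions(reads, k=4):
    """Count groups of reads connected through shared k-mers (union-find)."""
    parent = list(range(len(reads)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            if km in first_seen:
                parent[find(idx)] = find(first_seen[km])
            else:
                first_seen[km] = idx
    return len({find(i) for i in range(len(reads))})


# Two hypothetical organisms whose reads share no k-mers with each other.
genome_a = ["AAAACCCC", "AACCCCGG"]
genome_b = ["TTTTGGGG", "TTGGGGAA"]

# One extra read per organism, each running into the same conserved element.
conserved = "GATTACA"
bridge_a = "CCCG" + conserved
bridge_b = "GGAA" + conserved

print(count_partitions(genome_a + genome_b))                         # 2
print(count_partitions(genome_a + genome_b + [bridge_a, bridge_b]))  # 1
```

A single shared element is enough to merge otherwise unrelated partitions, which matches the behavior described above for short reads.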