Old 06-19-2014, 10:48 AM   #11
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707


It might be better to normalize using a kmer length of 41, but BBNorm only supports a maximum of 31. In practice, it should make very little difference, though. Using long kmers is important for assembly, as it helps span short repeats that would otherwise cause contigs to terminate. But normalization is much less sensitive to that issue, and very long kmers can cause problems in the presence of errors. With k=31, a 100bp read with 1 error could yield 31 kmers with a depth of 1, out of a total of 70 kmers - in that case, the median depth would not be impacted. With k=63, the same read has only 38 kmers, and an error near the middle would be spanned by all 38 of them, so every kmer would have a depth of 1 and the median depth of the read would look like 1 instead of its correct value. And BBNorm normalizes based on the median kmer depth of a read.
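
To make the arithmetic concrete, here is a small Python sketch of the effect. It is not BBNorm code. The function name `kmer_depths`, the error position of 50 and the true depth of 40 are made up for illustration, and it assumes every kmer that contains the error is unique (depth 1). A read of length L has L - k + 1 kmers, so a 100bp read has 70 at k=31 and 38 at k=63.

```python
from statistics import median

def kmer_depths(read_len, k, error_pos, true_depth):
    """Depth of each kmer in a read with one error at error_pos (0-based).

    Illustrative assumption: a kmer containing the error is seen once
    (depth 1); every other kmer is seen true_depth times.
    """
    n_kmers = read_len - k + 1
    depths = []
    for start in range(n_kmers):
        # A kmer starting at `start` covers positions start .. start + k - 1.
        spans_error = start <= error_pos < start + k
        depths.append(1 if spans_error else true_depth)
    return depths

if __name__ == "__main__":
    for k in (31, 63):
        depths = kmer_depths(read_len=100, k=k, error_pos=50, true_depth=40)
        print(f"k={k}: {len(depths)} kmers, "
              f"{depths.count(1)} span the error, "
              f"median depth {median(depths)}")
```

With these made-up numbers, k=31 gives 31 error kmers out of 70 and the median stays at 40, while k=63 gives 38 error kmers out of 38 and the median drops to 1.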

It's a lot more computationally efficient to use a max kmer length of 31, so that's how I designed it. I've tried shorter kmers down to about k=25 and not noticed an appreciable difference in normalization or error correction.

As for your prior (deleted) post, sorry for not responding - I think the problem was that you were running Java 6 instead of Java 7. Most of the programs in BBTools work fine in Java 6, but it looks like BBNorm requires Java 7 (or higher).