05-12-2017, 09:13 PM   #20
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Sorry, but the assertion is there because the data structures fundamentally can't handle kmers longer than 31: each kmer is stored in a 64-bit integer. It would be possible to modify the code to allow K>31, since I now have some classes that support unlimited-length kmers, but that would take quite a lot of work. You can always disable assertions by adding the flag -da when you run the program, e.g. "bbcountunique.sh -da <other arguments>", but it won't give correct results for K>31.
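To illustrate why 64 bits caps K, here is a minimal sketch of 2-bit-per-base packing. This is not the BBTools code: the encoding table and the function names (pack_kmer_64, bits_needed) are hypothetical, and it assumes the common A/C/G/T = 0/1/2/3 scheme. A 31-mer needs 62 bits; past 32 bases the leading bases are shifted out of a 64-bit word, so different kmers become indistinguishable, which is one way results could go wrong when assertions are disabled.

```python
# Illustrative sketch only; names and encoding are assumptions, not BBTools internals.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
MASK64 = (1 << 64) - 1  # emulate a fixed-width 64-bit word


def bits_needed(k):
    """Number of bits required to store a kmer of length k at 2 bits per base."""
    return 2 * k


def pack_kmer_64(seq):
    """Pack a DNA string into a 64-bit word, 2 bits per base.

    Bases that do not fit are silently shifted out of the top of the word,
    mimicking what fixed-width integer arithmetic would do.
    """
    value = 0
    for base in seq:
        value = ((value << 2) | CODES[base]) & MASK64
    return value


if __name__ == "__main__":
    print(bits_needed(31))  # 62 bits, fits in a 64-bit word

    # Two 31-mers differing only in the first base stay distinct.
    a31 = "A" + "C" * 30
    t31 = "T" + "C" * 30
    print(pack_kmer_64(a31) != pack_kmer_64(t31))

    # Two 33-mers differing only in the first base collapse to the same value,
    # because the first base no longer fits in 64 bits.
    a33 = "A" + "C" * 32
    t33 = "T" + "C" * 32
    print(pack_kmer_64(a33) == pack_kmer_64(t33))
```

Both comparisons print True: the 31-mers are kept apart, while the 33-mers collide.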

Generally, I don't see a useful purpose for K>31 with that program. The longer K is, the more likely a kmer is to contain a sequencing error, and those errors inflate the apparent uniqueness; K=31 should be sufficient for determining whether you have seen a read before. You won't saturate the K=31 kmer space for read uniqueness plots with Illumina HiSeq machines; that would take 600 million terabases at 2x150bp.
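For a sense of scale, here is a rough back-of-envelope sketch of the size of the K=31 kmer space. The post does not say how the 600 million terabase figure was derived; the calculation below assumes, purely for illustration, that one kmer is tracked per 150 bp read and that every possible 31-mer must be seen once. Under that assumption the result lands in the same order of magnitude as the figure quoted above.

```python
# Back-of-envelope sketch; KMERS_PER_READ is an assumption, not taken from the post.
K = 31
READ_LENGTH = 150      # bp per read, from "2x150bp"
KMERS_PER_READ = 1     # ASSUMPTION: one tracked kmer per read


def kmer_space(k):
    """Number of distinct DNA kmers of length k (4 choices per base)."""
    return 4 ** k


def terabases_to_cover(k, read_length, kmers_per_read):
    """Bases needed to observe every kmer once, expressed in terabases (1e12 bases)."""
    reads = kmer_space(k) / kmers_per_read
    return reads * read_length / 1e12


if __name__ == "__main__":
    print(kmer_space(K) == 2 ** 62)  # True: 31 bases at 2 bits each
    print(f"{terabases_to_cover(K, READ_LENGTH, KMERS_PER_READ):.3g} terabases")
```

With these assumptions the estimate comes out at several hundred million terabases, far beyond anything a sequencing run produces.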

Last edited by Brian Bushnell; 05-12-2017 at 09:45 PM.