I've done my best to search for other threads addressing this question, but maybe I just don't know the correct terminology:
I am looking for a tool to identify nearly identical kmers (differing by a single base pair mismatch) within a sequence dataset. I can get the kmer counts from the raw sequence data, but the next step is eluding me. I could write a script that generates every possible single base pair change of each kmer and checks whether it is in the set, but I expect it would take an extremely long time to run. Surely somebody must have developed an optimized tool or library for this task?
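For reference, the "generate every single-base change and check the set" script is a hash lookup per variant, so it costs 3k lookups per distinct kmer (k positions times 3 alternative bases). That is linear in the number of distinct kmers, although pure Python may still be slow on very large sets. The sketch below assumes the counts are already loaded into a dict mapping kmer string to count; the function names `canonical` and `one_mismatch_pairs` and the `use_canonical` flag are hypothetical. If your counter stores canonical kmers (a kmer and its reverse complement merged under one key), each variant has to be canonicalized before the lookup. The flag assumes every key already uses the lexicographically smaller orientation.

```python
from typing import Dict, Set, Tuple

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def one_mismatch_pairs(counts: Dict[str, int], use_canonical: bool = False) -> Set[Tuple[str, str]]:
    """Return unordered pairs (a, b), a < b, of kmers in `counts` that differ
    by one substitution. Hypothetical helper: with use_canonical=True, every
    key in `counts` is assumed to be canonical already."""
    pairs: Set[Tuple[str, str]] = set()
    for kmer in counts:
        for i, original in enumerate(kmer):
            for base in BASES:
                if base == original:
                    continue
                # Build the single-substitution variant at position i.
                variant = kmer[:i] + base + kmer[i + 1:]
                if use_canonical:
                    variant = canonical(variant)
                # The kmer < variant test reports each pair once (it is found
                # from both sides) and skips a variant that maps back to kmer.
                if variant in counts and kmer < variant:
                    pairs.add((kmer, variant))
    return pairs


counts = {"ACGTA": 10, "ACGTC": 9, "TTTTT": 5, "ACGAA": 3}
print(sorted(one_mismatch_pairs(counts)))
# [('ACGAA', 'ACGTA'), ('ACGTA', 'ACGTC')]
```

The returned pairs can then be joined back to `counts` for whatever count-based filtering you apply downstream.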
I'm dealing with data from heterozygous individuals and would like to spot sequences that contain single base pair polymorphisms. Downstream I can do lots of filtering to eliminate sequencing errors based on counts; I just need to know which kmers to compare in the first place.
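Finding which kmers to compare can also be done without generating variants, using a masking trick. For each position i, group kmers by their sequence with position i deleted. Two distinct kmers fall in the same group exactly when they differ only at i. This replaces the 3k lookups per kmer with k grouping passes, and each pass could equally be done by sorting on the masked key, which may help if the kmer set does not fit in memory. The sketch is a minimal illustration under stated assumptions, not a specific tool's method. The function name `one_mismatch_pairs_by_masking` is hypothetical, all kmers are assumed to have the same length, and reverse complements are not handled, so kmers stored in canonical form could miss pairs whose canonical forms are in opposite orientations.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def one_mismatch_pairs_by_masking(counts: Dict[str, int]) -> Set[Tuple[str, str, int]]:
    """Return (a, b, i) triples, a < b, for kmers that differ only at position i.

    Hypothetical helper; assumes equal-length kmers in a single orientation.
    """
    if not counts:
        return set()
    k = len(next(iter(counts)))
    pairs: Set[Tuple[str, str, int]] = set()
    for i in range(k):
        # Group kmers by their sequence with position i removed.
        buckets: Dict[str, List[str]] = defaultdict(list)
        for kmer in counts:
            buckets[kmer[:i] + kmer[i + 1:]].append(kmer)
        for members in buckets.values():
            # With an ACGT alphabet a bucket holds at most 4 kmers, one per base at i.
            for a, b in combinations(sorted(members), 2):
                pairs.add((a, b, i))
    return pairs


counts = {"ACGTA": 10, "ACGTC": 9, "TTTTT": 5, "ACGAA": 3}
print(sorted(one_mismatch_pairs_by_masking(counts)))
# [('ACGAA', 'ACGTA', 3), ('ACGTA', 'ACGTC', 4)]
```

Because each result also records the differing position, it can be used directly to line up the two alleles when filtering candidate polymorphisms by their counts.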