I want to find all of the short, nearly identical peptide matches between two genomes. These matches are ~10 bp long, with 1 or 2 mismatches allowed but no gaps.
When I run the BLAST search, I find many such matches. However, I also get longer matches that fall below my identity cutoff (e.g., 30 bp matches with only 70% identity).
From what I know about BLAST, it seems possible that 10 bp perfect matches are buried inside these 30 bp, 70%-identity matches. Is that correct? If so, can anyone recommend a way to solve this problem or point me in the right direction?
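It is at least arithmetically possible: a 30-column alignment at 70% identity has 9 mismatches, and if they cluster toward the ends, the middle can contain a perfect 10-residue run. One way to check is to slide a 10-column window along the aligned query and subject strings of each hit. This is a minimal sketch that assumes you have exported those aligned strings (BLAST+ tabular output can include them via the `qseq` and `sseq` fields, e.g. `-outfmt "6 qseq sseq"`); the function name `ungapped_windows` is hypothetical.

```python
def ungapped_windows(qseq: str, sseq: str, k: int = 10, max_mm: int = 2):
    """Return (start, mismatches) for every length-k window of a pairwise
    alignment that contains no gap column and at most max_mm mismatches.
    `start` is an offset in alignment columns, not a sequence coordinate."""
    if len(qseq) != len(sseq):
        raise ValueError("aligned strings must have equal length")
    hits = []
    for start in range(len(qseq) - k + 1):
        q = qseq[start:start + k]
        s = sseq[start:start + k]
        if "-" in q or "-" in s:
            continue  # a gap in either row violates the no-gaps requirement
        mm = sum(a != b for a, b in zip(q, s))  # Hamming distance
        if mm <= max_mm:
            hits.append((start, mm))
    return hits
```

Because offsets are alignment columns, you would still need to add the hit's start coordinate and account for any gaps before the window to map a hit back onto the original sequences.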
I'm willing to code this up in Python if that is the best way. (I guess I would split my database into words of length 3, then look up every word in my query and calculate the edit distances between all such strings... doesn't sound very fun...)
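A variation on the word-splitting idea that avoids comparing every pair of strings is pigeonhole seeding, a swapped-in technique rather than anything BLAST does internally. It works as follows: cut each 10-residue window into three disjoint blocks (lengths 4, 3, 3). With at most 2 mismatches, at least one block must match exactly, so you only need to verify windows that share an exact block at the same position. This is a minimal sketch; the names `build_index` and `find_matches` are hypothetical, and it stores three index entries per subject position, so a whole-genome run would probably need a more compact index.

```python
from collections import defaultdict

K = 10        # window length (~10 residues)
MAX_MM = 2    # mismatches allowed, no gaps
# Three disjoint (offset, length) blocks covering the window. With at most
# 2 mismatches spread over 3 blocks, at least one block is an exact match.
BLOCKS = [(0, 4), (4, 3), (7, 3)]

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_index(subject: str):
    """Map (block number, block string) -> window start positions in subject."""
    index = defaultdict(list)
    for p in range(len(subject) - K + 1):
        for b, (off, length) in enumerate(BLOCKS):
            index[(b, subject[p + off:p + off + length])].append(p)
    return index

def find_matches(query: str, subject: str):
    """Return sorted (query_start, subject_start) pairs of length-K windows
    with at most MAX_MM mismatches and no gaps."""
    index = build_index(subject)
    hits = set()
    for q in range(len(query) - K + 1):
        qwin = query[q:q + K]
        for b, (off, length) in enumerate(BLOCKS):
            # Candidates share this block exactly; verify the whole window.
            for p in index.get((b, qwin[off:off + length]), ()):
                if hamming(qwin, subject[p:p + K]) <= MAX_MM:
                    hits.add((q, p))
    return sorted(hits)
```

For example, `find_matches("ACDXFGHIKX", "ACDEFGHIKLMNPQ")` reports the single window pair `(0, 0)`, which has exactly 2 mismatches.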