Does anyone know of a program that efficiently clusters very short sequences, e.g. 3-4 amino acids long?
I am trying to cluster amino acid sequences by similarity, with a similarity threshold of 0.7. My sequences are 3-25 aa long. CD-HIT throws out sequences shorter than 6 aa, and usearch is not accurate for clustering short sequences either. For example, 'EFGV' and 'EFGH' are by definition 75% similar (3 of 4 positions match), but with a 0.7 similarity threshold usearch puts them into different clusters. I haven't been able to find another program that does what I want: cluster short sequences by similarity.
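To make the definition concrete, here is a minimal Python sketch of the behaviour I am after. It is only an illustration under my own assumptions: identity is taken as the number of matched residues in a gap-tolerant alignment (longest common subsequence) divided by the length of the longer sequence, and clustering is a simple greedy pass against cluster representatives. The function names are made up for this example, and this is not how CD-HIT or usearch compute identity.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed definition: matched residues / length of the longer sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.7):
    """Greedy clustering: longest sequences first, each sequence joins the
    first representative it matches at >= threshold, else starts a new cluster."""
    clusters = {}  # representative -> list of members (insertion-ordered)
    for s in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(s, rep) >= threshold:
                clusters[rep].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters


if __name__ == "__main__":
    print(identity("EFGV", "EFGH"))
    print(greedy_cluster(["EFGV", "EFGH", "AAA"], threshold=0.7))
```

With this definition 'EFGV' and 'EFGH' score 3/4 and end up in the same cluster at a 0.7 threshold, which is what I would expect a tool to do. The sketch compares every sequence against every representative, so it is only meant for small sets.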
Any thoughts?