I recently discovered that you can compare sequences by their matching
subsequences, say of length 12 for nucleotides.
This has the advantage that you needn't align the sequences,
which may take quite some time (using MAFFT) for large sets
and can't be done from batch.
You can also store many subsequences in an array and quickly scan
whole databases for matches.
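
To make the idea concrete, here is a minimal Python sketch of what I mean. The function names (kmers, shared_fraction, build_index, scan) are just illustrative, and scoring by the Jaccard index of the two subsequence sets is my own assumption, not an established tool. It also ignores reverse complements and ambiguous bases such as N.

```python
from collections import defaultdict

K = 12  # subsequence length mentioned above; an adjustable assumption


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a, b, k=K):
    """Jaccard index of the two k-mer sets: shared / all distinct k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def build_index(database, k=K):
    """Map each k-mer to the names of the database sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def scan(query, index, k=K):
    """Count, per database sequence, how many distinct query k-mers it contains."""
    hits = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return dict(hits)
```

With an index built once from the database, each query only needs one dictionary lookup per subsequence, so no alignment step is involved. A single substitution changes up to k of the overlapping subsequences, so the shared fraction drops faster than percent identity would.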
Now I wonder how good/reliable that method is as a measure of
genetic similarity? Is it generally used in place of alignment?
I'm mainly doing it for influenza.