I have 60 bp pyrosequencing reads targeting the 16S rRNA gene, which I'm using to characterize bacterial communities.
My first step is to align the pyrosequencing reads pairwise against each other and then cluster them at a 97% sequence-similarity threshold. Then, for a particular cluster, I perform a multiple alignment.
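For illustration, the clustering step might look something like the sketch below. It uses greedy centroid-based clustering at a 97% identity threshold. This is a swapped-in technique, not necessarily the clustering method used above. To keep it self-contained it assumes equal-length reads and uses a hypothetical `identity` helper based on position-by-position matches instead of a real pairwise alignment. With 60 bp reads, 97% identity tolerates at most one mismatch: 59/60 is about 98.3%, while 58/60 is about 96.7%.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length reads.
    Hypothetical stand-in for the identity reported by a real pairwise aligner."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length reads")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(reads: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy clustering: each read joins the first centroid it matches at
    >= threshold identity; otherwise it starts a new cluster as its centroid.
    Input order matters; many greedy tools sort reads first (e.g. by length
    or abundance) so that the centroids are representative."""
    clusters: List[List[str]] = []
    centroids: List[str] = []  # centroid sequence of each cluster, same order as clusters
    for name, seq in reads.items():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(name)
                break
        else:  # no centroid was close enough
            centroids.append(seq)
            clusters.append([name])
    return clusters
```

Comparing each read only against the centroids, rather than against every member of a cluster, is what keeps greedy clustering cheap. The trade-off is that two reads in the same cluster can be less than 97% identical to each other.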
My question is: for a sequence X in a given multiple alignment, is there a tool that will find the sequence most 'similar' to X, i.e. the one at the smallest distance from X in the tree constructed by the multiple aligner?
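If "most similar" may also be measured on the alignment itself rather than on the tree, one simple option is the uncorrected p-distance: the proportion of compared columns at which two aligned sequences differ. The sketch below is an alternative technique, not the tree-based measure asked about. It assumes the aligned sequences are already loaded into a dict (for example from MUSCLE's FASTA output) and that gaps are written as `-`. Columns where either sequence has a gap are skipped. The function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else math.inf  # no overlap: treat as infinitely far


def nearest_in_alignment(aln: Dict[str, str], query: str) -> Tuple[str, float]:
    """Return (name, p-distance) of the sequence closest to `query`.
    Ties go to the first sequence in dict order; needs at least two sequences."""
    q = aln[query]
    best = min((n for n in aln if n != query), key=lambda n: p_distance(q, aln[n]))
    return best, p_distance(q, aln[best])
```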
I've written some scripts in Perl, but I'm concerned about their robustness. I use MUSCLE for the multiple alignment and use the tree it constructs to obtain the 'similarity' measures between sequences.
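As a rough sketch of the tree-based approach, the code below finds the leaf with the smallest patristic distance to X, i.e. the sum of branch lengths along the path connecting them. It assumes the guide tree has been saved as a Newick string. MUSCLE 3.x can write its guide trees with options such as `-tree2`; check the documentation for your version. The parser is deliberately minimal: it handles nesting, labels and branch lengths, but not quoted labels, comments, or labels containing whitespace. `parse_newick`, `Node` and `nearest_in_tree` are hypothetical names. Keep in mind that MUSCLE's tree is a guide tree built to order the alignment, not a carefully inferred phylogeny.

```python
import math
from typing import List, Optional, Tuple


class Node:
    """One node of a rooted tree; `length` is the branch length to its parent."""

    def __init__(self) -> None:
        self.name = ""
        self.length = 0.0
        self.children: List["Node"] = []
        self.parent: Optional["Node"] = None


def parse_newick(text: str) -> Node:
    """Minimal Newick parser (nesting, labels, branch lengths only)."""
    s = "".join(text.split())  # drop whitespace; MUSCLE spreads the tree over many lines
    if not s.endswith(";"):
        s += ";"  # the terminating ';' also bounds the scans below
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":  # internal node: parse comma-separated children
            pos += 1
            while True:
                child = parse_node()
                child.parent = node
                node.children.append(child)
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
        node.name = read_until(",():;")
        if s[pos] == ":":  # optional branch length
            pos += 1
            node.length = float(read_until(",();"))
        return node

    return parse_node()


def nearest_in_tree(root: Node, query: str) -> Tuple[str, float]:
    """Return (leaf name, patristic distance) of the leaf closest to `query`."""
    nodes, stack = [], [root]
    while stack:  # collect every node in the tree
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    start = next((n for n in nodes if not n.children and n.name == query), None)
    if start is None:
        raise KeyError(query)

    # Walk the tree as an undirected graph outwards from the query leaf;
    # in a tree each node is reached by exactly one path, so one visit suffices.
    dist = {id(start): 0.0}
    stack = [start]
    best: Tuple[str, float] = ("", math.inf)
    while stack:
        node = stack.pop()
        d = dist[id(node)]
        if not node.children and node is not start and d < best[1]:
            best = (node.name, d)
        neighbours = [(c, c.length) for c in node.children]
        if node.parent is not None:
            neighbours.append((node.parent, node.length))
        for nb, w in neighbours:
            if id(nb) not in dist:
                dist[id(nb)] = d + w
                stack.append(nb)
    return best
```

For the tree `((X:0.1,A:0.2):0.05,(B:0.05,C:0.3):0.4);`, `nearest_in_tree(parse_newick(tree), "X")` returns `A` at a distance of about 0.3. If a library dependency is acceptable, Biopython's `Bio.Phylo` module (`Phylo.read` together with the tree's `distance` method) covers the same ground with a more complete Newick parser.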
Thanks