Hi all,
I am familiar with the BWA + SAMTOOLS pipeline and how it can be used to call SNPs by comparing reads against a reference genome. In this case, ploidy, coverage and sequence quality information can be used to guide the SNP calling, and various fancy algorithms exist to do so.
But what about an (apparently) simpler situation, where you are not dealing with reads but with plain CDS sequences, e.g., orthologs or paralogs? We know that these sequences are highly similar, and we want to pinpoint the SNPs, indels, etc. that best distinguish them.
My first thought was to feed the output of a multiple alignment program (such as CLUSTAL or MUSCLE) into SAMTOOLS to generate a pileup and a VCF file. However, none of these tools produces SAM/BAM output, nor could I find a converter. This suggests that hooking CLUSTAL up to SAMTOOLS may not be the best approach.
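To make the goal concrete, here is a minimal sketch of a fallback that skips SAM/BAM entirely and scans the alignment column by column. It assumes the aligner was asked to write aligned FASTA with "-" as the gap character; the function names (parse_aligned_fasta, variable_sites) and the toy sequences are made up for illustration and are not part of any of the tools above. Positions are alignment columns, not coordinates in any one sequence, and ambiguity codes such as N are treated as ordinary characters.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into a dict of name -> sequence (input order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_sites(alignment, gap="-"):
    """Return (column, kind, states) for every alignment column that varies.

    column is 1-based; kind is "SNP", "indel" or "SNP+indel";
    states maps each sequence name to its character in that column.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; is this an alignment?")

    sites = []
    for i in range(length):
        column = [s[i] for s in seqs]
        states = set(column)
        if len(states) < 2:
            continue  # invariant column
        bases = states - {gap}
        if gap not in states:
            kind = "SNP"
        elif len(bases) <= 1:
            kind = "indel"
        else:
            kind = "SNP+indel"
        sites.append((i + 1, kind, dict(zip(alignment, column))))
    return sites


# Toy alignment, invented for this example.
example = """\
>geneA
ATGGCT-AA
>geneB
ATGACTCAA
>geneC
ATGACT-AA
"""

aln = parse_aligned_fasta(example)
for pos, kind, states in variable_sites(aln):
    print(pos, kind, states)
```

This only lists the columns that differ; it does not rank them by significance or write a VCF, which is the part I am still unsure how to do properly.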
Does anybody have experience with this situation?
Best,
Aurélien