Is it possible to take a file of aligned fasta reads (say 5kb of fasta format sequence or, even better, multifasta format, including one "reference" sequence) and produce a VCF file showing the variants? I'd like to treat each base in the fasta sequences as known (i.e. perfect sequence quality).
I can easily look at a fasta alignment, say in clustal. It is relatively easy to generate a VCF listing all SNPs. But it gets more complicated when you add in indels and complex variants, and when there are lots of samples it can get messy quickly.
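To illustrate the "easy" SNP-only case, here is a minimal Python sketch. It assumes the first record in the aligned multifasta is the reference, that gaps are written as `-`, and that every sample is haploid with perfect base quality. The function names `read_aligned_fasta` and `snps_to_vcf` and the contig name `ref` are made up for this example. Indels are deliberately not handled: columns where the reference has a gap are skipped, and a gap in a sample is written as a missing genotype, which is exactly the part that gets messy.

```python
from io import StringIO


def read_aligned_fasta(handle):
    """Parse an aligned (multi)FASTA into an ordered list of (name, sequence)."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def snps_to_vcf(records, chrom="ref"):
    """Return VCF text listing SNPs only; the first record is the reference."""
    (_, ref_seq), samples = records[0], records[1:]
    if any(len(seq) != len(ref_seq) for _, seq in samples):
        raise ValueError("all sequences must have the same aligned length")

    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(header + [name for name, _ in samples]),
    ]

    pos = 0  # 1-based coordinate on the ungapped reference
    for col, ref_base in enumerate(ref_seq):
        if ref_base == "-":
            continue  # insertion relative to the reference: not handled here
        pos += 1
        alts, calls = [], []
        for _, seq in samples:
            base = seq[col]
            if base == ref_base:
                calls.append("0")
            elif base in "ACGT":
                if base not in alts:
                    alts.append(base)
                calls.append(str(alts.index(base) + 1))
            else:
                calls.append(".")  # gap or ambiguity code: reported as missing
        if alts:
            fields = [chrom, str(pos), ".", ref_base, ",".join(alts), ".", ".", ".", "GT"]
            lines.append("\t".join(fields + calls))
    return "\n".join(lines) + "\n"


# Toy alignment: s1 carries an insertion, s2 a SNP and a deletion, s3 a SNP.
example = """>reference
ACGT-ACGT
>s1
ACGTTACGT
>s2
ATGT-AC-T
>s3
AGGT-ACGT
"""
vcf = snps_to_vcf(read_aligned_fasta(StringIO(example)))
print(vcf)
```

On this toy input only the SNP column is reported; the insertion in `s1` and the deletion in `s2` are silently dropped, which is why a SNP-only approach like this is not enough for my purposes.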
One solution would be to use an aligner to map these 5kb sequences to the 5kb "reference" sequence, then use any of a number of tools to generate the VCF from the BAM. However, this would mean redoing the alignment step, and I already have an alignment! Moreover, it might be difficult to calibrate the SNP calling tools, which are not designed to work with perfect sequence.
A similar question is asked in the following thread, but no satisfactory solution for FASTA --> VCF is found there: http://seqanswers.com/forums/showthread.php?t=30461
Any ideas?