Hi,
I could really use some help with my variant calling pipeline. I have Illumina reads from bacterial outbreak strains, mapped to a reference using BWA, sorted, and filtered for quality SNPs using vcftools. My output is one .vcf file per genome.
My research question in this instance is quite specific: how many SNPs do my genomes differ by?
I know what I would like to do: create a pairwise SNP matrix for all genomes, showing the number of SNP differences between each pair. My ideal output would be a table like this:
          Sample 1  Sample 2  Sample 3
Sample 1  0         4         12
Sample 2  4         0         1
Sample 3  12        1         0
I have a feeling it requires a custom Python script or programming in R?
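To make the idea concrete, here is a rough Python sketch of the kind of script I imagine, not a tested solution. It assumes one uncompressed, tab-delimited VCF per sample, all called against the same reference, and it treats any site missing from a sample's VCF as matching the reference (which ignores low-coverage or filtered positions, so it may overcount or undercount real differences). The function names (read_snps, snp_distance, pairwise_matrix, format_matrix, matrix_from_files) are made up for illustration.

```python
from itertools import combinations
from pathlib import Path


def read_snps(lines):
    """Map (chrom, pos) -> ALT base for each simple SNP in an iterable of VCF lines."""
    snps = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        # Keep single-base substitutions only; indels and multi-allelic sites are skipped.
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            snps[(chrom, pos)] = alt
    return snps


def snp_distance(a, b):
    """Count sites where two samples carry different alleles.

    ASSUMPTION: a site absent from a sample's VCF is the reference allele.
    """
    sites = a.keys() | b.keys()
    return sum(1 for site in sites if a.get(site) != b.get(site))


def pairwise_matrix(snps_by_sample):
    """Return (sample names, nested dict of pairwise SNP distances)."""
    names = list(snps_by_sample)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = snp_distance(snps_by_sample[a], snps_by_sample[b])
        matrix[a][b] = matrix[b][a] = d  # the matrix is symmetric
    return names, matrix


def format_matrix(names, matrix):
    """Render the matrix as tab-separated text with a header row."""
    rows = ["\t".join([""] + names)]
    for a in names:
        rows.append("\t".join([a] + [str(matrix[a][b]) for b in names]))
    return "\n".join(rows)


def matrix_from_files(paths):
    """Build the formatted matrix from VCF paths, naming samples by file stem."""
    samples = {}
    for path in paths:
        with open(path) as handle:
            samples[Path(path).stem] = read_snps(handle)
    return format_matrix(*pairwise_matrix(samples))
```

Calling print(matrix_from_files([...])) with the list of per-sample .vcf paths would print the table. Comparing by position rather than by a plain set difference means two samples with different ALT bases at the same site count as one difference, not two.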
Any help, advice or comments would be very much appreciated.
Al'Thor