Hi all,
I have been using both GATK and Samtools for variant calling in individual samples. Both tools use a Bayesian approach to call genotypes, yet they produce slightly different variants. Does the difference between the two algorithms come from the prior probabilities, or from the likelihood calculation they use to compute the posterior probability? There is a slight difference in how the two likelihood models handle errors; is there also a difference in the priors? If the tools differed only in these respects, I would expect only the genotype called at a given position to vary between them, but I have observed a difference in the total number of SNPs called by the two methods. Could you explain where this difference comes from?
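To make the question concrete, here is a toy single-sample genotype caller I sketched. It is NOT the actual GATK or Samtools model: the per-base error model, the simple heterozygosity prior `theta`, the example reads and all function names are my own assumptions. It only shows how a prior and a likelihood combine into a posterior, and that with identical likelihoods a change in the prior can move the best genotype from 0/0 to 0/1, i.e. move the site in or out of the SNP list.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")  # ref/ref, ref/alt, alt/alt

def p_base(base, allele, err):
    # Toy error model (assumption): the observed base matches the true allele
    # with probability 1 - err, otherwise it is one of the other three bases.
    return 1 - err if base == allele else err / 3

def genotype_log10_likelihoods(bases, quals, ref, alt):
    """log10 P(reads | G) for each diploid genotype, treating reads as independent."""
    result = []
    for a1, a2 in ((ref, ref), (ref, alt), (alt, alt)):
        total = 0.0
        for b, q in zip(bases, quals):
            err = 10 ** (-q / 10)  # Phred base quality -> error probability
            # each read comes from either chromosome with probability 1/2
            total += math.log10(0.5 * p_base(b, a1, err) + 0.5 * p_base(b, a2, err))
        result.append(total)
    return result

def priors_from_heterozygosity(theta):
    # Simple biallelic prior (assumption): P(het) = theta, P(hom-alt) = theta / 2
    return [1 - 1.5 * theta, theta, 0.5 * theta]

def genotype_posteriors(log10_liks, priors):
    # Bayes' rule: P(G | reads) is proportional to P(reads | G) * P(G)
    log_post = [ll + math.log10(p) for ll, p in zip(log10_liks, priors)]
    top = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [10 ** (lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical site: 6 reference (A) and 2 alternate (C) reads, all at Q20.
liks = genotype_log10_likelihoods("AAAAAACC", [20] * 8, "A", "C")
for theta in (0.001, 0.01):
    post = genotype_posteriors(liks, priors_from_heterozygosity(theta))
    best = GENOTYPES[max(range(3), key=lambda i: post[i])]
    print(theta, best)  # 0.001 -> 0/0 (no SNP), 0.01 -> 0/1 (SNP)
```

The two `theta` values are arbitrary illustrations, not the defaults of either tool.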
In the article "Genotype and SNP calling from next-generation sequencing data", variant calling is described as a two-step process: SNP calling, followed by genotype calling at the sites identified in the SNP-calling step. Since genotype calling uses a similar Bayesian approach in both GATK and Samtools, is it the SNP-calling methods of these two tools that produce different numbers of SNP calls?
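As I understand the two-step description, it could be sketched like this. This is a toy version: the genotype posteriors, the site-level threshold `min_qual` and the function name `call_site` are hypothetical, and I am not claiming this is how GATK or Samtools actually filter sites. It shows how the site-level (SNP-calling) step alone can change the number of reported SNPs even when the genotype-calling step is identical.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")

def call_site(posteriors, min_qual):
    """Two-step call at one site from genotype posteriors [P(0/0), P(0/1), P(1/1)].

    Returns (genotype, qual) if the site passes the SNP-calling step, else None.
    """
    p_hom_ref = posteriors[0]
    # Step 1, SNP calling: Phred-scaled probability that the site is NOT variant.
    qual = -10 * math.log10(p_hom_ref) if p_hom_ref > 0 else math.inf
    if qual < min_qual:
        return None  # site not reported as a SNP
    # Step 2, genotype calling: most probable genotype at the called site.
    best = max(range(3), key=lambda i: posteriors[i])
    return GENOTYPES[best], qual

# Hypothetical posteriors for three sites (not real data).
sites = {
    "site1": [0.0001, 0.9998, 0.0001],
    "site2": [0.005, 0.99, 0.005],
    "site3": [0.05, 0.94, 0.01],
}
for min_qual in (20, 30):
    calls = {name: call_site(p, min_qual) for name, p in sites.items()}
    snps = sorted(name for name, c in calls.items() if c is not None)
    print(min_qual, len(snps), snps)  # 20 -> 2 sites, 30 -> 1 site
```

In this toy setup every called site gets the same genotype (0/1); only the site-level cutoff changes how many SNPs are reported.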
If the difference comes from the SNP-calling methods used in GATK and Samtools, could you please suggest a source or give a short summary of those methods?
I have read the following papers to find out how the algorithms differ. Unfortunately, I could only find the genotype-calling methods used by both tools, not what leads to the different numbers of SNP calls between the two.
1. A framework for variation discovery and genotyping using next-generation DNA sequencing data (GATK)
2. Mapping short DNA sequencing reads and calling variants using mapping quality scores (Samtools)
3. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Any suggestions would be appreciated!
Looking forward to your responses.