I was wondering whether there is any way to change the assumptions used for genotype calling in the .vcf files produced by mpileup in samtools. I am working with a diploid organism, but the individuals are recombinant inbred lines (or otherwise highly inbred lines) that are mostly homozygous, with only about 1% residual heterozygosity. The problem is that when a SNP is called at low coverage (1-3 reads) and only one allele is observed in a sample, the caller often assumes the individual is heterozygous if the observed allele is the less common one.
The underlying issue is that the caller assumes every locus in my individuals is in Hardy-Weinberg (HW) equilibrium. Because of the experimental design, they are nowhere near HW equilibrium, and most loci will be homozygous. Filtering on genotype quality reduces the problem but also discards much of the data.
Of course, sequencing to a high depth would resolve this with the existing tools, but when I expect >99% of individuals to be homozygous at each locus, that should not be necessary: one or two reads carrying allele A should be enough to call an AA genotype.
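To illustrate why a single read of the rarer allele tends to come out as a heterozygote, here is a minimal sketch of a Bayesian genotype posterior. It is a simplified textbook model, not what samtools/bcftools actually compute internally. Assumptions (all hypothetical, chosen for illustration): reads are independent, there is a symmetric per-base error rate `err`, the alternate allele frequency is `q`, and the genotype prior uses an inbreeding coefficient `F`, where `F = 0` gives Hardy-Weinberg proportions and `F` near 1 removes most heterozygotes. The function names are made up for this example.

```python
def genotype_likelihoods(n_alt, n_ref, err=0.01):
    """P(reads | genotype) for RR, RA, AA under a simple independent-read model."""
    # Probability that one read shows the ALT allele, for each genotype
    p_alt = {"RR": err, "RA": 0.5, "AA": 1 - err}
    return {g: p ** n_alt * (1 - p) ** n_ref for g, p in p_alt.items()}


def genotype_prior(q, F=0.0):
    """Genotype frequencies given ALT allele frequency q and inbreeding coefficient F.

    F = 0 gives Hardy-Weinberg proportions; F = 1 gives no heterozygotes.
    """
    p = 1 - q
    return {
        "RR": p * p + F * p * q,
        "RA": 2 * p * q * (1 - F),
        "AA": q * q + F * p * q,
    }


def genotype_posterior(n_alt, n_ref, q, F=0.0, err=0.01):
    """Normalised P(genotype | reads) = likelihood * prior / evidence."""
    lik = genotype_likelihoods(n_alt, n_ref, err)
    prior = genotype_prior(q, F)
    joint = {g: lik[g] * prior[g] for g in lik}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}


if __name__ == "__main__":
    # One read showing the rarer ALT allele (q = 0.1)
    hwe = genotype_posterior(1, 0, q=0.1, F=0.0)
    inbred = genotype_posterior(1, 0, q=0.1, F=0.99)
    print("HWE prior:   ", max(hwe, key=hwe.get), hwe)
    print("Inbred prior:", max(inbred, key=inbred.get), inbred)
```

With one ALT read and `q = 0.1`, the HWE prior (0.81 / 0.18 / 0.01) makes RA the most probable genotype (posterior of about 0.83). With `F = 0.99`, an illustrative value not derived from any real data, the heterozygote prior collapses and AA wins (about 0.91). This is the behaviour described above: with only 1-3 reads the prior dominates, so a prior that reflects inbreeding changes the call.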