I am having trouble interpreting the genotype field in the 1000 Genomes phase 1 variant calls. Specifically, I don’t know how to interpret the ‘GL’ values in the genotype field. The VCF v4.1 documentation says that ‘GL’ in the FORMAT field denotes “log10-scaled” likelihoods for all the possible genotypes.
Here is an arbitrary VCF record showing only the first two genotypes:
Code:
22 16050408 rs149201999 T C 100 PASS <INFO> GT:DS:GL 0|0:0.050:-0.03,-1.17,-5.00 0|1:0.900:-0.71,-0.09,-5.00
If I understand this correctly, the first sample is homozygous for the reference allele, and the probability of this genotype is 10^(-0.03) = 0.93, so there is a 93% chance that this genotype is accurate.
This seems to be the only interpretation that makes sense, except that the VCF documentation uses radically different example values, like:
Code:
GT:GL 0/1:-323.03,-99.29,-802.53
which would make my interpretation unlikely, because 10 raised to the power of any of these values leaves you with a very small number.
Also, here is where I got this vcf file from:
ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/
Any help with this would be greatly appreciated.
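In case it helps, here is a minimal sketch of the arithmetic I am describing. It assumes the three GL values are log10 likelihoods in the order 0/0, 0/1, 1/1 for a biallelic site. The function names are my own, and rescaling the likelihoods so they sum to 1 (in effect a flat prior) is my assumption, not something I found in the VCF documentation.

```python
def gl_to_likelihoods(gl):
    """Raw likelihoods 10**GL, one per genotype."""
    return [10 ** g for g in gl]


def gl_to_relative(gl):
    """Likelihoods rescaled to sum to 1.

    Subtracting the largest GL first keeps the best genotype at 10**0 = 1,
    so very negative values do not all underflow to zero.
    """
    top = max(gl)
    scaled = [10 ** (g - top) for g in gl]
    total = sum(scaled)
    return [s / total for s in scaled]


# GL values of the first sample in the 1000 Genomes record above
sample1 = [-0.03, -1.17, -5.00]
# GL values from the VCF documentation example
spec_example = [-323.03, -99.29, -802.53]

print(gl_to_likelihoods(sample1))
print(gl_to_relative(sample1))
print(gl_to_relative(spec_example))
```

For the first sample the raw values already come out close to summing to 1, while the documentation example only becomes comparable after the rescaling step, which is part of what confuses me.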