Dear all,
I have been variant calling from a reference that contains IUPAC ambiguous bases (such as K, W etc...), using samtools/bcftools.
In the mpileup file, these special base characters are maintained in the reference column, while the read bases are one of the classical 4 bases, ACGT.
However, in the vcf file, two things happen:
1. at SNP positions, that coincide with the ambiguous base positions in the reference, the vcf file says "N" at the reference and "ACGT" (or a comma-separated combination of those) at the alternative field. This seems to agree with the vcf4.1 format specifications saying that the reference field may be only ACGTN.
2. at INDEL positions, however, that include an ambiguous base position, these ambiguous bases are displayed in the reference field (and also inside the sequence of the alternative field, if the indel includes that position), such as:
How is this possible? Is the original reference.fasta read for the INDEL positions? Does the vcf4.1 restriction to ref=ACGTN not apply to INDEL positions?
Any of your comments will be very much appreciated.
cheers,
Sophia
I have been variant calling from a reference that contains IUPAC ambiguous bases (such as K, W etc...), using samtools/bcftools.
In the mpileup file, these special base characters are maintained in the reference column, while the read bases are one of the classical 4 bases, ACGT.
However, in the vcf file, two things happen:
1. at SNP positions, that coincide with the ambiguous base positions in the reference, the vcf file says "N" at the reference and "ACGT" (or a comma-separated combination of those) at the alternative field. This seems to agree with the vcf4.1 format specifications saying that the reference field may be only ACGTN.
2. at INDEL positions, however, that include an ambiguous base position, these ambiguous bases are displayed in the reference field (and also inside the sequence of the alternative field, if the indel includes that position), such as:
Code:
Lg10 29366679 . K KCG,KG 49.5 PASS INDEL;DP=19;VDB=0.0318;AF1=1;AC1=2;DP4=0,0,6,9;MQ=20;FQ=-58.5;MPB=U; GT:PL:DP:SP:GQ 1/1:154,88,64,104,0,83:15:0:45 Lg10 29832925 . TTAWAKWTATA TTA 98.5 mrd15 INDEL;DP=17;VDB=0.0404;AF1=1;AC1=2;DP4=0,0,4,6;MQ=20;FQ=-64.5;MPB=U GT:PL:DP:SP:GQ 1/1:139,30,0:10:0:57
Any of your comments will be very much appreciated.
cheers,
Sophia