Here is an example of the output from my somatic variant calling pipeline:
Code:
Variant Annotation  Gene  Exon   Codon  Chr    Position  Variant  N(A)  N(C)  N(G)  N(T)  T(A)  T(C)  T(G)  T(T)  QSS  Frequency
nonsynonymous SNV   KRAS  exon2  G12A   chr12  25398284  G35C     0     83    0     0     1     37    12    0     57   0.24
N(A), N(C), N(G), and N(T) stand for the number of calls for A, C, G, and T in the normal sample. T(A), T(C), T(G), and T(T) stand for the number of calls for A, C, G, and T in the tumor sample.
In the normal sample, 83 reads call C at this position and none call anything else, so it's pretty clear that the base is C in both the reference and the normal sample.
In the tumor sample, however, there are 37 calls for C (reference) and 12 calls for G (variant), so, ignoring the single A call, the variant frequency is 12/(12+37) = 0.245.
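The arithmetic above can be sketched in a few lines of Python. The function name and the `ref_alt_only` switch are hypothetical, not part of the pipeline. Counting the single A read in the denominator gives 12/50 = 0.24, which happens to match the Frequency column, but whether the pipeline actually computes it that way is an assumption.

```python
# Tumor allele counts copied from the T(A), T(C), T(G), T(T) columns above.
tumor_counts = {"A": 1, "C": 37, "G": 12, "T": 0}
REF, ALT = "C", "G"  # plus-strand reference and variant bases


def variant_frequency(counts, ref, alt, ref_alt_only=True):
    """Fraction of reads supporting alt.

    With ref_alt_only=True the denominator is ref + alt reads only;
    otherwise it is the total depth (an assumed alternative definition).
    """
    if ref_alt_only:
        denom = counts[ref] + counts[alt]
    else:
        denom = sum(counts.values())
    if denom == 0:
        raise ValueError("no reads at this position")
    return counts[alt] / denom


vaf_ref_alt = variant_frequency(tumor_counts, REF, ALT)
vaf_all = variant_frequency(tumor_counts, REF, ALT, ref_alt_only=False)
print(round(vaf_ref_alt, 3), round(vaf_all, 3))  # 0.245 0.24
```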
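Notice that the variant is labeled G35C even though the reads show C changing to G. That's because the coding strand is the minus strand, whereas the reads are reported on the plus strand, so a C-to-G change on the plus strand is a G-to-C change on the coding strand.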
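To make the strand flip concrete, here is a minimal sketch that complements the plus-strand bases to get the coding-strand label. The helper name is hypothetical, and the coding position 35 is simply taken from the Variant column.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_coding_strand(plus_ref, plus_alt):
    """Complement a plus-strand SNV so it reads on the minus (coding) strand."""
    return COMPLEMENT[plus_ref], COMPLEMENT[plus_alt]


# Plus strand: reference C, variant G (from the read counts).
coding_ref, coding_alt = to_coding_strand("C", "G")
label = f"{coding_ref}35{coding_alt}"
print(label)  # G35C
```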
Anyway, how do you interpret 24.5%?
Well, if I assume that
1) the KRAS mutation is present in all tumor cells (just use this assumption as an example), and
2) the KRAS mutation is heterozygous, then
The tumor sample must contain about 1/2 non-tumor cells. Those cells contribute half of all the reads, and every one of those is a reference read. The remaining 1/2 are tumor cells: half of their chromosomes give me reference reads, and the other half give me mutant reads. So mutant reads make up 1/2 * 1/2 = 1/4 of the total, which is close to the observed 24.5%.
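The same reasoning can be written as a small calculation. This is only a sketch under the two assumptions above, plus one more that I am adding: both tumor and normal cells carry two copies of the locus (no copy number change). The function and parameter names are hypothetical.

```python
def expected_vaf(purity, mutant_copies=1, tumor_copies=2, normal_copies=2):
    """Expected variant frequency for a mutation present in every tumor cell.

    purity is the fraction of cells in the sample that are tumor cells.
    """
    mutant = purity * mutant_copies
    total = purity * tumor_copies + (1 - purity) * normal_copies
    return mutant / total


def purity_from_vaf(vaf):
    """Invert the diploid, heterozygous, clonal case, where vaf = purity / 2."""
    purity = 2 * vaf
    if not 0 <= purity <= 1:
        raise ValueError("VAF is not consistent with a heterozygous diploid model")
    return purity


# Half tumor cells, heterozygous mutation: a quarter of the reads are mutant.
print(expected_vaf(0.5))  # 0.25

# Going the other way from the observed 12/(12+37).
purity = purity_from_vaf(12 / 49)
print(round(purity, 2))  # 0.49
```

If the locus is amplified or the mutation is subclonal, the simple 2 * VAF inversion no longer holds, which is why the assumptions are spelled out as parameters.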