10-23-2013, 11:27 PM   #4
Location: San Francisco, CA

Join Date: Aug 2011
Posts: 91

This is an example of the output from my somatic calling pipeline:

Variant Annotation	Gene	Exon	Codon	Chr	Position	Variant	N(A)	N(C)	N(G)	N(T)	T(A)	T(C)	T(G)	T(T)	QSS	Frequency
nonsynonymous SNV	KRAS	exon2	G12A	chr12	25398284	G35C	0	83	0	0	1	37	12	0	57	0.24

N(A), N(C), N(G), and N(T) stand for the number of calls for A, C, G, and T in the normal sample. T(A), T(C), T(G), and T(T) stand for the number of calls for A, C, G, and T in the tumor sample.
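
In case it helps to see the columns pulled apart, here is a minimal Python sketch that splits the row above into per-base counts for the normal and tumor samples. It assumes the output is tab-separated with exactly the header shown; `parse_counts` is just a name I made up for illustration, not part of any pipeline.

```python
# Header and data row copied from the example output (tab-separated).
header = ("Variant Annotation\tGene\tExon\tCodon\tChr\tPosition\tVariant\t"
          "N(A)\tN(C)\tN(G)\tN(T)\tT(A)\tT(C)\tT(G)\tT(T)\tQSS\tFrequency")
row = ("nonsynonymous SNV\tKRAS\texon2\tG12A\tchr12\t25398284\tG35C\t"
       "0\t83\t0\t0\t1\t37\t12\t0\t57\t0.24")


def parse_counts(header_line, data_line):
    """Return (normal, tumor) dicts mapping each base to its call count."""
    record = dict(zip(header_line.split("\t"), data_line.split("\t")))
    normal = {base: int(record[f"N({base})"]) for base in "ACGT"}
    tumor = {base: int(record[f"T({base})"]) for base in "ACGT"}
    return normal, tumor


normal, tumor = parse_counts(header, row)
print(normal)  # {'A': 0, 'C': 83, 'G': 0, 'T': 0}
print(tumor)   # {'A': 1, 'C': 37, 'G': 12, 'T': 0}
```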

In the normal sample, 83 reads call C at this position and none call anything else, so it's pretty clear that the base is C in both the reference and the normal sample.
However, in the tumor sample, there are 37 calls for C (reference) and 12 calls for G (variant), so the variant frequency is 12/(12+37) = 0.245 (ignoring the single A call).
Notice the variant is labeled G35C even though the reads show C to G. That's because the KRAS coding strand is the minus strand, whereas the read counts are reported on the plus strand.
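
The arithmetic and the strand flip can be sketched in a few lines of Python. This is only an illustration using the tumor counts from the row above; `variant_frequency` and `COMPLEMENT` are names I picked here, and the frequency is computed from reference and variant calls only, the same way as in the text.

```python
# Tumor-sample call counts from the example row (plus-strand bases).
tumor = {"A": 1, "C": 37, "G": 12, "T": 0}
ref_base, alt_base = "C", "G"  # as seen in the plus-strand reads

# Base complements, used to express the change on the minus (coding) strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def variant_frequency(counts, ref, alt):
    """Fraction of variant calls among reference + variant calls."""
    return counts[alt] / (counts[ref] + counts[alt])


vaf = variant_frequency(tumor, ref_base, alt_base)
print(round(vaf, 3))  # 0.245

# A plus-strand C>G change reads as G>C on the minus strand, hence "G35C".
coding_change = f"{COMPLEMENT[ref_base]}>{COMPLEMENT[alt_base]}"
print(coding_change)  # G>C
```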

Anyway, how do you interpret 24.5%?
Well, if I assume that
1) the KRAS mutation is present in all tumor cells (just use this assumption as an example), and
2) the KRAS mutation is heterozygous, with two copies of the locus in every cell, then
the tumor sample must contain about 1/2 non-tumor cells, which give me only reference reads. The remaining 1/2 are tumor cells: half of their chromosomes give me reference reads, and the other half give me mutant reads. So mutant reads should make up about 1/2 x 1/2 = 1/4 of the total, which is close to the observed 24.5%.
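
Here is the same reasoning as a small Python sketch. It only holds under the assumptions above (mutation in every tumor cell, heterozygous, two copies of the locus in every cell); the function names are made up for this post, and a real sample may well violate these assumptions.

```python
def expected_vaf(purity):
    """Expected variant frequency when a fraction `purity` of cells are tumor.

    Tumor cells contribute one mutant and one reference chromosome each;
    non-tumor cells contribute reference chromosomes only.
    """
    return purity * 0.5


def purity_from_vaf(vaf):
    """Invert expected_vaf: tumor cell fraction implied by an observed frequency."""
    return min(1.0, 2.0 * vaf)


vaf = 12 / (12 + 37)           # observed variant frequency from the example
purity = purity_from_vaf(vaf)
print(round(purity, 2))        # 0.49, i.e. roughly half the cells are tumor
print(expected_vaf(0.5))       # 0.25
```

If the mutation were only in a subset of tumor cells, or the locus were amplified or deleted, the factor of 2 would no longer apply.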
lethalfang