Hi everyone,
I have a germline sample that underwent whole genome sequencing at 30x on an Illumina HiSeq. After extracting the VAFs at all 1000 Genomes SNP positions, the data look strangely noisy.
- If you plot a histogram of VAFs from a normal sample, you would expect a peak at 0.5 for heterozygous variants and a peak at 1.0 for homozygous variants, with some spread around and between the peaks from sequencing noise. However, in this sample the distribution is almost uniform from 0 to 0.8, with a peak at 1.0 (see attached figure).
- The same pattern shows up when I plot the VAFs at these SNPs for the germline sample (in blue) and a matched tumor sample (in red). In a normal-looking sample, the blue dots should cluster around 0.5 on the y-axis, and the red dots should separate from them wherever there is a CNV. In this weird sample, the blue dots do not cluster around any particular VAF, whereas the matched tumor sample looks fine.
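As a rough illustration of the expected shape described above, the sketch below simulates VAFs at heterozygous and homozygous sites by binomial sampling of reads and bins them into a text histogram. Only the 30x depth comes from the post; the number of sites, the bin count, the random seed, and the function names are assumptions made for illustration, and real data would also include sequencing error and mapping bias.

```python
import random


def vaf(alt_reads, total_reads):
    """Variant allele fraction: alt-supporting reads divided by total reads."""
    if total_reads == 0:
        return None  # no coverage, VAF is undefined
    return alt_reads / total_reads


def simulate_site(depth, true_fraction, rng):
    """Draw `depth` reads; each supports the alt allele with prob. `true_fraction`."""
    alt = sum(1 for _ in range(depth) if rng.random() < true_fraction)
    return vaf(alt, depth)


def histogram(vafs, n_bins=10):
    """Count VAFs in equal-width bins on [0, 1]; a VAF of 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for v in vafs:
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
DEPTH = 30              # coverage stated in the post

# Assumed site counts, chosen only for illustration.
het = [simulate_site(DEPTH, 0.5, rng) for _ in range(2000)]
hom = [simulate_site(DEPTH, 1.0, rng) for _ in range(1000)]

counts = histogram(het + hom)
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (c // 25)}")
```

Under these assumptions the heterozygous sites pile up around 0.5 and the homozygous sites sit at 1.0, with very little mass near 0. A near-uniform spread from 0 to 0.8 is therefore not what sampling noise at 30x alone would produce.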
I've compared this sample with other samples, and there is no significant difference in coverage, insert size, GC content, ACGT content, indels, base quality or mismatch distributions. Does anyone have any idea what might give rise to such noisy data, or has anyone seen a similar case before? Thanks!