I am currently looking into some exome data that we have run on our HiScanSQ. We sequenced the same sample in two different runs, to compare the number of variants identified in each run.
I also looked into the zygosity of the variants, and found that 16 % (!!) of the variants differed in their zygosity (e.g. homozygous in one run, heterozygous in the next). This sounds high, so I was wondering whether anybody else has looked into this in their own data and would like to share their experience?
Details of analysis:
I used a pipeline with bwa + GATK for alignment and genotyping. Initially, I filtered my VCF files so that only variants covered by at least 30X were included, and also removed low-quality variants that did not pass the filters. Then I intersected the two files with BEDtools using the -wo option, to get the records from both files on the same line of one output file. Finally, I used a simple awk command to output the lines in which the GT field differed between the two runs.
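The comparison described above can be sketched in Python roughly as follows. This is not the BEDtools and awk commands that were actually run, just a minimal illustration of the same idea. The function names `parse_vcf` and `discordant_genotypes` are made up for this example, and it assumes single-sample VCFs, that "covered by at least 30X" refers to the per-sample DP value in the FORMAT column, and that only records marked PASS are kept.

```python
def parse_vcf(lines, min_depth=30):
    """Return {(chrom, pos, ref, alt): genotype} for records that are
    marked PASS and have a per-sample DP >= min_depth (assumed filters)."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        )
        if filt != "PASS":
            continue
        # Pair the FORMAT keys (column 9) with the sample values (column 10).
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        depth = sample.get("DP", "")
        if not depth.isdigit() or int(depth) < min_depth:
            continue
        gt = sample.get("GT")
        if gt is None:
            continue
        # Ignore phasing and allele order so that 0|1, 1|0 and 0/1 compare equal.
        calls[(chrom, pos, ref, alt)] = tuple(sorted(gt.replace("|", "/").split("/")))
    return calls


def discordant_genotypes(run1, run2):
    """Return (number of shared variants, sorted list of shared variants
    whose genotype differs between the two runs)."""
    shared = run1.keys() & run2.keys()
    differing = sorted(key for key in shared if run1[key] != run2[key])
    return len(shared), differing
```

With two files at hand, something like `discordant_genotypes(parse_vcf(open("run1.vcf")), parse_vcf(open("run2.vcf")))` (file names are placeholders) would give the number of shared variants and the list of discordant sites, from which the discordance rate follows. Note that only variants passing the filters in both runs are compared, so a site that drops below 30X in one run is not counted at all.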