I am doing variant calling on exomes sequenced on an Illumina HiSeq 2000 with the Nextera Exome Enrichment kit.
Earlier I used the same variant calling pipeline on samples prepared with the TruSeq Exome Enrichment kit.
With the Nextera Exome Enrichment kit, the number of variants called by GATK drops to almost half of what I got with TruSeq.
I have not been able to work out why the number drops so sharply.
Metric           TruSeq sample   Nextera sample
Mean coverage    40X             47.5X
GATK variants    53917           27596
Here is the GATK command I used for calling variants for both TruSeq and Nextera samples:
java -Xmx2g -Djava.io.tmpdir=temp -jar $gatk_jar \
    -T UnifiedGenotyper \
    -I $in \
    -R $ucsc_hg19_fasta \
    -D:name,VCF $dbsnp146_hg19_vcf \
    -o $o \
    -dcov 1000 \
    -A AlleleBalance -A BaseCounts -A VariantType \
    -baq CALCULATE_AS_NECESSARY \
    -stand_call_conf 30.0 -stand_emit_conf 10.0 \
    -glm BOTH \
    -L $nextera_target_bed
(For the TruSeq sample, -L $truseq_target_bed was used in place of -L $nextera_target_bed.)
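Since -L restricts calling to the intervals in each kit's BED file, one check that may help narrow this down is whether the two target files cover the same amount of sequence, and what the variant counts look like per megabase of target. The following is only a minimal sketch of that check, assuming standard BED input (tab- or space-separated, 0-based, half-open intervals in the first three columns). The function names and the file names in the comments are hypothetical and not part of my pipeline.

```python
from collections import defaultdict


def bed_target_size(lines):
    """Total bases covered by BED intervals, merging overlaps per chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def variants_per_mb(n_variants, target_bp):
    """Variant count normalised to one megabase of target sequence."""
    return n_variants / (target_bp / 1e6)


# Hypothetical usage with the two kit BED files:
# with open("nextera_targets.bed") as fh:
#     nextera_bp = bed_target_size(fh)
# with open("truseq_targets.bed") as fh:
#     truseq_bp = bed_target_size(fh)
# print(variants_per_mb(27596, nextera_bp), variants_per_mb(53917, truseq_bp))
```

If the per-megabase rates turn out to be similar, the difference in raw counts would mostly reflect the size of the targeted region rather than a problem with the calling itself.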