Hello everyone,
I apologize if this is a stupid question but I'm new to using GATK.
I've searched this forum for an answer but nothing seems to fit the bill.
I'm trying to use GATK 4.1 to analyze WGS data I have.
The data is 30X deep.
At this point I'm only testing and optimizing my pipeline so I am only working with 4 samples (2 males and 2 females).
I've followed the information available in the Best Practices Workflow entitled "Germline short variant discovery (SNPs + Indels)" (https://gatk.broadinstitute.org/hc/e...s/360035535932).
I have produced the Analysis-Ready BAM files and got down to the VCF files made by GenotypeGVCFs.
As a quick sanity check I went to see the variant calls on chromosome Y and I'm a little confused by what I see.
First, for the males in my test cohort, multiple sites on chrY are listed as heterozygous, and others are reported as missing calls (i.e. `./.`).
Second, if I look at the female samples, I see that they have variant calls on the Y chromosome as well.
In both cases the VCF files exclude the pseudo-autosomal regions on chrY.
Looking at the BAM files I can clearly see that the males have read counts for known Y chromosome genes and the females do not.
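For anyone who wants to reproduce the check, here is a minimal sketch of how the per-sample genotype classes on chrY could be tallied from the GenotypeGVCFs output. It is only an illustration: the function names (`classify_gt`, `tally_genotypes`, `tally_vcf`) are made up for this post, and it assumes a standard tab-delimited VCF with a `GT` key in the FORMAT column.

```python
import gzip
from collections import Counter, defaultdict


def classify_gt(gt):
    """Classify a VCF GT string (hypothetical helper, not part of GATK)."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"
    if len(alleles) == 1:
        # A single allele means the site was genotyped as haploid.
        return "haploid_ref" if alleles[0] == "0" else "haploid_alt"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def tally_genotypes(lines, contig="chrY"):
    """Count genotype classes per sample for records on `contig`."""
    samples = []
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[0] != contig:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_index]
            counts[sample][classify_gt(gt)] += 1
    return {sample: dict(c) for sample, c in counts.items()}


def tally_vcf(path, contig="chrY"):
    """Open a plain or gzip/bgzip-compressed VCF and tally one contig."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return tally_genotypes(handle, contig)
```

Calling `tally_vcf("chrY.gatk_hg38.vcf.gz")` returns one dictionary of counts per sample. A sample genotyped as haploid would show only `haploid_*` and `missing` entries, so any `het` count flags the diploid calls I'm describing.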
I'm wondering what commands or steps I need to run on the data to get the correct genotyping reported in the VCF file.
Right now my pipeline produces one genomicDB for each chromosome.
I'm guessing that when I generate the VCF files with GenotypeGVCFs I need to adjust the command for chromosome Y (?)
I've tried rerunning GenotypeGVCFs for chrY with `-ploidy 1` (command below), but the VCF file remains the same.
Code:
gatk GenotypeGVCFs \
    -R hg38.ref/Homo_sapiens_assembly38.fasta \
    -D hg38.ref/Homo_sapiens_assembly38.dbsnp138.vcf \
    -V gendb://genomicDB_chrY \
    -O chrY.gatk_hg38.vcf.gz \
    -ploidy 1
I've posted this on the GATK forum and it's gone unanswered.
Maybe it's a stupid question and the answer should be obvious but I can't find it.
Can anyone suggest what I should do to get the correct genotyping?
Thanks in advance for any and all help