SEQanswers

02-24-2020, 05:07 AM   #1
dfermin315
Junior Member
 
Location: Michigan, US

Join Date: Mar 2017
Posts: 4
GATK: Calling variants on chromosome Y

Hello everyone,

I apologize if this is a stupid question but I'm new to using GATK.
I've searched this forum for an answer but nothing seems to fit the bill.

I'm trying to use GATK 4.1 to analyze WGS data that I have.
The data is sequenced to 30X depth.

At this point I'm only testing and optimizing my pipeline so I am only working with 4 samples (2 males and 2 females).

I've followed the information available at the Best Practices Workflow entitled "Germline short variant discovery (SNPs + Indels)" (https://gatk.broadinstitute.org/hc/e...s/360035535932).

I have produced the Analysis-Ready BAM files and got down to the VCF files made by GenotypeGVCFs.

As a quick sanity check I went to see the variant calls on chromosome Y and I'm a little confused by what I see.

First, for the males in my test cohort, multiple sites are listed as heterozygous. Others are reported as missing calls (i.e., `./.`).

Second, when I look at the female samples, I see that they have variant calls on the Y chromosome as well.

In both cases the VCF files exclude the pseudo-autosomal regions on chrY.

Looking at the BAM files I can clearly see that the males have read counts for known Y chromosome genes and the females do not.
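
The observations above (heterozygous and `./.` calls in males, calls present in females) can be quantified per sample rather than judged by eye. The following is a minimal sketch, not part of the original pipeline: the function names `classify_gt`, `tally_genotypes`, and `tally_vcf` are hypothetical, and it assumes a standard multi-sample VCF with a `GT` field in the FORMAT column and a contig named `chrY`.

```python
import gzip
from collections import Counter, defaultdict


def classify_gt(gt):
    """Classify a VCF GT string as 'missing', 'haploid', 'hom' or 'het'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"
    if len(alleles) == 1:
        return "haploid"
    return "hom" if len(set(alleles)) == 1 else "het"


def tally_genotypes(lines, contig="chrY"):
    """Count genotype classes per sample for records on one contig."""
    samples = []
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[0] != contig:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # record carries no genotype field
        gt_index = format_keys.index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            counts[sample][classify_gt(gt)] += 1
    return {sample: dict(c) for sample, c in counts.items()}


def tally_vcf(path, contig="chrY"):
    """Open a plain or gzip/bgzip-compressed VCF and tally its genotypes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return tally_genotypes(handle, contig)
```

Calling `tally_vcf("chrY.gatk_hg38.vcf.gz")` on the output file would return one dictionary of counts per sample. A male genotyped as haploid would be expected to show only `haploid` and `missing` entries, so any `het` count indicates that the sample was still called as diploid.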

I'm wondering what commands or steps I need to run on the data to get the correct genotyping reported in the VCF file.

Right now my pipeline produces one GenomicsDB workspace for each chromosome.
I'm guessing that when I generate the VCF files with GenotypeGVCFs I need to adjust the command for chromosome Y (?)

I've tried:
Code:
gatk GenotypeGVCFs \
-R hg38.ref/Homo_sapiens_assembly38.fasta \
-D hg38.ref/Homo_sapiens_assembly38.dbsnp138.vcf \
-V gendb://genomicDB_chrY \
-O chrY.gatk_hg38.vcf.gz \
-ploidy 1
But the VCF file remains the same.
I've posted this on the GATK forum and it's gone unanswered.
Maybe it's a stupid question and the answer should be obvious but I can't find it.

Can anyone suggest what I should do to get the correct genotyping?

Thanks in advance for any and all help
02-25-2020, 06:42 AM   #2
m_two
Member
 
Location: USA

Join Date: Mar 2010
Posts: 50

The PAR (pseudo-autosomal region) is typically masked on chrY, since the PAR reference sequence is an exact duplicate of the chrX sequence.

The unique regions of chrY tend to be repetitive and highly variable, with many deletions, duplications, and CNVs.

https://rbej.biomedcentral.com/artic...958-018-0330-5

https://journals.plos.org/plosgeneti...l.pgen.1006834

Tags
chromosome y, gatk, genotype, genotypegvcfs

All times are GMT -8.