10-04-2019, 06:51 AM   #1
Location: Germany

Join Date: Dec 2010
Posts: 28
Filtering out individual genotype calls with too HIGH read depth from VCF

I have VCF files generated by GATK's HaplotypeCaller, one file for each of 20 individuals. These files will be combined into a multi-sample gVCF for joint genotyping with GenotypeGVCFs (GATK), producing a vcf.gz file that includes all variable positions across the individuals.

I would like to set a filter to remove certain variants. The tricky part is that this is not a global filter: the threshold has to be set separately for each individual. Specifically, I want to exclude any genotype (variant) call *within an individual* that has more than four times the average read depth of *that individual*.

How do I achieve such filtering? Can it be done on the combined VCF file (or even the variants VCF file), or do I have to filter before combining the individual VCF files into one?
And how do I implement this filter? I cannot think of any tool that filters out positions with excessively high read depth, particularly when the threshold depends on each individual's genome-wide average.

Thank you for your help!
Marius
10-04-2019, 09:59 AM   #2
Registered Vendor
Location: Eugene, OR

Join Date: May 2013
Posts: 521

That would require some scripting, I think. vcftools can filter for sites within a range of read depth, so for each individual you could:
extract that individual with vcftools --indv
find its mean depth with vcftools --depth
filter that individual with --max-meanDP, set to four times that mean

But at that point I would just parse the VCF with a scripting language.
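
For what it's worth, here is a minimal sketch of that parsing approach in Python, working directly on the combined multi-sample VCF. It makes several assumptions that are not from the thread: every record has a per-sample DP field in FORMAT, GT is the first FORMAT key, and the "average read depth" is taken over the sites present in the VCF rather than over the whole genome (for a true genome-wide average you would need the depth from the BAM files instead). The function names and the choice to set masked genotypes to missing (./.) are illustrative only.

```python
import gzip
import sys
from collections import defaultdict


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def sample_depths(fields):
    """Yield (column index, DP or None) for each sample column of one record."""
    if len(fields) < 10:
        return  # no FORMAT/sample columns on this record
    keys = fields[8].split(":")
    if "DP" not in keys:
        return
    dp_idx = keys.index("DP")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        dp = values[dp_idx] if dp_idx < len(values) else "."
        yield col, (int(dp) if dp.isdigit() else None)


def mean_depths(lines):
    """First pass: mean DP per sample column over all records that report DP."""
    totals, counts = defaultdict(int), defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue
        for col, dp in sample_depths(line.rstrip("\n").split("\t")):
            if dp is not None:
                totals[col] += dp
                counts[col] += 1
    return {col: totals[col] / counts[col] for col in counts}


def mask_high_depth(lines, means, factor=4.0):
    """Second pass: set GT to missing where DP > factor * that sample's mean DP."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        for col, dp in sample_depths(fields):
            if dp is not None and col in means and dp > factor * means[col]:
                values = fields[col].split(":")
                gt = values[0]  # assumption: GT is the first FORMAT key
                sep = "|" if "|" in gt else "/"
                ploidy = len(gt.replace("|", "/").split("/"))
                values[0] = sep.join(["."] * ploidy)
                fields[col] = ":".join(values)
        yield "\t".join(fields)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as fh:
        means = mean_depths(fh)
    with open_vcf(sys.argv[1]) as fh:
        for out_line in mask_high_depth(fh, means):
            print(out_line)
```

Saved under a name of your choosing, it would be run with the VCF path as the only argument and the output redirected to a new file. It reads the file twice (once for the per-sample means, once to mask), so it never holds the whole VCF in memory. Only the genotype is blanked; the DP value is left in place so you can still see why a call was removed.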
Providing nextRAD genotyping and PacBio sequencing services.
SNPsaurus

All times are GMT -8.