VCF Indel encoding question

darren.obbard

Junior Member

Join Date: Jan 2012
Posts: 5

05-23-2013, 09:34 AM

Hi,

I'm having problems understanding a GATK output VCF. I have read the VCF standard, but I'm obviously missing something.

I /thought/ I understood how SNPs and short indels are represented, but clearly I do not. The excerpt below includes some sites that I cannot interpret. I suspect they have something to do with GATK quality filters that I have not understood.

The excerpt below was generated with:

GATK -l INFO -I my.bam -R my.fa -T UnifiedGenotyper -S LENIENT -nt 8 --heterozygosity 0.1 -o test.vcf --genotype_likelihoods_model BOTH --min_base_quality_score 10 --output_mode EMIT_ALL_SITES -ploidy 2

Thanks!

Darren

-------------------------------------------------------

Code:

CH1	225	.	T	G	12.71	LowQual	AC=1;AF=0.500;AN=2;BaseQRankSum=1.978;DP=59;Dels=0.03;FS=0.000;HaplotypeScore=10.2840;MLEAC=1;MLEAF=0.500;MQ=70.25;MQ0=8;MQRankSum=-5.349;QD=0.22;ReadPosRankSum=-3.188	GT:AD:DP:GQ:PL	0/1:41,16:55:20:20,0,1435
CH1	226	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
CH1	227	.	A	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
CH1	228	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
CH1	229	.	A	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
CH1	230	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
CH1	231	.	T	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
CH1	232	.	G	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
CH1	233	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:37
CH1	234	.	A	.	139.53	.	AN=2;DP=70;MQ=59.20;MQ0=14	GT:DP	0/0:63
CH1	235	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
CH1	236	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
CH1	237	.	T	.	175.53	.	AN=2;DP=85;MQ=51.37;MQ0=16	GT:DP	0/0:80
CH1	238	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:97
CH1	238	.	A	AGAAAGAAAGCTTGTA	83.73	.	AC=1;AF=0.500;AN=2;BaseQRankSum=6.172;DP=102;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=46.90;MQ0=0;MQRankSum=-6.190;QD=0.05;ReadPosRankSum=-5.733	GT:AD:DP:GQ:PL	0/1:27,25:57:99:121,0,4853
CH1	239	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:101
CH1	240	.	T	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:98
CH1	241	.	A	.	169.53	.	AN=2;DP=108;MQ=44.14;MQ0=29	GT:DP	0/0:107
CH1	242	.	T	.	169.53	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:DP	0/0:103
CH1	242	.	T	.	118.27	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:AD:DP	0/0:27:55
CH1	243	.	C	.	172.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:108
CH1	243	.	CTTTT	.	118.27	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:AD:DP	0/0:27:56
CH1	244	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:61
CH1	245	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:53
CH1	246	.	T	.	73.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:41
CH1	247	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:46
CH1	248	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
CH1	249	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
CH1	250	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:101
CH1	251	.	T	.	169.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:96
CH1	251	.	T	.	118.27	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:AD:DP	0/0:27:56
CH1	252	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:113
CH1	253	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:110
CH1	254	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
CH1	255	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
CH1	256	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111

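Line numbers here count rows of the excerpt above:

Line 1 (position 225) is a SNP.
Lines 14 and 15 (position 238) are an indel that I do understand.
Lines 19 and 20 (position 242) I do /not/ understand.
Lines 21 and 22 (position 243) I do /not/ understand.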
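
To make the pattern easier to see, here is a minimal Python sketch that classifies each record from its REF and ALT columns and reports positions that occur more than once. It assumes plain tab-separated VCF body lines with a single ALT allele; `classify` and `duplicated_positions` are names made up for this example and are not part of GATK or any VCF library. The sample records are abridged from the excerpt (INFO cut down to DP).

```python
from collections import defaultdict

# Records abridged from the excerpt: INFO is shortened to DP only.
SAMPLE = """\
CH1\t225\t.\tT\tG\t12.71\tLowQual\tDP=59\tGT:AD:DP:GQ:PL\t0/1:41,16:55:20:20,0,1435
CH1\t238\t.\tA\t.\t175.53\t.\tDP=102\tGT:DP\t0/0:97
CH1\t238\t.\tA\tAGAAAGAAAGCTTGTA\t83.73\t.\tDP=102\tGT:AD:DP:GQ:PL\t0/1:27,25:57:99:121,0,4853
CH1\t243\t.\tC\t.\t172.53\t.\tDP=110\tGT:DP\t0/0:108
CH1\t243\t.\tCTTTT\t.\t118.27\t.\tDP=110\tGT:AD:DP\t0/0:27:56
"""


def classify(ref, alt):
    """Label a record by the shape of REF/ALT (single ALT allele assumed)."""
    if alt == ".":
        # No alternate allele was emitted at this site.
        return "ref-only site" if len(ref) == 1 else "ref-only, multi-base REF"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNP/complex"


def duplicated_positions(vcf_text):
    """Return {(chrom, pos): [labels]} for positions with more than one record."""
    by_pos = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        by_pos[(chrom, int(pos))].append(classify(ref, alt))
    return {key: labels for key, labels in by_pos.items() if len(labels) > 1}


if __name__ == "__main__":
    for (chrom, pos), labels in duplicated_positions(SAMPLE).items():
        print(chrom, pos, labels)
```

On these sample records, position 238 comes out as a ref-only site plus an insertion, and position 243 as a ref-only site plus a record with a multi-base REF and no ALT. The second shape is the one I cannot interpret. The REF looks like what a deletion record would carry, but that reading is only a guess.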
---------------------------------------------------

Tags: gatk, indel, vcf

Torst

Senior Member

Join Date: Apr 2008

Posts: 275

06-06-2013, 12:38 AM

Darren, maybe you should re-post with the lines numbered. You can use the Unix "nl" command to do this.
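
If "nl" is not to hand, the numbering can be roughly sketched in Python. This is only a stand-in: it numbers non-empty lines with a right-aligned counter followed by a tab, which is how I understand the default behaviour of "nl", and `number_lines` is a made-up helper name.

```python
def number_lines(text, width=6):
    """Prefix each non-empty line with a right-aligned counter and a tab."""
    out = []
    count = 0
    for line in text.splitlines():
        if line.strip():
            count += 1
            out.append(f"{count:>{width}}\t{line}")
        else:
            out.append(line)  # blank lines are kept but not counted
    return "\n".join(out)


if __name__ == "__main__":
    # Two shortened records, just to show the output shape.
    print(number_lines("CH1\t225\t.\tT\tG\nCH1\t226\t.\tT\t."))
```

Numbering the excerpt this way would make references such as "lines 19 and 20" unambiguous.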