
  • VCF Indel encoding question

    Hi,

    I'm having problems understanding a GATK output VCF. I have read the VCF standard, but I'm obviously missing something.

    I /thought/ I understood how SNPs and short indels are represented, but clearly I do not. Below is an excerpt showing the sites I do not understand. I suspect it may have something to do with GATK quality filters that I'm not understanding...

    The excerpt below was generated using:

    GATK -l INFO -I my.bam -R my.fa -T UnifiedGenotyper -S LENIENT -nt 8 --heterozygosity 0.1 -o test.vcf --genotype_likelihoods_model BOTH --min_base_quality_score 10 --output_mode EMIT_ALL_SITES -ploidy 2

    Thanks!

    Darren

    -------------------------------------------------------
    Code:
    CH1	225	.	T	G	12.71	LowQual	AC=1;AF=0.500;AN=2;BaseQRankSum=1.978;DP=59;Dels=0.03;FS=0.000;HaplotypeScore=10.2840;MLEAC=1;MLEAF=0.500;MQ=70.25;MQ0=8;MQRankSum=-5.349;QD=0.22;ReadPosRankSum=-3.188	GT:AD:DP:GQ:PL	0/1:41,16:55:20:20,0,1435
    CH1	226	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	227	.	A	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	228	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	229	.	A	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
    CH1	230	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
    CH1	231	.	T	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
    CH1	232	.	G	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
    CH1	233	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:37
    CH1	234	.	A	.	139.53	.	AN=2;DP=70;MQ=59.20;MQ0=14	GT:DP	0/0:63
    CH1	235	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
    CH1	236	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
    CH1	237	.	T	.	175.53	.	AN=2;DP=85;MQ=51.37;MQ0=16	GT:DP	0/0:80
    CH1	238	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:97
    CH1	238	.	A	AGAAAGAAAGCTTGTA	83.73	.	AC=1;AF=0.500;AN=2;BaseQRankSum=6.172;DP=102;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=46.90;MQ0=0;MQRankSum=-6.190;QD=0.05;ReadPosRankSum=-5.733	GT:AD:DP:GQ:PL	0/1:27,25:57:99:121,0,4853
    CH1	239	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:101
    CH1	240	.	T	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:98
    CH1	241	.	A	.	169.53	.	AN=2;DP=108;MQ=44.14;MQ0=29	GT:DP	0/0:107
    CH1	242	.	T	.	169.53	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:DP	0/0:103
    CH1	242	.	T	.	118.27	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:AD:DP	0/0:27:55
    CH1	243	.	C	.	172.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:108
    CH1	243	.	CTTTT	.	118.27	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:AD:DP	0/0:27:56
    CH1	244	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:61
    CH1	245	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:53
    CH1	246	.	T	.	73.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:41
    CH1	247	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:46
    CH1	248	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
    CH1	249	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
    CH1	250	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:101
    CH1	251	.	T	.	169.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:96
    CH1	251	.	T	.	118.27	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:AD:DP	0/0:27:56
    CH1	252	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:113
    CH1	253	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:110
    CH1	254	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    CH1	255	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    CH1	256	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    Line 1 (position 225) is a SNP.
    Lines 14 and 15 (both at position 238) are an indel that I do understand.
    Lines 19 and 20 (both at position 242) I do /not/ understand.
    Lines 21 and 22 (both at position 243) I do /not/ understand.
    ---------------------------------------------------
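The puzzling lines in the excerpt are all cases where the same CHROM and POS appear on two consecutive records. A minimal sketch for locating such pairs is given here in Python; the function name `repeated_positions` and the trimmed sample are illustrative assumptions and not part of GATK or the original post. It only reports where positions repeat and does not explain why GATK emitted them.

```python
from collections import defaultdict

def repeated_positions(vcf_lines):
    """Return {(CHROM, POS): [record numbers]} for positions seen more than once.

    Record numbers are 1-based and count only data records
    (blank lines and '#' header lines are skipped).
    """
    seen = defaultdict(list)
    record_no = 0
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        record_no += 1
        # VCF is tab-separated; splitting on any whitespace also copes with
        # excerpts pasted into a forum, and is safe for the first two columns.
        chrom, pos = line.split()[:2]
        seen[(chrom, int(pos))].append(record_no)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}

# Trimmed sample: first five columns of a few records from the excerpt.
sample = """\
CH1 238 . A .
CH1 238 . A AGAAAGAAAGCTTGTA
CH1 239 . A .
CH1 242 . T .
CH1 242 . T .
"""

for (chrom, pos), nums in repeated_positions(sample.splitlines()).items():
    print(f"{chrom}:{pos} appears on records {nums}")
```

Run over the full test.vcf, this would list every site that has more than one record, which may make it easier to compare the paired records side by side.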

  • #2
    Darren, maybe you should re-post with the lines numbered. You can use the Unix "nl" command to do this.
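If `nl` is not to hand, the same idea can be sketched in a few lines of Python. This is only an approximation of `nl` (it numbers non-empty lines and leaves blank ones alone); the helper name `number_lines` is made up for illustration.

```python
def number_lines(lines, start=1):
    """Prefix each non-empty line with a right-aligned running number and a tab."""
    numbered = []
    n = start
    for line in lines:
        if line.strip():
            numbered.append(f"{n:>6}\t{line}")
            n += 1
        else:
            numbered.append(line)  # blank lines are passed through unnumbered
    return numbered

# Example with two records from the excerpt (columns abbreviated).
records = [
    "CH1 242 . T . 169.53",
    "CH1 242 . T . 118.27",
]
print("\n".join(number_lines(records)))
```

Numbering the pasted excerpt this way lets "lines 19 and 20" be matched to records without counting by hand.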
