SEQanswers

Old 08-03-2012, 01:23 PM   #41
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

The missing IDs are unlikely to be the cause of this.

If people are going to help figure out what is wrong, you need to provide specific details about the filtering methods you are using.
Old 08-15-2012, 11:26 PM   #42
aforntacc
Member
 
Location: italy

Join Date: Jun 2011
Posts: 48
Default

Hello,

Please, I would like to know how to interpret indels. I have something like this:

chr3 4466963 . TGGAG TGGAGGAG 999 PASS AC1=5;AF1=0.4167;DP4=7,197,3,84;DP=347;FQ=999;G3=0.1667,0.8333,8.319e-50;HWE=0.0465;INDEL;MQ=44;MfGt=0/1;MinDP=28;NeqMfGt=1;PV4=1,1.5e-70,2e-112,1 GT:PL:DP:SP:GQ 0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42:0:55 0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64

Would it be TGGAG = 0 and TGGAGGAG = 1, the same as for SNPs, or is it interpreted differently?

Secondly, how can I use vcftools to compare two individuals (I have separated them into two VCF files), each with 3 libraries: WT (0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42:0:55) and MT (0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64)?

Thanks a lot.
Old 08-16-2012, 12:51 AM   #44
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

You may find it more fruitful to start another post for this question, with a more specific title.

How to interpret a particular indel is really a "how long is a piece of string" question. What are you trying to find out, and what hypothesis are you testing? The answers will determine how you look at your results.

VCFtools has various scripts to produce intersections and to compare two different VCF files. Which tools are appropriate again depends very much on what questions you want to answer.

I would think about the specific questions you wish to answer and come back with a new thread and more information; you may get more help that way.
Old 08-16-2012, 04:06 AM   #45
aforntacc
Member
 
Location: italy

Join Date: Jun 2011
Posts: 48
Default

OK, I have opened a new thread.

I have only one objective: to compare WT (0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42:0:55) and mutant (0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64), to find which indels are common and which are unique.

I would also kindly ask: do I interpret indels the same way SNPs are interpreted, i.e. ref = 0 and alt = 1? I tried to read about indels but could not find anything useful, so I turned to this site for much-needed help.

Thanks a lot for your help, I am grateful.
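
For reference, a minimal sketch of how the GT indices map onto alleles, assuming the standard VCF convention that index 0 is REF and index 1 is the first ALT allele, for indels just as for SNPs. The function name `decode_genotypes` is hypothetical, the INFO column is shortened for readability, and the FORMAT column is written as GT:PL:DP:SP:GQ without the stray space.

```python
def decode_genotypes(record_line):
    """Translate each sample's GT indices into allele strings (0 = REF, 1+ = ALT)."""
    fields = record_line.split()
    alleles = [fields[3]] + fields[4].split(",")   # REF first, then ALT alleles
    gt_pos = fields[8].split(":").index("GT")      # position of GT in FORMAT
    decoded = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        sep = "|" if "|" in gt else "/"            # phased or unphased separator
        names = ["." if a == "." else alleles[int(a)] for a in gt.split(sep)]
        decoded.append(sep.join(names))
    return decoded


# The indel record from this post; INFO is abbreviated (assumption for brevity).
record = (
    "chr3 4466963 . TGGAG TGGAGGAG 999 PASS INDEL;DP=347 GT:PL:DP:SP:GQ "
    "0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42:0:55 "
    "0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64"
)

for i, genotype in enumerate(decode_genotypes(record), start=1):
    print(f"sample {i}: {genotype}")
```

Under that convention, 0/1 here means one copy of TGGAG and one copy of TGGAGGAG (a 3-base insertion relative to the reference), and 0/0 means both copies match the reference.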
Old 07-10-2013, 01:37 PM   #46
daisieh
Junior Member
 
Location: Santa Barbara, CA

Join Date: Jan 2011
Posts: 3
Default

I think this is relevant to this thread, which is why I'm reawakening it:

According to the 1000 Genomes VCF 4.1 spec, the ordering of genotypes is given by this:

Quote:
If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc.
Just in case anyone else is desperately googling for the answer to how to order genotypes for bi/triallelic alternate alleles in a vcf file!
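
A small sketch of that formula in Python, in case it helps to see the ordering generated rather than memorised. The function names `pl_index` and `genotype_order` are made up for illustration; only the formula F(j/k) = (k*(k+1)/2)+j comes from the spec text quoted above, and diploid genotypes are assumed.

```python
def pl_index(j, k):
    """Position of genotype j/k in the likelihood list: F(j/k) = k*(k+1)/2 + j."""
    if j > k:              # the formula assumes j <= k, so normalise 1/0 to 0/1
        j, k = k, j
    return k * (k + 1) // 2 + j


def genotype_order(n_alleles):
    """List diploid genotypes for n_alleles (REF + ALTs) in likelihood order."""
    genotypes = [(j, k) for k in range(n_alleles) for j in range(k + 1)]
    genotypes.sort(key=lambda g: pl_index(*g))
    return [f"{j}/{k}" for j, k in genotypes]


print(genotype_order(2))  # biallelic:  ['0/0', '0/1', '1/1']
print(genotype_order(3))  # triallelic: ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']
```

With A = 0, B = 1, C = 2 this reproduces the AA,AB,BB,AC,BC,CC ordering from the quote.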
Old 05-17-2014, 08:14 AM   #47
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59
Default

Would anyone know why samtools does not give GT and GQ for some calls? For example, see position 3 52720080 in the VCF snippet below.

3 33434831 . G A 14.2 . DP=135;VDB=1.109343e-02;RPB=1.478724e+00;AF1=0.5;AC1=1;DP4=49,47,18,19;MQ=20;FQ=17.1;PV4=0.85,1,0.065,1 GT:PL:GQ 0/1:44,0,113:47
3 42251263 . C T 133 . DP=21;VDB=1.850543e-01;RPB=-1.073440e+00;AF1=1;AC1=2;DP4=0,1,12,8;MQ=20;FQ=-70;PV4=0.43,0.27,1,1 GT:PL:GQ 1/1:166,43,0:83
3 42787469 . A G 9.31 . DP=2;VDB=6.720000e-02;AF1=1;AC1=2;DP4=0,0,1,1;MQ=20;FQ=-33 GT:PL:GQ 1/1:40,6,0:8
3 52720080 . A . 48.9 . DP=67;VDB=1.892600e-02;RPB=-4.838016e-01;AF1=0;AC1=0;DP4=52,0,13,0;MQ=20;FQ=-46;PV4=1,1,1,0.34 PL 0
3 101576175 . T C 171 . DP=100;VDB=1.334901e-01;AF1=1;AC1=2;DP4=0,0,59,40;MQ=20;FQ=-282 GT:PL:GQ 1/1:204,255,0:99
Old 05-24-2014, 02:01 PM   #48
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
Default

vyellapa

This is relevant to samtools and not to the VCF spec. I would recommend creating a new question about this or emailing the samtools help list; you will get a better response that way.
Old 07-09-2014, 08:24 PM   #49
drdna
Member
 
Location: Kentucky

Join Date: May 2012
Posts: 73
Default

There are a number of quirks of VCF files that I do not understand. I would appreciate it if someone could explain them to me:

1) VCF reports multiple alternate alleles but provides detailed information for only one of them. As an example, view the following three lines from a samtools mpileup run:

supercont8.1 889760 . T G,C 222 . DP=17;VDB=1.071396e-01;AF1=1;AC1=2;DP4=0,0,13,4;MQ=39;FQ=-75 GT:PL:GQ 1/1:255,48,0,255,37,252:93
supercont8.1 893978 . C T,G 174 . DP=37;VDB=2.882637e-01;AF1=1;AC1=2;DP4=0,0,12,24;MQ=21;FQ=-132 GT:PL:GQ 1/1:207,105,0,207,89,204:99
supercont8.1 905324 . T C,G 213 . DP=44;VDB=2.155955e-01;AF1=1;AC1=2;DP4=0,0,20,16;MQ=27;FQ=-132 GT:PL:GQ 1/1:246,105,0,248,92,245:99

First, the MLE for the first alt allele count is 2. However, if we are only counting the FIRST alt allele, this value should always be 1 (you can only have one FIRST alt allele).

If we disregard this semantic error and assume that there are two alt alleles, then why is the genotype listed as 1/1? After all, in each entry, the data presented thus far suggest two alt alleles. In this case, the genotypes should be 1/2. The list of Phred-scaled genotype likelihoods (PL values) implies low confidence for the "2" alt allele. The problem is that we are provided no information on the frequency/quality of alt allele #2, so we cannot independently evaluate the mpileup call.

2) Consider the following output line:

supercont8.4 3182995 . AG AACG,AGCCCAACG 30.9 . INDEL;IDV=11;IMF=0.366667;DP=30;VDB=3.034881e-04;AF1=0.8294;AC1=1;DP4=1,0,8,1;MQ=36;FQ=-33.5;PV4=1,1,0.22,0.13 GT:PL:GQ 0/1:120,53,54,67,0,71:4

Here, two alt alleles are reported; however, the MLE for the first alt allele count is only 1. Presumably mpileup did not like one of the alt allele calls, but the VCF report provides no way to find out why.

Is there a way to make mpileup report frequencies/qualities for all alt alleles?

3) Finally, when two alternate alleles are found, I am assuming the list of Phred-scaled genotype likelihoods is provided in the following order: 0/0, 0/1, 0/2, 1/1, 1/2, 2/2. Is this a reasonable assumption?
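
For what it is worth, the VCF 4.1 formula quoted in post #46 above, F(j/k) = (k*(k+1)/2)+j, gives a different order for three alleles: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2. Below is a rough sketch that labels the PL values of the first record in point 1 using that formula. It assumes samtools follows the spec ordering, and `label_pls` is a hypothetical helper name, not part of any tool.

```python
def label_pls(ref, alts, pl_field):
    """Pair each diploid genotype with its PL value using F(j/k) = k*(k+1)/2 + j."""
    alleles = [ref] + alts.split(",")
    pls = [int(x) for x in pl_field.split(",")]
    expected = len(alleles) * (len(alleles) + 1) // 2
    if len(pls) != expected:
        raise ValueError(f"expected {expected} PL values, got {len(pls)}")
    labelled = {}
    for k in range(len(alleles)):
        for j in range(k + 1):
            # Index into the PL list according to the spec formula.
            labelled[f"{alleles[j]}/{alleles[k]}"] = pls[k * (k + 1) // 2 + j]
    return labelled


# supercont8.1 889760: REF=T, ALT=G,C, PL=255,48,0,255,37,252
calls = label_pls("T", "G,C", "255,48,0,255,37,252")
for genotype, pl in calls.items():
    print(genotype, pl)
```

Read this way, the PL of 0 falls on G/G, which is consistent with the reported 1/1 genotype, and the heterozygous G/C genotype (1/2) has a PL of 37.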