SEQanswers

05-09-2011, 01:53 AM #21
marcela
Junior Member
Location: Sweden
Join Date: Feb 2011
Posts: 7

Hi!

I have the following PLs:

REF=A
ALT=C,G

PL=159,39,137,0,6,137

P(D|AA)=10^{-15.9}
P(D|AC)=10^{-3.9}
P(D|CC)=10^{-13.7}
P(D|AG)=1
P(D|CG)=10^{-0.6}
P(D|GG)=10^{-13.7}

From these I assumed the genotype would be AG. However, looking at the alignment:
A CCCgCcCCcCCCCCCCccccc

I would think it is AC instead. Is the order of the genotypes calculated in a different way?
And how do I assign the order for:
REF=G
ALT=T,C,A
PL:236,157,228,235,0,131,138,225,224,232

Thanks!

Last edited by marcela; 05-10-2011 at 12:26 AM.
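Since PL is a phred-scaled likelihood, P(D|G) = 10^(-PL/10), so a PL of 39 corresponds to 10^(-3.9) and a PL of 6 to 10^(-0.6). A minimal Python sketch of that conversion for the PLs above; the genotype labels assume the standard VCF ordering for REF=A, ALT=C,G (AA, AC, CC, AG, CG, GG), which is the order used in the post, and `pl_to_likelihoods` is just an illustrative name.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values to relative likelihoods P(D|G) = 10^(-PL/10)."""
    return [10 ** (-p / 10) for p in pl]


# REF=A, ALT=C,G; labels assume the VCF genotype order for three alleles
genotypes = ["AA", "AC", "CC", "AG", "CG", "GG"]
pl = [159, 39, 137, 0, 6, 137]

likelihoods = pl_to_likelihoods(pl)
for gt, p, lik in zip(genotypes, pl, likelihoods):
    print(f"PL={p:3d}  P(D|{gt}) = 10^(-{p / 10:g}) = {lik:.3g}")

# The genotype with PL 0 has relative likelihood 1, i.e. it is the most likely one
best = genotypes[likelihoods.index(max(likelihoods))]
print("most likely genotype:", best)
```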

05-09-2011, 06:20 AM #22
AKilleen
Guest
Posts: n/a
Bovine SNPs in VCF format

Hi Ketan/everyone,

I'm just wondering: could anybody point me to a source of known bovine SNPs in VCF format?

08-23-2011, 09:28 AM #23
ashrafi_h
Junior Member
Location: Davis
Join Date: Jan 2010
Posts: 7
VCF file allele composition

Hi,

In the output of the old pileup command we could calculate, or at least see, the allele composition of the reads at each position. For instance, if the reference base is A and the read bases are ......,,,,,,.T...., that means 17 A and one T.

How can we get the same information from a VCF file? Knowing the depth is of little use without knowing which allele each read supports.
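For reference, the pileup read-base column can be tallied with a short Python sketch like the one below. It assumes the usual samtools pileup symbols: '.' and ',' match the reference on the forward/reverse strand, letters are mismatches (lower case = reverse strand), '^' plus one mapping-quality character marks a read start, '$' a read end, and '+N'/'-N' followed by N bases an indel. Any other character is counted as-is. The function name is hypothetical.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, read_bases):
    """Tally alleles in a samtools pileup read-base column (sketch)."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":            # read start; the next character is its mapping quality
            i += 2
        elif c == "$":          # read end marker
            i += 1
        elif c in "+-":         # indel: +N/-N followed by N inserted/deleted bases
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            # '.' and ',' match the reference; anything else is counted as-is
            allele = ref_base.upper() if c in ".," else c.upper()
            counts[allele] += 1
            i += 1
    return counts


print(count_pileup_bases("A", "......,,,,,,.T...."))   # Counter({'A': 17, 'T': 1})
```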

08-23-2011, 09:47 AM #24
swbarnes2
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912

The DP4 value gives four counts of high-quality reads, summed across all samples in the VCF, that:
1) match reference, in the forward direction
2) match reference, in the reverse direction
3) match alternate, in the forward direction
4) match alternate, in the reverse direction

DP counts all reads, whereas DP4 counts only the high-quality ones, so the sum of the four DP4 values can be less than DP.
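Following that description, a rough allele fraction can be read straight off DP4. The Python sketch below assumes the four values are in the order listed (ref-forward, ref-reverse, alt-forward, alt-reverse) and, as noted, that they are pooled across all samples; "alt" presumably lumps every non-reference allele together. The function name is hypothetical.

```python
def summarize_dp4(dp4):
    """Reference/alternate read counts and alt fraction from a DP4 value (sketch)."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4)
    ref, alt = ref_fwd + ref_rev, alt_fwd + alt_rev
    total = ref + alt
    return {
        "ref": ref,
        "alt": alt,
        "alt_fraction": alt / total if total else None,  # None when no reads passed
        "alt_strands": (alt_fwd, alt_rev),               # strand balance of alt reads
    }


# DP4 string as it appears in an INFO column, e.g. DP4=0,0,7,25
print(summarize_dp4("0,0,7,25".split(",")))
```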

10-26-2011, 02:36 PM #25
curious_mapper
Junior Member
Location: St Louis
Join Date: Feb 2010
Posts: 4

Hi ashrafi_h,

Did you find an answer to your question? I stumbled upon this post while trying to understand the VCF format in detail, and I am looking for exactly that: how to get the allele composition (frequency) information from the VCF file.

10-26-2011, 10:54 PM #26
marcela
Junior Member
Location: Sweden
Join Date: Feb 2011
Posts: 7

Hi there!

I guess you can get that info from the BaseCounts or AD annotations:

chr1 724189 . G A 52.24 .

AB=0.500
AC=1
AF=0.50
AN=2
BaseCounts=3,0,3,0
BaseQRankSum=-1.537
DB
DP=6
QD=8.71 . . .

GT:AD:DP:GQ:PL 0/1:3,3:6:82.23:82,0,105

If you don't have this info, you could annotate your SNVs with GATK
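When a sample column does carry AD, its comma-separated values line up with REF followed by each ALT allele. A minimal Python sketch, assuming a colon-separated FORMAT/sample pair like the record above (the helper name is made up). The BaseCounts=3,0,3,0 annotation in the same record is, if GATK's convention of A, C, G, T order applies, consistent with the 3 G and 3 A reads reported by AD.

```python
def allele_depths(fmt, sample, ref, alt):
    """Map each allele to its AD read count for one sample (sketch)."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    if "AD" not in values:
        raise KeyError("FORMAT has no AD field; the caller did not write it")
    alleles = [ref] + alt.split(",")
    depths = [int(d) for d in values["AD"].split(",")]
    return dict(zip(alleles, depths))


# Values from the record above: REF=G, ALT=A, FORMAT GT:AD:DP:GQ:PL
print(allele_depths("GT:AD:DP:GQ:PL", "0/1:3,3:6:82.23:82,0,105", "G", "A"))
```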

10-27-2011, 08:13 AM #27
curious_mapper
Junior Member
Location: St Louis
Join Date: Feb 2010
Posts: 4

Thanks marcela, but my VCF file doesn't seem to have the AD tag. I called the SNPs using samtools mpileup on CLC-generated alignments. Is that information dropped somewhere while calling the SNPs?

Here is an example SNP from the vcf file:

BACT_1513|gi|293366021|ref|NZ_GG749271.1| 97966 . C A,G,T 66 . DP=35;VDB=0.0042;AF1=1;AC1=2;DP4=0,0,7,25;MQ=31;FQ=-82 GT:PL:GQ 1/1:182,138,83,107,0,82,125,29,14,107:99
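One quick way to see what a record actually carries is to split the INFO column into key/value pairs and pair the FORMAT keys with the sample values. The Python sketch below assumes a single-sample, tab-separated VCF line (the forum display turned the tabs into spaces, so the example rebuilds them); `parse_vcf_record` is a hypothetical helper, not part of samtools. For this record it shows that FORMAT holds only GT, PL and GQ, so there is no AD, while DP4 in INFO still gives 0 reference and 32 alternate high-quality reads.

```python
def parse_vcf_record(line):
    """Split a single-sample VCF data line into its main parts (sketch)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, flt, info, fmt, sample = cols[:10]
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True  # flags such as DB have no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "filter": flt,
        "info": info_dict,
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


fields = [
    "BACT_1513|gi|293366021|ref|NZ_GG749271.1|", "97966", ".", "C", "A,G,T", "66", ".",
    "DP=35;VDB=0.0042;AF1=1;AC1=2;DP4=0,0,7,25;MQ=31;FQ=-82",
    "GT:PL:GQ", "1/1:182,138,83,107,0,82,125,29,14,107:99",
]
rec = parse_vcf_record("\t".join(fields))
print(sorted(rec["sample"]))                        # FORMAT keys present: no AD
print([int(x) for x in rec["info"]["DP4"].split(",")])
```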

11-15-2011, 10:41 AM #28
curious_mapper
Junior Member
Location: St Louis
Join Date: Feb 2010
Posts: 4

Quote:
Originally Posted by marcela View Post
How do I assign the order for:
REF=G
ALT=T,C,A
PL:236,157,228,235,0,131,138,225,224,232

Hi marcela,

I don't know if you were able to figure this out, but I thought I'd write down the order as an exercise.

GG,GT,TT,GC,TC,CC,GA,TA,CA,AA

Karthik
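The ordering above follows the VCF rule for diploid likelihoods: the genotype made of allele indices j <= k sits at position k(k+1)/2 + j, where 0 is REF and 1, 2, ... are the ALT alleles in order. A small Python sketch (the function name is illustrative) that generates the order and picks the genotype with PL 0 for the second example:

```python
def genotype_order(alleles):
    """Diploid genotypes in VCF PL/GL order: j/k (j <= k) is at index k*(k+1)//2 + j."""
    order = []
    for k in range(len(alleles)):
        for j in range(k + 1):
            order.append(alleles[j] + alleles[k])
    return order


alleles = ["G", "T", "C", "A"]        # REF=G, ALT=T,C,A
pl = [236, 157, 228, 235, 0, 131, 138, 225, 224, 232]

order = genotype_order(alleles)
print(",".join(order))                # GG,GT,TT,GC,TC,CC,GA,TA,CA,AA
print(order[pl.index(min(pl))])       # TC
```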

12-29-2011, 10:24 PM #29
wanguan2000
Member
Location: shanghai
Join Date: Nov 2010
Posts: 24
GQ (Genotype Quality) calculation

Quote:
Originally Posted by ketan_bnf View Post
chr1 10740313 . A G 188.30 PASS AC=2;AF=1.00;AN=2;DP=11;Dels=0.00;HRun=1;HaplotypeScore=6.9635;MQ=26.82;MQ0=0;QD=17.12;SB=-72.04;sumGLbyD=20.12 GT:AD:DP:GQ:PL 1/1:1,10:7:21.05:221,21,0

Here PL is 221,21,0.

According to the samtools mpileup page,

PL means SAMtools/BCFtools writes genotype likelihoods in the PL format, which is a comma-delimited list of phred-scaled data likelihoods of each possible genotype.

P(D|AA) = 10^(-2.21) = 0.006
P(D|AG) = 10^(-0.21) = 0.617
P(D|GG) = 10^(0) = 1

So does it mean the genotype is GG for this SNP?

And thanks for AD and DP, now I understand them.

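GQ:21.05
PL:221,21,0
You made a calculation error: a PL of 221 means 10^(-22.1), not 10^(-2.21).

P(D|AA) = 10^(-22.1) = 7.943282e-23
P(D|AG) = 10^(-2.1) = 0.007943282
P(D|GG) = 10^(0) = 1

Assuming a flat prior over genotypes, the probability that the best call (GG) is wrong is
1 - 1/(1+7.943282e-23+0.007943282) = 0.007880684
GQ = -10*log10(0.007880684) = 21.03436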
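The same arithmetic as a Python sketch, assuming (as the calculation above does) a flat prior over genotypes; `gq_from_pl` is a made-up name, and real callers may compute GQ differently. The small gap to the reported 21.05 is probably because PL values are rounded to integers.

```python
import math


def gq_from_pl(pl):
    """Phred-scaled probability that the most likely genotype is wrong (flat prior)."""
    likelihoods = [10 ** (-p / 10) for p in pl]
    p_best = max(likelihoods) / sum(likelihoods)   # posterior of the best genotype
    return -10 * math.log10(1 - p_best)


print(round(gq_from_pl([221, 21, 0]), 2))   # 21.03
```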

07-12-2012, 12:38 AM #30
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

Hello everyone, please I need help. I am struggling to understand what to do in my analysis. I have variant calls in VCF format:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 110506_SN132_A_s_1_seq 110506_SN132_A_s_2_seq_ 110506_SN132_A_s_3_seq 110506_SN132_A_s_4_seq_ 110616_SN365_A_s_5_seq_ 110616_SN365_A_s_6_seq_
chr1 11433 . T C 11.4 AltSup AC1=12;AF1=1;DP4=0,0,1,1;DP=66;FQ=-26.9;MQ=39;MfGt=0/1;MinDP=0;NeqMfGt=2 GT:PL:DP:SP:GQ 0/1:0,0,0:0:0:3 1/1:29,3,0:1:0:5 1/1:15,3,0:1:0:5 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3
I have genotype information for 6 samples (1-3 are wild-type and 4-6 are mutant libraries). I have read the VCF documentation but am still struggling to understand my data, because I want to compare the differences between WT and MT.
Thanks

07-12-2012, 12:44 AM #31
laura
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

What aspects of the format are you struggling with?

The genotypes are shown as indices into the REF/ALT alleles, so in your case T is 0 and C is 1.

This gives the genotypes for this site as T/C, C/C, C/C, T/C, T/C, T/C.
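In VCF the indices always mean the same thing: 0 is the REF allele, 1 the first ALT allele, 2 the second, and so on, with '.' for a missing call. A minimal Python sketch of the decoding (the function name is hypothetical), applied to the six genotypes in the record above:

```python
def decode_gt(gt, ref, alt):
    """Translate a GT string such as '0/1' into bases using REF and ALT (sketch)."""
    alleles = [ref] + alt.split(",")       # index 0 = REF, 1.. = ALT alleles in order
    sep = "|" if "|" in gt else "/"        # phased genotypes use '|'
    return sep.join(a if a == "." else alleles[int(a)] for a in gt.split(sep))


gts = ["0/1", "1/1", "1/1", "0/1", "0/1", "0/1"]   # the six samples at chr1:11433
print([decode_gt(g, "T", "C") for g in gts])
```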

07-12-2012, 01:29 AM #32
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

OK, thank you very much. At first I was reluctant to analyse this part of the data, but when I saw the previous threads on this website I was encouraged, so thanks once again.

What use are these fields (AltSup AC1=12;AF1=1;DP4=0,0,1,1;DP=66;FQ=-26.9;MQ=39;MfGt=0/1;MinDP=0;NeqMfGt=2) for my analysis, since I am only interested in the SNPs and indels that pass the filtering criteria and in how they differ among my libraries, not from the reference?
Secondly, does the way you interpreted the GT index hold for all sites that pass the quality criteria?
Thank you

07-12-2012, 01:40 AM #33
laura
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

Those fields are determined by whatever analysis package you used to generate your VCF file.

Some of them may be standard fields, which are all explained in the VCF documentation:
http://www.1000genomes.org/wiki/Anal...mat-version-41

07-12-2012, 04:12 AM #34
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

Thanks a lot, Laura.
I am only interested in the differences between the WT and MT samples, from which I will select candidate regions. I would be more than happy if you could point me in the right direction; this is my very first time handling this kind of data.
Thanks

07-12-2012, 04:14 AM #35
laura
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

If you want to know the difference between your WT and MT individuals, you need to compare their genotypes.
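One simple way to start is to flag sites where all wild-type samples agree on one genotype, all mutant samples agree on another, and the two differ. The Python sketch below uses that criterion purely as an illustration (the rule, the function name and the second example site are hypothetical). In practice you would probably also treat calls with no read support as missing: in the record posted earlier in this thread, several samples have DP 0 and PL 0,0,0, so their 0/1 calls carry no information.

```python
def is_candidate(gts, wt_idx, mt_idx, missing=("./.", ".")):
    """True if WT samples share one genotype and MT samples share a different one.

    Missing calls are ignored; the decision rule itself is only an illustration.
    """
    wt = {gts[i] for i in wt_idx if gts[i] not in missing}
    mt = {gts[i] for i in mt_idx if gts[i] not in missing}
    return len(wt) == 1 and len(mt) == 1 and wt != mt


WT, MT = [0, 1, 2], [3, 4, 5]          # samples 1-3 wild type, 4-6 mutant
site_from_post = ["0/1", "1/1", "1/1", "0/1", "0/1", "0/1"]
made_up_site = ["0/0", "0/0", "0/0", "1/1", "./.", "1/1"]   # hypothetical

print(is_candidate(site_from_post, WT, MT))   # False: WT samples disagree
print(is_candidate(made_up_site, WT, MT))     # True
```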

07-17-2012, 01:26 AM #36
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

Thank you Laura, I am gradually making progress.
I want to ask something, though I don't know if it is a stupid question:
if I want to decode the GT index for all SNPs that pass the filter criteria, how can I do that?
Specifically, do I have to do this with VCFtools (decode genotype) using the PERL5LIB environment variable, or something else? I am a bit confused.
Thanks a lot

07-17-2012, 01:29 AM #37
laura
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

Unfortunately that is a bit of a "how long is a piece of string" question, as it very much depends on what tools or programming language you wish to use.

If you want a VCF file with just PASS SNPs in it, you can use the vcftools binary and its --remove-filtered-geno-all option, but if you want other information than that, then it depends.
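If you would rather not depend on a particular vcftools version, filtering on the site-level FILTER column is also easy to script. A minimal Python sketch, assuming a tab-separated VCF where FILTER is the 7th column; the function name and the second example record are hypothetical. It keeps only records whose FILTER is exactly PASS, so records marked '.' (no filters applied) are dropped as well.

```python
def keep_pass(lines):
    """Yield header lines plus data lines whose FILTER column (7th) is PASS."""
    for line in lines:
        if line.startswith("#"):
            yield line                      # keep meta-information and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 6 and cols[6] == "PASS":
            yield line


example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t11433\t.\tT\tC\t11.4\tAltSup\tDP=66\n",   # filtered out (AltSup)
    "chr1\t20000\t.\tA\tG\t50\tPASS\tDP=30\n",       # hypothetical PASS record
]
kept = list(keep_pass(example))
print("".join(kept), end="")
```

On a real file, an open file handle can be passed in place of the list and the result written out with `writelines`.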

07-31-2012, 01:53 AM #38
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

Hi Laura,
I am still making progress, little by little,
but I got this error when I tried to output a VCF file with only the passed SNPs (--remove-filtered-geno-all):
bilbo@ubuntu:~/vcftools_0.1.4a$ ./cpp/vcftools --vcf /media/My\ Passport/other\ analysis\ by\ fasteris/2012-02-21_GQJ-1-6_VitisVinifera_variants.vcf --remove-filtered-geno-all --out /media/My\ Passport/other\ analysis\ by\ fasteris/lagolas.vcf

VCFtools - v0.1.4
(C) Adam Auton 2009

Parameters as interpreted:
--out /media/My Passport/other analysis by fasteris/lagolas.vcf
--remove-filtered-geno-all
--vcf /media/My Passport/other analysis by fasteris/2012-02-21_GQJ-1-6_VitisVinifera_variants.vcf

Scanning /media/My Passport/other analysis by fasteris/2012-02-21_GQJ-1-6_VitisVinifera_variants.vcf ...
Error:VCF version must be v4.0:
You are using version VCFv4.1

Now I am stuck. What should I do?
Thanks

07-31-2012, 01:58 AM #39
laura
Senior Member
Location: Cambridge UK
Join Date: Sep 2008
Posts: 151

It looks like you either need to investigate whether your problem can be solved with the vcftools Perl scripts, or change your header from VCFv4.1 to VCFv4.0 and see what the vcftools binary does.

These questions are now more appropriate for the vcftools-help list, which you can find at

http://vcftools.sourceforge.net/
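The header change suggested above only touches the `##fileformat` line. A minimal Python sketch (hypothetical helper) that rewrites that one line and passes everything else through unchanged; whether the older vcftools binary then accepts the body depends on whether the file uses anything specific to v4.1, so treat it as an experiment rather than a real format conversion.

```python
def set_vcf_version(lines, version="VCFv4.0"):
    """Rewrite the ##fileformat line to the given version; other lines pass through."""
    for line in lines:
        if line.startswith("##fileformat="):
            yield f"##fileformat={version}\n"
        else:
            yield line


header = ["##fileformat=VCFv4.1\n", "##source=example\n", "#CHROM\tPOS\n"]
print("".join(set_vcf_version(header)), end="")
```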

08-02-2012, 08:22 AM #40
aforntacc
Member
Location: italy
Join Date: Jun 2011
Posts: 48

Hi all,

I don't have SNP IDs in my data; the ID column is all dots (.). Why is this? I am able to filter out the indels but not the SNPs.
How can I do this?
Thanks
#CHROM POS ID REF ALT
chr1 8686 . T C
chr1 10802 . T C
chr1 10815 . A G
chr1 10836 . C A
chr1 11355 . C A
chr1 11433 . T C
chr1 11669 . ATTTT ATTTTT
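A '.' in the ID column is simply the VCF missing value: no identifier (such as a dbSNP rs number) was assigned to the site, and it has no bearing on whether a record is a SNP. SNPs and indels can instead be told apart from the lengths of REF and ALT. A minimal Python sketch under that assumption (the function name is hypothetical; symbolic alleles such as <DEL> are not handled), applied to some of the rows above:

```python
def variant_type(ref, alt):
    """Classify a VCF record as SNP, INDEL or OTHER from REF/ALT lengths (sketch)."""
    if alt == ".":
        return "NO_ALT"                     # monomorphic site, nothing called
    alts = alt.split(",")
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "SNP"
    if any(len(a) != len(ref) for a in alts):
        return "INDEL"
    return "OTHER"                          # e.g. equal-length multi-base changes


rows = [
    ("chr1", 8686, "T", "C"),
    ("chr1", 11433, "T", "C"),
    ("chr1", 11669, "ATTTT", "ATTTTT"),
]
for chrom, pos, ref, alt in rows:
    print(chrom, pos, ref, alt, variant_type(ref, alt))
```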

Tags
vcf

All times are GMT -8.