SEQanswers

04-05-2011, 01:22 PM #1
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Predicting true SNPs from .vcf file

The .vcf file contains many different measurements and a couple of different quality scores. Does anyone have a good empirical sense of which values are the best guides for separating real SNPs from noise and artifacts? I've done about a hundred Sanger sequencing reactions on a variety of predicted SNPs at a range of quality levels, but the picture still isn't very clear.
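For reference, the quality scores in a VCF are phred-scaled: per the VCF specification, QUAL is -10 log10 of the probability that the ALT call is wrong, and GQ and PL are likewise phred-scaled genotype quantities. A minimal sketch of the conversion, using the QUAL values from the calls below (the helper names are just for illustration):

```python
import math

def phred_to_prob(q: float) -> float:
    """Convert a phred-scaled value Q into a probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# QUAL values copied from the four calls quoted in this post
for qual in (211, 147, 110, 98.3):
    print(f"QUAL={qual}: model P(ALT call is wrong) ~ {phred_to_prob(qual):.3g}")
```

These probabilities are only as good as the caller's error model, which is one reason a call with a very high QUAL can still fail to validate.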

For instance, these SNPs looked real in the Sanger sequencing:

GT:PL:GQ 0/1:20,11,149:13 gcacacacacacacacacacacacacacacacacacacacac gcacacacacacacacacacacacacacacacacacacac 211 DP=95 DP4=17,0,10,2 MQ=48 FQ=214

GT:PL:GQ 1/1:30,3,0:39 GAA GA 147 DP=10 DP4=0,0,1,6 MQ=48 FQ=-37.3
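The lines above are excerpts laid out as FORMAT, sample values, REF, ALT, QUAL, then INFO entries. A minimal sketch for pulling those fields apart so they can be compared side by side; it assumes exactly this whitespace-separated excerpt layout (a real VCF data line is tab-separated, has more columns, and separates INFO entries with `;`), and `parse_excerpt` and `Call` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Call:
    ref: str
    alt: str
    qual: float
    sample: dict  # FORMAT key -> parsed value, e.g. {"GT": "0/1", "PL": [...], "GQ": 13}
    info: dict    # INFO key -> parsed value, e.g. {"DP": 95, "DP4": [...]}

def _num(s):
    """Parse s as an int if possible, otherwise as a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)

def _value(s):
    """Comma-separated values become a list of numbers; a single value becomes a number."""
    return [_num(v) for v in s.split(",")] if "," in s else _num(s)

def parse_excerpt(line):
    """Parse 'FORMAT SAMPLE REF ALT QUAL KEY=VALUE ...' as laid out in this post."""
    fmt, sample, ref, alt, qual, *info_items = line.split()
    parsed_sample = {}
    for key, val in zip(fmt.split(":"), sample.split(":")):
        # GT is a genotype string such as "0/1", not a number
        parsed_sample[key] = val if key == "GT" else _value(val)
    info = {}
    for item in info_items:
        key, sep, val = item.partition("=")
        # INFO flags without '=' (e.g. INDEL) are stored as True
        info[key] = _value(val) if sep else True
    return Call(ref, alt, float(qual), parsed_sample, info)

call = parse_excerpt("GT:PL:GQ 1/1:30,3,0:39 GAA GA 147 DP=10 DP4=0,0,1,6 MQ=48 FQ=-37.3")
print(call.sample)  # {'GT': '1/1', 'PL': [30, 3, 0], 'GQ': 39}
print(call.info)    # {'DP': 10, 'DP4': [0, 0, 1, 6], 'MQ': 48, 'FQ': -37.3}
```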


These were not confirmed by Sanger sequencing:

GT:PL:GQ 0/1:24,0,114:27 A C 110 DP=130 DP4=14,35,2,22 MQ=45 FQ=113

GT:PL:GQ 0/1:40,0,47:42 T G 98.3 DP=71 DP4=17,1,14,0 MQ=44 FQ=101

Those don't look notably worse than the ones above them, so I'm not sure what I should have looked at to predict that the bottom two were false positives.
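One thing that can be read out of DP4 is how the reads split by allele and strand; in samtools output DP4 is the count of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases. A minimal sketch, assuming a two-sided Fisher's exact test on that 2x2 table as the strand-bias measure (one common choice, not necessarily what any particular caller reports), plus the ALT allele fraction:

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals (ref reads, alt reads)
    c1 = a + c             # first column total (forward reads)
    n = r1 + r2
    denom = comb(n, c1)

    def p_of(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = p_of(a)
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        p = p_of(x)
        # Sum all tables no more probable than the observed one (small float tolerance)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)

def dp4_summary(dp4):
    """Return (ALT allele fraction, strand-bias p-value) for a DP4 quadruple."""
    ref_f, ref_r, alt_f, alt_r = dp4
    depth = ref_f + ref_r + alt_f + alt_r
    alt_frac = (alt_f + alt_r) / depth if depth else float("nan")
    return alt_frac, fisher_exact_two_sided(ref_f, ref_r, alt_f, alt_r)

for dp4 in ([17, 0, 10, 2], [0, 0, 1, 6], [14, 35, 2, 22], [17, 1, 14, 0]):
    frac, p = dp4_summary(dp4)
    print(dp4, f"alt fraction={frac:.2f}", f"strand-bias p={p:.3g}")
```

Note that this test only compares the strand split of the REF reads against the ALT reads: for DP4=17,1,14,0 it returns p = 1.0 because both alleles are almost entirely forward-strand, so coverage that is one-sided for both alleles has to be checked directly from the raw counts.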

(My a priori assumption was that these variants were all real, because I made a multi-sample VCF with mpileup from these samples and many sibling animals, and these variants were common to all the animals.)
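Checking that a variant is shared across animals means reading the GT field of every sample column in the multi-sample VCF. A minimal sketch over one tab-separated VCF data line (columns 1 to 8 fixed, column 9 FORMAT, samples after that); `count_carriers` and the example line are hypothetical. Keep in mind that a call present in every sample can also come from a systematic artifact that affects all of them alike, such as reads from a paralogous region mapping to the same place.

```python
from typing import Tuple

def count_carriers(vcf_line: str) -> Tuple[int, int]:
    """Return (samples carrying an ALT allele, samples with a called genotype)."""
    cols = vcf_line.rstrip("\n").split("\t")
    gt_idx = cols[8].split(":").index("GT")  # position of GT within FORMAT
    carriers = called = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        gt = fields[gt_idx] if gt_idx < len(fields) else "."
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype, e.g. "./."
        called += 1
        if any(a != "0" for a in alleles):
            carriers += 1
    return carriers, called

# Hypothetical record with four samples: het, hom-alt, missing, hom-ref
line = ("chr1\t100\t.\tA\tC\t110\t.\tDP=130\tGT:PL:GQ\t"
        "0/1:24,0,114:27\t1/1:30,3,0:39\t./.:0,0,0:3\t0/0:0,9,90:12")
print(count_carriers(line))  # (2, 3)
```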
04-06-2011, 03:29 PM #2
elisadouzi
Member
 
Location: US

Join Date: Mar 2011
Posts: 20

Maybe the depth of the bottom two is too high. Have you tried the -D option?
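Assuming the `-D` meant here is the maximum read depth option of `vcfutils.pl varFilter`, the idea is to reject calls whose depth is well above the sample's typical coverage, since excess depth can point to collapsed repeats or copy-number differences. A minimal sketch of that kind of cut applied to the DP values quoted above; the threshold of 100 is purely hypothetical:

```python
def max_depth_filter(records, max_depth):
    """Split (label, DP) pairs into calls kept and calls with DP above max_depth."""
    kept, failed = [], []
    for label, dp in records:
        (failed if dp > max_depth else kept).append(label)
    return kept, failed

# DP values copied from the four calls quoted in the first post
calls = [("repeat indel", 95), ("GAA>GA", 10), ("A>C", 130), ("T>G", 71)]
threshold = 100  # hypothetical cut-off; in practice set relative to typical depth
kept, failed = max_depth_filter(calls, threshold)
print("kept:", kept)      # kept: ['repeat indel', 'GAA>GA', 'T>G']
print("failed:", failed)  # failed: ['A>C']
```

With this particular cut-off only the A>C call (DP=130) is flagged; the T>G call (DP=71) sits below the true-positive indel (DP=95), so a lower threshold would reject a confirmed call before it caught T>G.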


All times are GMT -8.

