#1
Senior Member
Location: 45°30'25.22"N / 9°15'53.00"E
Join Date: Apr 2009
Posts: 258
Hi all, I have a VCF file containing a raw list of mutations/SNPs for my study. I would like to exclude known SNPs from dbSNP131/hg19 (which I also have in VCF format).
I was thinking about BEDTools, something like
Code:
intersectBed -a MyList.vcf -b hg19.snp131.vcf.gz -v > specific.vcf
Does anybody have an idea or a processing pipeline to deal with this? I was looking at vcftools but couldn't find anything helpful.
d
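The exclusion asked about here can also be sketched as an awk anti-join on CHROM, POS, REF and ALT, with no extra tools. This is only a minimal sketch: it assumes both files are uncompressed, tab-delimited VCFs using the same chromosome naming, and it compares multi-allelic ALT fields as whole strings. The function name `exclude_known` and the toy files are made up for illustration.

```shell
#!/bin/sh
# exclude_known KNOWN.vcf MINE.vcf
# Prints the header of MINE.vcf plus every record whose
# CHROM:POS:REF:ALT key does not occur in KNOWN.vcf.
# (Hypothetical helper, not part of BEDTools or vcftools.)
exclude_known() {
    awk -F'\t' '
        # First file: remember every known site, skipping header lines.
        NR == FNR { if ($0 !~ /^#/) known[$1 ":" $2 ":" $4 ":" $5] = 1; next }
        # Second file: pass header lines through unchanged.
        /^#/ { print; next }
        # Second file: print only records that were not seen in the first file.
        { key = $1 ":" $2 ":" $4 ":" $5; if (!(key in known)) print }
    ' "$1" "$2"
}

# Toy data (made up) to show the behaviour.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/known.vcf"
printf 'chr1\t100\trs1\tA\tG\t.\t.\t.\n' >> "$tmp/known.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/mine.vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> "$tmp/mine.vcf"
printf 'chr1\t200\t.\tC\tT\t60\tPASS\t.\n' >> "$tmp/mine.vcf"

exclude_known "$tmp/known.vcf" "$tmp/mine.vcf" > "$tmp/specific.vcf"
```

Matching on REF and ALT as well as position keeps a novel allele at a known dbSNP position, which a purely positional intersection would discard.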

#2
Senior Member
Location: 45°30'25.22"N / 9°15'53.00"E
Join Date: Apr 2009
Posts: 258
Never mind, I've realized that I can feed the GATK Unified Genotyper with known SNPs and filter them out in a second step.
d

#3
Senior Member
Location: USA
Join Date: Jan 2008
Posts: 482
That sounds like a useful thing to do. But looking at the GATK Unified Genotyper, it seems more of a multiple-sample calling tool than one for excluding known dbSNP variants.
Did I miss something here?

#4
Senior Member
Location: 45°30'25.22"N / 9°15'53.00"E
Join Date: Apr 2009
Posts: 258
I could use it to identify variants *and* to filter out known ones.
d

#5
Senior Member
Location: Los Angeles, China.
Join Date: Feb 2010
Posts: 106
I have a script you can mod to remove SNPs that are in dbSNP. Just holla' if you need/want it. You'd only have to tweak it to run on a VCF or BED file.

#6
Senior Member
Location: USA
Join Date: Jan 2008
Posts: 482
@dawe,
It would be great if you could share the GATK functionality to do so. Better to use a standard, available tool than to re-invent one.

#7
Senior Member
Location: Palo Alto
Join Date: Apr 2009
Posts: 213
@bioinfosm: Just use the -D parameter with a dbSNP .rod file. See the GATK wiki for more about how that works (it's part of their default variant calling flow).
In vcftools, use the --exclude parameter and feed it a list of all the SNP IDs that were used to mark your VCF when you ran it through the Unified Genotyper. Since you used GATK, you can make that list from the SNP .rod pretty easily:
Code:
awk '{print $5}' dbsnp_130.rod > dbsnp_130_snpIDs.txt
Code:
vcftools --exclude dbsnp_130_snpIDs.txt --vcf <in.vcf> --recode --out <out.prefix>
The --recode flag makes vcftools write the filtered records to <out.prefix>.recode.vcf. Not bad, and it doesn't take too long to run. Of course, you can pretty easily grep out the lines that have an ID to accomplish the same thing almost instantaneously, but still.
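The grep shortcut mentioned above amounts to keeping only the records whose ID column is still a dot after dbSNP annotation. A minimal awk sketch, assuming a tab-delimited VCF in which known variants carry an rsID in column 3 and novel ones carry "."; the function name `novel_only` and the toy input are made up for illustration.

```shell
#!/bin/sh
# novel_only ANNOTATED.vcf
# Prints header lines plus every record whose ID column (field 3) is ".".
# (Hypothetical helper; assumes known variants were given an rsID upstream.)
novel_only() {
    awk -F'\t' '/^#/ || $3 == "."' "$1"
}

# Toy data (made up): one annotated record, one unannotated record.
tmp=$(mktemp -d)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/annotated.vcf"
printf 'chr1\t100\trs123\tA\tG\t50\tPASS\t.\n' >> "$tmp/annotated.vcf"
printf 'chr1\t200\t.\tC\tT\t60\tPASS\t.\n' >> "$tmp/annotated.vcf"

novel_only "$tmp/annotated.vcf" > "$tmp/novel.vcf"
```

Testing the ID field explicitly, rather than grepping for "rs" anywhere on the line, avoids accidental matches in the INFO or sample columns.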

#8
Junior Member
Location: Israel
Join Date: Nov 2011
Posts: 8
I could not find out how to perform this second step. I am trying to filter out known SNPs from dbSNP135. I have both my variation file and the dbSNP file (both .vcf), but I can't find a way to exclude the latter from the former.

#9
Junior Member
Location: Europe
Join Date: Oct 2011
Posts: 6
Hi,
vcf-isec is exactly what you are looking for: it computes intersections, complements, etc. on VCF and tab-delimited files.

#10
Member
Location: italy
Join Date: Jun 2011
Posts: 48
Hello guys,
Please, I can't see any SNP IDs in my data: the ID column just contains a dot (.). I am now trying to separate the SNPs from the indels. samtools was used for the calling. How can I do this? Thanks.
chr1 8686 . T C 38.7 MfGtMis;AltSup AC1=12;AF1=1;DP4=0,0,1,5;DP=6;FQ=-28.6;MQ=16;MfGt=1/1;MinDP=0;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:0,0,0:0:0:5 1/1:40,9,0:3:0:13 1/1:0,0,0:0:0:5 1/1:0,0,0:0:0:5 1/1:34,9,0:3:0:13 1/1:0,0,0:0:0:5
chr1 10802 . T C,A 999 MfGtMis AC1=12;AF1=1;DP4=0,0,5,17;DP=284;FQ=-38.1;MQ=33;MfGt=1/1;MinDP=2;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:91,15,0,91,15,91:5:0:31 1/1:131,18,0,131,18,131:6:0:34 1/1:53,6,0,53,6,53:2:0:22 1/1:44,6,0,44,6,44:2:0:22 1/1:67,9,0,67,9,67:3:0:25 1/1:70,21,12,55,0,52:4:0:25
chr1 10815 . A G 999 MfGtMis AC1=12;AF1=1;DP4=0,0,26,11;DP=315;FQ=-42.4;MQ=38;MfGt=1/1;MinDP=3;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:109,18,0:6:0:38 1/1:188,39,0:13:0:59 1/1:120,15,0:5:0:35 1/1:69,9,0:3:0:29 1/1:43,9,0:3:0:29 1/1:89,21,0:7:0:41
chr1 10836 . C A 999 MfGtMis AC1=11;AF1=0.9836;DP4=2,0,32,5;DP=313;FQ=-28.4;MQ=33;MfGt=1/1;MinDP=1;NeqMfGt=0;PV4=1,4.1e-10,1,1 GT:PL:DP:SP:GQ 1/1:49,12,0:4:0:21 1/1:15,3,0:1:0:12 1/1:90,23,0:11:0:32 1/1:83,0,8:7:0:3 1/1:130,39,0:13:0:48 1/1:56,9,0:3:0:18
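Since the ID column holds only a dot, SNPs and indels have to be told apart from the alleles themselves: a record can be treated as a SNP when REF and every comma-separated ALT allele are a single base. A minimal awk sketch of that rule, assuming tab-delimited input; the function name `split_snp_indel`, the output file names and the toy records are made up for illustration, and symbolic alleles such as <DEL> are not handled.

```shell
#!/bin/sh
# split_snp_indel IN.vcf OUT_PREFIX
# Writes OUT_PREFIX.snps.vcf and OUT_PREFIX.indels.vcf.
# (Hypothetical helper; classification is by allele length only.)
split_snp_indel() {
    awk -F'\t' -v snps="$2.snps.vcf" -v indels="$2.indels.vcf" '
        # Header lines are copied to both output files.
        /^#/ { print > snps; print > indels; next }
        {
            # A record is a SNP only if REF and all ALT alleles are one base long.
            n = split($5, alt, ",")
            is_snp = (length($4) == 1)
            for (i = 1; i <= n; i++)
                if (length(alt[i]) != 1) is_snp = 0
            if (is_snp) print > snps
            else        print > indels
        }
    ' "$1"
}

# Toy data (made up): two SNP records (one multi-allelic) and one deletion.
tmp=$(mktemp -d)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/calls.vcf"
printf 'chr1\t8686\t.\tT\tC\t38.7\t.\t.\n' >> "$tmp/calls.vcf"
printf 'chr1\t10802\t.\tT\tC,A\t999\t.\t.\n' >> "$tmp/calls.vcf"
printf 'chr1\t20000\t.\tAT\tA\t999\t.\t.\n' >> "$tmp/calls.vcf"

split_snp_indel "$tmp/calls.vcf" "$tmp/calls"
```

Under this rule a multi-allelic site such as `T C,A` stays with the SNPs, while any record with a longer REF or ALT allele goes to the indel file.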

Tags: novel variants, snp, variants, vcf, vcftools