SEQanswers

Old 03-02-2015, 04:14 AM   #1
clarissaboschi
Member
 
Location: US

Join Date: Apr 2010
Posts: 63
vcf file - filter based on DP4 (not total) and removal of alternative alleles

Hi all,

I'm using samtools mpileup to find variants in my resequencing data from multiple samples together, and I got one VCF file with SNPs and indels.
I am already applying several filters, but I would also like to filter on strand bias. For example:

Chr1_66769 G A 135 PASS DP=259;VDB=9.53273e-06;SGB=-37.5721;RPB=0.988496;MQB=2.5526e-06;MQSB=0.0499642;BQB=0.235831;MQ0F=0;ICB=0.0671329;HOB=0.0246914;AC=6;AN=54;DP4=87,156,0,10 ...

I would like to remove variants that have 0 reads on either the forward or the reverse strand for the alternative allele (the last two DP4 values). In other words, I want to keep only variants supported by both forward and reverse strands.

I did not find any command that does this, only manual approaches. Any suggestions?

My other question is how to remove extra alternative alleles from the VCF file. I do not want this information in my filtered file.
Example

Chr11_6676982 C G,A .....

I would like to have
Chr11_6676982 C G .....

Thanks very much
Clarissa

Last edited by clarissaboschi; 03-17-2015 at 08:41 AM. Reason: grammar
Old 03-02-2015, 04:38 AM   #2
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137

Try vcf-annotate in VCFtools http://vcftools.sourceforge.net/perl...l#vcf-annotate - look under "read even more" to get examples on how to create custom filters for DP4.

Last edited by sarvidsson; 03-02-2015 at 04:39 AM. Reason: grammar
Old 03-02-2015, 10:19 AM   #3
clarissaboschi
Member
 
Location: US

Join Date: Apr 2010
Posts: 63

Thanks sarvidsson

I saw this option. In my case I think the one that I need is:

# Only loci with enough reads supporting the variant will pass the filter
{
    tag      => 'INFO/DP4',
    name     => 'FewAlts',
    desc     => 'Too few reads supporting the variant',
    apply_to => 'SNPs',
    test     => sub {
        if ( !($MATCH =~ /^([^,]+),([^,]+),([^,]+),(.+)$/) )
        {
            error("Could not parse INFO/DP4: $CHROM:$POS [$MATCH]");
        }
        if ( 0.1*($1+$2) > $3+$4 ) { return $PASS; }
        return $FAIL;
    },
},

But I tried to edit this script to fit my needs and it did not work... As written, the script does not do what I want.
Old 03-02-2015, 02:12 PM   #4
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143

using my tool vcffilterjs: https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

Code:
curl -Ls "https://raw.githubusercontent.com/cs418-familysnps/familysnps/3ab8d34c7a7afd7336a40c8f79a32787dcb01389/SNPPhasingProject/vcf/test/bcftools.vcf" |\
java -jar dist-1.128/vcffilterjs.jar -e 'function accept(v) { if(!v.hasAttribute("DP4")) return false; var tokens=v.getAttribute("DP4"); return tokens.get(2)>0 && tokens.get(3)>0;} accept(variant);'
output:

Code:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample
MT	321	.	T	.	283	.	AC1=0;AF1=0;DP=289;DP4=270,3,1,1;FQ=-282;MQ=56;PV4=0.029,0.00028,0.12,0.0089;VDB=0.0302	PL	0
MT	344	.	T	.	283	.	AC1=0;AF1=0;DP=392;DP4=319,49,1,1;FQ=-282;MQ=55;PV4=0.25,6.8e-05,0.16,1;VDB=0.0394	PL	0
MT	354	.	C	.	283	.	AC1=0;AF1=0;DP=481;DP4=328,104,1,1;FQ=-282;MQ=53;PV4=0.43,0.00091,0.012,1;VDB=0.0447	PL	0
MT	360	.	A	.	283	.	AC1=0;AF1=0;DP=515;DP4=355,125,1,1;FQ=-282;MQ=52;PV4=0.45,2.4e-06,0.29,0.0068;VDB=0.0090	PL	0
MT	366	.	G	.	283	.	AC1=0;AF1=0;DP=553;DP4=385,146,1,2;FQ=-282;MQ=52;PV4=0.19,5.9e-09,0.12,1;VDB=0.0429	PL	0
MT	375	.	C	.	283	.	AC1=0;AF1=0;DP=620;DP4=406,186,7,4;FQ=-282;MQ=51;PV4=0.75,1,0.49,1;VDB=0.0394	PL	0
MT	405	.	T	.	283	.	AC1=0;AF1=0;DP=766;DP4=479,272,1,1;FQ=-282;MQ=51;PV4=1,2.2e-21,1,0.19;VDB=0.0227	PL	0
MT	416	.	T	.	283	.	AC1=0;AF1=0;DP=788;DP4=473,306,1,1;FQ=-282;MQ=52;PV4=1,5.1e-12,0.31,1;VDB=0.0205	PL	0
MT	434	.	C	.	283	.	AC1=0;AF1=0;DP=779;DP4=442,285,3,1;FQ=-282;MQ=53;PV4=1,8.7e-16,1,0.0067;VDB=0.0367	PL	0
MT	450	.	T	.	283	.	AC1=0;AF1=0;DP=758;DP4=446,248,1,1;FQ=-282;MQ=56;PV4=1,0.00099,0.031,0.3;VDB=0.0242	PL	0
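
For anyone without Java at hand, here is a rough Python sketch of the same test. It is only an illustration, not part of vcffilterjs: the function names `alt_on_both_strands` and `filter_vcf` are made up for this example, and it assumes an uncompressed, tab-separated VCF with INFO in the eighth column. Header lines (starting with `#`) are passed through unchanged, and records without a DP4 tag are dropped, mirroring the JavaScript expression above.

```python
def alt_on_both_strands(info):
    """Return True if INFO carries DP4 with alt-forward > 0 and alt-reverse > 0."""
    for field in info.split(";"):
        if field.startswith("DP4="):
            counts = field[len("DP4="):].split(",")
            if len(counts) != 4:
                return False  # malformed DP4: reject (an assumption of this sketch)
            # DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse
            alt_fwd, alt_rev = int(counts[2]), int(counts[3])
            return alt_fwd > 0 and alt_rev > 0
    return False  # no DP4 tag at all: reject


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass the DP4 test."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the VCF header
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 8 and alt_on_both_strands(cols[7]):
            yield line


# Small demonstration using INFO values quoted earlier in the thread.
print(alt_on_both_strands("DP=259;AN=54;DP4=87,156,0,10"))  # no alt reads on the forward strand
print(alt_on_both_strands("DP=620;DP4=406,186,7,4"))        # alt reads on both strands
```

To run it over a file, something like `sys.stdout.writelines(filter_vcf(sys.stdin))` (after `import sys`) would stream a VCF from standard input to standard output.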
Old 03-03-2015, 03:15 AM   #5
clarissaboschi
Member
 
Location: US

Join Date: Apr 2010
Posts: 63

Thanks lindenb, I will try your script.

Does your script keep the header of the VCF? I need it.

To clarify, I do want to keep the alternative allele (the SNP allele). But some sites list more than one alternative allele, and the first one is the more frequent, so I would like to keep only that one. For example:

Chr11_667698 ....C G,A .....
I would like to have
Chr11_667698 ....C G .....

thanks
Clarissa
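
One possible way to do the ALT trimming described above, as a minimal Python sketch only. The function name `keep_first_alt` is hypothetical, and the sketch assumes a tab-separated VCF line with ALT in the fifth column. It only rewrites the ALT column: genotype fields and per-allele tags such as AC or PL that still refer to the dropped allele are left untouched, so the result may be inconsistent for records where samples carry the second allele. It also takes on trust the statement that the first listed allele is the more frequent one.

```python
def keep_first_alt(line):
    """Return the VCF record with ALT reduced to its first listed allele."""
    if line.startswith("#"):
        return line  # header lines are returned unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 5:
        return line  # not a full record: leave it alone
    cols[4] = cols[4].split(",")[0]  # "G,A" -> "G"; a single allele is unchanged
    return "\t".join(cols) + "\n"


# Hypothetical record for illustration (not taken from a real file).
record = "Chr11\t6676982\t.\tC\tG,A\t50\tPASS\tDP=10\n"
print(keep_first_alt(record), end="")
```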