david.tamborero 08-20-2012 07:07 AM

varscan - filter out those snps close to indels

Just a quick question about the somaticFilter command provided by VarScan:

--indel-file File of indels for filtering nearby SNPs

What exactly does 'nearby' mean? Within the same read? Within 'x' bp of the indel?


Jane M 08-22-2012 12:36 AM

I don't know but I am also interested in the answer...

dkoboldt 10-22-2012 09:13 AM

Hey guys, thanks for this question, and I'm sorry it took me this long to find it and answer! The --indel-file parameter lets you specify a list of indels (called by VarScan pileup2indel) that will be used to filter out false-positive SNPs caused by local read misalignments around indels. It removes SNP calls at or within 1 bp of an indel's position (as reported in mpileup).

I'll see about making this distance a user-defined parameter in the next release.
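To make the 1 bp rule concrete, here is a minimal Python sketch of the logic as described above. It is not VarScan's actual code; the function name, the (chrom, pos) tuple layout, and the `window` parameter are all hypothetical, chosen only to illustrate what a user-defined distance could look like.

```python
def filter_snps_near_indels(snps, indels, window=1):
    """Drop SNPs at or within `window` bp of any indel position.

    snps, indels: iterables of (chrom, pos) tuples (hypothetical layout).
    window=1 mirrors the fixed distance described in this thread.
    """
    # Collect every position that counts as "nearby" an indel.
    blocked = set()
    for chrom, pos in indels:
        for offset in range(-window, window + 1):
            blocked.add((chrom, pos + offset))
    # Keep only SNPs that fall outside all blocked positions.
    return [snp for snp in snps if (snp[0], snp[1]) not in blocked]


# Made-up positions, for illustration only.
indels = [("chr1", 100)]
snps = [("chr1", 98), ("chr1", 99), ("chr1", 100),
        ("chr1", 101), ("chr1", 102), ("chr2", 100)]
kept = filter_snps_near_indels(snps, indels)
print(kept)
```

With window=1, the SNPs at chr1:99, chr1:100 and chr1:101 are dropped, while chr1:98, chr1:102 and the SNP on chr2 are kept.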

For future help, please try to post in the VarScan Help forum:

Jane M 10-28-2012 10:43 AM

Ok, so 'nearby' means 1bp, thank you.

vd4mindia 11-19-2014 02:41 AM

Filtering varscan variants
Removing SNPs within 1 bp of an indel removes a lot of SNPs for me. But that is not really a test for false positives, right? I am already running the local realignment around indels step with GATK, so I believe mismatches caused by indels should not be an issue when GATK-processed BAM files are used for VarScan and other standard variant callers. I have a typical normal/tumor pair sequenced at 70X and am calling variants with VarScan. If I run somaticFilter with sample.snp and sample.indel, I lose a lot of SNPs: I end up with around 200 variants, which I thought was a reasonable number, but after annotation most of the exonic variants are missing. Also, when I compare these results with MuTect, most of the mutations MuTect reports are absent. So I ran VarScan again with the command below:


samtools mpileup -f /scratch/GT/vdas/test_exome/exome/hg19.fa -q 1 -B \
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam \
  /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998.realigned.recal.bam \
  | java -Xmx14G -jar /scratch/GT/softwares/VarScan.v2.3.6.jar somatic - \
  /scratch/GT/vdas/pietro/exome_seq/results/varscan_out_17112014/S_313_T_soma_vcf.output \
  --output-vcf 1 --mpileup 1 --min-var-freq 0.05 \
  --min-coverage-normal 10 --min-coverage-tumor 8 --p-value 0.05
Now I am getting a sample.snps.vcf with over 11k variants. I am thinking of skipping somaticFilter and instead using processSomatic to get the high-confidence SNPs, then using a script to extract the most confident ones. How does this sound?
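As a rough idea of what such an extraction script might look like, here is a minimal Python sketch. It assumes the INFO column carries the SS (somatic status, 2 = somatic) and SPV (somatic p-value) tags that VarScan somatic writes in VCF mode; check the header of your own file before relying on that. The function names and the 0.05 cutoff are hypothetical, and the example records are made up.

```python
def parse_info(info):
    """Turn a VCF INFO string like 'DP=50;SOMATIC;SS=2' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-only entries get an empty string
    return fields


def high_confidence_somatic(vcf_lines, max_spv=0.05):
    """Keep records with SS=2 and SPV <= max_spv (illustrative cutoff)."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        if info.get("SS") != "2":
            continue  # not called somatic
        try:
            spv = float(info["SPV"])
        except (KeyError, ValueError):
            continue  # no usable p-value, skip the record
        if spv <= max_spv:
            kept.append(line.rstrip("\n"))
    return kept


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tA\tG\t.\tPASS\tDP=50;SOMATIC;SS=2;SSC=40;GPV=1E0;SPV=1E-4",
    "chr1\t200\t.\tC\tT\t.\tPASS\tDP=60;SS=1;SSC=5;GPV=1E-8;SPV=0.5",
    "chr2\t300\t.\tG\tA\t.\tPASS\tDP=30;SOMATIC;SS=2;SSC=10;GPV=1E0;SPV=0.2",
]
for record in high_confidence_somatic(example):
    print(record)
```

On a real file the lines could be read with something like `open("sample.snps.vcf")`. A p-value cutoff alone would probably not replace read-level filtering, so this is only meant as a starting point.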

