SEQanswers

Old 08-15-2012, 11:31 AM   #1
stqa8350
Member
 
Location: cambridge, ma

Join Date: Apr 2011
Posts: 13
varscan somaticFilterResults

Hi

While using VarScan (v2.3.1) with default options, I get different somaticFilter results for the same dataset depending on the file format: filtering snp.vcf with indel.vcf versus filtering the native snp file with the native indel file. In theory this is just a file format variation, and the stats should remain the same. However, I see a difference, apparently due to the p-value filter.

time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp -min-var-freq 0.5 --indel-file 65_varscan-out.indel --output-file 65-filtered
Window size: 10
Window SNPs: 3
Indel margin: 3
Reading input from 65_varscan-out.snp
2962 cluster SNPs identified
Reading input from 65_varscan-out.snp
88168 variants in input stream
13612 failed to meet coverage requirement
5579 failed to meet reads2 requirement
24230 failed to meet varfreq requirement
40748 failed to meet p-value requirement
45 in SNP clusters were removed
1 were removed near indels
3953 passed filters

real 0m3.532s
user 0m2.268s
sys 0m0.188s

time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp.vcf -min-var-freq 0.5 --indel-file 65_varscan-out.indel.vcf --output-file 65-filtered.vcf
Window size: 10
Window SNPs: 3
Indel margin: 3
Reading input from 65_varscan-out.snp.vcf
2972 cluster SNPs identified
Reading input from 65_varscan-out.snp.vcf
88177 variants in input stream
13615 failed to meet coverage requirement
5583 failed to meet reads2 requirement
24231 failed to meet varfreq requirement
2862 failed to meet p-value requirement
367 in SNP clusters were removed
39 were removed near indels
41480 passed filters

real 0m1.506s
user 0m2.504s
sys 0m0.272s

Is there any particular reason for these differences? Please note that in a quick comparison between the *.vcf files and their corresponding snp and indel files, there are no differences when compared by chromosome and position.
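To see at a glance which filter accounts for the gap, the two run summaries above can be tabulated side by side. This is a minimal sketch, assuming each summary was saved to a text file; the function name `compare_summaries` and the log file names in the usage comment are hypothetical. It only reads lines that begin with a count, such as "88168 variants in input stream".

```shell
# Print: label <TAB> count in first log <TAB> count in second log <TAB> difference.
# Only lines of the form "<number> <text>" are read; everything else is ignored.
compare_summaries() {
    awk '
        $1 ~ /^[0-9]+$/ {
            label = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", label)      # strip the leading count
            if (!(label in seen)) { seen[label] = 1; order[++n] = label }
            if (FILENAME == ARGV[1]) first[label] = $1
            else                     second[label] = $1
        }
        END {
            for (i = 1; i <= n; i++) {
                l = order[i]
                printf "%s\t%d\t%d\t%d\n", l, first[l], second[l], second[l] - first[l]
            }
        }
    ' "$1" "$2"
}

# Usage (hypothetical log names):
#   compare_summaries native_run.log vcf_run.log
```

A label that appears in only one log is reported with a count of 0 for the other.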

Many Thanks
Old 08-16-2012, 10:17 AM   #2
dkoboldt
Member
 
Location: St. Louis

Join Date: Mar 2009
Posts: 62

That's a curious result, and it could reflect an error in the new VCF parsing code. Would you be able to send me your 4 files (SNP and indel, original and VCF) or at least the first 1,000 lines or so? Send it to dkoboldt (at) genome [dot] wustl (dot) edu.

Thanks,
Dan Koboldt
Old 08-16-2012, 11:54 AM   #3
stqa8350
Member
 
Location: cambridge, ma

Join Date: Apr 2011
Posts: 13

The difference in results (native output vs. VCF output) occurs after the somaticFilter step. In general I use something like the following arguments:

java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel --output ./varscan-out/65_varscan13output.snps.filtered

java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel.vcf --output-vcf > ./varscan-out/65_varscan13output.snps.filtered.vcf

For the results of the somatic step (java -jar ~/tools/VarScan.v2.3.1.jar somatic), the native and VCF outputs have about the same number of rows, so I reckon the difference does not arise there.

Also, if it were a parsing error and the above somaticFilter command line is correct, then I would expect the native output to be at least a subset of the VCF output, but instead I get entries unique to each.

I can send you 1000 lines of the somaticFilter output.

Thanks

Uma
Old 08-16-2012, 11:55 AM   #4
stqa8350
Member
 
Location: cambridge, ma

Join Date: Apr 2011
Posts: 13

Just to be sure, I can intersect the native and VCF results by chromosome and position (after running just the somatic command).
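One way such a comparison by chromosome and position could be sketched is shown here. It assumes both files are tab-delimited with chromosome and position in the first two columns, that VCF header lines start with "#", and that the native header row starts with "chrom". The function names `site_keys` and `only_in_first` are hypothetical helpers, not part of VarScan.

```shell
# Print "chrom<TAB>position" for every data line of a native or VCF file.
# Skips VCF header lines ("#...") and the native header row ("chrom ...").
site_keys() {
    awk -F'\t' '!/^#/ && $1 != "chrom" && NF >= 2 { print $1 "\t" $2 }' "$1"
}

# Print the sites that occur in the first file but not in the second.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    site_keys "$1" | LC_ALL=C sort -u > "$a"
    site_keys "$2" | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -23 "$a" "$b"      # lines unique to the first sorted list
    rm -f "$a" "$b"
}

# Usage (file names from the first post):
#   only_in_first 65_varscan-out.snp.vcf 65_varscan-out.snp   # sites only in the VCF
#   only_in_first 65_varscan-out.snp 65_varscan-out.snp.vcf   # sites only in native
```

Running it in both directions shows whether one output is a subset of the other or whether each has unique entries.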

Uma
Old 08-16-2012, 12:11 PM   #5
stqa8350
Member
 
Location: cambridge, ma

Join Date: Apr 2011
Posts: 13

Hi Dan,

There is a difference in the results, and I have sent you sample data.

Many thanks for your help

Uma
Old 08-17-2012, 09:14 AM   #6
dkoboldt
Member
 
Location: St. Louis

Join Date: Mar 2009
Posts: 62

Uma,

Thank you for providing the files, which helped me track down the issue. As I'd suspected, the entries in the VCF file for SNVs were slightly more numerous than those in the native SNV output file.

This is because we output "indelError" calls (positions where normal shows a SNV but tumor shows an indel or vice-versa) to the VCF for the sake of completeness. Their filter status is "indelError" to indicate that these are likely artifactual calls. We don't output them to the native output format for that reason.

If I remove "indelError" positions from your SNP VCF and then apply somaticFilter, the results are identical to running it on the native output files. That being said, you might wish to use the filtering results from the unmodified VCF, because SNVs clustering around "indelError" calls should probably be removed. After that, any "indelError" calls that passed somaticFilter can be removed using grep.
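A minimal sketch of that clean-up step follows. It uses awk rather than a plain grep so that VCF header lines are always kept, and it assumes the "indelError" status sits alone in the FILTER column (column 7 of a VCF record). The function names `drop_indel_error` and `count_indel_error` are hypothetical.

```shell
# Keep all header lines and every record whose FILTER column is not "indelError".
drop_indel_error() {
    awk -F'\t' '/^#/ || $7 != "indelError"' "$1"
}

# Count the records flagged "indelError" (prints 0 if there are none).
count_indel_error() {
    awk -F'\t' '!/^#/ && $7 == "indelError" { n++ } END { print n + 0 }' "$1"
}

# Usage (file name from the first post; the output name is hypothetical):
#   count_indel_error 65-filtered.vcf
#   drop_indel_error 65-filtered.vcf > 65-filtered.noIndelError.vcf
```

The same function can be applied to the unfiltered SNP VCF before somaticFilter to reproduce the native-format results, or to the filtered VCF afterwards.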

Thank you for your help on this!

Yours,

Dan Koboldt
Old 11-28-2013, 06:05 AM   #7
franka
Junior Member
 
Location: Italy

Join Date: Oct 2013
Posts: 1

Hi,
I'm using the latest version of VarScan.
Does somaticFilter support output in .vcf format?
I used the indel and SNP .vcf files obtained with the somatic command, as follows:
java -Xmx6g -jar VarScan.v2.3.6.jar somaticFilter 34-C-S.varScan.output.snp.vcf --min-strands2 2 --min-avg-qual 25 --min-var-freq 0.3 --p-value 0.05 --min-strands2 2 --min-reads2 3 --indel-file 34-C-S.varScan.output.indel.vcf --output-vcf 1 34-C-S.vcf

The software starts:
Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
1927 cluster SNPs identified
Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
97671 variants in input stream
395 failed to meet coverage requirement
969 failed to meet reads2 requirement
5504 failed to meet varfreq requirement
3510 failed to meet p-value requirement
1340 in SNP clusters were removed
329 were removed near indels
85624 passed filters

However, no output file is created.
Many thanks,
Francesco