  • VarScan somaticFilter results

    Hi

    While using VarScan (v2.3.1) with default options, I get different results for the same dataset depending on whether I filter the SNP VCF (snp.vcf) against the indel VCF (indel.vcf) or filter the native SNP output against the native indel file. In theory, this is just a file-format difference and the statistics should remain the same. However, I see a difference, apparently due to the p-value filter.

    time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp -min-var-freq 0.5 --indel-file 65_varscan-out.indel --output-file 65-filtered
    Window size: 10
    Window SNPs: 3
    Indel margin: 3
    Reading input from 65_varscan-out.snp
    2962 cluster SNPs identified
    Reading input from 65_varscan-out.snp
    88168 variants in input stream
    13612 failed to meet coverage requirement
    5579 failed to meet reads2 requirement
    24230 failed to meet varfreq requirement
    40748 failed to meet p-value requirement
    45 in SNP clusters were removed
    1 were removed near indels
    3953 passed filters

    real 0m3.532s
    user 0m2.268s
    sys 0m0.188s

    time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp.vcf -min-var-freq 0.5 --indel-file 65_varscan-out.indel.vcf --output-file 65-filtered.vcf
    Window size: 10
    Window SNPs: 3
    Indel margin: 3
    Reading input from 65_varscan-out.snp.vcf
    2972 cluster SNPs identified
    Reading input from 65_varscan-out.snp.vcf
    88177 variants in input stream
    13615 failed to meet coverage requirement
    5583 failed to meet reads2 requirement
    24231 failed to meet varfreq requirement
    2862 failed to meet p-value requirement
    367 in SNP clusters were removed
    39 were removed near indels
    41480 passed filters

    real 0m1.506s
    user 0m2.504s
    sys 0m0.272s

    Is there any particular reason for these differences? Note that a quick comparison of the *.vcf files against the corresponding native SNP and indel files, by chromosome and position, shows no differences.

    Many Thanks

  • #2
    That's a curious result, and it could reflect an error in the new VCF parsing code. Would you be able to send me your 4 files (SNP and indel, original and VCF) or at least the first 1,000 lines or so? Send it to dkoboldt (at) genome [dot] wustl (dot) edu.

    Thanks,
    Dan Koboldt



    • #3
      The difference in results (native output vs. VCF output) appears after the somaticFilter step. In general, I use arguments like the following:

      java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel --output ./varscan-out/65_varscan13output.snps.filtered

      java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel.vcf --output-vcf > ./varscan-out/65_varscan13output.snps.filtered.vcf

      For the somatic results (java -jar ~/tools/VarScan.v2.3.1.jar somatic), the native and VCF outputs have about the same number of lines, so I don't think the difference arises at that step.

      Also, if it were a parsing error and the somaticFilter command line above is correct, I would expect the native output to be at least a subset of the VCF output, but instead the native output contains entries not found in the VCF output.

      I can send you 1000 lines of the somaticFilter output.

      Thanks

      Uma



      • #4
        Just to be sure, I can intersect the native and VCF results by chromosome and position (after running only the somatic command).

        Uma
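One way to run the chromosome/position comparison mentioned above is sketched below. It is a minimal sketch, not part of VarScan: the function names are hypothetical, and it assumes both files are tab-delimited with the chromosome in column 1 and the position in column 2 (standard for VCF; assumed for the native VarScan output, whose first line is treated as a header).

```shell
# Hypothetical helpers, not part of VarScan. Assumes tab-delimited files with
# chromosome in column 1 and position in column 2. For native VarScan output,
# the first line is assumed to be a header and is skipped.

# Sorted, unique "chrom:pos" keys from a native VarScan file.
native_keys() {
  awk -F'\t' 'NR > 1 { print $1 ":" $2 }' "$1" | LC_ALL=C sort -u
}

# Sorted, unique "chrom:pos" keys from a VCF (all lines starting with # skipped).
vcf_keys() {
  awk -F'\t' '!/^#/ { print $1 ":" $2 }' "$1" | LC_ALL=C sort -u
}

# Print how many positions both files share and how many are unique to each.
# comm needs both inputs sorted with the same collation, hence LC_ALL=C.
compare_positions() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  native_keys "$1" > "$a"
  vcf_keys "$2" > "$b"
  printf 'shared\t%s\n'      "$(LC_ALL=C comm -12 "$a" "$b" | awk 'END { print NR }')"
  printf 'native_only\t%s\n' "$(LC_ALL=C comm -23 "$a" "$b" | awk 'END { print NR }')"
  printf 'vcf_only\t%s\n'    "$(LC_ALL=C comm -13 "$a" "$b" | awk 'END { print NR }')"
  rm -f "$a" "$b"
}
```

For example, `compare_positions 65-filtered 65-filtered.vcf` would report how many filtered positions the two runs share and how many are unique to each. The keys ignore alleles, so two different calls at the same position count as shared.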



        • #5
          Hi Dan,

          There is a difference in the results; I have sent you sample data.

          Many thanks for your help

          Uma



          • #6
            Uma,

            Thank you for providing the files, which helped me track down the issue. As I'd suspected, the entries in the VCF file for SNVs were slightly more numerous than the native SNV output file.

            This is because we output "indelError" calls (positions where normal shows an SNV but tumor shows an indel, or vice versa) to the VCF for the sake of completeness. Their filter status is "indelError" to indicate that these are likely artifactual calls. We don't output them to the native output format for that reason.
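To see which extra records the SNP VCF contains, one could tally its FILTER column (column 7 in a standard VCF). A minimal sketch; the function name is hypothetical and a standard tab-delimited VCF layout is assumed.

```shell
# Hypothetical helper: count VCF records by FILTER value (column 7),
# skipping header lines, and print "value<TAB>count" sorted by value.
count_filters() {
  awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" \
    | LC_ALL=C sort
}
```

Running `count_filters 65_varscan-out.snp.vcf` would list each FILTER value with its count. If the extra VCF records are all indelError calls, that count would be expected to match the 9-record gap between the 88177 (VCF) and 88168 (native) input variants reported in the first post.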

            If I remove "indelError" positions from your SNP VCF and then apply somaticFilter, the results are identical to running it on the native output files. That being said, you might wish to use the filtering results from the unmodified VCF, because SNVs clustering around "indelError" calls should probably be removed. After that, any "indelError" calls that passed somaticFilter can be removed using grep.
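The indelError removal can be done either before or after somaticFilter. The sketch below uses awk rather than a plain `grep -v indelError`, so that header lines are never dropped and only records whose FILTER column contains indelError (alone or as one of several semicolon-separated values) are removed. The function name is hypothetical and a standard tab-delimited VCF layout is assumed.

```shell
# Hypothetical helper: print a VCF without records whose FILTER column (7)
# contains "indelError". Header lines (starting with #) are always kept.
# FILTER is split on ";" because a record may carry several filter values.
drop_indel_errors() {
  awk -F'\t' '
    /^#/ { print; next }
    {
      n = split($7, vals, ";")
      for (i = 1; i <= n; i++)
        if (vals[i] == "indelError") next
      print
    }' "$1"
}
```

For example, `drop_indel_errors 65-filtered.vcf > 65-filtered.noIndelError.vcf` (output name hypothetical) would keep the cluster filtering from the unmodified VCF, as suggested above, and then strip any indelError calls that passed somaticFilter.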

            Thank you for your help on this!

            Yours,

            Dan Koboldt



            • #7
              Hi,
              I'm using the latest version of VarScan (v2.3.6).
              Does somaticFilter support output in VCF format?
              I used the indel and SNP .vcf files produced by the somatic command:
              java -Xmx6g -jar VarScan.v2.3.6.jar somaticFilter 34-C-S.varScan.output.snp.vcf --min-strands2 2 --min-avg-qual 25 --min-var-freq 0.3 --p-value 0.05 --min-strands2 2 --min-reads2 3 --indel-file 34-C-S.varScan.output.indel.vcf --output-vcf 1 34-C-S.vcf

              The program runs and reports:
              Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
              1927 cluster SNPs identified
              Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
              97671 variants in input stream
              395 failed to meet coverage requirement
              969 failed to meet reads2 requirement
              5504 failed to meet varfreq requirement
              3510 failed to meet p-value requirement
              1340 in SNP clusters were removed
              329 were removed near indels
              85624 passed filters

              but no output file is created.
              Many thanks,
              Francesco
