  • VarScan mpileup2snp missing a call - can't figure out why

    VarScan mpileup2snp seems to be missing a call.

    The mpileup file looks like this:

    Code:
    chr5	13894894	T		40	.a.AAAAAaAA,aaaaA,,aAA,AaaAaAaAaAAa,.,.,	B02/8/0///0G/3/10GH///C/5//0511//10HHH3A	55	a,....Aa,.AAa,aAAA.a,,,,...,,a..,,A,.,.,AaAAaA.aaA.,,,a	!DHHHH0/HH1//H////G/HHGHHGFHH/GGHH/HHHFC////1/G/00FHHH)
    Then I'm using the following options in VarScan:
    Code:
    mpileup2snp --min-coverage 20 --min-var-freq 0.2 --min-reads2 4 --strand-filter 1 --output-vcf 1 >whatever.vcf
    And this is the line in the VCF where the problem is...

    Code:
    chr5	13894894	.	T	A	.	PASS	ADP=32;WT=1;HET=2;HOM=0;NC=1	GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR		0/1:53:40:25:11:14:56%:4.7528E-6:33:16:4:7:7:7		0/0:32:55:37:32:5:13.51%:2.706E-2:38:15:15:17:3:2
    Attached is an image from IGV of the two alignments (front-to-back replicates of the same specimen; the variant is clearly in both, but not in the VCF).

    Any thoughts? Apologies if the answer is obvious, but I can't seem to figure it out.

    Thanks in advance

  • #2
    Thank you for posting. From the VarScan VCF entry for sample 2:
    0/0:32:55:37:32:5:13.51%:2.706E-2:38:15:15:17:3:2

    You can see that the VAF as computed by VarScan is 13.51%, which is below your minimum threshold of 20%.

    Notably, VarScan depth (DP, field 4) is 37 whereas the raw SAMtools depth (SDP, field 3) is 55. This suggests that the raw pileup and IGV are showing about 18 reads whose base qualities are below VarScan's threshold.
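As a quick sanity check on these numbers, here is a minimal Python sketch that splits the sample 2 entry by the FORMAT keys from the VCF line and recomputes the allele frequency. The helper names `parse_sample` and `summarize` are made up for illustration, and the formula AD / (RD + AD) is an assumption that happens to reproduce the reported 13.51%; it is not taken from VarScan's source.

```python
# FORMAT keys and the sample 2 entry, copied from the VCF line in the first post.
FORMAT = "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR"
SAMPLE_2 = "0/0:32:55:37:32:5:13.51%:2.706E-2:38:15:15:17:3:2"


def parse_sample(fmt, sample):
    """Pair each FORMAT key with the matching value (hypothetical helper)."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def summarize(fields, min_var_freq=0.2):
    """Recompute the frequency and the number of reads dropped before calling."""
    rd, ad = int(fields["RD"]), int(fields["AD"])
    sdp, dp = int(fields["SDP"]), int(fields["DP"])
    vaf = ad / (rd + ad)  # assumed definition of FREQ
    return {
        "vaf": vaf,
        "filtered_reads": sdp - dp,  # raw depth minus quality-filtered depth
        "passes_min_var_freq": vaf >= min_var_freq,
    }


summary = summarize(parse_sample(FORMAT, SAMPLE_2))
print(f"VAF {summary['vaf']:.2%}, reads filtered {summary['filtered_reads']}, "
      f"passes --min-var-freq 0.2: {summary['passes_min_var_freq']}")
```

The same two helpers can be pointed at the sample 1 entry to compare how many reads were dropped there.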



    • #3
      Originally posted by dkoboldt
      Thank you for posting. From the VarScan VCF entry for sample 2:
      0/0:32:55:37:32:5:13.51%:2.706E-2:38:15:15:17:3:2

      You can see that the VAF as computed by VarScan is 13.51%, which is below your minimum threshold of 20%.

      Notably, VarScan depth (DP, field 4) is 37 whereas the raw SAMtools depth (SDP, field 3) is 55. This suggests that the raw pileup and IGV are showing about 18 reads whose base qualities are below VarScan's threshold.
      Thanks very much for the reply.

      I had wondered whether this was a result of the base quality being downgraded by SAMtools, and therefore not read by VarScan. I've tried mpileup with -B and -E, but neither seems to make much difference.

      It's not a huge issue, as overall concordance between the two specimens is very good, but I like to chase down the "misses". I might try a different quality threshold in VarScan and see if that helps at all.
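One way to see what a different quality threshold would do is to recount the sample 2 pileup column from the first post at several cut-offs. This is a rough Python sketch, not VarScan's own logic. It assumes Phred+33 qualities, assumes a base is counted when its quality is at or above the cut-off, and pairs bases with qualities one-to-one, which only works because this column has no `^`, `$` or indel markup. `count_at` is a hypothetical helper name.

```python
# Sample 2 column of the chr5:13894894 pileup line from the first post.
BASES = "a,....Aa,.AAa,aAAA.a,,,,...,,a..,,A,.,.,AaAAaA.aaA.,,,a"
QUALS = "!DHHHH0/HH1//H////G/HHGHHGFHH/GGHH/HHHFC////1/G/00FHHH)"


def count_at(bases, quals, alt, min_qual):
    """Return (depth, variant reads) keeping only bases with quality >= min_qual."""
    if len(bases) != len(quals):
        # Markup such as ^, $ or indels would break the one-to-one pairing.
        raise ValueError("column contains markup; simple pairing is not valid")
    depth = alt_reads = 0
    for base, qual in zip(bases, quals):
        if ord(qual) - 33 < min_qual:  # Phred+33 decoding (assumed)
            continue
        depth += 1
        if base.upper() == alt:
            alt_reads += 1
    return depth, alt_reads


for cutoff in (0, 14, 15, 20):
    depth, alt_reads = count_at(BASES, QUALS, "A", cutoff)
    freq = 100 * alt_reads / depth if depth else 0.0
    print(f"min qual {cutoff:>2}: depth {depth}, variant reads {alt_reads}, freq {freq:.2f}%")
```

At a cut-off of 15 this gives a depth of 37 with 5 variant reads, matching the DP and AD values in the VCF, while at 14 it gives a depth of 53 with 21 variant reads. Many of the variant-supporting bases in this column carry the quality character `/` (Phred 14), so they sit just under that cut-off. If 15 is what your VarScan version uses for `--min-avg-qual` (worth checking in its usage message), lowering it slightly would be the thing to try.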



      • #4
        I'm sorry to ask the same question again, but I've noticed another missed variant call using the same pipeline as above, and this one I can't figure out based on the base qualities. If anyone can point out what I'm missing, I'd really appreciate it.

        mpileup at the position:
        Code:
        chr11	89017961	G	26	..A.A,a....,a....A,,.a,.,,	GG5?5H6GEGGG52GG06FHC6BAHH
        As far as I can tell, the variant bases have an average quality of around 20 and the reference bases have an average quality of around 30 (am I wrong?). There should be no strand bias issues.
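To check those averages rather than eyeball them, the column can be decoded with a short Python sketch. It assumes Phred+33 qualities and standard pileup markup (`^` plus a mapping-quality character, `$`, and `+N`/`-N` indels), and the names `read_symbols` and `summarize` are hypothetical. It only tallies the raw column and does not reproduce any of VarScan's filters or statistics.

```python
# Pileup column for chr11:89017961 as posted above.
BASES = "..A.A,a....,a....A,,.a,.,,"
QUALS = "GG5?5H6GEGGG52GG06FHC6BAHH"


def read_symbols(bases):
    """Yield one symbol per read, skipping pileup markup."""
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start, followed by a mapping-quality char
            i += 2
        elif c == "$":      # read end
            i += 1
        elif c in "+-":     # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            yield c
            i += 1


def summarize(bases, quals, alt):
    """Collect Phred scores per allele and count variant reads per strand."""
    groups = {"ref": [], "alt": []}
    strands = {"alt_fwd": 0, "alt_rev": 0}
    for sym, qual in zip(read_symbols(bases), quals):
        phred = ord(qual) - 33
        if sym in ".,":
            groups["ref"].append(phred)
        elif sym.upper() == alt:
            groups["alt"].append(phred)
            strands["alt_fwd" if sym.isupper() else "alt_rev"] += 1
    return groups, strands


groups, strands = summarize(BASES, QUALS, "A")
for name, scores in groups.items():
    print(f"{name}: {len(scores)} reads, mean quality {sum(scores) / len(scores):.1f}")
print(strands)
```

On the posted column this gives 20 reference reads with a mean quality of about 34.7 and 6 variant reads with a mean of 20.5, split 3 forward and 3 reverse, for a raw variant frequency of roughly 23%. That agrees with the estimate above that the variant bases average around 20 and that there is no obvious strand imbalance.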

        Yet using the following VarScan parameters does not yield the variant in the VCF.

        Code:
        --min-coverage 20 --min-var-freq 0.2 --min-reads2 4 --output-vcf 1 --strand-filter 1
        Any help greatly appreciated, I'm really lost on this one.
