  • Confused about adjacent somatic SNVs within a 100bp window

    Hi all,

    I'm calling somatic SNVs from matched tumor/normal pairs. Candidates called by more than one caller were then inspected in IGV and reviewed manually.
    In theory, the vast majority of true-positive SNVs should be singletons within a 100 bp window. However, I found several adjacent high-quality candidates that were not singletons and were supported by the same reads or read pairs. I'm confused by these adjacent SNVs.
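A minimal sketch of the singleton check described above, assuming calls are available as (chrom, pos) tuples and that "within a 100 bp window" means two SNVs at most 100 bp apart. The function name `flag_clustered` is hypothetical.

```python
from collections import defaultdict

def flag_clustered(snvs, window=100):
    """Return the (chrom, pos) SNVs that have another SNV within `window` bp.

    Assumption: positions are integers on the same coordinate system and
    "within the window" means a distance <= window.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)

    clustered = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, pos in enumerate(positions):
            # A call is non-singleton if its left or right neighbour is close enough.
            prev_close = i > 0 and pos - positions[i - 1] <= window
            next_close = i + 1 < len(positions) and positions[i + 1] - pos <= window
            if prev_close or next_close:
                clustered.add((chrom, pos))
    return clustered

calls = [("chr1", 1000), ("chr1", 1050), ("chr1", 5000), ("chr2", 1020)]
print(sorted(flag_clustered(calls)))
```

Sorting per chromosome keeps this linear after the sort, so it scales to genome-wide call sets.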

    What could explain these high-quality, non-singleton SNVs?

    I don't know whether they are false positives caused by mapping/alignment errors.
    During preprocessing, I removed reads containing Ns and trimmed adapters and low-quality bases (Q30). Only paired reads were then used and aligned to the reference with bwa using default settings. When calling SNVs, only reads with mapping quality > 30 (MAPQ > 30) were counted, and the threshold for calling an SNV was ≥ 3 alternate reads.
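The calling threshold above (MAPQ > 30, at least 3 alternate reads) can be sketched as follows. The `Observation` record is hypothetical (for example, filled from a pileup). One deliberate deviation from the post: support is counted per fragment (read name), so that overlapping mates of one pair are not counted as two independent ALT reads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One read's base at a candidate site (hypothetical record)."""
    read_name: str  # shared by both mates of a read pair
    base: str       # base observed at the site
    mapq: int       # mapping quality of the read

def alt_fragment_count(observations, alt, mapq_above=30):
    """Count distinct fragments supporting `alt` with MAPQ strictly above `mapq_above`."""
    names = {
        obs.read_name
        for obs in observations
        if obs.base.upper() == alt.upper() and obs.mapq > mapq_above
    }
    return len(names)

def is_candidate(observations, alt, min_alt_reads=3):
    """Apply the '>= 3 alternate reads' threshold to fragment-level support."""
    return alt_fragment_count(observations, alt) >= min_alt_reads
```

Counting both mates separately can make a two-fragment site look like three or four independent observations, which matters when the threshold is as low as 3.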

    Best regards.
    Last edited by lovenlong; 01-02-2014, 07:01 AM.

  • #2
    You may need to filter strand-specific errors from your data. In addition, some indels can cause alignment anomalies that produce multiple apparent SNPs in close proximity near the ends of the aligned portion of the reads containing the indel.

    Many tools have filters to address these artifacts. VarScan includes a filtering tool that you may be able to apply to your data; see http://tvap.genome.wustl.edu/tools/varscan/.
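One common way to quantify the strand-specific errors mentioned above is a Fisher's exact test on forward/reverse read counts for REF versus ALT (GATK's FisherStrand annotation is based on this test). The sketch below is a generic implementation using only the standard library; it is not VarScan's filter, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_prob(x)
        # Tables no more probable than the observed one count as "at least as extreme";
        # the small relative tolerance guards against floating-point ties.
        if p <= p_observed * (1 + 1e-7):
            total += p
    return min(1.0, total)

def strand_bias_p(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Test whether ALT reads split across strands differently from REF reads."""
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
```

A small p-value means the ALT reads are concentrated on one strand relative to the REF reads; with only 3 to 5 ALT reads the test has little power, so a hard rule such as "not all ALT reads on one strand" is often used alongside it.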

    If you see somatic mutations in close proximity with support on both strands, you may want to refer to this manuscript:



    An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genetics 45, 970–976 (2013). doi:10.1038/ng.2702

    "...throughout cancer genomes APOBEC-mediated mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome data sets conformed to the stringent criteria indicative of an APOBEC mutation pattern. "
    Last edited by m_two; 01-02-2014, 09:46 AM.
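As an illustration of the mutation pattern in the cited paper, the sketch below checks whether a single SNV falls in a TCW motif (W = A or T) with a C>T or C>G change, on either strand. It assumes a 3-base reference context centred on the SNV is already available; it covers only the motif check, not the paper's full cluster criteria (the clustering part could reuse a window check like the one earlier in this thread). Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def is_apobec_tcw(context, alt):
    """True if the SNV matches TCW -> TTW or TGW (W = A or T) on either strand.

    context: 3-base reference sequence centred on the SNV, e.g. "TCA".
    alt: the alternate base at the centre position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a single ALT base")
    if context[1] == "G":
        # A G on the reference strand is a C on the other strand: flip to that strand.
        context, alt = reverse_complement(context), alt.translate(COMPLEMENT)
    return context[0] == "T" and context[1] == "C" and context[2] in "AT" and alt in "TG"
```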



    • #3
      Hi, m_two
      Thanks for your reply.

      Actually, my samples were not real somatic tissues but pools of 100 individual rice plants following mutagenesis. The expected variant allele frequency is very low (<1%), and we thought that treating the samples as matched tumor/normal pairs would improve calling accuracy for these heterozygous SNVs.
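To put "very low (<1%)" in numbers: assuming rice is diploid, every plant contributes equal DNA to the pool, and a new mutation is heterozygous in exactly one plant, the expected allele fraction is 1 / (100 × 2) = 0.5%. The sketch below uses a binomial model (ignoring sequencing errors) to estimate how often such a site would reach the ≥ 3 ALT-read threshold at a given depth. For comparison, a Q30 base still has a nominal 1-in-1,000 error probability, which is not far below a 0.5% signal. Function names are hypothetical.

```python
from math import comb

def expected_vaf(n_individuals, ploidy=2, alt_copies=1):
    """Expected ALT allele fraction, assuming equal DNA contribution per individual."""
    return alt_copies / (n_individuals * ploidy)

def prob_at_least(depth, vaf, min_alt_reads=3):
    """Binomial P(ALT reads >= min_alt_reads) at `depth`, ignoring sequencing errors."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - p_below)

vaf = expected_vaf(100)  # one heterozygous plant in a pool of 100 diploids
for depth in (100, 300, 1000):
    print(depth, round(prob_at_least(depth, vaf), 3))
```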

      As marked in the attached figure, Tumor_1_bwa.bam and Tumor_2_bwa.bam are the pools of mutagenized generations 1 and 2, respectively, and Normal_1_bwa.bam and Normal_2_bwa.bam are the corresponding wild-type generations 1 and 2. My wild type is a pure rice cultivar that has been self-pollinated for at least 15 generations.

      Because effective coverage was low, I merged the two generations of the mutagenized pools and of the wild-type pools, respectively, and called "somatic" variants with muTect, Strelka and VarScan2. My analysis pipeline was as follows:
      1. Calling somatic variants (MAPQ ≥ 30, base quality ≥ 30):
      Clean reads > bwa mapping > merged BAMs > somatic calling with different callers > filtering
      Clean reads > Stampy mapping > merged BAMs > somatic calling with different callers > filtering

      2. Eliminate mapping errors by intersecting the bwa and Stampy calls for each caller:
      calls_of_bwa_callerA + calls_of_stampy_callerA > overlapped SNVs of callerA
      calls_of_bwa_callerB + calls_of_stampy_callerB > overlapped SNVs of callerB
      ...

      muTect_passed_overlapped = 519
      Strelka_passed_overlapped = 55
      Varscan2_passed_overlapped = 60
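Step 2 amounts to a set intersection on (chrom, pos, ref, alt) between the PASS calls a caller makes from the bwa and from the Stampy alignments. The sketch below assumes both outputs are tab-delimited VCF and keeps only PASS single-nucleotide records; the parser is deliberately minimal, and the function names and file names in the comment are hypothetical.

```python
def pass_snvs(vcf_lines):
    """Yield (chrom, pos, ref, alt) for PASS single-nucleotide records in VCF text."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, filt = fields[:7]
        if filt != "PASS":
            continue
        for alt in alts.split(","):
            if len(ref) == 1 and len(alt) == 1 and alt != ".":
                yield (chrom, int(pos), ref.upper(), alt.upper())

def intersect_aligners(bwa_vcf_lines, stampy_vcf_lines):
    """SNVs called by the same caller from both the bwa and the Stampy alignments."""
    return set(pass_snvs(bwa_vcf_lines)) & set(pass_snvs(stampy_vcf_lines))

# Example usage (hypothetical file names):
# with open("callerA_bwa.vcf") as a, open("callerA_stampy.vcf") as b:
#     shared = intersect_aligners(a, b)
```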

      3. Finding concordant overlapped SNVs among multiple callers:
      shared by muTect and VarScan2 = 29 SNVs, shared by muTect and Strelka = 43 SNVs, shared by VarScan2 and Strelka = 23 SNVs, shared by all 3 callers = 20 SNVs.
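If the pairwise counts above each include the 20 three-way calls, inclusion-exclusion gives 29 + 43 + 23 − 2 × 20 = 55 SNVs supported by at least two callers before hard filtering. A minimal sketch of that "at least k callers" selection, assuming each caller's overlapped calls are available as a set of variant keys (the function name is hypothetical):

```python
from collections import Counter

def supported_by_at_least(call_sets, k=2):
    """Return variants present in at least `k` of the callers' call sets.

    call_sets maps a caller name to an iterable of variant keys,
    e.g. (chrom, pos, ref, alt) tuples.
    """
    # set(calls) ensures a caller that lists a variant twice is counted once.
    counts = Counter(v for calls in call_sets.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= k}
```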

      4. Apply hard filters to the SNVs:
      (1) No more than 1 ALT read or read pair has an additional mismatch/gap;
      (2) No more than 3 additional mismatches/gaps exist within 50 bp on either side of the ALT site;
      (3) The maximum MAPQ of the ALT reads is > 40;
      (4) When there are ≥ 4 ALT reads, they must not come exclusively from one strand;
      (5) At least 2 mismatches or gaps are not within the first or last 10 bp of the ALT reads;
      (6) Mismatches or gaps are not at the beginning or end of homopolymers or SSRs (n > 4).
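A sketch of filters (1) to (5) from the list above, assuming per-site summaries of the ALT-supporting reads have already been extracted from the BAM. The `AltRead` record and `flank_mismatches` input are hypothetical. Filter (1) is applied per read rather than per read pair, filter (5) is interpreted as "at least 2 ALT reads place the SNV outside their first and last 10 bp", and filter (6) is not covered because it needs the reference sequence.

```python
from dataclasses import dataclass

@dataclass
class AltRead:
    """Hypothetical summary of one ALT-supporting read at a candidate site."""
    mapq: int
    is_reverse: bool
    snv_offset: int        # 0-based position of the SNV within the aligned read
    aligned_len: int
    extra_mismatches: int  # mismatches/gaps in the read other than the candidate SNV

def passes_hard_filters(alt_reads, flank_mismatches, edge=10):
    """Apply filters (1)-(5); flank_mismatches counts extra mismatches/gaps within +/-50 bp."""
    if not alt_reads:
        return False
    # (1) no more than one ALT read carries additional mismatches/gaps
    if sum(r.extra_mismatches > 0 for r in alt_reads) > 1:
        return False
    # (2) no more than 3 additional mismatches/gaps within 50 bp either side
    if flank_mismatches > 3:
        return False
    # (3) the best ALT read must have MAPQ > 40
    if max(r.mapq for r in alt_reads) <= 40:
        return False
    # (4) with >= 4 ALT reads, they must not all come from one strand
    if len(alt_reads) >= 4 and len({r.is_reverse for r in alt_reads}) == 1:
        return False
    # (5) at least 2 ALT reads have the SNV outside their first/last `edge` bp
    interior = sum(edge <= r.snv_offset < r.aligned_len - edge for r in alt_reads)
    return interior >= 2
```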

      Finally, 35 SNVs called by at least 2 callers were selected for validation.
      I'm not sure whether my analysis workflow is correct.
      I would be very grateful if anybody could give me some suggestions.

      Best regards.
