  • Confused about adjacent somatic SNVs within a 100bp window

    Hi all,

    I'm calling somatic SNVs from matched tumor/normal pairs. Candidates called by multiple callers were then visualized in IGV and inspected manually.
    In theory, the vast majority of true-positive SNVs should be singletons within a 100 bp window. However, I found several adjacent high-quality candidates that were not singletons and that came from the same reads or read pairs. I am confused by these adjacent SNVs.

    What could explain these high-quality non-singleton SNVs?

    I don't know whether they are false positives resulting from mapping/alignment errors.
    During preprocessing, I removed reads containing Ns and trimmed adaptors and low-quality bases (Q30). Only paired reads were then aligned to the reference with bwa using default settings. When calling SNVs, only reads with mapping quality > 30 (MAPQ>30) were counted, and the threshold for calling an SNV was >= 3 alternate reads.
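For anyone wanting to list such non-singleton candidates systematically rather than by eye in IGV, here is a minimal sketch. It assumes the calls are available as (chromosome, position) tuples; the function name `flag_clustered` and the toy coordinates are made up for illustration.

```python
from collections import defaultdict

def flag_clustered(snvs, window=100):
    """Return the (chrom, pos) calls that have a neighbouring call
    less than `window` bp away on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)

    clustered = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        # After sorting, only consecutive positions need to be compared.
        for prev, curr in zip(positions, positions[1:]):
            if curr - prev < window:
                clustered.add((chrom, prev))
                clustered.add((chrom, curr))
    return clustered

# Toy input: two calls 40 bp apart on chr1, plus two isolated calls.
calls = [("chr1", 1000), ("chr1", 1040), ("chr1", 5000), ("chr2", 1010)]
print(sorted(flag_clustered(calls)))  # [('chr1', 1000), ('chr1', 1040)]
```

Calls that are not returned are singletons at the chosen window size.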

    Best regards.
    Last edited by lovenlong; 01-02-2014, 07:01 AM.

  • #2
    You may need to filter strand-specific errors from your data. In addition, some indels cause alignment anomalies in which multiple SNPs appear in close proximity near the ends of the aligned portion of the reads containing the indel.

    Many tools have filters in place to address these artifacts. VarScan includes a filtering tool that you may be able to apply to your data. See http://tvap.genome.wustl.edu/tools/varscan/.
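As an illustration of what a strand check can look like, one common way to quantify strand imbalance is a two-sided Fisher's exact test on the forward/reverse counts of REF and ALT reads. The sketch below is not VarScan's own filter; the function name `fisher_strand_p` and the example counts are assumptions made for illustration.

```python
from math import comb

def fisher_strand_p(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for the 2x2 table
    [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt

    def prob(a):
        # Hypergeometric probability of a table whose ref_fwd cell equals a.
        return comb(row_ref, a) * comb(row_alt, col_fwd - a) / comb(total, col_fwd)

    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    p_observed = prob(ref_fwd)
    # Sum all tables that are at most as likely as the observed one.
    return sum(
        prob(a)
        for a in range(lowest, highest + 1)
        if prob(a) <= p_observed * (1 + 1e-9)
    )

# ALT reads only on the forward strand, REF reads balanced.
p = fisher_strand_p(ref_fwd=20, ref_rev=20, alt_fwd=6, alt_rev=0)
print(p < 0.05)
```

A small p-value suggests the ALT reads are distributed across strands differently from the REF reads, which is the pattern expected for a strand-specific artifact.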

    If you see somatic mutations with support on both strands in close proximity, you may want to refer to this manuscript:

    An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genetics 45, 970–976 (2013). doi:10.1038/ng.2702

    "...throughout cancer genomes APOBEC-mediated mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome data sets conformed to the stringent criteria indicative of an APOBEC mutation pattern."
    Last edited by m_two; 01-02-2014, 09:46 AM.

    • #3
      Hi, m_two
      Thanks for your reply.

      Actually, my samples are not real somatic tissues, but pools of 100 individual rice plants following mutagenesis. The theoretical variant allele frequency would be very low (<1%), and we thought that treating the samples as matched tumor-normal pairs would improve the calling accuracy for these heterozygous SNVs.

      As marked in the attached figure, Tumor_1_bwa.bam and Tumor_2_bwa.bam are the pools of mutagenized generations 1 and 2, respectively, and Normal_1_bwa.bam and Normal_2_bwa.bam are the corresponding wild-type generations 1 and 2. My wild type is a pure rice cultivar that has been self-crossed for at least 15 generations.

      Due to low effective coverage, I merged the two generations of each pool and called "somatic" variants with muTect, Strelka and VarScan2. My analysis pipeline was as follows:
      1. Calling somatic variants (MAPQ≥30, Base_Q≥30):
      Clean reads > bwa mapping > merged_BAMs > calling somatic with different callers > filtering
      Clean reads > Stampy mapping > merged_BAMs > calling somatic with different callers > filtering

      2. Eliminating mapping errors by combining the bwa and Stampy calls:
      calls_of_bwa_callerA + calls_of_stampy_callerA > overlapped SNVs of callerA
      calls_of_bwa_callerB + calls_of_stampy_callerB > overlapped SNVs of callerB
      ...

      muTect_passed_overlapped = 519
      Strelka_passed_overlapped = 55
      Varscan2_passed_overlapped = 60
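Step 2 amounts to a set intersection per caller. A minimal sketch, assuming each call is keyed by (chromosome, position, REF, ALT); the function name `overlapped_calls` and the toy records are hypothetical and not taken from the actual data:

```python
def overlapped_calls(calls_bwa, calls_stampy):
    """Keep only the calls reported from both alignments of the same caller."""
    return set(calls_bwa) & set(calls_stampy)

# Toy call sets for one caller; keys are (chrom, pos, ref, alt).
calls_of_bwa_callerA = [
    ("chr1", 1000, "G", "A"),
    ("chr1", 2500, "C", "T"),
    ("chr3", 700, "T", "C"),
]
calls_of_stampy_callerA = [
    ("chr1", 1000, "G", "A"),
    ("chr3", 700, "T", "C"),
    ("chr5", 42, "A", "G"),
]

overlapped_callerA = overlapped_calls(calls_of_bwa_callerA, calls_of_stampy_callerA)
print(len(overlapped_callerA))  # 2
```

Including REF and ALT in the key means that two different alleles called at the same position are not counted as an overlap.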

      3. Finding concordant overlapped SNVs among multiple callers:
      muTect_call = Varsc_call = 29 SNVs
      muTect_call = Strelka_call = 43 SNVs
      Varsc_call = Strelka_call = 23 SNVs
      3_callers_overlapped = 20 SNVs
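The concordance step can be sketched by counting how many callers support each SNV. The helper name `supported_by` and the toy call sets are hypothetical. The second part is a plain inclusion-exclusion check on the counts above, under the assumption that each pairwise count includes the 20 SNVs shared by all three callers.

```python
from collections import Counter

def supported_by(call_sets, min_callers=2):
    """Return the calls reported by at least `min_callers` callers."""
    counts = Counter(v for calls in call_sets.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}

# Toy example: keys are caller names, values are (chrom, pos) calls.
toy = {
    "muTect": [("chr1", 10), ("chr1", 20), ("chr2", 5)],
    "Strelka": [("chr1", 10), ("chr2", 5)],
    "VarScan2": [("chr1", 10), ("chr4", 8)],
}
print(sorted(supported_by(toy)))  # [('chr1', 10), ('chr2', 5)]

# Inclusion-exclusion on the counts reported in the post.
pairwise = {"muTect&VarScan2": 29, "muTect&Strelka": 43, "VarScan2&Strelka": 23}
all_three = 20
# Each three-caller SNV is counted in all three pairwise overlaps,
# so subtract it twice to count it once.
at_least_two = sum(pairwise.values()) - 2 * all_three
print(at_least_two)  # 55
```

Under that assumption, 55 SNVs are supported by at least two callers before any hard filtering is applied.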

      4. Applying hard filters to the SNVs:
      (1) No more than 1 ALT read or read pair has an additional mismatch/gap;
      (2) No more than 3 additional mismatches/gaps exist within 50 bp on either side of the ALT site;
      (3) Maximum MAPQ of ALT reads > 40;
      (4) When ALT reads ≥ 4, they should not emanate exclusively from one strand;
      (5) At least 2 mismatches or gaps are not within the first or last 10 bp of ALT reads;
      (6) Mismatches or gaps are not at the beginning or end of homopolymers or SSRs (n>4).
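A minimal sketch of how rules (2), (3) and (4) could be written as a single predicate. It assumes the per-site summaries have already been extracted from the BAM files; the dictionary keys (`nearby_mismatches`, `max_alt_mapq`, `alt_fwd`, `alt_rev`) and the function name are hypothetical, and the remaining rules are not covered here.

```python
def passes_hard_filters(site):
    """Apply rules (2), (3) and (4) to one candidate site.

    `site` is a dict with hypothetical keys:
      nearby_mismatches: additional mismatches/gaps within 50 bp of the ALT site
      max_alt_mapq:      highest MAPQ among ALT-supporting reads
      alt_fwd, alt_rev:  ALT read counts on the forward / reverse strand
    """
    alt_fwd, alt_rev = site["alt_fwd"], site["alt_rev"]

    # Rule (2): no more than 3 additional mismatches/gaps nearby.
    if site["nearby_mismatches"] > 3:
        return False
    # Rule (3): at least one ALT read must have MAPQ above 40.
    if site["max_alt_mapq"] <= 40:
        return False
    # Rule (4): with 4 or more ALT reads, both strands must be represented.
    if alt_fwd + alt_rev >= 4 and (alt_fwd == 0 or alt_rev == 0):
        return False
    return True

good = {"nearby_mismatches": 1, "max_alt_mapq": 60, "alt_fwd": 3, "alt_rev": 2}
one_strand = {"nearby_mismatches": 0, "max_alt_mapq": 60, "alt_fwd": 5, "alt_rev": 0}
print(passes_hard_filters(good), passes_hard_filters(one_strand))  # True False
```

Keeping each rule as a separate early return makes it easy to report which filter rejected a site if that is needed later.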

      Finally, 35 SNVs called by at least 2 callers were selected for validation.
      I'm not sure whether my analysis workflow is correct.
      I would be very grateful if anybody could give me some suggestions.

      Best regards.
