Old 01-02-2014, 05:51 AM   #1
Location: Guangzhou, CN

Join Date: Jan 2013
Posts: 16
Confused about adjacent somatic SNVs within a 100bp window

Hi all,

I'm calling somatic SNVs from matched tumor/normal pairs. Candidates called by multiple callers were then visualized in IGV and inspected manually.
Theoretically, the vast majority of true positive SNVs should be singletons within a window of 100 bp. However, I found several adjacent high-quality candidates that were not singletons and that came from the same reads or read pairs. I am confused by this kind of adjacent SNV.

What can explain these high-quality non-singleton SNVs?

I don't know whether they are false positives resulting from mapping/alignment errors.
At the preprocessing stage, I removed reads containing Ns and trimmed adaptors and low-quality bases (Q30). Then only paired reads were kept and aligned to the reference with bwa default settings. When calling SNVs, only reads with mapping quality > 30 (MAPQ>30) were counted, and the threshold for calling an SNV was alternate reads >= 3.
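
To make the singleton question concrete, here is a minimal Python sketch of how candidates with a neighbour within 100 bp could be flagged before looking at them in IGV. It is only an illustration: the function name `flag_clustered_snvs`, the `(chrom, pos)` tuple format, and the choice of "distance <= 100 bp" as the definition of the window are assumptions, not part of any caller.

```python
from collections import defaultdict


def flag_clustered_snvs(snvs, window=100):
    """Return the set of (chrom, pos) calls that have a neighbour within `window` bp.

    `snvs` is an iterable of (chrom, pos) tuples (hypothetical input format).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)

    clustered = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Compare each call with the next one on the same chromosome.
        for prev, curr in zip(positions, positions[1:]):
            if curr - prev <= window:
                clustered.add((chrom, prev))
                clustered.add((chrom, curr))
    return clustered


# Toy example: two calls 40 bp apart on chr1, plus two isolated calls.
calls = [("chr1", 1000), ("chr1", 1040), ("chr1", 5000), ("chr2", 1020)]
clustered = flag_clustered_snvs(calls)
singletons = [c for c in calls if c not in clustered]
```

Calls in `clustered` are the ones worth checking for shared reads or a nearby indel; `singletons` are the calls with no neighbour inside the window.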

Best regards.
Attached Images
File Type: png ??5.png (144.9 KB, 10 views)

Last edited by lovenlong; 01-02-2014 at 06:01 AM.
Old 01-02-2014, 08:37 AM   #2
Location: USA

Join Date: Mar 2010
Posts: 50

You may need to filter strand-specific errors from your data. In addition, some indels cause alignment anomalies in which multiple SNPs appear in close proximity near the ends of the aligned portion of the reads containing the indel.

Many tools have filters in place to address these artifacts. Varscan includes a filtering tool that you may be able to apply to your data. See

If you see somatic mutations with support on both strands in close proximity you may want to refer to this manuscript:

An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genetics 45, 970–976 (2013). doi:10.1038/ng.2702

"...throughout cancer genomes APOBEC-mediated mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome data sets conformed to the stringent criteria indicative of an APOBEC mutation pattern. "

Last edited by m_two; 01-02-2014 at 08:46 AM.
Old 01-02-2014, 04:47 PM   #3
Location: Guangzhou, CN

Join Date: Jan 2013
Posts: 16

Hi, m_two
Thanks for your reply.

Actually, my samples were not real somatic tissues but pools of 100 individual rice plants following mutagenesis. The theoretical variant allele frequency would be very low (<1%), and we thought that treating them as matched tumor-normal pairs would improve the calling accuracy for these heterozygous SNVs.

As marked in the attached figure, Tumor_1_bwa.bam and Tumor_2_bwa.bam are the pools of mutagenized generations 1 and 2, respectively; the corresponding wild-type generations 1 and 2 are Normal_1_bwa.bam and Normal_2_bwa.bam. My wild type is a pure rice cultivar that has been self-crossed for at least 15 generations.

Due to low effective coverage, I merged the pools of the two generations (mutagenized and wild type separately) and called "somatic" SNVs with muTect, Strelka and Varscan2. My analysis pipeline was as follows:
1. Calling somatic SNVs (MAPQ≥30, Base_Q≥30):
Clean reads > bwa mapping > merged_BAMs > calling somatic with different callers > filtering
Clean reads > Stampy mapping > merged_BAMs > calling somatic with different callers > filtering

2. Eliminating mapping errors by combining the bwa and Stampy calls:
calls_of_bwa_callerA + calls_of_stampy_callerA > overlapped SNVs of callerA
calls_of_bwa_callerB + calls_of_stampy_callerB > overlapped SNVs of callerB

muTect_passed_overlapped = 519
Strelka_passed_overlapped = 55
Varscan2_passed_overlapped = 60

3. Finding concordant overlapped SNVs among multiple callers:
muTect and Varscan2 = 29 SNVs; muTect and Strelka = 43 SNVs; Varscan2 and Strelka = 23 SNVs; all 3 callers = 20 SNVs.

4. Applying hard filters to the SNVs:
(1) No more than 1 ALT read or read pair has additional mismatch/gap;
(2) No more than 3 additional mismatches/gaps exist within 50 bp either side of ALT site;
(3) ALT reads maximum MAPQ > 40;
(4) When ALT reads ≥ 4, they should not emanate exclusively from one strand;
(5) At least 2 mismatches or gaps are not within the first or last 10 bp of the ALT reads;
(6) Mismatches or gaps are not at the beginning or end of homopolymers or SSRs (n>4)
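
Three of the hard filters in step 4 are simple enough to express as a per-site check. The Python sketch below covers only filters (2), (3) and (4) and assumes the per-site numbers have already been extracted from the BAM files; the `AltSupport` record, its field names and `passes_hard_filters` are hypothetical names used for illustration, not output of any of the callers.

```python
from dataclasses import dataclass


@dataclass
class AltSupport:
    """Per-site summary of ALT-supporting reads (hypothetical record)."""
    alt_fwd: int            # ALT reads on the forward strand
    alt_rev: int            # ALT reads on the reverse strand
    max_alt_mapq: int       # highest MAPQ among ALT reads
    nearby_mismatches: int  # additional mismatches/gaps within 50 bp either side


def passes_hard_filters(site: AltSupport) -> bool:
    """Apply filters (2), (3) and (4) from the list above."""
    alt_reads = site.alt_fwd + site.alt_rev

    # (2) No more than 3 additional mismatches/gaps within 50 bp either side.
    if site.nearby_mismatches > 3:
        return False

    # (3) Maximum MAPQ of the ALT reads must exceed 40.
    if site.max_alt_mapq <= 40:
        return False

    # (4) With 4 or more ALT reads, both strands must be represented.
    if alt_reads >= 4 and (site.alt_fwd == 0 or site.alt_rev == 0):
        return False

    return True


# Toy example: five ALT reads, all on the forward strand, fails filter (4).
one_strand_only = AltSupport(alt_fwd=5, alt_rev=0, max_alt_mapq=60, nearby_mismatches=0)
both_strands = AltSupport(alt_fwd=3, alt_rev=2, max_alt_mapq=60, nearby_mismatches=1)
```

Filters (1), (5) and (6) need read-level and sequence-context information, so they are left out of this sketch.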

Finally, 35 SNVs called by at least 2 callers were selected for validation.
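
The "at least 2 callers" selection, together with the bwa/Stampy overlap from step 2, can be sketched with plain set operations. This is a minimal Python illustration on toy data; the `(chrom, pos, ref, alt)` site key and the names `overlap_between_mappers` and `concordant_calls` are assumptions made for the example.

```python
from collections import Counter


def overlap_between_mappers(bwa_calls, stampy_calls):
    """Step 2: keep only sites called from both the bwa and the Stampy alignments."""
    return set(bwa_calls) & set(stampy_calls)


def concordant_calls(callsets, min_callers=2):
    """Step 3: return sites reported by at least `min_callers` callers.

    `callsets` maps a caller name to an iterable of (chrom, pos, ref, alt) sites.
    """
    counts = Counter()
    for calls in callsets.values():
        counts.update(set(calls))  # count each site once per caller
    return {site for site, n in counts.items() if n >= min_callers}


# Toy data, not real calls.
a = ("chr1", 1000, "G", "A")
b = ("chr1", 2500, "C", "T")
c = ("chr2", 300, "T", "C")

callsets = {
    "muTect": overlap_between_mappers([a, b, c], [a, b]),
    "Strelka": overlap_between_mappers([a], [a, c]),
    "Varscan2": overlap_between_mappers([b, c], [c]),
}
selected = concordant_calls(callsets, min_callers=2)
```

Keying on ref and alt as well as position avoids counting two callers as concordant when they report different alleles at the same site.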
I'm not sure whether my analysis workflow is correct.
I would be very grateful if anybody could give me some suggestions.

Best regards.
lovenlong is offline   Reply With Quote
