SEQanswers

Old 03-16-2012, 06:17 AM   #1
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default Detection of somatic mutations in normal & tumour paired NGS data

Hello,

I am trying to detect mutations (SNPs, insertions and deletions) from DNA-seq data. I have both reference and tumoral samples.
I am convinced that a simultaneous comparison of both samples is more rigorous than a "subtraction" method based on independent analyses. Thus, I'm interested in tools allowing simultaneous comparison.

I have tried VarScan but there are important bugs in it.
I have found JointSNVMix combined with mutationSeq.

Could you suggest other tools for this problem?

Thanks,
Jane
Old 03-16-2012, 09:38 AM   #2
ernfrid
Member
 
Location: Saint Louis

Join Date: Jul 2011
Posts: 26
Default

You might try:
SomaticSniper (http://gmt.genome.wustl.edu/somatic-sniper/current/), but it will only report SNVs.

The GATK's somatic indel detector (http://www.broadinstitute.org/gsa/wi...Indel_Detector).

The samtools package's mpileup command plus bcftools in "paired mode" ( http://samtools.sourceforge.net/samtools.shtml)
Old 03-16-2012, 10:34 AM   #3
dkoboldt
Member
 
Location: St. Louis

Join Date: Mar 2009
Posts: 62
Default

Jane M,

Thank you for your message. I must respectfully disagree with your statement that VarScan "has important bugs in it." Dozens of groups are using it with great success to detect variants in humans and model organisms, and to call somatic mutations in cancer datasets.

However, I did realize that a few of your questions from this thread were outstanding, and I've done my best to answer them:

http://seqanswers.com/forums/showthread.php?p=67930

I would like to also recommend another tool developed at our institute for somatic mutation calling, SomaticSniper:

http://gmt.genome.wustl.edu/somatic-sniper/
Old 03-16-2012, 01:51 PM   #4
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Thank you both for your answers.

Dan, thank you for your answers to my questions on the other topic.
I must say that my main question/issue isn't solved: http://seqanswers.com/forums/showthread.php?t=16599.
I'm sure that my "missing reads" have not been filtered out due to low mapping or base quality. And as I specified, some other people have the same problem.

Could the problem come from the fact that the model isn't suited to all kinds of data? Or from different versions of the JDK or JVM? Or from libraries or machine configuration?...
Old 03-17-2012, 06:50 PM   #5
patternist
Junior Member
 
Location: Maryland

Join Date: Jul 2009
Posts: 5
Default

Hi,

Check out Bambino: it reports both SNVs and indels, and you can then annotate the calls with ANNOVAR.

https://cgwb.nci.nih.gov/goldenPath/...ion/index.html

Last edited by patternist; 03-18-2012 at 09:12 AM.
Old 03-18-2012, 12:02 PM   #6
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Thanks patternist, I didn't know about Bambino! I have found a publication (Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format, from January 2011), but the model is not described in it.
Do you know where I can find details about what it does?
Old 03-20-2012, 06:21 AM   #7
airtime
Member
 
Location: Germany

Join Date: Jan 2011
Posts: 14
Default

Hi,

VarScan2 has been published; try it out and share your experience.
It should be better than the first version (mentioned in the paper) and should also outperform SomaticSniper.
Old 03-20-2012, 08:50 AM   #8
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

I have tried VarScan2 (2.2.8) in both modes (simple and somatic) and what I think is at the beginning of the topic and here: http://seqanswers.com/forums/showthread.php?t=16599.
Old 03-21-2012, 02:12 AM   #9
airtime
Member
 
Location: Germany

Join Date: Jan 2011
Posts: 14
Default

Hi,

I found this info in the VarScan 2 description:

Base alignment quality (BAQ) computation is turned on by default. BAQ is a phred-like score representing the probability that a read base is mis-aligned; it lowers the base quality score of mismatches that are near indels. This is to help rule out false positive SNP calls due to alignment artifacts near small indels. There have been recent suggestions, however, that BAQ may be too strict and cause real SNPs to be missed. Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV, or somatic mutations were missed when mpileup was used instead of pileup. These issues are almost always due to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen in IGV, but it’s below VarScan’s default base quality threshold. You can disable BAQ with the -B parameter, or perform a more sensitive BAQ calculation with -E. I’ve heard that the latter option will be turned on by default in the next version of SAMtools.

I hope it helps.
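
The effect described in the quoted passage can be sketched in a few lines of Python. This is only an illustration and not VarScan's or SAMtools' code: the quality values and the threshold of 15 are assumed for the example, and `phred_to_error_prob` and `count_supporting_reads` are hypothetical helper names.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def count_supporting_reads(base_quals, min_qual):
    """Count the bases whose quality meets the caller's minimum threshold."""
    return sum(1 for q in base_quals if q >= min_qual)


# Assumed example: five variant-supporting bases close to a small indel.
raw_quals = [38, 37, 40, 35, 39]  # base qualities as stored in the BAM
baq_quals = [0, 1, 0, 1, 38]      # hypothetical qualities after the BAQ adjustment

MIN_QUAL = 15  # assumed base-quality threshold, for illustration only

print(count_supporting_reads(raw_quals, MIN_QUAL))  # 5
print(count_supporting_reads(baq_quals, MIN_QUAL))  # 1
```

A viewer that shows the stored qualities would display all five reads, while a caller reading the adjusted qualities would count only one, which matches the kind of disagreement described in the quote.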
Old 03-22-2012, 12:53 AM   #10
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Hi,
Thank you for the info, I hadn't read it.
Well, my case is rather this one: "Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV".
From what I read, I could solve the problem with the -B or -E parameters.
Could you please tell me where you got this info? I am wondering, since Dan Koboldt, who is the VarScan maintainer, didn't suggest it to me a few days ago... Was the solution that you found proposed by the author?

Last edited by Jane M; 03-22-2012 at 07:08 AM.
Old 03-22-2012, 11:06 PM   #11
airtime
Member
 
Location: Germany

Join Date: Jan 2011
Posts: 14
Default

Hi,

I found it here:
http://www.massgenomics.org/2012/03/...s-mpileup.html

It depends on the samtools parameters; this could be the reason why Koboldt didn't spot it.

regards

Air
Old 03-23-2012, 02:50 AM   #12
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Hi airtime,

Thanks for the link.
Yesterday, I reran samtools with the -B option and then VarScan2, and all the "bugs" (wrong read counts) that I had noticed are gone: the counts are now correct!

So thank you very much for the info! I had been experiencing this issue for 2-3 months and you solved it. Thanks a lot!

I must admit that I don't yet understand why this option can change the results so much.
For example, at one position, I have:
Quote:
In IGV: 185 (normal sample, reference), 165 (normal sample, variant), 8 (tumoral sample, reference), 359 (tumoral sample, variant)
In VarScan2 (without -B option in samtools): 183 (normal sample, reference), 4 (normal sample, variant), 8 (tumoral sample, reference), 14 (tumoral sample, variant)
In VarScan2 (with -B option in samtools): 184 (normal sample, reference), 164 (normal sample, variant), 8 (tumoral sample, reference), 359 (tumoral sample, variant)
I am much more confident in the results now.
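
For reference, the variant allele frequencies implied by these counts can be worked out directly. The sketch below uses only the numbers quoted in this post; `vaf` is a hypothetical helper name.

```python
def vaf(ref_reads, var_reads):
    """Variant allele frequency = variant reads / (reference + variant reads)."""
    total = ref_reads + var_reads
    return var_reads / total if total else 0.0


# (reference reads, variant reads) for each sample, as quoted in the post
counts = {
    "IGV": {"normal": (185, 165), "tumor": (8, 359)},
    "VarScan2 without -B": {"normal": (183, 4), "tumor": (8, 14)},
    "VarScan2 with -B": {"normal": (184, 164), "tumor": (8, 359)},
}

for source, samples in counts.items():
    normal = vaf(*samples["normal"])
    tumor = vaf(*samples["tumor"])
    print(f"{source}: normal VAF {normal:.1%}, tumor VAF {tumor:.1%}")
```

Without -B the normal sample appears to carry the variant in about 2% of reads, whereas IGV and the -B run both put it at about 47%, so the same site could be interpreted very differently depending on the mpileup settings.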

Now, I should apologize to Dan Koboldt... The bugs were not in VarScan, sorry!
Dan, you told me that dozens of groups are using VarScan to detect variants. Maybe you could warn them about this issue, because those who are not using the -B or -E option in samtools are probably working with incorrect read counts.

The last issue that I'm experiencing with VarScan2 is the strand filter. I am running it this way:
Quote:
java -Xmx10g -jar VarScan.v2.2.8.jar somatic /data/fibros_convertedAB_sorted.pileup /data/296_convertedAB_sorted.pileup --output-snp /data/output_varscan_AB.snp --output-indel /data/output_varscan_AB.indel --min-coverage 10 --min-coverage-normal 10 --min-coverage-tumor 10 --min-var-freq 0.1 --min-freq-for-hom 0.75 --normal-purity 1 --tumor-purity 1 --p-value 0.01 --somatic-p-value 0.01 --strand-filter 1 --min-avg-qual 25 --min-strands2 2 --min-reads2 3
then SomaticFilter:
Quote:
java -Xmx20g -jar VarScan.v2.2.8.jar somaticFilter /data/output_varscan_AB.snp --min-strands2 2 --min-avg-qual 25 --min-var-freq 0.1 --p-value 0.05 --indel-file /data/output_varscan_AB.indel --output-file /data/output_somaticFilter_varscan_AB.snp
But I get such an output:
Quote:
chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus
chr4 114260538 C T 35 40 53,33% Y 0 86 100% T LOH 1.0 9.50234823641282E-15 0 0 0 86
Why has this position not been filtered out by "--strand-filter 1"? To me, there is clearly a strand bias here...

Last edited by Jane M; 03-23-2012 at 02:54 AM.
Old 03-28-2012, 08:20 AM   #13
dkoboldt
Member
 
Location: St. Louis

Join Date: Mar 2009
Posts: 62
Default

Jane,

Thank you for this detailed post, and for following up on this strand question. Your site is homozygous in the tumor (due to LOH) but VarScan's strand filter currently only works on sites that are heterozygous in the tumor.

This is because it compares the strand representation of the reference allele to the strand representation of the variant allele. If no reference alleles are seen in the tumor, that comparison can't be made.

Your comment has me thinking, however, that the strand filtering capabilities in VarScan need some improvement. I'll work on that for the next release.

In the meantime, you might try the filtering strategy we outlined in the VarScan 2 paper, in which you run bam-readcount on all sites and then process the results with the VarScan 2 accessory script fpfilter.pl.
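
The thread does not say which statistic VarScan uses for this comparison, so the following is only an illustrative sketch: it applies a two-sided Fisher's exact test to the plus/minus strand counts of the reference and variant alleles, using the `tumor_reads1_*` and `tumor_reads2_*` columns from the output above. The function name and the contrasting heterozygous counts are assumptions made for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    if n == 0:
        return 1.0

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Rows: reference allele, variant allele. Columns: plus strand, minus strand.
# Site from the post (chr4:114260538): no reference reads in the tumor.
loh_site = fisher_exact_two_sided(0, 0, 0, 86)
print(loh_site)  # 1.0

# Hypothetical heterozygous site with the variant seen on one strand only.
het_site = fisher_exact_two_sided(20, 22, 0, 30)
print(het_site < 0.001)  # True
```

With an empty reference row only one table is possible, so the p-value is 1 however skewed the variant reads are. That is consistent with the explanation above that the comparison cannot be made when no reference alleles are seen in the tumor.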
Old 03-28-2012, 12:53 PM   #14
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

For the record, I observed the same thing with mpileup and BAQ calculations on a few occasions. Specifically, I observed that some SNPs that were called fine with pileup were vanishing in mpileup, including some which had been verified with Sanger sequencing. When I looked at the pileup files made by mpileup and compared them to the .sam files, it was clear that mpileup was representing the quality scores of the alternate letters as being almost 0, while in the .sam the quality scores were high. The older pileup was faithfully carrying over the quality scores into the pileup output file. After a little investigating, I saw that the BAQ calculations, on by default in mpileup, were responsible. When I disabled them with -B, the quality scores in the pileup output files matched the quality scores in the .sam files, and the SNPs were callable.
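
A check of this kind can be sketched as follows. The snippet decodes the base-quality column of a pileup line, assuming the usual six-column layout (chrom, position, reference base, depth, read bases, qualities) and Phred+33 encoding. The two example lines are hypothetical and the helper names are not part of samtools.

```python
def decode_phred33(qual_string):
    """Decode an ASCII Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def pileup_base_quals(pileup_line):
    """Return (chrom, pos, quals) for one pileup line.

    Assumes tab-separated columns: chrom, pos, ref, depth, bases, quals.
    """
    fields = pileup_line.rstrip("\n").split("\t")
    chrom, pos, quals = fields[0], int(fields[1]), fields[5]
    return chrom, pos, decode_phred33(quals)


# Hypothetical lines for the same site, produced with and without BAQ.
with_baq = 'chr1\t1000\tC\t4\t..TT\tII!"'
without_baq = 'chr1\t1000\tC\t4\t..TT\tIIHG'

print(pileup_base_quals(with_baq))     # ('chr1', 1000, [40, 40, 0, 1])
print(pileup_base_quals(without_baq))  # ('chr1', 1000, [40, 40, 39, 38])
```

In this made-up example the two variant bases drop to qualities 0 and 1 when BAQ is applied, which is the pattern described in the post.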
Old 03-29-2012, 07:21 AM   #15
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Thank you for the explanation, Dan. Do you do a FET (Fisher's exact test) for the strand filter on sites that are heterozygous in the tumor? I guess that you can take the number of reads supporting the reference (on the forward and reverse strands) in the tumoral sample as theoretical counts, and the number of reads supporting the variant (on the forward and reverse strands) in the tumoral sample as observed counts...

I don't have enough experience yet, but I assume that handling only the sites that are heterozygous in the tumor means that only about half of the data gets filtered? Or is it known that tumours show more heterozygous sites than homozygous mutated sites?

I have developed a basic filter to handle the "other half" of the cases, but for now it's not very good. When do you think the next release will be ready? Any idea?

I will try bam-readcount and fpfilter.pl to filter out more false positives!
Thanks,
Jane
Old 06-22-2012, 12:18 PM   #16
david.tamborero
Member
 
Location: spain

Join Date: Feb 2011
Posts: 60
Default

Just for the record, VarScan really did have a bug when filtering bases according to their quality. This happened regardless of the mpileup recalibration, since I used dummy pileup files that I created myself as input. Actually, samtools also goes wrong for base quality filtering (I do not know if they fixed it; no one answered me).

Anyway, the latest version of VarScan states that this was fixed. I have to thank Dan Koboldt for his work and for updating his tool according to users' feedback!

I will soon be using VarScan2; I am really interested in the somatic SNP false-positive filter and in the detection of copy number changes from exome-seq data. I will report back on my experience!
Old 10-11-2013, 09:49 AM   #17
Kingdom
Junior Member
 
Location: Nashville

Join Date: Oct 2013
Posts: 1
Default a comparison of mutation callers

This paper (at http://genomemedicine.com/content/5/10/91/) compares popular mutation callers, including VarScan 2 and MuTect, using validated data.