#1 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Hello,
I am trying to detect mutations (SNPs, insertions and deletions) from DNA-seq data. I have both reference and tumor samples. I am convinced that comparing both samples simultaneously is more rigorous than a "subtraction" method based on independent analyses, so I'm interested in tools that allow simultaneous comparison. I have tried VarScan but there are important bugs in it. I have also found JointSNVMix combined with mutationSeq. Could you suggest other tools for this problem?
Thanks, Jane
#2 |
Member
Location: Saint Louis Join Date: Jul 2011
Posts: 26
You might try:
SomaticSniper (http://gmt.genome.wustl.edu/somatic-sniper/current/), but it will only report SNVs.
The GATK's somatic indel detector (http://www.broadinstitute.org/gsa/wi...Indel_Detector).
The samtools package's mpileup command plus bcftools in "paired mode" (http://samtools.sourceforge.net/samtools.shtml).
#3 |
Member
Location: St. Louis Join Date: Mar 2009
Posts: 62
Jane M,
Thank you for your message. I must respectfully disagree with your statement that VarScan "has important bugs in it." There are dozens of groups using it with great success to detect variants in humans and model organisms, and to call somatic mutations in cancer datasets.
However, I did realize that a few of your questions from this thread were outstanding, and I've done my best to answer them: http://seqanswers.com/forums/showthread.php?p=67930
I would also like to recommend another tool developed at our institute for somatic mutation calling, SomaticSniper: http://gmt.genome.wustl.edu/somatic-sniper/
#4 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Thank you both for your answers.
Dan, thank you for answering my questions in the other thread. I must say that my main issue isn't solved: http://seqanswers.com/forums/showthread.php?t=16599. I'm sure that my "missing reads" have not been filtered out because of low mapping or base quality. And as I pointed out, other people have the same problem. Could the problem come from the model not being suited to all kinds of data? Or from different versions of the JDK or JVM? Or from libraries or machine configuration?
#5 |
Junior Member
Location: Maryland Join Date: Jul 2009
Posts: 5
Hi,
Check out Bambino. It reports both SNVs and indels, and you can then annotate the calls with ANNOVAR. https://cgwb.nci.nih.gov/goldenPath/...ion/index.html
Last edited by patternist; 03-18-2012 at 09:12 AM.
#6 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Thanks patternist, I didn't know about Bambino! I have found a publication ("Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format", from January 2011), but the model is not described in it.
Do you know where I can find details about what it actually does?
#7 |
Member
Location: Germany Join Date: Jan 2011
Posts: 14
Hi,
VarScan2 has been published. Try it out and share your experience. It should be better than the first version (mentioned in the paper) and should also outperform SomaticSniper.
#8 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
I have tried VarScan2 (2.2.8) in both modes (simple and somatic). What I think of it is described at the beginning of this thread and here: http://seqanswers.com/forums/showthread.php?t=16599.
#9 |
Member
Location: Germany Join Date: Jan 2011
Posts: 14
Hi,
I found this info in the VarScan 2 description:
Base alignment quality (BAQ) computation is turned on by default. BAQ is a phred-like score representing the probability that a read base is mis-aligned; it lowers the base quality score of mismatches that are near indels. This is to help rule out false positive SNP calls due to alignment artifacts near small indels. There have been recent suggestions, however, that BAQ may be too strict and cause real SNPs to be missed. Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV, or somatic mutations were missed when mpileup was used instead of pileup. These issues are almost always due to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen in IGV, but it’s below VarScan’s default base quality threshold. You can disable BAQ with the -B parameter, or perform a more sensitive BAQ calculation with -E. I’ve heard that the latter option will be turned on by default in the next version of SAMtools.
I hope it helps.
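To make the quoted explanation concrete, here is a minimal Python sketch of how a BAQ downgrade can hide reads from a caller that applies a minimum base quality. The threshold of 15, the quality values, and the function names are illustrative assumptions, not VarScan or SAMtools code, and BAQ is modelled simply as keeping the smaller of the two scores.

```python
# Illustrative only: MIN_BASE_QUAL and every value below are assumptions,
# not taken from the VarScan or SAMtools source code.
MIN_BASE_QUAL = 15


def count_supporting_reads(base_quals, min_qual=MIN_BASE_QUAL):
    """Count reads whose base quality at the site passes the threshold."""
    return sum(1 for q in base_quals if q >= min_qual)


def apply_baq(base_quals, baq_scores):
    """Model BAQ as a cap: each base keeps the smaller of its two scores."""
    return [min(q, b) for q, b in zip(base_quals, baq_scores)]


# Five variant-supporting reads with high sequencer-reported qualities.
original = [38, 40, 37, 39, 36]
# Hypothetical BAQ scores for a site close to an indel.
baq = [0, 1, 40, 0, 1]

adjusted = apply_baq(original, baq)
print(count_supporting_reads(original))  # 5 reads counted when BAQ is off
print(count_supporting_reads(adjusted))  # 1 read counted after the downgrade
```

A genome browser that shows the original qualities would still display all five reads, which is the kind of disagreement described in the quoted text.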
#10 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Hi,
Thank you for the info, I hadn't read it. Well, I am rather in this case: "Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV". From what I read, I could solve the problem with the -B or -E parameter.
Could you please tell me where you got this info? I am wondering, since Dan Koboldt, who is the VarScan maintainer, didn't suggest this to me a few days ago... Was the solution that you found proposed by the author?
Last edited by Jane M; 03-22-2012 at 07:08 AM.
#11 |
Member
Location: Germany Join Date: Jan 2011
Posts: 14
Hi,
I found it here: http://www.massgenomics.org/2012/03/...s-mpileup.html
It depends on the samtools parameters, which could be why Koboldt didn't spot it.
Regards, Air
#12 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Hi airtime,
Thanks for the link. Yesterday I reran samtools with the -B option and then VarScan2, and all the "bugs" (wrong read counts) that I had noticed are now gone! So thank you very much for the info! I had been struggling with this issue for 2-3 months and you solved it.
I must admit that I don't yet understand why this option changes the results so much. For example, at one position, I have:
Quote:
Now, I should apologize to Dan Koboldt... The bugs were not in VarScan, sorry! Dan, you told me that dozens of groups are using VarScan to detect variants. Maybe you could warn them about this issue, because those who are not using the -B or -E option for samtools are probably working with incorrect data.
The last issue that I'm experiencing with VarScan2 is the strand filter. I am running it this way:
Quote:
Quote:
Quote:
Last edited by Jane M; 03-23-2012 at 02:54 AM.
#13 |
Member
Location: St. Louis Join Date: Mar 2009
Posts: 62
Jane,
Thank you for this detailed post, and for following up on this strand question. Your site is homozygous in the tumor (due to LOH) but VarScan's strand filter currently only works on sites that are heterozygous in the tumor. This is because it compares the strand representation of the reference allele to the strand representation of the variant allele. If no reference alleles are seen in the tumor, that comparison can't be made.
Your comment has me thinking, however, that the strand filtering capabilities in VarScan need some improvement. I'll work on that for the next release. In the meantime, you might try the filtering strategy we outlined in the VarScan 2 paper, in which you run bam-readcount on all sites and then process the results with the VarScan 2 accessory script fpfilter.pl.
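The reason the comparison breaks down at a homozygous site can be sketched in a few lines of Python. This is only an illustration of the idea described above, with made-up read counts and hypothetical function names; it is not VarScan's implementation.

```python
def strand_fraction(forward, reverse):
    """Fraction of reads on the forward strand, or None if there are no reads."""
    total = forward + reverse
    if total == 0:
        return None
    return forward / total


def strand_comparison(ref_fwd, ref_rev, var_fwd, var_rev):
    """Return (reference fraction, variant fraction), or None if undefined."""
    ref_frac = strand_fraction(ref_fwd, ref_rev)
    var_frac = strand_fraction(var_fwd, var_rev)
    if ref_frac is None or var_frac is None:
        # One allele has no reads, so there is nothing to compare against.
        return None
    return ref_frac, var_frac


# Heterozygous tumor site: both alleles are observed, so a comparison exists.
print(strand_comparison(20, 20, 15, 1))
# Homozygous tumor site (e.g. after LOH): no reference reads at all.
print(strand_comparison(0, 0, 30, 2))
```

In the second call the function returns None, which corresponds to a site where a filter built on this comparison has nothing to work with.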
#14 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
For the record, I observed the same thing with mpileup and BAQ calculations on a few occasions. Specifically, I observed that some SNPs that were called fine with pileup were vanishing in mpileup, including some which had been verified with Sanger sequencing. When I looked at the pileup files made by mpileup and compared them to the .sam files, it was clear that mpileup was representing the quality scores of the alternate letters as being almost 0, while in the .sam the quality scores were high. The older pileup was faithfully carrying over the quality scores into the pileup output file. After a little investigating, I saw that the BAQ calculations, on by default in mpileup, were responsible. When I disabled them with -B, the quality scores in the pileup output files matched the quality scores in the .sam files, and the SNPs were callable.
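A comparison like this can be done by decoding the quality characters by hand. The sketch below assumes the standard Phred+33 encoding in both files; the two quality strings and the helper names are made up for illustration and do not come from real data.

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 encoded quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def downgraded_positions(sam_quals, pileup_quals):
    """Indices where the pileup quality is lower than the .sam quality."""
    return [i for i, (s, p) in enumerate(zip(sam_quals, pileup_quals)) if p < s]


# Made-up example: qualities of four alternate-allele bases at one site.
sam_quals = decode_phred33("IIHG")     # as stored in the .sam file
pileup_quals = decode_phred33('!"!!')  # as written by mpileup with BAQ on

print(sam_quals)     # [40, 40, 39, 38]
print(pileup_quals)  # [0, 1, 0, 0]
print(downgraded_positions(sam_quals, pileup_quals))  # [0, 1, 2, 3]
```

If the two lists match after rerunning mpileup with -B, the difference came from the BAQ adjustment rather than from the data.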
#15 |
Senior Member
Location: Paris Join Date: Aug 2011
Posts: 239
Thank you for the explanation, Dan. Do you do a FET (Fisher's exact test) for the strand filter on sites that are heterozygous in the tumor? I guess that you could take the number of reads supporting the reference (on the forward and reverse strands) in the tumor sample as theoretical counts, and the number of reads supporting the variant (on the forward and reverse strands) in the tumor sample as observed counts...
I don't have enough experience yet, but I assume that handling only the sites that are heterozygous in the tumor means that only about half of the data gets filtered? Or is it known that tumors show more heterozygous sites than homozygous mutated sites? I have developed a basic filter to handle the "other half" of the cases, but for now it's not very good.
When do you expect the next release to be ready? Any idea? I will try bam-readcount and fpfilter.pl to filter out more false positives!
Thanks, Jane
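For reference, a two-sided Fisher's exact test on strand counts could look like the Python sketch below. Whether VarScan does this is exactly the open question above, so this is only an assumption about what such a test might be. Note that a FET treats the four counts as a 2x2 table (allele by strand) rather than as theoretical versus observed counts. The read counts in the example are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Here the rows are (reference, variant) and the columns are
    (forward, reverse) read counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x reads in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Made-up counts: reference 20 forward / 20 reverse, variant 15 forward / 1 reverse.
p_value = fisher_exact_two_sided(20, 20, 15, 1)
print(f"strand bias p-value: {p_value:.4g}")
```

A small p-value would suggest that the variant reads are distributed across strands differently from the reference reads. The test still needs reads for both alleles, so it does not help at sites with no reference reads in the tumor.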
#16 |
Member
Location: spain Join Date: Feb 2011
Posts: 60
For the record, VarScan really did have a bug when filtering bases by quality. It occurred regardless of the mpileup recalibration, since I used dummy pileup files that I created myself. Actually, samtools also gets base quality filtering wrong (I do not know whether it has been fixed; no one answered me).
Anyway, the latest version of VarScan states that this was fixed. I have to thank Dan Koboldt for his work and for updating his tool in response to user feedback! I will soon use VarScan2; I am really interested in the somatic SNP false positive filter and in the detection of copy number changes from exome-seq data. I will report back on my experience!
#17 |
Junior Member
Location: Nashville Join Date: Oct 2013
Posts: 1
This paper (at http://genomemedicine.com/content/5/10/91/) compares popular mutation callers, including VarScan 2 and MuTect, using validated data.