A heretically simple approach to variant calling
Hi everyone,
We had a look at the distribution of heterozygous allele frequencies in NGS datasets and found that their variance is larger than a binomial distribution would predict (http://www.ncbi.nlm.nih.gov/pubmed?term=22127862). For variant callers, this means that a binomial prior distribution is not the right choice and might lead to false-negative calls. We also found that, on high-quality data, a simple frequency classifier (heterozygous if the site is covered by more than 20 reads and the variant allele frequency is between 14% and 86%) is more sensitive at comparable specificity than most standard calling tools run with their default settings.
Is anyone aware of a fast tool that can apply such a frequency filter directly to a .bam file?
Cheers,
Peter |
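For reference, here is what a pure binomial model predicts for a heterozygous site, assuming no sequencing error or mapping bias: the variant read count X at depth n is Binomial(n, 1/2). The worked numbers use n = 21, the smallest depth that satisfies "more than 20 reads", to show how rarely such a site would fall below the 14% cutoff under that model:

```latex
\begin{aligned}
X &\sim \mathrm{Binomial}\!\left(n, \tfrac{1}{2}\right), \qquad
\operatorname{Var}(X) = n \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{n}{4}, \qquad
\operatorname{sd}\!\left(\tfrac{X}{n}\right) = \frac{1}{2\sqrt{n}} \\
n = 21 :&\quad \operatorname{sd}\!\left(\tfrac{X}{21}\right) = \frac{1}{2\sqrt{21}} \approx 0.109, \qquad
\frac{0.50 - 0.14}{0.109} \approx 3.3 \\
&\quad P\!\left(\tfrac{X}{21} < 0.14\right) = P(X \le 2)
= \frac{\binom{21}{0} + \binom{21}{1} + \binom{21}{2}}{2^{21}}
= \frac{232}{2\,097\,152} \approx 1.1 \times 10^{-4}
\end{aligned}
```

By symmetry the same probability applies above 86% (X of at least 19), so under the binomial model a heterozygous site at this depth would land outside the 14%-86% window only about twice in 10,000 cases. Any extra spread in real allele fractions, like the overdispersion described above, makes such outliers more frequent than this figure suggests.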
You can ask samtools mpileup to print the nucleotide pileup for each position in a BAM file. Parsing that output with a script should be fairly simple.
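A minimal sketch of such a script, assuming single-sample samtools mpileup text output (tab-separated columns: chromosome, position, reference base, depth, read bases, base qualities) and the frequency rule from the first post (more than 20 reads, variant fraction between 14% and 86%). The function name het_sites is hypothetical. It counts "." and "," as reference matches and A/C/G/T/N in either case as mismatches, skips read-start markers ("^" plus its mapping-quality character) and indel sequences ("+N..." / "-N..."), and ignores "$", "*" and other placeholders. Base-quality and mapping-quality filtering are deliberately left out.

```shell
#!/bin/sh
# het_sites (hypothetical helper): read single-sample mpileup text on stdin and
# print chrom, pos, depth, alt_count, alt_fraction for sites passing the rule
# depth > 20 and 0.14 <= alt_fraction <= 0.86 (inclusive bounds are an assumption).
het_sites() {
    awk -v min_depth=20 -v lo=0.14 -v hi=0.86 '
    BEGIN { FS = "\t" }
    {
        s = $5; n = length(s); ref = 0; alt = 0; i = 1
        while (i <= n) {
            c = substr(s, i, 1)
            if (c == "^") { i += 2; continue }     # read start: skip marker and mapping-quality char
            if (c == "+" || c == "-") {            # indel: +N<seq> or -N<seq>
                j = i + 1; len = 0
                while (j <= n && substr(s, j, 1) ~ /[0-9]/) {
                    len = len * 10 + substr(s, j, 1); j++
                }
                i = j + len; continue              # jump over the inserted/deleted bases
            }
            if (c == "." || c == ",") ref++        # base matches the reference
            else if (c ~ /[ACGTNacgtn]/) alt++     # mismatching base
            i++                                    # end-of-read and placeholder chars fall through
        }
        depth = ref + alt
        if (depth <= min_depth + 0) next           # rule requires more than min_depth reads
        frac = alt / depth
        if (frac >= lo + 0 && frac <= hi + 0)
            printf "%s\t%s\t%d\t%d\t%.3f\n", $1, $2, depth, alt, frac
    }'
}
```

Usage: samtools mpileup -f reference.fasta myData.bam | het_sites > het_sites.tsv. Depth is recomputed from the counted bases rather than taken from column 4, so deletion placeholders do not dilute the variant fraction; whether that matches the classifier used in the paper is an assumption.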
|
The samtools mpileup output can be piped into VarScan to apply a coverage and frequency filter:
samtools mpileup -f reference.fasta myData.bam | java -jar VarScan.v2.2.jar pileup2snp --min-coverage 20 --min-var-freq 0.14
See: http://varscan.sourceforge.net/using-varscan.html |
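Note that --min-var-freq only sets the lower bound of the 14%-86% window, so variants above 86% are still reported. A minimal post-filter for the upper bound might look like this. The helper name max_var_freq is hypothetical, and the sketch assumes VarScan writes a tab-separated table to standard output with a header row containing a VarFreq column whose values look like 45.5%.

```shell
#!/bin/sh
# max_var_freq (hypothetical helper): keep the header plus rows whose VarFreq
# is at most MAX percent (default 86). Assumes a tab-separated header row with
# a column named "VarFreq" holding values such as "45.5%".
max_var_freq() {
    awk -v max="${1:-86}" '
    BEGIN { FS = "\t" }
    NR == 1 {
        for (k = 1; k <= NF; k++) if ($k == "VarFreq") col = k
        if (!col) { print "no VarFreq column in header" > "/dev/stderr"; exit 1 }
        print; next                          # pass the header through unchanged
    }
    {
        f = $col; sub(/%$/, "", f)           # "45.5%" -> "45.5"
        if (f + 0 <= max + 0) print          # row is printed unmodified
    }'
}
```

Usage: samtools mpileup -f reference.fasta myData.bam | java -jar VarScan.v2.2.jar pileup2snp --min-coverage 20 --min-var-freq 0.14 | max_var_freq 86. Looking the column up by name in the header, rather than hard-coding its position, keeps the filter working if the column order differs between VarScan versions.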
Very nice paper.
Sometimes simpler is better. |
This is awesome. Moving away from big, fancy, well-established tools to something like the "14-86%" rule is scary, though.
|