SEQanswers

krawitz 12-21-2011 12:16 AM

A heretically simple approach to variant calling
 
Hi everyone,

We had a look at the distribution of heterozygous allele frequencies in NGS datasets and found that their variance is larger than expected under a binomial distribution (http://www.ncbi.nlm.nih.gov/pubmed?term=22127862). For any variant caller, this means that a binomial prior is not the right choice and may lead to false-negative calls. We also found that, for high-quality data, a simple frequency classifier (heterozygous if covered by more than 20 reads and the variant allele frequency is between 14% and 86%) is more sensitive at comparable specificity than the default settings of most standard calling tools.
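To make the rule concrete, here is a minimal Python sketch of the frequency classifier described above. The function name and parameter names are hypothetical, and treating the 14% and 86% bounds as inclusive is an assumption, since the post does not say how the boundaries are handled.

```python
def is_heterozygous(depth, variant_reads,
                    min_depth=20, min_freq=0.14, max_freq=0.86):
    """Return True if a site passes the simple frequency rule.

    depth         -- total number of reads covering the site
    variant_reads -- number of those reads carrying the variant allele
    The site must be covered by MORE than min_depth reads, and the variant
    allele frequency must lie in [min_freq, max_freq] (inclusive bounds are
    an assumption, not something stated in the thread).
    """
    if depth <= min_depth:
        return False
    freq = variant_reads / depth
    return min_freq <= freq <= max_freq


if __name__ == "__main__":
    # (depth, variant_reads) pairs chosen only for illustration
    for depth, var in [(30, 15), (20, 10), (100, 10), (100, 90)]:
        print(depth, var, is_heterozygous(depth, var))
```

The thresholds are keyword arguments so that the same function can be tried with other cut-offs on other datasets.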

Is anyone aware of a fast tool that can apply such a frequency filter directly to a .bam file?

cheers,

peter

Hena 12-21-2011 03:58 AM

You can ask samtools mpileup to print the nucleotide pileup for each position in the BAM file. Parsing that output with a script should be fairly simple.
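As a rough illustration of such a script, the Python sketch below reads single-sample `samtools mpileup` text output (tab-separated columns: chromosome, position, reference base, depth, read bases, qualities) and applies the 14-86% rule from the first post. The function names are hypothetical. Counting only A/C/G/T/N bases towards coverage, ignoring indels, and using the most frequent non-reference base as the variant allele are simplifying assumptions.

```python
import sys
from collections import Counter


def count_bases(ref_base, bases):
    """Count base calls in the read-bases column of one mpileup line."""
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":            # read start: skip marker and mapping-quality char
            i += 2
            continue
        if c in "+-":           # indel: skip the length digits and the indel bases
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if c in ".,":           # match to the reference (forward / reverse strand)
            counts[ref_base.upper()] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        # '$' (read end), '*' (deleted base) and other symbols are ignored
        i += 1
    return counts


def call_het(line, min_depth=20, min_freq=0.14, max_freq=0.86):
    """Return (chrom, pos, ref, alt, depth, freq) if the site passes, else None."""
    chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    ref = ref.upper()
    counts = count_bases(ref, bases)
    depth = sum(counts.values())
    alts = [(b, c) for b, c in counts.items() if b not in (ref, "N")]
    if depth <= min_depth or not alts:
        return None
    alt, alt_count = max(alts, key=lambda bc: bc[1])
    freq = alt_count / depth
    if min_freq <= freq <= max_freq:
        return chrom, int(pos), ref, alt, depth, freq
    return None


if __name__ == "__main__":
    # Reads mpileup text from standard input, one site per line.
    for line in sys.stdin:
        hit = call_het(line)
        if hit:
            print("\t".join(map(str, hit)))
```

Saved under a name of your choice, the script would read the mpileup stream on standard input, so it can sit at the end of a pipe. Pure Python will be slow on whole genomes, so this is meant for checking the logic rather than for production use.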

krawitz 02-04-2012 05:07 AM

The samtools mpileup output can be piped into VarScan to apply a coverage and frequency filter:
samtools mpileup -f reference.fasta myData.bam | java -jar VarScan.v2.2.jar pileup2snp --min-coverage 20 --min-var-freq 0.14
See: http://varscan.sourceforge.net/using-varscan.html

NGSfan 04-24-2012 09:10 AM

Very nice paper.

Sometimes simpler is better.

Rocketknight 04-25-2012 04:01 AM

This is awesome. Moving away from big fancy well-established tools to something like the "14-86%" rule is scary though.


All times are GMT -8.
