View Single Post
Old 08-26-2011, 10:43 AM   #12
donthu
Junior Member
 
Location: Chicago

Join Date: Oct 2009
Posts: 6
Default

Quote:
Originally Posted by TuA View Post
Dear all,

For some reasons I get way too less SNPs called with samtools mpileup (0.1.17) that I'm supposed to get.

I work with paired-end SOLiD data (partial genome resequencing of a diploid species, ~20x coverage)

Here is the code I'm using:

Code:
samtools mpileup -u -Q 20 -f ref.fa infile.bam | bcftools view -cgbu - > outfile_raw.bcf
bcftools view outfile_raw.bcf | ./vcfutils.pl varFilter -D1000 > outfile.vcf

I tried different versions of samtools, but consistently get the same result.

For example:
For one sample I got 3250 SNPs with samtools and >21000 SNPs with the SNP tool of the CLC genomics workbench (which is close to what I expect) - although the settings of the CLC workbench were in my opinion much more stringent. When I run samtools without the '-Q 20' option, I get only slightly more SNPs (4225). All SNPs I detect with samtools I also find with the CLC workbench. However, with the CLC workbench I detect many more SNPs in addition to them (see examples below).

Settings CLC workbench SNP calling:
- min base coverage: 2x
- min frequency of variant: 15
- max coverage: 1000x
- min variant count: 2
- sufficient variant count: 4 (SNP called also if frequency < 15)

Attached are two example files
- sample output CLC workbench
- sample output samtools mpileup for same genomic region



Any suggestions why this discrepancy is so large?

Many thanks and best wishes
I had similar problem. When I carefully checked my results I found that samtools had output only SNPs from one chromosome. It did not output SNPs from all other chromosomes. I reran mpileup command similar to yours after adding -I. Then I got ten times more SNPs
donthu is offline   Reply With Quote