Dear all,
For some reasons I get way too less SNPs called with samtools mpileup (0.1.17) that I'm supposed to get.
I work with paired-end SOLiD data (partial genome resequencing of a diploid species, ~20x coverage)
Here is the code I'm using:
I tried different versions of samtools, but consistently get the same result.
For example:
For one sample I got 3250 SNPs with samtools and >21000 SNPs with the SNP tool of the CLC genomics workbench (which is close to what I expect) - although the settings of the CLC workbench were in my opinion much more stringent. When I run samtools without the '-Q 20' option, I get only slightly more SNPs (4225). All SNPs I detect with samtools I also find with the CLC workbench. However, with the CLC workbench I detect many more SNPs in addition to them (see examples below).
Settings CLC workbench SNP calling:
- min base coverage: 2x
- min frequency of variant: 15
- max coverage: 1000x
- min variant count: 2
- sufficient variant count: 4 (SNP called also if frequency < 15)
Attached are two example files
- sample output CLC workbench
- sample output samtools mpileup for same genomic region
Any suggestions why this discrepancy is so large?
Many thanks and best wishes
For some reasons I get way too less SNPs called with samtools mpileup (0.1.17) that I'm supposed to get.
I work with paired-end SOLiD data (partial genome resequencing of a diploid species, ~20x coverage)
Here is the code I'm using:
Code:
samtools mpileup -u -Q 20 -f ref.fa infile.bam | bcftools view -cgbu - > outfile_raw.bcf bcftools view outfile_raw.bcf | ./vcfutils.pl varFilter -D1000 > outfile.vcf
I tried different versions of samtools, but consistently get the same result.
For example:
For one sample I got 3250 SNPs with samtools and >21000 SNPs with the SNP tool of the CLC genomics workbench (which is close to what I expect) - although the settings of the CLC workbench were in my opinion much more stringent. When I run samtools without the '-Q 20' option, I get only slightly more SNPs (4225). All SNPs I detect with samtools I also find with the CLC workbench. However, with the CLC workbench I detect many more SNPs in addition to them (see examples below).
Settings CLC workbench SNP calling:
- min base coverage: 2x
- min frequency of variant: 15
- max coverage: 1000x
- min variant count: 2
- sufficient variant count: 4 (SNP called also if frequency < 15)
Attached are two example files
- sample output CLC workbench
- sample output samtools mpileup for same genomic region
Any suggestions why this discrepancy is so large?
Many thanks and best wishes
Comment