shuang 09-07-2011 09:31 AM

SNP base calling for multiple samples
My goal is to find SNPs in multiple Sanger sequences that cover varied regions, not necessarily the same fragments. When a SNP is found, I want to see the base (or whether it is heterozygous or homozygous) in every sample that covers that position. What kind of software should I use? I would prefer a command-line tool.

I tried aligning the sequences with bwasw and piling them all up with samtools. However, when I piled up all the samples together, the output did NOT clearly distinguish between a SNP location that is not covered in a sample and a sample that is heterozygous at that SNP location. My commands are below.

samtools mpileup -uf Sorbil.fasta a1.bam a2.bam a3.bam | bcftools/bcftools view -bvcg - > raw.bcf

bcftools/bcftools view raw.bcf | varFilter -D100 > flt.vcf

swbarnes2 09-07-2011 10:44 AM

I'm not sure what you mean by "clearly", but you should see a 1/1, 0/1 or 0/0 in each entry. That tells you whether the SNP is predicted to be homozygous alternate, heterozygous, or homozygous reference. Unfortunately, what you really want is the DP4 for each sample separately, and a multi-sample VCF from samtools won't give you that. You'll need the individual VCF file for each sample to see it.
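
As a rough illustration of reading those codes, here is a minimal Python sketch that decodes the GT entry of each sample column in one VCF data line. The example line, the helper names (decode_gt, genotypes_from_line) and the mapping of columns to a1.bam, a2.bam, a3.bam are made up for illustration; a real flt.vcf may carry different FORMAT keys.

```python
# Sketch: decode the per-sample genotype (GT) codes in one VCF data line.
# The example line and the sample order are hypothetical.

GT_LABELS = {
    ("0", "0"): "homozygous reference",
    ("0", "1"): "heterozygous",
    ("1", "1"): "homozygous alternate",
}

def decode_gt(gt):
    """Turn a diploid GT string such as '0/1' into a readable label."""
    # Accept both unphased ('/') and phased ('|') separators; order is ignored.
    alleles = tuple(sorted(gt.replace("|", "/").split("/")))
    return GT_LABELS.get(alleles, "other/missing (" + gt + ")")

def genotypes_from_line(line):
    """Return one label per sample column of a tab-separated VCF line."""
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # column 9 names the per-sample keys
    gt_index = format_keys.index("GT")
    return [decode_gt(sample.split(":")[gt_index]) for sample in fields[9:]]

example = ("chr1\t1234\t.\tA\tG\t50\t.\tDP=30\tGT:PL:GQ"
           "\t0/1:20,0,25:30\t1/1:40,6,0:12\t0/0:0,9,60:15")
labels = genotypes_from_line(example)
for name, label in zip(["a1.bam", "a2.bam", "a3.bam"], labels):
    print(name, label)
```

Note that this only translates the call itself; it says nothing about how many reads in that sample support the call.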

shuang 09-07-2011 03:06 PM

In my case, some sequences showed 1/0 for a SNP position even though some of those sequences did not cover that position.

I want to know whether the base at a SNP position is polymorphic, wild type, or uncovered in a given sequence/file.
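
To make the three cases concrete, here is a minimal Python sketch of the classification I have in mind. It assumes the FORMAT column carries a per-sample depth field named DP, which the output of my commands above may not contain; the example line and the helper names (classify_sample, classify_line) are hypothetical.

```python
# Sketch: label each sample at a SNP site as uncovered, wild type, or polymorphic.
# Assumes a per-sample DP key in the FORMAT column; the example line is made up.

def classify_sample(format_keys, sample_field):
    """Classify one sample column using its GT and DP values."""
    values = dict(zip(format_keys, sample_field.split(":")))
    depth = values.get("DP")
    gt = values.get("GT", "./.")
    if depth is None:
        # Without per-sample depth, coverage cannot be judged from this line.
        return "coverage unknown"
    # No reads, or no genotype recorded: the call is not backed by data.
    if depth in (".", "0") or "." in gt:
        return "uncovered"
    alleles = set(gt.replace("|", "/").split("/"))
    if alleles == {"0"}:
        return "wild type"
    return "polymorphic (" + gt + ")"

def classify_line(line):
    """Return one classification per sample column of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    return [classify_sample(format_keys, s) for s in fields[9:]]

example = ("chr1\t1234\t.\tA\tG\t50\t.\tDP=22\tGT:PL:DP:GQ"
           "\t0/1:20,0,25:12:30\t0/1:0,0,0:0:3\t0/0:0,9,60:10:15")
result = classify_line(example)
print(result)
```

In the made-up line, the second sample carries a 0/1 call with a depth of 0, which is the situation I am seeing: a genotype is printed even though no read covers the site.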

