SEQanswers

02-03-2012, 01:26 PM   #1
myi
Junior Member
 
Location: Maryland

Join Date: Feb 2012
Posts: 2
Impact on quality of SNP calls using samtools mpileup

I have a question about how the quality of samtools mpileup SNP calls depends on whether each sample is called individually or all samples are called together.
I basically follow the samtools documentation at
http://samtools.sourceforge.net/mpileup.shtml and
http://samtools.sourceforge.net/samtools.shtml (except that I use -D 2000 or -d 10 for filtering).

The commands I used are basically as follows.
Calling all samples together (each BAM file is the exome-seq data of one sample):
samtools mpileup -ugf ref.fa aln1.bam aln2.bam aln3.bam ... *.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -d 10 > var.flt.vcf

Calling one sample at a time (each BAM file is from one sample):
for x in *.bam; do
    samtools mpileup -ugf ref.fa "$x" | bcftools view -bvcg - > "${x%.bam}.raw.bcf"
    bcftools view "${x%.bam}.raw.bcf" | vcfutils.pl varFilter -d 10 > "${x%.bam}.flt.vcf"
done

We observed that we seem to get better SNP calls when we call each sample (BAM file) individually, one at a time, than when we call all samples (multiple BAM files) together. Of course, calling individually gives one VCF file per sample, whereas calling together gives a single VCF file with the SNPs of all samples. We also lose homozygous-reference genotypes when calling individually: for a single sample the VCF only contains heterozygous or homozygous-variant calls. Only in a joint call do some samples get homozygous-reference calls, at sites where other samples were called heterozygous or homozygous variant.
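Because the joint VCF lists a site whenever any sample carries a variant, a like-for-like comparison with a single-sample VCF needs the joint calls restricted to one sample's non-reference genotypes. The following is only a sketch of that step. The helper name `sample_variant_sites` is made up for illustration, and it assumes GT is the first key in the FORMAT column and that the sample name matches a column in the `#CHROM` header line.

```shell
# Print CHROM:POS for every site where one sample's genotype is non-reference.
# Usage: sample_variant_sites joint.vcf SAMPLE_NAME   (hypothetical helper)
sample_variant_sites() {
    awk -v sample="$2" -F'\t' '
        /^##/ { next }                      # skip meta-information lines
        /^#CHROM/ {                         # header row: find the sample column
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            next
        }
        col {                               # only runs if the sample was found
            split($col, f, ":")             # assumption: GT is the first FORMAT key
            gt = f[1]
            # drop hom-ref and missing genotypes, keep everything else
            if (gt != "0/0" && gt != "0|0" && gt != "./." && gt != ".")
                print $1 ":" $2
        }
    ' "$1"
}
```

Sorting this output and the CHROM:POS list taken from the matching single-sample VCF, then running `comm` on the two, would show which sites are unique to each approach.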

Has anyone made a similar observation, or does anyone have comments or insight on it (better SNP calls when calling individually than when calling together)? As far as I know, most people call all BAM files together and would not expect that calling each sample individually gives better-quality SNPs with samtools mpileup. Is there an explanation from the algorithm side of samtools? We all thought it should not matter, but we do have benchmark data showing that it does, at least in our hands. We use -d 10 or -D 2000 in the filtering step in both cases.
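To make the filtering step concrete, here is a rough sketch of a depth filter with a minimum and a maximum threshold, in the spirit of the -d and -D options mentioned above. It is not how vcfutils.pl is implemented. The helper name `depth_filter` is hypothetical, and it assumes the depth is read from the `DP=` key of the INFO column. If INFO DP in a joint VCF is the depth pooled over all samples (worth checking in your own header), the same threshold would not mean the same thing per sample in the two runs.

```shell
# Keep VCF records whose INFO DP lies in [MIN_DP, MAX_DP]; header lines pass through.
# Usage: depth_filter in.vcf MIN_DP MAX_DP   (hypothetical helper)
depth_filter() {
    awk -v min="$2" -v max="$3" -F'\t' '
        /^#/ { print; next }                # keep all header lines
        {
            dp = -1                         # records without DP= are dropped
            n = split($8, info, ";")        # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
            if (dp >= min && dp <= max) print
        }
    ' "$1"
}
```

For example, `depth_filter var.raw.vcf 10 2000` would keep only records with a reported depth between 10 and 2000, where `var.raw.vcf` stands for any uncompressed VCF.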
02-03-2012, 02:46 PM   #2
dan
wiki wiki
 
Location: Cambridge, England

Join Date: Jul 2008
Posts: 265

Have you asked on the samtools mailing list?

http://sourceforge.net/mail/?group_id=246254

I'd be interested to hear the reply.
__________________
Homepage: Dan Bolser
MetaBase the database of biological databases.
02-06-2012, 04:57 AM   #3
myi
Junior Member
 
Location: Maryland

Join Date: Feb 2012
Posts: 2

Thanks, Dan! Yes, I did, but I have not had much luck or input from there...
03-03-2014, 07:18 AM   #4
Ele
Junior Member
 
Location: Spain

Join Date: Mar 2014
Posts: 1
mpileup vs single calling

Did you ever get an answer to this?
I am wondering why I obtain a higher total number of variants from single-sample calling than from running samtools mpileup on several files together, even when filtering the same way in both approaches.
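When comparing totals like this, it may help to count distinct sites rather than adding up per-file record counts, since a site called in several single-sample VCFs is counted once per file but appears only once in a joint VCF. A minimal sketch, with hypothetical helper names and assuming uncompressed VCF files:

```shell
# Count variant records (non-header lines) in a single VCF file.
count_variants() {
    grep -vc '^#' "$1"
}

# Count distinct CHROM/POS sites across one or more VCF files.
count_union_sites() {
    grep -hv '^#' "$@" | cut -f1,2 | sort -u | wc -l | tr -d ' '
}
```

Comparing `count_union_sites` over all single-sample VCFs with `count_variants` on the joint VCF would show whether the difference survives once shared sites are counted only once.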

Tags
samtools snp calls
