SEQanswers

06-10-2011, 07:29 AM   #1
wwmm933
Samtools SNP calling -- old version seems to be better for my dataset?

Hi,

I used samtools pileup (0.1.7) and mpileup/bcftools (0.1.14) to call SNPs for my datasets. We also ran a TaqMan assay for one SNP on the same 28 samples. (One SNP is a small sample, but I still compared the samtools results against the TaqMan assay.)

The old version with pileup matched the TaqMan results in 27/28 samples; the new version with mpileup matched in only 7/28 samples. So the old version seems to be better for my dataset. Has anybody done a similar validation before? What was your conclusion?

Thanks!

06-10-2011, 08:26 AM   #2
oiiio

Can you post the command lines you are using for the pileup and mpileup?

06-10-2011, 10:37 AM   #3
wwmm933

Quote:
Originally Posted by oiiio
Can you post the command lines you are using for the pileup and mpileup?

I used the example command lines from the Samtools manual.
After obtaining the sorted BAM files:

samtools pileup -vcf hg18_genome.fna sample1-sorted.nosoft.bam | tee sample1.raw.txt | samtools.pl varFilter -D6000 > sample1.flt.txt
awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' sample1.flt.txt > sample1.final.txt

samtools mpileup -uf hg18_genome.fna sample1-sorted.bam | bcftools view -p 0.99 -bvcg - > sample1-var.raw.bcf
bcftools view sample1-var.raw.bcf | awk '$6>=3' | vcfutils.pl varFilter -d8 -D10000 -11e-5 -20 -41e-7 > sample1-var.flt.vcf
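
For anyone unfamiliar with the awk step in the pileup pipeline above: it keeps indel lines (reference base "*") only when column 6 is at least 50, and every other line when column 6 is at least 20. A minimal sketch on made-up rows follows; the rows and the filter_pileup helper name are illustrative assumptions, not real data.

```shell
# Made-up, tab-separated rows in the style of "samtools pileup -c" output:
# chrom, pos, ref base, consensus, consensus quality, column-6 quality
sample_rows=$(printf 'chr1\t100\tA\tG\t30\t25\nchr1\t200\tC\tT\t10\t15\nchr1\t300\t*\t+A/+A\t40\t45\nchr1\t400\t*\t-T/-T\t60\t55\n')

# Same condition as in the post: indels need >= 50, everything else >= 20
filter_pileup() {
    awk '($3=="*" && $6>=50) || ($3!="*" && $6>=20)'
}

# Only the rows at positions 100 and 400 pass the filter
printf '%s\n' "$sample_rows" | filter_pileup
```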

06-10-2011, 11:55 AM   #4
swbarnes2

I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn them off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.

06-10-2011, 12:00 PM   #5
wwmm933

Quote:
Originally Posted by swbarnes2
I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn them off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.

Thanks! May I ask which option controls BAQ? What should I remove from my command line?

06-10-2011, 02:42 PM   #6
swbarnes2

samtools mpileup -Buf

Will disable the BAQ calculations.

Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

But either way, it's worth trying.
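
Applied to the mpileup command from post #3, the change would look roughly like this. The sketch only prints the command (a dry run) so it can be checked before running; the variable names are illustrative, and everything after mpileup is copied unchanged from that post.

```shell
# File names taken from post #3; the variable names are just for illustration
ref=hg18_genome.fna
bam=sample1-sorted.bam
raw=sample1-var.raw.bcf

# Same pipeline as before, with -B added to mpileup to switch off BAQ
cmd="samtools mpileup -Buf $ref $bam | bcftools view -p 0.99 -bvcg - > $raw"

# Dry run: show the command instead of executing it
echo "$cmd"
```

The downstream bcftools view and vcfutils.pl varFilter step stays as it was.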

06-12-2011, 12:49 PM   #7
wwmm933

Quote:
Originally Posted by swbarnes2
samtools mpileup -Buf

Will disable the BAQ calculations.

Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

But either way, it's worth trying.

Thanks a lot! I will give it a try and report my results here again.

06-13-2011, 12:34 PM   #8
wwmm933

Quote:
Originally Posted by swbarnes2
samtools mpileup -Buf

Will disable the BAQ calculations.

Granted, I do this because I align with bwa, which will find indels; I think the point of the BAQ calculations is to correct for aligners that don't, so if you are using something that doesn't find indels, then maybe you should keep the BAQ calculation in.

But either way, it's worth trying.

You are right! The results look good after I added -B to the command line. Many thanks!

11-14-2011, 02:25 AM   #9
pengchy

Quote:
Originally Posted by swbarnes2
I've had too many experiences where the BAQ calculations ate real SNPs, so I always turn them off now. You might be losing lots of real SNPs in mpileup, which is why the ones you do get aren't as likely to be real.

Hi,

I checked the difference between the mpileup results with and without -B. The following SNP is filtered out without -B but retained with -B:
Code:
scaffold1454    124026  .       C       T       74      .       DP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51    GT:PL:DP:SP:GQ  1/1:107,24,0:8:0:45
But when I check the read qualities in the cns file generated by pileup:
Code:
scaffold1454    124026  C       N       0       0       0       8       ^:t^:t^:t^:t^:t^:t^:t^:t        !!!!!!!!
It seems this position sits at the end of each read and the base qualities are very low, so the filtering seems reasonable.
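
The characters in that pileup line can be decoded by hand, assuming the usual Phred+33 encoding: subtract 33 from the ASCII code of the character. A small sketch (phred33 is a hypothetical helper name):

```shell
# Convert one Phred+33 quality character to its numeric score
phred33() {
    # printf '%d' "'c" prints the character code of c (POSIX printf behaviour)
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

phred33 '!'   # the base quality shown for all 8 reads; prints 0
phred33 ':'   # the character after each ^ (mapping quality); prints 25
```

So "!" is a base quality of 0, and the ":" after each ^ read-start marker is a mapping quality of 25, which agrees with MQ=25 in the VCF line.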

12-09-2011, 06:21 AM   #10
colindaven

I've also found BAQ correction to be unhelpful when comparing triplicates of closely related bacteria.

Without BAQ, about 90-95% of reads covering SNPs are listed as "high quality" in the Samtools VCF output.

With BAQ, frequently only about 5-10% of reads covering SNPs are high quality.

I am not sure whether BAQ has been extensively tested on bacteria, which in this example have higher rates of variation (i.e. more SNPs in any given region) than humans.
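
One rough way to put a number on this from a samtools/bcftools VCF is to compare the DP4 counts with the raw depth DP in the INFO column. This sketch assumes DP4 holds the high-quality base counts and DP the raw read depth, as described in the samtools VCF header; hq_fraction is a hypothetical helper, and the input is the single VCF line posted in #9.

```shell
# The VCF record from post #9, rebuilt with real tab separators
vcf_line=$(printf 'scaffold1454\t124026\t.\tC\tT\t74\t.\tDP=8;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,0,8;MQ=25;FQ=-51\tGT:PL:DP:SP:GQ\t1/1:107,24,0:8:0:45')

# Print chrom, pos and (sum of DP4) / DP for every non-header record
hq_fraction() {
    awk -F'\t' '
    !/^#/ {
        dp = 0; hq = 0
        n = split($8, info, ";")
        for (i = 1; i <= n; i++) {
            if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
            if (info[i] ~ /^DP4=/) {
                m = split(substr(info[i], 5), d, ",")
                for (j = 1; j <= m; j++) hq += d[j]
            }
        }
        if (dp > 0) printf "%s\t%s\t%.2f\n", $1, $2, hq / dp
    }'
}

printf '%s\n' "$vcf_line" | hq_fraction
```

Running the same function over the VCFs produced with and without -B would give a per-site view of how many bases BAQ downgrades.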