SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Quality score after Illumina run - should it be coverted before samtools and gatk? | gfmgfm | Bioinformatics | 9 | 08-31-2013 10:01 PM
GATK vs Samtools FQ | marcela | Bioinformatics | 1 | 03-02-2012 09:56 AM
minimum depth variant calling samtools/gatk | m_elena_bioinfo | Bioinformatics | 1 | 12-06-2011 08:31 AM
SAMTOOLS AND GATK input file (base qualities) | hrajasim | Illumina/Solexa | 0 | 05-13-2011 05:58 AM
Default quality encoding system of SAMTools&GATK | dingxiaofan1 | Bioinformatics | 11 | 03-03-2011 11:27 PM

05-01-2012, 11:39 PM   #1
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47
Concordance between GATK and SAMtools

Hi, I am not sure what level of concordance to expect between GATK and SAMtools mpileup.

I have one sample, and the variants called by GATK and SAMtools showed 99% concordance over a total of ~3.1 million SNPs. Is this number typical? I computed the concordance using GATK's SelectVariants module.

Thanks a lot for your advice
05-05-2012, 08:13 PM   #2
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437

I believe the recently published HugeSeq paper reports greater than 90% concordance between GATK and SAMtools.
05-05-2012, 09:31 PM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

99% seems impossible
05-08-2012, 11:55 AM   #4
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47

I tested the overall overlap between the SNPs called by GATK and SAMtools. In total, GATK called 4,115,639 SNPs (without filtering), of which 3,836,479 overlapped with the SAMtools calls, giving an overlap rate of 93%.

After GATK filtering, 3,097,504 of the 3,111,272 GATK calls overlapped with the SAMtools calls, giving an overlap rate of 99%.
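For reference, both percentages follow directly from the counts above. A minimal awk sketch, wrapped in a hypothetical helper called overlap_rate, reproduces them. Note that the denominator is the GATK total only, so this is a one-directional rate; the SAMtools total isn't given, so the reverse direction can't be computed from these numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of GATK calls that are also SAMtools calls.
# $1 = GATK calls shared with SAMtools, $2 = total GATK calls.
overlap_rate() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * s / t }'
}

overlap_rate 3836479 4115639   # unfiltered GATK calls; prints 93.2%
overlap_rate 3097504 3111272   # GATK calls after filtering; prints 99.6%
```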

I am not sure whether my procedure is correct. I used the GATK SelectVariants module to get the concordance between the two sets.

Thanks a lot for your advice on this.
05-08-2012, 02:08 PM   #5
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by caswater
I tested the overall overlap between the SNPs called by GATK and SAMtools. In total, GATK called 4,115,639 SNPs (without filtering), of which 3,836,479 overlapped with the SAMtools calls, giving an overlap rate of 93%.

After GATK filtering, 3,097,504 of the 3,111,272 GATK calls overlapped with the SAMtools calls, giving an overlap rate of 99%.

I am not sure whether my procedure is correct. I used the GATK SelectVariants module to get the concordance between the two sets.

Thanks a lot for your advice on this.
When you calculate concordance that way, you ignore the calls made by SAMtools that were not made by GATK. So if SAMtools is calling many more SNPs than GATK (after GATK filtering) and almost all of the GATK calls are also in the SAMtools set, that implies that, with those parameters, SAMtools is some combination of more sensitive and less specific.
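The one-directional problem described above can be made concrete by counting the overlap in both directions. This is a sketch under stated assumptions, not what SelectVariants or vcf-compare do internally: it keys each call by CHROM, POS, REF and ALT, keeps only single-base SNPs whose FILTER is PASS or ".", expects uncompressed VCF, and reports the overlap as a fraction of each set plus the Jaccard index (shared / union). The function names snp_keys and concordance are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print sorted, unique SNP keys (CHROM:POS:REF:ALT).
# Skips header lines, non-SNP records (indels, multi-allelic ALT, ALT ".")
# and records whose FILTER is neither PASS nor ".".
snp_keys() {
  awk -F '\t' '!/^#/ && toupper($4) ~ /^[ACGT]$/ && toupper($5) ~ /^[ACGT]$/ \
      && ($7 == "PASS" || $7 == ".") {
    print $1 ":" $2 ":" toupper($4) ":" toupper($5)
  }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: overlap counts in both directions plus the Jaccard index.
concordance() {
  local a b n_a n_b shared
  a=$(mktemp); b=$(mktemp)
  snp_keys "$1" > "$a"
  snp_keys "$2" > "$b"
  n_a=$(wc -l < "$a")
  n_b=$(wc -l < "$b")
  shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)
  rm -f "$a" "$b"
  awk -v s="$shared" -v a="$n_a" -v b="$n_b" 'BEGIN {
    if (a == 0 || b == 0) { print "one of the call sets is empty"; exit 1 }
    printf "set1=%d set2=%d shared=%d\n", a, b, s
    printf "set1 in set2: %.1f%%\n", 100 * s / a
    printf "set2 in set1: %.1f%%\n", 100 * s / b
    printf "Jaccard: %.3f\n", s / (a + b - s)
  }'
}
```

Usage would be something like `concordance gatk.filtered.vcf samtools.vcf` (placeholder file names; for bgzipped input, `<(zcat file.vcf.gz)` can stand in for a path). If one caller makes many calls the other does not, "set1 in set2" can stay near 100% while "set2 in set1" and the Jaccard index drop, which is exactly the asymmetry a single overlap rate hides.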
06-05-2012, 11:36 PM   #6
ramirob
Member
 
Location: Vermont

Join Date: Apr 2012
Posts: 14

Hi everyone,

Related to this, I am also trying to assess concordance, but in contrast I am getting an overlap of only around 30% (as reported by vcf-compare from VCFtools)! I am not sure what I am doing that leads to this result. My process is as follows:

BWA alignment to hg19 -> remove duplicates -> realign -> recalibrate -> newAlignment.bam

For samtools:
samtools mpileup -f ucsc.hg19.fasta -Q 30 -D -g -S newAlignment.bam | bcftools view -bvcg - > samtools.raw.vcf

For GATK:
java -Xmx4g -jar GenomeAnalysisTK.jar --min_base_quality_score 30 -I newAlignment.bam -R ucsc.hg19.fasta -T UnifiedGenotyper -o gatk.raw.vcf
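One way to make the two call sets from these commands more directly comparable is to restrict both to SNPs and write plain VCF before running vcf-compare. mpileup also calls indels unless -I is given, and with -b, bcftools view writes BCF rather than VCF, so samtools.raw.vcf above is BCF despite its name. The sketch below assumes samtools/bcftools 0.1.18-era options, the same GATK jar and UnifiedGenotyper settings as the post, and bgzip, tabix and vcf-compare on the PATH; the helper names run and snp_only_comparison and the output file names are hypothetical. Setting DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: echo a command line, then run it with bash
# (pipefail on) unless DRY_RUN is set.
run() {
  echo "+ $1"
  if [ -z "${DRY_RUN:-}" ]; then
    bash -o pipefail -c "$1"
  fi
}

# Hypothetical helper: SNP-only calls from both tools on one BAM, then compare.
snp_only_comparison() {
  local ref=$1 bam=$2 vcf
  # -I: skip indel calling. No -b: bcftools view writes VCF rather than BCF.
  run "samtools mpileup -f $ref -Q 30 -D -g -S -I $bam | bcftools view -vcg - > samtools.snps.vcf" || return 1
  # -glm SNP: make the SNP-only genotype likelihood model explicit.
  run "java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R $ref -I $bam --min_base_quality_score 30 -glm SNP -o gatk.snps.vcf" || return 1
  # vcf-compare reads bgzip-compressed, tabix-indexed VCFs.
  for vcf in samtools.snps.vcf gatk.snps.vcf; do
    run "bgzip -c $vcf > $vcf.gz && tabix -p vcf $vcf.gz" || return 1
  done
  run "vcf-compare samtools.snps.vcf.gz gatk.snps.vcf.gz"
}
```

For example, `DRY_RUN=1 snp_only_comparison ucsc.hg19.fasta newAlignment.bam` prints the five command lines so they can be checked before anything runs.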

Shouldn't these two call sets have more SNPs in common? Am I using any unnecessary parameters, or missing any?

Thanks in advance,
Ramiro
All times are GMT -8.

