SEQanswers




Old 07-15-2012, 04:40 PM   #1
ramirob
Member
 
Location: Vermont

Join Date: Apr 2012
Posts: 14
Little concordance between GATK and Samtools

Hello,

I have read that GATK and Samtools should give about 90% concordance (the HugeSeq test). I am not able to reproduce that, and was wondering if anyone has done it and can tell me what parameters they used?

I am doing:

java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ucsc.hg19.fasta -I aln.bam --min_base_quality_score 30 -o gatk.vcf

samtools mpileup -uf ucsc.hg19.fasta -Q 30 -D -g -I -S aln.bam | bcftools view -bvcg - > samtools.bcf
bcftools view samtools.bcf > samtools.vcf

Any insight?

Thanks in advance,
Ramiro
Old 07-17-2012, 01:58 AM   #2
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

So what concordance *are* you getting?
Old 07-17-2012, 05:50 AM   #3
ramirob
Member
 
Location: Vermont

Join Date: Apr 2012
Posts: 14

Well, with vcf-compare I am getting about 30%!

# The command line was: vcf-compare(r731) chr10gatk.raw.vcf.gz chr10.samtools.raw.vcf.gz
#
#VN 'Venn-Diagram Numbers'. Use `grep ^VN | cut -f 2-` to extract this part.
#VN The columns are:
#VN 1 .. number of sites unique to this particular combination of files
#VN 2- .. combination of files and space-separated number, a fraction of sites in the file
VN 271 chr10.samtools.raw.vcf.gz (35.4%) chr10gatk.raw.vcf.gz (38.2%)
VN 438 chr10gatk.raw.vcf.gz (61.8%)
VN 494 chr10.samtools.raw.vcf.gz (64.6%)
#SN Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN Number of REF matches: 271
SN Number of ALT matches: 270
SN Number of REF mismatches: 0
SN Number of ALT mismatches: 1
SN Number of samples in GT comparison: 0
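
The percentages in the VN lines above follow directly from the three site counts. A minimal Python sketch of that arithmetic (the function name `concordance` is illustrative and not part of vcf-compare):

```python
def concordance(shared, only_a, only_b):
    """Return (fraction of A that is shared, fraction of B that is shared, Jaccard index)."""
    total_a = shared + only_a          # all sites called in file A
    total_b = shared + only_b          # all sites called in file B
    union = shared + only_a + only_b   # all sites called in either file
    return shared / total_a, shared / total_b, shared / union

# Counts taken from the VN lines above:
# 271 shared sites, 438 GATK-only sites, 494 samtools-only sites.
gatk_frac, samtools_frac, jaccard = concordance(271, 438, 494)

print(f"GATK sites also called by samtools: {gatk_frac:.1%}")      # 38.2%
print(f"samtools sites also called by GATK: {samtools_frac:.1%}")  # 35.4%
print(f"Shared sites out of all called sites: {jaccard:.1%}")      # 22.5%
```

Note that the per-file fractions (38.2% and 35.4%) and the fraction over the union of both call sets (22.5%) are different measures, so it matters which one a reported "90% concordance" refers to.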
Old 07-17-2012, 05:59 AM   #4
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390
OK, I admit that seems quite low, but do you really want to be considering the raw, unfiltered output for your comparisons?
Old 07-17-2012, 02:19 PM   #5
ramirob
Member
 
Location: Vermont

Join Date: Apr 2012
Posts: 14

Not really, but if I filter the GATK output I would just get fewer variants, wouldn't I? The main question I have is: are the parameters I am using OK? Maybe I can check other alignments. I have been trying to find out what parameters people use when they report 90% concordance. Do you know?

Thank you very much for your attention and help!
Ramiro
Old 07-17-2012, 11:47 PM   #6
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390
Yes, you would get fewer variants, but they would also be higher-quality ones.

I don't know much about samtools mpileup parameters, but assuming you're working on human data, the GATK best practice documents would be a good place to start for refining GATK parameters.

http://www.broadinstitute.org/gsa/wi...th_the_GATK_v3

There are plenty of pipelines on GitHub and the like with other people's parameters exposed. I wasn't able to find information on the parameters in the HugeSeq paper, but I note it was for a very old version of GATK.
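
The point about comparing filtered rather than raw calls can be sketched in Python. This is only an illustration under stated assumptions: the records below are made up, the helper names `passing_sites` and `shared_fraction` are hypothetical, and the QUAL cutoff of 50 is arbitrary. QUAL values from the two callers are not on the same scale, so a real comparison would need a separate, caller-appropriate filter for each file.

```python
def passing_sites(vcf_lines, min_qual):
    """Collect (CHROM, POS, ALT) for records whose QUAL is at least min_qual."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        chrom, pos, alt, qual = fields[0], fields[1], fields[4], fields[5]
        if qual != "." and float(qual) >= min_qual:
            sites.add((chrom, int(pos), alt))
    return sites

def shared_fraction(sites_a, sites_b):
    """Fraction of all called sites that both callers report."""
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 0.0

# Made-up toy records (NOT from the thread), only to show the effect of a cutoff.
gatk = ["chr10\t100\t.\tA\tG\t250.0\t.\t.\n", "chr10\t200\t.\tC\tT\t12.5\t.\t.\n"]
samtools = ["chr10\t100\t.\tA\tG\t180.0\t.\t.\n", "chr10\t300\t.\tG\tA\t8.0\t.\t.\n"]

raw = shared_fraction(passing_sites(gatk, 0), passing_sites(samtools, 0))
filtered = shared_fraction(passing_sites(gatk, 50), passing_sites(samtools, 50))
print(f"raw: {raw:.2f}, filtered: {filtered:.2f}")
```

In this toy case the low-QUAL calls are the ones unique to each caller, so dropping them raises the shared fraction; whether the same holds for real data is exactly what the filtered comparison would show.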