SEQanswers

08-29-2012, 07:51 AM   #1
freshtuo
Junior Member
 
Location: New York

Join Date: Feb 2012
Posts: 4
Where to get mouse training set (hapmap/omni) for GATK:VariantRecalibrator

I am calling SNPs on mouse samples with GATK and have reached the "Variant quality score recalibration" step. The VariantRecalibrator walker asks for training sets of known mouse SNPs.

Quote:
Please provide sets of known polymorphic loci marked with the training=true ROD binding tag. For example, -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf
Does anyone know where to get equivalent training files for mouse (specific to the UCSC mm9 assembly)? Thanks!
08-29-2012, 08:20 AM   #2
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 164

For mouse you can use dbSNP. I don't know of any other publicly available resource comparable to HapMap.
08-29-2012, 09:08 AM   #3
freshtuo
Junior Member
 
Location: New York

Join Date: Feb 2012
Posts: 4

Thanks for the quick reply.

I downloaded the dbSNP file (snp128.txt) from http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/ and converted it into a VCF file.
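The post does not show how the conversion was done. A minimal sketch of one possible conversion follows. It assumes the snp128 column order from the UCSC table schema (bin, chrom, chromStart, chromEnd, name, score, strand, refNCBI, refUCSC, observed, molType, class, ...), that `observed` is reported relative to `strand`, and that only `class == "single"` rows are wanted. The function name and the demo rows are made up for illustration; check the column positions against snp128.sql before using anything like this.

```python
import io

# 0-based column positions in the snp128 dump. ASSUMPTION: taken from the UCSC
# table schema; verify against snp128.sql from the same download directory.
CHROM, START, NAME, STRAND, REF_UCSC, OBSERVED, CLASS = 1, 2, 4, 6, 8, 9, 11

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ucsc_snp_to_vcf(rows, out):
    """Write a minimal sites-only VCF from UCSC snp table rows (hypothetical helper)."""
    out.write("##fileformat=VCFv4.1\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for line in rows:
        f = line.rstrip("\n").split("\t")
        if len(f) <= CLASS or f[CLASS] != "single":
            continue  # keep simple single-base SNPs only
        ref = f[REF_UCSC].upper()
        alleles = f[OBSERVED].upper().split("/")
        if f[STRAND] == "-":
            # ASSUMPTION: observed alleles follow the strand column, so
            # complement them to get plus-strand bases.
            alleles = [COMPLEMENT.get(a, "N") for a in alleles]
        alts = [a for a in alleles if a in COMPLEMENT and a != ref]
        if ref not in COMPLEMENT or not alts:
            continue  # skip rows that do not yield a clean REF/ALT pair
        pos = int(f[START]) + 1  # UCSC starts are 0-based, VCF positions are 1-based
        out.write("\t".join([f[CHROM], str(pos), f[NAME], ref, ",".join(alts), ".", ".", "."]) + "\n")


# Demonstration with made-up rows (not real dbSNP records).
demo = [
    "\t".join(["585", "chr1", "3000018", "3000019", "rs_demo1", "0", "+", "C", "C", "C/T",
               "genomic", "single", "unknown", "0", "0", "unknown", "exact", "1"]) + "\n",
    "\t".join(["585", "chr1", "3000100", "3000101", "rs_demo2", "0", "-", "A", "A", "C/T",
               "genomic", "single", "unknown", "0", "0", "unknown", "exact", "1"]) + "\n",
    "\t".join(["585", "chr1", "3000200", "3000202", "rs_demo3", "0", "+", "AT", "AT", "-/AT",
               "genomic", "deletion", "unknown", "0", "0", "unknown", "range", "1"]) + "\n",
]
buf = io.StringIO()
ucsc_snp_to_vcf(demo, buf)
print(buf.getvalue(), end="")
```

For a real file, `rows` would be the open snp128.txt handle and `out` an open output file. GATK also expects the VCF to be sorted in the same contig order as the reference, which this sketch does not do.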

Then I passed that file to the VariantRecalibrator walker:

java -Xmx4g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R Refseq.fa \
-input snps.raw.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 snp128.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff \
-mode BOTH \
-recalFile output.recal \
-tranchesFile output.tranches \
-rscriptFile output.plots.R

The program exited with error messages:

##### ERROR MESSAGE: Invalid command line: No training set found! Please provide sets of known polymorphic loci marked with the training=true ROD binding tag. For example, -resource:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmapFile.vcf
##### ERROR ------------------------------------------------------------------------------------------

I was wondering whether I can fix this by setting both training and truth to true for the dbSNP resource:

-resource:dbsnp,known=true,training=true,truth=true,prior=6.0 snp128.vcf

Am I doing this the right way? Are there other solutions? Thanks!
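To make the difference between the two -resource lines concrete, here is a small sketch that parses the tag and reports which roles no resource fills. It is only an illustration of the flag syntax, not GATK code. The error message above confirms the training=true requirement; treating truth=true as required as well is an assumption here. The helper names are hypothetical, and the sketch says nothing about whether dbSNP is a suitable truth set for mouse.

```python
def parse_resource(arg):
    """Split '-resource:name,key=value,...' into (name, {key: value})."""
    tag = arg.split(":", 1)[1]
    name, *pairs = tag.split(",")
    flags = {}
    for pair in pairs:
        if "=" in pair:  # bare tokens such as 'VCF' carry no flag
            key, value = pair.split("=", 1)
            flags[key] = value
    return name, flags


def missing_roles(resource_args):
    """Return the roles ('training', 'truth') that no resource marks as true."""
    parsed = [parse_resource(a) for a in resource_args]
    return [role for role in ("training", "truth")
            if not any(flags.get(role) == "true" for _, flags in parsed)]


first = ["-resource:dbsnp,known=true,training=false,truth=false,prior=6.0"]
second = ["-resource:dbsnp,known=true,training=true,truth=true,prior=6.0"]

print(missing_roles(first))   # ['training', 'truth']
print(missing_roles(second))  # []
```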

Tuo




08-30-2012, 12:32 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Depending on exactly what you're doing (i.e. trying to find SNPs within or between strains), dbSNP may or may not be helpful. For mouse, it lists differences between other strains and the C57BL/6J reference assembly. If you're interested in SNPs within a strain, you'll probably need to bootstrap (i.e. run the process more than once). Also, if you are looking for differences between some strain and the reference, I'm not sure how useful dbSNP would be unless a highly related strain is included (and if so, I would filter out all but that strain's SNPs).
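One possible reading of "bootstrap" is: call variants once, keep only the most confident calls as a provisional training set, pass that set back with training=true, and repeat. The post does not spell out a procedure, so the following is just a sketch of the selection step under that reading. The function name, the QUAL cutoff of 100 and the demo records are all hypothetical.

```python
def select_confident_sites(vcf_lines, min_qual=100.0):
    """Yield header lines plus records meeting a (hypothetical) QUAL cutoff."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        qual, filt = fields[5], fields[6]
        if qual == ".":
            continue  # no quality recorded, cannot rank this call
        if float(qual) >= min_qual and filt in (".", "PASS"):
            yield line


# Made-up records for illustration only.
demo_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t250.5\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t30.0\t.\t.\n",
    "chr1\t300\t.\tG\tA\t500\tLowQual\t.\n",
    "chr1\t400\t.\tT\tC\t.\t.\t.\n",
]
kept = list(select_confident_sites(demo_vcf))
print("".join(kept), end="")
```

A sensible cutoff depends on coverage and the caller used, so the value here should be tuned rather than copied.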

Tags
gatk, mm9, mouse, variantrecalibrator
