**kmcarr** · 07-07-2014, 08:30 AM

When working with a non-model organism for which little or no known variant information is available, the GATK developers recommend that you "bootstrap" your own list of known variants. Their documentation describes the procedure as follows:

I'm working on a genome that doesn't really have a good SNP database yet. I'm wondering if it still makes sense to run base quality score recalibration without known SNPs.

The base quality score recalibrator treats every reference mismatch as indicative of machine error. True polymorphisms are legitimate mismatches to the reference and shouldn't be counted against the quality of a base. We use a database of known polymorphisms to skip over most polymorphic sites. Unfortunately without this information the data becomes almost completely unusable since the quality of the bases will be inferred to be much much lower than it actually is as a result of the reference-mismatching SNP sites.
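To get a rough feel for the size of this effect, here is a back-of-the-envelope sketch using the plain Phred formula Q = -10 * log10(p). The function name `observed_q` and the rates used are hypothetical, and the real recalibrator works per covariate bin with its own smoothing, so treat this only as an illustration of the direction of the bias.

```shell
# observed_q ERR SNP: Phred quality inferred when true SNP sites are
# (wrongly) counted as sequencing errors.
#   ERR = true per-base sequencing error rate (hypothetical value)
#   SNP = fraction of bases mismatching the reference because of real SNPs
observed_q() {
    awk -v e="$1" -v s="$2" 'BEGIN { printf "%.1f\n", -10 * log(e + s) / log(10) }'
}

observed_q 0.001 0      # no SNPs counted as errors: prints 30.0
observed_q 0.001 0.001  # 1 SNP mismatch per 1000 bases: prints 27.0
observed_q 0.001 0.01   # 1 SNP mismatch per 100 bases: prints 19.6
```

Under these assumed rates, bases that are really Q30 would be reported as roughly Q20 once 1% of positions mismatch the reference for biological reasons.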

However, all is not lost if you are willing to experiment a bit. You can bootstrap a database of known SNPs. Here's how it works:

• First do an initial round of SNP calling on your original, unrecalibrated data.

• Then take the SNPs that you have the highest confidence in and use that set as the database of known SNPs by feeding it as a VCF file to the base quality score recalibrator.

• Finally, do a real round of SNP calling with the recalibrated data. These steps could be repeated several times until convergence.

Taken from http://gatkforums.broadinstitute.org...libration-bqsr
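The three steps above could be laid out roughly as follows for GATK 2.x. This is a dry-run sketch that only prints the commands. The file names (`ref.fa`, `sample.bam`, `roundN.*`) and the jar path are placeholders, and the choice to always recalibrate the original BAM rather than the previously recalibrated one is my assumption, not something the quoted text specifies.

```shell
#!/bin/sh
# Dry-run sketch of the bootstrap loop: prints the GATK 2.x commands for one
# round instead of running them. All file names are hypothetical placeholders.
GATK="java -jar GenomeAnalysisTK.jar"   # jar location is an assumption
REF="ref.fa"                            # hypothetical reference
BAM="sample.bam"                        # hypothetical realigned, unrecalibrated BAM

bootstrap_round() {
    round=$1    # round number, starting at 1
    in_bam=$2   # BAM used for SNP calling in this round
    # 1. Call SNPs on the current BAM.
    echo "$GATK -T UnifiedGenotyper -R $REF -I $in_bam -o round${round}.raw.vcf"
    # 2. Keep only the highest-confidence calls (criteria are up to you).
    echo "# filter round${round}.raw.vcf -> round${round}.known.vcf"
    # 3. Build the recalibration table from the original BAM and those calls.
    echo "$GATK -T BaseRecalibrator -R $REF -I $BAM -knownSites round${round}.known.vcf -o round${round}.grp"
    # 4. Write a recalibrated BAM to call SNPs on in the next round.
    echo "$GATK -T PrintReads -R $REF -I $BAM -BQSR round${round}.grp -o round${round}.recal.bam"
}

bootstrap_round 1 "$BAM"
bootstrap_round 2 "round1.recal.bam"
```

Checking for convergence (for example, comparing the recalibration reports or call sets between rounds) is left out here, since the source does not say how to measure it.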

**likebiology** · 07-07-2014, 11:13 AM

Thank you very much, kmcarr!

I read the page you recommended and found the following suggestions:

First do an initial round of SNP calling on your original, unrecalibrated data.
Then take the SNPs that you have the highest confidence in and use that set as the database of known SNPs by feeding it as a VCF file to the base quality score recalibrator.
Finally, do a real round of SNP calling with the recalibrated data. These steps could be repeated several times until convergence.

So I did the first round of SNP calling using the following commands:
java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T UnifiedGenotyper -R mergedunigene.fa -I sample1_indelrealn7.bam -l INFO -o sample1.vcf -stand_call_conf 10 -stand_emit_conf 30
java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T VariantFiltration -R mergedunigen.fa -V sample1.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o samples_final.vcf

I did get a VCF file with SNPs, but how should I use it? Should I keep only the SNPs marked PASS, or apply some other criterion? Thank you, kmcarr!
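One possible way to pull a "high confidence" subset out of a filtered VCF is sketched below with plain awk. It keeps the header plus records that are marked PASS, are simple single-base substitutions, and have QUAL at or above a cutoff. The function name `select_confident_snps` and any particular cutoff are my own placeholders, not a GATK recommendation.

```shell
# select_confident_snps VCF MIN_QUAL
# Prints the VCF header plus PASS records that are biallelic SNPs
# (single-base REF and ALT) with QUAL >= MIN_QUAL.
# VCF columns used: 4 = REF, 5 = ALT, 6 = QUAL, 7 = FILTER.
select_confident_snps() {
    awk -F'\t' -v q="$2" '
        /^#/ { print; next }   # keep every header line
        $7 == "PASS" && $6 != "." && $6 + 0 >= q && length($4) == 1 && length($5) == 1
    ' "$1"
}

# Example (hypothetical cutoff of 50):
#   select_confident_snps samples_final.vcf 50 > sample1_known.vcf
```

Multi-allelic sites and indels are dropped by the length checks, which is a deliberately conservative choice for a known-sites set.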

**likebiology** · 07-07-2014, 11:59 PM

Hi again,

I am calling SNPs on 7 transcriptomes and am now running BaseRecalibrator in GATK. Since there is no reference genome and no set of known SNP sites, I called SNPs directly after IndelRealigner. After filtering, the VCF file (with only PASS sites) was used as knownSites. The command line is:

java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R Acomyref.fa -T BaseRecalibrator -I sample1_indelrealn7.bam -knownSites sample1_filter.vcf -o sample1.grp

The error is:
##### ERROR A USER ERROR has occurred (version 2.5-2-gf57256b):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BCF2 VariantContext http://www.broadinstitute.org/gatk/g...BCF2Codec.html
##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gatk/g...agleCodec.html
##### ERROR BED BEDFeature http://www.broadinstitute.org/gatk/g..._BEDCodec.html
##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gatk/g...ableCodec.html
##### ERROR EXAMPLEBINARY Feature http://www.broadinstitute.org/gatk/g...naryCodec.html
##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gatk/g...TextCodec.html
##### ERROR OLDDBSNP OldDbSNPFeature http://www.broadinstitute.org/gatk/g...bSNPCodec.html
##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gatk/g...pMapCodec.html
##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gatk/g...fSeqCodec.html
##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gatk/g...leupCodec.html
##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gatk/g...ReadCodec.html
##### ERROR TABLE TableFeature http://www.broadinstitute.org/gatk/g...ableCodec.html
##### ERROR VCF VariantContext http://www.broadinstitute.org/gatk/g..._VCFCodec.html
##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gatk/g...VCF3Codec.html

If anybody has run into this problem, please show me how to fix it. Thanks!
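The message says GATK could not work out the file type of the -knownSites input, and it suggests adding an explicit type tag. Before trying that, it may be worth confirming that the file exists, is not empty, and still begins with a `##fileformat=VCF` header line, since a hand-filtered file can easily lose its header. The helper below is only a sketch, and the name `check_vcf` is made up.

```shell
# check_vcf FILE
# Returns 0 if FILE exists, is non-empty, and its first line starts with
# "##fileformat=VCF"; otherwise prints the reason and returns 1.
check_vcf() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f"
        return 1
    fi
    case "$(head -n 1 "$f")" in
        '##fileformat=VCF'*)
            echo "looks like a VCF: $f" ;;
        *)
            echo "first line is not a ##fileformat=VCF header: $f"
            return 1 ;;
    esac
}

# Example: check_vcf sample1_filter.vcf
```

If the file passes this check and the error persists, the type tag mentioned in the error text is the next thing to try. Check the exact tag syntax against the documentation for your GATK version.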

Asking for vcf file when calling SNP using GATK
