SEQanswers

Old 07-07-2014, 07:56 AM   #1
likebiology
Member
 
Location: haifa, Israel

Join Date: Aug 2013
Posts: 14
Asking for vcf file when calling SNP using GATK

Hi all,

I sequenced the transcriptomes of 7 samples: 3 from one environment and 4 from another. I did a de novo assembly and want to call SNPs using GATK. I merged the unigenes into a single reference, and now I plan to call SNPs for each sample. There is neither a reference genome nor a set of known SNP sites (knownSites).


The command lines I used are listed below:


1. java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R mergeunigene_ref.fa -T RealignerTargetCreator -I sample1_dedup.bam -o sample1.intervals

2. java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R mergeunigene_ref.fa -T IndelRealigner -targetIntervals sample1.intervals -I sample1_dedup.bam -o sample1_deduprealn.bam

Everything runs fine up to this point, but when I run BaseRecalibrator I get the error below:

3. java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R mergeunigene_ref.fa -T BaseRecalibrator -I sample1_deduprealn7.bam -o sample1.grp
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.5-2-gf57256b):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: This calculation is critically dependent on being able to skip over known variant sites. Please provide a VCF file containing known sites of genetic variation.

Has anybody run into this problem? Any comments are appreciated.
Old 07-07-2014, 08:30 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169

When working with a non-model organism for which little or no known variant information is available, the GATK developers recommend that you "bootstrap" your own list of known variants following this procedure:

Quote:
I'm working on a genome that doesn't really have a good SNP database yet. I'm wondering if it still makes sense to run base quality score recalibration without known SNPs.

The base quality score recalibrator treats every reference mismatch as indicative of machine error. True polymorphisms are legitimate mismatches to the reference and shouldn't be counted against the quality of a base. We use a database of known polymorphisms to skip over most polymorphic sites. Unfortunately without this information the data becomes almost completely unusable since the quality of the bases will be inferred to be much much lower than it actually is as a result of the reference-mismatching SNP sites.

However, all is not lost if you are willing to experiment a bit. You can bootstrap a database of known SNPs. Here's how it works:

First do an initial round of SNP calling on your original, unrecalibrated data.

Then take the SNPs that you have the highest confidence in and use that set as the database of known SNPs by feeding it as a VCF file to the base quality score recalibrator.

Finally, do a real round of SNP calling with the recalibrated data. These steps could be repeated several times until convergence.
Taken from http://gatkforums.broadinstitute.org...libration-bqsr
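
The three steps quoted above could be sketched as a shell function like the one below. This is only a sketch under assumptions: the tool names and flags are those of GATK 2.x (the version used in this thread), while `GATK_JAR`, `DRY_RUN`, the `run` helper, the output file names, and the `QD > 10.0` cutoff are hypothetical placeholders, not values recommended by the GATK developers. By default it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of one bootstrap round for BQSR when no known-sites database exists.
# GATK_JAR, DRY_RUN, and all file names are illustrative assumptions.

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # path to a GATK 2.x jar (assumed)
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands

# Print a command, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Usage: bootstrap_round REF.fa INPUT.bam OUTPUT_PREFIX
bootstrap_round() {
    local ref="$1" bam="$2" prefix="$3"

    # 1. Initial round of SNP calling on the unrecalibrated reads.
    run java -jar "$GATK_JAR" -T UnifiedGenotyper -R "$ref" -I "$bam" \
        -o "$prefix.raw.vcf" || return 1

    # 2. Keep only the highest-confidence calls (the cutoff is an arbitrary example).
    run java -jar "$GATK_JAR" -T SelectVariants -R "$ref" -V "$prefix.raw.vcf" \
        -select "QD > 10.0" -o "$prefix.known.vcf" || return 1

    # 3. Recalibrate base qualities, skipping the bootstrapped sites.
    run java -jar "$GATK_JAR" -T BaseRecalibrator -R "$ref" -I "$bam" \
        -knownSites "$prefix.known.vcf" -o "$prefix.grp" || return 1
    run java -jar "$GATK_JAR" -T PrintReads -R "$ref" -I "$bam" \
        -BQSR "$prefix.grp" -o "$prefix.recal.bam" || return 1

    # 4. Real round of SNP calling on the recalibrated reads.
    run java -jar "$GATK_JAR" -T UnifiedGenotyper -R "$ref" -I "$prefix.recal.bam" \
        -o "$prefix.recal.vcf" || return 1
}
```

Calling `bootstrap_round mergeunigene_ref.fa sample1_deduprealn.bam sample1` prints five commands; setting `DRY_RUN=0` would execute them. To repeat until convergence, the final VCF of one round could be filtered and used as the known sites of the next.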
Old 07-07-2014, 11:13 AM   #3
likebiology
Member
 
Location: haifa, Israel

Join Date: Aug 2013
Posts: 14

Thank you very much, kmcarr!

I read the page you recommended and found the following suggestions:

First do an initial round of SNP calling on your original, unrecalibrated data.
Then take the SNPs that you have the highest confidence in and use that set as the database of known SNPs by feeding it as a VCF file to the base quality score recalibrator.
Finally, do a real round of SNP calling with the recalibrated data. These steps could be repeated several times until convergence.

So I did a first round of SNP calling using the following command lines:
java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T UnifiedGenotyper -R mergedunigene.fa -I sample1_indelrealn7.bam -l INFO -o sample1.vcf -stand_call_conf 10 -stand_emit_conf 30
java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T VariantFiltration -R mergedunigen.fa -V sample1.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o samples_final.vcf

I did get a VCF file with SNPs, but how should I use it? Should I keep only the SNPs marked PASS, or apply some other criterion? Thank you, kmcarr!
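
One way to build the "highest confidence" set from a VariantFiltration output is to keep the header plus only those records whose FILTER column is PASS and whose QUAL is above some cutoff. The function below is a minimal sketch of that idea using `awk`; the name `keep_pass_sites` and the QUAL cutoff are hypothetical, and the right threshold depends on your data.

```shell
# Usage: keep_pass_sites IN.vcf [MIN_QUAL] > OUT.vcf
# Keeps all header lines plus records with FILTER == PASS and QUAL >= MIN_QUAL.
keep_pass_sites() {
    local vcf="$1" min_qual="${2:-0}"
    # VCF columns: 1 CHROM, 2 POS, 3 ID, 4 REF, 5 ALT, 6 QUAL, 7 FILTER, 8 INFO
    awk -F'\t' -v min="$min_qual" '
        /^#/ { print; next }                          # header lines pass through
        $7 == "PASS" && $6 + 0 >= min + 0 { print }   # confident, unfiltered records
    ' "$vcf"
}
```

For example, `keep_pass_sites samples_final.vcf 50 > sample1_filter.vcf` would write only PASS records with QUAL of at least 50; the value 50 is purely illustrative.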
Old 07-07-2014, 11:59 PM   #4
likebiology
Member
 
Location: haifa, Israel

Join Date: Aug 2013
Posts: 14

Hi again,

I am calling SNPs on 7 transcriptomes and am now running BaseRecalibrator in GATK. Since there is no reference genome and no knownSites file of SNPs, I called SNPs directly after IndelRealigner. After filtering, I used the VCF file (PASS sites only) as knownSites. The command line is:

java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R Acomyref.fa -T BaseRecalibrator -I sample1_indelrealn7.bam -knownSites sample1_filter.vcf -o sample1.grp

The error is:
##### ERROR A USER ERROR has occurred (version 2.5-2-gf57256b):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BCF2 VariantContext http://www.broadinstitute.org/gatk/g...BCF2Codec.html
##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gatk/g...agleCodec.html
##### ERROR BED BEDFeature http://www.broadinstitute.org/gatk/g..._BEDCodec.html
##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gatk/g...ableCodec.html
##### ERROR EXAMPLEBINARY Feature http://www.broadinstitute.org/gatk/g...naryCodec.html
##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gatk/g...TextCodec.html
##### ERROR OLDDBSNP OldDbSNPFeature http://www.broadinstitute.org/gatk/g...bSNPCodec.html
##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gatk/g...pMapCodec.html
##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gatk/g...fSeqCodec.html
##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gatk/g...leupCodec.html
##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gatk/g...ReadCodec.html
##### ERROR TABLE TableFeature http://www.broadinstitute.org/gatk/g...ableCodec.html
##### ERROR VCF VariantContext http://www.broadinstitute.org/gatk/g..._VCFCodec.html
##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gatk/g...VCF3Codec.html

If anybody has run into this problem, please show me how to fix it. Thanks!
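
The error message itself suggests adding an explicit type tag, which in this GATK version is written as `-knownSites:VCF file.vcf`. One possible cause (an assumption, not confirmed in this thread) is that the filtered file lost its VCF header lines, so the type could not be detected. The sketch below checks for that before running the tool; `looks_like_vcf` is a hypothetical helper name.

```shell
# Usage: looks_like_vcf FILE
# Succeeds if FILE is non-empty, starts with a ##fileformat=VCF line,
# and contains the #CHROM column header line.
looks_like_vcf() {
    local vcf="$1"
    [ -s "$vcf" ] || return 1
    head -n 1 "$vcf" | grep -q '^##fileformat=VCF' || return 1
    grep -q '^#CHROM' "$vcf"
}

# Example with an explicit type tag (file names taken from the command above):
# looks_like_vcf sample1_filter.vcf && \
#   java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R Acomyref.fa \
#     -T BaseRecalibrator -I sample1_indelrealn7.bam \
#     -knownSites:VCF sample1_filter.vcf -o sample1.grp
```

If the check fails, regenerating the filtered file so that the header lines are preserved is likely a better fix than forcing the type tag.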