SEQanswers

09-11-2011, 11:55 AM   #1
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98
GATK RealignerTargetCreator -B and -D options

The GATK RealignerTargetCreator has two options for inputting data about known SNPs:

http://www.broadinstitute.org/gsa/wi..._around_indels

java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /path/to/reference.fasta \
-o /path/to/output.intervals \
[-I /path/to/input.bam] \
[-L intervals] \
[-B:snps,VCF /path/to/SNP_calls.vcf] \
[-B:indels,VCF /path/to/indel_calls.vcf] \
[-D /path/to/dbsnp.rod]
Explanation of Arguments
The -L option is used to restrict the search to a specific region or set of regions instead of the whole genome.
The -o argument is used to specify the list of intervals being output and that should in turn be passed to the realigner in the next step.
The -B snps binding would be used to pass in SNP calls so that the target creator can find clustered SNPs.
The -B indels and dbsnp bindings would be used to pass in known indel sites for the realigner to target.

I don't understand the difference between the -B and the -D options. I have used the -B option often with this file (from the GATK resource bundle):

00-All.vcf

I saw that the resource bundle also has a file called "dbsnp_132.b37.vcf", and I'm tempted to use that with the -D option, but I don't know whether that is the right way to use it. Does anyone understand the difference between these options?

Thank you.

Eric
10-03-2011, 09:13 AM   #2
yasashiku
Member
 
Location: Salt Lake City, UT

Join Date: Jul 2011
Posts: 12

I think
Code:
-D <file name>
was the old way of doing
Code:
-B:dbsnp,VCF <file name>
But now (with the latest version of GATK), it looks like the -B option is deprecated. I think you now want to use:
Code:
-known:dbsnp,VCF <file name>
I could be wrong, especially about the recent changes. If I am, someone please correct me.

Given that they're the same option, I doubt giving it two different files is a good idea (I bet it will crash if you try).
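
To put the three spellings side by side, here is a minimal sketch that only prints a command line and does not run GATK. The function names `known_sites_arg` and `build_cmd` are hypothetical helpers made up for this illustration, the paths are the placeholders from the first post, and which spelling your GATK version accepts is an assumption you should check against its own documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the known-sites argument in one of the three
# spellings mentioned in this thread. Nothing here is verified against a
# particular GATK release.
known_sites_arg() {
    style=$1   # one of: D, B, known
    file=$2    # path to the dbSNP file
    case "$style" in
        D)     printf '%s\n' "-D $file" ;;                 # oldest form, per this thread
        B)     printf '%s\n' "-B:dbsnp,VCF $file" ;;       # -B binding form
        known) printf '%s\n' "-known:dbsnp,VCF $file" ;;   # form suggested for the latest version
        *)     echo "unknown style: $style" >&2; return 1 ;;
    esac
}

# Hypothetical helper: assemble (but do not execute) the full command,
# reusing the placeholder paths from the first post.
build_cmd() {
    arg=$(known_sites_arg "$1" "$2") || return 1
    printf '%s\n' "java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /path/to/reference.fasta -o /path/to/output.intervals $arg"
}

# Print the command using the newest spelling and the bundle file named above.
build_cmd known /path/to/dbsnp_132.b37.vcf
```

Since the three spellings appear to be the same option, the sketch takes exactly one style and one file, so you never end up passing two different dbSNP files at once.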

Last edited by yasashiku; 10-03-2011 at 10:21 AM.
All times are GMT -8.