Seqanswers Leaderboard Ad

**shyam_la** · 06-27-2012, 09:03 AM

Hey Jane,

Where did you find the documentation for these 40/50 additional parameters? I did not find them on the MuTect website. I am going to run it with fraction_contamination as my tumor sample is not as pure as yours since I am working with tiny solid tumors (I am assuming yours are leukemias given the very low contamination amount)..

Thank you for that new piece of info!

**shyam_la** · 06-27-2012, 10:24 AM

I'm reporting back on the parameter "fraction_contamination". It reduced my variant number by 25%, which is natural given my contamination was somewhere between 0.15 and 0.2.
Also the tumor alternate allelic fraction (t_alt_count / t_ref_count), has improved significantly..

**Jane M** · 06-28-2012, 12:52 AM

Originally posted by shyam_la View Post

Hey Jane,

Where did you find the documentation for these 40/50 additional parameters? I did not find them on the MuTect website.

Simply:

java -Xmx10g -jar muTect-1.0.27783.jar --analysis_type MuTect

usage: java -jar muTect-1.0.27783.jar -T <analysis_type> [-I <input_file>] [-SM <sample_metadata>] [-rbs <read_buffer_size>]
[-et <phone_home>] [-rf <read_filter>] [-L <intervals>] [-XL <excludeIntervals>] [-R <reference_sequence>] [-B
<rodBind>] [-BTI <rodToIntervalTrackName>] [-BTIMR <BTI_merge_rule>] [-ndrs] [-D <DBSNP>] [-dt <downsampling_type>]
[-dfrac <downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-baq <baq>] [-baqGOP <baqGapOpenPenalty>] [-PF
<performanceLog>] [-OQ] [-DBQ <defaultBaseQualities>] [-S <validation_strictness>] [-U <unsafe>] [-nt <num_threads>]
[-im <interval_merging>] [-rgbl <read_group_black_list>] [--disable_experimental_low_memory_sharding] [-l
<logging_level>] [-log <log_to_file>] [-h] [-o <out>] [--noop] [--tumor_sample_name <tumor_sample_name>]
[--bam_tumor_sample_name <bam_tumor_sample_name>] [--normal_sample_name <normal_sample_name>] [--force_output]
[--force_alleles] [--initial_tumor_lod <initial_tumor_lod>] [--initial_tumor_fstar_lod <initial_tumor_fstar_lod>]
[--tumor_lod <tumor_lod>] [--fraction_contamination <fraction_contamination>] [--minimum_mutation_cell_fraction
<minimum_mutation_cell_fraction>] [--normal_lod <normal_lod>] [--dbsnp_normal_lod <dbsnp_normal_lod>]
[--minimum_normal_allele_fraction <minimum_normal_allele_fraction>] [--tumor_f_pretest <tumor_f_pretest>] [--min_qscore
<min_qscore>] [--gap_events_threshold <gap_events_threshold>] [--heavily_clipped_read_fraction
<heavily_clipped_read_fraction>] [--clipping_bias_pvalue_threshold <clipping_bias_pvalue_threshold>]
[--min_mutant_sum_pretest <min_mutant_sum_pretest>] [--fraction_mapq0_threshold <fraction_mapq0_threshold>]
[--pir_median_threshold <pir_median_threshold>] [--pir_mad_threshold <pir_mad_threshold>]
[--max_alt_alleles_in_normal_count <max_alt_alleles_in_normal_count>] [--max_alt_alleles_in_normal_qscore_sum
<max_alt_alleles_in_normal_qscore_sum>] [--max_alt_allele_in_normal_fraction <max_alt_allele_in_normal_fraction>] [-cov
<coverage_file>] [-cov_q20 <coverage_20_q20_file>] [-pow <power_file>] [-tdf <tumor_depth_file>] [-ndf
<normal_depth_file>] [--power_constant_qscore <power_constant_qscore>] [--absolute_copy_number_data
<absolute_copy_number_data>] [--power_constant_af <power_constant_af>]

-T,--analysis_type <analysis_type> Type of analysis to run
-I,--input_file <input_file> SAM or BAM file(s)
-SM,--sample_metadata <sample_metadata> Sample file(s) in JSON format
-rbs,--read_buffer_size <read_buffer_size> Number of reads per SAM file to buffer in memory
-et,--phone_home <phone_home> What kind of GATK run report should we generate? Standard is the default, can be verbose or NO_ET so nothing is posted to the run repository (NO_ET|STANDARD|STDOUT|AWS_S3)
-rf,--read_filter <read_filter> Specify filtration criteria to apply to each read
individually.
-L,--intervals <intervals> A list of genomic intervals over which to operate. Can be explicitly specified on the command line or in a file.
-XL,--excludeIntervals <excludeIntervals> A list of genomic intervals to exclude from processing. Can be explicitly specified on the command line or in a file.
-R,--reference_sequence <reference_sequence> Reference sequence file
-B,--rodBind <rodBind> Bindings for reference-ordered data, in the form :<name>,<type> <file>
-BTIMR,--BTI_merge_rule <BTI_merge_rule> Indicates the merging approach the interval parser should use to combine the BTI track with other -L options (UNION| INTERSECTION)
-ndrs,--nonDeterministicRandomSeed Makes the GATK behave non deterministically, that is, the
random numbers generated will be different in every run
CALCULATE_AS_NECESSARY|RECALCULATE)
etc ...

You can find these info in most of the java scripts (VarScan2 for example).

I am going to run it with fraction_contamination as my tumor sample is not as pure as yours since I am working with tiny solid tumors (I am assuming yours are leukemias given the very low contamination amount)..

Good guess

Thank you for that new piece of info!

You are welcome, that's the aim of the topic!

**shyam_la** · 06-28-2012, 08:49 AM

Oh, thank you! I will try out a few of those parameters too..

**huyvuong** · 02-12-2013, 04:19 PM

Thank very much for the information. I wonder if you can show the default values of those parameters such as tumor_f_pretest, initial_tumor_lob, tumor_lob etc... I followed the documentation on Mutect website and it seems like using default values is fine. But I want to know these values nevertheless. Thank in advance.

**rnahar** · 01-07-2014, 01:48 AM

Was anyone able to get the default values for all the parameters of mutect ?
Also is there any parameter in mutect (which I do not find using the -h option) which could change the cutoffs for read counts (total and alt allele) for the tumor sample ?
Also it seems that the --minimum_normal_allele_fraction parameter is not functioning (has no effect) or may be I do not understand it properly. Any inputs would be appreciated.
Thanks

**jp.** · 04-06-2014, 03:47 AM

I am using GenomeAnalysis-3.1-1/GenomeAnalysisTK.jar -T BaseRecalibrator --defaultBaseQualities 1; is it okay ? My samples are from PE101 Illumina of Cancer Cell lines with no normal control.
My questions are: --defaultBaseQualities 1
1. can I manually filter out low quality bases later ?
2. I think, using --defaultBaseQualities 1 will keep all bad/low quality bases and I can mannualy filter them out later ...
May someone please guide me...
thank you

**EugeniaD** · 08-11-2014, 07:05 AM

Mutect default parameters

Hello everyone,

I see the thread is little bit old but I guess there are even more people interested in Mutect now so hope the info could be useful.

About Mutect default parameters. I was also searching for a long.
Some filters are described in Mutect paper (http://www.nature.com/nbt/journal/v3.../nbt.2514.html).

And for the rest: I just realized that they are all listed in the vcf header.
Eg: clipping_bias_pvalue_threshold=0.05 ; tumor_f_pretest=0.0050 ; initial_tumor_lod=4.0 ; tumor_lod=6.3 ...
The thing is that Mutect doesn't output vcf by default (use -vcf parameter) - maybe that's why it took me so long to find it out

**MrFronk** · 09-02-2014, 05:38 AM

Hi all,

I realise this is an old thread, but I too only recently dove into MuTect.
What we've found out so far (running mutect 1.1.4), is that when you use the options:

java -Xmx4g -jar ../MuTect/muTect-1.1.4.jar -h

It will show the help options of GATK, not MuTect (mutect is based on the GATK framework
as far as I saw.

output shown here:

---------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v2.2-25-g2a68eab, Compiled 2012/11/08 10:30:02
Copyright (c) 2010 The Broad Institute
For support and documentation go to http://www.broadinstitute.org/gatk
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
usage: java -jar muTect-1.1.4.jar -T <analysis_type> [-args <arg_file>] [-I <input_file>] [-rbs <read_buffer_size>] [-et
<phone_home>] [-K <gatk_key>] [-tag <tag>] [-rf <read_filter>] [-L <intervals>] [-XL <excludeIntervals>] [-isr
<interval_set_rule>] [-im <interval_merging>] [-ip <interval_padding>] [-R <reference_sequence>] [-ndrs]
[--disableRandomization] [-maxRuntime <maxRuntime>] [-maxRuntimeUnits <maxRuntimeUnits>] [-dt <downsampling_type>]
[-dfrac <downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-baq <baq>] [-baqGOP <baqGapOpenPenalty>] [-PF
<performanceLog>] [-OQ] [-BQSR <BQSR>] [-DIQ] [-EOQ] [-preserveQ <preserve_qscores_less_than>] [-DBQ
<defaultBaseQualities>] [-S <validation_strictness>] [-rpr] [-kpr] [-U <unsafe>] [-nt <num_threads>] [-nct
<num_cpu_threads_per_data_thread>] [-mte] [-bfh <num_bam_file_handles>] [-rgbl <read_group_black_list>] [-ped
<pedigree>] [-pedString <pedigreeString>] [-pedValidationType <pedigreeValidationType>] [-l <logging_level>] [-log
<log_to_file>] [-h]

-T,--analysis_type <analysis_type> Type of analysis to run
-args,--arg_file <arg_file> Reads arguments from the specified file
-I,--input_file <input_file> SAM or BAM file(s)
-rbs,--read_buffer_size <read_buffer_size> Number of reads per SAM file to buffer in
memory
-et,--phone_home <phone_home> What kind of GATK run report should we
generate? STANDARD is the default, can be
NO_ET so nothing is posted to the run
repository. Please see
-phone-home-and-how-does-it-affect-me#latest
for details. (NO_ET|STANDARD|STDOUT)
-K,--gatk_key <gatk_key> GATK Key file. Required if running with -et
NO_ET. Please see
-phone-home-and-how-does-it-affect-me#latest
for details.
-tag,--tag <tag> Arbitrary tag string to identify this GATK
run as part of a group of runs, for later
analysis
-rf,--read_filter <read_filter> Specify filtration criteria to apply to each
read individually
-L,--intervals <intervals> One or more genomic intervals over which to
operate. Can be explicitly specified on the
command line or in a file (including a rod
file)
-XL,--excludeIntervals <excludeIntervals> One or more genomic intervals to exclude
from processing. Can be explicitly specified
on the command line or in a file (including
a rod file)
-isr,--interval_set_rule <interval_set_rule> Indicates the set merging approach the
interval parser should use to combine the
various -L or -XL inputs (UNION|
INTERSECTION)
-im,--interval_merging <interval_merging> Indicates the interval merging rule we
should use for abutting intervals (ALL|
OVERLAPPING_ONLY)
-ip,--interval_padding <interval_padding> Indicates how many basepairs of padding to
include around each of the intervals
specified with the -L/--intervals argument
-R,--reference_sequence <reference_sequence> Reference sequence file
-ndrs,--nonDeterministicRandomSeed Makes the GATK behave non deterministically,
that is, the random numbers generated will
be different in every run
--disableRandomization Completely eliminates randomization from
nondeterministic methods. To be used mostly
in the testing framework where dynamic
parallelism can result in differing numbers
of calls to the generator.
-maxRuntime,--maxRuntime <maxRuntime> If provided, that GATK will stop execution
cleanly as soon after maxRuntime has been
exceeded, truncating the run but not exiting
with a failure. By default the value is
interpreted in minutes, but this can be
changed by maxRuntimeUnits
-maxRuntimeUnits,--maxRuntimeUnits <maxRuntimeUnits> The TimeUnit for maxRuntime (NANOSECONDS|
MICROSECONDS|MILLISECONDS|SECONDS|MINUTES|
HOURS|DAYS)
-dt,--downsampling_type <downsampling_type> Type of reads downsampling to employ at a
given locus. Reads will be selected
randomly to be removed from the pile based
on the method described here (NONE|ALL_READS|
BY_SAMPLE)
-dfrac,--downsample_to_fraction <downsample_to_fraction> Fraction [0.0-1.0] of reads to downsample to
-dcov,--downsample_to_coverage <downsample_to_coverage> Coverage [integer] to downsample to at any
given locus; note that downsampled reads are
randomly selected from all possible reads at
a locus
-baq,--baq <baq> Type of BAQ calculation to apply in the
engine (OFF|CALCULATE_AS_NECESSARY|
RECALCULATE)
-baqGOP,--baqGapOpenPenalty <baqGapOpenPenalty> BAQ gap open penalty (Phred Scaled).
Default value is 40. 30 is perhaps better
for whole genome call sets
-PF,--performanceLog <performanceLog> If provided, a GATK runtime performance log
will be written to this file
-OQ,--useOriginalQualities If set, use the original base quality scores
from the OQ tag when present instead of the
standard scores
-BQSR,--BQSR <BQSR> The input covariates table file which
enables on-the-fly base quality score
recalibration
-DIQ,--disable_indel_quals If true, disables printing of base insertion
and base deletion tags (with -BQSR)
-EOQ,--emit_original_quals If true, enables printing of the OQ tag with
the original base qualities (with -BQSR)
-preserveQ,--preserve_qscores_less_than <preserve_qscores_less_than> Bases with quality scores less than this
threshold won't be recalibrated (with -BQSR)
-DBQ,--defaultBaseQualities <defaultBaseQualities> If reads are missing some or all base
quality scores, this value will be used for
all base quality scores
-S,--validation_strictness <validation_strictness> How strict should we be with validation
(STRICT|LENIENT|SILENT)
-rpr,--remove_program_records Should we override the Walker's default and
remove program records from the SAM header
-kpr,--keep_program_records Should we override the Walker's default and
keep program records from the SAM header
-U,--unsafe <unsafe> If set, enables unsafe operations: nothing
will be checked at runtime. For expert
users only who know what they are doing. We
do not support usage of this argument.
(ALLOW_UNINDEXED_BAM|
ALLOW_UNSET_BAM_SORT_ORDER|
NO_READ_ORDER_VERIFICATION|
ALLOW_SEQ_DICT_INCOMPATIBILITY|
LENIENT_VCF_PROCESSING|ALL)
-nt,--num_threads <num_threads> How many data threads should be allocated to
running this analysis.
-nct,--num_cpu_threads_per_data_thread <num_cpu_threads_per_data_thread> How many CPU threads should be allocated per
data thread to running this analysis?
-mte,--monitorThreadEfficiency Enable GATK threading efficiency monitoring
-bfh,--num_bam_file_handles <num_bam_file_handles> The total number of BAM file handles to keep
open simultaneously
-rgbl,--read_group_black_list <read_group_black_list> Filters out read groups matching
<TAG>:<STRING> or a .txt file containing the
filter strings one per line.
-ped,--pedigree <pedigree> Pedigree files for samples
-pedString,--pedigreeString <pedigreeString> Pedigree string for samples
-pedValidationType,--pedigreeValidationType <pedigreeValidationType> How strict should we be in validating the
pedigree information? (STRICT|SILENT)
-l,--logging_level <logging_level> Set the minimum level of logging, i.e.
setting INFO get's you INFO up to FATAL,
setting ERROR gets you ERROR and FATAL level
logging.
-log,--log_to_file <log_to_file> Set the logging location
-h,--help Generate this help message

To get the mutect specific options, add -T MuTect to the command like this:

java -Xmx4g -jar ../MuTect/muTect-1.1.4.jar -T MuTect -h

Which will now show additional options like shown below.

Arguments for MalformedReadFilter:
-filterMBQ,--filter_mismatching_base_and_quals if a read has mismatching number of bases and base qualities, filter
out the read instead of blowing up.

Arguments for MuTect:
--noop used for debugging, basically
exit as soon as we get the
reads
--tumor_sample_name <tumor_sample_name> name to use for tumor in
output files
--bam_tumor_sample_name <bam_tumor_sample_name> if the tumor bam contains
multiple samples, only use
read groups with SM equal to
this value
--normal_sample_name <normal_sample_name> name to use for normal in
output files
--force_output force output for each site
--force_alleles force output for all alleles
at each site
--only_passing_calls only emit passing calls
--initial_tumor_lod <initial_tumor_lod> Initial LOD threshold for
calling tumor variant
--tumor_lod <tumor_lod> LOD threshold for calling
tumor variant
--fraction_contamination <fraction_contamination> estimate of fraction (0-1) of
physical contamination with
other unrelated samples
--minimum_mutation_cell_fraction <minimum_mutation_cell_fraction> minimum fraction of cells
which are presumed to have a
mutation, used to handle
non-clonality and
contamination
--normal_lod <normal_lod> LOD threshold for calling
normal non-germline
--dbsnp_normal_lod <dbsnp_normal_lod> LOD threshold for calling
normal non-variant at dbsnp
sites
--somatic_classification_normal_power_threshold Power threshold for normal to
<somatic_classification_normal_power_threshold> determine germline vs variant
--minimum_normal_allele_fraction <minimum_normal_allele_fraction> minimum allele fraction to be
considered in normal, useful
for normal sample contaminated
with tumor
--tumor_f_pretest <tumor_f_pretest> for computational efficiency,
reject sites with allelic
fraction below this threshold
--min_qscore <min_qscore> threshold for minimum base
quality score
--gap_events_threshold <gap_events_threshold> how many gapped events
(ins/del) are allowed in
proximity to this candidate
--heavily_clipped_read_fraction <heavily_clipped_read_fraction> if this fraction or more of
the bases in a read are
soft/hard clipped, do not use
this read for mutation calling
--clipping_bias_pvalue_threshold <clipping_bias_pvalue_threshold> pvalue threshold for fishers
exact test of clipping bias in
mutant reads vs ref reads
--fraction_mapq0_threshold <fraction_mapq0_threshold> threshold for determining if
there is relatedness between
the alt and ref allele read
piles
--pir_median_threshold <pir_median_threshold> threshold for clustered read
position artifact median
--pir_mad_threshold <pir_mad_threshold> threshold for clustered read
position artifact MAD
--required_maximum_alt_allele_mapping_quality_score required minimum value for
<required_maximum_alt_allele_mapping_quality_score> tumor alt allele maximum
mapping quality score
--max_alt_alleles_in_normal_count <max_alt_alleles_in_normal_count> threshold for maximum
alternate allele counts in
normal
--max_alt_alleles_in_normal_qscore_sum <max_alt_alleles_in_normal_qscore_sum> threshold for maximum
alternate allele quality score
sum in normal
--max_alt_allele_in_normal_fraction <max_alt_allele_in_normal_fraction> threshold for maximum
alternate allele fraction in
normal
--power_constant_qscore <power_constant_qscore> Phred scale quality score
constant to use in power
calculations
--absolute_copy_number_data <absolute_copy_number_data> Absolute Copy Number Data, as
defined by Absolute, to use in
power calculations
--power_constant_af <power_constant_af> Allelic fraction constant to
use in power calculations
-o,--out <out> Call-stats output
-vcf,--vcf <vcf> VCF output of mutation
candidates
-dbsnp,--dbsnp <dbsnp> VCF file of DBSNP information
-cosmic,--cosmic <cosmic> VCF file of COSMIC sites
-cov,--coverage_file <coverage_file> write out coverage in WIGGLE
format to this file
-cov_q20,--coverage_20_q20_file <coverage_20_q20_file> write out 20x of Q20 coverage
in WIGGLE format to this file
-pow,--power_file <power_file> write out power in WIGGLE
format to this file
-tdf,--tumor_depth_file <tumor_depth_file> write out tumor read depth in
WIGGLE format to this file
-ndf,--normal_depth_file <normal_depth_file> write out normal read depth in
WIGGLE format to this file

Available Reference Ordered Data types:
Name FeatureType Documentation
BCF2 VariantContext http://www.broadinstitute.org/gatk/g...BCF2Codec.html
BED BEDFeature http://www.broadinstitute.org/gatk/g..._BEDCodec.html
EXAMPLEBINARY Feature http://www.broadinstitute.org/gatk/g...naryCodec.html
GELITEXT GeliTextFeature http://www.broadinstitute.org/gatk/g...TextCodec.html
OLDDBSNP OldDbSNPFeature http://www.broadinstitute.org/gatk/g...bSNPCodec.html
VCF VariantContext http://www.broadinstitute.org/gatk/g..._VCFCodec.html

For a full description of this walker, see its GATKdocs at:
http://www.broadinstitute.org/gatk/g...ct_MuTect.html

So yes, I found the documentation not very helpfull at all, and I'm struggling with the package as well. Hope this info helps.

- Frank.

**patterja** · 05-29-2015, 10:08 AM

Mutect Parameters

I'm not sure how pertinent this will be considering the age of the thread but I thought I would reply.
I found the Mutect default parameters by running Mutect once with the following parameters:
--enable_extended_output \
--vcf
This is not the default. The resulting vcf header will have the defaults thresholds for Mutect. Here's an example of clip of my vcf header. THIS JUST AN EXAMPLE. I tweaked some of these parameters so you are not viewing the defaults. So make sure you run your own version of Mutect and look at the vcf headerto find the defaults. It seems silly to bury it in the vcf header and not posting it ANYWHERE else.
...
downsample_to_coverage=1000 enable_experimental_downsampling=false baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false noop=false enable_extended_output=true artifact_detection_mode=false tumor_sample_name=1002_tumor_52 bam_tumor_sample_name=null normal_sample_name=1002_Normal_53 force_output=false force_alleles=false only_passing_calls=false initial_tumor_lod=4.0 tumor_lod=6.3 fraction_contamination=0.02 minimum_mutation_cell_fraction=0.0 normal_lod=2.2 normal_artifact_lod=1.0 strand_artifact_lod=2.0 strand_artifact_power_threshold=0.9 dbsnp_normal_lod=5.5 somatic_classification_normal_power_threshold=0.95 minimum_normal_allele_fraction=0.0 tumor_f_pretest=0.0050 min_qscore=5 gap_events_threshold=3 heavily_clipped_read_fraction=0.3 clipping_bias_pvalue_threshold=0.05 fraction_mapq0_threshold=0.5 pir_median_threshold=10.0 pir_mad_threshold=3.0 required_maximum_alt_allele_mapping_quality_score=20 max_alt_alleles_in_normal_count=2 max_alt_alleles_in_normal_qscore_sum=20 max_alt_allele_in_normal_fraction=0.03 power_constant_qscore=30 absolute_copy_number_data=null power_constant_af=0.30000001192092896 vcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub
...

So it's a complicated process to figure out why a single mutation may have failed Mutect filters. Here's how I do it.
1) Check your Mutect callstats.txt file and find your mutation. Check out the "failure reason" column and it should give you a reason, ie. normal_lod, f_star_tumor_lod, alt_allele_in_normal, etc.
2) Go to this website: http://gatkforums.broadinstitute.org...date-mutations
This site connects the failure reason to the column name in the callstats.txt output that it's associated with.
For example, if the failure reason is "alt_allele_in_normal" then go to the "n_alt_count" or "normal_f" column in your extended output callstats.txt file to find the value.
3) Look at the vcf header to find the default threshold:
(from above)
max_alt_alleles_in_normal_count=2
max_alt_allele_in_normal_fraction=0.03
4) Rerun mutect with these thresholds lowered accordingly to include your mutation. So adding parameters to your command like:
--max_alt_alleles_in_normal_count=5 \
--max_alt_allele_in_normal_fraction=0.2

You could also filter callstats.txt manually if you weren't concerned about getting corrected VCF and other files.

It's a wonder that Mutect finds anything interesting at all with the stringency of their filters. 90% of the time Mutect defaults filter out my primary somatic mutation in most of the cancer types I study. This is most likely because of the heterogeity of the tumor and the often mixed tumor contamainated nature of the paired normal. However, even with a lowered threshold of alt allelic fraction to 20% and the a count of about 6, it still will sometimes miss some mutations that are near indels. There's is a gap_threshold parameter in Mutect but if you've followed the GATK best practices protocol and realigned around indels those gaps are pretty frequent, and I've noticed a decrease sensitivity in mutation detection around these areas, while lowering the default threshold will explode the number of false positives.
Broad has yet to fix it's deprecated indel detector, but I predict future best practices pipeline will start with an indel caller first which will be used by the SNP detector (Mutect) to discover SNPs. Knowing the location of the indels, might fix these problems.

**patterja** · 06-04-2015, 01:38 PM

Mutect Parameters

I'm not sure how pertinent this will be considering the age of the thread but I thought I would reply, since I wasn't able to find any sources of information when I was struggling with this.
I found the Mutect default filters by running Mutect once with the following parameters:
--enable_extended_output \
--vcf
This is not the default. The resulting vcf header will have the defaults thresholds for Mutect. Here's an example of a clip of my vcf header. THIS JUST AN EXAMPLE. I tweaked some of these parameters so you are not viewing the defaults. So make sure you run your own version of Mutect and look at the vcf header to find the defaults. It seems silly to bury it in the vcf header and not posting it ANYWHERE else.

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Accept as a confident somatic mutation">
...
##MuTect="analysis_type=MuTect ...
...
downsample_to_coverage=1000 enable_experimental_downsampling=false baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false noop=false enable_extended_output=true artifact_detection_mode=false tumor_sample_name=1002_tumor_52 bam_tumor_sample_name=null normal_sample_name=1002_Normal_53 force_output=false force_alleles=false only_passing_calls=false initial_tumor_lod=4.0 tumor_lod=6.3 fraction_contamination=0.02 minimum_mutation_cell_fraction=0.0 normal_lod=2.2 normal_artifact_lod=1.0 strand_artifact_lod=2.0 strand_artifact_power_threshold=0.9 dbsnp_normal_lod=5.5 somatic_classification_normal_power_threshold=0.95 minimum_normal_allele_fraction=0.0 tumor_f_pretest=0.0050 min_qscore=5 gap_events_threshold=3 heavily_clipped_read_fraction=0.3 clipping_bias_pvalue_threshold=0.05 fraction_mapq0_threshold=0.5 pir_median_threshold=10.0 pir_mad_threshold=3.0 required_maximum_alt_allele_mapping_quality_score=20 max_alt_alleles_in_normal_count=2 max_alt_alleles_in_normal_qscore_sum=20 max_alt_allele_in_normal_fraction=0.03 power_constant_qscore=30 absolute_copy_number_data=null power_constant_af=0.30000001192092896 vcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub
...

So to figure out why a single mutation may have failed Mutect filters is a complicated process. Here's how I do it.
1) Check your Mutect callstats.txt file and find your mutation. Check out the "failure reason" column and it should give you a reason, ie. normal_lod, f_star_tumor_lod, alt_allele_in_normal, etc.
2) Go to this website: http://gatkforums.broadinstitute.org...date-mutations
This site connects the failure reason to the column name in the callstats.txt output that it's associated with.
For example, if the failure reason is "alt_allele_in_normal" then go to the "n_alt_count" or "normal_f" column in your extended output callstats.txt file to find the value.
3) Look at the vcf header to find the default threshold:
(from above)
max_alt_alleles_in_normal_count=2
max_alt_allele_in_normal_fraction=0.03
4) Rerun mutect with these thresholds lowered/adjusted accordingly to include your mutation. So adding parameters to your command like:
--max_alt_alleles_in_normal_count=5 \
--max_alt_allele_in_normal_fraction=0.1

You could also filter callstats.txt manually if you weren't concerned about getting corrected VCF and other files.

It's a wonder that Mutect finds anything interesting at all with the stringency of their filters, it really depends on the purity of the your paired normal. 90% of the time Mutect's defaults filter out my primary somatic mutation in most of the cancer types I study. This is most likely because of the heterogeneity of the tumor and the often mixed tumor contamainated nature of the paired normal. However, even with a lowered threshold of alt allelic fraction to 20% and the a count of about 6, it still will sometimes miss some mutations that are near indels. There's is a gap_threshold parameter in Mutect but if you've followed the GATK best practices protocol and realigned around indels those gaps occur pretty frequently because you're realigned around them, and I've noticed a marked decrease in sensitivity in mutation detection around these areas. Lowering the gap_threshold defaults, however, will explode the number of false positives you get, so be careful which parameters you tweak and make sure you have a reason to tweak them.

Broad has yet to fix it's deprecated indel detector, but I predict future best practices pipeline will start with an indel caller first which will be subsequently used by the SNP detector (Mutect) to discover SNPs. Knowing the location of the indels, might fix this decreased sensitivity around indels.

Good luck!

Topics	Statistics	Last Post
TIGR Systems Offer a Compact Alternative to CRISPR for Gene Editing by seqadmin Started by seqadmin, 03-03-2025, 01:15 PM	0 responses 160 views 0 likes	Last Post by seqadmin 03-03-2025, 01:15 PM
Highlights from AGBT 2025 – Part II by seqadmin Started by seqadmin, 02-28-2025, 12:58 PM	0 responses 247 views 0 likes	Last Post by seqadmin 02-28-2025, 12:58 PM
Highlights from AGBT 2025 – Part I by seqadmin Started by seqadmin, 02-24-2025, 02:48 PM	0 responses 620 views 0 likes	Last Post by seqadmin 02-24-2025, 02:48 PM
Selecting the Right AI Model for Bioinformatics Research by seqadmin Started by seqadmin, 02-21-2025, 02:46 PM	0 responses 265 views 0 likes	Last Post by seqadmin 02-21-2025, 02:46 PM

Seqanswers Leaderboard Ad

Announcement

Discussion about MuTect and its parameters

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News