  • Mutect Analysis Criteria: Judgement Calls

    Hello everybody, I am dealing with a dataset of cancer tumours sequenced on a HiSeq. I do not have a matched normal, and I have used MuTect to call somatic variants. I have the following questions:

    1) How good is MuTect at calling variants when no matched normal is supplied?

    2) Is the judgement, i.e. "KEEP" or "REJECT", an absolute criterion? On what basis is it decided? Will I lose a lot of good-quality variants if I discard all the variants MuTect marks as "REJECT" and carry only the "KEEP" variants into my downstream analysis? There is a lot of ambiguity surrounding this, and I would love to hear the community's thoughts on the subject.

    Thanks a lot for your 2 cents!

  • #2
    Hi,
    same question as ron128.

    Moreover, I'm running MuTect on an exome (Agilent SureSelect v5), but in this case I have both a normal and a tumor sample. I used default parameters for everything except --minimum_mutation_allele_fraction 0.10, --min_qscore 20 and --clipping_bias_pvalue_threshold 0.05.

    Only 40 somatic variants were marked "KEEP"!

    In your experience, is it "normal" to have so few somatic mutations? I know it depends on the kind of cancer sample, but I'd just like a comparison metric.

    I'm also a little confused about how to deal with possible contamination of the normal (germline) sample by tumor cells. I saw the parameter --minimum_normal_allele_fraction, but how should it be interpreted? All 40 of my somatic variants have zero coverage for the "somatic allele" in the normal sample. It is as if the ratio tumor_allele_in_control_sample/tumor_allele_in_tumor_sample has to be zero or very close to it, i.e. only very low contamination of the control sample is tolerated. So maybe --minimum_normal_allele_fraction is set to a high value by default?
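The ratio described above can be made concrete with a short Python sketch. The read counts are hypothetical, and this only shows the arithmetic of the normal-to-tumor allele-fraction ratio; it is not a description of how MuTect itself applies --minimum_normal_allele_fraction.

```python
def allele_fraction(alt_count: int, ref_count: int) -> float:
    """Fraction of reads supporting the alternate allele at one site."""
    depth = alt_count + ref_count
    return alt_count / depth if depth else 0.0


def normal_to_tumor_ratio(t_alt: int, t_ref: int, n_alt: int, n_ref: int) -> float:
    """Ratio of the normal's alt-allele fraction to the tumor's.

    0.0 means the tumor allele is never seen in the normal; values near 1.0
    mean the normal carries the allele about as often as the tumor does.
    """
    tumor_f = allele_fraction(t_alt, t_ref)
    normal_f = allele_fraction(n_alt, n_ref)
    return normal_f / tumor_f if tumor_f else 0.0


# Hypothetical site: 30 alt / 70 ref reads in the tumor, 2 alt / 98 ref in the normal.
print(normal_to_tumor_ratio(30, 70, 2, 98))
```

With these counts the tumor fraction is 0.30 and the normal fraction 0.02, so the ratio is about 0.067; a site with zero alt reads in the normal always gives 0.0.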

    Thanks in advance!
    Last edited by UltimaSeq; 06-19-2013, 02:32 AM.


    • #3
      Same question.
      I am running MuTect on my mouse RNA-seq data and I get ~600,000 calls from MuTect, but all of them are REJECT. Has anyone experienced this before?
      Is it a problem with my analysis or with MuTect?
      Thanks for any help!
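When every call comes back REJECT, one way to see which filters are responsible is to tally the failure reasons across the call-stats output. The sketch below assumes that the call-stats table has a "judgement" column and a comma-separated "failure_reasons" column; those names and the separator are assumptions, and the rows are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def tally_failure_reasons(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count how often each failure reason appears among REJECT calls."""
    counts: Counter = Counter()
    for row in rows:
        if row["judgement"] == "REJECT":
            # Assumed: multiple reasons are comma-separated in one field.
            counts.update(r for r in row["failure_reasons"].split(",") if r)
    return counts


# Made-up rows; in practice, load them from call_stats.txt (for example
# with csv.DictReader, skipping the leading '##' line).
rows = [
    {"judgement": "REJECT", "failure_reasons": "normal_lod"},
    {"judgement": "REJECT", "failure_reasons": "normal_lod,alt_allele_in_normal"},
    {"judgement": "KEEP", "failure_reasons": ""},
]
print(tally_failure_reasons(rows).most_common())
# [('normal_lod', 2), ('alt_allele_in_normal', 1)]
```

If a single reason dominates, that points to the threshold (or input problem) worth investigating first.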


      • #4
        Can anyone tell me what to use as the normal input for MuTect?
        I only have tumor data, with no normal or control, from exome sequencing of prostate cancer cell lines.
        What can I use as the --input:control or normal in MuTect?
        Thanks in advance.


        • #5
          Mutect defaults

          I'm not sure how pertinent this will be given the age of the thread, but I thought I would reply, since I couldn't find any sources of information when I was struggling with this.
          I found MuTect's default filter thresholds by running MuTect once with the following parameters:
          --enable_extended_output \
          --vcf
          These options are not on by default. The resulting VCF header will contain MuTect's default thresholds. Here's a clip of my VCF header. THIS IS JUST AN EXAMPLE: I tweaked some of these parameters, so you are not looking at the defaults. Make sure you run your own version of MuTect and look at the VCF header to find the defaults. It seems silly to bury them in the VCF header and not post them ANYWHERE else.

          ##fileformat=VCFv4.1
          ##FILTER=<ID=PASS,Description="Accept as a confident somatic mutation">
          ...
          ##MuTect="analysis_type=MuTect ...
          ...
          downsample_to_coverage=1000 enable_experimental_downsampling=false baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false noop=false enable_extended_output=true artifact_detection_mode=false tumor_sample_name=1002_tumor_52 bam_tumor_sample_name=null normal_sample_name=1002_Normal_53 force_output=false force_alleles=false only_passing_calls=false initial_tumor_lod=4.0 tumor_lod=6.3 fraction_contamination=0.02 minimum_mutation_cell_fraction=0.0 normal_lod=2.2 normal_artifact_lod=1.0 strand_artifact_lod=2.0 strand_artifact_power_threshold=0.9 dbsnp_normal_lod=5.5 somatic_classification_normal_power_threshold=0.95 minimum_normal_allele_fraction=0.0 tumor_f_pretest=0.0050 min_qscore=5 gap_events_threshold=3 heavily_clipped_read_fraction=0.3 clipping_bias_pvalue_threshold=0.05 fraction_mapq0_threshold=0.5 pir_median_threshold=10.0 pir_mad_threshold=3.0 required_maximum_alt_allele_mapping_quality_score=20 max_alt_alleles_in_normal_count=2 max_alt_alleles_in_normal_qscore_sum=20 max_alt_allele_in_normal_fraction=0.03 power_constant_qscore=30 absolute_copy_number_data=null power_constant_af=0.30000001192092896 vcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub
          ...
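Reading the thresholds out of that long ##MuTect line by eye is tedious, so here is a minimal Python sketch that turns it into a dictionary. It assumes the header line has the form ##MuTect="key=value key=value ..." as in the clip above, with no spaces inside values; the function name is hypothetical.

```python
from typing import Dict, Iterable


def mutect_header_params(vcf_lines: Iterable[str]) -> Dict[str, str]:
    """Return the key=value settings recorded in the ##MuTect VCF header line."""
    prefix = '##MuTect="'
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            break  # header is over; no ##MuTect line was found
        if line.startswith(prefix):
            body = line[len(prefix):].rstrip().rstrip('"')
            params = {}
            for token in body.split():
                key, sep, value = token.partition("=")
                if sep:  # skip any token that is not key=value
                    params[key] = value
            return params
    return {}


# Shortened header using values from the (tweaked) example above.
example = [
    "##fileformat=VCFv4.1\n",
    '##MuTect="analysis_type=MuTect tumor_lod=6.3 normal_lod=2.2 max_alt_alleles_in_normal_count=2"\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(mutect_header_params(example)["tumor_lod"])  # 6.3
```

An open VCF file handle can be passed directly in place of the example list, since the function only iterates over lines.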

          Figuring out why a single mutation failed MuTect's filters is a complicated process. Here's how I do it:
          1) Find your mutation in the MuTect callstats.txt file and check the "failure reason" column. It should give you a reason, e.g. normal_lod, f_star_tumor_lod, alt_allele_in_normal, etc.
          2) Go to this website: http://gatkforums.broadinstitute.org...date-mutations
          This site connects the failure reason to the column name in the callstats.txt output that it's associated with.
          For example, if the failure reason is "alt_allele_in_normal" then go to the "n_alt_count" or "normal_f" column in your extended output callstats.txt file to find the value.
          3) Look at the vcf header to find the default threshold:
          (from above)
          max_alt_alleles_in_normal_count=2
          max_alt_allele_in_normal_fraction=0.03
          4) Rerun MuTect with these thresholds relaxed so that your mutation is included, e.g. by adding parameters like:
          --max_alt_alleles_in_normal_count=5 \
          --max_alt_allele_in_normal_fraction=0.1
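Steps 1 to 3 can be sketched in Python for the alt_allele_in_normal case. The post names the n_alt_count and normal_f columns; the contig, position, judgement and failure_reasons column names, and the strict ">" comparison against the thresholds, are assumptions about the call-stats layout and about how MuTect applies the limits. The table is a made-up toy.

```python
import csv
from typing import Dict, Iterable, List, Optional

# Assumed defaults, taken from the example header above; read your own
# values from the ##MuTect line of your VCF.
MAX_ALT_COUNT = 2        # max_alt_alleles_in_normal_count
MAX_ALT_FRACTION = 0.03  # max_alt_allele_in_normal_fraction


def read_call_stats(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated call-stats table, skipping '##' comment lines."""
    body = (line for line in lines if not line.startswith("##"))
    return list(csv.DictReader(body, delimiter="\t"))


def explain_call(rows: List[Dict[str, str]], contig: str,
                 position: str) -> Optional[Dict[str, object]]:
    """Report judgement, failure reasons and normal alt-allele evidence for
    one site, plus whether those values exceed the assumed thresholds."""
    for row in rows:
        if row["contig"] == contig and row["position"] == position:
            n_alt = int(row["n_alt_count"])
            normal_f = float(row["normal_f"])
            return {
                "judgement": row["judgement"],
                "failure_reasons": row["failure_reasons"],
                "n_alt_count": n_alt,
                "normal_f": normal_f,
                "count_over_limit": n_alt > MAX_ALT_COUNT,
                "fraction_over_limit": normal_f > MAX_ALT_FRACTION,
            }
    return None  # site not present in the table


# Toy table (made-up values) with only the columns used here.
toy = [
    "## comment line\n",
    "contig\tposition\tjudgement\tfailure_reasons\tn_alt_count\tnormal_f\n",
    "chr1\t1000\tREJECT\talt_allele_in_normal\t4\t0.05\n",
]
print(explain_call(read_call_stats(toy), "chr1", "1000"))
```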

          You could also filter callstats.txt manually if you don't need a corrected VCF and the other output files.
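A manual re-filter of callstats.txt might look like the sketch below, using the relaxed limits from step 4 (5 alt reads, fraction 0.1). It keeps calls MuTect already kept, plus calls whose only failure reason is alt_allele_in_normal and that fall within the relaxed limits; calls that also failed other filters stay rejected. Column names and the comma separator in failure_reasons are assumptions, and the rows are made up.

```python
from typing import Dict, Iterable, List


def relaxed_keep(rows: Iterable[Dict[str, str]], max_count: int = 5,
                 max_fraction: float = 0.1) -> List[Dict[str, str]]:
    """Keep KEEP calls, plus calls rejected *only* for alt_allele_in_normal
    whose normal alt-allele evidence is within the relaxed limits."""
    kept = []
    for row in rows:
        reasons = {r for r in row["failure_reasons"].split(",") if r}
        if row["judgement"] == "KEEP":
            kept.append(row)
        elif (reasons == {"alt_allele_in_normal"}
              and int(row["n_alt_count"]) <= max_count
              and float(row["normal_f"]) <= max_fraction):
            kept.append(row)
    return kept


# Made-up rows with only the columns this filter reads.
rows = [
    {"judgement": "KEEP", "failure_reasons": "", "n_alt_count": "0", "normal_f": "0.0"},
    {"judgement": "REJECT", "failure_reasons": "alt_allele_in_normal", "n_alt_count": "4", "normal_f": "0.05"},
    {"judgement": "REJECT", "failure_reasons": "alt_allele_in_normal,normal_lod", "n_alt_count": "3", "normal_f": "0.04"},
    {"judgement": "REJECT", "failure_reasons": "alt_allele_in_normal", "n_alt_count": "9", "normal_f": "0.2"},
]
print(len(relaxed_keep(rows)))  # 2
```

Requiring alt_allele_in_normal to be the sole reason is a deliberate choice: it avoids resurrecting calls that MuTect rejected for unrelated problems such as normal_lod.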

          It's a wonder that MuTect finds anything interesting at all given the stringency of its filters; it really depends on the purity of your paired normal. 90% of the time, MuTect's defaults filter out my primary somatic mutation in most of the cancer types I study. This is most likely because of the heterogeneity of the tumor and the often tumor-contaminated nature of the paired normal. However, even with the alt allele fraction threshold lowered to 20% and the count to about 6, it will still sometimes miss mutations that are near indels. There is a gap_events_threshold parameter in MuTect, but if you've followed the GATK best practices protocol and realigned around indels, gap events occur fairly frequently in those regions precisely because the reads were realigned around them, and I've noticed a marked decrease in sensitivity for mutation detection there. Loosening the gap_events_threshold default, however, will explode the number of false positives you get, so be careful which parameters you tweak and make sure you have a reason to tweak them.

          Broad has yet to fix its deprecated indel detector, but I predict that future best-practices pipelines will run an indel caller first, whose output will then be used by the SNP detector (MuTect) to discover SNPs. Knowing the locations of the indels might fix this decreased sensitivity around indels.

          Good luck!
          Last edited by patterja; 06-07-2015, 09:27 PM. Reason: More detailed description


          • #6
            Just noticed your comment about problems with indels when using MuTect. You might want to try the RTG somatic caller (part of RTG Core), which uses the same haplotype calling engine as the regular RTG variant callers with the addition of Bayesian somatic mutation modelling, and so automatically handles SNPs, indels and other complex calls. You do need matched tumor/normal samples, however.

            The somatic caller in the current release of RTG Core (3.4.5) also outputs variants in putative LOH regions and gain-of-reference calls, so you may want to filter these out for standard somatic small-variant detection. Both AVR and the somatic score field are good VCF attributes with which to tailor your precision/recall tradeoff. (We are currently doing additional work on our somatic caller, so expect further improvements in the next releases.)

            Cheers,
            Len.
            Len Trigg, Ph.D.
            Real Time Genomics
            www.realtimegenomics.com
