  • GATK: -glm DINDEL question

    I'm following this note in the GATK documentation: http://www.broadinstitute.org/gsa/wi...fied_Genotyper

    However, I get no output when I run the UnifiedGenotyper tool with the -glm DINDEL option. If I run it without the -glm DINDEL option, it works fine. There is no error message; the program just runs to completion and doesn't report any calls.

    My command looks like this:

    Code:
    java -jar GenomeAnalysisTK.jar \
    -l INFO \
    -R human_g1k_v37.fasta \
    -D dbsnp_130_b37.rod \
    -T UnifiedGenotyper \
    -baq CALCULATE_AS_NECESSARY \
    -I recal.bam \
    -o indels.raw.vcf \
    -stand_call_conf 50.0 \
    -stand_emit_conf 10.0 \
    -A AlleleBalance \
    -A DepthOfCoverage \
    -A HaplotypeScore \
    -glm DINDEL \
    -nt 8 \
    -L intervals.interval_list
    Has anyone used this successfully? If so, any tips? I know it's still at an early stage of development, but I'd like to use it if possible.
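For anyone trying to reproduce this, a quick way to confirm that a run really produced no calls (rather than a header-only file that looks empty at a glance) is to count the non-header lines in the VCF. This is only a minimal sketch: the function name count_vcf_records is made up for illustration, and it assumes the output is a plain-text, uncompressed VCF.

```shell
# Count the variant records (non-header, non-blank lines) in a VCF file.
# count_vcf_records is a hypothetical helper name, not part of GATK.
count_vcf_records() {
    # Header lines start with '#'. "n + 0" makes awk print 0 for a
    # header-only or empty file instead of an empty string.
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Example call (file name taken from the command above):
#   count_vcf_records indels.raw.vcf
```

A result of 0 would confirm that the header was written but no records were emitted.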
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]

  • #2
    I ran across the same thing and contacted GSA. They said it is a bug and it is still in development, that's why there isn't any documentation yet. I guess we just have to be patient, but I'm keeping an eye on it.



    • #3
      whoa... indel calling in the unified genotyper... looking forward to this!

      I'm currently using indel Genotyper V2. Has anyone come up with some thresholds using some of these interesting attributes described in their VCF output?


      ##INFO=<ID=AC,Number=2,Type=Integer,Description="# of reads supporting consensus indel/any indel at the site">
      ##INFO=<ID=DP,Number=1,Type=Integer,Description="total coverage at the site">
      ##INFO=<ID=MM,Number=2,Type=Float,Description="average # of mismatches per consensus indel-supporting read/per reference-supporting read">
      ##INFO=<ID=MQ,Number=2,Type=Float,Description="average mapping quality of consensus indel-supporting reads/reference-supporting reads">
      ##INFO=<ID=NQSBQ,Number=2,Type=Float,Description="Within NQS window: average quality of bases from consensus indel-supporting reads/from reference-supporting reads">
      ##INFO=<ID=NQSMM,Number=2,Type=Float,Description="Within NQS window: fraction of mismatching bases in consensus indel-supporting reads/in reference-supporting reads">
      ##INFO=<ID=SC,Number=4,Type=Integer,Description="strandness: counts of forward-/reverse-aligned indel-supporting reads / forward-/reverse-aligned reference supporting reads">


      I know from their regular SNP calling with the Unified Genotyper that one can make use of Strand Bias, Quality by Depth, and other calculated variant characteristics beyond the usual "read depth" threshold most people use and publish with. In fact, I can clean up a lot of false positives with hard filtering. Their more sophisticated clustering approach doesn't seem to work with my data, though...

      I would like to clean up indels similarly, but there are no guidelines.
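To make the idea of hard-filtering on these attributes concrete, here is a rough sketch that uses two of the INFO fields listed above: DP (total coverage) and SC (the first two values being forward-/reverse-aligned indel-supporting reads). The function name filter_indels and both thresholds are placeholders I made up for illustration. They are not recommended values, and this is not an official GATK filter.

```shell
# Keep header lines, plus records whose INFO column (column 8) passes two
# illustrative checks. Records lacking DP or SC are dropped.
#   $1 = VCF file
#   $2 = minimum DP (placeholder threshold)
#   $3 = minimum indel-supporting reads required on EACH strand (placeholder)
filter_indels() {
    awk -F'\t' -v min_dp="$2" -v min_strand="$3" '
        /^#/ { print; next }
        {
            dp = -1; fwd = -1; rev = -1
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4) + 0
                if (kv[i] ~ /^SC=/) {
                    # SC=fwd_indel,rev_indel,fwd_ref,rev_ref
                    split(substr(kv[i], 4), sc, ",")
                    fwd = sc[1] + 0; rev = sc[2] + 0
                }
            }
            if (dp >= min_dp && fwd >= min_strand && rev >= min_strand) print
        }
    ' "$1"
}

# Example call (file names and thresholds are arbitrary):
#   filter_indels indels.vcf 10 2 > indels.filtered.vcf
```

Requiring support on both strands is just one possible use of SC; the other attributes (MM, MQ, NQSBQ, NQSMM) could be checked in the same loop.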



      • #4
        Originally posted by RockChalkJayhawk
        I ran across the same thing and contacted GSA. They said it is a bug and it is still in development, that's why there isn't any documentation yet. I guess we just have to be patient, but I'm keeping an eye on it.
        Alright, thanks for your input. That is unfortunate, but I'll just keep using the regular Dindel program instead.

        As for filters people use, not sure on my end. I'm just dipping my toes into the new ways of analyzing indels. Right now I'm taking a look at the default filters in Dindel.


        • #5
          The indel calling capabilities of the Unified Genotyper are now working as expected, and we are getting good results on whole genomes and exomes. The exact approach to filtering indels isn't yet clear, but all of the machinery is working well. Please have another go at it if you are still looking for indel calling with the GATK.

          Best,

          Mark DePristo



          • #6
            How does this implementation compare to the original Dindel program?



            • #7
              Thanks Mark, for both your response and your set of tools. You guys are doing some great stuff over there.

              BTW, do you have any figures on the empirical sensitivity/specificity/FDR of the Unified Genotyper?



              • #8
                Thanks Mark. I just updated through SVN and I'll give it a shot.

                Question: Any idea how the results compare to the Dindel program from Kees Albers? Wondering if it'll be safe to cross-compare between the two because I have some data that's already been processed by that program that likely isn't going to be rerun through the UnifiedGenotyper DINDEL program.

                (Also, thanks for pushing a release that appears to make the whole thing compatible with hard clipping from Novoalign.)
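As a first rough look at how comparable two call sets are, one could count the sites (CHROM and POS) that appear in both VCF files. This is only a sketch under assumptions: shared_site_count is a made-up name, both inputs are assumed to be uncompressed VCFs on the same reference, and the same indel can be reported at different positions by different callers, so the count is better read as a lower bound on agreement.

```shell
# Print the number of distinct CHROM:POS sites present in both VCF files.
# shared_site_count is a hypothetical helper name.
#   $1 = first VCF, $2 = second VCF (must be different paths)
shared_site_count() {
    awk -F'\t' '
        /^#/ || NF < 2 { next }                      # skip headers and blank lines
        { key = $1 ":" $2 }
        FILENAME == ARGV[1] { seen[key] = 1; next }  # remember sites from file 1
        key in seen { shared[key] = 1 }              # site from file 2 also in file 1
        END { n = 0; for (k in shared) n++; print n }
    ' "$1" "$2"
}

# Example call (file names are arbitrary):
#   shared_site_count dindel_calls.vcf ug_dindel_calls.vcf
```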


                • #9
                  Getting the Indel option to work

                  Hi there,
                  I'm new to working with GATK, just downloaded it today.
                  I am able to get the SNP genotyping to work correctly but when I add the -glm DINDEL option and try to run the UnifiedGenotyper, I get the following message:
                  __________________________________________________________

                  org.broadinstitute.sting.utils.cmdLine.InvalidArgumentException:
                  Argument with name 'glm' isn't defined.
                  at org.broadinstitute.sting.utils.cmdLine.ParsingEngine.validate(ParsingEngine.java:185)
                  at org.broadinstitute.sting.utils.cmdLine.ParsingEngine.validate(ParsingEngine.java:158)
                  at org.broadinstitute.sting.utils.cmdLine.CommandLineProgram.start(CommandLineProgram.java:175)
                  at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:89)
                  ------------------------------------------------------------------------------------------
                  The following error has occurred:


                  Argument with name 'glm' isn't defined.:

                  __________________________________________________________


                  I am entering the following at the command line:
                  java -jar GenomeAnalysisTK.jar \
                  -R human_g1k_v37.fasta \
                  -T UnifiedGenotyper \
                  -I NA12891.chr21.GATK.Reg.sorted.bam \
                  -o NA12891-Indels.vcf \
                  -glm DINDEL

                  Am I missing an argument?

                  Thanks



                  • #10
                    Hmm, perhaps it is only available in the SVN checkout version of GATK?

                    What version number do you have?



                    • #11
                      Ah ok, that may well be it, not very up to speed with where to find everything yet.

                      my version is..
                      The Genome Analysis Toolkit (GATK) v1.0.2695, Compiled 2010/01/26 17:53:46
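If you want to compare builds in a script, the third field of the version string in a banner like the one above can be pulled out with sed. This is just a sketch: gatk_build_number is a made-up name, and it assumes the banner keeps exactly the "(GATK) vX.Y.Z, Compiled ..." layout shown here.

```shell
# Read a GATK banner line on standard input and print the third dotted
# field of the version (e.g. 2695 for v1.0.2695).
# gatk_build_number is a hypothetical helper name.
gatk_build_number() {
    sed -n 's/.*(GATK) v[0-9]*\.[0-9]*\.\([0-9]*\),.*/\1/p' | head -n 1
}

# Example call:
#   echo 'The Genome Analysis Toolkit (GATK) v1.0.2695, Compiled 2010/01/26 17:53:46' | gatk_build_number
```

Nothing is printed if no line matches the assumed layout.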



                      • #12
                        Ignore previous comment - new version works fine now, thanks!



                        • #13
                          Cool. Let us know how you like it vis-a-vis the old indel genotyper V2. I have my pipeline set to use the indel genotyper V2, so I'm waiting to see some filtering methods mature before I make the switch, unless anyone thinks the overall quality of the calls is better?



                          • #14
                            Thanks I will. Trying both out at the moment.



                            • #15
                              Originally posted by NGSfan
                              Cool. Let us know how you like it vis-a-vis the old indel genotyper V2. I have my pipeline set to use the indel genotyper V2, so I'm waiting to see some filtering methods mature before I make the switch, unless anyone thinks the overall quality of the calls is better?
                              I think a lot of us were using Dindel over the old Indel Genotyper V2 because even the GATK group were adamant about it being superior to their own tool (thus them adding it to the UnifiedGenotyper). I personally wouldn't replace part of your pipeline yet, but it might be worth making a module to do either Dindel or the UnifiedGenotyper with DINDEL on to try it out.

                              I'll try this out later today probably and let people know how it looks.

                              Also yeah, GATK is updated constantly so it's really worth using SVN to keep it updated rather than going by the releases.
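The "module that does either" idea could look something like the sketch below: one function that prints the indel-calling command for whichever caller is selected, so the rest of the pipeline stays unchanged. The name build_indel_cmd and the run_dindel.sh wrapper are hypothetical. The UnifiedGenotyper flags are the ones already used earlier in this thread, and the function only prints the command rather than running it.

```shell
# Print (do not run) the indel-calling command for the chosen caller.
#   $1 = caller ("ug" or "dindel")
#   $2 = BAM file, $3 = reference FASTA, $4 = output file
build_indel_cmd() {
    case "$1" in
        ug)
            # UnifiedGenotyper with the DINDEL model, flags as used in this thread
            printf 'java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -glm DINDEL -R %s -I %s -o %s\n' "$3" "$2" "$4"
            ;;
        dindel)
            # run_dindel.sh is a hypothetical wrapper around your existing Dindel steps
            printf './run_dindel.sh %s %s %s\n' "$2" "$3" "$4"
            ;;
        *)
            echo "unknown caller: $1" >&2
            return 1
            ;;
    esac
}

# Example call:
#   build_indel_cmd ug recal.bam human_g1k_v37.fasta indels.raw.vcf
```

Printing the command first makes it easy to inspect or log before handing it to the shell or a job scheduler.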