  • GATK UnifiedGenotyper not calling any variants

    I'm trying to align RNA-Seq data to hg19 and call variants. Qualities look OK, the TopHat alignment has been sorted (with Picard) and tagged with read groups as per the GATK discussions, and it all looks fine in IGV.
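Since GATK expects every read to carry a read group, it can be worth confirming the tagging step programmatically. The sketch below assumes plain SAM text (for example the output of `samtools view -h`); the function name `check_read_groups` is hypothetical. It collects the `ID:` values of `@RG` header lines and counts records whose `RG:Z:` tag is missing or refers to an undeclared group.

```python
def check_read_groups(sam_lines):
    """Return (declared_ids, missing_count, undeclared_count) for SAM text lines."""
    declared = set()
    missing, undeclared = 0, 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: remember the ID of each @RG entry.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Optional tags start at the 12th column (index 11).
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        if rg is None:
            missing += 1
        elif rg not in declared:
            undeclared += 1
    return declared, missing, undeclared
```

Both counts should be zero for a BAM that GATK will accept without read-group complaints.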

    However not a single variant is being called. Here's the message that's most indicative of the problem (confidently called bases = 0):

    Code:
    INFO  23:00:50,322 UnifiedGenotyper - Visited bases                                3101976562
    INFO  23:00:50,323 UnifiedGenotyper - Callable bases                               2864957043
    INFO  23:00:50,323 UnifiedGenotyper - Confidently called bases                     0
    INFO  23:00:50,323 UnifiedGenotyper - % callable bases of all loci                 92.359
    INFO  23:00:50,323 UnifiedGenotyper - % confidently called bases of all loci       0.000
    INFO  23:00:50,324 UnifiedGenotyper - % confidently called bases of callable loci  0.000
    INFO  23:00:50,324 UnifiedGenotyper - Actual calls made                            0
    INFO  23:00:50,325 TraversalEngine - Total runtime 14151.69 secs, 235.86 min, 3.93 hours
    INFO  23:00:50,325 TraversalEngine - 66936259 reads were filtered out during traversal out of 77587056 total (86.27%)
    INFO  23:00:50,326 TraversalEngine -   -> 66936259 reads (86.27% of total) failing MappingQualityUnavailableReadFilter
    Would this kind of message seem to indicate a potential problem with the hg19 reference that I'm using?
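The last two log lines carry the key clue: which read filter removed the reads. A minimal sketch that extracts that summary from a GATK log, assuming the TraversalEngine line format shown above (the regular expressions are written against those two lines only and may not match other GATK versions):

```python
import re

# Patterns written against the TraversalEngine lines shown in the log above.
TOTAL_RE = re.compile(r"(\d+) reads were filtered out during traversal out of (\d+) total")
FILTER_RE = re.compile(r"->\s+(\d+) reads \([\d.]+% of total\) failing (\w+)")


def filter_summary(log_text):
    """Return (filtered, total, {filter_name: count}) parsed from a GATK log."""
    match = TOTAL_RE.search(log_text)
    if match is None:
        raise ValueError("no TraversalEngine filter summary found")
    filtered, total = int(match.group(1)), int(match.group(2))
    per_filter = {name: int(count) for count, name in FILTER_RE.findall(log_text)}
    return filtered, total, per_filter
```

Applied to the log above, it shows that every filtered read (86.27% of the total) failed MappingQualityUnavailableReadFilter, which points at the MAPQ values in the BAM rather than at the reference, as later replies in this thread suggest.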

    I'm using:
    Code:
    ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta
    I call GATK as follows:
    Code:
    java -jar GenomeAnalysisTK.jar -I karyotypicRG.bam \
        -R Homo_sapiens_assembly19.fasta -T UnifiedGenotyper \
        -o snpCalls.vcf \
        -stand_call_conf 50.0 \
        -stand_emit_conf 1 \
        -dcov 5000 >& gatkGenotyper.out
    Adding a dbSNP ROD doesn't fix the problem, nor does EMIT_ALL_SITES (i.e., ignoring quality scores and calling everything).

    Thanks in advance for any thoughts.

  • #2
    Maybe there is something wrong with the quality scores in your FASTQ files. Are you using Illumina or Phred scores? You could try using BWA for the alignment.
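One quick way to answer the Illumina-vs-Phred question is to look at the lowest ASCII character in the FASTQ quality lines. This is a common heuristic, not a definitive test: the thresholds below (59 for ';' and 64 for '@') are assumptions based on the usual Phred+33, Solexa+64 and Phred+64 ranges, and the function names are hypothetical.

```python
def fastq_qualities(lines):
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_quality_offset(quality_strings):
    """Heuristically guess the quality encoding from the lowest ASCII code seen."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    if lowest < 59:
        # Characters below ';' only occur with an ASCII offset of 33.
        return "phred+33"
    if lowest >= 64:
        # Likely Illumina 1.3-1.7 phred+64; phred+33 data would need
        # every base at Q31 or above to look like this.
        return "phred+64"
    # 59-63 fits both Solexa+64 and high-quality phred+33 data.
    return "ambiguous"
```

Sampling a few thousand records is usually enough, since a single low-quality base below ';' settles the question.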



    • #3
      I'm using Illumina scores, and TopHat to find splice junctions (TopHat uses only Bowtie), but I think this is a file format issue, somehow related to the reference.



      • #4
        First of all, extract metrics from your alignment to check whether they are OK. If they are, maybe it's because you are using a very high coverage: -dcov 5000 means 5000X coverage. Are you sure you have this? Try reducing it to 300. Also, stand_emit_conf should not be 1; that is wrong. Use at least 10.



        • #5
          I solved my problem by reverting to an earlier GATK version. Here's my hypothesis for what caused the problem:

          After upgrading GATK, the UnifiedGenotyper began filtering out most of my TopHat-aligned reads. Spot-checking the alignment's quality scores (thanks for the tip, raonyguimaraes), I discovered that many of the higher-quality alignments had a mapping quality (MAPQ) of 255, despite the following two statements in the SAM spec:

          "No alignments should be assigned mapping quality 255."
          "A value 255 indicates that the mapping quality is not available."
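The spot check described above can be turned into a full tally of the MAPQ column (the 5th SAM field). A minimal sketch, assuming plain SAM text such as the output of `samtools view`; the function names are hypothetical:

```python
from collections import Counter


def mapq_counts(sam_lines):
    """Count alignments by MAPQ (5th column) in SAM text, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        counts[int(line.split("\t")[4])] += 1
    return counts


def fraction_unavailable(counts):
    """Fraction of alignments with MAPQ 255, i.e. 'mapping quality not available'."""
    total = sum(counts.values())
    return counts[255] / total if total else 0.0
```

A large fraction at 255 would match the 86% of reads that the MappingQualityUnavailableReadFilter removed in the original log.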

          My guess is that an older version of GATK agrees with tophat's interpretation of 255, but more current versions follow the spec more closely.

          I haven't fully verified this hypothesis, but I did revert to an earlier version of GATK (v1.0.5974, Compiled 2011/06/10 13:26:59), and I no longer have problems calling variants from my TopHat alignment.

          Thanks for the help!



          • #6
            The same thing happened for me with the latest GATK, so I had to use an old version to call SNPs in TopHat alignments.

            For release 1.1 of GATK, there is a note about filtering out mappings with a mapping quality of 255.



            Chris



            • #7
              Any change to GATK for this?

              I recently came across this problem with GATK 1.4-14-g2e47336. I assume this was changed a long time ago. TopHat still outputs 255 for mapping quality. Does anyone know whether TopHat or GATK has changed to accommodate this?
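One possible workaround that avoids downgrading GATK is to rewrite MAPQ 255 to a fixed value before calling variants. This is a sketch under stated assumptions: the replacement value 60 is chosen for illustration only and is not prescribed by TopHat or GATK, and relabeling every 255 as confident only makes sense if, as reported earlier in the thread, 255 marks the higher-quality alignments. The function name `reassign_mapq` is hypothetical.

```python
def reassign_mapq(sam_line, old=255, new=60):
    """Rewrite the MAPQ column of one SAM record from `old` to `new`.

    Header lines and blank lines are returned unchanged. The default
    new=60 is an illustrative assumption, not a TopHat or GATK value.
    """
    body = sam_line.rstrip("\n")
    newline = sam_line[len(body):]  # preserve the original line ending
    if not body or body.startswith("@"):
        return sam_line
    fields = body.split("\t")
    if fields[4] == str(old):
        fields[4] = str(new)
    return "\t".join(fields) + newline
```

Streaming `samtools view -h` output through this function line by line, then converting the result back to BAM, would apply it to a whole alignment.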



              • #8
                Have you found the problem? I came across the same situation: everything looks fine except for "Actual calls made 0".
