  • GATK UnifiedGenotyper not calling any variants

    I'm trying to align RNA-Seq data to hg19 and call variants. The base qualities look OK, the tophat alignment has been sorted (picard) and tagged with read groups as per GATK discussions, and it all looks fine in IGV.

    However not a single variant is being called. Here's the message that's most indicative of the problem (confidently called bases = 0):

    Code:
    INFO  23:00:50,322 UnifiedGenotyper - Visited bases                                3101976562
    INFO  23:00:50,323 UnifiedGenotyper - Callable bases                               2864957043
    INFO  23:00:50,323 UnifiedGenotyper - Confidently called bases                     0
    INFO  23:00:50,323 UnifiedGenotyper - % callable bases of all loci                 92.359
    INFO  23:00:50,323 UnifiedGenotyper - % confidently called bases of all loci       0.000
    INFO  23:00:50,324 UnifiedGenotyper - % confidently called bases of callable loci  0.000
    INFO  23:00:50,324 UnifiedGenotyper - Actual calls made                            0
    INFO  23:00:50,325 TraversalEngine - Total runtime 14151.69 secs, 235.86 min, 3.93 hours
    INFO  23:00:50,325 TraversalEngine - 66936259 reads were filtered out during traversal out of 77587056 total (86.27%)
    INFO  23:00:50,326 TraversalEngine -   -> 66936259 reads (86.27% of total) failing MappingQualityUnavailableReadFilter
    Would this kind of message seem to indicate a potential problem with the hg19 reference that I'm using?

    I'm using:
    Code:
    ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta
    I call GATK as follows:
    Code:
    java -jar GenomeAnalysisTK.jar -I karyotypicRG.bam \
        -R Homo_sapiens_assembly19.fasta -T UnifiedGenotyper \
        -o snpCalls.vcf \
        -stand_call_conf 50.0 \
        -stand_emit_conf 1 \
        -dcov 5000 >& gatkGenotyper.out
    Adding a dbSNP ROD doesn't fix the problem, nor does EMIT_ALL_SITES (i.e., ignore quality scores and call everything).

    Thanks in advance for any thoughts.

  • #2
    Maybe there is something wrong with the quality scores in your FASTQ files. Are you using Illumina or Phred scores? You could try using BWA for the alignment.
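
A rough way to check the encoding question is to look at the ASCII range of the quality characters. This is only a sketch: `fastq_qual_range` is a made-up helper name, `reads.fastq` is a placeholder, and it assumes plain four-line FASTQ records (no wrapped sequence lines). Quality characters below ASCII 59 can only occur with the Phred+33 (Sanger) offset, while a minimum of 64 or above suggests the older Illumina Phred+64 offset; anything in between is left as "ambiguous".

```shell
#!/usr/bin/env bash
# Print the lowest and highest ASCII code seen in FASTQ quality lines,
# followed by a rough guess at the offset. Reads FASTQ text on stdin.
# Assumption: every record is exactly four lines, so line 4, 8, 12, ...
# is a quality string.
fastq_qual_range() {
    awk 'BEGIN {
             # Lookup table from printable character to ASCII code.
             for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
             min = 127; max = 0
         }
         NR % 4 == 0 {
             n = length($0)
             for (j = 1; j <= n; j++) {
                 ch = substr($0, j, 1)
                 if (!(ch in ord)) continue   # skip stray characters such as CR
                 c = ord[ch]
                 if (c < min) min = c
                 if (c > max) max = c
             }
         }
         END {
             if (max == 0) { print "no quality lines found"; exit }
             guess = (min < 59) ? "phred33" : ((min >= 64) ? "phred64" : "ambiguous")
             print min, max, guess
         }'
}

# Hypothetical usage:
#   fastq_qual_range < reads.fastq
```

The guess is only as good as the reads sampled: a file that happens to contain no low-quality bases can look like Phred+64 even when it is not.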



    • #3
      I'm using Illumina scores and tophat to find splice junctions, and tophat only uses bowtie, but I think this is a file format issue, somehow related to the reference.



      • #4
        First of all, extract metrics from your alignment to check that they are OK. If they are, maybe it's because you are using a very high coverage setting: -dcov 5000 means 5000X coverage. Are you sure you have that much? Try reducing it to 300. Also, stand_emit_conf should not be 1; use at least 10.
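
One quick alignment metric to tabulate is the distribution of mapping qualities (column 5 of a SAM record). The following is a minimal sketch, not a full metrics report: `mapq_histogram` is a hypothetical helper name, and the usage line assumes samtools is installed to turn the BAM into SAM text (samtools is not mentioned in the original post).

```shell
#!/usr/bin/env bash
# Count how many alignments carry each MAPQ value (SAM column 5).
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
# Output: one "MAPQ<TAB>count" line per value, sorted by MAPQ.
mapq_histogram() {
    awk -F'\t' '!/^@/ { n[$5]++ }
                END   { for (q in n) print q "\t" n[q] }' | sort -n
}

# Hypothetical usage (assumes samtools is on the PATH):
#   samtools view karyotypicRG.bam | mapq_histogram
```

A histogram dominated by a single unusual value is worth a closer look before changing any genotyper thresholds.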



        • #5
          I solved my problem by backing up to an earlier GATK version. Here's my hypothesis for what caused the problem:

          After upgrading GATK, the UnifiedGenotyper began filtering most of my tophat-aligned reads. Spot-checking the alignment for quality scores (thanks for the tip, raonyguimaraes), I discovered that many of the higher-quality alignments received a mapping quality (MAPQ) of 255, despite the following two assertions made in the SAM spec:

          "No alignments should be assigned mapping quality 255."
          "A value 255 indicates that the mapping quality is not available."

          My guess is that an older version of GATK agrees with tophat's interpretation of 255, but more current versions follow the spec more closely.

          I haven't completely chased down this hypothesis, but I did back up to an earlier version of GATK (v1.0.5974, Compiled 2011/06/10 13:26:59), and I no longer have problems with variant-calling my tophat alignment.
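
One way to test this hypothesis would be to count the alignments with MAPQ 255 and compare the fraction with the 86.27% that the log attributes to MappingQualityUnavailableReadFilter. The sketch below does only that counting; `frac_mapq255` is a hypothetical helper name, and the usage line assumes samtools is available to dump the BAM as SAM text.

```shell
#!/usr/bin/env bash
# Report how many alignments have MAPQ 255 (SAM column 5) and what
# percentage of all alignments that is. Reads SAM text on stdin and
# skips header lines (starting with "@").
frac_mapq255() {
    awk -F'\t' '!/^@/ { total++; if ($5 == 255) hit++ }
                END   { if (total) printf "%d of %d (%.2f%%)\n", hit, total, 100 * hit / total }'
}

# Hypothetical usage (assumes samtools is on the PATH):
#   samtools view karyotypicRG.bam | frac_mapq255
```

If the reported count is close to the 66936259 of 77587056 reads in the log, that would support the idea that the filter is acting on MAPQ 255 alone.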

          Thanks for the help!



          • #6
            The same thing happened for me with the latest GATK, so I had to use an old version to call SNPs in TopHat alignments.

            For release 1.1 of GATK, there is something here about filtering out mappings with a read quality of 255.



            Chris



            • #7
              Any change to GATK for this?

              I recently came across this problem with GATK 1.4-14-g2e47336. I assume this was changed a long time ago. Tophat still outputs 255 for mapping quality. Does anyone know if tophat or GATK has changed to accommodate this?
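
If downgrading GATK is not an option, one possible workaround (not something confirmed in this thread) is to rewrite MAPQ 255 to an ordinary value before variant calling. The sketch below is an assumption-laden illustration: `remap_mapq255` is a hypothetical helper, the default replacement value of 60 is arbitrary, the file names are placeholders, and the usage line assumes samtools is installed.

```shell
#!/usr/bin/env bash
# Rewrite MAPQ 255 (SAM column 5) to another value, leaving header lines
# and all other alignments untouched. Reads SAM text on stdin and writes
# SAM text to stdout.
# $1: replacement MAPQ (defaults to 60, an arbitrary choice).
remap_mapq255() {
    awk -F'\t' -v OFS='\t' -v newq="${1:-60}" '
        /^@/      { print; next }   # pass header lines through unchanged
        $5 == 255 { $5 = newq }     # replace the "unavailable" value
                  { print }'
}

# Hypothetical usage (assumes samtools is on the PATH):
#   samtools view -h in.bam | remap_mapq255 60 | samtools view -b -o out.bam -
```

Note that this changes what the MAPQ field claims about each read, so any downstream step that weights reads by mapping quality will treat the rewritten alignments as confidently placed.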



              • #8
                Have you found the problem? I came across the same situation: everything looks fine, except for "Actual calls made 0".
