
  • UnifiedGenotyper - Actual calls made 0

    Hi all,

    I am using the GATK UnifiedGenotyper to call variants from my exome dataset. Before calling variants I realigned and recalibrated the dataset as suggested by the GATK pipeline. Surprisingly, for one of the samples UnifiedGenotyper runs for the expected time and ends without any error, BUT the VCF file it generates contains no called variants. The output contains only the VCF header lines and nothing else.
    The log file shows the following:

    INFO 04:59:00,484 UnifiedGenotyper - Visited bases 3137891381
    INFO 04:59:00,484 UnifiedGenotyper - Callable bases 2860850500
    INFO 04:59:00,485 UnifiedGenotyper - Confidently called bases 2674460611
    INFO 04:59:00,485 UnifiedGenotyper - % callable bases of all loci 91.171
    INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of all loci 85.231
    INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of callable loci 93.485
    INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0
    INFO 04:59:00,486 TraversalEngine - Total runtime 13136.39 secs, 218.94 min, 3.65 hours
    INFO 04:59:00,486 TraversalEngine - 160540 reads were filtered out during traversal out of 34176517 total (0.47%)
    INFO 04:59:00,486 TraversalEngine - -> 71636 reads (0.21% of total) failing BadMateFilter
    INFO 04:59:00,486 TraversalEngine - -> 88904 reads (0.26% of total) failing UnmappedReadFilter
    INFO 04:59:19,240 GATKRunReport - Uploaded run statistics report to AWS S3

    Upon inspection, I found that a complete VCF file is generated when I use the BAM from before recalibration, but the VCF from the recalibrated BAM is malformed!

    I need help with this!

    Thanks in advance.
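For anyone checking many samples for the same symptom, here is a minimal Python sketch that pulls the UnifiedGenotyper summary counters out of a log like the one above and flags runs that finished with zero calls. The function names (`parse_ug_summary`, `has_zero_calls`) are hypothetical helpers, not part of GATK, and the regular expression assumes the log lines look exactly like those quoted in this post.

```python
import re

# Matches summary lines such as:
#   INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0
# Assumption: the label is followed by a single numeric value at end of line.
SUMMARY_RE = re.compile(r"UnifiedGenotyper - (?P<label>.+?)\s+(?P<value>[\d.]+)\s*$")


def parse_ug_summary(log_lines):
    """Return {label: number} for the UnifiedGenotyper summary lines in a log."""
    summary = {}
    for line in log_lines:
        match = SUMMARY_RE.search(line)
        if match:
            value = match.group("value")
            # Percentages contain a decimal point; the counters are integers.
            summary[match.group("label")] = float(value) if "." in value else int(value)
    return summary


def has_zero_calls(summary):
    """True when the walker reported a summary but made no variant calls."""
    return summary.get("Actual calls made") == 0


# Lines copied from the log in this post.
example_log = [
    "INFO 04:59:00,484 UnifiedGenotyper - Visited bases 3137891381",
    "INFO 04:59:00,484 UnifiedGenotyper - Callable bases 2860850500",
    "INFO 04:59:00,485 UnifiedGenotyper - % callable bases of all loci 91.171",
    "INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0",
    "INFO 04:59:00,486 TraversalEngine - Total runtime 13136.39 secs, 218.94 min, 3.65 hours",
]

if __name__ == "__main__":
    summary = parse_ug_summary(example_log)
    print(summary)
    print("zero calls:", has_zero_calls(summary))
```

This only detects the symptom; it says nothing about why no calls were made.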

  • #2
    UnifiedGenotyper doesn't directly do recalibration of variants. What's the exact command you're running it with?



    • #3
      Below is the GATK command I used for recalibration (after indel realignment).

      nohup java -Xmx4g -jar /data1/GenomeAnalysisTK-1.5-0-g04cafff/GenomeAnalysisTK.jar -R /data1/ref_genome/gatk_ref/hg19_kayotypically.fasta -I ns002_merged_realigned.bam -T TableRecalibration -recalFile ns002_merged_countCovariates_before_reclbrtn.recal_data.csv -o ns002_merged_realigned_recalibrated.bam &

      When I use the recalibrated .bam generated this way to call variants with UnifiedGenotyper, I get a malformed .vcf file.
      But if I run UnifiedGenotyper on the realigned-only .bam file (i.e. before recalibration), I get a proper .vcf file.

      Is this a problem with recalibration or something else?



      • #4
        That certainly sounds like a problem in recalibration, but the command you used above looks fine. What command are you using to generate the recal_data file?



        • #5
          The following is the command I used to generate the recal file (CountCovariates):

          nohup java -Xmx4g -jar /data1/GenomeAnalysisTK-1.5-0-g04cafff/GenomeAnalysisTK.jar -R /data1/ref_genome/gatk_ref/hg19_kayotypically.fasta -knownSites /data1/ref_genome/gatk_ref/gatk_vqsr_recalibration_vcffiles/dbsnp_135.b37_FINAL.vcf -I nb005_merged_realigned_recalibrated.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov DinucCovariate -cov CycleCovariate -recalFile nb005_countCovariates_AFTER_reclbrtn.recal_data.csv &

          Everything is fine for all the other samples!
          Please let me know how I should solve this.

          Thanks



          • #6
            That's the command line for counting covariates after recalibration (for verification purposes only). I'm assuming you used a similar command for the first CountCovariates step.

            The main problem I can see here is that you're using dbSNP for b37, but the genome you're aligning against is hg19. I know b37 and hg19 are quite similar, but I'm not sure of the exact differences between them, so it's possible that dbSNP and your genome are out of step with each other, which would cause recalibration to seriously reduce your quality scores. You could try getting dbSNP for hg19 from the Broad Institute's FTP server and rerunning CountCovariates and TableRecalibration with that.

            That's the only thing I can think of, unfortunately. Let me know if you still have problems!
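One way to test the b37-versus-hg19 idea before rerunning anything is to compare contig names. UCSC hg19 names its chromosomes `chr1`, `chr2`, ..., `chrM`, while b37 uses `1`, `2`, ..., `MT`. The Python sketch below is a minimal check under stated assumptions: it reads the first column of a samtools-style `.fai` index and the CHROM column of a VCF, both passed in as lists of text lines. All function names are hypothetical, and the sample lines are made-up toy data, not taken from the poster's files.

```python
def fasta_index_contigs(fai_lines):
    """Contig names from a .fai index (first tab-separated column of each line)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def vcf_contigs(vcf_lines):
    """Contig names from VCF records (CHROM column), in order of first appearance."""
    ordered = []
    seen = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.add(chrom)
            ordered.append(chrom)
    return ordered


def contigs_missing_from_reference(ref_contigs, other_contigs):
    """Names used by the VCF that the reference does not contain."""
    ref = set(ref_contigs)
    return [name for name in other_contigs if name not in ref]


def looks_like_chr_prefix_mismatch(ref_contigs, other_contigs):
    """True if at least one missing name would match after adding a 'chr' prefix."""
    ref = set(ref_contigs)
    missing = contigs_missing_from_reference(ref_contigs, other_contigs)
    return any("chr" + name in ref for name in missing)


# Toy data for illustration only (lengths and positions are invented).
toy_fai = ["chr1\t1000\t6\t50\t51\n", "chr2\t800\t1032\t50\t51\n"]
toy_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10\trsX\tA\tG\t.\t.\t.\n",
    "2\t20\trsY\tC\tT\t.\t.\t.\n",
]

if __name__ == "__main__":
    ref = fasta_index_contigs(toy_fai)
    known = vcf_contigs(toy_vcf)
    print("missing from reference:", contigs_missing_from_reference(ref, known))
    print("chr-prefix mismatch:", looks_like_chr_prefix_mismatch(ref, known))
```

Matching names do not prove the coordinates agree, so this can only rule the naming problem in, not rule a build mismatch out.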



            • #7
              OK, I'll try that.
              But why is it happening with only this one sample? Everything is fine for all the other samples!
              Also, the log file shows the number of callable bases, the number of confident calls, etc. If recalibration were downgrading the quality scores that much, it shouldn't even have shown these statistics! It's just NOT WRITING the calls to the file!


              INFO 04:59:00,484 UnifiedGenotyper - Visited bases 3137891381
              INFO 04:59:00,484 UnifiedGenotyper - Callable bases 2860850500
              INFO 04:59:00,485 UnifiedGenotyper - Confidently called bases 2674460611
              INFO 04:59:00,485 UnifiedGenotyper - % callable bases of all loci 91.171
              INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of all loci 85.231
              INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of callable loci 93.485
              INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0

              Could that be a bug in the program?
              I'll try doing as you said.

              Thanks



              • #8
                Hi, aan

                Have you found the problem? I came across the same situation: everything looks fine, except for the "Actual calls made 0".

                "
                INFO 14:09:58,839 UnifiedGenotyper - Visited bases 3095677412
                INFO 14:09:58,846 UnifiedGenotyper - Callable bases 2861327131
                INFO 14:09:58,847 UnifiedGenotyper - Confidently called bases 2861327131
                INFO 14:09:58,847 UnifiedGenotyper - % callable bases of all loci 92.430
                INFO 14:09:58,847 UnifiedGenotyper - % confidently called bases of all loci 92.430
                INFO 14:09:58,848 UnifiedGenotyper - % confidently called bases of callable loci 100.000
                INFO 14:09:58,848 UnifiedGenotyper - Actual calls made 0
                INFO 14:09:58,848 TraversalEngine - Total runtime 8330.91 secs, 138.85 min, 2.31 hours
                "



                • #9
                  Hi,

                  Yes, I was facing the same problem. After a lot of troubleshooting I came to the conclusion that UnifiedGenotyper is running fine, but the data is so bad that no variant passes the given quality thresholds. That is why the number of calls made is 0: no call actually passed the quality check.

                  To set it right, I reanalysed this sample from the alignment step onward. After aligning again, the error did not show up, and variants were called when I ran UG (although I am now facing another problem with the same sample).

                  Let me know if anything is not clear.
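If the suspicion is that base qualities collapsed somewhere in the pipeline, a rough comparison of mean base quality before and after recalibration can help. The Python sketch below assumes SAM-format text lines as input (for example, from `samtools view` on each BAM) and assumes Phred+33 quality encoding. `mean_base_quality` is a hypothetical helper and the reads are toy data.

```python
def mean_base_quality(sam_lines, offset=33):
    """Mean Phred base quality over all reads, or None if no qualities are present.

    Assumes SAM text: QUAL is the 11th tab-separated field, encoded as Phred+offset.
    """
    total = 0
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[10] == "*":
            continue  # malformed record or qualities not stored
        for ch in fields[10]:
            total += ord(ch) - offset
            count += 1
    return total / count if count else None


# Toy reads for illustration only: 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
toy_before = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
toy_after = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t5555\n",
]

if __name__ == "__main__":
    print("before:", mean_base_quality(toy_before))
    print("after: ", mean_base_quality(toy_after))
```

A large drop between the two numbers for one sample, and not for the others, would point at the recalibration step rather than at UnifiedGenotyper itself.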



                  • #10
                    Hi, aan

                    Thanks for the information! I also checked the quality, but found hardly any abnormal signs. Now I am trying a newer version of GATK to work around this problem.
                    BTW, what version are you using?
