  • GATK CountCovariates running very slow

    Hi,

    I tried posting this question on the GetSatisfaction GATK forum but kept getting an invalid request error in Firefox, so I thought I would give SeqAnswers a try (this is my first post here).

    I am trying to recalibrate quality scores with GATK CountCovariates and it is running extremely slowly:

    java -Xmx64000m -jar GenomeAnalysisTK.jar -R $REF_BIN/$REF \
        --DBSNP $DBSNP_BIN/$DBSNP -l INFO -T CountCovariates -I my.bam \
        --max_reads_at_locus 20000 -cov ReadGroupCovariate \
        -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate \
        -recalFile $CSV > $CSV.stdout 2> $NODE_DIR/$OUTPUT.stderr

    Initially, GATK throws an EOFException while reading a *.rod.idx file:

    INFO 08:51:15,032 TribbleRMDTrackBuilder - Loading Tribble index from
    disk for file /scratch/indapa/dbsnp_129_b37.rod
    ERROR 08:51:19,710 LinearIndex - Error reading index file:
    /scratch/indapa/dbsnp_129_b37.rod.idx
    java.io.EOFException

    It then proceeds to the CovariateCounterWalker and starts reporting
    the number of sites traversed (the BAM file I want to recalibrate has ~150M reads and is 11 GB in size):

    INFO 08:59:30,757 CovariateCounterWalker - The covariates being used here:
    INFO 08:59:30,758 CovariateCounterWalker - ReadGroupCovariate
    INFO 08:59:30,758 CovariateCounterWalker - QualityScoreCovariate
    INFO 08:59:30,758 CovariateCounterWalker - CycleCovariate
    INFO 08:59:30,759 CovariateCounterWalker - DinucCovariate
    INFO 09:00:25,452 TraversalEngine - [PROGRESS] Traversed to 1:10001,
    processing 1 sites in 545.65 secs (545645000.00 secs per 1M sites)

    It has been traversing human chromosome 1 for more than 2 days. I was initially
    getting an out-of-memory exception, so I allocated much more memory to
    the Java heap than I had in the past. I'm not sure why this is taking so much longer than previous BAM files of similar size that I've recalibrated with GATK. Has anyone experienced similar behavior with CountCovariates?

  • #2
    Figured it out: the rod file index was corrupted. I downloaded a new version of GATK along with the resource bundle (http://www.broadinstitute.org/gsa/wi...esource_bundle), which includes a dbSNP VCF, and it works much better.
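
For anyone hitting the same EOFException, here is a minimal sketch of a sanity check on the sidecar index before launching a long run. It only catches the obvious cases (index missing, zero bytes, or older than the data file); a truncated index can pass all three, so treat "ok" as "not obviously broken". `check_index` is a hypothetical helper, not part of GATK, and it assumes the index sits next to the data file with an `.idx` suffix, as in the log above.

```shell
#!/usr/bin/env bash
# Heuristic check of a sidecar index (e.g. foo.rod.idx next to foo.rod).
# check_index is a hypothetical helper for illustration, not a GATK tool.
check_index() {
    local data="$1"
    local idx="${data}.idx"   # assumed naming: <data file>.idx

    if [ ! -e "$idx" ]; then
        echo "missing"        # no index file at all
    elif [ ! -s "$idx" ]; then
        echo "empty"          # index exists but has zero bytes
    elif [ "$idx" -ot "$data" ]; then
        echo "stale"          # index is older than the data it describes
    else
        echo "ok"             # nothing obviously wrong (not proof of validity)
    fi
}
```

Usage would be something like `check_index /scratch/indapa/dbsnp_129_b37.rod`. If it prints anything other than `ok`, moving the `.idx` file aside (rather than deleting it) and fetching a fresh copy of the resource, as described above, is the conservative option.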
