Hi
I am using GATK to realign around known indels, and am finding it rather slow going.
On the PC I am currently using (i5-750, 16 GB RAM), this stage takes approx. 40 min per sample (approx. 0.13 gigabases of reads per paired-end sample).
Briefly, my pipeline looks like this:
demultiplex at the FASTQ stage, then carry out the following on each demultiplexed sample individually:
1) Align (bwa)
2) Convert to SAM (samtools)
3) Mark duplicates and collect coverage metrics (Picard)
4) Find regions to realign, based on the known-indels VCF file supplied with GATK (GATK RealignerTargetCreator) <- this step is slow
5) Realign around indels (GATK IndelRealigner)
6) Call variants (GATK UnifiedGenotyper)
7) Apply variant filtration annotations to the VCF (GATK VariantFiltration)
8) Annotate variants:
   snpEff
   Annovar <- takes a long time at the dbSNP annotation step
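The per-sample loop above can be sketched as a small driver that runs each step in order and records its wall-clock time, which makes it easy to confirm where the 40 minutes go and to re-check after any change. This is a minimal Python sketch, not the exact commands used here: the step names and the no-op placeholder commands in the hypothetical `hypothetical_steps` helper are assumptions, to be replaced with the real bwa/samtools/Picard/GATK/snpEff/Annovar command lines.

```python
import subprocess
import sys
import time
from typing import Dict, List, Sequence, Tuple

# A step is a (name, argv) pair; argv is passed to subprocess.run as a list.
Step = Tuple[str, Sequence[str]]


def run_sample(sample: str, steps: List[Step]) -> Dict[str, float]:
    """Run each step for one sample in order; return wall-clock seconds per step.

    Stops at the first failing step (subprocess.CalledProcessError propagates).
    """
    timings: Dict[str, float] = {}
    for name, argv in steps:
        start = time.perf_counter()
        subprocess.run(list(argv), check=True)
        timings[name] = time.perf_counter() - start
        print(f"{sample}\t{name}\t{timings[name]:.1f}s", file=sys.stderr)
    return timings


def hypothetical_steps() -> List[Step]:
    """HYPOTHETICAL placeholders for the real pipeline commands.

    Each step just starts a Python interpreter that does nothing.
    """
    noop = [sys.executable, "-c", "pass"]
    names = ["align", "convert", "mark_duplicates", "realigner_target_creator",
             "indel_realigner", "unified_genotyper", "variant_filtration", "annotate"]
    return [(n, noop) for n in names]


if __name__ == "__main__":
    for s in ["sample01", "sample02"]:
        t = run_sample(s, hypothetical_steps())
        slowest = max(t, key=t.get)
        print(f"{s}: slowest step was {slowest}")
```

Because each sample is processed independently, the same driver could also be pointed at several samples at once; whether that helps on a 4-core i5 depends on how much of each step is already multithreaded.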
I have a small targeted resequencing assay with approx. 1000 targets, and I was wondering whether RealignerTargetCreator is so slow because it has to search through a whole genome's worth of indels in the VCF supplied by the Broad with GATK.
If I filtered that VCF down to just the indels in my target regions, would that cause a problem?
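A sketch of the filtering idea, assuming the targets are available as a BED file (0-based, half-open) and the known-indels VCF is plain text. It keeps every header line and any record whose REF allele overlaps a target widened by a `padding` margin; the padding is an assumption on my part, motivated by reads from targeted capture often extending past the target boundaries, so a known indel just outside a target could still matter. The function names are hypothetical. Separately, GATK's engine-level `-L`/`--intervals` argument restricts which regions a tool traverses; whether that or a smaller VCF is the bigger win for RealignerTargetCreator is worth measuring rather than assuming.

```python
import bisect
from typing import Dict, Iterable, Iterator, List, Tuple

# Per-contig (starts, ends) of merged, non-overlapping 0-based half-open intervals.
Targets = Dict[str, Tuple[List[int], List[int]]]


def load_bed(lines: Iterable[str], padding: int = 0) -> Targets:
    """Parse BED lines into merged intervals, each widened by `padding` bp."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((max(0, int(start) - padding), int(end) + padding))
    targets: Targets = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        starts, ends = [ivs[0][0]], [ivs[0][1]]
        for s, e in ivs[1:]:
            if s <= ends[-1]:  # overlapping or touching: extend the previous interval
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        targets[chrom] = (starts, ends)
    return targets


def overlaps(targets: Targets, chrom: str, start: int, end: int) -> bool:
    """True if the 0-based half-open span [start, end) overlaps any target on chrom."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last interval starting before `end`
    return i >= 0 and ends[i] > start


def filter_vcf(lines: Iterable[str], targets: Targets) -> Iterator[str]:
    """Yield header lines unchanged, plus records whose REF allele overlaps a target."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based
        if overlaps(targets, chrom, start, start + len(ref)):
            yield line


if __name__ == "__main__":
    bed = ["chr1\t1000\t1200\n", "chr1\t1150\t1300\n", "chr2\t500\t600\n"]
    vcf = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t1100\t.\tAT\tA\t.\tPASS\t.\n",
        "chr1\t5000\t.\tG\tGC\t.\tPASS\t.\n",
        "chr2\t450\t.\tC\tCA\t.\tPASS\t.\n",
    ]
    print("".join(filter_vcf(vcf, load_bed(bed, padding=100))), end="")
```

In the toy data, the chr2 indel at position 450 is kept only because of the 100 bp padding; with `padding=0` it would be dropped.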
Would indexing the VCF help?
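What an index buys in principle is random access: a tool can jump to the part of a coordinate-sorted file covering a region instead of reading it from the top. The toy linear index below records, for each contig and each fixed-width bin, the byte offset of the first record starting in that bin, then answers a region query by seeking. It is only an illustration under stated assumptions (sorted input, each contig contiguous, arbitrary bin width, matching on POS only); the on-disk index formats GATK and tabix actually use are different and more involved, and all names here are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterator, List

BIN = 16_384  # bin width in bp (arbitrary choice for this sketch)

# index[chrom][b] = byte offset of the first record whose POS falls in bin b, or -1.
Index = Dict[str, List[int]]


def build_index(f: BinaryIO, bin_size: int = BIN) -> Index:
    """One linear pass over a coordinate-sorted VCF, recording per-bin byte offsets."""
    index: Index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        if line.startswith(b"#"):
            continue
        chrom, pos = line.split(b"\t", 2)[:2]
        b = (int(pos) - 1) // bin_size
        bins = index.setdefault(chrom.decode(), [])
        while len(bins) <= b:
            bins.append(-1)
        if bins[b] == -1:
            bins[b] = offset
    return index


def query(f: BinaryIO, index: Index, chrom: str, start: int, end: int,
          bin_size: int = BIN) -> Iterator[bytes]:
    """Yield records on chrom with start <= POS <= end (1-based), seeking via the index."""
    bins = index.get(chrom, [])
    offsets = [o for o in bins[(start - 1) // bin_size:] if o != -1]
    if not offsets:
        return
    f.seek(offsets[0])  # skip straight to the first populated bin
    for line in iter(f.readline, b""):
        c, pos = line.split(b"\t", 2)[:2]
        if c.decode() != chrom or int(pos) > end:
            break  # sorted input: nothing further can match
        if int(pos) >= start:
            yield line


if __name__ == "__main__":
    data = (b"##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
            b"chr1\t100\t.\tA\tAT\t.\tPASS\t.\n"
            b"chr1\t40000\t.\tCA\tC\t.\tPASS\t.\n"
            b"chr1\t40010\t.\tG\tGA\t.\tPASS\t.\n"
            b"chr2\t200\t.\tT\tTA\t.\tPASS\t.\n")
    f = io.BytesIO(data)
    idx = build_index(f)
    for rec in query(f, idx, "chr1", 39000, 40005):
        print(rec.decode(), end="")
```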
Any other suggestions would be appreciated.
Thanks,
Chris