07-02-2015, 02:34 AM   #1
Variant calling with GATK between two cell lines

I recently received some whole genome sequencing data from my platform, intended as validation for my RNA-seq data. The idea was that the platform would handle all the bioinformatic analyses for this data, so that I could keep my focus on RNA rather than DNA, but it turns out that they don't routinely do the last step I need: finding the mutations that differ between my two sample types. I now have to do that myself, so I've come here for help.

I have two isogenic cell lines that theoretically differ by a single mutation, and that is what we want to confirm (or, at least, that any other mutations lie in non-functional or otherwise irrelevant regions). As far as I understand, the platform has done the alignment and the heaviest parts of the analysis, as well as variant calling relative to the reference. I'm really only interested in the differences between the two cell lines, not between the cell lines and the reference.

I was pointed towards the GenotypeGVCFs tool in GATK. If I understand it correctly, I give it the GVCFs of my two samples as input and run it. I'm not sure why I need a reference, though. I put together some rudimentary code:

java -jar GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -nt 16 \
    -R .../human_g1k_v37.fasta \
    --variant .../sample1.clean.dedup.recal.bam.genomic.vcf \
    --variant .../sample2.clean.dedup.recal.bam.genomic.vcf \
    -o .../output.vcf
Is this what I'm looking for? If not, what am I missing, or am I completely off course?
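
To make the end goal concrete, here is a rough sketch of the follow-up step I imagine, not something I have run on real data. It assumes the joint VCF from the command above has exactly two sample columns (columns 10 and 11) and that GT is the first FORMAT subfield. The function name diff_genotypes is something I made up for illustration; it is not part of GATK.

```shell
# Hypothetical helper (not a GATK tool): print the VCF header plus every site
# where the two samples have different called genotypes.
# Assumptions: two-sample VCF, samples in columns 10 and 11, GT listed first in FORMAT.
diff_genotypes() {
    awk -F'\t' '
        /^#/ { print; next }                   # pass header lines through unchanged
        {
            split($10, a, ":")                 # GT is the first colon-separated subfield
            split($11, b, ":")
            g1 = a[1]; g2 = b[1]
            gsub(/\|/, "/", g1)                # treat phased and unphased calls alike
            gsub(/\|/, "/", g2)
            if (g1 ~ /\./ || g2 ~ /\./) next   # skip sites with a no-call in either sample
            if (g1 != g2) print                # keep sites where the genotypes differ
        }
    ' "$1"
}
```

With a placeholder file name, this would be called as: diff_genotypes output.vcf > differing_sites.vcf. It compares genotype strings literally, so 0/1 and 1/0 would count as different, and skipping no-call sites means a mutation in a poorly covered region could be missed.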
