Hi all,
I have a weird problem: I see only 4 dinucleotides (AA, AC, AG, AT) in the GATK-CountCovariates result files for the dinucleotideCovariate, whereas the BAM file has all dinucleotides present (checked with a script). So far, I see this for paired-end data (SOLiD), while single-end runs perform OK, in the sense that the expected 16 dinucleotides are reported. CountCovariates doesn't complain about the PE runs either.
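A minimal sketch of the kind of check meant by "checked with a script" (not the actual script used). It assumes the BAM has been converted to SAM text, and it counts dinucleotides exactly as stored in the SEQ column, ignoring strand and colour space. The function name `count_dinucleotides` is made up for illustration.

```python
from collections import Counter


def count_dinucleotides(sam_lines):
    """Count adjacent base pairs in the SEQ column of SAM-formatted lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9].upper()  # SEQ is the 10th SAM column
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            # ignore pairs containing N or other ambiguity codes
            if set(pair) <= set("ACGT"):
                counts[pair] += 1
    return counts
```

Passing `sys.stdin` to the function while piping in the output of `samtools view` should give one count per dinucleotide present, which can then be compared against the 16 expected.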
I am using GATK v1.3.16, and my reads were aligned with BWA 0.5.9-r16 (including a hack to write the CS and CQ tags, which GATK needs in order to do recalibration in the first place). I have run ValidateSamFile over the BAM file (using IGNORE=MISSING_NM_TAG and MISMATCH_FLAG_MATE_NEG_STRAND).
Has anyone seen this before, or am I missing the point somewhere?
Thanks,
Aldo