I've got exome data from 12 human samples that I'd like to analyze. For the most part, I've been following GATK's best practices for my pipeline, but I have run into some problems that I can't seem to solve by reading the forums here or at broadinstitute.org. I wanted to get all the way through the pipeline with one of the samples just to make sure that my script works properly.
The first real problem I had was when I used the BaseRecalibrator tool from GATK. From what I could tell, there was a problem (I think) with many of the CIGAR strings. There are about 1000 instances of deletions at the end of a read. Perhaps this is the source of all my issues. Does this even make sense? It seems like a deletion should not occur at the beginning or end of a read, but I may be misunderstanding something. By reading the forums at broadinstitute, I learned that adding the "-rf BadCigar" option will make BaseRecalibrator ignore all cases of bad CIGAR strings. But later on, when I use HaplotypeCaller, I get an error with this message:
Code:
##### ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Too many deletions?
This reminded me of the 'bad' CIGAR strings that I had observed earlier.
If this CIGAR string issue is my problem, can anyone help me fix it? I used bwa aln to align and then bwa sampe to generate the SAM file. The odd CIGAR strings already appear in this SAM file, so if deletions at the end of a read are bad, then the problem seems to be in the alignment/sampe stage. I can also post an abbreviated version of my script or the entire error message that I get from HaplotypeCaller if either would be helpful.
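In case it helps to pin down which reads are affected, here is a minimal sketch of how reads whose CIGAR begins or ends with a deletion could be counted straight from the SAM text. The function names are made up for illustration. Skipping soft/hard clips before checking the first and last operation is my assumption about what counts as "at the end of a read", and may not match exactly what GATK's BadCigar filter tests.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '70M2D' into [(70, 'M'), (2, 'D')]."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def has_terminal_deletion(cigar):
    """Return True if the first or last non-clipping operation is a deletion."""
    if cigar == "*":  # CIGAR unavailable (e.g. unmapped read)
        return False
    # Assumption: soft (S) and hard (H) clips are ignored when looking at the ends.
    ops = [op for _, op in parse_cigar(cigar) if op not in "SH"]
    return bool(ops) and (ops[0] == "D" or ops[-1] == "D")


def count_terminal_deletions(sam_lines):
    """Count alignment records whose CIGAR (column 6) has a terminal deletion."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and has_terminal_deletion(fields[5]):
            count += 1
    return count
```

Passing an open file handle for the SAM file to count_terminal_deletions should give the number of such reads, which can be compared against the roughly 1000 cases mentioned above.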
Thank you,
Blake