I tried to follow the protocol "Better: sample level realignment with known indels and recalibration" in "Best Practice Variant Detection with GATK v3".
It says to recalibrate the realigned BAM, and it seems that I must provide a database of known polymorphic sites before I can run GATK's CountCovariates. I have two questions:
1. If I do not have such a database, should I use my own resequencing data to create one from scratch? What role do the quality scores play in calling SNPs or other polymorphisms? Should I then use the recalibrated BAMs to call polymorphic sites again?
2. If a database from another project is available, should I combine the polymorphic sites from my data with it?
Thank you.