Hi
I have two sets of exome sequences. They differ in mean coverage depth and also in the regions covered (different enrichment kits were used). I'll refer to these as subsets.
This isn't ideal, but I need to work out the best way to analyse them and produce a comprehensive set of genotypes across all samples in both subsets, for every SNP detected in either subset.
I'd prefer to use GATK if possible. My concern is that I can't simply pool all the BAM files and run the UnifiedGenotyper (UG), because it wouldn't be sensible to apply VQSR across subsets of samples with very different coverage.
But could I pool them all for the UG run and then run VQSR separately on each subset to define the filtering parameters? That would give me reference and non-reference calls at ALL variant sites across both sets of samples. I could then filter each subset separately and appropriately, and recombine them afterwards.
Does that make any sense at all? Any suggestions gratefully received!
Thank you!