Hey all, I'm new to the forums and was hoping for advice on a project I'm helping with, which I haven't seen addressed here. We're attempting to map a dominant mutation in zebrafish using whole-exome sequencing. From what I've learned, the methodology used is quite atypical and I wouldn't necessarily have done it the same way, but I'm working with the FASTQs that I have.
Experimental Design:
ENU mutagenesis was performed on males heterozygous for a recessive allele, which were then crossed with untreated heterozygous females. Offspring were genotyped for the recessive allele and screened for suppression of the phenotype. Candidate offspring were backcrossed for multiple generations to confirm suppression of the phenotype on the mutant background. I'll refer to the unknown suppressor as s (S for the wild-type allele) and to the mutant background as a.
SsAa male × SSaa female: offspring were sorted phenotypically and genotyped for the background mutation. AA and Aa embryos were discarded. 20 Ssaa (suppressor) offspring and 20 SSaa offspring were pooled into two groups. Whole-exome sequencing was done on both parents and on the two pools of offspring. (Unfortunately, the pools were not multiplexed.)
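For reference, here is a minimal sketch of the read fractions this cross should produce at an informative SNP. It assumes the SNP is heterozygous (B/b) in the SsAa parent and homozygous (b/b) in the SSaa parent, that `r` is the recombination fraction between the SNP and s, and that each embryo contributes equal DNA to its pool. Because every offspring receives exactly one chromosome from the Ss parent, an embryo that inherited B is B/b and contributes B at a fraction of 0.5. The function name and the `coupling` flag (whether B sits on the same chromosome as s) are illustrative, not taken from any tool.

```python
from __future__ import annotations


def expected_pool_fraction(r: float, coupling: bool = True) -> tuple[float, float]:
    """Expected fraction of reads carrying the suppressor-parent-specific allele B.

    Assumes: SNP is B/b in the Ss parent and b/b in the SS parent, r is the
    recombination fraction between the SNP and s, equal DNA per embryo.
    Returns (suppressor_pool_fraction, wildtype_pool_fraction).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5]")
    # Probability that an s-carrying (suppressor) embryo also inherited B.
    p_sup = (1.0 - r) if coupling else r
    # Probability that an S-carrying (wild-type) embryo inherited B.
    p_wt = r if coupling else (1.0 - r)
    # A carrier embryo is B/b, so it contributes B to half of its reads.
    return 0.5 * p_sup, 0.5 * p_wt
```

Unlinked SNPs (`r = 0.5`) should sit near 0.25 in both pools, while SNPs tightly linked to s should approach 0.5 vs 0 (or 0 vs 0.5 if the SNP is in repulsion with s). Under these assumptions the signal is the difference between the pools rather than homozygosity in either one, and with only 20 embryos per pool the observed fractions will scatter around these values.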
Bioinformatics:
I've tried a few variations on the standard analysis pipeline and settled on BWA-MEM alignment, Picard MarkDuplicates, GATK variant calling and filtering, and occasional SnpSift filtering. I feel comfortable that I'm accurately identifying SNPs that are heterozygous in the suppressor parent and homozygous in the background strain.
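A minimal sketch of that SNP-selection step, assuming a multi-sample VCF with a GT field and both parents present as sample columns. `sup_col` and `bg_col` are hypothetical 0-based indexes into the sample columns, and the PASS/biallelic-SNV filter is an assumption about what counts as a usable site. The sketch also records which base is specific to the suppressor parent, since that is the allele to count in the pools.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def genotype(sample_field: str, fmt: str) -> tuple[str, str] | None:
    """Return the two GT alleles of a VCF sample column, or None if missing."""
    keys = fmt.split(":")
    if "GT" not in keys:
        return None
    values = sample_field.split(":")
    idx = keys.index("GT")
    if idx >= len(values):
        return None
    # Treat phased (|) and unphased (/) genotypes the same way.
    alleles = values[idx].replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return alleles[0], alleles[1]


def informative_sites(
    lines: Iterable[str], sup_col: int, bg_col: int
) -> Iterator[tuple[str, int, str]]:
    """Yield (chrom, pos, suppressor_parent_specific_base) for biallelic SNVs
    that are heterozygous in the suppressor parent and homozygous in the
    background parent. sup_col/bg_col index the sample columns (hypothetical)."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt, fmt = f[0], int(f[1]), f[3], f[4], f[6], f[8]
        if len(ref) != 1 or len(alt) != 1 or filt not in ("PASS", "."):
            continue  # keep only PASS (or unfiltered) biallelic SNVs
        samples = f[9:]
        sup = genotype(samples[sup_col], fmt)
        bg = genotype(samples[bg_col], fmt)
        if sup is None or bg is None:
            continue
        if set(sup) == {"0", "1"} and bg[0] == bg[1]:
            # The allele the homozygous parent lacks can only come from the
            # suppressor parent.
            yield chrom, pos, (alt if bg[0] == "0" else ref)
```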
From here I'm hitting a wall. My current method is to use bam-readcount to pull nucleotide counts for the two offspring pools directly from their BAM files at the SNPs identified above. I've then run a chi-squared test comparing the suppressor pool counts against the counts expected from the sum of all offspring counts at each position. A few regions have popped up, but they seem to correlate with SNP density and are surrounded by non-significant SNPs.
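A minimal sketch of that comparison written as a 2×2 contingency test (pool × allele), where the expected counts come from the combined offspring counts at the site. It assumes bam-readcount's tab-separated output of chrom, position, reference base, depth, then one `base:count:...` field per base, and it keeps only the A/C/G/T counts. The helper names are hypothetical, and the p-value uses the 1-degree-of-freedom identity P(X > x) = erfc(sqrt(x/2)) instead of a stats library.

```python
from __future__ import annotations

import math


def parse_readcount_line(line: str) -> tuple[str, int, dict[str, int]]:
    """Parse one bam-readcount line into (chrom, pos, {base: count}).

    Assumed layout: chrom, pos, ref_base, depth, then 'base:count:...' fields.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    counts: dict[str, int] = {}
    for field in fields[4:]:
        parts = field.split(":")
        if len(parts) >= 2 and parts[0] in ("A", "C", "G", "T"):
            counts[parts[0]] = int(parts[1])
    return chrom, pos, counts


def chi2_2x2(sup_alt: int, sup_other: int, wt_alt: int, wt_other: int) -> tuple[float, float]:
    """Pearson chi-squared (no continuity correction) on a pool x allele table.
    Returns (statistic, p-value with 1 degree of freedom)."""
    table = [[sup_alt, sup_other], [wt_alt, wt_other]]
    rows = [sup_alt + sup_other, wt_alt + wt_other]
    cols = [sup_alt + wt_alt, sup_other + wt_other]
    n = rows[0] + rows[1]
    if n == 0 or 0 in rows or 0 in cols:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # from the combined offspring counts
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


def compare_site(sup_line: str, wt_line: str, base: str) -> tuple[float, float]:
    """Return (delta, p), where delta is the suppressor-pool minus wild-type-pool
    frequency of `base` (the suppressor-parent-specific allele)."""
    _, _, sup = parse_readcount_line(sup_line)
    _, _, wt = parse_readcount_line(wt_line)
    sup_alt, wt_alt = sup.get(base, 0), wt.get(base, 0)
    # "other" lumps together every A/C/G/T read that is not `base`.
    sup_other = sum(sup.values()) - sup_alt
    wt_other = sum(wt.values()) - wt_alt
    if sup_alt + sup_other == 0 or wt_alt + wt_other == 0:
        raise ValueError("no A/C/G/T reads in one pool at this site")
    delta = sup_alt / (sup_alt + sup_other) - wt_alt / (wt_alt + wt_other)
    _, p = chi2_2x2(sup_alt, sup_other, wt_alt, wt_other)
    return delta, p
```

The signed `delta` is kept alongside the p-value because, with deep coverage, small frequency differences can still be highly significant. Near s, `delta` should move well away from 0 (toward ±0.5 if the pools are clean), whereas at unlinked SNPs it should stay near 0.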
My goal is to identify a reasonable number of locations to validate with Sanger sequencing. I'm wondering whether anyone has ideas for filtering or comparing the two pools, or knows of methods that don't rely on homozygosity mapping.
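One way to compare the pools without relying on homozygosity, in the spirit of bulked segregant analysis, is to average the per-SNP frequency difference over windows of neighbouring SNPs, so that a single significant SNP surrounded by non-significant ones contributes little. A minimal sketch, assuming `delta` is the suppressor-pool minus wild-type-pool frequency of the suppressor-parent-specific allele at each informative SNP on one chromosome; the `window`, `step`, and `min_snps` values are hypothetical tuning parameters, not recommendations.

```python
from __future__ import annotations

from typing import Sequence


def window_mean_abs_delta(
    sites: Sequence[tuple[int, float]], window: int, step: int, min_snps: int = 5
) -> list[tuple[int, int, float]]:
    """Average |delta| over sliding windows along one chromosome.

    sites:  (position, delta) pairs for informative SNPs.
    window: window width in bp; step: distance between window starts in bp.
    Returns (window_start, n_snps, mean_abs_delta) for windows that contain
    at least min_snps SNPs.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    ordered = sorted(sites)
    if not ordered:
        return []
    results = []
    start = (ordered[0][0] // step) * step
    last_pos = ordered[-1][0]
    while start <= last_pos:
        end = start + window
        # Use |delta| because each SNP's phase relative to s is unknown:
        # coupling and repulsion SNPs diverge in opposite directions.
        vals = [abs(d) for pos, d in ordered if start <= pos < end]
        if len(vals) >= min_snps:
            results.append((start, len(vals), sum(vals) / len(vals)))
        start += step
    return results
```

Taking |delta| avoids cancellation between SNPs in coupling and repulsion with s, but it also means unlinked windows will average somewhat above 0 because of sampling noise. Requiring `min_snps` per window is only a crude guard against sparse windows driving the signal.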
Sorry this is so long, and I'm sure I've still left out important details; I've just been staring at the same data through different filters for 6 weeks. Thanks in advance for any help, and let me know if I can clarify anything.