SEQanswers

08-17-2015, 09:08 AM   #1
sgrego08
Junior Member
 
Location: Ann Arbor

Join Date: Aug 2015
Posts: 1
Pooled exome sequencing comparisons

Hey all, I'm new to the forums and was hoping for advice on a project I'm helping with that I haven't seen addressed here. We're attempting to map a dominant mutation in zebrafish using whole exome sequencing. From what I've learned, the methodology used is quite atypical and I wouldn't necessarily have done it the same way, but I'm working with the FASTQ files that I have.

Experimental Design:
Males heterozygous for a recessive allele were treated with ENU and crossed with untreated heterozygous females. Offspring were genotyped for the recessive allele and screened for suppression of the phenotype. Candidate offspring were backcrossed for multiple generations to validate the suppression of the phenotype on the mutant background. I'll refer to the unknown suppressor as s (S for the wildtype allele) and the mutant background as a.

The cross was SsAa male x SSaa female. Offspring were sorted phenotypically and genotyped for the background mutation; AA and Aa embryos were discarded. 20 Ssaa suppressor offspring and 20 SSaa offspring were pooled separately, giving two pools. Whole exome sequencing was done on both parents and the two pools of offspring. (Unfortunately the pools were not multiplexed.)

Bioinformatics:
So far I've tried a few variants of the standard analysis pipeline and settled on BWA-MEM alignment, Picard MarkDuplicates, GATK variant calling and filtering, and occasional SnpSift filtering. I feel comfortable that I'm accurately identifying SNPs that are heterozygous in the suppressor parent and homozygous in the background strain.
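To make the site selection concrete, here is a minimal sketch of the "heterozygous in the suppressor parent, homozygous in the background parent" filter. It assumes the two parental genotypes have already been pulled out of the VCF as GT strings such as `0/1`; the function name `is_informative` and the example sites are hypothetical and not part of the actual pipeline.

```python
def is_informative(supp_gt, bg_gt):
    """Return True when the suppressor parent is heterozygous and the
    background parent is homozygous at a site.

    Genotypes are VCF-style GT strings for diploid calls, e.g. "0/1" or "1|1".
    Sites with a missing call (".") in either parent are rejected.
    """
    supp = supp_gt.replace("|", "/").split("/")
    bg = bg_gt.replace("|", "/").split("/")
    if len(supp) != 2 or len(bg) != 2:
        return False  # not a diploid call
    if "." in supp or "." in bg:
        return False  # missing genotype in one of the parents
    return supp[0] != supp[1] and bg[0] == bg[1]


# Hypothetical example sites: (chrom, pos, suppressor parent GT, background parent GT)
sites = [
    ("chr1", 1000, "0/1", "0/0"),
    ("chr1", 2000, "0/1", "0/1"),
    ("chr2", 3000, "./.", "0/0"),
    ("chr2", 4000, "1|0", "1/1"),
]
informative = [(c, p) for c, p, s, b in sites if is_informative(s, b)]
print(informative)
```

Only the first and last example sites pass, since the second is heterozygous in both parents and the third has a missing call.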

From here I'm hitting a wall. My current method is to use bam-readcount to pull nucleotide counts for the offspring pools directly from their corresponding BAM files at the SNPs previously identified. I've then done a chi-squared test comparing the suppressor pool counts against the expected counts based on the sum of all offspring counts at that location. A few areas have popped up, but they seem to correlate with SNP density and are surrounded by non-significant SNPs.
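The per-site comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: each site is reduced to reference and alternate read counts per pool (as could be tabulated from bam-readcount output), the expected suppressor-pool counts come from the allele proportions in the summed offspring counts, and the test is a 1-degree-of-freedom chi-squared goodness of fit. The function name `chi2_pool_vs_expected` and the example counts are hypothetical.

```python
import math


def chi2_pool_vs_expected(supp_ref, supp_alt, other_ref, other_alt):
    """Chi-squared goodness-of-fit test (1 d.f.) of suppressor pool counts
    against the expectation from the summed counts of both offspring pools.

    Returns (statistic, p_value), or None when the test is undefined
    (no suppressor-pool reads, or only one allele observed overall).
    """
    total_ref = supp_ref + other_ref
    total_alt = supp_alt + other_alt
    total = total_ref + total_alt
    supp_total = supp_ref + supp_alt
    if supp_total == 0 or total_ref == 0 or total_alt == 0:
        return None

    # Expected suppressor-pool counts if it matched the combined allele proportions
    exp_ref = supp_total * total_ref / total
    exp_alt = supp_total * total_alt / total

    stat = (supp_ref - exp_ref) ** 2 / exp_ref + (supp_alt - exp_alt) ** 2 / exp_alt
    # For 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: suppressor pool 60 ref / 40 alt, other pool 90 ref / 10 alt
stat, p = chi2_pool_vs_expected(60, 40, 90, 10)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

Reads from a pool are not independent draws from the 40 chromosomes in it, so p-values computed this way are probably better suited to ranking sites than to judging significance.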

My goal is to identify a reasonable number of locations to validate with Sanger sequencing. I'm wondering if anyone has any ideas for filtering or comparing groups or if anyone knows of methods that don't rely on homozygosity mapping.

Sorry this is so long, and I'm sure I still left out too many details. I've just been staring at the same data through different filters for 6 weeks. Thanks in advance for any help, and let me know what clarification I can provide.
All times are GMT -8.

