Hi everyone,
Long-time reader, first-time poster looking for advice.
I have 100 bp, single-end Illumina data (~500x coverage) and a short (~150 kb) reference genome. My difficulties stem from the fact that my DNA isn't from a single individual. Instead, it is a pool of an unknown number (but we're talking lots) of individuals. My goal is to identify SNPs and accurately quantify allele frequencies at these sites.
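To make the goal concrete, here is a minimal sketch of the per-site quantity I'm after. The function name, the `min_count` cutoff, and the base counts are all made up for illustration; this is not my actual pipeline, just the arithmetic I mean by "allele frequency" in a pool.

```python
from collections import Counter


def allele_frequencies(base_counts, min_count=2):
    """Return {base: frequency} for bases seen at least min_count times.

    min_count is a hypothetical, crude guard against sequencing errors:
    bases observed fewer times than this are dropped before normalizing.
    """
    kept = {base: n for base, n in base_counts.items() if n >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in kept.items()}


# Made-up pileup at a single site (roughly 500x coverage).
site = Counter({"A": 430, "G": 69, "T": 1})
freqs = allele_frequencies(site)
print(freqs)
```

With these invented counts the single T read is discarded as a likely error, and A and G are normalized over the remaining 499 reads. Any reference bias in mapping would show up as the non-reference count being too low.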
Currently, I'm mapping my reads back to the reference genome using Bowtie. But mapping back to a reference is almost certainly biasing my allele frequencies in favor of the reference allele. Does anyone have a suggestion for alternative methods that eliminate or correct for this bias? I've considered de novo assembly (e.g. with Velvet), but I've been told that pooled DNA causes problems for Velvet.
I also have strong evidence of reads mis-mapping in some regions. I tried throwing out reads that map to multiple locations, but that didn't seem to solve the problem. Is there a technique for identifying mis-mapped reads, or for identifying problematic regions post hoc?
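For what it's worth, the kind of post-hoc check I can imagine is flagging windows whose read depth is far from typical, on the assumption that mis-mapped reads pile up in some regions. This is only a sketch: the function name, window size, and the 0.5x/2x thresholds are arbitrary assumptions, and the depth values are invented.

```python
from statistics import median


def flag_depth_outlier_windows(depths, window=100, low=0.5, high=2.0):
    """Return (start, end, mean_depth) for windows with unusual depth.

    depths is a list of per-position read depths. A window is flagged if
    its mean depth is below low x or above high x the median of all
    window means. Thresholds are illustrative, not tuned.
    """
    if not depths:
        return []
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    typical = median(m for _, _, m in means)
    return [(s, e, m) for s, e, m in means
            if m < low * typical or m > high * typical]


# Invented depth profile: ~500x everywhere except one 100 bp pile-up.
depths = [500] * 300 + [1500] * 100 + [500] * 200
flagged = flag_depth_outlier_windows(depths)
print(flagged)
```

Depth alone obviously won't catch every mis-mapping, so I'd welcome pointers to anything more principled.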
Thanks for any thoughts/suggestions you may have,
Dave