Hi all,
I am analyzing whole-population sequencing data from E. coli that has undergone experimental evolution under different environmental conditions. The goal is to determine whether one condition produces more genetic diversity than the other.
I initially ran breseq to identify SNPs, indels, etc. in each sample, but there were no obvious differences in the number, frequency, or type of mutations between samples. However, this analysis is only sensitive to relatively high-frequency mutations (>5%), and the genetic diversity we're looking for may well take the form of many low-frequency alleles.
For that reason, and since we don't need to identify specific mutations anyway, I'm wondering whether I can estimate total genetic diversity just from the overall number of mismatches in each sample's alignment. Of course, the vast majority of these mismatches are noise from the sequencing instrument, but if there are enough real mismatches on top of the noise, perhaps I can detect a statistically significant difference in mismatch rate between my samples.
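To make the idea concrete, here is a minimal sketch of the comparison I have in mind. It assumes I have already tallied mismatched and total aligned bases for each sample (how to extract those from the alignment is not shown). The counts are made-up placeholders, and `two_proportion_z_test` is just my own helper name. It uses a simple two-proportion z-test, which treats every base as independent and assumes the instrument error rate is identical in both runs; neither assumption is likely to hold exactly.

```python
import math


def two_proportion_z_test(mismatches_a, bases_a, mismatches_b, bases_b):
    """Two-sided z-test for a difference in per-base mismatch rate.

    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = mismatches_a / bases_a
    rate_b = mismatches_b / bases_b
    # Pooled mismatch rate under the null hypothesis of equal rates.
    pooled = (mismatches_a + mismatches_b) / (bases_a + bases_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / bases_a + 1 / bases_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not real data).
    rate_a, rate_b, z, p = two_proportion_z_test(
        mismatches_a=1_200, bases_a=1_000_000,
        mismatches_b=1_000, bases_b=1_000_000,
    )
    print(f"rate A = {rate_a:.6f}, rate B = {rate_b:.6f}, z = {z:.2f}, p = {p:.3g}")
```

With hundreds of millions of aligned bases, even a tiny run-to-run difference in instrument error would come out as "significant" here, so I suspect the per-sample rates would really need to be compared across replicate populations rather than base by base.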
Is this a reasonable thing to do, or does anyone have any other ideas? Thanks in advance for any comments or suggestions!
Michael