Hello all,
My project is similar to some others that I have seen posted here. I am sequencing a viral population, or quasispecies, on the Illumina platform, and I estimate 10,000-100,000x coverage. Because each viral genome carries on average 1 mutation compared to the reference, there will be LOTS of SNPs, which brings me to my question.
Before it is loaded on the machine, the viral genomic RNA is reverse transcribed (introducing errors) and subjected to 15 cycles of PCR using a high-fidelity polymerase (some errors, but probably not too many). In my analysis, any SNP identified could therefore be (1) a real viral polymorphism in the population, (2) an RT error, (3) a PCR error, or (4) a sequencing error. I imagine Phred scores could be used to minimize #4, and looking at multiple non-unique reads at each base could minimize #3.
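To make the Phred idea for #4 concrete, here is a minimal sketch of what I have in mind. It relies on the standard definition Q = -10 * log10(P_error) and assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64, so the offset would need changing). The function names and the Q30 cutoff are just my own placeholders, not taken from any existing tool.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(c) - 33 for c in quality_string]


def mask_low_quality(bases, quality_string, min_q=30):
    """Replace bases below min_q with 'N' so they are not counted as variants.

    min_q=30 (1 error in 1,000) is an arbitrary illustrative threshold.
    """
    quals = decode_phred33(quality_string)
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(mask_low_quality("ACGT", "II#I"))
```

At 10,000-100,000x coverage even a Q30 filter leaves a noticeable number of expected sequencing errors per site, so I assume this can only reduce #4, not remove it.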
I noticed there are various Bayesian SNP callers (PolyBayes, GigaBayes, Alta-SNP) out there. Forgive my limited understanding of statistics and informatics, but could I use a Bayesian algorithm to call SNPs based on an empirically determined pretest probability? Alongside my samples, I also run in vitro transcribed RNA as a control, which should give a background error rate or distribution for #2-4. Can this be used with existing programs to call true SNPs using a Bayes factor or some such?
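In case it helps explain what I am asking, here is a rough sketch of the kind of per-site calculation I am imagining. It is not how PolyBayes, GigaBayes, or Alta-SNP work (I do not know their internals). It is a simple beta-binomial comparison I put together under these assumptions: at a given position I have the count of non-reference reads and the total depth for both the sample and the in vitro transcribed control, the control counts define the background error rate (H0), and under the alternative (H1) the sample has its own variant frequency with a flat Beta(1, 1) prior. All names are my own.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(k, n, a, b):
    """Log beta-binomial probability of k variant reads out of n, given a Beta(a, b) prior."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(a + k, b + n - k) - log_beta(a, b)


def log_bayes_factor(k_sample, n_sample, k_control, n_control):
    """Log Bayes factor for H1 (site has its own frequency) vs H0 (background error only).

    H0: the sample rate follows the control posterior Beta(1 + k_c, 1 + n_c - k_c).
    H1: the sample rate has a flat Beta(1, 1) prior (an illustrative choice).
    """
    a0 = 1 + k_control
    b0 = 1 + n_control - k_control
    log_h1 = log_marginal(k_sample, n_sample, 1, 1)
    log_h0 = log_marginal(k_sample, n_sample, a0, b0)
    return log_h1 - log_h0


if __name__ == "__main__":
    # Hypothetical counts: 1% variant reads in the sample vs 0.01% in the control.
    print(log_bayes_factor(500, 50000, 5, 50000))
    # Hypothetical counts: sample indistinguishable from the control.
    print(log_bayes_factor(5, 50000, 5, 50000))
```

A positive value would favor a frequency different from background, a negative value would favor background error. Note that this alternative is two-sided, so a site with fewer variant reads than the control could also score positive; I would additionally require the sample frequency to exceed the control frequency. Is something along these lines already built into one of the existing callers?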
Thanks!