I am new to next-generation sequencing, but I would like to use it to detect rare random mutations induced in a population of cells by exposure to a mutagen. My plan is to get 50,000 reads of a single amplicon per sample and use the number of mutations detected to estimate the global mutation rate. The problem is that the raw error rate of Illumina sequencing is too high to detect real mutation events, which occur at a frequency on the order of 1 in 1,000,000 bases.
Since the service I plan to use generates 2x250 bp paired-end reads, I am wondering whether I can use an amplicon of 250 bp or less and merge the paired reads to obtain Phred scores high enough for rare mutation detection (after filtering lower-quality reads). From what I understand, a fair proportion of merged reads should meet this threshold if the paired ends fully overlap. If I set the filtering threshold so that the error rate drops to about 1 in 1,000,000, the noise should be low enough to detect real rare mutations.
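To make the arithmetic behind this idea concrete, here is a rough Python sketch of how I picture the quality scores combining. It rests on assumptions of mine that may not hold in practice: errors in the forward and reverse reads are independent, a wrong call is equally likely to be any of the three other bases, and the reported Phred scores are accurate. It only models sequencing error, not errors introduced before sequencing. The function names are my own and do not come from any particular read-merging tool.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def merged_error_prob(q1, q2):
    """Error probability of a merged base when both reads agree.

    Assumes independent errors in the two reads and a uniform
    choice among the three alternative bases when a call is wrong.
    """
    p1, p2 = phred_to_prob(q1), phred_to_prob(q2)
    # Both reads are wrong and happen to show the same wrong base.
    both_wrong_same = p1 * p2 / 3
    # Both reads are right.
    both_right = (1 - p1) * (1 - p2)
    return both_wrong_same / (both_wrong_same + both_right)


def expected_false_calls(n_reads, amplicon_len, per_base_error):
    """Expected number of erroneous base calls across all reads."""
    return n_reads * amplicon_len * per_base_error


if __name__ == "__main__":
    for q in (20, 30, 40):
        p = merged_error_prob(q, q)
        print(f"Q{q} + Q{q} agreeing: error ~ {p:.2e} (Q{prob_to_phred(p):.1f})")
    # 50,000 reads of a 250 bp amplicon at a residual error of 1 in 1,000,000
    print(expected_false_calls(50_000, 250, 1e-6))
```

Under these assumptions, two agreeing Q30 calls fall below the 1 in 1,000,000 target, while two agreeing Q20 calls do not. The last line is a reminder that 50,000 reads of 250 bp is 12.5 million bases, so even at that residual error rate I would still expect roughly a dozen false calls per sample.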
Am I missing something, or do you think this approach should work?