  • Overlapping paired-end reads for rare mutation detection

    I am new to next-generation sequencing and am interested in using it to detect rare random mutations induced in a population of cells by exposure to a mutagen. I would like to get 50,000 reads of a single amplicon per sample and use the number of mutations detected to estimate the global mutation rate. However, the error rate of Illumina sequencing is too high to detect real mutation events, which occur on the order of 1 in 1,000,000 bases.

    Since the service I plan to use generates paired-end reads of 2x250 bp, I am wondering whether I can use an amplicon of 250 bp or less and then merge the paired reads to obtain confidence levels (Phred scores) high enough for rare mutation detection, after filtering out lower-quality reads. From what I understand, a fair proportion of reads should meet this threshold if the paired ends fully overlap. If I set the filtering threshold to reduce the error rate to about 1 in 1,000,000, the noise should be low enough to detect real rare mutations.

    Am I missing something, or do you think this approach should work?
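
To put rough numbers on the design above, here is a minimal sketch. The read count, amplicon length, and mutation rate come from the post; the raw per-base error rate of 1 in 1,000 (roughly Q30) is an assumption for illustration only.

```python
# Back-of-the-envelope counts for one sample.
reads_per_sample = 50_000     # from the post
amplicon_length = 250         # bp, upper bound from the post
true_mutation_rate = 1e-6     # order of magnitude quoted in the post
raw_error_rate = 1e-3         # ASSUMPTION: roughly Q30 per base, unmerged reads


def expected_events(reads, length, per_base_rate):
    """Expected number of events summed over every sequenced base."""
    return reads * length * per_base_rate


total_bases = reads_per_sample * amplicon_length
errors = expected_events(reads_per_sample, amplicon_length, raw_error_rate)
mutations = expected_events(reads_per_sample, amplicon_length, true_mutation_rate)

print(f"bases sequenced per sample: {total_bases:,}")
print(f"expected sequencing errors: {errors:,.0f}")
print(f"expected true mutations:    {mutations:.1f}")
```

Under these assumptions a sample covers 12.5 million bases, so roughly a dozen true mutations would sit among about 12,500 raw sequencing errors, which is why the error rate has to come down by about three orders of magnitude.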

  • #2
    That's a pretty good idea! We published an approach like that :-) https://bmcgenomics.biomedcentral.co...864-016-2669-3

    However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.

    I don't know if anyone has tried PacBio HiFi reads on short fragments...say 1kb. You could assay random chunks of the genome without amplification (removing PCR-induced changes) and generate an accurate consensus sequence on the 1 kb fragment after getting 100 or more passes on the same fragment. If PacBio errors are as random as they say, then 100 passes should give an amazing consensus quality.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
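
A rough way to see why many passes help: if errors at a position were independent between passes, a simple majority vote would be wrong only when more than half of the passes are wrong. The sketch below computes that binomial tail. The 10% per-pass error rate is an assumed placeholder, not a measured PacBio figure, and real consensus callers do not use a plain majority vote, so treat this as an illustration of the independence argument only.

```python
from math import comb


def majority_error_upper_bound(passes, per_pass_error):
    """P(more than half of the passes are wrong at one position).

    ASSUMES independent errors between passes. It is an upper bound on the
    majority-vote error because it treats every wrong pass as agreeing on
    the same wrong call.
    """
    threshold = passes // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


for n in (3, 11, 100):
    # 0.1 is a hypothetical per-pass error rate chosen for illustration
    print(n, majority_error_upper_bound(n, 0.1))
```

The bound falls off very quickly with the number of passes, but only as long as the independence assumption holds; any systematic error that recurs in every pass is not removed by adding passes.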


    • #3
      Originally posted by SNPsaurus View Post
      However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.
      There is also an improved version of the duplex sequencing method using CRISPR/Cas9 for improved enrichment:
      Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS)


      • #4
        Originally posted by SNPsaurus View Post
        That's a pretty good idea! We published an approach like that :-) https://bmcgenomics.biomedcentral.co...864-016-2669-3

        However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.

        I don't know if anyone has tried PacBio HiFi reads on short fragments...say 1kb. You could assay random chunks of the genome without amplification (removing PCR-induced changes) and generate an accurate consensus sequence on the 1 kb fragment after getting 100 or more passes on the same fragment. If PacBio errors are as random as they say, then 100 passes should give an amazing consensus quality.
        Thank you! And sorry for the delayed response. I wanted to read through your PELE-seq paper as well as the one on ENU-induced mutagenesis, since that is what I am doing (in cells though, not in fish).

        This approach is exactly the one I would like to take, so it is very helpful that you replied with this information.

        My primary question is whether you think the barcoding is necessary. I do not necessarily need to eliminate false positives; all I am really looking for is a statistically significant difference in the number of mutants between treated and untreated populations. From what I have read about the error rate of Q5 polymerase, which I am using, that level of error should also be tolerable for this purpose, and I would think the comparison is still possible with a small number of false positives. I am also interested in variants that may be present in only one or a few cells within a population of millions, and as I understand it, the barcoding process would filter out most of these ultra-rare variants.
        Last edited by maxz411; 03-04-2020, 07:46 PM.
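
One way to frame the treated-versus-untreated comparison is an exact test on the two variant counts: if both samples had the same per-base rate of variant calls (true mutations plus background errors), each observed call would fall in the treated sample with probability equal to its share of the sequenced bases. The sketch below is an illustration under that assumption; the function name and the counts are hypothetical, not data from this thread.

```python
from math import comb


def one_sided_rate_test(treated_count, control_count, treated_bases, control_bases):
    """Exact one-sided p-value that the treated per-base rate is higher.

    Conditions on the total number of variant calls and ASSUMES calls are
    independent events, so under the null the treated count is binomial.
    """
    n = treated_count + control_count
    p = treated_bases / (treated_bases + control_bases)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(treated_count, n + 1)
    )


# Hypothetical counts: 30 calls in treated, 10 in untreated, equal depth
p_value = one_sided_rate_test(30, 10, 12_500_000, 12_500_000)
print(f"one-sided p-value: {p_value:.4f}")
```

Background errors inflate both counts, so the test stays valid as long as the background is the same in both samples, but a large background shrinks the relative difference and makes a real effect harder to detect.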


        • #5
          Right, the barcoding does require a certain level of presence in the population; otherwise one pool has the variant and the other does not. 1 in a million still sounds too rare to be able to identify. Can you bottleneck the cell populations to increase the presence of some rare mutations (and eliminate most)?


          • #6
            Originally posted by SNPsaurus View Post
            Right, the barcoding does require a certain level of presence in the population; otherwise one pool has the variant and the other does not. 1 in a million still sounds too rare to be able to identify. Can you bottleneck the cell populations to increase the presence of some rare mutations (and eliminate most)?
            That is a possibility. I suppose the reason I thought 1 in a million would be reasonable is that if I set a Phred cutoff of 30 (an error probability of 1 in 1,000), I would think the probability of both paired-end reads having an incorrect base at the same position would be 1 in 1,000,000 (excluding PCR errors). Or is there a reason it doesn't work like this in practice?
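
The arithmetic behind this estimate can be sketched as follows. It holds only if errors in the two reads are independent; the extra division by three further assumes that a wrong call is equally likely to be any of the three other bases, which is an idealization.

```python
def phred_to_error(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def overlap_error(q_forward, q_reverse, n_alternatives=3):
    """Error estimates for one overlapping base, ASSUMING independent errors.

    Returns (both_wrong, both_wrong_same_base). The second value also
    ASSUMES each of the n_alternatives wrong bases is equally likely.
    """
    e1 = phred_to_error(q_forward)
    e2 = phred_to_error(q_reverse)
    both_wrong = e1 * e2
    both_wrong_same_base = both_wrong / n_alternatives
    return both_wrong, both_wrong_same_base


both_wrong, same_base = overlap_error(30, 30)
print(f"both reads wrong:           {both_wrong:.1e}")
print(f"both wrong, same base call: {same_base:.1e}")
```

With Q30 on both reads this gives about 1 in 1,000,000 for both reads being wrong, and about 1 in 3,000,000 for both reads agreeing on the same wrong base, which is the case that would actually survive merging as a false variant.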


            • #7
              I think the problem is that Illumina errors are not perfectly random, and this bias is not reflected in the quality scores. At 1 in a million, there will be a higher background of artifacts than real changes.


              • #8
                Thanks, that makes sense.

                Since I am only doing a single amplicon, I am wondering if there is a way to do something like this through PCR alone. I would think that if I barcoded each primer, I might be able to do something similar, assuming that in subsequent rounds of PCR the primer with the matching barcode was strongly favored over a mismatched barcode. Are you aware of any examples of something like this being done? Or do you think the genome fragmentation and barcode ligation before PCR is unavoidable?
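
For reference, the downstream analysis that barcoding enables can be sketched like this: reads sharing a barcode are treated as copies of one original molecule, and a base is called only when the copies agree. This is a generic illustration, not the pipeline from any of the papers linked above; the function name, thresholds, and toy reads are all hypothetical, and reads in a family are assumed to be the same length and already aligned.

```python
from collections import Counter, defaultdict


def family_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Collapse (barcode, sequence) pairs into one consensus per barcode.

    Families smaller than min_family_size are dropped. A position is
    reported as 'N' when the most common base falls below min_agreement.
    ASSUMES equal-length, pre-aligned sequences within a family.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate errors from real variants
        called = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            called.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(called)
    return consensus


# Toy input: hypothetical barcodes and reads
toy_reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACGT"),
    ("CCG", "ACGT"), ("CCG", "ACTT"), ("CCG", "ACGT"),
    ("GGA", "ACGT"),
]
print(family_consensus(toy_reads))
```

In the toy input, the disagreement within the second family is masked rather than called, and the single-read family is discarded, which illustrates why variants seen in only one molecule tend to be lost when family-size filters are applied.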
