SEQanswers

Old 02-19-2020, 04:35 PM   #1
maxz411
Junior Member
 
Location: Rochester, NY

Join Date: Feb 2020
Posts: 7
Default Overlapping paired-end reads for rare mutation detection

I am new to next-generation sequencing but am interested in using it to detect rare random mutations induced in a population of cells by exposure to a mutagen. I would like to get 50,000 reads of a single amplicon per sample and use the number of mutations detected to estimate the global mutation rate. However, the error rate of Illumina NGS is too high to detect real mutation events, which occur on the order of 1 in 1,000,000 bases.

Since the service I am interested in using generates paired-end reads of 2x250 bp, I am wondering whether I can use an amplicon of 250 bp or less and then merge the paired reads to obtain confidence levels/Phred scores high enough for rare mutation detection (after filtering out lower-quality reads). From what I understand, a fair proportion of reads should meet this threshold if the paired ends fully overlap. If I set the filtering threshold to reduce the error rate to 1 in 1,000,000 or so, then I should have reduced the noise enough to detect real rare mutations.

Am I missing something, or do you think this approach should work?
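
For scale, here is a rough sketch of the arithmetic implied by the numbers in this post (50,000 reads, an amplicon of up to 250 bp, and events at about 1 in 1,000,000 bases). The function name is made up for illustration, and the calculation assumes every base of every read is usable, which ignores primer regions and quality filtering.

```python
def expected_events(reads, amplicon_bp, per_base_rate):
    """Expected number of events (mutations or errors) across all sequenced bases."""
    return reads * amplicon_bp * per_base_rate


READS = 50_000        # reads per sample, from the post
AMPLICON_BP = 250     # upper bound on amplicon length, from the post

# Real mutations at roughly 1 in 1,000,000 bases
true_mutations = expected_events(READS, AMPLICON_BP, 1e-6)

# Sequencing errors if single reads were only filtered at Q30 (1 in 1,000)
q30_errors = expected_events(READS, AMPLICON_BP, 1e-3)

print(f"expected true mutations per sample: {true_mutations:.1f}")
print(f"expected Q30 errors per sample:     {q30_errors:.0f}")
```

Under these assumptions only about a dozen real events are expected per sample, so the background has to be well below that for the signal to be visible.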
Old 02-19-2020, 10:18 PM   #2
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521
Default

That's a pretty good idea! We published an approach like that :-) https://bmcgenomics.biomedcentral.co...864-016-2669-3

However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.

I don't know if anyone has tried PacBio HiFi reads on short fragments...say 1kb. You could assay random chunks of the genome without amplification (removing PCR-induced changes) and generate an accurate consensus sequence on the 1 kb fragment after getting 100 or more passes on the same fragment. If PacBio errors are as random as they say, then 100 passes should give an amazing consensus quality.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 02-24-2020, 11:58 PM   #3
torben
Member
 
Location: Norway

Join Date: Oct 2012
Posts: 20
Default

Quote:
Originally Posted by SNPsaurus View Post
However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.
There is also an improved version of the duplex sequencing method using CRISPR/Cas9 for improved enrichment:
Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS)
Old 03-04-2020, 06:43 PM   #4
maxz411
Junior Member
 
Location: Rochester, NY

Join Date: Feb 2020
Posts: 7
Default

Quote:
Originally Posted by SNPsaurus View Post
That's a pretty good idea! We published an approach like that :-) https://bmcgenomics.biomedcentral.co...864-016-2669-3

However, it would be hard to drive it down to 1 in 1 million. I think something like duplex-sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271547/ does it better for ultra-low frequency changes.

I don't know if anyone has tried PacBio HiFi reads on short fragments...say 1kb. You could assay random chunks of the genome without amplification (removing PCR-induced changes) and generate an accurate consensus sequence on the 1 kb fragment after getting 100 or more passes on the same fragment. If PacBio errors are as random as they say, then 100 passes should give an amazing consensus quality.
Thank you! And sorry for the delayed response. I wanted to read through your PELE-seq paper as well as the one on ENU-induced mutagenesis, since that is what I am doing (in cells though, not in fish).

This approach is exactly the one I would like to take, so it is very helpful that you replied with this information.

My primary question is whether you think the barcoding would be necessary. I do not necessarily need to eliminate false positives; really all I am looking for is a statistically significant difference in the number of mutants between treated and untreated populations. From what I have read about the error rate of Q5 polymerase, which I am using, that level of error should also be tolerable for this purpose, and I would think the comparison is still possible with a small number of false positives. I am also interested in variants that may be present in only one or a few cells within a population of millions, and as I understand it, the barcoding process would filter out most of these ultra-rare variants.
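
One way the treated-versus-untreated comparison could be framed is as a test on two counts. The sketch below is only an illustration, not a method from the papers mentioned: it assumes mutant calls in each sample behave like Poisson counts, uses a one-sided conditional (binomial) test, and the function name and example counts are hypothetical.

```python
from math import comb


def treated_excess_p_value(k_treated, k_control,
                           bases_treated=1.0, bases_control=1.0):
    """One-sided p-value that the treated sample has a higher mutant rate.

    Conditional on the total count n, the treated count follows
    Binomial(n, p0) under the null of equal per-base rates, where p0 is
    the treated share of the sequenced bases.
    """
    n = k_treated + k_control
    p0 = bases_treated / (bases_treated + bases_control)
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(k_treated, n + 1))


# Hypothetical counts: 30 calls in treated vs 12 in untreated, equal depth.
# The 12 would include false positives, which is fine as long as they
# occur at the same rate in both samples.
print(treated_excess_p_value(30, 12))
```

The point of the sketch is that a shared false-positive background does not prevent a comparison, but it does dilute it: the larger the background relative to the real signal, the more reads are needed to see a difference.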

Last edited by maxz411; 03-04-2020 at 06:46 PM.
Old 03-05-2020, 11:35 AM   #5
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521
Default

Right, the barcoding does require a certain level of presence in the population; otherwise one pool has the variant and the other does not. 1 in a million still sounds too rare to be able to identify. Can you bottleneck the cell populations to increase the presence of some rare mutations (and eliminate most)?
Old 03-09-2020, 01:22 PM   #6
maxz411
Junior Member
 
Location: Rochester, NY

Join Date: Feb 2020
Posts: 7
Default

Quote:
Originally Posted by SNPsaurus View Post
Right, the barcoding does require a certain level of presence in the population otherwise one pool has it the other does not. 1 in a million still sounds too rare to be able to identify. Can you bottleneck the cell populations to increase the presence of some rare mutations (and eliminate most)?
That is a possibility. I suppose the reason I thought 1 in a million would be reasonable is that, with a Phred cutoff of 30 (1 in 1,000), I would expect the probability of both paired-end reads having an incorrect base at the same location to be 1 in 1,000,000 (excluding PCR errors). Or is there a reason it doesn't work like this in practice?
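
The reasoning above can be written out as a short calculation. This is a sketch that assumes the two reads' errors are fully independent and that quality scores are accurate; the function names are illustrative, and the "same wrong base" variant further assumes the three possible wrong bases are equally likely.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred score."""
    return -10 * math.log10(p)


def both_reads_wrong(q1, q2):
    """Probability both overlapping reads miscall a position (independence assumed)."""
    return phred_to_error_prob(q1) * phred_to_error_prob(q2)


def both_reads_same_wrong_base(q1, q2):
    """As above, but both reads must also agree on the same wrong base."""
    return both_reads_wrong(q1, q2) / 3


p = both_reads_wrong(30, 30)
print(f"Q30 + Q30, both wrong:      {p:.1e} (about Q{error_prob_to_phred(p):.0f})")
print(f"Q30 + Q30, same wrong base: {both_reads_same_wrong_base(30, 30):.1e}")
```

Under strict independence the numbers do come out at 1 in 1,000,000 or better; the open question is whether independence holds.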
Old 03-09-2020, 02:27 PM   #7
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521
Default

I think the problem is that Illumina errors are not perfectly random, and this bias is not reflected in the quality scores. At 1 in a million, there will be a higher background of artifacts than real changes.
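
A toy model shows why non-random errors matter so much here. It is only an illustration: it assumes each position has a small "shared" error probability that hits both reads of a pair identically, plus an independent per-read error, and the rates used are made-up numbers rather than measured Illumina values.

```python
def both_reads_wrong(shared_rate, independent_rate):
    """Probability that both overlapping reads miscall a position.

    shared_rate:      chance of an artifact that affects both reads together
                      (hypothetical; e.g. something upstream of base calling)
    independent_rate: chance that a single read miscalls on its own
    """
    return shared_rate + (1 - shared_rate) * independent_rate ** 2


Q30 = 1e-3  # independent per-read error at a Phred 30 cutoff

for shared in (0.0, 1e-6, 1e-5):  # hypothetical shared-artifact rates
    p = both_reads_wrong(shared, Q30)
    print(f"shared={shared:.0e} -> both reads wrong: {p:.2e}")
```

In this model the merged error rate can never drop below the shared rate, no matter how strict the per-read quality filter is, so even a shared artifact rate of 1 in 100,000 would swamp real changes at 1 in a million.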
Old 03-12-2020, 01:50 PM   #8
maxz411
Junior Member
 
Location: Rochester, NY

Join Date: Feb 2020
Posts: 7
Default

Thanks, that makes sense.

Since I am only doing a single amplicon, I am wondering if there is a way to do something like this through PCR alone. I would think that if I barcoded each primer, I might be able to do something similar, assuming that in subsequent rounds of PCR the primer with the matching barcode was strongly favored over a mismatched barcode. Are you aware of any examples of something like this being done? Or do you think the genome fragmentation and barcode ligation before PCR is unavoidable?