SEQanswers

Old 10-14-2015, 07:53 AM   #1
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 35
Targeted amplicon sequencing, is full overlap necessary?

Hi,

Has anyone got experience running targeted amplicon libraries at less than full overlap?

We have been asked to run ThunderBolts libraries, for which 2x250 bp is recommended to achieve full overlap, but we only have access to 2x150.

Is this a total waste of time or just not ideal?

Ideally we'd want to detect variants at frequencies well below 10%.

Thanks
Old 10-15-2015, 05:34 AM   #2
MU Core
Member
 
Location: Columbia, Missouri

Join Date: Apr 2008
Posts: 48

A related question: the ability to identify and remove chimeric amplicons when the reads do not overlap has come up for a recent data set. Does anyone have a feel for how this may impact data analysis?
Old 10-15-2015, 09:45 AM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,577

You don't need a full overlap for amplicon libraries. You just need enough of an overlap to merge the reads. Depending on your read quality, aiming for a 30-50 bp overlap should be fine.
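
As a rough sanity check on whether a given amplicon will merge, the expected overlap can be estimated from read length and amplicon length. This is only a sketch: the function name, the 30 bp threshold (the low end of the range above), and the amplicon lengths are illustrative assumptions, and it ignores quality trimming, which shortens the usable overlap.

```python
def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases sequenced by both reads of a pair.

    Assumes the insert is the whole amplicon and reads are untrimmed.
    A negative value means a gap between the two reads.
    """
    # If the amplicon is shorter than one read, the whole amplicon overlaps.
    return min(2 * read_len - amplicon_len, amplicon_len)


MIN_OVERLAP = 30  # assumed merge threshold, low end of the 30-50 bp range

# Hypothetical amplicon lengths (primers included), not taken from any panel.
for amplicon_len in (180, 250, 270, 320):
    overlap = expected_overlap(amplicon_len, read_len=150)
    status = "mergeable" if overlap >= MIN_OVERLAP else "not mergeable"
    print(f"{amplicon_len} bp amplicon, 2x150: overlap {overlap} bp ({status})")
```

Running the same loop with `read_len=250` shows how much margin the longer kit would add for the longest amplicons in a panel.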

Chimeric non-overlapping amplicons are hard to detect. If you cluster your pairs, and then find pairs in which the two reads map to different clusters, then you could assume that those pairs are chimeric. But the sensitivity and specificity depend completely on the quality of clustering.
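
A minimal sketch of that idea, assuming some upstream clustering or mapping step (not shown, and not any particular tool) has already assigned each read of a pair to a cluster. The function name, tuple layout, and example IDs are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (pair_id, cluster of read 1, cluster of read 2); None means unassigned.
PairAssignment = Tuple[str, Optional[str], Optional[str]]


def flag_putative_chimeras(pairs: Iterable[PairAssignment]) -> List[str]:
    """Return IDs of pairs whose two reads fall in different clusters."""
    flagged = []
    for pair_id, r1_cluster, r2_cluster in pairs:
        # Skip pairs with an unassigned read: there is no evidence either way.
        if r1_cluster is None or r2_cluster is None:
            continue
        if r1_cluster != r2_cluster:
            flagged.append(pair_id)
    return flagged


# Hypothetical assignments for illustration only.
example = [
    ("pair1", "ampliconA", "ampliconA"),
    ("pair2", "ampliconA", "ampliconB"),
    ("pair3", None, "ampliconB"),
]
print(flag_putative_chimeras(example))
```

A pair is only as trustworthy as its cluster assignments, so a flagged pair is a candidate chimera rather than a confirmed one.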
Old 10-15-2015, 08:09 PM   #4
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 35

Thank you Brian, that makes sense.
Old 10-16-2015, 08:55 AM   #5
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 188

The base quality will be significantly improved if you fully overlap your reads for amplicon sequencing. See http://aem.asm.org/content/79/17/511...7-5d313d15b9a5 for a comparison of the sequence quality for different lengths of 16S sequencing.

If you're interested in a very small number of base differences, you absolutely need to fully overlap. If you need 5-10% differences, maybe you could get away with not fully overlapping. But the cost difference between the 2 kits is only a couple of hundred dollars, while your downstream computation time will be much greater with marginal sequences, which will likely cost thousands rather than hundreds.

ETA: my cost estimate for computational time is based on 16S. I've never dealt with any of the cancer panels, so I don't know how significantly poor-quality bases would impact your results.

Last edited by thermophile; 10-16-2015 at 08:57 AM.
Old 10-16-2015, 10:06 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,577

Quote:
Originally Posted by thermophile
The base quality will be significantly improved if you fully overlap your reads for amplicon sequencing. See http://aem.asm.org/content/79/17/511...7-5d313d15b9a5 for a comparison of the sequence quality for different lengths of 16S sequencing.

If you're interested in a very small number of base differences, you absolutely need to fully overlap. If you need 5-10% differences, maybe you could get away with not fully overlapping. But the cost difference between the 2 kits is only a couple of hundred dollars, while your downstream computation time will be much greater with marginal sequences, which will likely cost thousands rather than hundreds.

ETA: my cost estimate for computational time is based on 16S. I've never dealt with any of the cancer panels, so I don't know how significantly poor-quality bases would impact your results.

I disagree. First off, I'm not sure what computation time you are talking about. How are you incurring thousands of dollars of compute costs from this kind of data?

Second, the data that paper used was low quality and not indicative of what I would expect from a properly designed MiSeq 2x250 amplicon run, using staggered primers and an appropriate amount of spike-in, etc.

Third, errors due to incorrect merges and errors in the reads themselves are conflated; since the former are due to the specific software used for overlapping, and are also a function of the overlap length, you can't really draw a conclusion about the error rates of overlapping reads using any methodology but the one described in the paper. Unfortunately, it's not described in the paper - rather, they sort of hint that it's described here, where I guess it occurs in the make.contigs command. I have not tested that, but would be very surprised if it was the best available tool for the purpose.

Fourth, 2x150 reads have a much lower error rate than 2x250 reads. If they overlap by 50bp, then the only nonoverlapping portion is the first and last 100bp, which have around a peak 0.2% error rate for R1 and 0.5% error rate for R2 (average is lower), including all reads with no quality-filtering. Those are on HiSeq; MiSeq error rates are generally lower.

Longer reads and longer overlaps are better, of course. But 2x150 is viable as long as there is sufficient overlap to merge, and you can tolerate a fraction of a percent error rate in the non-double-sequenced portion.
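
To put numbers on the 2x150 case, here is a back-of-the-envelope sketch using the peak rates quoted above (0.2% for R1, 0.5% for R2) and a 50 bp overlap. The function name is made up for illustration, and since these are peak rather than average rates and errors are not uniform along a read, the result is a pessimistic estimate rather than a measurement.

```python
def expected_single_covered_errors(read_len: int, overlap: int,
                                   r1_rate: float, r2_rate: float) -> float:
    """Expected error count in the bases covered by only one read.

    Assumes a flat per-base error rate for each read, which is a
    simplification; real error rates rise toward the 3' end.
    """
    if not 0 <= overlap <= read_len:
        raise ValueError("overlap must be between 0 and read_len")
    single = read_len - overlap  # bases seen by R1 only (same count for R2)
    return single * r1_rate + single * r2_rate


read_len, overlap = 150, 50
merged_len = 2 * read_len - overlap
errors = expected_single_covered_errors(read_len, overlap, 0.002, 0.005)
print(f"merged length: {merged_len} bp")
print(f"expected errors in single-covered bases: {errors:.2f} per merged read")
```

The per-base rate is the background that a low-frequency variant call at any given position has to stand out from in the non-double-sequenced portion.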
Old 10-16-2015, 12:15 PM   #7
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 188

If you are sequencing amplicons for 16S, you need to cluster the sequences into OTUs. The more sequencing errors you have, the more spurious OTUs you generate, which massively increases the memory required to cluster them (assuming you are doing de novo clustering). Once they are clustered, you will either waste a lot of time trying to find meaning in the sequencing noise, or you can throw out all of the rare OTUs, which means throwing out good data along with the bad because you can't tell the good rare OTUs from the bad ones. Ecologically this matters; for a cancer panel, maybe it doesn't.
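
A rough illustration of why the error rate matters so much here: under the simplifying assumption of independent errors at a uniform per-base rate (real data does not behave this way), the fraction of reads carrying at least one error grows quickly with the rate. The rates and the 250 bp length below are hypothetical values chosen for illustration, not measurements.

```python
def fraction_with_errors(per_base_error: float, length: int) -> float:
    """Probability that a read of the given length has at least one error.

    Assumes independent errors at a uniform per-base rate.
    """
    return 1.0 - (1.0 - per_base_error) ** length


# Hypothetical per-base error rates for a 250 bp merged sequence.
for rate in (0.001, 0.005, 0.01):
    frac = fraction_with_errors(rate, 250)
    print(f"per-base error {rate:.3f}: {frac:.1%} of reads have >= 1 error")
```

Every erroneous read is a potential new unique sequence going into de novo clustering, which is where the memory and the spurious rare OTUs come from.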
Tags
overlap, targeted amplicons, thunderbolts

All times are GMT -8.