SEQanswers

Old 02-14-2017, 01:21 PM   #1
nouse
Member
 
Location: Germany

Join Date: Sep 2013
Posts: 10
What is the official "first sequenced position" (for overlap calculation)?

The question emerged after reviewing this older application note by Illumina:

https://www.illumina.com/content/dam..._miseq_16S.pdf

Their figure 1 indicates that the first position which is counted towards the targeted read length is the first base following the 3' end of the gene-specific primer.

Hence, the 515F-806R system targeting the V4 region of the 16S rRNA gene would be perfectly usable with a 2 x 150 bp MiSeq run, because of a 46 bp overlap within the 253 bp fragment covered between both gene-specific primers.

However, if one assumes that the first base counting toward the target read length is the base following the sequencing primer, that obviously changes. If both gene-specific primers are counted in, the overlap is reduced to <10 bases, and those fall in the predictably worst part of both reads, quality-wise. Adding barcode(s) would even result in no overlap.

I hope this example clarifies the question.

The underlying task is to find a primer pair that is feasible with MiSeq 2 x 250 with very good coverage and HiSeq 2 x 150 with lower coverage but higher yields.

I doubt that 2 x 150 bp HiSeq is a good system for this. However, it seems that it's OK to just use the forward read (according to Caporaso et al. 2011). What do you think?

Last edited by nouse; 02-14-2017 at 01:26 PM.
Old 02-14-2017, 04:38 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,454

For amplicons, my understanding is (and I would appreciate if someone closer to the wet-lab could confirm or deny this) that the molecules being sequenced are laid out like this:

[adapter1][barcode1][more adapter1][sequencing primer1][pcr primer1] actual genomic sequence [pcr primer2][sequencing primer2][more adapter2][barcode2][adapter2]

The sequencing primers are not (usually) part of the read, unless you are using staggered variable-length primers to increase library diversity, but in that case only a few bp of it get sequenced. The PCR primers are always part of the read. I think that whether the PCR primers are genomic or synthetic depends on the process; I've never really gotten a conclusive answer on that.
Old 02-15-2017, 12:04 AM   #3
nouse
Member
 
Location: Germany

Join Date: Sep 2013
Posts: 10

Thanks for your answer.
From my experience with the HiSeq, the raw sequences I got included barcodes and PCR primers (which makes sense, since they have been sequenced after all).

This indicates that figure 1 of the Illumina application note is either misleading or wrong, or that they used their PCR primer regions as the target for another sequencing round.
Old 02-15-2017, 01:39 PM   #4
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 907

Brian’s explanation showing the amplicon library structure is correct. However, I would add that if there are variable-length diversity nucleotides or barcodes at the 5’ end of either PCR primer, they will be sequenced as well, along with the PCR primers. If someone uses custom sequencing primers that bind to the PCR primers, then the PCR primers will not be sequenced (and sequencing primer sites will not need to be included in the adapter design). In this case, diversity nucleotides added to the 5’ end of the primers will not be useful because they cannot be sequenced.

Fig. 1 in Illumina’s note indicates that the hypervariable region is 254 bp and that the minimum length of the amplified region, including the conserved 5’ and 3’ flanking regions used for priming, is 291 bp. A 2 x 150 run (300 bases in total) therefore leaves only about 9 bp of overlap on the full amplicon; it cannot provide a 46 bp overlap unless custom primers were used for sequencing. Since the figure indicates that standard Illumina sequencing primers were used, the figure is incorrect.
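
The overlap arithmetic above can be checked with a minimal Python sketch. It only uses the two lengths quoted from the application note (254 bp and 291 bp); the function name `overlap` is made up for illustration, and it assumes both reads run their full nominal length with no trimming.

```python
def overlap(read_length, insert_length):
    """Bases covered by both reads of a pair.

    A negative value means a gap between the reads. If the sequenced
    stretch is shorter than one read, the overlap is capped at its length.
    """
    return min(insert_length, 2 * read_length - insert_length)


HYPERVARIABLE_BP = 254  # region between the PCR primers (figure quoted above)
WITH_PRIMERS_BP = 291   # same region plus both primer-binding sites

for read_length in (150, 250):
    print(
        f"2 x {read_length}: "
        f"{overlap(read_length, HYPERVARIABLE_BP)} bp overlap without primers, "
        f"{overlap(read_length, WITH_PRIMERS_BP)} bp with primers in the read"
    )
```

Any barcode or diversity bases sequenced ahead of the PCR primers would have to be added to the insert length, which is how the overlap can drop to zero or below on a 2 x 150 run.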
Old 02-16-2017, 05:37 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,120

Quote:
Originally Posted by Brian Bushnell View Post
For amplicons, my understanding is (and I would appreciate if someone closer to the wet-lab could confirm or deny this) that the molecules being sequenced are laid out like this:

[adapter1][barcode1][more adapter1][sequencing primer1][pcr primer1] actual genomic sequence [pcr primer2][sequencing primer2][more adapter2][barcode2][adapter2]

The sequencing primers are not (usually) part of the read, unless you are using staggered variable-length primers to increase library diversity, but in that case only a few bp of it get sequenced. The PCR primers are always part of the read. I think that whether the PCR primers are genomic or synthetic depends on the process; I've never really gotten a conclusive answer on that.
It is not always the case that the PCR primers are part of the read. In the two most cited 16S-V4 protocols (Caporaso & Knight, Kozich & Schloss), custom sequencing primers that match the target-specific PCR primers are added to the MiSeq run. This results in read data that starts immediately after the 3' ends of the PCR primers, so there is no PCR primer sequence to trim from your reads, and hence no wasted sequence.

Caporaso, J. G., Lauber, C. L., Walters, W. A., Berg-Lyons, D., Lozupone, C. A., Turnbaugh, P. J., et al. (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences, 108 Suppl 1, 4516–4522. http://doi.org/10.1073/pnas.1000080107

Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K., & Schloss, P. D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology, 79(17), 5112–5120. http://doi.org/10.1128/AEM.01043-13
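
To make the difference between the two setups concrete, here is a rough Python sketch of which parts of the construct end up in read 1, depending on where the sequencing primer anneals. The segment order is copied from the layout quoted above; every length except the 254 bp target is a placeholder, not a real oligo length, and `read1_content` is a hypothetical helper, not part of any tool.

```python
# Segment order as in the layout quoted above. All lengths except the
# 254 bp target region are placeholders, not real oligo lengths.
CONSTRUCT = [
    ("adapter1", 20),
    ("barcode1", 8),
    ("more adapter1", 10),
    ("sequencing primer1", 30),
    ("pcr primer1", 20),
    ("genomic", 254),
    ("pcr primer2", 20),
    ("sequencing primer2", 30),
    ("more adapter2", 10),
    ("barcode2", 8),
    ("adapter2", 20),
]


def read1_content(construct, primer_site, read_length):
    """List (segment, bases) pairs covered by a read that starts at the
    first base after the segment the sequencing primer anneals to."""
    names = [name for name, _ in construct]
    start = names.index(primer_site) + 1
    remaining = read_length
    covered = []
    for name, length in construct[start:]:
        if remaining <= 0:
            break
        take = min(length, remaining)
        covered.append((name, take))
        remaining -= take
    return covered


# Standard Illumina sequencing primer: the PCR primer is read first.
print(read1_content(CONSTRUCT, "sequencing primer1", 150))
# Custom sequencing primer matching the PCR primer: the read starts in the target.
print(read1_content(CONSTRUCT, "pcr primer1", 150))
```

With the placeholder lengths used here, the standard setup spends the first 20 cycles on the PCR primer, while the custom-primer setup spends all 150 cycles on the target region.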
Old 02-16-2017, 06:09 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,454

Quote:
Originally Posted by kmcarr View Post
In the two most cited 16S-V4 protocols (Caporaso & Knight, Kozich & Schloss) custom sequencing primers which match the target specific PCR primer are added to the MiSeq run. This results in read data which starts immediately after the 3' ends of the PCR primers so there is no PCR primer sequence to trim from your reads, and hence no wasted sequence.
Oh, that's clever. I wonder if that caused some compromises that limit the diversity of organisms that will amplify? I guess I need to read the papers.