**pmiguel** · 05-29-2014, 07:27 AM

Originally posted by dtm2451

Hello,

I am working on a deep sequencing protocol for a PCR amplicon (~650bp) using the TruSeq DNA PCR-Free Sample Preparation Kit and I am seeing extra peaks in my final bioanalyzer traces that concern me because I don't know what they might come from.

Peaks on the trace:
+Small peak at size of original insert
+Medium-sized peak that I think might correspond to insert+1adapter
+Large peak that I think might correspond to insert+2 adapters
-mini peak to the right of the "insert" peak
-mini peak to the right of the "insert+1adapter" peak
-mini peak to the right of the "insert+2adapters" peak

Does anyone know what the last three peaks might be?

I am attaching both the TapeStation on the initial PCR material and the bioanalyzer on the final library.

Thanks!
Dan

Hi Dan,

Illumina adapters are each about 60 bases long.

But you may be right. Illumina kit adapters are "Y"-adapters with only about 10 bp of double-stranded DNA and the rest (~50 bases) as two single-stranded tails. Single-stranded molecules tend to migrate more slowly on Agilent chips than double-stranded molecules of the corresponding length. So those Y-adapters may be introducing some drag.

But that would be a lot of drag from a few hundred bases of double ssDNA. Seems unlikely to me.

Another hypothesis would be that the "1539 bp" fragment is migrating slowly because the ligase is still attached to the amplicon.

Another possibility (this is the one I like) is that the "901 bp" fragment has both adapters ligated and is running only a little larger than expected because of the Y-adapters: 656+120=776 bp double-stranded length. The 1539 bp fragment would have a double insert, 656+656+120=1432 bp, which would fit pretty well. That would suggest the A-tailing step did not work well -- it left a substantial percentage of the ends blunt.
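The size arithmetic behind this hypothesis can be sketched as a quick check. The snippet below only uses the numbers quoted in the post (656 bp amplicon, roughly 60 bases per adapter, observed peaks at 901 and 1539 bp); the function name `ligated_length` is purely illustrative.

```python
# Values taken from the post: ~60 bases per adapter, 656 bp amplicon.
ADAPTER_BP = 60
INSERT_BP = 656


def ligated_length(n_inserts: int, n_adapters: int = 2) -> int:
    """Expected double-stranded length for n_inserts joined end to end plus adapters."""
    return n_inserts * INSERT_BP + n_adapters * ADAPTER_BP


# Observed Bioanalyzer sizes quoted in the post, paired with the hypothesised product.
observed = {
    "single insert + 2 adapters": (901, ligated_length(1)),
    "double insert + 2 adapters": (1539, ligated_length(2)),
}

for label, (seen_bp, expected_bp) in observed.items():
    # Difference between where the peak runs and where the arithmetic puts it.
    print(f"{label}: expected {expected_bp} bp, observed {seen_bp} bp "
          f"(runs {seen_bp - expected_bp} bp large)")
```

Both peaks come out a bit over 100 bp larger than the calculated length, which is the kind of consistent offset one might expect if the Y-adapter tails add a similar amount of drag to each species.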

You posted this question long ago, so you could probably update us on your results. But if you used this library, it probably worked okay. For reasons unclear to me, short amplicons seem to cluster vastly better than longer amplicons. So your data set would remain fairly free from chimeras.

--
Phillip

**nucacidhunter** · 05-29-2014, 04:37 PM

For reasons unclear to me, short amplicons seem to cluster vastly better than longer amplicons.

I think preferential clustering of smaller fragments (amplicons) can be explained by the bridge amplification process. Library templates are denatured and mixed with hybridisation buffer. At this stage we expect that denatured fragments will stay single-stranded and stretched, free of secondary structure. If strands hybridise back to their complementary strands at this step, it will thermodynamically favour smaller fragments, which reduces the number of small fragments, not long ones, available for the next step. In the next step, denatured fragments are pumped through the flow cell lane and every fragment will have a relatively equal chance of hybridising to the oligo lawn on the flow cell surface because they (should) have the complementary adapter sequences. After hybridisation, a bridge is formed and amplification mix is pumped through to synthesise the complementary strand of the bridged fragments. Extension time is quite limited (15 sec in MiSeq) and at this step large fragments are less likely to have end-to-end synthesis of complementary strands because the polymerase pauses or drops off. The result would be that those fragments will not be amplified in the next round of amplification (I think there are 30 cycles or so) and will form weak clusters (if any) with low strand numbers, depending on the cycle in which this happens. In contrast, small fragments will have a high chance of end-to-end complementary strand synthesis and therefore will dominate the properly formed clusters, which produce strong signals for RTA to detect and pass.

**pmiguel** · 05-30-2014, 07:00 AM

Originally posted by nucacidhunter

I think preferential clustering of smaller fragments (amplicons) can be explained by the bridge amplification process. Library templates are denatured and mixed with hybridisation buffer. At this stage we expect that denatured fragments will stay single-stranded and stretched, free of secondary structure. If strands hybridise back to their complementary strands at this step, it will thermodynamically favour smaller fragments, which reduces the number of small fragments, not long ones, available for the next step. In the next step, denatured fragments are pumped through the flow cell lane and every fragment will have a relatively equal chance of hybridising to the oligo lawn on the flow cell surface because they (should) have the complementary adapter sequences. After hybridisation, a bridge is formed and amplification mix is pumped through to synthesise the complementary strand of the bridged fragments. Extension time is quite limited (15 sec in MiSeq) and at this step large fragments are less likely to have end-to-end synthesis of complementary strands because the polymerase pauses or drops off. The result would be that those fragments will not be amplified in the next round of amplification (I think there are 30 cycles or so) and will form weak clusters (if any) with low strand numbers, depending on the cycle in which this happens. In contrast, small fragments will have a high chance of end-to-end complementary strand synthesis and therefore will dominate the properly formed clusters, which produce strong signals for RTA to detect and pass.

Yeah, sounds reasonable, but it does not jibe with results I have seen. We have run libraries with insert sizes averaging as high as 1.1kb. (Estimated by mapping pairs back to a reference after sequencing.) qPCR accurately predicted the expected cluster density, which it would not have if there had been a high level of failure by the polymerase to complete product strand synthesis.

So, whatever the explanation, it needs to account for this. Which leads one to think there must be some sort of competition among amplicons. Something that would allow the shorter amplicons to displace the longer ones and prevent them from creating clusters. That way, if the shorter amplicons are removed, the longer amplicons can form good clusters. However I can't think of a reasonable mechanism of competition. So maybe something else is going on?

--
Phillip

**nucacidhunter** · 05-30-2014, 01:03 PM

We have run libraries with insert sizes averaging as high as 1.1kb.

I wonder if it was an amplicon library comprising the same or similarly large-sized fragments, or a library with a wide distribution of fragment sizes. I have sequenced such large fragments (amplicons) as well (on MiSeq), and I load 1.5x more than for usual libraries to compensate for failed clusters. If all fragments are large, partially amplified clusters will be picked up by RTA and pass filter because RTA compensates for low signal intensity when most clusters have low intensities. However, in a library with a wide size distribution, clusters from small fragments will have higher intensities and as a result low-intensity clusters from large fragments will not pass. In widely distributed Nextera libraries with an average size of 800 bp I get fragments of 950 bp, but they are a small portion of the sequences. All this indicates partial failure of large fragments’ clusters.

So, whatever the explanation, it needs to account for this. Which leads one to think there must be some sort of competition among amplicons. Something that would allow the shorter amplicons to displace the longer ones and prevent them from creating clusters.

So, maybe it is not that small fragments are competing and displacing large ones (physically less feasible) but that RTA operation favours the high-intensity clusters from small fragments. I wonder whether more large fragments would be sequenced if someone changed the recipes to allow longer extension times during cluster generation.

**pmiguel** · 06-02-2014, 03:37 AM

Originally posted by nucacidhunter

I wonder if it was an amplicon library comprising the same or similarly large-sized fragments, or a library with a wide distribution of fragment sizes. I have sequenced such large fragments (amplicons) as well (on MiSeq), and I load 1.5x more than for usual libraries to compensate for failed clusters. If all fragments are large, partially amplified clusters will be picked up by RTA and pass filter because RTA compensates for low signal intensity when most clusters have low intensities. However, in a library with a wide size distribution, clusters from small fragments will have higher intensities and as a result low-intensity clusters from large fragments will not pass. In widely distributed Nextera libraries with an average size of 800 bp I get fragments of 950 bp, but they are a small portion of the sequences. All this indicates partial failure of large fragments’ clusters.

No, I meant "amplicon" in the general sense. This was a genomic DNA library.
Not sure what is happening with your actual amplicon libraries. But in our case, we just used the concentration based on qPCR and nailed the cluster density.

Originally posted by nucacidhunter

So, maybe it is not that small fragments are competing and displacing large ones (physically less feasible) but that RTA operation favours the high-intensity clusters from small fragments. I wonder whether more large fragments would be sequenced if someone changed the recipes to allow longer extension times during cluster generation.

Nope, as I write above, this does not fit with the actual results we got.

--
Phillip

**nucacidhunter** · 06-02-2014, 05:37 AM

Nope, as I write above, this does not fit with the actual results we got.

I have tried to explain my observation logically, based on the science behind the Illumina sequencing system, and I do not have any scientific evidence that small and large fragments are involved in flow cell battles for getting sequenced. All sizes of fragments can attach to the flow cell lawn, and during clustering small fragments will amplify and form denser clusters than large fragments. During template generation RTA passes those pure clusters (resulting from a single template strand) with higher intensity, which would mostly be clusters from smaller fragments. RTA normalises intensities relative to all clusters, so in a library with large fragments it will pass the ones that have higher intensity too. Looking at thumbnail photos also shows a lot of background clusters which RTA has not picked up for various reasons. I do respect everyone's opinion and I am very interested to see scientific explanations.

**pmiguel** · 06-02-2014, 06:45 AM

Originally posted by nucacidhunter

I have tried to explain my observation logically, based on the science behind the Illumina sequencing system, and I do not have any scientific evidence that small and large fragments are involved in flow cell battles for getting sequenced. All sizes of fragments can attach to the flow cell lawn, and during clustering small fragments will amplify and form denser clusters than large fragments. During template generation RTA passes those pure clusters (resulting from a single template strand) with higher intensity, which would mostly be clusters from smaller fragments. RTA normalises intensities relative to all clusters, so in a library with large fragments it will pass the ones that have higher intensity too. Looking at thumbnail photos also shows a lot of background clusters which RTA has not picked up for various reasons. I do respect everyone's opinion and I am very interested to see scientific explanations.

"Scientific"? What you describe above is a hypothetical. And, yes, I have had Illumina reps tell me the same story. But it does not fit with what I have seen, so I think they are wrong.

You write:

Looking at thumbnail photos also shows a lot of background clusters which RTA has not picked up for various reasons.

What is your basis for writing this? I can look at a thumbnail photo, but how do I tell which clusters "RTA has not picked up for various reasons"?

--
Phillip

**pmiguel** · 06-02-2014, 07:09 AM

By the way, some details of the MiSeq run in question are in this thread.

--
Phillip

**nucacidhunter** · 06-02-2014, 10:33 PM

And, yes, I have had Illumina reps tell me the same story. But it does not fit with what I have seen, so I think they are wrong.

I have not asked Illumina FAS about this; it is my own observation and explanation. Everyone is entitled to their opinion and as I said above I respect that.

What is your basis for writing this? I can look at a thumbnail photo, but how do I tell which clusters "RTA has not picked up for various reasons"?

The attached document shows one way to see what I have mentioned. It does not tell which clusters have been picked up by RTA, but as mentioned, it shows clusters that RTA has not picked up.

Attached Files

Go to imaging tab in SAV.pdf (638.9 KB, 229 views)

**pmiguel** · 06-04-2014, 04:56 AM

Originally posted by nucacidhunter

I have not asked Illumina FAS about this; it is my own observation and explanation. Everyone is entitled to their opinion and as I said above I respect that.

The attached document shows one way to see what I have mentioned. It does not tell which clusters have been picked up by RTA, but as mentioned, it shows clusters that RTA has not picked up.

Hey nucacidhunter, we are just having a discussion, right? I ask, because you keep mentioning "respect". Like my disagreeing with you might be a sign of disrespect for you. It isn't. I disagree with people I respect all the time. That, to a first approximation, is the most important element of the scientific method.

Here, as a sign of my respect, is a more fleshed-out explanation of why I think what I will call the "RTA-mediated" hypothesis of why shorter amplicons of a given library predominate in Illumina data sets is not sufficient to explain the actual phenomenon.

I have a particular set of data that makes me think this RTA-mediated hypothesis is not sufficient to explain what is going on. Here is a link to the full thread. But to summarize, we made a "large insert" TruSeq DNA library but used extra/more stringent Ampures to remove shorter fragments. Did 4 cycles of PCR on it (instead of the protocol's recommended 10) and clustered at 4pM (rather than what was normal on the MiSeq at the time -- 8pM).

Here is an Agilent chip of the library we clustered:

Again, we nailed the cluster density using our normal KAPA qPCR calculation using, if I recall correctly, the modal peak size depicted above (1892bp) in the calculation specified in the KAPA kit manual. That would include 120bp of adapters, so think of the inserts as being a modal size of 1772bp, or a little less due to some distortion from DNA mass being assayed by the Agilent chip rather than DNA count.
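For readers who have not done this calculation, here is a minimal sketch of a size-adjusted qPCR concentration and of the insert-size subtraction described above. It assumes the usual form of the correction (scale the measured concentration by standard length over library length). The 452 bp standard length is an assumption (another post in this thread describes the standards as ~400 bp), and the 100 pM input is a made-up number used only to show the effect. The function names are hypothetical.

```python
def size_adjusted_conc(qpcr_conc_pm: float,
                       library_bp: float,
                       standard_bp: float = 452.0,   # assumed standard length
                       dilution_factor: float = 1.0) -> float:
    """Scale a qPCR-derived concentration by the standard/library length ratio."""
    return qpcr_conc_pm * (standard_bp / library_bp) * dilution_factor


def insert_size(library_bp: int, adapters_bp: int = 120) -> int:
    """Library fragment length minus total adapter length (120 bp in the post)."""
    return library_bp - adapters_bp


modal_library_bp = 1892                      # modal peak size quoted in the post
print(insert_size(modal_library_bp))         # modal insert size in bp

# Hypothetical raw qPCR reading of 100 pM, purely for illustration.
print(round(size_adjusted_conc(100.0, modal_library_bp), 1))
```

The point of the correction is that a library much longer than the standard has fewer molecules per unit of amplified mass, so the adjusted molar concentration is several-fold lower than the raw reading.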

However, the result of the run, when mapped back to a reference genome with BWA, produced paired-end insert lengths as depicted here:

Okay, one might argue that the lower graph is on a linear scale and represents counts of DNA molecules whereas the top graph is mass based and displayed in the more-or-less log-linear scale that one typically sees from electrophoresis. Again, in the previous thread, I exported the data from the Agilent chip and transformed it so it would be on the same scale as the lower chart so they could be directly compared:
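The exact transformation used in the earlier thread is not given here. A common way to put a mass-based electropherogram on a molecule-count basis is to divide the signal in each size bin by the fragment length for that bin and then renormalise. The sketch below does only that; the function name and the three-bin example are hypothetical and are not data from the run.

```python
def mass_to_molar_fractions(sizes_bp, mass_signal):
    """Convert per-bin mass signal to relative molecule counts.

    A fragment of length L contributes mass proportional to L, so the
    number of molecules in a bin is proportional to mass / L.
    """
    if len(sizes_bp) != len(mass_signal):
        raise ValueError("sizes_bp and mass_signal must have the same length")
    counts = [mass / size for size, mass in zip(sizes_bp, mass_signal)]
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical trace: equal mass at 500, 1000 and 2000 bp.
sizes = [500, 1000, 2000]
mass = [1.0, 1.0, 1.0]
for size, frac in zip(sizes, mass_to_molar_fractions(sizes, mass)):
    print(f"{size} bp: {frac:.3f} of molecules")
```

Equal mass across bins translates into many more short molecules than long ones, which is why the mass-based modal size overstates the typical molecule length somewhat.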

So, it still comports fairly well with the earlier statement "modal size of 1772bp", or a little less. Certainly no lower than 1600bp.

Keeping that in mind, the RTA-mediated hypothesis fails to explain our hitting cluster density exactly while shifting the size distribution of what was sequenced lower by about 500 bp. It fails because, were that the case, the loss of clusters from 1.1 to 1.6 kb and above should have decreased the total number of read pairs. That is, these longer amplicon clusters should have been there physically, but just not detected by RTA. So our effective (RTA-calculated) cluster density should have been much lower than what we calculated using qPCR. But it wasn't.

I don't actually think that the short amplicons are displacing the longer ones from the flowcell during clustering. I think something else, something unknown, is going on. Okay, that is supposition also. But I need some mechanism that allows qPCR to accurately quantitate cluster density for a pool of long amplicons -- that is what I see.

As with all physical phenomena there are plenty of explanations that might explain what I describe above. But I see no reason at all to favor the RTA-mediated explanation for which, other than unsubstantiated claims from Illumina, there is no evidence.

See what I am saying here? The RTA-mediated explanation is just a story. May have been invented whole-cloth by someone at Illumina and came to be propagated as dogma without any particular evidence. Stuff like that happens all the time. Just because it is superficially reasonable, doesn't mean it is true.

--
Phillip

**nucacidhunter** · 06-04-2014, 05:51 AM

But I need some mechanism that allows qPCR to accurately quantitate cluster density for a pool of long amplicons -- that is what I see.

To continue the discussion, I would like to question the quantification of a large-insert library (in this case 1-5 kb) by qPCR. I doubt that such large fragments will amplify efficiently enough during qPCR to enable accurate quantification unless extension time is increased to 4-5 min, and even then I am not sure the KAPA polymerase used for qPCR would be capable of such relatively long-range PCR. The default KAPA cycling program is sufficient for amplifying up to 1 kb, so under those conditions it will only amplify a portion of the library (fragments <1 kb). The result will be underestimation of library concentration because only a portion of it is quantified. Another issue would be the difference in amplification efficiency between the standards (~400 bp) and the library. I have verified this by running qPCR product on a DNA chip and comparing its size to the input amplicon size in large-insert libraries. I have found that the amplicon peak of the qPCR product is at a substantially smaller size than the actual input DNA.

But I see no reason at all to favor the RTA-mediated explanation for which, other than unsubstantiated claims from Illumina, there is no evidence.

I have seen this many times and have heard from others about it as well. But I never knew that Illumina explains the observation similar to what I independently came up with.

**pmiguel** · 06-04-2014, 06:10 AM

Originally posted by nucacidhunter

I have seen this many times and have heard from others about it as well. But I never knew that Illumina explains the observation similar to what I independently came up with.

You say you have "seen it". This implies you have observed something. But SAV doesn't actually depict a cluster that has been recognized by RTA any differently from one that hasn't. Nor does SAV offer any way to verify the hypothesis that short amplicons produce brighter or more robust clusters than long amplicons.

--
Phillip

**pmiguel** · 06-04-2014, 08:01 AM

Originally posted by nucacidhunter

The result will be underestimation of library concentration because only a portion of it is quantified. Another issue would be the difference in amplification efficiency between the standards (~400 bp) and the library. I have verified this by running qPCR product on a DNA chip and comparing its size to the input amplicon size in large-insert libraries. I have found that the amplicon peak of the qPCR product is at a substantially smaller size than the actual input DNA.

Okay, let's see the before and after (q)PCR DNA Chips.

Also, wasn't it necessary to remove the SYBR green, etc. from the qPCR reaction prior to running the chip? What method did you use?

--
Phillip

**nucacidhunter** · 06-06-2014, 03:09 AM

The question is why short amplicons sequence better than large ones. The evidence is that when a library with a broad size distribution is sequenced, after mapping reads, one finds that the average or peak size of mapped fragments is smaller than that of the input library, indicating preferential sequencing of smaller library fragments. This was not important in earlier days when libraries were size-selected in a narrow range and multiplexing was not very widespread. But since the introduction of gel-free library prep kits (bead-based size selection resulting in libraries with a wide distribution of fragment sizes), the widespread use of transposon-mediated library preps with broad size distributions, and the increased output of platforms, it has become more important. When pooling libraries with different insert sizes for sequencing, this should be taken into account to obtain the desired proportion of reads from each library.

My answer, as suggested in this thread, is the “RTA-mediated hypothesis”. Short fragments are more efficient at forming clusters because during bridge amplification the polymerase is more likely to synthesize a full complementary strand (end to end) for short fragments than for large ones, due to the limited extension time (15 sec in MiSeq). During template generation (the early 4-5 cycles) RTA uses signal intensities from images and calls bases from normalised intensities (taking colour cross-talk and phasing correction into account). Raw data are filtered to remove reads that do not meet the signal purity threshold, as well as overlapping and low-intensity clusters. At this step, in a population of small- and large-fragment clusters, the small ones would have higher intensity (intensity is proportional to strand number, which results from amplification efficiency) and therefore are preferentially detected and their bases called. Large fragments, because of less efficient amplification, will have lower intensity and would not be favoured by RTA. Of course, in a flow cell lane with larger fragments most of the clusters would have lower intensity compared to a lane with predominantly small fragments. But because RTA detection of clusters is relative (normalised intensity, not raw), they are still detected and bases are called.

The argument against this is evidence from sequencing a large-insert library (1-5 kb) in which the qPCR-predicted cluster density was achieved. I have two arguments against this. Firstly, the relationship between cluster density and library input is not linear. For example, if 12 pM input gives 800K clusters/mm² of a flow cell lane, 8 pM input will not result in 600K clusters/mm². Secondly, quantifying large fragments with KAPA qPCR is not accurate because the standards are 400 bp and their amplification efficiency would be higher than that of large fragments in the 1-5 kb range, as in this case. In addition, if extension time is not increased significantly, large fragments will drop out and only a small portion of the library will be amplified and quantified, resulting in underestimation of quantity. The attachment in this post shows ScreenTape profiles of the input library and the output of the qPCR reaction, showing preferential amplification of smaller fragments during PCR. The qPCR reactions were purified using 1.8x AMPure beads to remove salts, polymerase, SYBR and nucleotides.
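As a reference point for the non-linearity argument, it may help to see what strict proportionality would predict for the example numbers. The sketch below is only that baseline calculation; the function name is hypothetical, and it says nothing about what a real flow cell does.

```python
def proportional_density(ref_input_pm: float,
                         ref_density_k: float,
                         new_input_pm: float) -> float:
    """Cluster density (in thousands) predicted if density scaled linearly with input."""
    return ref_density_k * new_input_pm / ref_input_pm


# Example numbers from the post: 12 pM input giving 800K clusters.
baseline = proportional_density(12.0, 800.0, 8.0)
print(f"Linear prediction for 8 pM: {baseline:.0f}K clusters")
```

Any claim of non-linearity is then a claim that the observed density at 8 pM departs from this proportional figure.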

But SAV doesn't actually depict a cluster that has been recognized by RTA any differently from one that hasn't. Nor does SAV offer any way to verify the hypothesis that short amplicons produce brighter or more robust clusters than long amplicons.

I agree that SAV does not indicate cluster fragment size or which clusters have been selected or passed filter. The pictures I attached in the post above show that even though the image for the C channel shows brighter and more numerous clusters than the other bases, C has the fewest calls for that cycle, swath, tile and surface (an image of one spot in one cycle only, not an average), indicating that RTA has not picked up all possible clusters, as the RTA-mediated hypothesis predicts.

Attached Files

Input and qPCR output.pdf (147.9 KB, 180 views)

Extra peaks in a bioanalyzer trace for library made from ePCR amplicon
