06-04-2014, 04:56 AM   #11
Senior Member
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Originally Posted by nucacidhunter
I have not asked Illumina FAS about this and it is my own observation and explanation. Every one is entitled to their opinion and as I said above I respect that.

Attached document shows one way to see what I have mentioned. It does not tell which cluster has been picked up by RTA, but as mentioned, it shows clusters that RTA has not picked up.
Hey nucacidhunter, we are just having a discussion, right? I ask because you keep mentioning "respect", as if my disagreeing with you might be a sign of disrespect for you. It isn't. I disagree with people I respect all the time. That, to a first approximation, is the most important element of the scientific method.

Here, as a sign of my respect, is a more fleshed-out explanation of why I think what I will call the "RTA-mediated" hypothesis for why shorter amplicons of a given library predominate in Illumina data sets is not sufficient to explain the actual phenomenon.

I have a particular set of data that makes me think this RTA-mediated hypothesis is not sufficient to explain what is going on. Here is a link to the full thread. But to summarize, we made a "large insert" TruSeq DNA library but used extra, more stringent AMPure cleanups to remove shorter fragments. We did 4 cycles of PCR on it (instead of the protocol's recommended 10) and clustered at 4pM (rather than what was normal on the MiSeq at the time -- 8pM).

Here is an Agilent chip of the library we clustered:

Again, we nailed the cluster density using our normal KAPA qPCR calculation using, if I recall correctly, the modal peak size depicted above (1892bp) in the calculation specified in the KAPA kit manual. That would include 120bp of adapters, so think of the inserts as having a modal size of 1772bp, or a little less because the Agilent chip assays DNA mass rather than DNA count, which distorts the distribution somewhat.
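For anyone who wants to follow the arithmetic, here is a minimal sketch of the two numbers involved. The adapter subtraction is straight from the figures above. The size adjustment is only my reading of the usual qPCR approach: it assumes the kit's DNA standard is 452bp and that the qPCR result is scaled by the ratio of standard length to library length, so check the kit manual before relying on it. The function names and the example concentration are hypothetical.

```python
ADAPTER_BP = 120         # total adapter length, as stated in the post
QPCR_STANDARD_BP = 452   # ASSUMPTION: length of the kit's qPCR DNA standard


def insert_size(fragment_bp, adapter_bp=ADAPTER_BP):
    """Insert length once the adapters are subtracted from the fragment."""
    return fragment_bp - adapter_bp


def size_adjusted_conc(qpcr_conc_pm, library_bp, standard_bp=QPCR_STANDARD_BP):
    """Scale a qPCR concentration by standard length / library length.

    ASSUMPTION: this is the generic size adjustment used in qPCR library
    quantification, not a formula quoted from the thread.
    """
    return qpcr_conc_pm * standard_bp / library_bp


modal_insert_bp = insert_size(1892)
print(modal_insert_bp)

# Hypothetical raw qPCR reading, only to show how much the size term matters.
print(round(size_adjusted_conc(1000.0, 1892), 1))
```

The point of the sketch is that the library size entered into the calculation directly scales the loading concentration, so using 1892bp only gives the right cluster density if molecules of about that size are the ones forming clusters.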

However, the result of the run, when mapped back to a reference genome with BWA, produced paired-end insert lengths as depicted here:

Okay, one might argue that the lower graph is on a linear scale and represents counts of DNA molecules whereas the top graph is mass based and displayed in the more-or-less log-linear scale that one typically sees from electrophoresis. Again, in the previous thread, I exported the data from the Agilent chip and transformed it so it would be on the same scale as the lower chart so they could be directly compared:

So, it still comports fairly well with the earlier statement of a "modal size of 1772bp", or a little less. Certainly no lower than 1600bp.
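The mass-to-count transformation mentioned above can be sketched roughly as follows. This is not the actual script used on the Agilent export; it assumes the exported trace is a list of fragment sizes with a mass-proportional signal at each size, and that dividing the signal by the fragment length gives a value proportional to molecule count. The function names and the toy numbers are hypothetical.

```python
def mass_to_relative_counts(sizes_bp, mass_signal):
    """Convert a mass-weighted trace into relative molecule counts.

    A fragment's mass is proportional to its length, so signal / length is
    proportional to the number of molecules. Output is normalised to sum to 1.
    """
    counts = [m / s for s, m in zip(sizes_bp, mass_signal)]
    total = sum(counts)
    return [c / total for c in counts]


def modal_size(sizes_bp, weights):
    """Fragment size with the largest weight."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return sizes_bp[best]


# Toy trace: equal mass at 1000bp and 2000bp means twice as many
# 1000bp molecules as 2000bp molecules.
sizes = [1000, 2000]
mass = [1.0, 1.0]
rel = mass_to_relative_counts(sizes, mass)
print(rel, modal_size(sizes, rel))
```

Because shorter fragments carry less mass per molecule, the count-based mode always sits at or below the mass-based mode, which is why the modal insert size is "1772bp, or a little less".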

Keeping that in mind, the RTA-mediated hypothesis fails to explain our hitting cluster density exactly while shifting the size distribution of what was sequenced lower by about 500 bp. It fails because, were that the case, the loss of clusters from 1.1 to 1.6 kb and above should have decreased the total number of read pairs. That is, these longer amplicon clusters should have been there physically, but just not detected by RTA. So our effective (RTA-calculated) cluster density should have been much lower than what we calculated using qPCR. But it wasn't.
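To put that argument in numbers, here is a minimal sketch of what the RTA-mediated hypothesis would predict. It assumes a hard size cutoff above which clusters form but are not called, which is a simplification for illustration. The size bins, counts and cutoff below are made up and are not data from the run.

```python
def expected_detected_fraction(sizes_bp, counts, cutoff_bp):
    """Fraction of physically present clusters that would still be called
    if every cluster from a fragment longer than cutoff_bp were missed."""
    total = sum(counts)
    seen = sum(c for s, c in zip(sizes_bp, counts) if s <= cutoff_bp)
    return seen / total


# HYPOTHETICAL count-based library distribution (not measured data).
sizes = [800, 1200, 1600, 2000]
counts = [10, 30, 40, 20]

frac = expected_detected_fraction(sizes, counts, cutoff_bp=1200)
print(frac)
```

With these invented numbers only 40% of the clusters would be reported, so a run loaded by qPCR to hit a target density would come in far below target. Hitting the target density while the size distribution shifts down is the observation that hypothesis does not account for.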

I don't actually think that the short amplicons are displacing the longer ones from the flowcell during clustering. I think something else, something unknown, is going on. Okay, that is supposition also. But I need some mechanism that allows qPCR to accurately quantitate cluster density for a pool of long amplicons -- that is what I see.

As with all physical phenomena, there are plenty of explanations that might explain what I describe above. But I see no reason at all to favor the RTA-mediated explanation, for which, other than unsubstantiated claims from Illumina, there is no evidence.

See what I am saying here? The RTA-mediated explanation is just a story. It may have been invented out of whole cloth by someone at Illumina and come to be propagated as dogma without any particular evidence. Stuff like that happens all the time. Just because it is superficially reasonable doesn't mean it is true.

pmiguel