**cement_head** · 08-19-2016, 05:53 AM

I've been having horrible over-clustering issues with 16S and RNA-Seq (microbial) runs on our MiSeq.

The most recent was a 16S amplicon run (600 bp) with a 3.5 pM library plus a 25% spike-in of 12.5 pM PhiX on a 300v2 kit. I got a cluster density of >1400 K/mm2.

This has never happened at this [library] for 16S. Should I go as low as 1.8-2 pM?

**LVAndrews** · 08-19-2016, 09:51 AM

Your math is off. 12.5 pM PhiX on a v2 kit should yield a density from PhiX alone in the 850-1000 K/mm2 range. Your total loading concentration, if your quantification is correct, will have been 16 pM (78% PhiX), which is much too high for a low-diversity pool, especially on a v2 kit.

First, make sure you are using qPCR for quantification. PicoGreen quants tend to underestimate concentration, especially when you have an amplified pool (amplicons), and this leads to overclustering. For 16S v4 (515-806) I do an 11 pM total load on v2 kits with 30% PhiX (3.3 pM PhiX, 7.7 pM library). If your qPCR has crappy efficiency (it must be within 0.95-1.05), you will need to do it over before loading. It is also a good idea to run a gel of the qPCR products before pulling the sequencing kit from the freezer, to make sure the qPCR didn't pick up any small artifacts the Bioanalyzer might miss. These can also cause overclustering and/or a low percentage PF.

Again, I can't stress enough that PicoGreen is not going to work for effective quantification here (NanoDrop is even worse). It is a double-stranded binding dye. Amplified pools are full of heteromeric strands that lack perfect or even good homology for substantial stretches of DNA, so the dye isn't giving you a fair representation of DNA content. qPCR quantifies only the clusterable fraction of your pool, and if your efficiency is good, it will do it perfectly.
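For anyone following the numbers in this reply, here is a minimal Python sketch of the load arithmetic. It assumes the PhiX and library concentrations simply add in the final tube (which is how the 16 pM / 78% figure is reached), and the function names are made up for illustration.

```python
def split_load(total_pm, phix_fraction):
    """Split a target total load (pM) into PhiX and library parts.

    phix_fraction is the PhiX share of the total, e.g. 0.30 for 30%.
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    phix_pm = total_pm * phix_fraction
    library_pm = total_pm - phix_pm
    return phix_pm, library_pm


def total_and_phix_share(phix_pm, library_pm):
    """Return (total load in pM, PhiX fraction) for two final concentrations.

    Assumption: both inputs are already final in-tube concentrations,
    so they are additive.
    """
    total = phix_pm + library_pm
    if total <= 0:
        raise ValueError("total load must be positive")
    return total, phix_pm / total


if __name__ == "__main__":
    # 11 pM total at 30% PhiX: about 3.3 pM PhiX and 7.7 pM library
    phix, lib = split_load(11.0, 0.30)
    print(f"PhiX {phix:.1f} pM, library {lib:.1f} pM")

    # 12.5 pM PhiX plus 3.5 pM library, read as final concentrations
    total, share = total_and_phix_share(12.5, 3.5)
    print(f"total {total:.1f} pM, PhiX share {share:.0%}")
```

If the 12.5 pM and 3.5 pM values are stock concentrations that get diluted on mixing, the additive reading does not apply and the total will be lower.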

**cement_head** · 08-22-2016, 09:05 AM

Thanks - this is very different than the Illumina recommended 12.5 pM PhiX (high) spike-in loads.

However, I don't understand how you are getting your numbers. What volume of PhiX and what volume of library do you load? I don't understand how you are getting "16 pM". 150 µL of 12.5 pM PhiX comes to 3.125 pM in a final volume of 600 µL, and 450 µL of 3.5 pM library comes to 2.625 pM in 600 µL. This would give a final DNA concentration (combined library + PhiX) of 5.75 pM, not 16 pM. Am I missing something?
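The volume-based calculation above can be checked with a short Python sketch. It assumes the stated volumes are what go into the 600 µL load and that concentration scales linearly with dilution; the helper name is hypothetical.

```python
def mix_concentrations(components):
    """Final concentration contributed by each component after mixing.

    components: list of (volume_ul, stock_conc_pm) tuples.
    Returns (total_volume_ul, list of per-component final pM values).
    """
    total_volume = sum(volume for volume, _ in components)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    # C_final = C_stock * V_component / V_total for each component
    finals = [volume * conc / total_volume for volume, conc in components]
    return total_volume, finals


if __name__ == "__main__":
    # 150 uL of 12.5 pM PhiX mixed with 450 uL of 3.5 pM library
    volume, (phix_pm, library_pm) = mix_concentrations([(150, 12.5), (450, 3.5)])
    total_pm = phix_pm + library_pm
    print(f"{volume} uL: PhiX {phix_pm} pM + library {library_pm} pM = {total_pm} pM")
    print(f"PhiX share of molecules: {phix_pm / total_pm:.0%}")
```

Read this way, the load is 5.75 pM in total, and PhiX makes up a little over half of the molecules even though it is 25% of the volume.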

We've come to the same conclusion as you with respect to SYBR nanodrop vs qPCR. The qPCR provides much better quantitation as you mentioned.

**LVAndrews** · 08-22-2016, 09:14 AM

As your insert size increases, you must also decrease load concentration. For fungal ITS2 or v4-v5 (515-926) inserts I load at 8pM total with the same 30% PhiX.

My understanding is that 12.5pM is the standard use of PhiX for "validation" runs (when they send you a 50 cycle kit to run to make sure the thing is focusing right, etc). We used to have to keep in 30-50% PhiX and hard-code the phasing/prephasing values to even make amplicon sequencing work, but a software update in 2013 has allowed us to get away with far lower PhiX since (>5%). I can't help but feel like the quality diminishes when PhiX is reduced below ~15% and having extra PhiX helps to keep your run going even if you still overcluster slightly. Make sure you read Schirmer et al (2015; http://nar.oxfordjournals.org/content/43/6/e37) with regard to how much you should trust the reported q-scores when doing amplicon sequencing. You really need the run to perform as optimally as possible.

**cement_head** · 10-22-2016, 09:57 AM

Originally posted by chen

We've got a similar issue with the MiSeq: low density at 122K with 0.2N NaOH, according to the MiSeq user guide. It's a polyA mRNA lib made with the NEB kit, with an average length of 750 bp. The final concentration of the loaded lib is 11 pM.

Is it the long lib that causes the problem, or the NaOH concentration? Any suggestions?

It's the long library. The optimum is about 600 bp; anything longer doesn't work well.

**cement_head** · 10-22-2016, 10:05 AM

Ok, I've just about finished a run with a 4 pM 16S V4 amplicon library (EMP primers) with a 10% PhiX spike-in, and I'm getting 1100 K/mm2. Is this a little high? Should I be going for 800 K/mm2?

**LVAndrews** · 10-22-2016, 10:09 AM

1100 may be high. Just watch your phasing/prephasing. With long inserts you could see loss of data after turn-around.

I ran some ITS inserts recently that had full construct lengths ~650bp with good results. I loaded a v3 kit (2x300) at 7pM total with 20% PhiX. Clustered at ~800k, excellent results.

Quant was done with KAPA stds. Had efficiency of 1.0 and R^2 0.999.

**LVAndrews** · 10-22-2016, 10:52 AM

When I run those primers for 16S v4 I load at 8pM with 30% PhiX. Of course the Illumina people will tell you you are wasting reads with the extra phix, but I worry a lot about cluster mixing, especially with those primers because they are single-indexed.

Check your qPCR results and make sure your efficiency was 1.0 +/- 5%. If it is not in this range, you should redo your quant. I'm guessing with no real evidence whatsoever that your efficiency may have been low, causing you to underquantify your pool, resulting in an overload (check your % aligned to be sure). If you get low efficiency, try increasing the length of both annealing and extension time by 10 seconds and see if it improves. I quant 16S pools with 500nM primer in the qPCR reactions.
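As a rough illustration of the efficiency check, here is a minimal Python sketch that fits a standard curve (Cq against log10 concentration) and converts the slope to efficiency with E = 10^(-1/slope) - 1. The standards and Cq values below are invented for the example (a perfect doubling per cycle), and the function names are hypothetical; your qPCR software normally reports these numbers for you.

```python
import math


def standard_curve(concs_pm, cqs):
    """Least-squares fit of Cq vs log10(concentration).

    Returns (slope, efficiency, r_squared), with efficiency as a
    fraction (1.0 means the product doubles every cycle).
    """
    xs = [math.log10(c) for c in concs_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency, r_squared


def efficiency_ok(efficiency, low=0.95, high=1.05):
    """True if efficiency falls in the 1.0 +/- 5% window from the post."""
    return low <= efficiency <= high


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series (pM) with idealized Cq values
    standards = [20, 2, 0.2, 0.02, 0.002, 0.0002]
    cqs = [8.0 - math.log2(10) * math.log10(c) for c in standards]
    slope, eff, r2 = standard_curve(standards, cqs)
    print(f"slope {slope:.3f}, efficiency {eff:.2f}, R^2 {r2:.4f}")
    print("redo quant" if not efficiency_ok(eff) else "efficiency in range")
```

A slope near -3.32 corresponds to an efficiency of 1.0; shallower slopes give values above 1 and steeper slopes give values below 1.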

**cement_head** · 10-23-2016, 05:46 AM

Originally posted by AKrohn

When I run those primers for 16S v4 I load at 8pM with 30% PhiX. Of course the Illumina people will tell you you are wasting reads with the extra phix, but I worry a lot about cluster mixing, especially with those primers because they are single-indexed.

Check your qPCR results and make sure your efficiency was 1.0 +/- 5%. If it is not in this range, you should redo your quant. I'm guessing with no real evidence whatsoever that your efficiency may have been low, causing you to underquantify your pool, resulting in an overload (check your % aligned to be sure). If you get low efficiency, try increasing the length of both annealing and extension time by 10 seconds and see if it improves. I quant 16S pools with 500nM primer in the qPCR reactions.

Yeah, okay - I feel it was high as well. My efficiency from KAPA was 113% and the r2 was 0.99xxx. But we loaded the PhiX at 10%, down a little from the normal 20%, which may have been a tad low (in retrospect).

I'll also check the phasing.

Thanks

**cement_head** · 10-24-2016, 05:31 AM

For the record, what should the phasing/pre-phasing be for a V2 kit? (Ideally)

**LVAndrews** · 10-24-2016, 04:02 PM

Phasing/prephasing are measures of signal purity. Ideally they are zero, but that will never happen. It is an estimate more or less of the ratio of fragments per cluster that are either gaining an extra base or failing to properly incorporate a base each cycle. The higher each value, the earlier you will observe tanking q-scores over the course of a read. I guess I prefer them in the 0.1-0.2% range, but can live with rates as high as 0.5%. I've even had a run with up to 0.9% turn out OK. But if you see super high phasing/prephasing, it could be an indication of bad reagents (if you didn't overcluster) and you may want to contact Illumina support and report the kit number to see if others have had problems.
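A rough way to see why higher rates make q-scores tank earlier is the Python sketch below. It assumes a deliberately simplified model (not Illumina's actual algorithm): every strand in a cluster independently falls behind or jumps ahead each cycle with fixed probabilities equal to the reported phasing and prephasing rates, so the fraction still in phase after n cycles is about (1 - phasing - prephasing)^n. The function name is hypothetical.

```python
def in_phase_fraction(phasing_pct, prephasing_pct, cycles):
    """Approximate fraction of strands in a cluster still in phase.

    Rates are given in percent per cycle, as shown in SAV.
    Simplifying assumption: rates are constant and strands that drift
    out of phase never drift back.
    """
    per_cycle_loss = (phasing_pct + prephasing_pct) / 100.0
    if not 0.0 <= per_cycle_loss < 1.0:
        raise ValueError("combined rate must be between 0% and 100%")
    return (1.0 - per_cycle_loss) ** cycles


if __name__ == "__main__":
    # Compare the rates mentioned in the post over a 250-cycle read
    for rate in (0.1, 0.2, 0.5, 0.9):
        remaining = in_phase_fraction(rate, rate, 250)
        print(f"phasing = prephasing = {rate}%: {remaining:.1%} in phase at cycle 250")
```

Under this toy model the in-phase signal decays geometrically, so even small per-cycle rates add up over a long read.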

**lorendarith** · 10-27-2016, 06:01 AM

Originally posted by AKrohn

Amplified pools are full of heteromeric strands that lack perfect or even good homology for substantial stretches of DNA so the dye isn't giving you a fair representation of DNA content.

Sorry, but could you elaborate on this a little bit, or point to some references where it's addressed? Thx

**LVAndrews** · 10-27-2016, 10:03 AM

I don't know of a pub describing this jackstraw/birds nesting effect, but there was a nice pub last month showing the effect of high cycling conditions on chimera formation (http://www.nature.com/nbt/journal/v3.../nbt.3601.html). Basically if you use too many cycles (>20-25), you will find chimeric sequences.

My understanding of the birds nest effect is from trying to amplify loci that are in low abundance relative to the total metagenome (eg, AMF-specific ITS primers for soil samples), and then dealing with samples which produced LMW artifacts during this PCR. No matter what, the artifact cannot be adequately removed, especially if more PCR cycles are run. I have done bead cleanups, gel extractions, even pippin prep and nothing removes them, so we came up with the idea of the artifact having enough homology to the target fragment to allow them to anneal to target fragments such that they cannot be removed, and will resolve on a gel amid the target fragment. Evidence for this comes from trying to sequence such pools and seeing the adapter contamination show up in the data, or in the SAV data where you see an early drop in q scores.

**lorendarith** · 11-02-2016, 08:55 AM

Thank you for the insight.

Originally posted by AKrohn

Evidence for this comes from trying to sequence such pools and seeing the adapter contamination show up in the data, or in the SAV data where you see an early drop in q scores.

Why would there be a drop in quality if adapter strands are being sequenced? I don't understand the rationale behind it because I've seen top runs where it turned out that mostly/only adapters got sequenced in the end (though these were HiSeq 100/200 cycle runs).
Are you referring to issues with resolving clusters when one channel suddenly becomes too bright because the same base (in the adapter) is being sequenced, or something else?

**LVAndrews** · 11-02-2016, 09:55 AM

Once you have sequenced through adapters, the garbage sequence that is produced afterward has very low q-scores. If you have adapter/primer sequences that are ~60mer and you are sequencing on a 2x250 kit, then if you see in the SAV data a drop in the q-score plot after ~60 bases, you can probably attribute this to adapter contamination.
