08-02-2012, 05:30 AM   #1
pjuneja
Nextera insert sizes larger than expected

I'm making Nextera libraries from mosquito genomic DNA using Illumina's Nextera kit. I followed the protocol exactly as written and ran the resulting libraries on a Bioanalyzer HS DNA chip. My insert sizes are larger than expected (~1000 bp observed versus 300-500 bp expected) [picture attached--apologies for the crude display of fragment size].

I spoke with Illumina's technical support, and they suggested that this was due to:
- Too much starting material
- Tagmentation incubation too short
- Tagmentation incubation not warm enough

The Illumina protocol states that "libraries with an average size >1kb may require clustering at several concentrations to achieve optimal density," which suggests to me that this isn't a problem and that the libraries can still be sequenced with a bit of optimization.

We're aligning back to a reference sequence so I don't think large insert sizes will be a problem from the bioinformatic perspective (in fact, they may help with mapping!).

Is this something I should be concerned with?
Attached Images: Sample2.jpg
08-02-2012, 09:45 AM   #2
pjuneja

Here are the suggestions offered by tech support:
-Make sure there isn't any ethanol carryover from the DNA extraction
-Elute DNA in water. I used Qiagen AE buffer, which is essentially TE, and apparently there is concern that the EDTA might interfere with enzyme activity. Interesting, since the Epicentre protocol suggested using TE...
-Check that thermocycler is operating at the correct temperature
-Decrease amount of starting material
-If we do decide to go ahead with the sequencing, clusters need to be generated at a lower density because large fragments don't cluster as well.

I'm going to try making the libraries again with varying amounts of DNA eluted in water since we don't want to reduce the amount of sequence that we generate. Will update once I know how well the modifications work.
08-02-2012, 01:09 PM   #3
Bucky

We have seen the same thing. As previously suggested, reducing the amount of starting material to 25-30 ng helps. Also, try clustering at 6 pM; that should result in about 500-600 K clusters per mm2, which worked well for us with longer fragments.
08-03-2012, 03:23 AM   #4
pmiguel

Was the HS chip run on the sample before or after amplification? I presume before. (If after, then you may just be seeing the old double-peak phenomenon -- called "bird-nesting" by Epicentre.)

If you want to obtain sequence from the stuff around 1 kb, you would need to get rid of (size select) the shorter fragments. For reasons I don't comprehend, DNA above 1 kb just does not compete well against the shorter fragments during clustering.

If you just run the library as-is, there is a possibility that your results will be biased. That is, the Nextera transposase may have found pockets of the genome you are sequencing that it really likes, and others it does not; hence the vaguely bimodal distribution you see.

My tendency would be to do a time series, collecting fractions at intervals. Then pool it all, run it on a gel, and size select to something reasonable (400-600 bp?). That way, if the transposase is biased, you will get a mixture of all the genomic pockets it lands in, as the larger fragments become more tagmented in the later time fractions.

Or, you could give it a shot and see if you are seeing high bias.

--
Phillip
08-03-2012, 04:36 AM   #5
pjuneja

The image is post-amplification and post-AMPure clean up. I ran the same sample pre-amplification, and the peak was at the same place. That suggests to me that it wasn't due to "bird-nesting," and tech support agreed.

Thanks for the very helpful replies!
08-08-2012, 07:37 AM   #6
pjuneja

I tried re-making my Nextera libraries with two modifications.

1) I suspended my starting DNA in water instead of TE since there was some concern about EDTA interfering with tagmentation.

2) I tried reducing the amount of starting material (30ng vs 50ng).

Sample 1 = Tagmented DNA post-Zymo cleanup, 30 ng starting material in H2O
Sample 2 = Tagmented DNA post-Zymo cleanup, 50 ng starting material in H2O
Sample 3 = Final library, post-PCR cleanup, 30 ng starting material in H2O
Sample 4 = Final library, post-PCR cleanup, 50 ng starting material in H2O

From this, it's clear that starting with DNA in TE or water gives exactly the same results, since Sample 4 looks exactly like the libraries from my first attempt. Reducing the amount of input DNA to 30 ng did not seem to help either, since it led to a smaller insert size than desired (Sample 3)!

I guess I'll next try running my tagmentation for different lengths of time and at different temps and running the DNA on a chip.
Attached Images: electropherogram (1).jpg
08-08-2012, 07:50 AM   #7
pmiguel

Quote: Originally Posted by pjuneja (post #6, above)
Actually, I think that Sample 2 has some >12 kb material in it that ended up running into Sample 3. That may sound crazy, but over time I have come to the conclusion that some lanes share part of the same path, so if they do not completely clear, high-molecular-weight material from an earlier well can end up in a later one.

Anyway, a time course does sound like a good choice.

--
Phillip
08-08-2012, 08:31 AM   #8
ECO (Site Admin)

Quote: Originally Posted by pmiguel (post #7, above)
I'm pretty sure all detection takes place in the exact same channel... and this is definitely a case of previous samples bleeding over into the current sample.
08-08-2012, 10:30 AM   #9
pjuneja

Ah, good to know! I don't have too much experience with Bioanalyzers. Regardless, it doesn't seem that reducing the amount of starting material to 30ng solved my problem.
08-21-2012, 04:52 AM   #10
pjuneja

I tried incubating my samples on a heat block instead of in a PCR machine in case our machine is miscalibrated, and I tried increasing the tagmentation step to 10 minutes. My insert sizes still have a peak around 1 kb.

I'm wondering if I'm losing my small fragments during the Zymo cleanup. I've been using the Zymo column kit instead of the plate kit since I'm processing a small number of samples. I've been using the spin speeds from the Zymo protocol, but I'm wondering if I need to reduce them. I also noticed that the Zymo protocol suggests a DNA binding buffer:sample ratio of 5:1 for DNA fragments, whereas the Illumina protocol uses 3.6:1. (Interestingly, the original Epicentre protocol used 5:1.) Does anyone have a modified protocol using columns that they could share?
08-21-2012, 07:35 AM   #11
pmiguel

What QC do you do on your genomic DNA? It might be worth doing an RNase treatment + AMPure (or similar) cleanup on it.

rRNA would not be subject to tagmentation. I'm not sure what size it would run at after degrading via the heat and divalent cations likely present during tagmentation and PCR. But I frequently see genomic DNA preps that are >90% RNA (the record was >99.9% RNA, almost a pure RNA prep). There is far more RNA than DNA in most cells, and a lot of (especially old-school) genomic DNA preps ignore it. So, although a long shot, I thought I should mention it.

--
Phillip
08-22-2012, 01:12 PM   #12
creeves
Bioanalyzer data and cluster density

I have been having very similar problems hitting the cluster-density sweet spot with Nextera libraries of yeast genomic DNA. HS chip electropherograms of samples after tagmentation or after PCR (when following the Nextera protocol without modification) have given size distributions peaking at 1 kb or higher, with cluster densities from a 10 pM load of such libraries in the 300-400 K/mm2 range. I modified the Nextera protocol in three ways (20 ng starting gDNA, 8 PCR cycles, 1 min extension time) and got a poor yield of DNA skewed toward fragments that were too small (peak around 200 bp); I didn't run this last one on the MiSeq. I also tried the Nextera XT kit, which gave a better size distribution, but the cluster density was even lower. I will keep trying modifications to the Nextera protocol, but if anyone knows the secret to 800 K/mm2 and 2 Gb of data, I would like to hear it. Thanks!
08-28-2012, 10:04 AM   #13
robertah

I also have had problems with getting Nextera libraries of the right size distribution (insert sizes larger than 1kb). I use mouse and human DNA.

Having super clean DNA does help. So, instead of using ethanol precipitation to concentrate DNA, I now use the DNeasy Blood & Tissue Kit, but elute in only 30uL water.

Varying the tagmentation time did not help. Using Qiagen Minelute columns seems to work fine (I don't use Zymo). I haven't tried using less than 50ng DNA, but I will in the future.

I have also found that non-ideal libraries sequence just fine (size distribution from around 400 bp to 1 kb).
09-06-2012, 11:07 AM   #14
MoritzF

Quote: Originally Posted by pjuneja (post #10, above)
I'm using the Zymo plate system as suggested in the Illumina Nextera protocol and I get the same giant fragments, so it doesn't seem to be the columns that cause the problem...
11-26-2012, 07:04 AM   #15
creeves

This seems to be the best thread in which to post this info.

Since my earlier troubles with low cluster density, posted earlier in this thread, the MiSeq has provided as much as 2.5 Gb of good data in a single run using Nextera libraries. The quality of the DNA is important, though I have not identified which impurities have the most effect on cluster density. You must also be careful about how you deliver the 50 ng to the Nextera process: pipetting small volumes from solutions of HMW DNA at high concentration transfers widely varying amounts of DNA, so keep solutions below 100 ng/ul. The amount of data the MiSeq delivers still varies quite a bit from run to run (0.5-2.5 Gb), and I have only some clues about what factors contribute to this variability.
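To illustrate the dilution arithmetic (a minimal sketch; the concentrations and volumes are hypothetical examples, not values from our protocol):

Code:
# C1*V1 = C2*V2 dilution helper; all numbers are hypothetical examples.
stock=180      # ng/ul, measured stock concentration
target=25      # ng/ul, working concentration (kept well below 100 ng/ul)
final_vol=20   # ul of working solution to prepare
awk -v c1="$stock" -v c2="$target" -v v2="$final_vol" \
    'BEGIN { v1 = c2 * v2 / c1
             printf "mix %.2f ul stock with %.2f ul diluent\n", v1, v2 - v1 }'
# Prints "mix 2.78 ul stock with 17.22 ul diluent"; 50 ng for the
# tagmentation reaction is then 2 ul of the 25 ng/ul working stock.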

I tried using the Bioanalyzer to QC the tagmentation reaction before doing PCR, but this is a huge hassle and I don't recommend it (the quality of that data is unpredictable). However, if you are multiplexing, you must use the Bioanalyzer to QC the libraries themselves (after PCR) so that you get approximately equal representation. It seems that the more fragments in the size distribution that are <300 bp or >1500 bp, the lower the cluster density. The amount of smaller fragments depends mostly on the final AMPure cleanup step. You can reduce the amount of larger fragments by decreasing the PCR extension time from 3 min to 1.5 min. I have also added a couple of extra cycles to the PCR, which increases the fraction of fragments bearing adapters and thus able to form clusters.

I hope this is useful information.
12-09-2012, 10:46 AM   #16
robertah

I just heard that the original Nextera enzyme from Epicentre gave nice peaks, but Illumina's version is not as good. Not that this helps, but it at least explains why the Nextera kit is more difficult than promised.
04-01-2013, 11:53 PM   #17
pjuneja

In case anyone is interested, here's a quick update on the results from our sequencing:

-We sequenced two pools of libraries in two lanes of PE 100 bp HiSeq, with one lane yielding 100 million read pairs and the other yielding 180 million read pairs.

-I analyzed one lane using bwa sampe -a 2000 to allow insert sizes up to 2 kb to be marked as properly paired. From Picard's insert-size metrics, the median insert size is 194 bp (see attached). It seems to me that the clustering and/or sequencing step greatly biases towards recovery of the shorter fragments, even though the Bioanalyzer finds a peak size of ~1 kb. The commands are sketched below.
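For anyone who wants to reproduce this, here is a minimal sketch of the pipeline (file names are placeholders, not our actual files; bwa's old aln/sampe workflow, with samtools and Picard's CollectInsertSizeMetrics in current syntax):

Code:
# Index the reference, align each mate, then pair with sampe.
# -a 2000 raises the maximum insert size sampe will treat as properly paired.
bwa index ref.fa
bwa aln ref.fa reads_1.fq.gz > reads_1.sai
bwa aln ref.fa reads_2.fq.gz > reads_2.sai
bwa sampe -a 2000 ref.fa reads_1.sai reads_2.sai reads_1.fq.gz reads_2.fq.gz \
    | samtools sort -o aln.sorted.bam -

# Insert-size distribution: writes a metrics table plus a histogram PDF.
java -jar picard.jar CollectInsertSizeMetrics \
    I=aln.sorted.bam O=insert_size_metrics.txt H=insert_size_histogram.pdf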

We're very pleased with the results and will continue to use Nextera.
Attached Images: Post sequencing histogram.jpg
04-02-2013, 03:27 AM   #18
pmiguel

Quote: Originally Posted by pjuneja (post #17, above)
Yes, we see this as well for Nextera and other methods. It suggests there is some sort of direct competition during clustering that strongly favors the creation of clusters of shorter amplicons.

--
Phillip
06-19-2013, 06:23 AM   #19
d00b9p

I have just started doing Nextera DNA library preps and I am getting the same larger-than-expected peaks (1000-2000 bp), with some bimodality too. I am attaching a picture of the last 11 libraries I ran on the Bioanalyzer HS chip; this is post-PCR, as we did not run the libraries post-tagmentation (pre-PCR).

Extractions were done with the Qiagen Blood and Tissue kit; we included an RNase treatment and used a buffer without EDTA (Buffer EB).

I was wondering if I should try lowering the input material to 30 ng as a test (all libraries shown had inputs ranging from 41 to 51 ng, but there is no pattern that corresponds with input amount). I also wondered about trying a longer tagmentation step... but I suspect I would just get more small fragments, still with the large peak around 1000-2000 bp.

I am wondering if the bimodality is just an insertion preference bias of the transposome, in which case I guess I can't do anything! Seems that Nextera is highly variable...

Does anyone with previous experience think that my libraries will still sequence OK (100 bp paired-end sequencing on the HiSeq), despite the large peak and some bimodality? How do you optimize the cluster density with bimodal distributions?
Attached Images: Nextera Libraries.jpg
06-19-2013, 07:51 AM   #20
creeves
Nextera insert sizes

We have had very similar Bioanalyzer traces in the past, but now routinely get unimodal peaks with 400-1000 bp average size. Here are some things we believe are important for optimum results.

1) DNA must be accurately quantified and diluted so that exactly 50 ng is used in the tagmentation reaction. All dilutions of DNA should be done with Tris buffer containing 0.05% Tween 20. DNA at low concentrations can stick to the plasticware, while DNA (especially genomic) at high concentrations can give inaccurate pipetting because of the viscosity. Your variable Bioanalyzer traces indicate too much tagmentation due to a variable and inadequate amount of DNA used in the reactions.

2) Be wary of N501 and possibly other combinations of i7 and i5 bar-coded primers. Use the i7 indices with N505 for the most reliable results. You can order N505 from any oligo supplier and dilute it to 0.5 micromolar.

3) Increase the number of PCR cycles from five to eight and decrease the extension time from three to two min.

4) Be extra careful with the AMPure cleanup to avoid getting fragments less than 300 bp. We add 29 ul of beads instead of 30 ul. The MW cutoff is very sensitive to the ratio of beads to PCR reaction.

5) At least for genomic sequencing, we don't think fragments >1 kb in a Nextera library are a problem. However, do make sure they are included in the average-size calculation, because the average size significantly impacts the estimated molarity of that library in the pool (see the sketch after this list).

If anyone else has tips to add to this list, please do. We are still looking to optimize the process. We typically get cluster densities of ~1200 K/mm2, which appears to be close to the optimum, but by flying so close to the max, we occasionally overshoot and the MiSeq can't resolve the clusters. There are many parameters involved in hitting the sweet spot and we still don't have it under full control.
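Regarding point 5 above, the molarity arithmetic is roughly this (a minimal sketch; assumes ~660 g/mol per bp for dsDNA, and the example numbers are made up, not from an actual library):

Code:
# nM = (ng/ul) / (660 g/mol per bp * average size in bp) * 1e6
conc=2.5       # ng/ul, measured library concentration (hypothetical)
avg_size=800   # bp, average INCLUDING the >1 kb tail (hypothetical)
awk -v c="$conc" -v s="$avg_size" \
    'BEGIN { printf "library = %.1f nM\n", c / (660 * s) * 1e6 }'
# Prints "library = 4.7 nM". Calling the same library avg_size=500
# gives 7.6 nM instead, so leaving the >1 kb tail out of the average
# overestimates molarity and that library ends up under-loaded in the pool.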
Tags: bioanalyzer, nextera