SEQanswers

Old 04-29-2010, 12:17 PM   #1
sdavis
Member
 
Location: Maryland

Join Date: Jan 2010
Posts: 14
Default Insert size with Agilent SureSelect

We have noticed that our Agilent SureSelect libraries have a very small insert size after hybrid selection. Prior to selection, we see BioAnalyzer modes of 200-220 bp (with adapter length subtracted, so this is insert size). After hybridization and selection, we see 130-140 bp. On the sequencer, we are seeing similar very low numbers. Have other people seen this effect? Is there a way to work around it? We are planning to increase the input size to the hybridization, obviously, but we have not heard much about this issue and so are wondering if this is something specific to us or if this is a more general observation. The short insert size has some considerable implications for data quality and subsequent variant calling, so we would certainly like to get an insert size that is more reasonable.

Thanks,
Sean
Old 04-29-2010, 01:00 PM   #2
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

I believe the SureSelect baits are ~120 bp in length, so they're probably preferentially binding the smaller fragments of your DNA. I know their protocol calls for shearing to ~150 bp with an expected shift to 200-250 bp after ligation and amplification (prior to hybridization). Have you considered the possibility of concatemerization of your DNA if you shear to larger lengths?
Old 04-29-2010, 02:31 PM   #3
sdavis
Member
 
Location: Maryland

Join Date: Jan 2010
Posts: 14
Default Insert size with Agilent SureSelect

Thanks for the thoughts. We are using Covaris to do the fragmentation and end up with a 200-220 bp insert size (without adapters) prior to hybridization, then lose 60-80 bp from the mode by going through the hybridization process. If we run 2x80 or 2x100 bp, many of the paired reads actually overlap one another. While not a disaster, it isn't cost effective and makes later variant calling more difficult (because many of the bases are not independently read, being present in both the first and second reads). I'm not sure that concatemerizing would be our first choice, as it would introduce chimeras not present in the sample, if I understand you correctly.

Sean
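
The overlap described above follows directly from the insert size and the read length. A minimal sketch in Python, assuming each read starts at one end of the insert and ignoring adapter read-through; the helper name `overlap_bp` is made up for illustration and is not from any pipeline mentioned in this thread:

```python
def overlap_bp(insert_size, read_len):
    """Insert bases covered by both reads of a pair (hypothetical helper)."""
    # Read 1 covers positions [0, min(read_len, insert_size)).
    end_of_read1 = min(read_len, insert_size)
    # Read 2 covers positions [max(0, insert_size - read_len), insert_size).
    start_of_read2 = max(0, insert_size - read_len)
    return max(0, end_of_read1 - start_of_read2)


# Post-capture mode (~135 bp) versus pre-capture mode (~210 bp) from the posts above.
for insert_size in (135, 210):
    for read_len in (80, 100):
        print(insert_size, read_len, overlap_bp(insert_size, read_len))
```

Under these assumptions, a 135 bp insert sequenced 2x100 has 65 bp read twice, while a 210 bp insert has no overlap at either read length.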
Old 04-29-2010, 02:39 PM   #4
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

Concatemerizing is no one's first choice, lol. I'm saying long fragments of DNA not covered by either baits or the adapter blocking oligos might hybridize, leading to off-target sequences. While I can't say it would happen for sure, I'm not sure I want to find out. Anyway, are you stuck with doing 2x80 or 2x100? I'm doing 2x75 on 150 bp fragments and it seems to be working out OK. Also, are you using the SureSelect settings for the Covaris? I've always had consistent peaks around 150 bp using those.
Old 04-29-2010, 07:28 PM   #5
sdavis
Member
 
Location: Maryland

Join Date: Jan 2010
Posts: 14
Default

I looked at the Agilent kit a bit more. The size that the protocol targets is about 150 bp, which we are a little under. However, even with a 150 bp average, a very large percentage of pairs are going to overlap each other in a 2x75 bp run. For the same reason that folks recommend removing duplicates, the overlapping portion of the sequence should probably be trimmed so that a base is not "double-counted". In essence, this reduces the effective read length, and the extent of the reduction depends on the actual read length and how it relates to the insert size distribution. In practice, I think we are going to go ahead with a much larger insert size test (~300 bp) and see what that gives us.

Thanks for the help on this, GW_OK.

Sean
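
The dependence on the insert size distribution can be sketched numerically. The Python below assumes a normal insert size distribution with a 30 bp standard deviation; both the shape and the spread are assumptions chosen for illustration, not measurements from this thread, and the function names are hypothetical:

```python
from statistics import NormalDist


def fraction_overlapping(mean_insert, sd_insert, read_len):
    """Fraction of pairs whose insert is shorter than both reads combined."""
    return NormalDist(mean_insert, sd_insert).cdf(2 * read_len)


def mean_unique_bases(mean_insert, sd_insert, read_len):
    """Average number of distinct insert bases read per pair."""
    dist = NormalDist(mean_insert, sd_insert)
    # Sum over integer insert sizes out to 6 SD above the mean.
    sizes = range(1, int(mean_insert + 6 * sd_insert) + 1)
    weights = [dist.pdf(size) for size in sizes]
    # A pair can never yield more distinct bases than the insert holds.
    covered = sum(w * min(size, 2 * read_len) for size, w in zip(sizes, weights))
    return covered / sum(weights)


print(fraction_overlapping(150, 30, 75))  # 150 bp mean insert, 2x75 run
print(mean_unique_bases(150, 30, 75))
print(mean_unique_bases(300, 30, 75))     # the planned ~300 bp test
```

With a symmetric distribution centred on 150 bp, half of the pairs overlap in a 2x75 run, and the average pair yields roughly 138 distinct bases rather than 150. At a 300 bp mean almost every pair yields the full 150.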
Old 04-30-2010, 07:20 AM   #6
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

I'd be very interested to know how your longer insert sizes work once you've tried them out.
Old 04-30-2010, 11:36 AM   #7
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Note: I've posted a question around the issue of how variant callers handle overlapping paired end reads to the bioinformatics forum. It's a good question & I wanted to make sure the right audience sees it.
Old 05-07-2010, 02:28 PM   #8
scotthappe
Junior Member
 
Location: Austin TX

Join Date: May 2010
Posts: 1
Default New Agilent All-Exon protocol

To address this issue, we (Agilent) have released a new protocol for All-Exon capture that shifts the insert size, using a combination of Herculase II high-fidelity polymerase and SPRI-bead purification throughout. It may be found at:

http://www.chem.agilent.com/en-US/Se...spx?whid=60197

G3362-90001
SureSelect Target Enrichment System for Illumina Paired-End Sequencing Library - Human All Exon and Human All Exon Plus Protocol
v.2.0.1

When following this protocol, the library size after capture (including adapters) is 325-350 bp (= insert size of ~230 bp).

Best regards,
Scott
Old 05-10-2010, 09:34 AM   #9
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

Hmmm. Is this compatible with the exome kits I've already got in my freezer (I'm assuming yes)? Also, the new protocol linked says after shearing you should peak at 190, not 230.

Edit: I suppose it technically doesn't say base size at all after shearing, but the included BioA trace peak is at 190.

Edit Edit: It does say 150 to 200 at the end of the shearing step, but it doesn't give a size at the size verification step.

Edit^3: You didn't change the covaris settings at all (?!)

Edit^4: Agilent tech support says the Covaris settings listed in table 8 of the new protocol do not reflect the new insert size changes and they will fix it.

Last edited by GW_OK; 05-10-2010 at 11:51 AM.
Old 05-12-2010, 08:27 AM   #10
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

OK, so Owen Hardy from SureSelect Support says they're not really changing the Covaris settings and are merely widening the acceptable range of fragments from a peak at 150 to a peak anywhere between 150 and 200.

This has gotten me thinking, though. When my lab upgrades to the HiSeq, this insert size issue is going to become very important, as the HiSeq default run length is 100 bp paired end or 200 bp single end (you can do shorter but that seems to be a bit of a hassle, what with removing reagents and such). Should I switch to single end reads for my exome captures then?
Old 05-13-2010, 01:27 PM   #11
upenn_ngs
Member
 
Location: philadelphia

Join Date: Sep 2009
Posts: 70
Default

After capture, the size range put on the sequencer might be 150-500 bp. There will be overlap on a small portion of the paired reads, but the increased throughput will compensate.
Old 05-14-2010, 08:19 AM   #12
GW_OK
Senior Member
 
Location: Oklahoma

Join Date: Sep 2009
Posts: 411
Default

True, but if you did long single end reads there wouldn't be any overlap at all. And if the majority of your reads are ~150-200bp (which they ought to be, I think) you can read the whole thing in one go without having to do paired ends.

Why do paired end reads at all (for exome at least) if you can sequence the whole (or quite a bit of the) captured piece of DNA? You don't really need the assembly benefits of paired ends, especially since you're assembling to reference. And it's much cheaper in reagents...
Old 05-14-2010, 11:39 AM   #13
upenn_ngs
Member
 
Location: philadelphia

Join Date: Sep 2009
Posts: 70
Default

For us, it looks like 5 gigabases is the minimum sequence to get >80% exome coverage at 20x. Right now 2x100bp reads are the quickest route. As the read length and cluster density increase I can see where single-end reads might become advantageous.
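
The 5 gigabase figure can be turned into a read count and a rough mean depth. A back-of-the-envelope sketch in Python; the 38 Mb target size and the 60% on-target fraction are assumptions made for illustration and were not given in this thread, and the function names are hypothetical:

```python
def pairs_needed(total_bases, read_len):
    """Read pairs required to produce total_bases of paired-end sequence."""
    return total_bases / (2 * read_len)


def mean_on_target_depth(total_bases, target_size, on_target_fraction):
    """Mean depth over the target if on_target_fraction of bases land on it."""
    return total_bases * on_target_fraction / target_size


total_bases = 5e9         # 5 gigabases, from the post above
target_size = 38e6        # ASSUMED capture target size in bp
on_target_fraction = 0.6  # ASSUMED fraction of sequenced bases on target

print(pairs_needed(total_bases, 100))
print(mean_on_target_depth(total_bases, target_size, on_target_fraction))
```

Under these assumptions 5 gigabases is 25 million 2x100 pairs and a mean on-target depth near 79x. The mean sits well above 20x because capture coverage is uneven, so getting >80% of the exome to 20x takes considerably more than 20x on average.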
Tags
exon capture, illumina sequencing, insert size
