
**simonandrews** · 02-17-2011, 12:33 AM

You don't mention the platform you're using, but I'd imagine the major constraint is going to be the technical limitations of your sequencer. On Illumina systems longer insert lengths will result in larger, dimmer spots reducing both the amount and quality of data you can obtain. We've run libraries with insert sizes up to about 1kb but I'm not sure I'd want to go much higher than that. There's often no point in having really short inserts either since you'll end up reading through the insert and into the adapter in a significant proportion of your reads.
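The read-through point can be put as simple arithmetic: a read runs off the end of its insert and into the adapter whenever the insert is shorter than the read length. The sketch below counts how often that happens for a made-up library. The function names and the normal insert-size distribution are assumptions for illustration only; real libraries are rarely that symmetric.

```python
import random


def read_through_fraction(insert_sizes, read_length):
    """Fraction of fragments shorter than the read length.

    A read continues past the end of the insert and into the adapter
    whenever the insert is shorter than the number of cycles sequenced.
    """
    if not insert_sizes:
        raise ValueError("need at least one insert size")
    short = sum(1 for size in insert_sizes if size < read_length)
    return short / len(insert_sizes)


def adapter_bases(insert_size, read_length):
    """Number of adapter bases at the 3' end of a read for one fragment."""
    return max(0, read_length - insert_size)


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical library: insert sizes drawn from a normal distribution.
    library = [max(1, round(random.gauss(150, 40))) for _ in range(10_000)]
    for read_length in (50, 100, 150):
        frac = read_through_fraction(library, read_length)
        print(f"{read_length} bp reads: {frac:.1%} of reads run into adapter")
```

The same short library looks fine at 50 bp but increasingly wasteful as read length grows, which is the trade-off described above.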

The other big issue which may or may not be a factor for you is the amount of material you have. If you perform a very tight size selection then you're reducing the amount of material you have to create your library and you run the risk of getting a big pile of PCR artefacts if you start amplifying from too little material.
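To see how quickly a tight size selection eats into the material, here is a rough sketch that treats sheared fragment lengths as normally distributed (an assumption; real shearing profiles are usually skewed) and asks what fraction survives a gel window. The mean, standard deviation and window widths are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def fraction_retained(lo, hi, mean, sd):
    """Fraction of fragments whose length falls inside the window [lo, hi]."""
    if hi <= lo or sd <= 0:
        raise ValueError("need hi > lo and a positive standard deviation")
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


if __name__ == "__main__":
    mean, sd = 300, 100  # hypothetical sheared-DNA length distribution
    for half_width in (100, 50, 25, 10):
        kept = fraction_retained(mean - half_width, mean + half_width, mean, sd)
        print(f"{mean} +/- {half_width} bp window keeps {kept:.1%} of fragments")
```

Under this model a window of one standard deviation either side keeps roughly two thirds of the fragments, while a very narrow window keeps only a small slice, which is where the risk of amplifying from too little material comes from.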

I'm sure there are other considerations specific to your biological application. If you're doing assemblies you might want to look at mate pair libraries which allow the generation of paired sequences separated by much longer distances (2-5kb) whilst still keeping to the insert size limitations of the sequencing platform.

**rcorbett** · 02-17-2011, 08:04 AM

Thanks for your input,
Specifically, I've been asked this by the group here that is responsible for our Illumina sequencing.

They have cited the trade-off between tight distribution and yield, which makes sense to me.

What befuddles me is that when I'm asked the question "if you could have any insert size, what would it be?" I don't have much to go on other than we don't want to sequence through the fragment twice. We have restrictions from WTSS, etc. which are driven by the sample, but for WGSS I'm looking for a bioinformatic reason to choose one size over another.
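"Not sequencing through the fragment twice" translates directly into a lower bound on insert size: the two mates overlap whenever the insert is shorter than twice the read length. A minimal sketch, with hypothetical function names:

```python
def mate_overlap(insert_size, read_length):
    """Bases of the insert sequenced by both mates (0 if they do not overlap).

    Read 1 covers the first read_length bases of the insert and read 2 the
    last read_length bases. If the insert is shorter than one read, both
    mates cover the whole insert.
    """
    return max(0, min(2 * read_length - insert_size, insert_size))


def min_non_overlapping_insert(read_length):
    """Smallest insert for which the two mates do not overlap."""
    return 2 * read_length


if __name__ == "__main__":
    for read_length in (36, 50, 76, 100):
        print(f"{read_length} bp paired reads: inserts >= "
              f"{min_non_overlapping_insert(read_length)} bp avoid overlap")
```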

Shouldn't there be some feature of hg18/hg19, like SINEs/LINEs etc., that would call for a larger or smaller insert size for WGSS libraries, so that we can make more use of them bioinformatically (alignment and assembly)?

**simonandrews** · 02-18-2011, 12:40 AM

This is eventually going to come down to your use case. If you're doing some kind of ChIP experiment then you won't want to increase your insert size too much, since you'll lose resolution in your feature detection. I don't do much assembly, but my recollection from those who do is that it's useful to have a range of insert sizes (though maybe in separate experiments?) to allow for spanning of short and long repeats.
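One way to see why a range of insert sizes helps with repeats: in a simplified model where both mates must sit entirely in unique flanking sequence, a pair can only bridge a repeat if the insert is at least the repeat length plus two read lengths. The sketch below applies that rule; the model and names are assumptions, and the repeat lengths are only rough SINE-sized (~300 bp) and full-length LINE-sized (~6 kb) examples.

```python
def spans_repeat(insert_size, read_length, repeat_length):
    """True if a pair can have both mates anchored in unique flanking sequence.

    Simplified model: each mate must lie wholly outside the repeat, so the
    insert must cover the repeat plus one full read on each side.
    """
    return insert_size >= repeat_length + 2 * read_length


def longest_spannable_repeat(insert_size, read_length):
    """Longest repeat a pair with this insert can bridge under the same model."""
    return max(0, insert_size - 2 * read_length)


if __name__ == "__main__":
    read_length = 100
    for insert in (300, 2000, 5000):  # hypothetical libraries
        print(f"{insert} bp insert: bridges repeats up to "
              f"{longest_spannable_repeat(insert, read_length)} bp; "
              f"~300 bp repeat: {spans_repeat(insert, read_length, 300)}, "
              f"~6 kb repeat: {spans_repeat(insert, read_length, 6000)}")
```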

Our experience has been that longer read lengths are negating many of the problems of duplicated alignments in remapping experiments. Once you're up to 50bp or so (either paired or single end) then a surprisingly high proportion of 'repeat' sequence is actually mappable. We work in backcrossed strains with no SNPs though, so maybe this is more of an issue if you have more diversity. These days most of the sequences we can't map come from regions not present in the genome assembly (telomeres and centromeres mostly), so there's not much we can do about that.
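The effect of read length on mappability can be illustrated with a toy genome: count the fraction of positions whose k-mer occurs exactly once, as a crude stand-in for "a read of length k starting here maps uniquely". The toy genome (random sequence plus slightly diverged copies of one repeat) and the exact-match, forward-strand-only uniqueness test are simplifying assumptions, not how a real aligner decides mappability.

```python
import random
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once.

    Forward strand only, exact matches only.
    """
    if not 0 < k <= len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


def toy_genome(seed=0):
    """Random sequence interleaved with 10 diverged copies of a 300 bp repeat."""
    rng = random.Random(seed)
    bases = "ACGT"
    repeat = "".join(rng.choice(bases) for _ in range(300))
    pieces = []
    for _ in range(10):
        pieces.append("".join(rng.choice(bases) for _ in range(1000)))
        # Each copy carries 3 random substitutions, like diverged family members.
        diverged = list(repeat)
        for pos in rng.sample(range(len(repeat)), 3):
            diverged[pos] = rng.choice([b for b in bases if b != diverged[pos]])
        pieces.append("".join(diverged))
    return "".join(pieces)


if __name__ == "__main__":
    genome = toy_genome()
    for k in (15, 25, 36, 50, 75, 100):
        print(f"k={k:>3}: {unique_fraction(genome, k):.3f} of positions unique")
```

Longer k-mers are more likely to cover one of the substitutions that distinguish a repeat copy, so more of the "repeat" becomes uniquely placeable, in line with the experience described above.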

**JohnK** · 02-18-2011, 08:18 AM

I think your ideal insert size would be somewhere along the lines of the maximum insert and read length that allows you to maximize the throughput of your sequencing platform without saturating your data.

**Michael.James.Clark** · 02-18-2011, 02:24 PM

I think a lot of these answers are good.

The optimal insert size depends on your experiment and goals.

I'm assuming you're not talking about ChIP-seq (which is often best done single-end).

For exome-seq, an insert of around 200-350 bp is more than adequate for hitting >99% of the targets and assessing variants. Based on what I've seen, you can probably fit >4 exomes per HiSeq lane this way.
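The exomes-per-lane figure is back-of-envelope arithmetic: usable on-target bases divided by (target size × desired mean depth). The sketch below writes that out. Every number in the usage block is a placeholder rather than a measured HiSeq or capture-kit figure, and the `usable_fraction` knob (for duplicates, failed reads, etc.) is a hypothetical addition.

```python
import math


def exomes_per_lane(lane_yield_bp, target_size_bp, mean_depth,
                    on_target_fraction, usable_fraction=1.0):
    """Whole number of exomes a lane can support at the requested mean depth."""
    if min(lane_yield_bp, target_size_bp, mean_depth) <= 0:
        raise ValueError("yield, target size and depth must be positive")
    usable_on_target = lane_yield_bp * usable_fraction * on_target_fraction
    return math.floor(usable_on_target / (target_size_bp * mean_depth))


if __name__ == "__main__":
    # All numbers below are placeholders, not real platform or kit specs.
    n = exomes_per_lane(lane_yield_bp=20e9, target_size_bp=50e6,
                        mean_depth=50, on_target_fraction=0.6)
    print(f"about {n} exomes per lane under these assumptions")
```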

For whole genome, a combination of tightly distributed 200- and 2000-base inserts is optimal for human (for the sake of SV detection). The 2 kb insert reads can be fairly low depth; they'll make up for the issues mapping over LINEs that you alluded to.
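Why a low-depth long-insert library can still be useful: what matters for spanning repeats and SVs is physical coverage (how many times each base lies between the mates of some pair), not sequence coverage. A sketch of the two quantities, treating the insert as the full span between the outer ends of the pair (an assumption about how insert size is measured); the pair count is hypothetical.

```python
def sequence_coverage(n_pairs, read_length, genome_size):
    """Mean number of times each base is actually sequenced."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs, insert_size, genome_size):
    """Mean number of pairs whose insert spans each base."""
    return n_pairs * insert_size / genome_size


if __name__ == "__main__":
    genome = 3.1e9      # approximate human genome size
    pairs = 50_000_000  # hypothetical low-depth 2 kb library
    print(f"sequence coverage: {sequence_coverage(pairs, 50, genome):.1f}x")
    print(f"physical coverage: {physical_coverage(pairs, 2000, genome):.1f}x")
```

With 2 x 50 bp reads on a 2 kb insert, physical coverage is insert / (2 × read length) = 20 times the sequence coverage, regardless of how many pairs are sequenced.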

If you don't care about having the optimal SV detection rate, you can go with 200-350 bp inserts for whole genome, as with exome, without much issue (though the cost may be an issue).

For the sake of phasing, a less tightly distributed insert with a mean of 2,000-3,000 bases would be great (expecting about 1 SNV per kb).
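A rough way to see why 2-3 kb suits phasing at ~1 SNV per kb: if heterozygous SNVs were scattered uniformly at random (a Poisson assumption, which real variation does not strictly follow), the gaps between neighbouring SNVs would be exponentially distributed, and the fraction of gaps shorter than the insert is 1 - exp(-density × insert). This ignores whether the reads themselves actually land on the SNVs, so treat it as an upper bound on linkage.

```python
import math


def fraction_gaps_bridged(insert_size, snv_per_bp=1 / 1000):
    """Fraction of gaps between adjacent SNVs that are shorter than the insert.

    Assumes SNVs follow a Poisson process with the given density, so the
    distance between neighbours is exponential with rate snv_per_bp.
    """
    if insert_size < 0 or snv_per_bp <= 0:
        raise ValueError("insert must be >= 0 and density > 0")
    return 1.0 - math.exp(-snv_per_bp * insert_size)


if __name__ == "__main__":
    for insert in (350, 2000, 3000):
        print(f"{insert} bp insert: {fraction_gaps_bridged(insert):.1%} "
              f"of adjacent-SNV gaps bridged")
```

Under this model a 350 bp insert bridges under a third of neighbouring-SNV gaps, while 2 kb and 3 kb inserts bridge roughly 86% and 95%.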


"ideal" insert size
