SEQanswers > De novo discovery

454andSolid 01-21-2010 05:48 AM

Insert size important?
Hi all! First post here, maybe some of you can shed some light on this:

We are working on a de novo sequencing and assembly project for a eukaryotic genome of ~120 Mbp. So far we have sequenced a couple of fragment libraries with 454 Titanium. We now want to sequence some paired-end libraries, but the sequencing facility has problems getting the Titanium 20 kbp inserts (and the 8 kbp ones) to work. 3 kbp inserts seem to work fine, so the question is:

Is there a big difference between 20 kbp and 3 kbp inserts when it comes to the assembly (L50, N50 etc.)?

I have been asking around, and the answer I get is that "of course 20 kbp is better". What I am interested in is how much better it would be (and whether it is worth waiting x months for). These things are difficult to estimate, but maybe somebody has experience from similar projects...?


flxlex 01-21-2010 10:31 AM

It depends on the length of the repeated parts of your genome. Repeats get collapsed into single contigs (with correspondingly higher read depth), and these break up the non-repeated parts. Paired-end reads whose halves lie further apart than the repeat length allow you to order and orient the non-repeated contigs, and occasionally to place copies of the repeats as well.

So, you want the longest paired-end distance to be somewhat larger than the longest repeated contigs. How to find this out? Look at the lengths of the high-depth, i.e. repeated, contigs, as listed in the 454ContigGraph file (first part, where all the contigs are listed). It is very helpful to plot length versus depth; then you can match the paired-end distances against the lengths of the repeated contigs.

Hope this helps, and hope you find out you don't need 8 kb or 20 kb...
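
For anyone wanting to try the length-versus-depth check described above, here is a rough Python sketch. It is not from the original poster. It assumes each row in the contig list is tab-separated as index, contig name, length, average depth. The 1.5x-median depth cutoff, the function names and the example numbers are all made up for illustration, so check them against your own file before trusting the result.

```python
import statistics


def parse_contig_table(text):
    """Collect (name, length, depth) from rows assumed to look like
    index<TAB>name<TAB>length<TAB>depth; every other line is skipped."""
    contigs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # not a contig row (e.g. an edge line further down)
        contigs.append((fields[1], int(fields[2]), float(fields[3])))
    return contigs


def repeated_contigs(contigs, depth_factor=1.5):
    """Flag contigs whose depth is at least depth_factor times the median
    depth. The factor is an arbitrary illustrative cutoff."""
    median_depth = statistics.median(depth for _, _, depth in contigs)
    return [c for c in contigs if c[2] >= depth_factor * median_depth]


def inserts_spanning(insert_sizes, repeats):
    """Return the insert sizes that exceed the longest flagged contig."""
    longest = max((length for _, length, _ in repeats), default=0)
    return [size for size in insert_sizes if size > longest]


# Made-up example rows, NOT real assembly output.
example = "\n".join([
    "1\tcontig00001\t52000\t14.8",
    "2\tcontig00002\t2400\t45.1",
    "3\tcontig00003\t31000\t15.3",
    "4\tcontig00004\t6100\t31.0",
    "5\tcontig00005\t18000\t16.0",
    "C\t1\t3'\t2\t5'\t12",
])

contigs = parse_contig_table(example)
repeats = repeated_contigs(contigs)
usable = inserts_spanning([3000, 8000, 20000], repeats)
print([name for name, _, _ in repeats], usable)
```

The same (length, depth) pairs can be fed straight into a scatter plot. High-depth contigs can also be organellar DNA rather than nuclear repeats, so anything flagged this way deserves a second look.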


454andSolid 01-22-2010 01:29 AM

Thanks! This was exactly what I was looking for. Now I just have to make sure that the contigs with high depth are not mtDNA etc...

raman91 12-27-2017 01:33 AM


I am visualizing alignments using IGV. What do you mean by insert? Is the insert the large unsequenced segment sitting between the two reads, which escapes sequencing and also shearing during the library prep stage? How can the existence of a large insert such as 3 Mb or 17 Mb be validated, given that it cannot be confirmed even with a high-fidelity Taq (the maximum for genomic DNA PCR is 1-2 kb for Taq and up to 30 kb for HF Taq)?


