SEQanswers

Old 05-06-2011, 01:15 PM   #1
delphi_ote
Junior Member
 
Location: Champaign, IL

Join Date: Oct 2010
Posts: 9
Are Illumina library fragment lengths actually normally distributed?

Many bioinformatics papers assume that the fragment lengths for Illumina data are normally distributed. I've seen some datasets for which this doesn't seem to be the case, however. I've seen what look like very skewed and bimodal distributions in some of the 1000 Genomes Project data.

I'm a computer scientist, so I don't know much about what to expect this data to look like or why it would have a given fragment length distribution. I've been searching for the past couple days for a reference, but I've come up empty.

Is there anyone here that can help me understand this better or point me to a resource where I could learn more? Any help would be greatly appreciated!
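For readers who want to check a library themselves, here is a minimal sketch of one way to quantify "skewed" using sample moments. It is not from any of the papers mentioned; the function name `moment_summary` and the simulated inputs are assumptions made for illustration. In practice the lengths might come from the TLEN column of properly paired alignments in a SAM/BAM file.

```python
import math
import random

def moment_summary(lengths):
    """Return (mean, stdev, skewness, excess_kurtosis) of fragment lengths.

    For a normal distribution, skewness and excess kurtosis are both near 0.
    """
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    m4 = sum((x - mean) ** 4 for x in lengths) / n
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2 - 3.0

# Hypothetical data: one bell-shaped library and one right-skewed library.
rng = random.Random(0)
normal_like = [rng.gauss(400, 30) for _ in range(20000)]
skewed = [300 + rng.expovariate(1 / 50) for _ in range(20000)]

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    mean, sd, skew, kurt = moment_summary(sample)
    print(f"{name}: mean={mean:.1f} sd={sd:.1f} skew={skew:.2f} kurt={kurt:.2f}")
```

A skewness far from 0 is a hint that a normal model is a poor fit, though moments alone will not tell a bimodal library from a merely broad one.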
Old 05-07-2011, 05:05 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

A lot of this is going to depend on how the library was prepared in terms of size selection.

For example, one technique is to use electrophoresis to sort by size & then cut out a specific band. These sorts of libraries may have a size distribution which is very close to uniform within very specific bands -- i.e. you might have essentially nothing larger or smaller than a defined range. The reality is probably a little bit of blurring of that boundary, but I'm guessing not a lot.

Size selection with beads, on the other hand, probably isn't quite as sharp and perhaps is more like a normal (I haven't looked). Nextera would probably be different again. Some library prep protocols, I think, rely solely on the shearing device to generate a population.

Too many papers fail to report how this is done, so if you want to study this you'll need to dig through a bunch of papers to find those that report their methods. But I would guess that if you looked through a lot of papers, you'd find a bunch of different distributions. Perhaps if you can identify the center that sequenced each sample in 1000 Genomes, you'd see a different distribution corresponding to each center's method.
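To make the contrast concrete, here is a rough simulation sketch of the two cases described above: a hard-edged gel band versus a softer, bell-shaped bead selection. The band edges (370 to 430 bp), the bead spread (50 bp), and all function names are made-up illustrative values, not measurements from any protocol.

```python
import random

def simulate_gel_cut(n, low, high, rng):
    """Roughly uniform lengths inside a hard-edged band [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]

def simulate_bead_selection(n, center, spread, rng):
    """Bell-shaped lengths around `center`; `spread` is the standard deviation."""
    return [rng.gauss(center, spread) for _ in range(n)]

def fraction_outside(lengths, low, high):
    """Fraction of fragments falling outside the window [low, high]."""
    return sum(1 for x in lengths if x < low or x > high) / len(lengths)

rng = random.Random(1)
gel = simulate_gel_cut(10000, 370, 430, rng)
beads = simulate_bead_selection(10000, 400, 50, rng)

# The idealized gel cut has nothing outside its band; the bead-like
# sample leaves a large share of fragments outside the same window.
print("gel outside band:  ", fraction_outside(gel, 370, 430))
print("beads outside band:", fraction_outside(beads, 370, 430))
```

A real gel cut would blur at the edges, as noted above, so the uniform model is only a first approximation.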
Old 05-07-2011, 08:58 AM   #3
delphi_ote
Junior Member
 
Location: Champaign, IL

Join Date: Oct 2010
Posts: 9

Thank you so much, krobison! That was incredibly informative, and pointed me toward a lot of good resources. I really appreciate it.
Old 05-08-2011, 06:57 AM   #4
gogreen
Member
 
Location: Europe

Join Date: Apr 2009
Posts: 18

Hi delphi_ote, krobison was right on that point. With gel selection or other automated size selection methods, the size-selected fragments mostly fall within X ± 30 bp, where X is the selected size.
But when beads are used for size selection, the distribution can be quite broad, typically ranging 100 bp or more around the desired size.
Attached are Bioanalyzer profiles of two libraries, one using gel size selection and the other using beads. (Bead size selection can do a better job than this; I just found this one first.)
Attached images: 2.jpg, 3.jpg
Old 05-09-2011, 05:05 AM   #5
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

To add another twist, sometimes one of the methods we use to size fractionate DNA, E-gel, has too narrow a size window, so we do a few collections.

Well, that may not be clear... These E-gels have a slot in the gels with no agarose in it--just water or buffer some distance from the loading well. The DNA migrates first into the agarose of the gel, where it migrates at differential rates largely determined by length of the DNA fragment. When it reaches the collection slot it migrates through this window, continuing on back into the gel on the other side. Once the desired size range of DNA is migrating through the window the gel is stopped and the fraction is pulled out with a pipette.

But the well can be filled back in and electrophoresis continued, and then another fraction taken at a later time. This can easily result in bimodal (or multi-modal) size distributions if the resulting fractions are pooled at a later point.

I don't know how common this practice is, but in cases where there is concern for the limited amount of library being produced I would imagine it would be common.
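As a toy illustration of how pooling two collected fractions can produce a bimodal library, here is a small simulation sketch. The fraction centers (350 and 490 bp), their spread, the bin width, and the helper names `histogram` and `count_modes` are all assumptions chosen for the example, not values from an actual E-gel run.

```python
import random

def histogram(lengths, low, high, bin_width):
    """Count lengths in fixed-width bins covering [low, high)."""
    n_bins = int((high - low) // bin_width)
    counts = [0] * n_bins
    for x in lengths:
        i = int((x - low) // bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def count_modes(counts, min_fraction=0.1):
    """Count local maxima at least min_fraction as tall as the tallest bin."""
    floor = min_fraction * max(counts)
    padded = [0] + list(counts) + [0]  # treat out-of-range bins as empty
    modes = 0
    for i in range(1, len(padded) - 1):
        is_peak = padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
        if is_peak and padded[i] >= floor:
            modes += 1
    return modes

rng = random.Random(2)
first = [rng.gauss(350, 15) for _ in range(5000)]   # earlier collection
second = [rng.gauss(490, 15) for _ in range(5000)]  # later collection
pooled = first + second

print("modes, single fraction:", count_modes(histogram(first, 240, 640, 20)))
print("modes, pooled:", count_modes(histogram(pooled, 240, 640, 20)))
```

The peak counter is deliberately crude; on noisy real data the bin width and the height threshold would need tuning.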

--
Phillip

Last edited by pmiguel; 05-10-2011 at 03:50 AM.
Old 05-09-2011, 11:31 AM   #6
delphi_ote
Junior Member
 
Location: Champaign, IL

Join Date: Oct 2010
Posts: 9

Every time I ask people who do the hard work, I always learn the real story. Thanks so much, gogreen and pmiguel. Clearly, this community was the right one to ask!

Do you know if any of these library preparation techniques would cause the observed fragment lengths to be 100 bp or more shorter than the desired size? A few of the libraries I've been examining seem to be not only bimodal but also significantly shorter than intended. For example, here's a graph I made for a library that was designed to be 614 bp:



Any idea what would cause this?
Old 05-09-2011, 02:01 PM   #7
gogreen
Member
 
Location: Europe

Join Date: Apr 2009
Posts: 18

When you say 614 bp, is it the mean insert size or the size that was gel selected? If it was the selected size, you'd lose around 120 bp for the adapters on both ends, which would explain why you get an insert size of 440-500 bp. The smaller ones could be self-ligated adapters, which typically appear at 120-135 bp (although theoretically not possible, it does happen!). Is this from some modified RNA-seq or ChIP-seq?
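The adapter arithmetic can be written out as a quick sketch. The 120 bp adapter total and the 120-135 bp adapter-dimer range are the approximate figures from this post; the function names are hypothetical, and 614 bp is the design size given in the question.

```python
def expected_insert_size(selected_size_bp, adapter_total_bp=120):
    """Insert size left after subtracting adapters from the gel-selected size."""
    return selected_size_bp - adapter_total_bp

def looks_like_adapter_dimer(fragment_size_bp, low=120, high=135):
    """True if a fragment falls in the rough size range of self-ligated adapters."""
    return low <= fragment_size_bp <= high

print(expected_insert_size(614))      # 494
print(looks_like_adapter_dimer(128))  # True
```

So a 614 bp gel cut would be expected to give inserts near the top of the 440-500 bp range, if the 120 bp adapter estimate holds.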