  • Are Illumina library fragment lengths actually normally distributed?

    Many bioinformatics papers assume that the fragment lengths of Illumina data are normally distributed. However, I've seen some datasets where this doesn't seem to hold: some of the 1000 Genomes Project data show what look like strongly skewed and even bimodal distributions.
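One quick way to put a number on "not normal" is to compute the skewness, the excess kurtosis and Sarle's bimodality coefficient of a list of fragment lengths. This sketch uses simple moment estimates rather than the small-sample-corrected ones, so treat the values as approximate. The 5/9 threshold is a common rule of thumb, not a formal test.

```python
import random
from statistics import fmean


def skew_and_excess_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (biased, population-style estimates)."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 values")
    m = fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient; values above ~5/9 hint at bimodality."""
    n = len(xs)
    g, k = skew_and_excess_kurtosis(xs)
    return (g ** 2 + 1) / (k + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


if __name__ == "__main__":
    rng = random.Random(1)
    normal_like = [rng.gauss(500, 50) for _ in range(10_000)]  # simulated, not real data
    two_peaks = [100] * 50 + [500] * 50                        # toy bimodal sample
    print(round(bimodality_coefficient(normal_like), 2))  # expected to be close to 1/3
    print(round(bimodality_coefficient(two_peaks), 2))
```

A normal sample sits near 1/3 and a uniform one at exactly 5/9. Clearly separated peaks push the coefficient toward 1.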

    I'm a computer scientist, so I don't know much about what this data should look like or why it would have a given fragment length distribution. I've spent the past couple of days searching for a reference, but I've come up empty.

    Is there anyone here who can help me understand this better or point me to a resource where I could learn more? Any help would be greatly appreciated!

  • #2
    A lot of this is going to depend on how the library was prepared in terms of size selection.

    For example, one technique is to use electrophoresis to separate fragments by size and then cut out a specific band. Such libraries may have a size distribution that is very close to uniform within a narrow band: essentially nothing larger or smaller than a defined range. In reality the edges are probably blurred a little, but I'm guessing not by much.
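This toy simulation shows why cutting a band gives a hard-edged distribution. The log-normal "sheared" population and the 450-550 bp band are assumptions chosen for illustration, not measured values. A real gel cut is a hard window plus a little edge blurring, which this sketch leaves out.

```python
import math
import random


def gel_excise(fragments, low, high):
    """Toy model of cutting a gel band: keep only fragments with low <= length <= high."""
    return [f for f in fragments if low <= f <= high]


def simulate_sheared(n, median_bp=400, sigma=0.5, seed=0):
    """Assumed broad sheared population: log-normal lengths (illustrative only)."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(math.log(median_bp), sigma)) for _ in range(n)]


if __name__ == "__main__":
    sheared = simulate_sheared(20_000)
    band = gel_excise(sheared, 450, 550)
    # By construction nothing outside the cut survives.
    print(len(band), min(band), max(band))
```

Within the band, the kept lengths follow the local slope of the parent population. They look close to flat only when the band is narrow compared with the spread of the sheared DNA.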

    Size selection with beads, on the other hand, probably isn't as sharp and may look more like a normal distribution (I haven't checked). Nextera libraries would probably be different again. Some library prep protocols, I think, rely solely on the shearing device to generate the size distribution.
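Bead selection can be contrasted with a gel cut by giving each fragment a retention probability that rises smoothly past a lower cutoff and falls past an upper one. The logistic shape and the `softness` parameter are assumptions made for this sketch, not a model of actual bead chemistry. The point is only that soft cutoffs leave tails on both sides.

```python
import math
import random


def _logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def retention_prob(length, lower, upper, softness=25.0):
    """Assumed soft double-sided cutoff: near 0 far below `lower` or far above `upper`."""
    rise = _logistic((length - lower) / softness)
    fall = _logistic((upper - length) / softness)
    return rise * fall


def bead_select(fragments, lower, upper, softness=25.0, seed=0):
    """Keep each fragment with probability retention_prob(...) (toy model only)."""
    rng = random.Random(seed)
    return [f for f in fragments
            if rng.random() < retention_prob(f, lower, upper, softness)]


if __name__ == "__main__":
    fragments = list(range(300, 701)) * 50  # flat toy input, 300-700 bp
    kept = bead_select(fragments, 450, 550)
    outside = sum(1 for f in kept if f < 450 or f > 550)
    print(f"{outside / len(kept):.1%} of kept fragments fall outside 450-550 bp")
```

Unlike `gel_excise`-style hard windows, this model keeps a noticeable share of fragments outside the nominal range.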

    Too many papers fail to report how size selection was done, so if you wanted to study this you'd need to dig through a lot of papers to find those that report their methods. But I would guess that if you did, you'd find a range of different distributions. If you can identify the center that sequenced each sample in 1000 Genomes, you might see a distinct distribution for each center that corresponds to its method.
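Suppose each read group can be tied to a sequencing center, for example via the CN field of the @RG header line in SAM/BAM files, where it is filled in. Grouping insert sizes by center is then a quick way to look for method-specific distributions. The `(center, insert_size)` input format and the center names below are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_center(records):
    """Group (center, insert_size) pairs; return {center: (n, median, q1, q3)}.

    Each center needs at least two values for quantiles().
    """
    groups = defaultdict(list)
    for center, size in records:
        groups[center].append(size)
    summary = {}
    for center, sizes in sorted(groups.items()):
        q1, _, q3 = quantiles(sizes, n=4)
        summary[center] = (len(sizes), median(sizes), q1, q3)
    return summary


if __name__ == "__main__":
    # Hypothetical example data, not real 1000 Genomes values.
    records = (
        [("center_A", s) for s in (300, 320, 340, 360)]
        + [("center_B", s) for s in (450, 480, 600, 620)]
    )
    for center, (n, med, q1, q3) in summarize_by_center(records).items():
        print(f"{center}: n={n} median={med} IQR={q1}-{q3}")
```

Median and interquartile range are used instead of mean and standard deviation because they hold up better when a distribution is skewed or bimodal.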



    • #3
      Thank you so much, krobison! That was incredibly informative, and pointed me toward a lot of good resources. I really appreciate it.



      • #4
        Hi delphi_ote, krobison is right. With gel selection or other automated size selection methods, the selected fragments mostly fall within X ± 30 bp, where X is the selected size.
        When beads are used for size selection, however, the distribution can be much broader, typically spanning 100 bp or more around the desired size.
        Attached are Bioanalyzer profiles of two libraries, one size-selected on a gel and the other with beads. (Bead size selection can do a better job than this; I just found this one first.)
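Taking the X ± 30 bp figure from this post as a rule of thumb, a small helper can report how much of a library falls inside that window. A gel-selected library should score high and a bead-selected one noticeably lower. The 30 bp default comes from this post, not from any kit specification.

```python
def fraction_within(lengths, target, tol=30):
    """Fraction of fragment lengths within target +/- tol bp (inclusive)."""
    if not lengths:
        raise ValueError("need at least one length")
    inside = sum(1 for n in lengths if abs(n - target) <= tol)
    return inside / len(lengths)


if __name__ == "__main__":
    example = [470, 500, 530, 600]  # made-up lengths for illustration
    print(fraction_within(example, 500))  # 0.75
```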


        • #5
          To add another twist: one of the methods we use to size-fractionate DNA, the E-gel, sometimes has too narrow a size window, so we take several collections.

          Well, that may not be clear... These E-gels have a slot, some distance from the loading well, that contains no agarose, just water or buffer. The DNA first migrates into the agarose, where fragments move at different rates determined largely by their length. When a fragment reaches the collection slot, it passes through this window and continues back into the gel on the other side. Once the desired size range is passing through the window, the run is stopped and the fraction is pulled out with a pipette.

          But the collection well can be refilled, electrophoresis continued, and another fraction taken at a later time. If these fractions are later pooled, the result can easily be a bimodal (or multimodal) size distribution.
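A short simulation shows how pooling two collections produces two peaks. Each fraction is modelled, as an assumption, as a normal distribution around its own center (350 and 550 bp here, chosen only for illustration). The peak counter is deliberately crude and does no smoothing.

```python
import random


def simulate_fraction(center, sd, n, rng):
    """One collected fraction, modelled (as an assumption) as normal around `center`."""
    return [rng.gauss(center, sd) for _ in range(n)]


def histogram(xs, bin_width, lo, hi):
    """Counts per fixed-width bin over [lo, hi); values outside are ignored."""
    counts = [0] * ((hi - lo) // bin_width)
    for x in xs:
        if lo <= x < hi:
            counts[int((x - lo) // bin_width)] += 1
    return counts


def count_peaks(counts, min_height):
    """Count local maxima at or above min_height (crude; no smoothing)."""
    return sum(
        1
        for i in range(1, len(counts) - 1)
        if counts[i] >= min_height
        and counts[i] > counts[i - 1]
        and counts[i] >= counts[i + 1]
    )


if __name__ == "__main__":
    rng = random.Random(42)
    pooled = simulate_fraction(350, 25, 5000, rng) + simulate_fraction(550, 25, 5000, rng)
    counts = histogram(pooled, 20, 200, 700)
    print(count_peaks(counts, max(counts) // 2))
```

Requiring peaks to reach half the tallest bin is an arbitrary choice that keeps noisy shoulders from being counted as extra modes.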

          I don't know how common this practice is, but where there is concern about producing only a limited amount of library, I would imagine it is common.

          --
          Phillip
          Last edited by pmiguel; 05-10-2011, 02:50 AM.



          • #6
            Every time I ask people who do the hard work, I always learn the real story. Thanks so much, gogreen and pmiguel. Clearly, this community was the right one to ask!

            Do you know if any of these library preparation techniques would make the fragments 100 bp or more shorter than the desired size? A few of the libraries I've been examining seem not only bimodal but also significantly shorter. For example, here's a graph I made for a library that was designed to be 614 bp:
            [graph not preserved]
            Any idea what would cause this?
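To draw a plot like this from aligned reads, the usual starting point is the template length (TLEN, column 9 of a SAM record) of properly paired reads. The sketch below reads SAM text lines, such as the output of `samtools view`. It skips secondary and supplementary alignments and keeps only positive TLEN values, so each pair is counted once. TLEN spans the outermost aligned bases on the reference, so it roughly approximates the insert without adapters. The 10 bp bin width is an arbitrary choice.

```python
from collections import Counter

PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def insert_sizes_from_sam(lines):
    """Yield one TLEN per properly paired read pair from SAM text lines."""
    for line in lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, tlen = int(fields[1]), int(fields[8])
        if not flag & PROPER_PAIR or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if tlen > 0:  # the leftmost mate carries the positive TLEN
            yield tlen


def binned_histogram(sizes, bin_width=10):
    """Map each bin's lower edge to its count."""
    return Counter(s // bin_width * bin_width for s in sizes)


if __name__ == "__main__":
    # Toy records (made up): one proper pair and one improperly paired read.
    sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t50M\t=\t400\t350\t*\t*",
        "r1\t147\tchr1\t400\t60\t50M\t=\t100\t-350\t*\t*",
        "r2\t97\tchr1\t900\t60\t50M\t=\t1400\t550\t*\t*",
    ]
    print(list(insert_sizes_from_sam(sam)))  # [350]
```

To process real data, pass any iterable of SAM lines (for instance `sys.stdin` when piping `samtools view` output) to `insert_sizes_from_sam`.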



            • #7
              When you say 614 bp, is that the mean insert size or the size that was gel-selected? If it was the selected size, you'd lose around 120 bp for the adapters on both ends, which would explain an insert size of 440-500 bp. The smaller ones could be self-ligated adapters (adapter dimers), which typically appear at 120-135 bp (although theoretically not possible, it does happen!). Is this from a modified RNA-seq or ChIP-seq protocol?
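The arithmetic in this reply is easy to check: subtract the combined adapter length from the gel-selected size. The ~120 bp adapter total and the 120-135 bp dimer range are the figures quoted in the reply, used here as defaults. Real values depend on the adapter kit.

```python
ADAPTER_TOTAL_BP = 120       # rough combined adapter length quoted in the reply; kit-dependent
DIMER_RANGE_BP = (120, 135)  # typical adapter-dimer size range quoted in the reply


def expected_insert(selected_size_bp, adapter_total_bp=ADAPTER_TOTAL_BP):
    """Insert length left once adapter sequence is subtracted from a gel-selected size."""
    return selected_size_bp - adapter_total_bp


def looks_like_adapter_dimer(library_fragment_bp, dimer_range=DIMER_RANGE_BP):
    """True if a library fragment falls in the quoted adapter-dimer size range."""
    low, high = dimer_range
    return low <= library_fragment_bp <= high


if __name__ == "__main__":
    print(expected_insert(614))  # 494
    print(looks_like_adapter_dimer(128), looks_like_adapter_dimer(300))  # True False
```

For the 614 bp library in the question, 614 - 120 = 494 bp, which falls within the 440-500 bp insert range mentioned in the reply.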
