  • Library complexity

    Hi:
    We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that you essentially max out the information content after a limited number of reads (as few as one lane's worth, 30 million or so). To get the required coverage, some people have taken to producing multiple libraries from the same DNA and sequencing them on independent lanes. Does anyone have data supporting this contention? Intuitively, I find it very difficult to believe this is a problem after only one lane.
    Thanks community!

  • #2
    I don't have any data addressing this question, but it is clearly an important one and deserves some discussion. Could you share any more info about the source of this claim? If it is true, where in the library prep process do you expect the greatest loss of complexity, and how could it be alleviated? I'd guess that PCR amplification would be the major source. In that case, do you think libraries prepped with either no-amplification protocols or minimal amplification (4 cycles) would be more complex than libraries prepped with 12 cycles? It'd be an interesting experiment to take the same genomic DNA through ligation, then amplify aliquots with different cycle numbers and sequence those to look at shifts in library complexity.

    • #3
      An investigator on the UCD campus thought that's what they were doing at BGI, and he has changed the way his lab does things and is telling everyone else about it. So to my mind there is zero data, just hearsay, at this point. But I'd like to know! I completely agree that the number of amplification cycles and the fragmentation method introduce so much variability that claiming "one lane" is enough seems premature, if not just wrong. Clearly each library will have a limit, but I wonder whether it was misheard, and it was one HiSeq flow cell that maxed out the library, not one lane.

      • #4
        Very interesting. I agree that so many variables go into how completely a library is sampled that a single rule of thumb seems improbable. I am running some libraries next month that may be informative for this question. We are prepping some genomic DNA and RNA libraries from a mixed community with 4 cycles of amplification and will be running technical replicates to look at the question of sampling depth.

        • #5
          Originally posted by cnicolet
          Hi:
          We have been told that Illumina gDNA libraries prepared by standard protocols are less complex than believed, and that you essentially max out the information content after a limited number of reads (as few as one lane's worth, 30 million or so). To get the required coverage, some people have taken to producing multiple libraries from the same DNA and sequencing them on independent lanes. Does anyone have data supporting this contention? Intuitively, I find it very difficult to believe this is a problem after only one lane.
          Thanks community!

          I can provide one data point.

          The data are one lane of paired-end reads from a genomic DNA library prepared using the standard Illumina prep method (mean insert size = 220 bp). The DNA is from a vertebrate organism with a 1.2 Gbp genome. 35,255,961 paired reads were generated and aligned to the genome using bowtie (parameters: -X 280 -a --best --strata -M 1). Of these, 26,776,347 properly paired alignments were identified. The output was analyzed for duplicates using the Picard tools MarkDuplicates program. Among the properly paired reads, 156,630 duplicate fragments were identified, which is a duplication rate of 0.56%. Picard also reports a number denoted as "ESTIMATED_LIBRARY_SIZE", which in this case was 2,279,812,418. The Picard documentation is pretty sparse, so I don't know what this number truly means or how it is calculated.

          Even though this is but one example, based on these numbers I have a very hard time believing that a single lane comes anywhere close to saturating the diversity of a standard Illumina library prep.
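
          On the question of how ESTIMATED_LIBRARY_SIZE is calculated: as far as I know, Picard bases it on the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of read pairs examined, C is the number of read pairs left after removing duplicates, and X is the number of distinct molecules in the library. The sketch below is not Picard's code; it is a minimal reimplementation under that assumption, and the function name `estimate_library_size` is made up for illustration. It solves the relation for X by bisection, using the counts quoted in the post above.

```python
import math


def estimate_library_size(read_pairs: int, unique_read_pairs: int) -> float:
    """Solve C/X = 1 - exp(-N/X) for X, the number of distinct molecules.

    N = read pairs examined, C = read pairs remaining after duplicate removal.
    Assumes every molecule in the library is equally likely to be sequenced.
    """
    if not 0 < unique_read_pairs < read_pairs:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")
    n, c = float(read_pairs), float(unique_read_pairs)

    def excess(x: float) -> float:
        # Equals c/x - (1 - exp(-n/x)); positive while x is still too small
        # to account for c unique pairs among n sampled pairs.
        return c / x + math.expm1(-n / x)

    lo = hi = c  # the library holds at least as many molecules as were seen
    while excess(hi) > 0:
        hi *= 2.0  # grow the upper bound until it brackets the root
    for _ in range(200):  # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Counts quoted in the post: properly paired alignments and duplicates.
pairs_examined = 26_776_347
duplicate_pairs = 156_630
estimate = estimate_library_size(pairs_examined, pairs_examined - duplicate_pairs)
print(f"estimated library size: {estimate:,.0f} molecules")
```

          With these inputs the estimate lands very close to the 2,279,812,418 that Picard reported, which suggests the number is meant as the count of distinct fragments in the library, not the number of reads or bases.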

          • #6
            Another potentially important factor is the amount of input gDNA. In libraries with extremely low input amounts, you start to see a reduction in library complexity because you have created a molecular bottleneck. We see this in both genome and transcriptome libraries with very low input. Transcriptome libraries have additional library complexity concerns. For example, extreme end bias that results from using heavily degraded or 3' amplified RNA can lead to rapid saturation.

            I agree with kmcarr that if the library is constructed using the standard method with the recommended amount of gDNA input, one lane should not come close to saturating the diversity of a large genome such as human...
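
            To put a rough number on "not close to saturating", here is a small sketch that projects the expected duplicate fraction at deeper sequencing. It assumes uniform random sampling from a library of fixed size (real libraries deviate from this because of PCR and GC bias), and it borrows the library size and per-lane read-pair count quoted in post #5. The function name `expected_duplicate_fraction` is hypothetical.

```python
import math


def expected_duplicate_fraction(read_pairs: float, library_size: float) -> float:
    """Expected fraction of duplicate read pairs under uniform sampling.

    Expected unique pairs = X * (1 - exp(-N/X)) for N pairs drawn from a
    library of X distinct molecules; the remainder are duplicates.
    """
    if read_pairs <= 0 or library_size <= 0:
        raise ValueError("read_pairs and library_size must be positive")
    expected_unique = library_size * -math.expm1(-read_pairs / library_size)
    return 1.0 - expected_unique / read_pairs


# Figures quoted in post #5 (one lane of properly paired alignments).
LIBRARY_SIZE = 2_279_812_418
PAIRS_PER_LANE = 26_776_347

for lanes in (1, 2, 4, 8):
    frac = expected_duplicate_fraction(lanes * PAIRS_PER_LANE, LIBRARY_SIZE)
    print(f"{lanes} lane(s): {frac:.2%} expected duplicates")
```

            Under these assumptions the projected duplicate fraction stays below 5% even at eight lanes of the same library, so a library like the one in post #5 would still be yielding mostly new fragments well past a single lane. A low-input library with a much smaller library size would saturate far sooner.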
